Bash Sort Lines (sort)
Organize and arrange text lines in order
📊 What is the sort Command?
The sort command arranges lines of text in alphabetical or numerical order. It helps organize data from files or command output, making information easier to read and analyze.
# Sort lines alphabetically
sort names.txt
Sort Options
Alphabetical Sort
Default sorting by letters
sort file.txt
Numerical Sort
Sort by numeric value
sort -n numbers.txt
Reverse Order
Sort in descending order
sort -r file.txt
Unique Lines
Remove duplicate entries
sort -u file.txt
🔹 Basic Sorting
The sort command organizes lines of text from files or standard input into a specified order. By default, it performs a lexicographic (dictionary) sort, comparing entire lines character by character from left to right according to the current locale's collation rules. This is ideal for alphabetizing lists of words or strings. For instance, sort file.txt prints the lines of file.txt in ascending order to standard output; the file itself is left unchanged unless you write the result back with -o. This fundamental operation underlies more advanced sorting techniques and is widely used in shell scripting and data preparation to structure output before further analysis or reporting.
# Create a sample file
echo -e "banana\napple\ncherry\ndate" > fruits.txt
# Sort the file
sort fruits.txt
Output:
apple
banana
cherry
date
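Because the default order follows the locale's collation rules, the same input can sort differently on different systems. The sketch below, using hypothetical sample words fed through standard input, assumes that setting LC_ALL=C makes sort compare raw byte values, which puts every ASCII uppercase letter ahead of every lowercase one.

```shell
# Hypothetical sample words; one starts with an uppercase letter.
words() { printf '%s\n' banana Cherry apple; }

# LC_ALL=C compares raw byte values, so ASCII uppercase (A-Z) sorts
# before lowercase (a-z). The input arrives on standard input via a pipe.
c_order="$(words | LC_ALL=C sort)"
printf '%s\n' "$c_order"        # Cherry, apple, banana (one per line)

# With the system's own locale the order may differ; in a typical
# en_US.UTF-8 locale, case is only a secondary difference.
words | sort
```

Pinning LC_ALL=C is a common way to get the same order on every machine, for example when a script's output is compared against a stored file.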
🔹 Numerical Sorting
To sort data numerically, use the -n (--numeric-sort) option. Without it, sort treats numbers as plain text, so 10 appears before 2 because the character '1' comes before '2'. With -n, sort compares the numeric value at the start of each line (skipping leading blanks), which produces proper ascending order (e.g., 2, 5, 10, 21). This is crucial for log files with line numbers, counts, sizes, or any dataset where the numerical value, not its text representation, determines the correct sequence. Version strings such as 1.9 and 1.10 are a different case: -n reads them as decimals, so use -V (GNU sort) for those instead.
# Create a file with numbers
echo -e "10\n2\n100\n25\n3" > numbers.txt
# Sort numerically
sort -n numbers.txt
# Sort in reverse numerical order
sort -nr numbers.txt
Output (sort -n):
2
3
10
25
100
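To make the difference concrete, the sketch below runs the same hypothetical inputs through a plain text sort, -n, and -V. It assumes GNU sort (for -V) and sets LC_ALL=C so that the text order is byte order and "." is the decimal point for -n.

```shell
# Hypothetical inputs, fed through pipes.
nums() { printf '%s\n' 10 2 100 25 3; }
vers() { printf '%s\n' 1.10 1.9 1.2; }

text_order="$(nums | LC_ALL=C sort)"       # 10, 100, 2, 25, 3
num_order="$(nums | LC_ALL=C sort -n)"     # 2, 3, 10, 25, 100

# -n parses "1.10" as the decimal 1.1, so it lands before 1.2 and 1.9.
num_vers="$(vers | LC_ALL=C sort -n)"      # 1.10, 1.2, 1.9
# -V (GNU extension) compares each dot-separated number on its own.
ver_vers="$(vers | LC_ALL=C sort -V)"      # 1.2, 1.9, 1.10

printf '%s\n' "$text_order" '---' "$num_order" '---' "$num_vers" '---' "$ver_vers"
```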
🔹 Reverse Sorting
The -r or --reverse option inverts the output order of the sort command. When combined with basic or numeric sorting, it produces a descending sequence. For alphabetical sorts, this means Z to A. For numeric sorts (using -n), it lists values from highest to lowest. This is exceptionally useful for generating top-N lists, such as finding the largest files in a directory (ls -l | sort -rnk5) or displaying the most recent entries in a timestamped log. It provides a quick way to prioritize data without requiring additional post-processing steps.
# Sort in reverse alphabetical order
sort -r fruits.txt
# Combine with numerical sort
sort -nr numbers.txt
Output (sort -r):
date
cherry
banana
apple
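A top-N list is usually just a descending sort piped into head. The sketch below shows this on the numbers used earlier and on a hypothetical "name size" listing, where -k2,2nr applies the numeric and reverse modifiers to field 2 only.

```shell
nums() { printf '%s\n' 10 2 100 25 3; }

# Descending numeric order, then keep the first three lines.
top3="$(nums | sort -nr | head -n 3)"
printf '%s\n' "$top3"         # 100, 25, 10

# Hypothetical "name size" listing: sort on field 2, numerically, in reverse,
# so the largest entry comes first.
largest="$(printf '%s\n' 'a.log 300' 'b.log 1200' 'c.log 45' | sort -k2,2nr | head -n 1)"
printf '%s\n' "$largest"      # b.log 1200
```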
🔹 Remove Duplicates
The -u (--unique) option sorts the input and keeps only one line from each group of lines that compare equal, so every distinct line appears once. This makes it a handy data-cleaning tool for lists, configuration entries, or logs. For example, sort -u emails.txt outputs a sorted list of unique email addresses. For ordinary text it gives the same result as sort | uniq while saving a process. Note how it differs from uniq, which removes only adjacent duplicates and therefore needs sorted input; sort -u takes care of that itself. Also note that -u uses the same comparison as the sort: with options such as -f or -k, lines whose keys are equal count as duplicates even if the full lines differ.
# Create file with duplicates
echo -e "apple\nbanana\napple\ncherry\nbanana" > duplicates.txt
# Sort and remove duplicates
sort -u duplicates.txt
Output:
apple
banana
cherry
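The following sketch contrasts uniq alone, sort | uniq, and sort -u on a hypothetical list whose repeats are not next to each other, and shows that with -f two lines differing only in case are treated as one. LC_ALL=C is set only to keep the order predictable.

```shell
# Hypothetical sample list with repeats that are not adjacent.
sample() { printf '%s\n' apple banana apple cherry banana; }

# uniq alone collapses only adjacent repeats, so this input passes through unchanged.
uniq_only="$(sample | uniq)"

# Sorting first makes repeats adjacent; sort -u does both steps in one command.
sort_uniq="$(sample | LC_ALL=C sort | uniq)"
sort_u="$(sample | LC_ALL=C sort -u)"
printf '%s\n' "$sort_u"        # apple, banana, cherry (one per line)

# -u compares using the active sort rules: with -f, "Apple" and "apple"
# are duplicates, so only one of them is kept.
case_dedup="$(printf '%s\n' Apple apple | LC_ALL=C sort -fu)"
```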
🔹 Sort by Column
To sort structured data such as CSV or tab-delimited files by a specific field, use the -k (key) option, which defines a sort key by field position. You will often pair it with -t to set the field delimiter; without -t, fields are separated by runs of blanks, and the leading blanks count as part of each field. For instance, sort -t',' -k2,2n data.csv sorts a comma-separated file numerically by its second field. The syntax -k2,2 means the key starts and ends at field 2, whereas -k2 alone runs from field 2 to the end of the line. Ranges such as -k2,4 sort on fields 2 through 4 together. This precision is essential for managing multi-field records in system administration and data-processing workflows.
# Create CSV file
echo -e "John,25\nAlice,30\nBob,20" > ages.csv
# Sort numerically by the second field (age) only
sort -t',' -k2,2n ages.csv
Output:
Bob,20
John,25
Alice,30
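Several -k options can be combined, with later keys acting as tie-breakers. The sketch below assumes a hypothetical CSV in which two people share the same age, sorts by age and then by name, and also shows one portable way to pass a literal tab to -t for tab-delimited data.

```shell
# Hypothetical CSV where two rows share the age 25.
people() { printf '%s\n' John,25 Alice,30 Bob,20 Carol,25; }

# Primary key: field 2, numeric. Secondary key: field 1, text.
# Ending each key at its own field (",2" and ",1") keeps later fields out of it.
by_age="$(people | LC_ALL=C sort -t',' -k2,2n -k1,1)"
printf '%s\n' "$by_age"       # Bob,20 / Carol,25 / John,25 / Alice,30

# Tab-delimited input: capture a literal tab character and hand it to -t.
tab="$(printf '\t')"
by_tab="$(printf 'x\t2\ny\t1\n' | LC_ALL=C sort -t "$tab" -k2,2n)"
printf '%s\n' "$by_tab"
```

In bash specifically, -t $'\t' is a shorter way to write the same tab delimiter.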
🔹 Common Sort Options
The sort utility offers a robust suite of options that can be combined to handle complex data organization tasks. Beyond basic and numeric sorting, -M sorts by month abbreviations (JAN through DEC), -h sorts human-readable sizes (e.g., 2K, 1G), and -V performs natural version sorting; -h and -V are GNU extensions rather than POSIX options. The -o option writes output to a file, which may safely be the input file itself. GNU sort also provides --parallel=N to run up to N sorts concurrently and -s (--stable) to keep lines with equal keys in their original input order. Mastering these combinations allows efficient one-line solutions for preparing reports, analyzing logs, and structuring datasets.
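A short sketch of -h and -M on hypothetical inputs, assuming GNU sort. LC_ALL=C is set because month names follow the locale; in the C locale the English abbreviations are used, matched without regard to case.

```shell
# -h (GNU extension): compares the SI suffix first (none < K < M < G ...),
# then the number.
sizes="$(printf '%s\n' 1G 10K 2M 512 | LC_ALL=C sort -h)"
printf '%s\n' "$sizes"        # 512, 10K, 2M, 1G

# -M: month abbreviations ordered JAN < FEB < ... < DEC.
months="$(printf '%s\n' Mar Jan Dec Feb | LC_ALL=C sort -M)"
printf '%s\n' "$months"       # Jan, Feb, Mar, Dec
```

-h is convenient for output from tools such as du -h, where a plain -n sort would put 10K after 2M.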
Useful Options:
- -n - Sort numerically instead of alphabetically
- -r - Reverse the sort order (descending)
- -u - Remove duplicate lines from output
- -k - Sort by a key field or field range (e.g., -k2,2)
- -t - Specify field delimiter character
- -f - Ignore case when sorting
- -o - Write output to a file
# Case-insensitive sort
sort -f mixed_case.txt
# Save sorted output to file
sort -o sorted_names.txt names.txt
# Combine multiple options: numeric, unique, reverse (largest first, duplicates removed)
sort -nur data.txt
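The sketch below shows why -o is the safe way to sort a file in place, and what -s changes when keys tie. It assumes GNU sort (-s is not a POSIX option) and uses a hypothetical scratch file, scratch.txt, in the current directory.

```shell
# Hypothetical scratch file.
printf '%s\n' cherry apple banana > scratch.txt

# "sort scratch.txt > scratch.txt" would fail: the shell truncates the file
# before sort reads it. With -o, sort reads all input before opening the output.
LC_ALL=C sort -o scratch.txt scratch.txt
sorted_file="$(cat scratch.txt)"     # apple, banana, cherry

# -s keeps lines with equal keys in input order; without it, ties on the
# key are broken by comparing the whole line.
stable="$(printf '%s\n' 'b 1' 'a 2' 'b 0' | LC_ALL=C sort -s -k1,1)"    # a 2 / b 1 / b 0
default="$(printf '%s\n' 'b 1' 'a 2' 'b 0' | LC_ALL=C sort -k1,1)"      # a 2 / b 0 / b 1

rm -f scratch.txt
```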