Bash Sort Lines (sort)

Organize and arrange text lines in order

📊 What is the sort Command?

The sort command arranges lines of text in alphabetical or numerical order. It helps organize data from files or command output, making information easier to read and analyze.


# Sort lines alphabetically
sort names.txt
                                    

Sort Options

🔤

Alphabetical Sort

Default sorting by letters

sort file.txt
🔢

Numerical Sort

Sort numbers correctly

sort -n numbers.txt
🔄

Reverse Order

Sort in descending order

sort -r file.txt
✨

Unique Lines

Remove duplicate entries

sort -u file.txt

🔹 Basic Sorting

The sort command reads lines from files or standard input and writes them out in sorted order. By default it performs a lexicographic (dictionary) sort, comparing whole lines character by character from left to right according to the collation rules of the current locale, so the result can differ between systems (for example, in how uppercase and lowercase letters are ordered). For instance, sort file.txt prints the lines of file.txt in ascending order; the file itself is not modified. This default behavior is the basis for the more advanced sorting techniques and is widely used in shell scripting and data preparation to structure output before further analysis or reporting.

# Create a sample file
echo -e "banana\napple\ncherry\ndate" > fruits.txt

# Sort the file
sort fruits.txt

Output:

apple
banana
cherry
date
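Because the default order depends on the locale, the same input can sort differently on different machines. The following is a minimal sketch, assuming a hypothetical file named mixed.txt with mixed-case entries. Setting LC_ALL=C for a single command forces plain byte-order comparison, in which uppercase ASCII letters sort before lowercase ones.

```shell
# Hypothetical sample file with mixed-case entries
printf 'apple\nBanana\ncherry\n' > mixed.txt

# Locale-aware order (the result depends on the current locale settings)
sort mixed.txt

# Byte order: uppercase ASCII letters come before lowercase ones
LC_ALL=C sort mixed.txt
```

With LC_ALL=C the lines come out as Banana, apple, cherry. Pinning the locale this way is a common choice in scripts that need reproducible output.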

🔹 Numerical Sorting

To sort data numerically, use the -n or --numeric-sort option. Without this flag, sort treats numbers as plain text, which leads to incorrect ordering such as 10 appearing before 2, because '1' comes before '2' in character comparison. The -n flag tells sort to compare the numeric value at the start of each line (or sort key), ensuring proper ascending order (e.g., 2, 5, 10, 21). This matters for line numbers, counts, sizes, or any dataset where numerical value, not text representation, determines the correct sequence. Dotted version numbers such as 1.10 and 1.9 are not ordered correctly by -n; they need a version sort (-V) instead.

# Create a file with numbers
echo -e "10\n2\n100\n25\n3" > numbers.txt

# Sort numerically
sort -n numbers.txt

# Sort in reverse numerical order
sort -nr numbers.txt

Output (sort -n):

2
3
10
25
100
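To see why the flag matters, it can help to run both modes on the same input. This is a small sketch that feeds the numbers from numbers.txt through a pipe; LC_ALL=C is used only to make the text-mode result predictable.

```shell
# Sorted as text in byte order: compared character by character
printf '10\n2\n100\n25\n3\n' | LC_ALL=C sort
# → 10, 100, 2, 25, 3 (one per line)

# Sorted by numeric value
printf '10\n2\n100\n25\n3\n' | sort -n
# → 2, 3, 10, 25, 100 (one per line)
```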

🔹 Reverse Sorting

The -r or --reverse option inverts the output order of the sort command. When combined with basic or numeric sorting, it produces a descending sequence. For alphabetical sorts, this means Z to A. For numeric sorts (using -n), it lists values from highest to lowest. This is useful for generating top-N lists, such as finding the largest files in a directory (ls -l | sort -rnk5) or displaying the most recent entries in a log whose timestamps sort chronologically. It provides a quick way to prioritize data without additional post-processing steps.

# Sort in reverse alphabetical order
sort -r fruits.txt

# Combine with numerical sort
sort -nr numbers.txt

Output (sort -r):

date
cherry
banana
apple
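A top-N list is usually built by piping a reverse sort into head. The sketch below assumes a hypothetical file scores.txt with a name and a score on each line, separated by a space.

```shell
# Hypothetical sample data: name and score
printf 'alice 72\nbob 95\ncarol 88\ndave 60\n' > scores.txt

# Sort by the second field, numerically, highest first, and keep the top 3
sort -k2,2nr scores.txt | head -n 3
```

For this sample data the command prints bob 95, carol 88, and alice 72, in that order.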

🔹 Remove Duplicates

The -u or --unique flag sorts the input and outputs only one line from each group of lines that compare equal. This is a useful data-cleaning tool that eliminates redundancy in files like lists, configuration entries, or logs. For example, sort -u emails.txt will output a sorted list of unique email addresses. It does the work of sort | uniq in a single command and is often more efficient. Note that equality is judged by the same comparison used for sorting: when -u is combined with options such as -n, -f, or -k, lines that differ in text but compare equal under those options are treated as duplicates, and only one of them is kept.

# Create file with duplicates
echo -e "apple\nbanana\napple\ncherry\nbanana" > duplicates.txt

# Sort and remove duplicates
sort -u duplicates.txt

Output:

apple
banana
cherry
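As a quick comparison, here is a sketch showing that sort -u and sort | uniq give the same result for whole-line comparison, and where the pipeline form is still worth using. It recreates duplicates.txt so that it can run on its own.

```shell
# Same content as the duplicates.txt example
printf 'apple\nbanana\napple\ncherry\nbanana\n' > duplicates.txt

# These two commands print the same three lines
sort -u duplicates.txt
sort duplicates.txt | uniq

# uniq -c prefixes each line with its count, which sort -u cannot do
sort duplicates.txt | uniq -c
```

The pipeline is the better fit when the number of occurrences matters, not just the distinct values.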

🔹 Sort by Column

To sort structured data like CSV or tab-delimited files by a specific field, use the -k (key) option. This option defines a sort key based on column position. You often need to pair it with -t to specify the field delimiter; without -t, fields are separated by runs of blanks. For instance, sort -t',' -k2,2n data.csv sorts a comma-separated file numerically by its second column. The syntax -k2,2 means the key starts and ends at column 2, whereas -k2 alone makes the key run from column 2 to the end of the line. You can specify ranges like -k2,4 to sort on columns 2 through 4. This precision is essential for managing multi-field records in system administration and data science workflows.

# Create CSV file
echo -e "John,25\nAlice,30\nBob,20" > ages.csv

# Sort numerically by the second column only (age)
sort -t',' -k2,2n ages.csv

Output:

Bob,20
John,25
Alice,30
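Several -k options can be given, in which case later keys break ties in earlier ones. The following sketch assumes a hypothetical file staff.csv with department, name, and age columns.

```shell
# Hypothetical sample data: department,name,age
printf 'sales,John,25\neng,Alice,30\nsales,Bob,20\neng,Carol,28\n' > staff.csv

# Primary key: department (text); secondary key: age (numeric)
sort -t',' -k1,1 -k3,3n staff.csv
```

For this data the result is eng,Carol,28 then eng,Alice,30 then sales,Bob,20 then sales,John,25. Attaching the n modifier to a single key keeps the department comparison textual while the age comparison is numeric.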

🔹 Common Sort Options

The sort utility offers many options that can be combined to handle more complex data organization tasks. Beyond basic and numeric sorting, -M sorts by month name, -h sorts human-readable sizes (e.g., 2K, 1G), and -V performs natural version sorting. These three are not part of the POSIX standard, but GNU sort and some other implementations provide them. The -o option writes the output to a file, which may safely be the input file itself. GNU sort also has options that control how the sort runs, such as --parallel to set the number of threads and --stable to keep lines with equal keys in their input order. Combining these options allows for efficient one-line solutions for preparing reports, analyzing logs, and structuring datasets.

Useful Options:

  • -n - Sort numerically instead of alphabetically
  • -r - Reverse the sort order (descending)
  • -u - Remove duplicate lines from output
  • -k - Sort by specific column number
  • -t - Specify field delimiter character
  • -f - Ignore case when sorting
  • -o - Write output to a file

# Case-insensitive sort
sort -f mixed_case.txt

# Save sorted output to file (options placed before the file name for portability)
sort -o sorted_names.txt names.txt

# Combine multiple options
sort -nur data.txt
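The -V, -h, and -M options mentioned above can be tried on small inline inputs. This is a sketch that assumes GNU sort (or another implementation that supports these extensions); the sample values are made up for illustration.

```shell
# Version sort: numeric components are compared by value, so 1.10 follows 1.9
printf 'v1.10\nv1.2\nv1.9\n' | sort -V

# Human-readable sizes: suffixes such as K, M, and G are taken into account
printf '2K\n1G\n512M\n' | sort -h

# Month names (LC_ALL=C so that the English abbreviations are recognized)
printf 'Mar\nJan\nFeb\n' | LC_ALL=C sort -M
```

On a system with GNU sort these print v1.2, v1.9, v1.10; then 2K, 512M, 1G; then Jan, Feb, Mar.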
