Bash Pattern Scan (awk)

📊 What is awk?

awk is a powerful text processing tool that scans files line by line and processes data in columns. It's perfect for extracting specific fields, performing calculations, and formatting output from structured text data.


# Print first column from a file
awk '{print $1}' data.txt

# Print specific columns
awk '{print $1, $3}' file.txt

awk Basics

📋 Print Columns: extract specific fields

awk '{print $1}' file

🔍 Pattern Match: process matching lines

awk '/pattern/ {print}' file

🧮 Calculations: perform math operations

awk '{sum+=$1} END {print sum}' file

⚙️ Custom Delimiter: specify field separator

awk -F',' '{print $1}' file

🔹 Printing Columns

AWK splits each input record into fields, by default on runs of whitespace, which makes column access intuitive. (Its default action, used when a pattern has no action block, is to print the whole line.) The variable $1 refers to the first field, $2 to the second, and $0 holds the entire line. This model suits tabular data from logs, command output (like ps or df), and configuration files. For example, awk '{print $NF}' file.txt prints the last field of each line, a task that would be cumbersome with manual parsing.

# Print first column
awk '{print $1}' users.txt

# Print multiple columns
awk '{print $1, $3}' data.txt

# Print with custom text
awk '{print "Name:", $1, "Age:", $2}' people.txt

# Print entire line
awk '{print $0}' file.txt

Input (users.txt):

John 25 Engineer
Sarah 30 Designer
Mike 28 Developer

Output (awk '{print $1, $3}'):

John Engineer
Sarah Designer
Mike Developer
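
As a minimal sketch of the $NF idea mentioned above, the snippet below pipes the same three sample rows into awk instead of reading a file and prints the last two fields of each line. The variable name last_fields is only illustrative.

```shell
# Same sample rows as users.txt above, supplied on standard input
last_fields=$(printf '%s\n' 'John 25 Engineer' 'Sarah 30 Designer' 'Mike 28 Developer' |
  awk '{print $(NF-1), $NF}')   # $NF is the last field, $(NF-1) the one before it

echo "$last_fields"
```

Because NF is recomputed for every line, this keeps working even when lines have different numbers of fields.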

🔹 Pattern Matching

AWK can selectively process lines by specifying a pattern before the action block, integrating filtering and extraction. Patterns can be regular expressions, string matches, or even conditions based on field values. For example, awk '$3 > 100 {print $1, $3}' data.txt only processes lines where the third field exceeds 100. This reduces the need for preliminary grep or sed commands, leading to cleaner, more efficient one-liners and scripts that perform complex data selection in a single pass.

# Print lines containing "error"
awk '/error/ {print}' app.log

# Print first column of matching lines
awk '/Engineer/ {print $1}' users.txt

# Match and format output
awk '/^John/ {print "Found:", $1, $2}' data.txt

# Multiple patterns
awk '/error|warning/ {print $0}' system.log

Output:

John
Found: John 25
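
The examples above match a regular expression against the whole line. The sketch below shows the two other pattern styles the text describes, a numeric comparison on a field and a regex applied to a single field with the ~ operator. It reuses the users.txt sample rows; the variable names are illustrative.

```shell
users='John 25 Engineer
Sarah 30 Designer
Mike 28 Developer'

# Numeric condition on a field: names of users aged 28 or older
older=$(printf '%s\n' "$users" | awk '$2 >= 28 {print $1}')

# Regex tested against the third field only, not the whole line
designers=$(printf '%s\n' "$users" | awk '$3 ~ /^Des/ {print $1}')

echo "$older"
echo "$designers"
```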

🔹 Using Custom Delimiters

The -F flag in AWK defines the input field separator, enabling parsing of data that is not whitespace-delimited. Common delimiters include commas for simple CSVs (-F','), colons for files like /etc/passwd (-F':'), or multi-character separators; a separator longer than one character is interpreted as a regular expression. Note that -F',' splits on every comma, so it does not handle CSV fields that contain quoted commas. This flexibility is useful in ETL (Extract, Transform, Load) processes, where data arrives in varied formats from different sources. Specifying the delimiter correctly ensures fields are split as intended, which every later calculation or report depends on.

# Use comma as delimiter (CSV files)
awk -F',' '{print $1, $2}' data.csv

# Use colon as delimiter (like /etc/passwd)
awk -F':' '{print $1, $3}' /etc/passwd

# Use tab as delimiter
awk -F'\t' '{print $1}' data.tsv

# Multiple character delimiter
awk -F'::' '{print $1}' file.txt

Input (data.csv):

John,25,Engineer
Sarah,30,Designer

Output (awk -F',' '{print $1, $3}'):

John Engineer
Sarah Designer
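
Input and output separators are independent. As a rough sketch, the snippet below reads comma-separated rows and writes them tab-separated by setting OFS alongside -F, then splits on a two-character separator. It assumes plain CSV with no quoted fields; the sample strings and variable names are made up for illustration.

```shell
csv='John,25,Engineer
Sarah,30,Designer'

# -F sets the input separator, OFS the separator print puts between its arguments
tsv=$(printf '%s\n' "$csv" | awk -F',' -v OFS='\t' '{print $1, $3}')

# A separator longer than one character is treated as a regular expression
second=$(printf 'alpha::beta::gamma\n' | awk -F'::' '{print $2}')

echo "$tsv"
echo "$second"
```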

🔹 Performing Calculations

AWK functions as a capable data analysis tool by supporting arithmetic operations and maintaining variables across lines. You can compute sums, averages, minimums, maximums, and custom aggregations. For instance, to calculate the total and average of the second column: awk '{sum+=$2} END {print "Total:", sum, "Average:", sum/NR}' file.txt. This capability eliminates the need to import data into external spreadsheet software for basic statistics, allowing quick insights directly from the command line or within shell scripts.

# Sum all values in first column
awk '{sum += $1} END {print "Total:", sum}' numbers.txt

# Calculate average
awk '{sum += $1; count++} END {print "Average:", sum/count}' data.txt

# Find maximum value (seeded from the first line, so all-negative input works)
awk 'NR==1 || $1>max {max=$1} END {print "Max:", max}' nums.txt

# Multiple calculations
awk '{sum+=$2; count++} END {print "Total:", sum, "Count:", count}' sales.txt

Input (numbers.txt):

Output:

Total: 100
Average: 25
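
The aggregations described above can be combined in a single pass. The following is a sketch only, using hypothetical sample values (not the contents of numbers.txt): it seeds the minimum and maximum from the first record and skips the summary when there is no input, which avoids a division by zero.

```shell
# Hypothetical sample values, one per line
stats=$(printf '%s\n' 12 7 30 21 | awk '
  NR == 1  { min = max = $1 }      # seed min and max from the first record
           { sum += $1 }           # running total, kept across lines
  $1 < min { min = $1 }
  $1 > max { max = $1 }
  END {
    if (NR > 0)                    # no records means nothing to average
      print "Total:", sum, "Average:", sum / NR, "Min:", min, "Max:", max
  }')

echo "$stats"
```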

🔹 Conditional Processing

Conditional logic in AWK, using if, else if, and else, enables sophisticated data-driven processing flows. Conditions can compare fields, match regular expressions, or check line numbers. This allows for tasks like categorizing log entries by severity, filtering dataset rows based on multiple criteria, or applying different formatting rules to various data types. Such conditional processing makes AWK scripts powerful for cleaning, validating, and transforming raw data into structured, actionable information.

# Print if value is greater than 25
awk '{if($2 > 25) print $1, $2}' data.txt

# Print different messages based on value
awk '{if($2 >= 30) print $1, "Senior"; else print $1, "Junior"}' ages.txt

# Multiple conditions
awk '{if($2>20 && $2<30) print $0}' data.txt

# Count matching conditions
awk '{if($3=="Engineer") count++} END {print count}' users.txt

Output:

Sarah 30
Mike 28
Sarah Senior
Mike Junior
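
For more than two outcomes, an if / else if / else chain keeps the logic in one action block. The sketch below labels each row of the users.txt sample; the age thresholds and the "Mid" label are arbitrary choices for illustration.

```shell
users='John 25 Engineer
Sarah 30 Designer
Mike 28 Developer'

labels=$(printf '%s\n' "$users" | awk '{
  if ($2 >= 30)      level = "Senior"
  else if ($2 >= 28) level = "Mid"
  else               level = "Junior"
  print $1, level
}')

echo "$labels"
```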

🔹 BEGIN and END Blocks

The BEGIN block executes once before any input is read, ideal for printing report headers, initializing counters, or setting variables such as FS and OFS. The END block runs once after all input has been processed, which suits grand totals, summaries, and final calculations. Together they separate initialization, per-line processing, and final reporting, which keeps longer AWK programs readable and maintainable.

# Print header before processing
awk 'BEGIN {print "Name\tAge"} {print $1, $2}' data.txt

# Print footer after processing
awk '{print $0} END {print "--- End of File ---"}' file.txt

# Initialize variables and print summary
awk 'BEGIN {sum=0} {sum+=$1} END {print "Total:", sum}' numbers.txt

# Both BEGIN and END
awk 'BEGIN {print "Report"} {count++} END {print "Lines:", count}' file.txt

Output:

Name    Age
John 25
Sarah 30
--- End of File ---
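
One way to see all three phases together is a small report where BEGIN sets the output separator and prints the header, the main block prints each row, and END prints a summary. This is a sketch with two made-up rows; setting OFS in BEGIN makes the header and the data rows use the same tab separator.

```shell
report=$(printf '%s\n' 'John 25' 'Sarah 30' | awk '
  BEGIN { OFS = "\t"; print "Name", "Age" }   # runs before any input is read
        { print $1, $2; total += $2 }         # runs once per input line
  END   { print "Average", total / NR }       # runs after the last line
')

echo "$report"
```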

🔹 Built-in Variables

AWK's built-in variables offer metadata and state information that adapts processing dynamically. NR (Number of Records) counts all lines processed so far. NF (Number of Fields) changes per line, useful for validating data consistency or accessing the last field ($NF). FILENAME provides the name of the current input file, enabling multi-file processing within a single script. Leveraging these variables allows for more robust scripts that handle edge cases and provide informative output, such as line numbers for error tracking.

Common Variables:

NR: Number of records (lines) read so far, counted across all input files
NF: Number of fields in current line
$NF: Last field in current line
FILENAME: Current filename
FS: Field separator (default: whitespace)
OFS: Output field separator (default: a single space)

# Print line numbers
awk '{print NR, $0}' file.txt

# Print number of fields per line
awk '{print "Line", NR, "has", NF, "fields"}' data.txt

# Print last field
awk '{print $NF}' file.txt

# Print filename and per-file line number (FNR restarts for each file; NR does not)
awk '{print FILENAME, FNR, $0}' *.txt

Output:

1 John 25 Engineer
2 Sarah 30 Designer
Line 1 has 3 fields
Line 2 has 3 fields
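
When several files are processed in one run, the difference between the cumulative record count (NR) and the per-file count (FNR) becomes visible. The sketch below creates two small throwaway files (one.txt and two.txt are hypothetical names) in a temporary directory, prints the counters side by side, and then uses NF to flag lines with an unexpected number of fields.

```shell
dir=$(mktemp -d)
printf 'a b c\nd e\n' > "$dir/one.txt"
printf 'f g h\n'      > "$dir/two.txt"

# NR keeps counting across files; FNR restarts at 1 for each file
counts=$(cd "$dir" && awk '{print FILENAME, NR, FNR, NF}' one.txt two.txt)

# NF as a consistency check: report lines that do not have exactly 3 fields
short=$(cd "$dir" && awk 'NF != 3 {print FILENAME ": line " FNR " has " NF " fields"}' one.txt two.txt)

echo "$counts"
echo "$short"

rm -rf "$dir"
```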

🔹 Formatting Output

AWK's printf statement provides fine-grained control over output layout, surpassing the simple print. It uses format specifiers like %s for strings, %d for integers, and %f for floats, with width and precision controls. For example, awk '{printf "| %-15s | %10.2f |\n", $1, $2}' creates a neatly aligned table. This is essential for generating reports, logs, or any output where consistent column alignment and professional presentation are required for readability or further automated processing.

# Format with specific width
awk '{printf "%-10s %5d\n", $1, $2}' data.txt

# Format numbers with decimals
awk '{printf "%.2f\n", $1}' numbers.txt

# Create aligned columns
awk '{printf "%-15s %-10s %5d\n", $1, $2, $3}' users.txt

# Format with tabs
awk '{printf "%s\t%s\t%s\n", $1, $2, $3}' file.txt

Output:

John           25
Sarah          30
Mike           28
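
The table format quoted in the text can be tried on a couple of rows. This is a small sketch with hypothetical item names and prices: %-15s left-aligns the name in 15 characters, and %10.2f right-aligns the number in 10 characters with two decimals, so every output line has the same width.

```shell
# Hypothetical item names and prices
table=$(printf '%s\n' 'Widget 3.5' 'Gadget 12' |
  awk '{printf "| %-15s | %10.2f |\n", $1, $2}')

echo "$table"
```

Unlike print, printf adds no newline on its own, which is why the format string ends in \n.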

🔹 Practical awk Examples

AWK excels in real-world scenarios like log analysis, data summarization, and format conversion. System administrators use it to parse Apache logs for top IP addresses, developers use it to analyze code metrics, and data engineers use it for quick pre-processing of datasets. An example command: awk '{req[$7]++} END {for (page in req) print req[page], page}' access.log | sort -rn | head -10 shows the ten most frequently requested URLs. These practical applications demonstrate AWK's role as a versatile and efficient text-processing Swiss Army knife.

# Calculate total sales from CSV
awk -F',' '{sum+=$3} END {print "Total Sales: $"sum}' sales.csv

# Extract email addresses
awk -F',' '{print $2}' contacts.csv

# Process log file and count errors
awk '/ERROR/ {count++} END {print "Errors:", count}' app.log

# Generate report with header and footer
awk 'BEGIN {print "User Report\n---"} {print $1, $2} END {print "---\nTotal:", NR}' users.txt

# Filter and format data
awk -F':' '$3 >= 1000 {printf "%-15s %5d\n", $1, $3}' /etc/passwd
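
The request-counting command from the paragraph above relies on an associative array keyed by the URL. The sketch below runs it on three made-up log lines, assuming the common access-log layout in which the request path is the seventh whitespace-separated field; real log formats may differ.

```shell
# Hypothetical access-log lines; the request path is field 7 in this layout
log='192.0.2.1 - - [01/Jan/2024:00:00:01 +0000] "GET /index.html HTTP/1.1" 200 512
192.0.2.2 - - [01/Jan/2024:00:00:02 +0000] "GET /about.html HTTP/1.1" 200 256
192.0.2.1 - - [01/Jan/2024:00:00:03 +0000] "GET /index.html HTTP/1.1" 200 512'

# req[$7]++ counts each path; the END loop prints "count path" pairs
top=$(printf '%s\n' "$log" |
  awk '{req[$7]++} END {for (page in req) print req[page], page}' |
  sort -rn)

echo "$top"
```

The for (page in req) loop visits keys in no particular order, which is why the result is piped through sort -rn.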
