Parsing and Processing Files
Learn to parse structured data such as CSV files, log files, and configuration files using standard bash tools.
Parsing CSV Files
#!/bin/bash
# data.csv: name,age,city
while IFS=',' read -r name age city; do
    echo "Name: $name, Age: $age, City: $city"
done < data.csv
# Skip header row
tail -n +2 data.csv | while IFS=',' read -r name age city; do
    echo "$name is $age years old"
done
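Note that piping into a while loop runs the loop in a subshell, so any variables it sets disappear when the pipeline ends. A minimal sketch of the bash-specific workaround, process substitution:
#!/bin/bash
# Process substitution keeps the loop in the current shell,
# so 'count' is still set after the loop finishes
count=0
while IFS=',' read -r name age city; do
    count=$((count + 1))
done < <(tail -n +2 data.csv)
echo "Data rows: $count"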
Handling CSV with Quotes
CSV files often have quoted fields:
#!/bin/bash
# For complex CSV, use awk
awk -F',' '{
    gsub(/"/, "", $1)  # Remove quotes
    print $1, $3
}' data.csv
# Or use csvtool/csvkit if available
# csvtool col 1,3 data.csv
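Be aware that -F',' still splits a quoted field that contains commas. GNU awk's FPAT variable defines what a field looks like rather than what separates fields, which handles that case; a sketch assuming gawk is available:
#!/bin/bash
# A field is either a run of non-commas or a double-quoted string
gawk -v FPAT='([^,]*)|("[^"]*")' '{
    gsub(/"/, "", $1)  # Strip the surrounding quotes
    print $1, $3
}' data.csv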
Exercise: Parse CSV
Try it yourself: extract selected fields from a CSV file. One possible solution is sketched below.
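A minimal sketch, assuming the same three-column data.csv used above:
#!/bin/bash
# Print one field per person, skipping the header row
tail -n +2 data.csv | while IFS=',' read -r name age city; do
    echo "$name lives in $city"
done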
Parsing Log Files
#!/bin/bash
# Apache log format: IP - - [date] "request" status size
# The quoted request is read as three words (method, path, protocol),
# so give each its own variable to keep status and size aligned
while read -r ip _ _ date tz method path proto status size; do
    # Remove the brackets around the date and timezone
    date="${date#[}"
    tz="${tz%]}"
    # Count errors (status 4xx/5xx)
    if [[ "$status" =~ ^[45] ]]; then
        echo "Error from $ip: $status"
    fi
done < access.log
# Extract just IPs with awk
awk '{print $1}' access.log | sort | uniq -c | sort -rn
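In the same log layout the status code is field 9, so awk can tally the response distribution directly; a short sketch:
#!/bin/bash
# Count requests per HTTP status code (field 9 in the common log format)
awk '{print $9}' access.log | sort | uniq -c | sort -rn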
Using awk for Parsing
#!/bin/bash
# Print specific columns
awk '{print $1, $3}' file.txt
# With custom delimiter
awk -F':' '{print $1}' /etc/passwd
# Conditional printing
awk '$3 > 100 {print $1, $3}' data.txt
# Sum a column
awk '{sum += $2} END {print "Total:", sum}' numbers.txt
# Format output
awk '{printf "%-10s %5d\n", $1, $2}' data.txt
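awk also provides associative arrays, which turn group-by counting into a one-liner; a sketch that counts how often each value appears in column 1:
#!/bin/bash
# Group-by count with an awk associative array
awk '{count[$1]++} END {for (k in count) print count[k], k}' file.txt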
Parsing Configuration Files
#!/bin/bash
# config.ini: key=value format
declare -A CONFIG
while IFS='=' read -r key value; do
    # Trim spaces first so indented comments are also caught
    key="${key// /}"
    value="${value// /}"
    # Skip comments and empty lines
    [[ -z "$key" || "$key" =~ ^# ]] && continue
    CONFIG["$key"]="$value"
done < config.ini
echo "Host: ${CONFIG[host]}"
echo "Port: ${CONFIG[port]}"
Using grep for Extraction
#!/bin/bash
# Extract lines matching pattern
grep "ERROR" logfile.txt
# Extract with context
grep -B2 -A2 "ERROR" logfile.txt
# Extract just the match
grep -o 'error=[0-9]*' logfile.txt
# Count occurrences
grep -c "pattern" file.txt
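If your grep supports PCRE (the -P option in GNU grep), \K discards everything matched so far, so you can print just the value without the key; a sketch:
#!/bin/bash
# Print only the digits after "error=" (GNU grep with PCRE support)
grep -oP 'error=\K[0-9]+' logfile.txt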
Exercise: Count Patterns
Try it yourself: count how many times a pattern occurs in a file. One possible solution is sketched below.
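A minimal sketch using the logfile.txt from the grep examples; note that -c counts matching lines, not total matches:
#!/bin/bash
grep -c "ERROR" logfile.txt           # lines containing the pattern
grep -o "ERROR" logfile.txt | wc -l   # total occurrences (can exceed the line count)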
Using sed for Extraction
#!/bin/bash
# Extract between patterns
sed -n '/START/,/END/p' file.txt
# Extract and transform
sed -n 's/.*name="\([^"]*\)".*/\1/p' file.xml
# Extract specific line
sed -n '5p' file.txt
# Extract range
sed -n '5,10p' file.txt
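The /START/,/END/ range includes the marker lines themselves; to print only what lies between them, delete the boundaries inside the range. A sketch in GNU sed syntax:
#!/bin/bash
# Print the lines between START and END, excluding both markers
sed -n '/START/,/END/{/START/d; /END/d; p}' file.txt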
Parsing JSON (with jq)
#!/bin/bash
# If jq is available
JSON='{"name":"Alice","age":30}'
# Extract field
echo "$JSON" | jq -r '.name' # Alice
# Without jq - basic extraction
echo "$JSON" | grep -o '"name":"[^"]*"' | cut -d'"' -f4
Parsing Key-Value Pairs
#!/bin/bash
# Format: key: value
while IFS=': ' read -r key value; do
    case "$key" in
        Name)  NAME="$value" ;;
        Email) EMAIL="$value" ;;
        Age)   AGE="$value" ;;
    esac
done < record.txt
echo "Found: $NAME ($EMAIL), age $AGE"
Processing Multiple Files
#!/bin/bash
for file in *.log; do
    echo "=== $file ==="
    # Count errors per file
    errors=$(grep -c "ERROR" "$file")
    echo "Errors: $errors"
    # Extract unique IPs
    awk '{print $1}' "$file" | sort -u | wc -l | xargs echo "Unique IPs:"
done
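To aggregate across every file at once rather than per file, feed all the logs through a single pipeline; a sketch:
#!/bin/bash
# Unique IPs across all logs combined
cat *.log | awk '{print $1}' | sort -u | wc -l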
Practical Example: Log Analysis
#!/bin/bash
LOG_FILE="${1:-/var/log/syslog}"
echo "=== Log Analysis ==="
# Total lines
TOTAL=$(wc -l < "$LOG_FILE")
echo "Total entries: $TOTAL"
# Errors
ERRORS=$(grep -ci "error\|fail" "$LOG_FILE")
echo "Error entries: $ERRORS"
# By hour (if timestamps like "HH:MM")
echo -e "\nEntries by hour:"
grep -o '[0-2][0-9]:[0-5][0-9]' "$LOG_FILE" | \
    cut -d: -f1 | sort | uniq -c | sort -rn | head -5
Key Takeaways
- Use IFS with read to split delimited data
- awk excels at column-based processing
- grep extracts matching lines or patterns
- sed transforms and extracts with regex
- Skip headers with tail -n +2
- Parse key-value configs into associative arrays
- Combine tools with pipes for complex parsing
- Always handle edge cases (empty lines, comments)

