Parsing and Processing Files
Learn to parse structured data such as CSV files, log files, and configuration files using standard bash tools.
Parsing CSV Files
#!/bin/bash
# data.csv: name,age,city
while IFS=',' read -r name age city; do
    echo "Name: $name, Age: $age, City: $city"
done < data.csv
# Skip header row
tail -n +2 data.csv | while IFS=',' read -r name age city; do
    echo "$name is $age years old"
done
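Note that piping into a while loop runs the loop in a subshell, so any variables it sets disappear when the pipeline ends. A minimal sketch of the bash-specific workaround, process substitution:
#!/bin/bash
# Process substitution keeps the loop in the current shell,
# so 'count' is still set after the loop finishes
count=0
while IFS=',' read -r name age city; do
    count=$((count + 1))
done < <(tail -n +2 data.csv)
echo "Data rows: $count"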
Handling CSV with Quotes
CSV files often have quoted fields:
#!/bin/bash
# For complex CSV, use awk
awk -F',' '{
    gsub(/"/, "", $1)  # Remove quotes
    print $1, $3
}' data.csv
# Or use csvtool/csvkit if available
# csvtool col 1,3 data.csv
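Be aware that -F',' still splits a quoted field that contains commas. GNU awk's FPAT variable defines what a field looks like rather than what separates fields, which handles that case; a sketch assuming gawk is available:
#!/bin/bash
# A field is either a run of non-commas or a double-quoted string
gawk -v FPAT='([^,]*)|("[^"]*")' '{
    gsub(/"/, "", $1)  # Strip the surrounding quotes
    print $1, $3
}' data.csv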
Exercise: Parse CSV
Try it yourself: extract selected fields from a CSV file. One possible solution is sketched below.
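A minimal sketch, assuming the same three-column data.csv used above:
#!/bin/bash
# Print one field per person, skipping the header row
tail -n +2 data.csv | while IFS=',' read -r name age city; do
    echo "$name lives in $city"
done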
Parsing Log Files
#!/bin/bash
# Apache log format: IP - - [date] "request" status size
# The quoted request is read as three words (method, path, protocol),
# so give each its own variable to keep status and size aligned
while read -r ip _ _ date tz method path proto status size; do
    # Remove the brackets around the date and timezone
    date="${date#[}"
    tz="${tz%]}"
    # Count errors (status 4xx/5xx)
    if [[ "$status" =~ ^[45] ]]; then
        echo "Error from $ip: $status"
    fi
done < access.log
# Extract just IPs with awk
awk '{print $1}' access.log | sort | uniq -c | sort -rn
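In the same log layout the status code is field 9, so awk can tally the response distribution directly; a short sketch:
#!/bin/bash
# Count requests per HTTP status code (field 9 in the common log format)
awk '{print $9}' access.log | sort | uniq -c | sort -rn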
Using awk for Parsing
#!/bin/bash
# Print specific columns
awk '{print $1, $3}' file.txt
# With custom delimiter
awk -F':' '{print $1}' /etc/passwd
# Conditional printing
awk '$3 > 100 {print $1, $3}' data.txt
# Sum a column
awk '{sum += $2} END {print "Total:", sum}' numbers.txt
# Format output
awk '{printf "%-10s %5d\n", $1, $2}' data.txt
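awk also provides associative arrays, which turn group-by counting into a one-liner; a sketch that counts how often each value appears in column 1:
#!/bin/bash
# Group-by count with an awk associative array
awk '{count[$1]++} END {for (k in count) print count[k], k}' file.txt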
Parsing Configuration Files
#!/bin/bash
# config.ini: key=value format
declare -A CONFIG
while IFS='=' read -r key value; do
    # Trim spaces first so indented comments are also caught
    key="${key// /}"
    value="${value// /}"
    # Skip comments and empty lines
    [[ -z "$key" || "$key" =~ ^# ]] && continue
    CONFIG["$key"]="$value"
done < config.ini
echo "Host: ${CONFIG[host]}"
echo "Port: ${CONFIG[port]}"
Using grep for Extraction
#!/bin/bash
# Extract lines matching pattern
grep "ERROR" logfile.txt
# Extract with context
grep -B2 -A2 "ERROR" logfile.txt
# Extract just the match
grep -o 'error=[0-9]*' logfile.txt
# Count occurrences
grep -c "pattern" file.txt
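If your grep supports PCRE (the -P option in GNU grep), \K discards everything matched so far, so you can print just the value without the key; a sketch:
#!/bin/bash
# Print only the digits after "error=" (GNU grep with PCRE support)
grep -oP 'error=\K[0-9]+' logfile.txt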
Exercise: Count Patterns
Try it yourself: count how many times a pattern occurs in a file. One possible solution is sketched below.
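A minimal sketch using the logfile.txt from the grep examples; note that -c counts matching lines, not total matches:
#!/bin/bash
grep -c "ERROR" logfile.txt           # lines containing the pattern
grep -o "ERROR" logfile.txt | wc -l   # total occurrences (can exceed the line count)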
Using sed for Extraction
#!/bin/bash
# Extract between patterns
sed -n '/START/,/END/p' file.txt
# Extract and transform
sed -n 's/.*name="\([^"]*\)".*/\1/p' file.xml
# Extract specific line
sed -n '5p' file.txt
# Extract range
sed -n '5,10p' file.txt
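The /START/,/END/ range includes the marker lines themselves; to print only what lies between them, delete the boundaries inside the range. A sketch in GNU sed syntax:
#!/bin/bash
# Print the lines between START and END, excluding both markers
sed -n '/START/,/END/{/START/d; /END/d; p}' file.txt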
Parsing JSON (with jq)
#!/bin/bash
# If jq is available
JSON='{"name":"Alice","age":30}'
# Extract field
echo "$JSON" | jq -r '.name' # Alice
# Without jq - basic extraction
echo "$JSON" | grep -o '"name":"[^"]*"' | cut -d'"' -f4
Parsing Key-Value Pairs
#!/bin/bash
# Format: key: value
while IFS=': ' read -r key value; do
    case "$key" in
        Name)  NAME="$value" ;;
        Email) EMAIL="$value" ;;
        Age)   AGE="$value" ;;
    esac
done < record.txt
echo "Found: $NAME ($EMAIL), age $AGE"
Processing Multiple Files
#!/bin/bash
for file in *.log; do
    echo "=== $file ==="
    # Count errors per file
    errors=$(grep -c "ERROR" "$file")
    echo "Errors: $errors"
    # Extract unique IPs
    awk '{print $1}' "$file" | sort -u | wc -l | xargs echo "Unique IPs:"
done
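To aggregate across every file at once rather than per file, feed all the logs through a single pipeline; a sketch:
#!/bin/bash
# Unique IPs across all logs combined
cat *.log | awk '{print $1}' | sort -u | wc -l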
Practical Example: Log Analysis
#!/bin/bash
LOG_FILE="${1:-/var/log/syslog}"
echo "=== Log Analysis ==="
# Total lines
TOTAL=$(wc -l < "$LOG_FILE")
echo "Total entries: $TOTAL"
# Errors
ERRORS=$(grep -ci "error\|fail" "$LOG_FILE")
echo "Error entries: $ERRORS"
# By hour (if timestamps like "HH:MM")
echo -e "\nEntries by hour:"
grep -o '[0-2][0-9]:[0-5][0-9]' "$LOG_FILE" | \
    cut -d: -f1 | sort | uniq -c | sort -rn | head -5
Key Takeaways
- Use IFS with read to split delimited data
- awk excels at column-based processing
- grep extracts matching lines or patterns
- sed transforms and extracts with regex
- Skip headers with tail -n +2
- Parse key-value configs into associative arrays
- Combine tools with pipes for complex parsing
- Always handle edge cases (empty lines, comments)

