Lesson 2.4: Caching Mechanisms
The Memory Hierarchy
- CPU L1 cache (KBs): ~1 ns
- CPU L2 cache (MBs): ~10 ns
- RAM (GBs): ~100 ns
- SSD (TBs): ~100 µs (about 1,000x slower than RAM)
- HDD (TBs): ~10 ms (about 100,000x slower than RAM)
Key insight: Databases use RAM as a cache to avoid disk I/O.
PostgreSQL Shared Buffers
The main database cache:
-- Check shared_buffers setting
SHOW shared_buffers; -- Default: 128MB (often too small)
-- Recommended: 25% of system RAM
-- Example: 16GB RAM → shared_buffers = 4GB
How it works:
- A query needs a data page (an 8KB block)
- PostgreSQL first checks shared_buffers (RAM)
- If the page is there (cache hit), it is returned immediately
- If not (cache miss), the page is read from disk and cached in shared_buffers (see the sketch below)
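A rough Python analogy of this lookup flow (a conceptual sketch only, not PostgreSQL's actual buffer manager, which also handles eviction, pinning, and write-back):
PAGE_SIZE = 8192      # PostgreSQL pages are 8KB
shared_buffers = {}   # block_number -> page bytes (stand-in for the buffer pool)

def read_page(data_file, block_number):
    # Cache hit: the page is already in RAM
    if block_number in shared_buffers:
        return shared_buffers[block_number]
    # Cache miss: read the 8KB block from disk and keep it for next time
    data_file.seek(block_number * PAGE_SIZE)
    page = data_file.read(PAGE_SIZE)
    shared_buffers[block_number] = page
    return page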
Monitor cache hit ratio:
SELECT
  sum(heap_blks_read) as disk_reads,
  sum(heap_blks_hit) as cache_hits,
  -- nullif avoids division by zero when no table I/O has been recorded yet
  sum(heap_blks_hit) / nullif(sum(heap_blks_hit) + sum(heap_blks_read), 0) as cache_hit_ratio
FROM pg_statio_user_tables;
-- Goal: cache_hit_ratio > 0.99 (99% hits)
-- Below 0.95? Increase shared_buffers or optimize queries
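A small monitoring sketch, assuming the psycopg 3 driver and a placeholder connection string ("dbname=app"); it runs the same query and warns when the ratio drops below 0.95:
import psycopg  # assumption: psycopg 3 is installed

HIT_RATIO_SQL = """
SELECT sum(heap_blks_hit)::float
       / nullif(sum(heap_blks_hit) + sum(heap_blks_read), 0)
FROM pg_statio_user_tables;
"""

with psycopg.connect("dbname=app") as conn:  # placeholder connection string
    ratio = conn.execute(HIT_RATIO_SQL).fetchone()[0]
    if ratio is None:
        print("No table I/O recorded yet")
    elif ratio < 0.95:
        print(f"Cache hit ratio {ratio:.3f} is low: raise shared_buffers or optimize queries")
    else:
        print(f"Cache hit ratio {ratio:.3f} looks healthy")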
OS Page Cache
PostgreSQL uses a two-level cache:
- Shared buffers (Postgres-managed)
- OS page cache (Linux/macOS manages automatically)
Total RAM for caching = shared_buffers + OS cache
Example:
- 16GB RAM system
- shared_buffers = 4GB
- OS cache ≈ 8GB (after OS overhead)
- Total cache: 12GB for database
Query Result Caching (Application Layer)
PostgreSQL caches data pages, not query results. Result caching is something you implement at the application layer:
Redis caching pattern:
import json
import redis

cache = redis.Redis()  # assumes a Redis server on localhost

def get_user(user_id):
    # Check the Redis cache first
    cached = cache.get(f"user:{user_id}")
    if cached:
        return json.loads(cached)
    # Cache miss: query the database (db stands in for your database access layer)
    user = db.execute("SELECT * FROM users WHERE id = %s", (user_id,))
    # Cache the result for 5 minutes (300 seconds)
    cache.setex(f"user:{user_id}", 300, json.dumps(user))
    return user
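When a cached row changes, the copy in Redis is stale until the TTL expires. A common companion pattern (sketched with the same hypothetical db handle and cache client) is to delete the key on write so the next read repopulates it:
def update_user_name(user_id, new_name):
    # Write to the database first (the source of truth)
    db.execute("UPDATE users SET name = %s WHERE id = %s", (new_name, user_id))
    # Invalidate the cached copy; the next get_user() call repopulates it
    cache.delete(f"user:{user_id}")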
When to cache:
- Expensive queries (joins, aggregations)
- Hot data (user profiles, popular products)
- Data that changes infrequently
When NOT to cache:
- Data that must be real-time (inventory, prices)
- Highly personalized data (one key per user means many keys and low hit rates)
- Large result sets (cache overhead)
Prepared Statements (Plan Caching)
-- Prepared statement (cached plan)
PREPARE get_user (bigint) AS
SELECT * FROM users WHERE id = $1;
-- Execute multiple times (reuses plan)
EXECUTE get_user(123);
EXECUTE get_user(456);
Benefits:
- Query planning happens once
- Faster execution (skip parse/plan steps)
- Protection against SQL injection
Most ORMs and database drivers handle this for you: they send parameterized queries and often prepare frequently executed statements automatically.
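A minimal sketch, assuming the psycopg 3 driver and a placeholder connection string: parameters are passed separately from the SQL text, and the driver switches a repeatedly executed statement to a server-side prepared statement on its own (controlled by its prepare_threshold setting):
import psycopg  # assumption: psycopg 3

with psycopg.connect("dbname=app") as conn:  # placeholder connection string
    for user_id in (123, 456, 789, 1011, 1213, 1415):
        # No string formatting into the SQL; the value travels as a parameter,
        # and after a few executions psycopg prepares the statement server-side
        row = conn.execute("SELECT * FROM users WHERE id = %s", (user_id,)).fetchone()
        print(row)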
Key Takeaways
- RAM is 1000x faster than SSD, so caching in memory is critical
- PostgreSQL shared_buffers should be ~25% of system RAM
- OS page cache provides additional caching beyond shared_buffers
- Monitor cache hit ratio (should be >99%)
- Application-layer caching (Redis) reduces database load for expensive queries
- Prepared statements cache query plans for repeated execution

