Lesson 2.4: Caching Mechanisms
The Memory Hierarchy
- CPU L1 cache (KBs): ~1 ns
- CPU L2 cache (MBs): ~10 ns
- RAM (GBs): ~100 ns
- SSD (TBs): ~100 µs (about 1,000x slower than RAM)
- HDD (TBs): ~10 ms (about 100,000x slower than RAM)
Key insight: Databases use RAM as a cache to avoid disk I/O.
PostgreSQL Shared Buffers
The main database cache:
-- Check shared_buffers setting
SHOW shared_buffers; -- Default: 128MB (often too small)
-- Recommended: 25% of system RAM
-- Example: 16GB RAM → shared_buffers = 4GB
How it works:
- A query needs a data page (an 8KB block)
- PostgreSQL first checks shared_buffers (RAM)
- If the page is there (cache hit), it is returned immediately
- If not (cache miss), the page is read from disk and cached in shared_buffers (see the sketch below)
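A rough Python analogy of this lookup flow (a conceptual sketch only, not PostgreSQL's actual buffer manager, which also handles eviction, pinning, and write-back):
PAGE_SIZE = 8192      # PostgreSQL pages are 8KB
shared_buffers = {}   # block_number -> page bytes (stand-in for the buffer pool)

def read_page(data_file, block_number):
    # Cache hit: the page is already in RAM
    if block_number in shared_buffers:
        return shared_buffers[block_number]
    # Cache miss: read the 8KB block from disk and keep it for next time
    data_file.seek(block_number * PAGE_SIZE)
    page = data_file.read(PAGE_SIZE)
    shared_buffers[block_number] = page
    return page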
Monitor cache hit ratio:
SELECT
  sum(heap_blks_read) as disk_reads,
  sum(heap_blks_hit) as cache_hits,
  -- nullif avoids division by zero when no table I/O has been recorded yet
  sum(heap_blks_hit) / nullif(sum(heap_blks_hit) + sum(heap_blks_read), 0) as cache_hit_ratio
FROM pg_statio_user_tables;
-- Goal: cache_hit_ratio > 0.99 (99% hits)
-- Below 0.95? Increase shared_buffers or optimize queries
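A small monitoring sketch, assuming the psycopg 3 driver and a placeholder connection string ("dbname=app"); it runs the same query and warns when the ratio drops below 0.95:
import psycopg  # assumption: psycopg 3 is installed

HIT_RATIO_SQL = """
SELECT sum(heap_blks_hit)::float
       / nullif(sum(heap_blks_hit) + sum(heap_blks_read), 0)
FROM pg_statio_user_tables;
"""

with psycopg.connect("dbname=app") as conn:  # placeholder connection string
    ratio = conn.execute(HIT_RATIO_SQL).fetchone()[0]
    if ratio is None:
        print("No table I/O recorded yet")
    elif ratio < 0.95:
        print(f"Cache hit ratio {ratio:.3f} is low: raise shared_buffers or optimize queries")
    else:
        print(f"Cache hit ratio {ratio:.3f} looks healthy")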
OS Page Cache
PostgreSQL uses a two-level cache:
- Shared buffers (Postgres-managed)
- OS page cache (Linux/macOS manages automatically)
Total RAM for caching = shared_buffers + OS cache
Example:
- 16GB RAM system
- shared_buffers = 4GB
- OS cache ≈ 8GB (after OS overhead)
- Total cache: 12GB for database
Query Result Caching (Application Layer)
PostgreSQL caches data pages, not query results. Result caching is something you implement at the application layer:
Redis caching pattern:
import json
import redis

cache = redis.Redis()  # assumes a Redis server on localhost

def get_user(user_id):
    # Check the Redis cache first
    cached = cache.get(f"user:{user_id}")
    if cached:
        return json.loads(cached)
    # Cache miss: query the database (db stands in for your database access layer)
    user = db.execute("SELECT * FROM users WHERE id = %s", (user_id,))
    # Cache the result for 5 minutes (300 seconds)
    cache.setex(f"user:{user_id}", 300, json.dumps(user))
    return user
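When a cached row changes, the copy in Redis is stale until the TTL expires. A common companion pattern (sketched with the same hypothetical db handle and cache client) is to delete the key on write so the next read repopulates it:
def update_user_name(user_id, new_name):
    # Write to the database first (the source of truth)
    db.execute("UPDATE users SET name = %s WHERE id = %s", (new_name, user_id))
    # Invalidate the cached copy; the next get_user() call repopulates it
    cache.delete(f"user:{user_id}")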
When to cache:
- Expensive queries (joins, aggregations)
- Hot data (user profiles, popular products)
- Data that changes infrequently
When NOT to cache:
- Data that must be real-time (inventory, prices)
- Highly personalized data (one key per user means many keys and low hit rates)
- Large result sets (cache overhead)
Prepared Statements (Plan Caching)
-- Prepared statement (cached plan)
PREPARE get_user (bigint) AS
SELECT * FROM users WHERE id = $1;
-- Execute multiple times (reuses plan)
EXECUTE get_user(123);
EXECUTE get_user(456);
Benefits:
- Query planning happens once
- Faster execution (skip parse/plan steps)
- Protection against SQL injection
Most ORMs and database drivers handle this for you: they send parameterized queries and often prepare frequently executed statements automatically.
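A minimal sketch, assuming the psycopg 3 driver and a placeholder connection string: parameters are passed separately from the SQL text, and the driver switches a repeatedly executed statement to a server-side prepared statement on its own (controlled by its prepare_threshold setting):
import psycopg  # assumption: psycopg 3

with psycopg.connect("dbname=app") as conn:  # placeholder connection string
    for user_id in (123, 456, 789, 1011, 1213, 1415):
        # No string formatting into the SQL; the value travels as a parameter,
        # and after a few executions psycopg prepares the statement server-side
        row = conn.execute("SELECT * FROM users WHERE id = %s", (user_id,)).fetchone()
        print(row)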
Key Takeaways
- RAM is 1000x faster than SSD, so caching in memory is critical
- PostgreSQL shared_buffers should be ~25% of system RAM
- OS page cache provides additional caching beyond shared_buffers
- Monitor cache hit ratio (should be >99%)
- Application-layer caching (Redis) reduces database load for expensive queries
- Prepared statements cache query plans for repeated execution

