Lesson 2.3: Query Execution, Planner, and Optimizer
The Query Pipeline
SQL Query
↓
[Parser] → Parse tree (syntax check)
↓
[Rewriter] → Apply views, rules
↓
[Planner/Optimizer] → Generate execution plans, choose best
↓
[Executor] → Actually run the query
↓
Results
The Query Planner
The planner generates multiple execution plans and estimates their cost:
-- Example query
SELECT u.name, COUNT(o.id) as order_count
FROM users u
JOIN orders o ON u.id = o.user_id
WHERE u.created_at > '2024-01-01'
GROUP BY u.name;
Possible plans:
- Nested Loop Join: For each user, scan orders
- Hash Join: Build hash table of users, probe with orders
- Merge Join: Sort both tables, merge
Planner chooses based on:
- Table sizes (row counts)
- Available indexes
- Data distribution statistics
- Memory available (work_mem)
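To see why these choices matter, here is a toy Python sketch of two of the join strategies above: a nested loop is O(N×M) comparisons, while a hash join does one pass over each input. The data and names are illustrative, not the lesson's schema.

```python
# Toy comparison of two join algorithms the planner chooses between.
users = [(1, "alice"), (2, "bob"), (3, "carol")]   # (id, name)
orders = [(10, 1), (11, 1), (12, 3)]               # (id, user_id)

def nested_loop_join(outer, inner):
    # For each outer row, scan every inner row: O(N*M) comparisons.
    return [(u_name, o_id)
            for (u_id, u_name) in outer
            for (o_id, o_uid) in inner
            if o_uid == u_id]

def hash_join(build, probe):
    # Build a hash table on the smaller input, then probe once per row: O(N+M).
    table = {u_id: u_name for (u_id, u_name) in build}
    return [(table[o_uid], o_id) for (o_id, o_uid) in probe if o_uid in table]

# Both algorithms produce the same rows; they differ only in cost,
# which is exactly what the planner's estimates are weighing.
assert sorted(nested_loop_join(users, orders)) == sorted(hash_join(users, orders))
```

This is why table sizes and statistics matter: the hash join's setup cost only pays off when the inputs are large enough.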
Reading EXPLAIN Output
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM users WHERE email = 'alice@example.com';
Without index:
Seq Scan on users (cost=0.00..18334.00 rows=1 width=64) (actual time=45.234..89.567 rows=1)
Filter: (email = 'alice@example.com')
Rows Removed by Filter: 999999
Buffers: shared read=8334
Interpretation:
- Seq Scan: Full table scan (bad)
- cost=0.00..18334.00: Estimated work (arbitrary units)
- actual time=45.234..89.567: Real time in milliseconds
- rows=1: Found 1 row after scanning 1,000,000
With index:
Index Scan using users_email_idx on users (cost=0.42..8.44 rows=1 width=64) (actual time=0.023..0.024 rows=1)
Index Cond: (email = 'alice@example.com')
Buffers: shared hit=4
Interpretation:
- Index Scan: Using index (good)
- cost=0.42..8.44: Much lower cost
- actual time=0.023..0.024: 0.02ms vs 89ms (~4000x faster)
- Buffers: shared hit=4: Read 4 pages from cache
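The same before/after experiment can be reproduced locally with SQLite's `EXPLAIN QUERY PLAN` (a sketch; PostgreSQL's planner and output format differ, but the seq-scan vs index-scan distinction is the same):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, email TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(i, f"user{i}@example.com") for i in range(1000)])

def plan(sql):
    # EXPLAIN QUERY PLAN rows carry the plan description in the last column.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " ".join(r[-1] for r in rows)

q = "SELECT * FROM users WHERE email = 'user42@example.com'"
print(plan(q))   # typically "SCAN users" -- a full table scan

conn.execute("CREATE INDEX users_email_idx ON users (email)")
print(plan(q))   # now "SEARCH ... USING INDEX users_email_idx ..."
```

Creating the index flips the plan from a full scan to an index lookup, mirroring the PostgreSQL output shown above.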
Common Plan Nodes
Seq Scan (Sequential Scan)
- Reads entire table
- Bad for large tables, acceptable for small tables (under 100 rows)
Index Scan
- Uses index to find rows
- May read random heap pages (can be slow if many rows)
Index Only Scan
- All needed columns in index
- Doesn't touch heap (fastest)
Bitmap Index Scan
- Builds bitmap of matching rows from index
- Then reads heap pages in order (faster than random I/O)
Hash Join
- Builds hash table of smaller table
- Probes with larger table
- Fast for large joins
Nested Loop
- For each row in outer table, scan inner table
- Fast for small tables or when inner table has index
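The third join strategy from the plan list, merge join, can be sketched the same way: sort both inputs on the join key, then advance two cursors in step. (Toy data; a real executor streams sorted runs rather than sorting in-memory lists.)

```python
users = sorted([(2, "bob"), (1, "alice"), (3, "carol")])   # sorted by id
orders = sorted([(1, 10), (3, 12), (1, 11)])               # sorted by user_id

def merge_join(left, right):
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lkey, rkey = left[i][0], right[j][0]
        if lkey < rkey:
            i += 1            # left cursor is behind: advance it
        elif lkey > rkey:
            j += 1            # right cursor is behind: advance it
        else:
            # Keys match: emit every right row sharing this key.
            k = j
            while k < len(right) and right[k][0] == lkey:
                out.append((left[i][1], right[k][1]))
                k += 1
            i += 1
    return out

print(merge_join(users, orders))
# → [('alice', 10), ('alice', 11), ('carol', 12)]
```

Because each cursor only moves forward, the merge phase is a single pass over both inputs; the cost is dominated by the sorts, which is why merge joins shine when the inputs are already sorted (e.g. from an index scan).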
Statistics and ANALYZE
The planner relies on statistics to make decisions:
-- Update statistics
ANALYZE users;
-- View statistics
SELECT
schemaname, tablename,
n_live_tup as row_count,
n_dead_tup as dead_rows,
last_autovacuum,
last_analyze
FROM pg_stat_user_tables
WHERE tablename = 'users';
Why statistics matter:
- Planner estimates row counts to choose join algorithms
- Outdated stats → bad plans → slow queries
- Run ANALYZE after large data loads
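SQLite offers a small runnable analogue of this: like PostgreSQL, it gathers planner statistics with `ANALYZE`, storing them in `sqlite_stat1` rather than in `pg_statistic` / `pg_stat_user_tables` (a sketch, not PostgreSQL behavior):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.execute("CREATE INDEX users_email_idx ON users (email)")
conn.executemany("INSERT INTO users (email) VALUES (?)",
                 [(f"user{i}@example.com",) for i in range(500)])

conn.execute("ANALYZE")   # gather statistics for the planner
stats = conn.execute("SELECT tbl, idx, stat FROM sqlite_stat1").fetchall()
print(stats)              # e.g. a row recording ~500 rows for users_email_idx
```

The stored row counts are exactly the kind of input the planner uses to estimate costs; if they go stale after a bulk load, its estimates, and therefore its plans, go wrong.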
Key Takeaways
- The query planner generates multiple execution plans and chooses the best based on cost estimates
- EXPLAIN shows the actual execution plan and performance metrics
- Sequential scans are slow for large tables; index scans are much faster
- Different join algorithms (nested loop, hash join, merge join) are optimal for different scenarios
- The planner relies on statistics; run ANALYZE regularly to keep them updated
- Reading EXPLAIN output is essential for diagnosing slow queries

