Lesson 5.1: Identifying Slow Query Patterns
Introduction
AI systems push databases to their limits. Feature stores need to serve 10k reads/second. ML pipelines write millions of rows/hour. Real-time inference can't tolerate slow queries.
This lesson teaches you how to identify performance bottlenecks.
Using pg_stat_statements
-- Enable extension (add to postgresql.conf: shared_preload_libraries = 'pg_stat_statements')
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
-- Find slowest queries
SELECT
query,
calls,
total_exec_time / 1000 AS total_seconds,
mean_exec_time AS avg_ms,
max_exec_time AS max_ms
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
Common Slow Patterns
1. Missing indexes:
-- Full table scan
SELECT * FROM users WHERE email = 'test@example.com';
-- Fix: Add index
CREATE INDEX users_email_idx ON users(email);
2. N+1 queries:
# Bad: N+1 queries
users = db.execute("SELECT id, name FROM users")
for user in users:
orders = db.execute(f"SELECT * FROM orders WHERE user_id = {user.id}") # N queries!
# Good: Single JOIN
SELECT u.id, u.name, o.id AS order_id, o.total
FROM users u
LEFT JOIN orders o ON u.id = o.user_id;
3. Unbounded queries:
-- Bad: Returns entire table
SELECT * FROM model_predictions WHERE model_id = 'gpt-4'; -- 10M rows!
-- Good: Add LIMIT and WHERE created_at
SELECT * FROM model_predictions
WHERE model_id = 'gpt-4'
AND created_at > NOW() - INTERVAL '7 days'
LIMIT 1000;
4. Function in WHERE clause:
-- Bad: Can't use index
SELECT * FROM users WHERE LOWER(email) = 'test@example.com';
-- Good: Expression index or store normalized
CREATE INDEX users_email_lower_idx ON users(LOWER(email));
EXPLAIN Deep Dive
EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
SELECT d.content, e.vector <=> query_vector AS dist
FROM documents d
JOIN embeddings e ON d.id = e.document_id
WHERE d.user_id = 123
ORDER BY dist
LIMIT 10;
Key metrics:
actual time: Real execution timerows: Estimated vs actual row countsBuffers: shared hit: Cache hits (good)Buffers: shared read: Disk reads (slow)Heap Fetches: Rows read from table (minimize)
Key Takeaways
- Enable pg_stat_statements to track query performance
- Common slow patterns: missing indexes, N+1 queries, unbounded queries, functions in WHERE
- Use EXPLAIN ANALYZE to diagnose slow queries
- Focus on queries with high total_exec_time (calls × avg_time)
- Buffer metrics reveal I/O bottlenecks
- Expression indexes solve function-in-WHERE problems

