Lesson 3.1: B-Tree Architecture Deep Dive
Introduction
Indexes are the difference between a 10ms query and a 10-second query. Between a system that handles 1000 requests/second and one that crashes at 10.
For AI applications, indexing becomes even more critical:
- RAG systems query millions of embeddings
- Feature stores serve 10k+ reads/second
- Model logs grow to billions of rows
- Real-time inference can't wait for slow queries
How B-Trees Actually Work
Node structure:
[Page Header: 24 bytes]
[Item Pointers: N × 4 bytes]
[Free Space]
[Items: Key-Value pairs]
[Special Space: btree metadata]
Default page size: 8KB (8192 bytes)
Typical keys per page: 100-300
Tree height for 1B rows: 4-5 levels
Example: Index on email (TEXT column)
CREATE INDEX users_email_idx ON users(email);
-- Index entry format:
-- ['alice@example.com' → TID(page=42, offset=3)]
-- ['bob@example.com' → TID(page=42, offset=7)]
-- ...
-- TID = Tuple Identifier (pointer to heap row)
Index Scan Performance
Best case: Index Seek (single value)
SELECT * FROM users WHERE email = 'alice@example.com';
-- I/O operations:
-- 1. Read root page (usually cached)
-- 2. Read internal page (usually cached)
-- 3. Read leaf page (index)
-- 4. Read heap page (actual row)
-- Total: ~4 page reads (32KB) = 1-2ms
Range scan:
SELECT * FROM users WHERE created_at > NOW() - INTERVAL '7 days';
-- If index on created_at:
-- 1-2. Traverse B-tree to starting point
-- 3. Sequential scan through leaf pages
-- 4. Random heap reads for each row
-- Performance depends on result size
When indexes hurt performance:
-- Bad: Returning 50% of table
SELECT * FROM users WHERE status = 'active'; -- If 50% are active
-- Planner will choose Seq Scan instead of Index Scan
-- Why? Reading 500k random heap pages slower than reading entire table sequentially
Rule of thumb:
- Index Scan: Fast for under ~5% of table
- Bitmap Index Scan: Good for 5-25% of table
- Sequential Scan: Better for over 25% of table
Index Size and Maintenance
-- Check index size
SELECT
indexname,
pg_size_pretty(pg_relation_size(indexname::regclass)) AS size
FROM pg_indexes
WHERE tablename = 'documents';
-- Example output:
-- documents_pkey | 214 MB
-- documents_user_id_idx | 85 MB
-- documents_embedding_idx | 4.2 GB (vector index is large!)
Index bloat:
-- Rebuild bloated index
REINDEX INDEX documents_user_id_idx;
-- Or rebuild all indexes on table
REINDEX TABLE documents;
Key Takeaways
- B-tree nodes match disk page size (8KB) for optimal I/O
- Tree height stays small even with billions of rows (4-5 levels)
- Index seeks require ~3-4 page reads (1-2ms typical)
- Indexes are counterproductive when returning >25% of table rows
- Monitor index size and rebuild to fix bloat
- Understanding physical structure helps predict query performance

