Lesson 3.1: B-Tree Architecture Deep Dive

Introduction

Indexes are the difference between a 10ms query and a 10-second query. Between a system that handles 1000 requests/second and one that crashes at 10.

For AI applications, indexing becomes even more critical:

RAG systems query millions of embeddings
Feature stores serve 10k+ reads/second
Model logs grow to billions of rows
Real-time inference can't wait for slow queries

How B-Trees Actually Work

Node structure:

[Page Header: 24 bytes]
[Item Pointers: N × 4 bytes]
[Free Space]
[Items: Key-Value pairs]
[Special Space: btree metadata]

Default page size: 8KB (8192 bytes)
Typical keys per page: 100-300
Tree height for 1B rows: 4-5 levels

Example: Index on email (TEXT column)

CREATE INDEX users_email_idx ON users(email);

-- Index entry format:
-- ['alice@example.com' → TID(page=42, offset=3)]
-- ['bob@example.com' → TID(page=42, offset=7)]
-- ...

-- TID = Tuple Identifier (pointer to heap row)

Index Scan Performance

Best case: Index Seek (single value)

SELECT * FROM users WHERE email = 'alice@example.com';

-- I/O operations:
-- 1. Read root page (usually cached)
-- 2. Read internal page (usually cached)
-- 3. Read leaf page (index)
-- 4. Read heap page (actual row)
-- Total: ~4 page reads (32KB) = 1-2ms

Range scan:

SELECT * FROM users WHERE created_at > NOW() - INTERVAL '7 days';

-- If index on created_at:
-- 1-2. Traverse B-tree to starting point
-- 3. Sequential scan through leaf pages
-- 4. Random heap reads for each row
-- Performance depends on result size

When indexes hurt performance:

-- Bad: Returning 50% of table
SELECT * FROM users WHERE status = 'active';  -- If 50% are active

-- Planner will choose Seq Scan instead of Index Scan
-- Why? Reading 500k random heap pages slower than reading entire table sequentially

Rule of thumb:

Index Scan: Fast for under ~5% of table
Bitmap Index Scan: Good for 5-25% of table
Sequential Scan: Better for over 25% of table

Index Size and Maintenance

-- Check index size
SELECT
    indexname,
    pg_size_pretty(pg_relation_size(indexname::regclass)) AS size
FROM pg_indexes
WHERE tablename = 'documents';

-- Example output:
-- documents_pkey          | 214 MB
-- documents_user_id_idx   | 85 MB
-- documents_embedding_idx | 4.2 GB  (vector index is large!)

Index bloat:

-- Rebuild bloated index
REINDEX INDEX documents_user_id_idx;

-- Or rebuild all indexes on table
REINDEX TABLE documents;

Key Takeaways

B-tree nodes match disk page size (8KB) for optimal I/O
Tree height stays small even with billions of rows (4-5 levels)
Index seeks require ~3-4 page reads (1-2ms typical)
Indexes are counterproductive when returning >25% of table rows
Monitor index size and rebuild to fix bloat
Understanding physical structure helps predict query performance