Lesson 2.2: B-Trees and Hash Indexes
The Fundamental Data Structure: B-Tree
B-trees are the reason databases are fast. Understanding them is critical.
What is a B-Tree?
A balanced tree structure that keeps data sorted and allows searches, insertions, and deletions in O(log n) time.
Structure:
[50 | 100] ← Root (level 0)
/ | \
/ | \
[10|20|30] [60|70] [110|120] ← Internal nodes (level 1)
/ | | \ / | \ / | \
[...rows...] [...rows...] [...] ← Leaf nodes (level 2, actual data pointers)
Key properties:
- Each node has multiple keys (typically 100-1000)
- Tree height stays small (4 levels = 1 billion rows)
- Nodes are sized to match disk pages (8KB in Postgres)
Example: Finding id = 75
1. Read root node: [50 | 100] → Go right (75 > 50, 75 < 100)
2. Read node: [60 | 70] → Go right (75 > 70)
3. Read leaf node → Find row
Total disk reads: 3 (usually all cached after first query)
Index Types in PostgreSQL
B-Tree Index (Default)
-- Create index
CREATE INDEX users_email_idx ON users(email);
-- How it works
-- Index stores: [email → row_location]
-- Sorted: 'alice@...' → 'bob@...' → 'charlie@...'
-- Fast queries
SELECT * FROM users WHERE email = 'alice@example.com'; -- Index seek
SELECT * FROM users WHERE email LIKE 'alice%'; -- Index range scan
SELECT * FROM users ORDER BY email LIMIT 10; -- Index ordered scan
-- Slow queries (can't use index)
SELECT * FROM users WHERE email LIKE '%alice%'; -- Full table scan
Hash Index
-- Create hash index
CREATE INDEX users_email_hash_idx ON users USING HASH(email);
-- Use cases
-- Only supports equality (=), not ranges or LIKE
-- Slightly faster than B-tree for exact matches
-- Smaller index size
-- Fast
SELECT * FROM users WHERE email = 'alice@example.com';
-- Can't use hash index
SELECT * FROM users WHERE email > 'alice@example.com'; -- Needs ordering
When to use hash indexes:
- Equality lookups only (no ranges, no sorting)
- Very large text columns (UUIDs, hashes)
- Space-constrained environments
Most of the time, stick with B-tree.
How Indexes Speed Up Queries
Without index:
-- Full table scan
SELECT * FROM users WHERE email = 'alice@example.com';
-- Postgres scans every row (1 million rows)
-- Read all pages from disk: 8KB × 125,000 pages = 1GB read
-- Time: ~2-5 seconds
With index:
-- Index seek
CREATE INDEX users_email_idx ON users(email);
SELECT * FROM users WHERE email = 'alice@example.com';
-- Postgres:
-- 1. Reads 3-4 index pages (B-tree traversal)
-- 2. Gets row location
-- 3. Reads 1 heap page (actual row)
-- Total: 4-5 page reads = ~40KB
-- Time: 1-2ms (500x faster)
Key Takeaways
- B-trees keep data sorted and enable O(log n) searches
- B-tree height stays small even with billions of rows (typically 3-4 levels)
- B-tree indexes support equality, ranges, sorting, and prefix matching
- Hash indexes only support equality but are slightly faster for exact matches
- Indexes dramatically reduce disk I/O by avoiding full table scans
- Most queries should use B-tree indexes as the default choice

