Module 8: Indexing Strategies
How Vector Databases Organize Data for Fast Search
Introduction
Indexing is what makes vector databases fast. Without an index, every query must compare against every vector, which becomes prohibitively slow at scale.
By the end of this module, you'll understand:
- How different index types work
- When to use each index type
- How to tune indexes for your use case
- The trade-offs you're making
8.1 Why Indexes Matter
The Scale Problem
Consider a database with 10 million vectors of 1,536 dimensions:
- Each comparison: ~1,536 operations
- Brute force: 10M × 1,536 = 15 billion operations per query
- At 100ms per query, you can handle maybe 10 queries/second
With a good index:
- Query time: 2-10ms
- Throughput: 1,000+ queries/second
- Same results (mostly)
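A quick sanity check of the arithmetic behind those figures (the latency and throughput numbers are the rough estimates above, not benchmarks):

// Rough arithmetic behind the numbers above
const vectors = 10_000_000
const dims = 1536
const opsPerQuery = vectors * dims   // 15,360,000,000 ≈ 15 billion ops
const bruteQps = 1000 / 100          // ~10 queries/sec at ~100 ms each
const indexedQps = 1000 / 5          // ~200 queries/sec per worker at ~5 ms

The 1,000+ queries/second figure assumes several workers running concurrently.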
The Trade-off Triangle
Every index balances three factors:
           Accuracy
              /\
             /  \
            /    \
           /______\
      Speed        Memory
You can optimize for two, but the third will suffer.
8.2 Index Types Explained
Flat Index (Exact Search)
No index at all—compares query against every vector.
// Conceptual implementation of exact search
function cosineSim(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0
  for (let i = 0; i < a.length; i++) { dot += a[i] * b[i]; na += a[i] ** 2; nb += b[i] ** 2 }
  return dot / (Math.sqrt(na) * Math.sqrt(nb))
}

function flatSearch(query: number[], vectors: number[][], k: number) {
  return vectors
    .map((v, i) => ({ index: i, score: cosineSim(query, v) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
}
Characteristics:
- 100% accuracy
- O(n) time complexity
- Works for small datasets (< 10K vectors)
- No index memory overhead (just the raw vectors)
When to use:
- Datasets under 10,000 vectors
- When accuracy is absolutely critical
- For benchmarking other indexes
IVF (Inverted File Index)
Divides vectors into clusters. Only searches relevant clusters.
Step 1: Cluster vectors during index build
┌───────────────────────────────────────────────┐
│  Cluster 1   Cluster 2   Cluster 3   Cluster 4│
│   [v1,v4]     [v2,v5]     [v3,v7]     [v6,v8] │
│      ●           ●           ●           ●    │
│     / \         / \         / \         / \   │
│    v1  v4      v2  v5      v3  v7      v6  v8 │
└───────────────────────────────────────────────┘
Step 2: At query time, find nearest cluster(s)
Query → Find nearest centroid → Search only that cluster
Parameters:
- nlist: Number of clusters (typically sqrt(n) to 4*sqrt(n))
- nprobe: Clusters to search (higher = more accurate, slower)
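To make the two-step flow concrete, here is a minimal sketch in TypeScript, reusing cosineSim and flatSearch from the flat-index example above. The ivfSearch name and the data layout (centroids from a prior k-means pass, clusters grouping member vectors) are illustrative, not a real library API:

// Conceptual IVF search: probe the nprobe nearest clusters, then
// brute-force search only their members
function ivfSearch(
  query: number[],
  centroids: number[][],    // one centroid per cluster (from k-means)
  clusters: number[][][],   // clusters[i] = vectors assigned to centroid i
  k: number,
  nprobe: number
) {
  // Rank centroids by similarity to the query; keep the top nprobe
  const probed = centroids
    .map((c, i) => ({ i, score: cosineSim(query, c) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, nprobe)
  // Search only the probed clusters (results are positions within this
  // candidate set; a real index maps them back to global ids)
  const candidates = probed.flatMap(({ i }) => clusters[i])
  return flatSearch(query, candidates, k)
}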
pgvector example:
-- Create IVFFlat index with 100 clusters
CREATE INDEX ON documents
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
-- Increase probes for better accuracy
SET ivfflat.probes = 10;
When to use:
- Large datasets (1M+ vectors)
- Memory constraints
- Acceptable accuracy loss
HNSW (Hierarchical Navigable Small World)
A multi-layer graph where nodes are vectors connected to their neighbors.
Level 2: [ A ] ─────────────── [ D ]
           ↓                     ↓
Level 1: [ A ] ─── [ C ] ─── [ D ] ─── [ F ]
           ↓         ↓         ↓         ↓
Level 0: [A] ─ [B] ─ [C] ─ [D] ─ [E] ─ [F] ─ [G] ─ [H]
How it works:
- Start at top layer (few nodes, long-range connections)
- Greedily move toward query
- Drop to lower layer
- Repeat until bottom layer
- Do thorough local search
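A minimal sketch of the greedy step on a single layer (illustration only; a real HNSW search keeps a candidate list of size efSearch at the bottom layer instead of a single best node). The graph and vectors structures are hypothetical; cosineSim is from the flat-index example:

// Greedy descent on one layer: follow the best-scoring neighbor
// until no neighbor is closer to the query than the current node
function greedySearchLayer(
  query: number[],
  entry: number,                 // node id to start from
  graph: Map<number, number[]>,  // node id -> neighbor ids on this layer
  vectors: number[][]            // node id -> embedding
): number {
  let current = entry
  let best = cosineSim(query, vectors[current])
  let improved = true
  while (improved) {
    improved = false
    for (const neighbor of graph.get(current) ?? []) {
      const score = cosineSim(query, vectors[neighbor])
      if (score > best) {
        best = score
        current = neighbor
        improved = true
      }
    }
  }
  return current // local optimum; becomes the entry point one layer down
}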
Parameters:
- M: Connections per node (default 16)
- efConstruction: Build quality (default 64-200)
- efSearch: Search quality (default 50-100)
pgvector example:
-- Create HNSW index
CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
-- Tune search quality
SET hnsw.ef_search = 100;
When to use:
- General-purpose (most common choice)
- When you need good accuracy/speed balance
- In-memory workloads
Product Quantization (PQ)
Compresses vectors to reduce memory and speed up distance calculations.
Original vector (1536 floats, 6KB):
[0.1, 0.2, ..., 0.3, 0.4, ...]
Split into 8 subvectors:
[sub1] [sub2] [sub3] [sub4] [sub5] [sub6] [sub7] [sub8]
Quantize each to codebook index:
[12] [45] [78] [23] [56] [89] [11] [44]
Compressed (8 bytes): a 768x reduction from the original 6 KB. Distances are computed approximately using precomputed lookup tables of subvector-to-centroid distances.
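A toy encoder makes the compression step concrete (assumes the vector length divides evenly by the number of subvectors; pqEncode and the codebooks layout are illustrative, not a real library API):

// Toy PQ encoder: split the vector into codebooks.length subvectors
// and replace each with the index of its nearest codebook centroid
function pqEncode(vector: number[], codebooks: number[][][]): number[] {
  const subLen = vector.length / codebooks.length // assumes even split
  return codebooks.map((book, j) => {
    const sub = vector.slice(j * subLen, (j + 1) * subLen)
    let bestIdx = 0
    let bestDist = Infinity
    book.forEach((centroid, idx) => {
      // squared Euclidean distance from the subvector to this centroid
      const d = sub.reduce((acc, x, t) => acc + (x - centroid[t]) ** 2, 0)
      if (d < bestDist) {
        bestDist = d
        bestIdx = idx
      }
    })
    return bestIdx // one small integer per subvector
  })
}

With 8 codebooks of 256 centroids each, the output is 8 indexes that fit in one byte apiece, matching the 8-byte code above.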
When to use:
- Very large datasets (100M+ vectors)
- Memory is limited
- Can accept lower accuracy
8.3 Index Selection Guide
| Dataset Size | Recommended Index | Notes |
|---|---|---|
| < 10K | Flat | Exact search is fast enough |
| 10K - 1M | HNSW | Best balance for most use cases |
| 1M - 10M | HNSW (tuned) | Increase M and ef_construction |
| 10M - 100M | IVF-HNSW or IVF-PQ | Hybrid approaches |
| > 100M | Distributed/sharded | Single-node limits reached |
By Use Case
Highest accuracy (legal, medical):
- Use HNSW with high ef_search (200+)
- Accept slower queries
Lowest latency (real-time chat):
- Use HNSW with moderate parameters
- Tune ef_search for your latency budget
Lowest cost (large corpus, limited budget):
- Use IVF-PQ
- Accept accuracy trade-off
8.4 Tuning HNSW Parameters
Build Parameters
-- Higher values = better index quality, slower build
CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (
  m = 32,                -- more connections (default 16)
  ef_construction = 128  -- higher build quality (default 64)
);
M (connections):
- Default: 16
- Range: 4-64
- Higher = better accuracy, more memory, slower build
- For 1M vectors: 16-24
- For 10M+ vectors: 24-48
ef_construction:
- Default: 64
- Range: 50-500
- Higher = better index quality
- Recommended: 2 * M minimum
Search Parameters
-- Control accuracy/speed at query time
SET hnsw.ef_search = 100; -- Default is 40
ef_search:
- Must be ≥ k (number of results)
- Higher = better recall, slower queries
- Tune based on your recall requirements
Tuning Workflow
// 1. Build an index with moderate parameters
// (createIndex, measureRecall, and measureLatency are stand-ins
// for your own harness)
await createIndex({ m: 16, ef_construction: 100 })

// 2. Measure recall at different ef_search values
for (const ef of [20, 40, 60, 80, 100, 150, 200]) {
  const recall = await measureRecall(testQueries, ef)
  const latency = await measureLatency(testQueries, ef)
  console.log(`ef=${ef}: recall=${recall}, p99=${latency}ms`)
}

// 3. Choose the smallest ef_search that meets your requirements.
// Example output:
//   ef=40:  recall=0.89, p99=2.3ms
//   ef=80:  recall=0.95, p99=4.1ms
//   ef=100: recall=0.97, p99=5.2ms
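If you need a starting point for the recall harness, here is one possible measureRecall, using flatSearch from section 8.2 as ground truth. allVectors and searchIndex (your database's query call) are declared as assumptions:

// Assumptions: the full corpus and the index-under-test's query call,
// both sharing the same id space as flatSearch's indexes
declare const allVectors: number[][]
declare function searchIndex(
  query: number[], k: number, opts: { efSearch: number }
): Promise<number[]>

// Mean recall@k: fraction of the exact top-k that the index returns
async function measureRecall(
  queries: number[][], ef: number, k = 10
): Promise<number> {
  let hits = 0
  for (const q of queries) {
    const truth = new Set(flatSearch(q, allVectors, k).map(r => r.index))
    const ids = await searchIndex(q, k, { efSearch: ef })
    hits += ids.filter(id => truth.has(id)).length
  }
  return hits / (queries.length * k)
}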
8.5 Tuning IVFFlat Parameters
Number of Lists
-- Rule of thumb: sqrt(n) to 4*sqrt(n)
-- For 1M vectors: 1000-4000 lists
CREATE INDEX ON documents
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 1000);
Probes
-- More probes = better accuracy, slower queries
SET ivfflat.probes = 10; -- Default is 1
Recommendations:
- Start with probes = sqrt(lists)
- Increase until recall meets requirements
- Typical range: 1-50
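Because ivfflat.probes is an ordinary Postgres setting, you can sweep it the same way as ef_search. A sketch using node-postgres (the connection and test harness are assumptions):

// Sweep ivfflat.probes per session with node-postgres
import { Client } from 'pg'

const client = new Client() // connection settings come from PG* env vars
await client.connect()
for (const probes of [1, 5, 10, 20, 50]) {
  await client.query(`SET ivfflat.probes = ${probes}`)
  // ...run your test queries here and record recall/latency, as in 8.4
}
await client.end()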
8.6 Practical Indexing Tips
When to Rebuild
Rebuild your index when:
- You've added 20%+ new data
- Query performance degrades
- You're changing index parameters
-- PostgreSQL: rebuild the index
-- (on PostgreSQL 12+, REINDEX INDEX CONCURRENTLY avoids blocking writes)
REINDEX INDEX documents_embedding_idx;
Memory Estimation
HNSW memory per vector (a rough estimate; actual overhead varies by implementation):
(4 * dimensions + 8 * M * 2) bytes
For 1536 dimensions, M=16:
(4 * 1536 + 8 * 16 * 2) = 6,400 bytes ≈ 6.4 KB
1 million vectors ≈ 6.4 GB
IVFFlat memory per vector:
4 * dimensions bytes
For 1536 dimensions:
4 * 1536 = 6,144 bytes ≈ 6.1 KB
1 million vectors ≈ 6.1 GB
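A small helper encoding those formulas (the same rough estimate, index payload only):

// Rough index-size estimator based on the formulas above
function estimateIndexBytes(
  count: number,
  dimensions: number,
  index: 'hnsw' | 'ivfflat',
  m = 16
): number {
  const perVector =
    index === 'hnsw'
      ? 4 * dimensions + 8 * m * 2 // floats plus neighbor links
      : 4 * dimensions             // floats only
  return count * perVector
}

// estimateIndexBytes(1_000_000, 1536, 'hnsw') → 6,400,000,000 ≈ 6.4 GB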
Build Time
Building indexes takes time. Plan accordingly:
| Vectors | HNSW (M=16) | IVFFlat |
|---|---|---|
| 100K | 1-2 min | < 1 min |
| 1M | 10-20 min | 2-5 min |
| 10M | 2-4 hours | 30-60 min |
Consider building during off-peak hours or using background jobs.
8.7 Database-Specific Index Features
Pinecone
Pinecone manages indexes automatically. You configure:
- Pod type (s1, p1, p2) affects performance
- Replicas for throughput
- Shards for large indexes
await pinecone.createIndex({
  name: 'production-index',
  dimension: 1536,
  metric: 'cosine',
  spec: {
    pod: {
      environment: 'us-east1-gcp', // pod-based indexes require an environment
      podType: 'p2.x1',            // high-performance pod
      pods: 1,
      replicas: 2                  // for higher throughput
    }
  }
})
Qdrant
Full control over HNSW parameters:
await client.createCollection('documents', {
  vectors: {
    size: 1536,
    distance: 'Cosine'
  },
  hnsw_config: {
    m: 16,
    ef_construct: 128,
    full_scan_threshold: 10000 // below this size (in KB), brute force is used instead of the graph
  }
})
Chroma
Currently HNSW-only with limited tuning:
const collection = await client.createCollection({
  name: 'documents',
  metadata: {
    'hnsw:space': 'cosine',
    'hnsw:M': 16,
    'hnsw:construction_ef': 100,
    'hnsw:search_ef': 50
  }
})
Key Takeaways
- Indexes trade accuracy for speed—understand the trade-off
- HNSW is the default choice for most workloads
- Tune parameters based on your accuracy and latency requirements
- Monitor recall in production
- Rebuild indexes as data grows
Exercise: Index Tuning Experiment
Using pgvector or your database of choice:
- Create a test dataset of 100K vectors
- Build HNSW indexes with different M values (8, 16, 32)
- Measure recall and latency at different ef_search values
- Create a chart showing the accuracy/latency trade-off
- Determine the optimal parameters for:
- Maximum recall (≥ 0.99)
- Minimum latency (p99 < 10ms)
Next up: Module 9 - Querying and Filtering