Module 8: Indexing Strategies
How Vector Databases Organize Data for Fast Search
Introduction
Indexing is what makes vector databases fast. Without an index, every query must compare against every vector, which becomes prohibitively slow at scale.
By the end of this module, you'll understand:
- How different index types work
- When to use each index type
- How to tune indexes for your use case
- The trade-offs you're making
8.1 Why Indexes Matter
The Scale Problem
Consider a database with 10 million vectors of 1,536 dimensions:
- Each comparison: ~1,536 operations
- Brute force: 10M × 1,536 = 15 billion operations per query
- At 100ms per query, you can handle maybe 10 queries/second
With a good index:
- Query time: 2-10ms
- Throughput: 1,000+ queries/second
- Same results (mostly)
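A quick sanity check of the arithmetic behind those figures (the latency and throughput numbers are the rough estimates above, not benchmarks):

// Rough arithmetic behind the numbers above
const vectors = 10_000_000
const dims = 1536
const opsPerQuery = vectors * dims   // 15,360,000,000 ≈ 15 billion ops
const bruteQps = 1000 / 100          // ~10 queries/sec at ~100 ms each
const indexedQps = 1000 / 5          // ~200 queries/sec per worker at ~5 ms

The 1,000+ queries/second figure assumes several workers running concurrently.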
The Trade-off Triangle
Every index balances three factors:
           Accuracy
              /\
             /  \
            /    \
           /______\
      Speed        Memory
You can optimize for two, but the third will suffer.
8.2 Index Types Explained
Flat Index (Exact Search)
No index at all—compares query against every vector.
// Conceptual implementation of exact search
function cosineSim(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0
  for (let i = 0; i < a.length; i++) { dot += a[i] * b[i]; na += a[i] ** 2; nb += b[i] ** 2 }
  return dot / (Math.sqrt(na) * Math.sqrt(nb))
}

function flatSearch(query: number[], vectors: number[][], k: number) {
  return vectors
    .map((v, i) => ({ index: i, score: cosineSim(query, v) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
}
Characteristics:
- 100% accuracy
- O(n) time complexity
- Works for small datasets (< 10K vectors)
- No index memory overhead (just the raw vectors)
When to use:
- Datasets under 10,000 vectors
- When accuracy is absolutely critical
- For benchmarking other indexes
IVF (Inverted File Index)
Divides vectors into clusters. Only searches relevant clusters.
Step 1: Cluster vectors during index build
┌───────────────────────────────────────────────┐
│  Cluster 1   Cluster 2   Cluster 3   Cluster 4│
│   [v1,v4]     [v2,v5]     [v3,v7]     [v6,v8] │
│      ●           ●           ●           ●    │
│     / \         / \         / \         / \   │
│    v1  v4      v2  v5      v3  v7      v6  v8 │
└───────────────────────────────────────────────┘
Step 2: At query time, find nearest cluster(s)
Query → Find nearest centroid → Search only that cluster
Parameters:
- nlist: Number of clusters (typically sqrt(n) to 4*sqrt(n))
- nprobe: Clusters to search (higher = more accurate, slower)
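To make the two-step flow concrete, here is a minimal sketch in TypeScript, reusing cosineSim and flatSearch from the flat-index example above. The ivfSearch name and the data layout (centroids from a prior k-means pass, clusters grouping member vectors) are illustrative, not a real library API:

// Conceptual IVF search: probe the nprobe nearest clusters, then
// brute-force search only their members
function ivfSearch(
  query: number[],
  centroids: number[][],    // one centroid per cluster (from k-means)
  clusters: number[][][],   // clusters[i] = vectors assigned to centroid i
  k: number,
  nprobe: number
) {
  // Rank centroids by similarity to the query; keep the top nprobe
  const probed = centroids
    .map((c, i) => ({ i, score: cosineSim(query, c) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, nprobe)
  // Search only the probed clusters (results are positions within this
  // candidate set; a real index maps them back to global ids)
  const candidates = probed.flatMap(({ i }) => clusters[i])
  return flatSearch(query, candidates, k)
}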
pgvector example:
-- Create IVFFlat index with 100 clusters
CREATE INDEX ON documents
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
-- Increase probes for better accuracy
SET ivfflat.probes = 10;
When to use:
- Large datasets (1M+ vectors)
- Memory constraints
- Acceptable accuracy loss
HNSW (Hierarchical Navigable Small World)
A multi-layer graph where nodes are vectors connected to their neighbors.
Level 2: [ A ] ─────────────── [ D ]
           ↓                     ↓
Level 1: [ A ] ─── [ C ] ─── [ D ] ─── [ F ]
           ↓         ↓         ↓         ↓
Level 0: [A] ─ [B] ─ [C] ─ [D] ─ [E] ─ [F] ─ [G] ─ [H]
How it works:
- Start at top layer (few nodes, long-range connections)
- Greedily move toward query
- Drop to lower layer
- Repeat until bottom layer
- Do thorough local search
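A minimal sketch of the greedy step on a single layer (illustration only; a real HNSW search keeps a candidate list of size efSearch at the bottom layer instead of a single best node). The graph and vectors structures are hypothetical; cosineSim is from the flat-index example:

// Greedy descent on one layer: follow the best-scoring neighbor
// until no neighbor is closer to the query than the current node
function greedySearchLayer(
  query: number[],
  entry: number,                 // node id to start from
  graph: Map<number, number[]>,  // node id -> neighbor ids on this layer
  vectors: number[][]            // node id -> embedding
): number {
  let current = entry
  let best = cosineSim(query, vectors[current])
  let improved = true
  while (improved) {
    improved = false
    for (const neighbor of graph.get(current) ?? []) {
      const score = cosineSim(query, vectors[neighbor])
      if (score > best) {
        best = score
        current = neighbor
        improved = true
      }
    }
  }
  return current // local optimum; becomes the entry point one layer down
}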
Parameters:
- M: Connections per node (default 16)
- efConstruction: Build quality (default 64-200)
- efSearch: Search quality (default 50-100)
pgvector example:
-- Create HNSW index
CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
-- Tune search quality
SET hnsw.ef_search = 100;
When to use:
- General-purpose (most common choice)
- When you need good accuracy/speed balance
- In-memory workloads
Product Quantization (PQ)
Compresses vectors to reduce memory and speed up distance calculations.
Original vector (1536 floats, 6KB):
[0.1, 0.2, ..., 0.3, 0.4, ...]
Split into 8 subvectors:
[sub1] [sub2] [sub3] [sub4] [sub5] [sub6] [sub7] [sub8]
Quantize each to codebook index:
[12] [45] [78] [23] [56] [89] [11] [44]
Compressed (8 bytes): a 768x reduction from the original 6 KB. Distances are computed approximately using precomputed lookup tables of subvector-to-centroid distances.
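A toy encoder makes the compression step concrete (assumes the vector length divides evenly by the number of subvectors; pqEncode and the codebooks layout are illustrative, not a real library API):

// Toy PQ encoder: split the vector into codebooks.length subvectors
// and replace each with the index of its nearest codebook centroid
function pqEncode(vector: number[], codebooks: number[][][]): number[] {
  const subLen = vector.length / codebooks.length // assumes even split
  return codebooks.map((book, j) => {
    const sub = vector.slice(j * subLen, (j + 1) * subLen)
    let bestIdx = 0
    let bestDist = Infinity
    book.forEach((centroid, idx) => {
      // squared Euclidean distance from the subvector to this centroid
      const d = sub.reduce((acc, x, t) => acc + (x - centroid[t]) ** 2, 0)
      if (d < bestDist) {
        bestDist = d
        bestIdx = idx
      }
    })
    return bestIdx // one small integer per subvector
  })
}

With 8 codebooks of 256 centroids each, the output is 8 indexes that fit in one byte apiece, matching the 8-byte code above.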
When to use:
- Very large datasets (100M+ vectors)
- Memory is limited
- Can accept lower accuracy
8.3 Index Selection Guide
| Dataset Size | Recommended Index | Notes |
|---|---|---|
| < 10K | Flat | Exact search is fast enough |
| 10K - 1M | HNSW | Best balance for most use cases |
| 1M - 10M | HNSW (tuned) | Increase M and ef_construction |
| 10M - 100M | IVF-HNSW or IVF-PQ | Hybrid approaches |
| > 100M | Distributed/sharded | Single-node limits reached |
By Use Case
Highest accuracy (legal, medical):
- Use HNSW with high ef_search (200+)
- Accept slower queries
Lowest latency (real-time chat):
- Use HNSW with moderate parameters
- Tune ef_search for your latency budget
Lowest cost (large corpus, limited budget):
- Use IVF-PQ
- Accept accuracy trade-off
8.4 Tuning HNSW Parameters
Build Parameters
-- Higher values = better index quality, slower build
CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (
  m = 32,                -- more connections (default 16)
  ef_construction = 128  -- higher build quality (default 64)
);
M (connections):
- Default: 16
- Range: 4-64
- Higher = better accuracy, more memory, slower build
- For 1M vectors: 16-24
- For 10M+ vectors: 24-48
ef_construction:
- Default: 64
- Range: 50-500
- Higher = better index quality
- Recommended: 2 * M minimum
Search Parameters
-- Control accuracy/speed at query time
SET hnsw.ef_search = 100; -- Default is 40
ef_search:
- Must be ≥ k (number of results)
- Higher = better recall, slower queries
- Tune based on your recall requirements
Tuning Workflow
// 1. Build an index with moderate parameters
// (createIndex, measureRecall, and measureLatency are stand-ins
// for your own harness)
await createIndex({ m: 16, ef_construction: 100 })

// 2. Measure recall at different ef_search values
for (const ef of [20, 40, 60, 80, 100, 150, 200]) {
  const recall = await measureRecall(testQueries, ef)
  const latency = await measureLatency(testQueries, ef)
  console.log(`ef=${ef}: recall=${recall}, p99=${latency}ms`)
}

// 3. Choose the smallest ef_search that meets your requirements.
// Example output:
//   ef=40:  recall=0.89, p99=2.3ms
//   ef=80:  recall=0.95, p99=4.1ms
//   ef=100: recall=0.97, p99=5.2ms
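If you need a starting point for the recall harness, here is one possible measureRecall, using flatSearch from section 8.2 as ground truth. allVectors and searchIndex (your database's query call) are declared as assumptions:

// Assumptions: the full corpus and the index-under-test's query call,
// both sharing the same id space as flatSearch's indexes
declare const allVectors: number[][]
declare function searchIndex(
  query: number[], k: number, opts: { efSearch: number }
): Promise<number[]>

// Mean recall@k: fraction of the exact top-k that the index returns
async function measureRecall(
  queries: number[][], ef: number, k = 10
): Promise<number> {
  let hits = 0
  for (const q of queries) {
    const truth = new Set(flatSearch(q, allVectors, k).map(r => r.index))
    const ids = await searchIndex(q, k, { efSearch: ef })
    hits += ids.filter(id => truth.has(id)).length
  }
  return hits / (queries.length * k)
}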
8.5 Tuning IVFFlat Parameters
Number of Lists
-- Rule of thumb: sqrt(n) to 4*sqrt(n)
-- For 1M vectors: 1000-4000 lists
CREATE INDEX ON documents
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 1000);
Probes
-- More probes = better accuracy, slower queries
SET ivfflat.probes = 10; -- Default is 1
Recommendations:
- Start with probes = sqrt(lists)
- Increase until recall meets requirements
- Typical range: 1-50
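Because ivfflat.probes is an ordinary Postgres setting, you can sweep it the same way as ef_search. A sketch using node-postgres (the connection and test harness are assumptions):

// Sweep ivfflat.probes per session with node-postgres
import { Client } from 'pg'

const client = new Client() // connection settings come from PG* env vars
await client.connect()
for (const probes of [1, 5, 10, 20, 50]) {
  await client.query(`SET ivfflat.probes = ${probes}`)
  // ...run your test queries here and record recall/latency, as in 8.4
}
await client.end()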
8.6 Practical Indexing Tips
When to Rebuild
Rebuild your index when:
- You've added 20%+ new data
- Query performance degrades
- You're changing index parameters
-- PostgreSQL: rebuild the index
-- (on PostgreSQL 12+, REINDEX INDEX CONCURRENTLY avoids blocking writes)
REINDEX INDEX documents_embedding_idx;
Memory Estimation
HNSW memory per vector (a rough estimate; actual overhead varies by implementation):
(4 * dimensions + 8 * M * 2) bytes
For 1536 dimensions, M=16:
(4 * 1536 + 8 * 16 * 2) = 6,400 bytes ≈ 6.4 KB
1 million vectors ≈ 6.4 GB
IVFFlat memory per vector:
4 * dimensions bytes
For 1536 dimensions:
4 * 1536 = 6,144 bytes ≈ 6.1 KB
1 million vectors ≈ 6.1 GB
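A small helper encoding those formulas (the same rough estimate, index payload only):

// Rough index-size estimator based on the formulas above
function estimateIndexBytes(
  count: number,
  dimensions: number,
  index: 'hnsw' | 'ivfflat',
  m = 16
): number {
  const perVector =
    index === 'hnsw'
      ? 4 * dimensions + 8 * m * 2 // floats plus neighbor links
      : 4 * dimensions             // floats only
  return count * perVector
}

// estimateIndexBytes(1_000_000, 1536, 'hnsw') → 6,400,000,000 ≈ 6.4 GB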
Build Time
Building indexes takes time. Plan accordingly:
| Vectors | HNSW (M=16) | IVFFlat |
|---|---|---|
| 100K | 1-2 min | < 1 min |
| 1M | 10-20 min | 2-5 min |
| 10M | 2-4 hours | 30-60 min |
Consider building during off-peak hours or using background jobs.
8.7 Database-Specific Index Features
Pinecone
Pinecone manages indexes automatically. You configure:
- Pod type (s1, p1, p2) affects performance
- Replicas for throughput
- Shards for large indexes
await pinecone.createIndex({
  name: 'production-index',
  dimension: 1536,
  metric: 'cosine',
  spec: {
    pod: {
      environment: 'us-east1-gcp', // pod-based indexes require an environment
      podType: 'p2.x1',            // high-performance pod
      pods: 1,
      replicas: 2                  // for higher throughput
    }
  }
})
Qdrant
Full control over HNSW parameters:
await client.createCollection('documents', {
  vectors: {
    size: 1536,
    distance: 'Cosine'
  },
  hnsw_config: {
    m: 16,
    ef_construct: 128,
    full_scan_threshold: 10000 // below this size (in KB), brute force is used instead of the graph
  }
})
Chroma
Currently HNSW-only with limited tuning:
const collection = await client.createCollection({
  name: 'documents',
  metadata: {
    'hnsw:space': 'cosine',
    'hnsw:M': 16,
    'hnsw:construction_ef': 100,
    'hnsw:search_ef': 50
  }
})
Key Takeaways
- Indexes trade accuracy for speed—understand the trade-off
- HNSW is the default choice for most workloads
- Tune parameters based on your accuracy and latency requirements
- Monitor recall in production
- Rebuild indexes as data grows
Exercise: Index Tuning Experiment
Using pgvector or your database of choice:
- Create a test dataset of 100K vectors
- Build HNSW indexes with different M values (8, 16, 32)
- Measure recall and latency at different ef_search values
- Create a chart showing the accuracy/latency trade-off
- Determine the optimal parameters for:
- Maximum recall (≥ 0.99)
- Minimum latency (p99 < 10ms)
Next up: Module 9 - Querying and Filtering