Module 3: Similarity Search Explained
How Vector Databases Find the Nearest Neighbors
Introduction
Now that you understand embeddings, let's explore how vector databases actually find similar vectors. This isn't just academic—understanding search algorithms helps you make better decisions about index types, performance tuning, and trade-offs.
By the end of this module, you'll understand:
- Distance metrics and when to use each
- Exact vs. approximate search
- How ANN (Approximate Nearest Neighbor) algorithms work
- The speed-accuracy trade-off
3.1 Distance Metrics
How Do We Measure Similarity?
Given two vectors, we need a way to quantify how similar they are. There are several distance metrics, each with different properties.
Cosine Similarity
The most common metric for text embeddings. It measures the cosine of the angle between two vectors: direction, not magnitude.
function cosineSimilarity(a: number[], b: number[]): number {
  let dotProduct = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dotProduct += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  return dotProduct / (Math.sqrt(normA) * Math.sqrt(normB))
}
// Example
const v1 = [1, 2, 3]
const v2 = [2, 4, 6] // Same direction, different magnitude
const v3 = [-1, -2, -3] // Opposite direction
console.log(cosineSimilarity(v1, v2)) // 1.0 (identical direction)
console.log(cosineSimilarity(v1, v3)) // -1.0 (opposite direction)
Properties:
- Range: -1 to 1 (1 = same direction, 0 = orthogonal, -1 = opposite)
- Ignores magnitude (length of vector)
- Perfect for normalized embeddings
- Most embedding models are optimized for cosine similarity
Euclidean Distance (L2)
The straight-line distance between two points in space.
function euclideanDistance(a: number[], b: number[]): number {
  let sum = 0
  for (let i = 0; i < a.length; i++) {
    sum += (a[i] - b[i]) ** 2
  }
  return Math.sqrt(sum)
}
// Example
const p1 = [0, 0]
const p2 = [3, 4]
console.log(euclideanDistance(p1, p2)) // 5.0 (the classic 3-4-5 triangle)
Properties:
- Range: 0 to infinity (0 = identical)
- Considers magnitude
- Good when vector magnitude matters
- Common in image embeddings
Dot Product (Inner Product)
Multiply corresponding components and sum them. Cosine similarity is just the dot product of the two vectors after normalization.
function dotProduct(a: number[], b: number[]): number {
  let sum = 0
  for (let i = 0; i < a.length; i++) {
    sum += a[i] * b[i]
  }
  return sum
}
Properties:
- Range: -infinity to infinity
- Combines angle and magnitude
- Fastest to compute
- Used when embeddings are normalized (then equivalent to cosine)
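When vectors are normalized to unit length, dot product and cosine really do return the same value. A minimal check, reusing the dotProduct and cosineSimilarity functions defined above:
function normalize(v: number[]): number[] {
  const norm = Math.sqrt(v.reduce((sum, x) => sum + x * x, 0))
  return v.map((x) => x / norm)
}

const a = normalize([1, 2, 3])
const b = normalize([4, 5, 6])
console.log(dotProduct(a, b))       // ≈ 0.9746
console.log(cosineSimilarity(a, b)) // ≈ 0.9746, same value on unit vectors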
Which Metric to Use?
| Metric | Best For | Notes |
|---|---|---|
| Cosine | Text embeddings, semantic search | Most common, ignores magnitude |
| Euclidean | Images, spatial data | When magnitude matters |
| Dot Product | Normalized embeddings | Fastest computation |
Rule of thumb: Use cosine similarity for text unless you have a specific reason not to.
3.2 Exact vs. Approximate Search
The Scaling Problem
Exact search compares your query against every vector in the database:
// Exact search - O(n) complexity
function exactSearch(query: number[], vectors: number[][], k: number) {
  const scores = vectors.map((v, i) => ({
    index: i,
    score: cosineSimilarity(query, v)
  }))
  return scores
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
}
The math is brutal:
- 1 million vectors × 1,536 dimensions ≈ 1.5 billion multiply-add operations per query
- At 1,000 QPS, that's 1.5 trillion operations per second
- Exact search doesn't scale
Approximate Nearest Neighbor (ANN)
ANN algorithms trade perfect accuracy for massive speed improvements. Illustrative single-query latencies:
| Vectors | Exact Search | ANN (HNSW) |
|---|---|---|
| 100K | 100ms | 1ms |
| 1M | 1s | 2ms |
| 10M | 10s | 5ms |
| 100M | 100s | 10ms |
The trade-off: ANN might miss the absolute best match, but it finds very good matches in a fraction of the time.
In practice, ANN returns the correct top-10 results 95-99% of the time. For most applications, this is more than acceptable.
3.3 How ANN Algorithms Work
HNSW (Hierarchical Navigable Small World)
The most popular ANN algorithm. Used by Pinecone, pgvector, Qdrant, and others.
Intuition: Imagine trying to find someone in a city:
- You don't check every house
- You navigate through neighborhoods (hierarchy)
- At each level, you move toward your target
- Eventually, you reach the specific block and find your person
How HNSW works (the sketch after the parameter list walks through this descent in code):
Level 3: [ A ] // Few nodes, long jumps
Level 2: [ A ----- D ] // More nodes, medium jumps
Level 1: [ A -- C -- D -- F ] // Many nodes, short jumps
Level 0: [ A B C D E F G H ] // All nodes, shortest jumps
- Start at the top level with few nodes
- Find the closest node at this level
- Move down to the next level
- Repeat until you reach the bottom
- Do a thorough local search at the bottom
HNSW Parameters:
- M: Number of connections per node (higher = more accurate, more memory)
- efConstruction: Build quality (higher = better index, slower build)
- efSearch: Search quality (higher = more accurate, slower queries)
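A toy sketch of that layered descent, assuming a prebuilt graph. The LayeredGraph type and its fields are invented for illustration; real implementations (e.g., hnswlib) also keep a candidate list of size efSearch rather than a single current node:
type LayeredGraph = {
  neighbors: Map<number, number[]>[] // neighbors[level].get(id) = connected node ids
  entryPoint: number                 // starting node at the top level
  vectors: number[][]                // stored vectors, indexed by node id
}

function squaredDistance(a: number[], b: number[]): number {
  let sum = 0
  for (let i = 0; i < a.length; i++) sum += (a[i] - b[i]) ** 2
  return sum // squared L2 is enough for comparisons
}

function greedySearch(graph: LayeredGraph, query: number[]): number {
  let current = graph.entryPoint
  // Descend from the sparsest level to level 0
  for (let level = graph.neighbors.length - 1; level >= 0; level--) {
    let improved = true
    while (improved) {
      improved = false
      // Hop to any neighbor closer to the query than where we stand
      for (const next of graph.neighbors[level].get(current) ?? []) {
        if (squaredDistance(graph.vectors[next], query) <
            squaredDistance(graph.vectors[current], query)) {
          current = next
          improved = true
        }
      }
    }
  }
  return current // best node reached by the greedy walk
}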
IVF (Inverted File Index)
Another common approach. Divides vectors into clusters.
Intuition: Instead of searching everywhere (a code sketch follows the parameter list):
- Divide vectors into clusters (using k-means)
- Find which cluster(s) your query is closest to
- Only search within those clusters
Cluster 1: [v1, v5, v8, v12] // Query is closest to this cluster
Cluster 2: [v2, v6, v9, v13]
Cluster 3: [v3, v7, v10, v14]
Cluster 4: [v4, v11, v15, v16]
// Only search Cluster 1 instead of all 16 vectors
IVF Parameters:
- nlist: Number of clusters (more = smaller clusters and faster scans, but slower build; recall then depends on nprobe)
- nprobe: Clusters to search (more = more accurate, slower query)
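A toy version of that lookup, reusing euclideanDistance from earlier. The centroids and clusters structures are assumed to come from a prior k-means pass and are invented here for illustration:
function ivfSearch(
  query: number[],
  centroids: number[][],  // one centroid per cluster (nlist of them)
  clusters: number[][][], // clusters[i] = vectors assigned to centroid i
  nprobe: number,         // how many clusters to scan
  k: number
) {
  // 1. Rank clusters by centroid distance; keep the nprobe closest
  const probed = centroids
    .map((c, i) => ({ i, d: euclideanDistance(query, c) }))
    .sort((a, b) => a.d - b.d)
    .slice(0, nprobe)

  // 2. Exact search, but only inside the selected clusters
  return probed
    .flatMap(({ i }) =>
      clusters[i].map((v) => ({ vector: v, d: euclideanDistance(query, v) }))
    )
    .sort((a, b) => a.d - b.d)
    .slice(0, k)
}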
Product Quantization (PQ)
Compresses vectors to reduce memory and speed up search.
Intuition: Instead of storing full 1536-dimensional vectors:
- Split each vector into segments (e.g., 8 segments of 192 dimensions)
- Quantize each segment to its nearest entry in a small codebook (typically 256 entries learned by k-means, so each code fits in one byte)
- Store just the quantization codes
Original: [0.1, 0.2, ..., 0.9] (1536 floats = 6KB)
Quantized: [12, 45, 78, 23, 56, 89, 11, 44] (8 bytes)
Trade-off: Lower memory, slightly lower accuracy.
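A toy encoder under the same assumptions: per-segment codebooks already learned by k-means, with the codebooks structure invented for illustration. It reuses euclideanDistance from earlier:
function pqEncode(
  vector: number[],
  codebooks: number[][][] // codebooks[s] = centroid list for segment s
): number[] {
  const segLen = vector.length / codebooks.length // e.g., 1536 / 8 = 192
  return codebooks.map((centroids, s) => {
    const segment = vector.slice(s * segLen, (s + 1) * segLen)
    // Keep only the index of the closest centroid: one byte if ≤ 256 centroids
    let best = 0
    for (let c = 1; c < centroids.length; c++) {
      if (euclideanDistance(segment, centroids[c]) <
          euclideanDistance(segment, centroids[best])) best = c
    }
    return best
  })
}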
3.4 Accuracy Metrics
Recall@K
The percentage of true top-K results that your ANN search actually returns.
// If the true top-10 are: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
// And ANN returns: [1, 2, 3, 4, 5, 7, 8, 11, 12, 13]
// Recall@10 = 7/10 = 70%
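The same computation as a small helper, run on the example above:
function recallAtK(trueTopK: number[], annTopK: number[]): number {
  const truth = new Set(trueTopK)
  return annTopK.filter((id) => truth.has(id)).length / trueTopK.length
}

console.log(recallAtK(
  [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
  [1, 2, 3, 4, 5, 7, 8, 11, 12, 13]
)) // 0.7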
Industry standards:
- Recall@10 > 95% is excellent
- Recall@10 > 90% is good
- Recall@10 > 80% is acceptable for most uses
Latency
Query response time. Depends on:
- Dataset size
- Index type
- Hardware
- Search parameters
Typical targets:
- p50 < 10ms (median query)
- p99 < 100ms (99th percentile)
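If your database doesn't report these, percentiles are easy to compute from raw latency samples. A minimal nearest-rank sketch (other definitions interpolate between samples):
function percentile(latenciesMs: number[], p: number): number {
  const sorted = [...latenciesMs].sort((a, b) => a - b)
  const rank = Math.ceil((p / 100) * sorted.length) - 1
  return sorted[Math.max(0, rank)]
}

const samples = [4, 6, 5, 7, 120, 8, 5, 6, 9, 5]
console.log(percentile(samples, 50)) // 6   (median)
console.log(percentile(samples, 99)) // 120 (tail latency dominated by outliers)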
Throughput
Queries per second (QPS) the system can handle.
Depends on:
- Hardware (CPU, memory, SSD)
- Index configuration
- Concurrency
3.5 Choosing the Right Algorithm
| Algorithm | Best For | Trade-offs |
|---|---|---|
| HNSW | General purpose, low latency | Higher memory usage |
| IVF-Flat | Memory-constrained, moderate scale | Lower recall at high speed |
| IVF-PQ | Very large datasets, cost-sensitive | Lower accuracy |
| Flat (exact) | Small datasets, highest accuracy | Doesn't scale |
Decision guide:
Dataset size < 100K vectors?
→ Consider exact search or simple HNSW
Dataset size 100K - 10M?
→ HNSW with default parameters
Dataset size > 10M?
→ HNSW with tuned parameters or IVF-based
Memory constrained?
→ IVF-PQ or quantized HNSW
Need highest accuracy?
→ Increase ef/nprobe, accept higher latency
3.6 Practical Implications
Building the Index
Index building is expensive and happens during data ingestion:
// This step builds the HNSW graph
await vectorDB.upsert({
  vectors: embeddedDocuments // building the index as you insert
})
Best practices:
- Batch inserts rather than inserting one vector at a time (see the sketch below)
- Build index during off-peak hours for large datasets
- Some DBs support background index building
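A minimal batching sketch, using the same illustrative vectorDB.upsert call as above (the client object and the batch size of 100 are assumptions, not a specific library's API):
async function batchUpsert(allVectors: number[][], batchSize = 100) {
  for (let i = 0; i < allVectors.length; i += batchSize) {
    // One request, and one round of index updates, per batch rather than per vector
    await vectorDB.upsert({ vectors: allVectors.slice(i, i + batchSize) })
  }
}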
Tuning for Your Use Case
High accuracy requirements (medical, legal):
// Increase search quality
await vectorDB.query({
  vector: queryEmbedding,
  topK: 10,
  efSearch: 200 // Higher = more accurate
})
Low latency requirements (real-time chat):
// Lower search quality for speed
await vectorDB.query({
  vector: queryEmbedding,
  topK: 10,
  efSearch: 50 // Lower = faster
})
Balance (most applications):
- Use default parameters
- Measure actual recall on your data (a sketch follows this list)
- Tune only if needed
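One way to measure it, reusing exactSearch and recallAtK from earlier sections. The vectorDB.query call and its matches response shape are the same illustrative API as above, not a specific client:
async function measureRecall(sampleQueries: number[][], vectors: number[][], k = 10) {
  let total = 0
  for (const q of sampleQueries) {
    // Ground truth: brute-force search over the same vectors
    const truth = exactSearch(q, vectors, k).map((r) => r.index)
    // Candidate: the ANN index under test
    const response = await vectorDB.query({ vector: q, topK: k })
    total += recallAtK(truth, response.matches.map((m: { index: number }) => m.index))
  }
  return total / sampleQueries.length // average Recall@k over the sample
}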
Key Takeaways
- Cosine similarity is the default for text embeddings
- Exact search doesn't scale; use ANN once datasets grow past roughly 10K-100K vectors (see the decision guide above)
- HNSW is the most common algorithm, offering good balance of speed and accuracy
- Recall and latency are the key metrics to monitor
- Tune parameters based on your specific accuracy and latency requirements
Exercise: Understanding the Trade-offs
Given a system with these requirements:
- 5 million document embeddings
- p99 latency < 50ms
- Recall@10 > 95%
Consider:
- Would you use exact search or ANN? Why?
- Which algorithm would you choose?
- How would you tune the parameters?
- What would you monitor in production?
Next up: Module 4 - The Vector Database Landscape