Module 3: Similarity Search Explained
How Vector Databases Find the Nearest Neighbors
Introduction
Now that you understand embeddings, let's explore how vector databases actually find similar vectors. This isn't just academic—understanding search algorithms helps you make better decisions about index types, performance tuning, and trade-offs.
By the end of this module, you'll understand:
- Distance metrics and when to use each
- Exact vs. approximate search
- How ANN (Approximate Nearest Neighbor) algorithms work
- The speed-accuracy trade-off
3.1 Distance Metrics
How Do We Measure Similarity?
Given two vectors, we need a way to quantify how similar they are. There are several distance metrics, each with different properties.
Cosine Similarity
The most common metric for text embeddings. It measures the cosine of the angle between two vectors: direction, not magnitude.
function cosineSimilarity(a: number[], b: number[]): number {
  let dotProduct = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dotProduct += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  return dotProduct / (Math.sqrt(normA) * Math.sqrt(normB))
}
// Example
const v1 = [1, 2, 3]
const v2 = [2, 4, 6] // Same direction, different magnitude
const v3 = [-1, -2, -3] // Opposite direction
console.log(cosineSimilarity(v1, v2)) // 1.0 (identical direction)
console.log(cosineSimilarity(v1, v3)) // -1.0 (opposite direction)
Properties:
- Range: -1 to 1 (1 = same direction, 0 = orthogonal, -1 = opposite)
- Ignores magnitude (length of vector)
- Perfect for normalized embeddings
- Most embedding models are optimized for cosine similarity
Euclidean Distance (L2)
The straight-line distance between two points in space.
function euclideanDistance(a: number[], b: number[]): number {
  let sum = 0
  for (let i = 0; i < a.length; i++) {
    sum += (a[i] - b[i]) ** 2
  }
  return Math.sqrt(sum)
}
// Example
const p1 = [0, 0]
const p2 = [3, 4]
console.log(euclideanDistance(p1, p2)) // 5.0 (the classic 3-4-5 triangle)
Properties:
- Range: 0 to infinity (0 = identical)
- Considers magnitude
- Good when vector magnitude matters
- Common in image embeddings
Dot Product (Inner Product)
Multiply corresponding components and sum them. Cosine similarity is just the dot product of the two vectors after normalization.
function dotProduct(a: number[], b: number[]): number {
  let sum = 0
  for (let i = 0; i < a.length; i++) {
    sum += a[i] * b[i]
  }
  return sum
}
Properties:
- Range: -infinity to infinity
- Combines angle and magnitude
- Fastest to compute
- Used when embeddings are normalized (then equivalent to cosine)
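When vectors are normalized to unit length, dot product and cosine really do return the same value. A minimal check, reusing the dotProduct and cosineSimilarity functions defined above:
function normalize(v: number[]): number[] {
  const norm = Math.sqrt(v.reduce((sum, x) => sum + x * x, 0))
  return v.map((x) => x / norm)
}

const a = normalize([1, 2, 3])
const b = normalize([4, 5, 6])
console.log(dotProduct(a, b))       // ≈ 0.9746
console.log(cosineSimilarity(a, b)) // ≈ 0.9746, same value on unit vectors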
Which Metric to Use?
| Metric | Best For | Notes |
|---|---|---|
| Cosine | Text embeddings, semantic search | Most common, ignores magnitude |
| Euclidean | Images, spatial data | When magnitude matters |
| Dot Product | Normalized embeddings | Fastest computation |
Rule of thumb: Use cosine similarity for text unless you have a specific reason not to.
3.2 Exact vs. Approximate Search
The Scaling Problem
Exact search compares your query against every vector in the database:
// Exact search - O(n) complexity
function exactSearch(query: number[], vectors: number[][], k: number) {
  const scores = vectors.map((v, i) => ({
    index: i,
    score: cosineSimilarity(query, v)
  }))
  return scores
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
}
The math is brutal:
- 1 million vectors × 1,536 dimensions ≈ 1.5 billion multiply-add operations per query
- At 1,000 QPS, that's 1.5 trillion operations per second
- Exact search doesn't scale
Approximate Nearest Neighbor (ANN)
ANN algorithms trade perfect accuracy for massive speed improvements. Illustrative single-query latencies:
| Vectors | Exact Search | ANN (HNSW) |
|---|---|---|
| 100K | 100ms | 1ms |
| 1M | 1s | 2ms |
| 10M | 10s | 5ms |
| 100M | 100s | 10ms |
The trade-off: ANN might miss the absolute best match, but it finds very good matches in a fraction of the time.
In practice, ANN returns the correct top-10 results 95-99% of the time. For most applications, this is more than acceptable.
3.3 How ANN Algorithms Work
HNSW (Hierarchical Navigable Small World)
The most popular ANN algorithm. Used by Pinecone, pgvector, Qdrant, and others.
Intuition: Imagine trying to find someone in a city:
- You don't check every house
- You navigate through neighborhoods (hierarchy)
- At each level, you move toward your target
- Eventually, you reach the specific block and find your person
How HNSW works (the sketch after the parameter list walks through this descent in code):
Level 3: [ A ] // Few nodes, long jumps
Level 2: [ A ----- D ] // More nodes, medium jumps
Level 1: [ A -- C -- D -- F ] // Many nodes, short jumps
Level 0: [ A B C D E F G H ] // All nodes, shortest jumps
- Start at the top level with few nodes
- Find the closest node at this level
- Move down to the next level
- Repeat until you reach the bottom
- Do a thorough local search at the bottom
HNSW Parameters:
- M: Number of connections per node (higher = more accurate, more memory)
- efConstruction: Build quality (higher = better index, slower build)
- efSearch: Search quality (higher = more accurate, slower queries)
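A toy sketch of that layered descent, assuming a prebuilt graph. The LayeredGraph type and its fields are invented for illustration; real implementations (e.g., hnswlib) also keep a candidate list of size efSearch rather than a single current node:
type LayeredGraph = {
  neighbors: Map<number, number[]>[] // neighbors[level].get(id) = connected node ids
  entryPoint: number                 // starting node at the top level
  vectors: number[][]                // stored vectors, indexed by node id
}

function squaredDistance(a: number[], b: number[]): number {
  let sum = 0
  for (let i = 0; i < a.length; i++) sum += (a[i] - b[i]) ** 2
  return sum // squared L2 is enough for comparisons
}

function greedySearch(graph: LayeredGraph, query: number[]): number {
  let current = graph.entryPoint
  // Descend from the sparsest level to level 0
  for (let level = graph.neighbors.length - 1; level >= 0; level--) {
    let improved = true
    while (improved) {
      improved = false
      // Hop to any neighbor closer to the query than where we stand
      for (const next of graph.neighbors[level].get(current) ?? []) {
        if (squaredDistance(graph.vectors[next], query) <
            squaredDistance(graph.vectors[current], query)) {
          current = next
          improved = true
        }
      }
    }
  }
  return current // best node reached by the greedy walk
}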
IVF (Inverted File Index)
Another common approach. Divides vectors into clusters.
Intuition: Instead of searching everywhere (a code sketch follows the parameter list):
- Divide vectors into clusters (using k-means)
- Find which cluster(s) your query is closest to
- Only search within those clusters
Cluster 1: [v1, v5, v8, v12] // Query is closest to this cluster
Cluster 2: [v2, v6, v9, v13]
Cluster 3: [v3, v7, v10, v14]
Cluster 4: [v4, v11, v15, v16]
// Only search Cluster 1 instead of all 16 vectors
IVF Parameters:
- nlist: Number of clusters (more = smaller clusters and faster scans, but slower build; recall then depends on nprobe)
- nprobe: Clusters to search (more = more accurate, slower query)
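A toy version of that lookup, reusing euclideanDistance from earlier. The centroids and clusters structures are assumed to come from a prior k-means pass and are invented here for illustration:
function ivfSearch(
  query: number[],
  centroids: number[][],  // one centroid per cluster (nlist of them)
  clusters: number[][][], // clusters[i] = vectors assigned to centroid i
  nprobe: number,         // how many clusters to scan
  k: number
) {
  // 1. Rank clusters by centroid distance; keep the nprobe closest
  const probed = centroids
    .map((c, i) => ({ i, d: euclideanDistance(query, c) }))
    .sort((a, b) => a.d - b.d)
    .slice(0, nprobe)

  // 2. Exact search, but only inside the selected clusters
  return probed
    .flatMap(({ i }) =>
      clusters[i].map((v) => ({ vector: v, d: euclideanDistance(query, v) }))
    )
    .sort((a, b) => a.d - b.d)
    .slice(0, k)
}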
Product Quantization (PQ)
Compresses vectors to reduce memory and speed up search.
Intuition: Instead of storing full 1536-dimensional vectors:
- Split each vector into segments (e.g., 8 segments of 192 dimensions)
- Quantize each segment to its nearest entry in a small codebook (typically 256 entries learned by k-means, so each code fits in one byte)
- Store just the quantization codes
Original: [0.1, 0.2, ..., 0.9] (1536 floats = 6KB)
Quantized: [12, 45, 78, 23, 56, 89, 11, 44] (8 bytes)
Trade-off: Lower memory, slightly lower accuracy.
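A toy encoder under the same assumptions: per-segment codebooks already learned by k-means, with the codebooks structure invented for illustration. It reuses euclideanDistance from earlier:
function pqEncode(
  vector: number[],
  codebooks: number[][][] // codebooks[s] = centroid list for segment s
): number[] {
  const segLen = vector.length / codebooks.length // e.g., 1536 / 8 = 192
  return codebooks.map((centroids, s) => {
    const segment = vector.slice(s * segLen, (s + 1) * segLen)
    // Keep only the index of the closest centroid: one byte if ≤ 256 centroids
    let best = 0
    for (let c = 1; c < centroids.length; c++) {
      if (euclideanDistance(segment, centroids[c]) <
          euclideanDistance(segment, centroids[best])) best = c
    }
    return best
  })
}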
3.4 Accuracy Metrics
Recall@K
The percentage of true top-K results that your ANN search actually returns.
// If the true top-10 are: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
// And ANN returns: [1, 2, 3, 4, 5, 7, 8, 11, 12, 13]
// Recall@10 = 7/10 = 70%
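The same computation as a small helper, run on the example above:
function recallAtK(trueTopK: number[], annTopK: number[]): number {
  const truth = new Set(trueTopK)
  return annTopK.filter((id) => truth.has(id)).length / trueTopK.length
}

console.log(recallAtK(
  [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
  [1, 2, 3, 4, 5, 7, 8, 11, 12, 13]
)) // 0.7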
Industry standards:
- Recall@10 > 95% is excellent
- Recall@10 > 90% is good
- Recall@10 > 80% is acceptable for most uses
Latency
Query response time. Depends on:
- Dataset size
- Index type
- Hardware
- Search parameters
Typical targets:
- p50 < 10ms (median query)
- p99 < 100ms (99th percentile)
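If your database doesn't report these, percentiles are easy to compute from raw latency samples. A minimal nearest-rank sketch (other definitions interpolate between samples):
function percentile(latenciesMs: number[], p: number): number {
  const sorted = [...latenciesMs].sort((a, b) => a - b)
  const rank = Math.ceil((p / 100) * sorted.length) - 1
  return sorted[Math.max(0, rank)]
}

const samples = [4, 6, 5, 7, 120, 8, 5, 6, 9, 5]
console.log(percentile(samples, 50)) // 6   (median)
console.log(percentile(samples, 99)) // 120 (tail latency dominated by outliers)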
Throughput
Queries per second (QPS) the system can handle.
Depends on:
- Hardware (CPU, memory, SSD)
- Index configuration
- Concurrency
3.5 Choosing the Right Algorithm
| Algorithm | Best For | Trade-offs |
|---|---|---|
| HNSW | General purpose, low latency | Higher memory usage |
| IVF-Flat | Memory-constrained, moderate scale | Lower recall at high speed |
| IVF-PQ | Very large datasets, cost-sensitive | Lower accuracy |
| Flat (exact) | Small datasets, highest accuracy | Doesn't scale |
Decision guide:
Dataset size < 100K vectors?
→ Consider exact search or simple HNSW
Dataset size 100K - 10M?
→ HNSW with default parameters
Dataset size > 10M?
→ HNSW with tuned parameters or IVF-based
Memory constrained?
→ IVF-PQ or quantized HNSW
Need highest accuracy?
→ Increase ef/nprobe, accept higher latency
3.6 Practical Implications
Building the Index
Index building is expensive and happens during data ingestion:
// This step builds the HNSW graph
await vectorDB.upsert({
  vectors: embeddedDocuments // building the index as you insert
})
Best practices:
- Batch inserts rather than inserting one vector at a time (see the sketch below)
- Build index during off-peak hours for large datasets
- Some DBs support background index building
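A minimal batching sketch, using the same illustrative vectorDB.upsert call as above (the client object and the batch size of 100 are assumptions, not a specific library's API):
async function batchUpsert(allVectors: number[][], batchSize = 100) {
  for (let i = 0; i < allVectors.length; i += batchSize) {
    // One request, and one round of index updates, per batch rather than per vector
    await vectorDB.upsert({ vectors: allVectors.slice(i, i + batchSize) })
  }
}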
Tuning for Your Use Case
High accuracy requirements (medical, legal):
// Increase search quality
await vectorDB.query({
  vector: queryEmbedding,
  topK: 10,
  efSearch: 200 // Higher = more accurate
})
Low latency requirements (real-time chat):
// Lower search quality for speed
await vectorDB.query({
  vector: queryEmbedding,
  topK: 10,
  efSearch: 50 // Lower = faster
})
Balance (most applications):
- Use default parameters
- Measure actual recall on your data (a sketch follows this list)
- Tune only if needed
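One way to measure it, reusing exactSearch and recallAtK from earlier sections. The vectorDB.query call and its matches response shape are the same illustrative API as above, not a specific client:
async function measureRecall(sampleQueries: number[][], vectors: number[][], k = 10) {
  let total = 0
  for (const q of sampleQueries) {
    // Ground truth: brute-force search over the same vectors
    const truth = exactSearch(q, vectors, k).map((r) => r.index)
    // Candidate: the ANN index under test
    const response = await vectorDB.query({ vector: q, topK: k })
    total += recallAtK(truth, response.matches.map((m: { index: number }) => m.index))
  }
  return total / sampleQueries.length // average Recall@k over the sample
}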
Key Takeaways
- Cosine similarity is the default for text embeddings
- Exact search doesn't scale; use ANN once datasets grow past roughly 10K-100K vectors (see the decision guide above)
- HNSW is the most common algorithm, offering good balance of speed and accuracy
- Recall and latency are the key metrics to monitor
- Tune parameters based on your specific accuracy and latency requirements
Exercise: Understanding the Trade-offs
Given a system with these requirements:
- 5 million document embeddings
- p99 latency < 50ms
- Recall@10 > 95%
Consider:
- Would you use exact search or ANN? Why?
- Which algorithm would you choose?
- How would you tune the parameters?
- What would you monitor in production?
Next up: Module 4 - The Vector Database Landscape