Lesson 6.1: How Vector Search Actually Works
Embeddings Explained
Text to numbers:
from openai import OpenAI

client = OpenAI()

# Convert text to a 1536-dimensional vector
response = client.embeddings.create(
    model="text-embedding-3-small",
    input="SQL databases are great for AI systems",
)
embedding = response.data[0].embedding
# [0.023, -0.041, 0.012, ..., 0.031] (1536 numbers)
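One property worth knowing (OpenAI documents that these embeddings are normalized to length 1): a plain dot product between two of them already equals cosine similarity. A quick check, reusing the embedding from above:

import numpy as np

vec = np.array(embedding)
print(len(vec))             # 1536
print(np.linalg.norm(vec))  # ~1.0 (unit-normalized)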
Similarity via distance:
import numpy as np

def cosine_similarity(vec1, vec2):
    return np.dot(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2))

def get_embedding(text):
    # Reuses the OpenAI client created above
    response = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(response.data[0].embedding)

# Similar sentences = close embeddings
embed1 = get_embedding("SQL is fast")
embed2 = get_embedding("Databases are quick")
embed3 = get_embedding("I like pizza")

print(cosine_similarity(embed1, embed2))  # ~0.85 (similar!)
print(cosine_similarity(embed1, embed3))  # ~0.12 (different)
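The SQL examples below assume Postgres with the pgvector extension and a table roughly like this (the table and column names are illustrative, chosen to match the queries that follow):

-- Hypothetical schema behind the queries below
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE documents (
    id        bigserial PRIMARY KEY,
    content   text NOT NULL,
    embedding vector(1536)  -- dimensionality of text-embedding-3-small
);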
Naive Vector Search (Slow)
-- Brute force: compare the query embedding to every stored embedding
-- (:query_embedding stands in for a bind parameter supplied by the application)
SELECT
    id,
    content,
    embedding <=> :query_embedding AS distance  -- <=> is pgvector's cosine distance
FROM documents
ORDER BY distance
LIMIT 10;

-- For 1M vectors:
--   1M × 1536 float multiplications ≈ 1.5 billion ops per query
--   Time: 5-10 seconds (unacceptable for production)
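To make that cost concrete, here is what the brute-force scan boils down to in NumPy. This is a sketch with random unit vectors; the corpus is kept at 100k rows so it fits comfortably in memory (1M × 1536 float32 values would need roughly 6 GB):

import numpy as np

rng = np.random.default_rng(0)

# Stand-in corpus: 100k unit-normalized 1536-d vectors (illustrative)
corpus = rng.standard_normal((100_000, 1536)).astype(np.float32)
corpus /= np.linalg.norm(corpus, axis=1, keepdims=True)

query = rng.standard_normal(1536).astype(np.float32)
query /= np.linalg.norm(query)

# One dot product per stored vector: O(N * d) work on every single query
scores = corpus @ query
top10 = np.argsort(-scores)[:10]  # exact top-10 by cosine similarity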
Approximate Nearest Neighbor (ANN)
IVFFlat algorithm:
- Build time: cluster vectors into N lists (k-means style)
- Query time: search only the nearest clusters (a toy sketch follows the SQL below)
-- Create an IVFFlat index
-- (pgvector's README suggests lists ≈ rows/1000 up to 1M rows, so 100 here is illustrative)
CREATE INDEX embeddings_ivfflat_idx ON documents
    USING ivfflat (embedding vector_cosine_ops)
    WITH (lists = 100);

-- Set probes (how many clusters to search per query)
SET ivfflat.probes = 10;

-- Now each query scans 10 clusters, not the entire table
-- Time: ~50ms (100x faster, ~95% recall)
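To see the mechanics, here is a toy version of the IVFFlat idea in NumPy and SciPy. This illustrates the cluster-then-probe strategy, not pgvector's actual implementation, and it assumes unit-normalized vectors so a dot product equals cosine similarity:

import numpy as np
from scipy.cluster.vq import kmeans2  # assumes SciPy is available

def build_ivf(corpus, n_lists=100):
    # Build time: cluster the vectors into n_lists lists via k-means
    centroids, assignments = kmeans2(corpus, n_lists, minit="++")
    lists = [np.where(assignments == i)[0] for i in range(n_lists)]
    return centroids, lists

def ivf_search(query, corpus, centroids, lists, probes=10, k=10):
    # Query time: rank clusters by centroid similarity, keep the top `probes`
    nearest = np.argsort(-(centroids @ query))[:probes]
    # Scan only the vectors in those clusters (anything outside them is lost recall)
    candidates = np.concatenate([lists[i] for i in nearest])
    scores = corpus[candidates] @ query
    return candidates[np.argsort(-scores)[:k]]

With probes=10 out of 100 lists, each query touches roughly a tenth of the corpus, which is exactly where the ~100x speedup and the ~95% (not 100%) recall come from.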
HNSW algorithm (better):
- Hierarchical graph structure (search descends from a sparse top layer to a dense bottom layer)
- Better recall than IVFFlat at a given speed
- Slower, more memory-hungry builds; faster queries
CREATE INDEX embeddings_hnsw_idx ON documents
    USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64);
-- m: max connections per graph node (pgvector's default is 16)
-- ef_construction: candidate-list size while building the graph (default 64)

-- Query time: 20-50ms, ~98% recall
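pgvector also exposes a query-time knob for HNSW, analogous to ivfflat.probes: the width of the graph search. Higher values improve recall at the cost of latency (40 is the default):

-- Widen the HNSW search for better recall on this session
SET hnsw.ef_search = 100;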
Key Takeaways
- Embeddings convert text to high-dimensional vectors for semantic similarity
- Naive vector search is too slow for production (5-10 seconds for 1M vectors)
- ANN indexes (IVFFlat, HNSW) trade perfect recall for speed (100x faster)
- IVFFlat clusters vectors, searches nearest clusters (~95% recall)
- HNSW uses hierarchical graphs for better recall (~98%) and speed (20-50ms)
- Production vector search requires ANN indexes for acceptable latency

