What Are Vector Databases? A Complete Guide to How They Work
If you've been following AI developments, you've probably heard about vector databases. They're powering everything from ChatGPT's memory to semantic search engines to recommendation systems.
But what exactly are they? How do they work? And why can't we just use regular databases?
Let's break it down from first principles.
The Problem: Computers Don't Understand Meaning
Traditional databases are great at exact matches. Want to find all users named "John"? Easy. Find all orders over $100? No problem.
But what if you want to find:
- Documents similar to a given topic
- Images that look like another image
- Products that customers might also like
- Text that means the same thing as a question
These are semantic queries—they're about meaning, not exact values. And traditional databases can't handle them.
-- This works perfectly in SQL
SELECT * FROM products WHERE category = 'electronics';
-- But this is impossible
SELECT * FROM products WHERE meaning SIMILAR TO 'something to listen to music';
The second query should return headphones, speakers, earbuds, MP3 players—but SQL has no concept of "meaning."
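For a preview of where this is going: once meaning is stored as vectors, the "impossible" query becomes expressible. Here's a sketch using pgvector (a Postgres extension covered later in this post); it assumes an embedding column populated ahead of time, and <=> is pgvector's cosine-distance operator:
-- A sketch: assumes an "embedding" vector column populated ahead of time
SELECT * FROM products
ORDER BY embedding <=> $1  -- $1: the embedding of "something to listen to music"
LIMIT 5;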
The Solution: Turn Meaning Into Numbers
Here's the key insight that makes vector databases possible:
We can represent the meaning of anything as a list of numbers.
This list of numbers is called a vector (or embedding). And once meaning is represented as numbers, we can do math on it.
What Is a Vector?
A vector is simply an ordered list of numbers. Think of it as coordinates in space:
2D vector: [3, 4] → A point on a flat plane
3D vector: [1, 2, 3] → A point in 3D space
768D vector: [0.1, -0.3, 0.8, ...] → A point in 768-dimensional space
The magic happens when we use many dimensions. Modern embedding models typically use 384 to 1536 dimensions.
How Do We Get Vectors? (Embeddings)
Vectors are created by embedding models—neural networks trained to convert data into meaningful numerical representations.
Here's what happens when you embed text:
// Using OpenAI's embedding model
const response = await openai.embeddings.create({
  model: "text-embedding-3-small",
  input: "I love programming"
});

const vector = response.data[0].embedding;
// Returns: [0.0231, -0.0192, 0.0847, ...] (1536 numbers)
The brilliant part? Similar meanings produce similar vectors.
"I love programming" → [0.023, -0.019, 0.084, ...]
"I enjoy coding" → [0.025, -0.017, 0.081, ...] // Very similar!
"I hate vegetables" → [-0.156, 0.234, -0.089, ...] // Very different!
This works for:
- Text: Words, sentences, documents
- Images: Photos, diagrams, artwork
- Audio: Music, speech, sounds
- Code: Functions, programs, repositories
- Any data: As long as you have a model to embed it
How Vector Similarity Works
Once we have vectors, we need to measure how similar they are. The most common method is cosine similarity.
Cosine Similarity Explained
Imagine two arrows pointing from the origin. Cosine similarity measures the angle between them:
- Angle = 0°: Vectors point the same direction → Similarity = 1 (identical meaning)
- Angle = 90°: Vectors are perpendicular → Similarity = 0 (unrelated)
- Angle = 180°: Vectors point opposite directions → Similarity = -1 (opposite meaning)
Cosine Similarity = (A · B) / (|A| × |B|)
Where:
- A · B is the dot product (multiply corresponding elements, sum them up)
- |A| and |B| are the magnitudes (lengths) of each vector
Here's a simple example with 3D vectors:
function cosineSimilarity(a, b) {
  let dotProduct = 0;
  let magnitudeA = 0;
  let magnitudeB = 0;

  for (let i = 0; i < a.length; i++) {
    dotProduct += a[i] * b[i];
    magnitudeA += a[i] * a[i];
    magnitudeB += b[i] * b[i];
  }

  return dotProduct / (Math.sqrt(magnitudeA) * Math.sqrt(magnitudeB));
}
// Example
const programming = [0.8, 0.6, 0.1];
const coding = [0.75, 0.65, 0.12];
const cooking = [0.1, 0.2, 0.95];

cosineSimilarity(programming, coding); // 0.997 - Very similar!
cosineSimilarity(programming, cooking); // 0.301 - Not similar
The Challenge: Finding Needles in Haystacks
Here's where it gets interesting. Calculating similarity between two vectors is fast. But what if you have millions of vectors?
1 million vectors × 1536 dimensions ≈ 1.5 billion numbers to scan per query
Comparing your query against every single vector would be painfully slow. This is called the nearest neighbor search problem.
The Naive Approach (Too Slow)
// Don't do this with millions of vectors!
function findSimilar(query, allVectors, topK) {
  const similarities = allVectors.map(v => ({
    vector: v,
    similarity: cosineSimilarity(query, v.embedding)
  }));

  return similarities
    .sort((a, b) => b.similarity - a.similarity)
    .slice(0, topK);
}
This is O(n) per query: every single vector gets compared against the query, across all of its dimensions. With millions of vectors, that's unacceptable.
How Vector Databases Actually Work
Vector databases solve this with clever indexing algorithms that trade a tiny bit of accuracy for massive speed improvements.
HNSW: The Most Popular Algorithm
Hierarchical Navigable Small World (HNSW) is the most widely used indexing algorithm. Here's how it works:
Imagine organizing your vectors into a multi-layered graph:
Layer 2 (sparse):    A ------------ B ------------ C
                      \            / \            /
Layer 1 (medium):      D--E--F--G--H--I--J--K--L
                       |  |  |  |  |  |  |  |  |
Layer 0 (dense):     all vectors, each connected to nearby neighbors
Search process:
1. Start at the top layer (very few nodes)
2. Greedily move toward vectors more similar to your query
3. Drop down to the next layer
4. Repeat until you reach the bottom layer
5. Return the best matches found
This is like using a map: first you find the right country, then the city, then the neighborhood, then the street.
Result: Instead of checking millions of vectors, you check maybe a few hundred. Queries that took seconds now take milliseconds.
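To make the "greedily" step concrete, here's a toy single-layer version (a sketch of the idea, not a real HNSW implementation; it assumes a graph object mapping each node id to its embedding and neighbor list, and reuses the cosineSimilarity function from earlier):
// Toy greedy search on one graph layer (illustrative only).
// graph: { [id]: { embedding: number[], neighbors: string[] } }
function greedySearch(graph, entryId, queryVector) {
  let current = entryId;
  let currentScore = cosineSimilarity(graph[current].embedding, queryVector);

  while (true) {
    let best = null;
    let bestScore = currentScore;

    // Check every neighbor of the current node; keep the most similar one
    for (const id of graph[current].neighbors) {
      const score = cosineSimilarity(graph[id].embedding, queryVector);
      if (score > bestScore) {
        best = id;
        bestScore = score;
      }
    }

    // No neighbor improves on the current node: we've reached a local best
    if (best === null) return current;
    current = best;
    currentScore = bestScore;
  }
}
In real HNSW, this descent runs once per layer, with each layer's result becoming the entry point for the layer below.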
Other Indexing Methods
| Algorithm | How It Works | Best For |
|---|---|---|
| HNSW | Multi-layer graph navigation | General purpose, high accuracy |
| IVF | Cluster vectors, search relevant clusters | Very large datasets |
| PQ | Compress vectors into smaller codes | Memory-constrained systems |
| LSH | Hash similar vectors to same buckets | Real-time applications |
Most vector databases use HNSW or a combination of these techniques.
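For contrast with HNSW's graph, here's a rough sketch of the IVF idea (illustrative only; it assumes the centroids were produced beforehand by something like k-means, and again reuses cosineSimilarity): group vectors under their nearest centroid at index time, then scan only the few most promising groups at query time.
// Rough IVF sketch (illustrative only)
function buildIvfIndex(vectors, centroids) {
  const clusters = centroids.map(() => []);
  for (const v of vectors) {
    // Assign each vector to its most similar centroid
    let best = 0;
    for (let c = 1; c < centroids.length; c++) {
      if (cosineSimilarity(v.embedding, centroids[c]) >
          cosineSimilarity(v.embedding, centroids[best])) best = c;
    }
    clusters[best].push(v);
  }
  return clusters;
}

function ivfQuery(query, centroids, clusters, nProbe = 2) {
  // Rank centroids by similarity, then scan only the top nProbe clusters
  const ranked = centroids
    .map((c, i) => ({ i, sim: cosineSimilarity(query, c) }))
    .sort((a, b) => b.sim - a.sim)
    .slice(0, nProbe);

  return ranked
    .flatMap(({ i }) => clusters[i])
    .map(v => ({ vector: v, similarity: cosineSimilarity(query, v.embedding) }))
    .sort((a, b) => b.similarity - a.similarity);
}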
Approximate vs. Exact Search
Here's an important tradeoff:
Exact search (brute force):
- ✅ Always finds the true nearest neighbors
- ❌ Slow with large datasets (O(n) per query)
Approximate search (HNSW, IVF, etc.):
- ✅ Fast even with billions of vectors
- ❌ Might miss some relevant results (typically 95-99% recall of the true neighbors)
For most applications, approximate search is more than good enough. Finding 95 of the top 100 most similar items in 10ms beats finding all 100 in 10 seconds.
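You can measure this tradeoff on your own data by comparing the index's answers against brute force. A minimal sketch:
// recall@K: what fraction of the true top-K ids the approximate search returned
function recallAtK(exactIds, approxIds) {
  const truth = new Set(exactIds);
  const hits = approxIds.filter(id => truth.has(id)).length;
  return hits / exactIds.length;
}

// e.g. recallAtK(bruteForceTop100, indexTop100) → 0.95 means 95 of 100 found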
Real-World Architecture
Here's how vector databases fit into a typical AI application:
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│   User Query    │────▶│ Embedding Model │────▶│  Query Vector   │
│  "How do I..."  │     │  (OpenAI, etc.) │     │ [0.1, -0.3,...] │
└─────────────────┘     └─────────────────┘     └────────┬────────┘
                                                         │
                                                         ▼
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│  Top K Results  │◀────│ Vector Database │◀────│   Similarity    │
│   + Metadata    │     │   (Pinecone,    │     │     Search      │
└────────┬────────┘     │   Weaviate...)  │     └─────────────────┘
         │              └─────────────────┘
         ▼
┌─────────────────┐     ┌─────────────────┐
│   LLM (GPT-4,   │────▶│  Final Answer   │
│  Claude, etc.)  │     │    to User      │
└─────────────────┘     └─────────────────┘
This pattern is called RAG (Retrieval-Augmented Generation); in code, it looks like the sketch after these steps:
1. Convert the user's question to a vector
2. Find similar content in your database
3. Feed that content to an LLM as context
4. The LLM generates an answer using the retrieved information
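Here's a minimal sketch of those four steps (it assumes embed- and search-style helpers like the ones in the Pinecone example later in this post, plus an OpenAI client; the model name is just an example):
// Minimal RAG sketch (illustrative only)
async function answerQuestion(question) {
  // Steps 1-2: embed the question and retrieve similar content
  const docs = await search(question, 5);
  const context = docs.map(d => d.text).join('\n---\n');

  // Steps 3-4: hand the retrieved context to an LLM and generate the answer
  const completion = await openai.chat.completions.create({
    model: 'gpt-4o-mini',
    messages: [
      { role: 'system', content: `Answer using only this context:\n${context}` },
      { role: 'user', content: question }
    ]
  });
  return completion.choices[0].message.content;
}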
Popular Vector Databases
Pinecone
- Fully managed, serverless
- Great developer experience
- Scales automatically
- Best for: Teams that want zero infrastructure management
Weaviate
- Open source with cloud option
- Built-in ML models for auto-vectorization
- GraphQL API
- Best for: Teams wanting flexibility and control
Qdrant
- Open source, written in Rust
- Very fast and memory-efficient
- Rich filtering capabilities
- Best for: Performance-critical applications
Milvus
- Open source, highly scalable
- Supports multiple index types
- Kubernetes-native
- Best for: Large-scale enterprise deployments
Chroma
- Open source, Python-native
- Simple API, easy to get started
- Good for local development
- Best for: Prototyping and small projects
pgvector
- PostgreSQL extension
- Use your existing Postgres database
- Familiar SQL interface (see the sketch below)
- Best for: Teams already using PostgreSQL
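A rough sketch of what that looks like (table and column names are made up for illustration):
-- Enable the extension, store embeddings next to your rows
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE documents (
  id bigserial PRIMARY KEY,
  content text,
  embedding vector(1536)  -- matches text-embedding-3-small's output size
);

-- <=> is pgvector's cosine distance operator (smaller = more similar)
SELECT content FROM documents
ORDER BY embedding <=> $1  -- $1: the query embedding, passed as a parameter
LIMIT 5;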
When to Use Vector Databases
Use Vector Databases For:
- Semantic search: Find documents by meaning, not keywords
- RAG applications: Give LLMs access to your data
- Recommendation systems: "Users who liked X also liked Y"
- Image/audio search: Find similar media files
- Anomaly detection: Find outliers in high-dimensional data
- Duplicate detection: Find near-duplicate content
- Question answering: Find relevant context for questions
Stick With Traditional Databases For:
- Exact lookups: Find user by ID, order by number
- Transactional data: Banking, inventory, orders
- Relational queries: JOINs, aggregations, reports
- Structured filtering: WHERE category = 'X' AND price < 100
Use Both Together:
Many applications use vector databases alongside traditional databases:
// 1. Semantic search with vector database
const relevantDocs = await vectorDB.query({
  vector: queryEmbedding,
  topK: 10
});

// 2. Filter results with traditional database
const finalResults = await sql`
  SELECT * FROM products
  WHERE id IN (${relevantDocs.map(d => d.id)})
    AND in_stock = true
    AND price < ${maxPrice}
`;
Getting Started: A Simple Example
Here's a complete example using Node.js and Pinecone:
import { Pinecone } from '@pinecone-database/pinecone';
import OpenAI from 'openai';

// Initialize clients
const pinecone = new Pinecone({ apiKey: process.env.PINECONE_API_KEY });
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

// Get the index
const index = pinecone.index('my-knowledge-base');

// Function to create embeddings
async function embed(text) {
  const response = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: text
  });
  return response.data[0].embedding;
}

// Store a document
async function storeDocument(id, text, metadata = {}) {
  const embedding = await embed(text);
  await index.upsert([{
    id,
    values: embedding,
    metadata: { text, ...metadata }
  }]);
}

// Search for similar documents
async function search(query, topK = 5) {
  const queryEmbedding = await embed(query);
  const results = await index.query({
    vector: queryEmbedding,
    topK,
    includeMetadata: true
  });
  return results.matches.map(match => ({
    text: match.metadata.text,
    score: match.score
  }));
}

// Example usage
await storeDocument('doc1', 'Python is great for machine learning');
await storeDocument('doc2', 'JavaScript powers the modern web');
await storeDocument('doc3', 'Neural networks learn patterns from data');

const results = await search('AI and deep learning');
// Returns doc3 and doc1 (semantically related to AI)
Key Takeaways
- Vector databases store meaning as numbers (vectors/embeddings)
- Embedding models convert text, images, etc. into vectors
- Similar meanings = similar vectors (measured by cosine similarity)
- HNSW and other algorithms make search fast (approximate nearest neighbor)
- Use vector databases for semantic search, recommendations, and RAG
- Use traditional databases for exact queries and transactions
- Most AI applications use both together
Vector databases aren't replacing SQL—they're complementing it by adding semantic understanding to your data infrastructure.
Next Steps
Ready to dive deeper? Here are some resources:
- Try Pinecone's free tier for a managed experience
- Experiment with Chroma locally
- Add pgvector to your existing PostgreSQL
- Learn about RAG patterns for AI applications
The best way to understand vector databases is to build something with them. Start with a simple semantic search over your own documents, and expand from there.

