What Are Vector Databases? A Complete Guide to How They Work
If you've been following AI developments, you've probably heard about vector databases. They're powering everything from ChatGPT's memory to semantic search engines to recommendation systems.
But what exactly are they? How do they work? And why can't we just use regular databases?
Let's break it down from first principles.
The Problem: Computers Don't Understand Meaning
Traditional databases are great at exact matches. Want to find all users named "John"? Easy. Find all orders over $100? No problem.
But what if you want to find:
- Documents similar to a given topic
- Images that look like another image
- Products that customers might also like
- Text that means the same thing as a question
These are semantic queries—they're about meaning, not exact values. And traditional databases can't handle them.
-- This works perfectly in SQL
SELECT * FROM products WHERE category = 'electronics';
-- But this is impossible
SELECT * FROM products WHERE meaning SIMILAR TO 'something to listen to music';
The second query should return headphones, speakers, earbuds, MP3 players—but SQL has no concept of "meaning."
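For a preview of where this is going: once meaning is stored as vectors, the "impossible" query becomes expressible. Here's a sketch using pgvector (a Postgres extension covered later in this post); it assumes an embedding column populated ahead of time, and <=> is pgvector's cosine-distance operator:
-- A sketch: assumes an "embedding" vector column populated ahead of time
SELECT * FROM products
ORDER BY embedding <=> $1  -- $1: the embedding of "something to listen to music"
LIMIT 5;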
The Solution: Turn Meaning Into Numbers
Here's the key insight that makes vector databases possible:
We can represent the meaning of anything as a list of numbers.
This list of numbers is called a vector (or embedding). And once meaning is represented as numbers, we can do math on it.
What Is a Vector?
A vector is simply an ordered list of numbers. Think of it as coordinates in space:
2D vector: [3, 4] → A point on a flat plane
3D vector: [1, 2, 3] → A point in 3D space
768D vector: [0.1, -0.3, 0.8, ...] → A point in 768-dimensional space
The magic happens when we use many dimensions. Modern embedding models typically use 384 to 1536 dimensions.
How Do We Get Vectors? (Embeddings)
Vectors are created by embedding models—neural networks trained to convert data into meaningful numerical representations.
Here's what happens when you embed text:
// Using OpenAI's embedding model
const response = await openai.embeddings.create({
  model: "text-embedding-3-small",
  input: "I love programming"
});

const vector = response.data[0].embedding;
// Returns: [0.0231, -0.0192, 0.0847, ...] (1536 numbers)
The brilliant part? Similar meanings produce similar vectors.
"I love programming" → [0.023, -0.019, 0.084, ...]
"I enjoy coding" → [0.025, -0.017, 0.081, ...] // Very similar!
"I hate vegetables" → [-0.156, 0.234, -0.089, ...] // Very different!
This works for:
- Text: Words, sentences, documents
- Images: Photos, diagrams, artwork
- Audio: Music, speech, sounds
- Code: Functions, programs, repositories
- Any data: As long as you have a model to embed it
How Vector Similarity Works
Once we have vectors, we need to measure how similar they are. The most common method is cosine similarity.
Cosine Similarity Explained
Imagine two arrows pointing from the origin. Cosine similarity measures the angle between them:
- Angle = 0°: Vectors point the same direction → Similarity = 1 (identical meaning)
- Angle = 90°: Vectors are perpendicular → Similarity = 0 (unrelated)
- Angle = 180°: Vectors point opposite directions → Similarity = -1 (opposite meaning)
Cosine Similarity = (A · B) / (|A| × |B|)
Where:
- A · B is the dot product (multiply corresponding elements, sum them up)
- |A| and |B| are the magnitudes (lengths) of each vector
Here's a simple example with 3D vectors:
function cosineSimilarity(a, b) {
  let dotProduct = 0;
  let magnitudeA = 0;
  let magnitudeB = 0;

  for (let i = 0; i < a.length; i++) {
    dotProduct += a[i] * b[i];
    magnitudeA += a[i] * a[i];
    magnitudeB += b[i] * b[i];
  }

  return dotProduct / (Math.sqrt(magnitudeA) * Math.sqrt(magnitudeB));
}
// Example
const programming = [0.8, 0.6, 0.1];
const coding = [0.75, 0.65, 0.12];
const cooking = [0.1, 0.2, 0.95];

cosineSimilarity(programming, coding); // 0.997 - Very similar!
cosineSimilarity(programming, cooking); // 0.301 - Not similar
The Challenge: Finding Needles in Haystacks
Here's where it gets interesting. Calculating similarity between two vectors is fast. But what if you have millions of vectors?
1 million vectors × 1536 dimensions ≈ 1.5 billion numbers to scan per query
Comparing your query against every single vector would be painfully slow. This is called the nearest neighbor search problem.
The Naive Approach (Too Slow)
// Don't do this with millions of vectors!
function findSimilar(query, allVectors, topK) {
  const similarities = allVectors.map(v => ({
    vector: v,
    similarity: cosineSimilarity(query, v.embedding)
  }));

  return similarities
    .sort((a, b) => b.similarity - a.similarity)
    .slice(0, topK);
}
This is O(n) per query: every single vector gets compared against the query, across all of its dimensions. With millions of vectors, that's unacceptable.
How Vector Databases Actually Work
Vector databases solve this with clever indexing algorithms that trade a tiny bit of accuracy for massive speed improvements.
HNSW: The Most Popular Algorithm
Hierarchical Navigable Small World (HNSW) is the most widely used indexing algorithm. Here's how it works:
Imagine organizing your vectors into a multi-layered graph:
Layer 2 (sparse):    A ------------ B ------------ C
                      \            / \            /
Layer 1 (medium):      D--E--F--G--H--I--J--K--L
                       |  |  |  |  |  |  |  |  |
Layer 0 (dense):     all vectors, each connected to nearby neighbors
Search process:
1. Start at the top layer (very few nodes)
2. Greedily move toward vectors more similar to your query
3. Drop down to the next layer
4. Repeat until you reach the bottom layer
5. Return the best matches found
This is like using a map: first you find the right country, then the city, then the neighborhood, then the street.
Result: Instead of checking millions of vectors, you check maybe a few hundred. Queries that took seconds now take milliseconds.
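To make the "greedily" step concrete, here's a toy single-layer version (a sketch of the idea, not a real HNSW implementation; it assumes a graph object mapping each node id to its embedding and neighbor list, and reuses the cosineSimilarity function from earlier):
// Toy greedy search on one graph layer (illustrative only).
// graph: { [id]: { embedding: number[], neighbors: string[] } }
function greedySearch(graph, entryId, queryVector) {
  let current = entryId;
  let currentScore = cosineSimilarity(graph[current].embedding, queryVector);

  while (true) {
    let best = null;
    let bestScore = currentScore;

    // Check every neighbor of the current node; keep the most similar one
    for (const id of graph[current].neighbors) {
      const score = cosineSimilarity(graph[id].embedding, queryVector);
      if (score > bestScore) {
        best = id;
        bestScore = score;
      }
    }

    // No neighbor improves on the current node: we've reached a local best
    if (best === null) return current;
    current = best;
    currentScore = bestScore;
  }
}
In real HNSW, this descent runs once per layer, with each layer's result becoming the entry point for the layer below.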
Other Indexing Methods
| Algorithm | How It Works | Best For |
|---|---|---|
| HNSW | Multi-layer graph navigation | General purpose, high accuracy |
| IVF | Cluster vectors, search relevant clusters | Very large datasets |
| PQ | Compress vectors into smaller codes | Memory-constrained systems |
| LSH | Hash similar vectors to same buckets | Real-time applications |
Most vector databases use HNSW or a combination of these techniques.
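For contrast with HNSW's graph, here's a rough sketch of the IVF idea (illustrative only; it assumes the centroids were produced beforehand by something like k-means, and again reuses cosineSimilarity): group vectors under their nearest centroid at index time, then scan only the few most promising groups at query time.
// Rough IVF sketch (illustrative only)
function buildIvfIndex(vectors, centroids) {
  const clusters = centroids.map(() => []);
  for (const v of vectors) {
    // Assign each vector to its most similar centroid
    let best = 0;
    for (let c = 1; c < centroids.length; c++) {
      if (cosineSimilarity(v.embedding, centroids[c]) >
          cosineSimilarity(v.embedding, centroids[best])) best = c;
    }
    clusters[best].push(v);
  }
  return clusters;
}

function ivfQuery(query, centroids, clusters, nProbe = 2) {
  // Rank centroids by similarity, then scan only the top nProbe clusters
  const ranked = centroids
    .map((c, i) => ({ i, sim: cosineSimilarity(query, c) }))
    .sort((a, b) => b.sim - a.sim)
    .slice(0, nProbe);

  return ranked
    .flatMap(({ i }) => clusters[i])
    .map(v => ({ vector: v, similarity: cosineSimilarity(query, v.embedding) }))
    .sort((a, b) => b.similarity - a.similarity);
}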
Approximate vs. Exact Search
Here's an important tradeoff:
Exact search (brute force):
- ✅ Always finds the true nearest neighbors
- ❌ Slow with large datasets (O(n) per query)
Approximate search (HNSW, IVF, etc.):
- ✅ Fast even with billions of vectors
- ❌ Might miss some relevant results (typically 95-99% recall of the true neighbors)
For most applications, approximate search is more than good enough. Finding 95 of the top 100 most similar items in 10ms beats finding all 100 in 10 seconds.
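You can measure this tradeoff on your own data by comparing the index's answers against brute force. A minimal sketch:
// recall@K: what fraction of the true top-K ids the approximate search returned
function recallAtK(exactIds, approxIds) {
  const truth = new Set(exactIds);
  const hits = approxIds.filter(id => truth.has(id)).length;
  return hits / exactIds.length;
}

// e.g. recallAtK(bruteForceTop100, indexTop100) → 0.95 means 95 of 100 found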
Real-World Architecture
Here's how vector databases fit into a typical AI application:
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│   User Query    │────▶│ Embedding Model │────▶│  Query Vector   │
│  "How do I..."  │     │  (OpenAI, etc.) │     │ [0.1, -0.3,...] │
└─────────────────┘     └─────────────────┘     └────────┬────────┘
                                                         │
                                                         ▼
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│  Top K Results  │◀────│ Vector Database │◀────│   Similarity    │
│   + Metadata    │     │   (Pinecone,    │     │     Search      │
└────────┬────────┘     │   Weaviate...)  │     └─────────────────┘
         │              └─────────────────┘
         ▼
┌─────────────────┐     ┌─────────────────┐
│   LLM (GPT-4,   │────▶│  Final Answer   │
│  Claude, etc.)  │     │    to User      │
└─────────────────┘     └─────────────────┘
This pattern is called RAG (Retrieval-Augmented Generation); in code, it looks like the sketch after these steps:
1. Convert the user's question to a vector
2. Find similar content in your database
3. Feed that content to an LLM as context
4. The LLM generates an answer using the retrieved information
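Here's a minimal sketch of those four steps (it assumes embed- and search-style helpers like the ones in the Pinecone example later in this post, plus an OpenAI client; the model name is just an example):
// Minimal RAG sketch (illustrative only)
async function answerQuestion(question) {
  // Steps 1-2: embed the question and retrieve similar content
  const docs = await search(question, 5);
  const context = docs.map(d => d.text).join('\n---\n');

  // Steps 3-4: hand the retrieved context to an LLM and generate the answer
  const completion = await openai.chat.completions.create({
    model: 'gpt-4o-mini',
    messages: [
      { role: 'system', content: `Answer using only this context:\n${context}` },
      { role: 'user', content: question }
    ]
  });
  return completion.choices[0].message.content;
}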
Popular Vector Databases
Pinecone
- Fully managed, serverless
- Great developer experience
- Scales automatically
- Best for: Teams that want zero infrastructure management
Weaviate
- Open source with cloud option
- Built-in ML models for auto-vectorization
- GraphQL API
- Best for: Teams wanting flexibility and control
Qdrant
- Open source, written in Rust
- Very fast and memory-efficient
- Rich filtering capabilities
- Best for: Performance-critical applications
Milvus
- Open source, highly scalable
- Supports multiple index types
- Kubernetes-native
- Best for: Large-scale enterprise deployments
Chroma
- Open source, Python-native
- Simple API, easy to get started
- Good for local development
- Best for: Prototyping and small projects
pgvector
- PostgreSQL extension
- Use your existing Postgres database
- Familiar SQL interface (see the sketch below)
- Best for: Teams already using PostgreSQL
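A rough sketch of what that looks like (table and column names are made up for illustration):
-- Enable the extension, store embeddings next to your rows
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE documents (
  id bigserial PRIMARY KEY,
  content text,
  embedding vector(1536)  -- matches text-embedding-3-small's output size
);

-- <=> is pgvector's cosine distance operator (smaller = more similar)
SELECT content FROM documents
ORDER BY embedding <=> $1  -- $1: the query embedding, passed as a parameter
LIMIT 5;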
When to Use Vector Databases
Use Vector Databases For:
- Semantic search: Find documents by meaning, not keywords
- RAG applications: Give LLMs access to your data
- Recommendation systems: "Users who liked X also liked Y"
- Image/audio search: Find similar media files
- Anomaly detection: Find outliers in high-dimensional data
- Duplicate detection: Find near-duplicate content
- Question answering: Find relevant context for questions
Stick With Traditional Databases For:
- Exact lookups: Find user by ID, order by number
- Transactional data: Banking, inventory, orders
- Relational queries: JOINs, aggregations, reports
- Structured filtering: WHERE category = 'X' AND price < 100
Use Both Together:
Many applications use vector databases alongside traditional databases:
// 1. Semantic search with vector database
const relevantDocs = await vectorDB.query({
  vector: queryEmbedding,
  topK: 10
});

// 2. Filter results with traditional database
const finalResults = await sql`
  SELECT * FROM products
  WHERE id IN (${relevantDocs.map(d => d.id)})
    AND in_stock = true
    AND price < ${maxPrice}
`;
Getting Started: A Simple Example
Here's a complete example using Node.js and Pinecone:
import { Pinecone } from '@pinecone-database/pinecone';
import OpenAI from 'openai';

// Initialize clients
const pinecone = new Pinecone({ apiKey: process.env.PINECONE_API_KEY });
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

// Get the index
const index = pinecone.index('my-knowledge-base');

// Function to create embeddings
async function embed(text) {
  const response = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: text
  });
  return response.data[0].embedding;
}

// Store a document
async function storeDocument(id, text, metadata = {}) {
  const embedding = await embed(text);
  await index.upsert([{
    id,
    values: embedding,
    metadata: { text, ...metadata }
  }]);
}

// Search for similar documents
async function search(query, topK = 5) {
  const queryEmbedding = await embed(query);
  const results = await index.query({
    vector: queryEmbedding,
    topK,
    includeMetadata: true
  });
  return results.matches.map(match => ({
    text: match.metadata.text,
    score: match.score
  }));
}

// Example usage
await storeDocument('doc1', 'Python is great for machine learning');
await storeDocument('doc2', 'JavaScript powers the modern web');
await storeDocument('doc3', 'Neural networks learn patterns from data');

const results = await search('AI and deep learning');
// Returns doc3 and doc1 (semantically related to AI)
Key Takeaways
- Vector databases store meaning as numbers (vectors/embeddings)
- Embedding models convert text, images, etc. into vectors
- Similar meanings = similar vectors (measured by cosine similarity)
- HNSW and other algorithms make search fast (approximate nearest neighbor)
- Use vector databases for semantic search, recommendations, and RAG
- Use traditional databases for exact queries and transactions
- Most AI applications use both together
Vector databases aren't replacing SQL—they're complementing it by adding semantic understanding to your data infrastructure.
Next Steps
Ready to dive deeper? Here are some resources:
- Try Pinecone's free tier for a managed experience
- Experiment with Chroma locally
- Add pgvector to your existing PostgreSQL
- Learn about RAG patterns for AI applications
The best way to understand vector databases is to build something with them. Start with a simple semantic search over your own documents, and expand from there.

