Module 4: The Vector Database Landscape

Comparing Pinecone, Weaviate, Chroma, pgvector, and Qdrant

Introduction

The vector database market has grown quickly, and choosing among the options can be overwhelming. This module compares the main contenders candidly, weaknesses included, to help you make an informed decision.

By the end of this module, you'll understand:

The major players and their positioning
Key differentiators between options
When to choose each database
The trade-offs you'll face

4.1 The Major Players

Purpose-Built Vector Databases

Pinecone

Fully managed, serverless
Focus: Simplicity and production reliability
Pricing: Pay per usage

Qdrant

Open source with managed cloud option
Focus: Performance and advanced features
Pricing: Self-hosted free, cloud pay-per-use

Weaviate

Open source with managed cloud
Focus: ML integration and GraphQL interface
Pricing: Self-hosted free, cloud tiered

Chroma

Open source, embedded-first
Focus: Developer experience and simplicity
Pricing: Free (self-hosted only currently)

Database Extensions

pgvector (PostgreSQL)

Extension for PostgreSQL
Focus: Integration with existing Postgres
Pricing: Free (part of your Postgres costs)

Atlas Vector Search (MongoDB)

Native MongoDB feature
Focus: Existing MongoDB users
Pricing: Included with Atlas

4.2 Feature Comparison

Feature	Pinecone	Qdrant	Weaviate	Chroma	pgvector
Hosting	Managed only	Both	Both	Self-hosted	Self-hosted or managed Postgres
Open Source	No	Yes	Yes	Yes	Yes
Max Dimensions	20,000	65,536	65,535	Unlimited	2,000 (indexed)
Metadata Filtering	Yes	Advanced	Yes	Yes	SQL
Hybrid Search	Yes	Yes	Yes	Limited	With extensions
Multi-tenancy	Namespaces	Collections	Tenants	Collections	Schemas
Replication	Auto	Manual	Auto	N/A	Postgres
API Style	REST/gRPC	REST/gRPC	GraphQL/REST	Python/JS	SQL

Index Types

Database	Supported Indexes
Pinecone	Proprietary (HNSW-based)
Qdrant	HNSW
Weaviate	HNSW
Chroma	HNSW
pgvector	IVFFlat, HNSW

4.3 Deep Dive: Each Database

Pinecone

Strengths:

Zero infrastructure management
Excellent developer experience
Strong reliability and uptime
Simple, clean API
Good documentation

Weaknesses:

No self-hosted option
Vendor lock-in
Can be expensive at scale
Limited customization

Best for:

Teams that want managed simplicity
Production applications needing reliability
Startups moving fast

import { Pinecone } from '@pinecone-database/pinecone'

// With no arguments, the client reads the API key from the PINECONE_API_KEY environment variable
const pinecone = new Pinecone()
const index = pinecone.index('my-index')

// Insert
await index.upsert([{
  id: 'doc-1',
  values: embedding,
  metadata: { category: 'tech' }
}])

// Query
const results = await index.query({
  vector: queryEmbedding,
  topK: 10,
  filter: { category: 'tech' }
})

Qdrant

Strengths:

Excellent performance
Rich filtering capabilities
Payload (metadata) search
Active development
Implemented in Rust

Weaknesses:

Smaller community than some alternatives
Cloud offering relatively new
Learning curve for advanced features

Best for:

Performance-critical applications
Complex filtering requirements
Teams comfortable with self-hosting

import { QdrantClient } from '@qdrant/js-client-rest'

const client = new QdrantClient({ url: 'http://localhost:6333' })

// Insert
await client.upsert('my-collection', {
  points: [{
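    // Qdrant point IDs must be unsigned integers or UUIDs, so the
    // application-level ID is kept in the payload instead
    id: 1,
    vector: embedding,
    payload: { doc_id: 'doc-1', category: 'tech' }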
  }]
})

// Query
const results = await client.search('my-collection', {
  vector: queryEmbedding,
  limit: 10,
  filter: {
    must: [{ key: 'category', match: { value: 'tech' } }]
  }
})
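The nested `must` / `match` shape in the query above gets verbose once you filter on several fields. Here is a minimal sketch of a helper that builds it from a flat object. `toQdrantFilter` and its types are hypothetical names for this example, not part of the Qdrant client, and the sketch only covers exact matches on string, integer, and boolean payload values.

```typescript
// Hypothetical helper (not part of @qdrant/js-client-rest): converts a flat
// { field: value } object into the `must` filter shape used in the query above.
type Scalar = string | number | boolean

interface MatchCondition {
  key: string
  match: { value: Scalar }
}

interface QdrantFilter {
  must: MatchCondition[]
}

function toQdrantFilter(fields: Record<string, Scalar>): QdrantFilter {
  // Each entry becomes one exact-match condition; all `must` conditions
  // have to hold for a point to be returned (logical AND).
  const must = Object.entries(fields).map(([key, value]) => ({
    key,
    match: { value },
  }))
  return { must }
}

const techFilter = toQdrantFilter({ category: 'tech', published: true })
console.log(JSON.stringify(techFilter, null, 2))
```

You would pass the result as the `filter` field of the search call. Range conditions and `should` / `must_not` clauses are out of scope for this sketch.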

Weaviate

Strengths:

Built-in vectorization (optional)
GraphQL interface
Module ecosystem
Strong ML integration
Multi-modal support

Weaknesses:

GraphQL can be complex
Resource-heavy
Steeper learning curve

Best for:

Teams using GraphQL
Applications needing built-in ML
Multi-modal search (text, images)

# Weaviate GraphQL query
{
  Get {
    Document(
      nearVector: {
        vector: [0.1, 0.2, ...]
      }
      where: {
        path: ["category"]
        operator: Equal
        valueText: "tech"
      }
      limit: 10
    ) {
      title
      content
      _additional { certainty }
    }
  }
}
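The `certainty` field in the query above is a rescaled distance. As we understand Weaviate's documentation, it is only defined for cosine distance and is computed as 1 - distance / 2, which maps the 0 to 2 distance range onto 1 to 0. The sketch below assumes that relationship; the function names are made up for this example.

```typescript
// Assumes cosine distance in [0, 2] and certainty = 1 - distance / 2.
// Function names are illustrative, not part of any Weaviate client.
function distanceToCertainty(distance: number): number {
  return 1 - distance / 2
}

function certaintyToDistance(certainty: number): number {
  return 2 * (1 - certainty)
}

// Identical direction: distance 0 gives certainty 1.
// Opposite direction: distance 2 gives certainty 0.
console.log(distanceToCertainty(0), distanceToCertainty(2))
```

This is handy when you compare scores across databases, since most of the others report a raw distance or similarity instead.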

Chroma

Strengths:

Incredibly simple to start
Great for local development
Python and JavaScript clients
Embedded mode (no server) in the Python client
Active community

Weaknesses:

Limited production features
No managed cloud (yet)
Basic compared to alternatives
Scaling limitations

Best for:

Local development and prototyping
Small to medium datasets
Learning vector databases
Simple use cases

import { ChromaClient } from 'chromadb'

// The JavaScript client talks to a running Chroma server (http://localhost:8000 by default)
const client = new ChromaClient()
const collection = await client.getOrCreateCollection({
  name: 'my-collection'
})

// Insert
await collection.add({
  ids: ['doc-1'],
  embeddings: [embedding],
  metadatas: [{ category: 'tech' }]
})

// Query
const results = await collection.query({
  queryEmbeddings: [queryEmbedding],
  nResults: 10,
  where: { category: 'tech' }
})
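Because `queryEmbeddings` accepts several vectors at once, Chroma returns one inner array per query, so `results.ids` is an array of arrays. The sketch below flattens the hits for the first query into plain objects. The `ChromaQueryResult` interface is a simplified assumption about the response shape and `firstQueryHits` is a hypothetical helper, so check both against the client version you use.

```typescript
// Simplified, assumed shape of a Chroma query response:
// one inner array per query embedding.
interface ChromaQueryResult {
  ids: string[][]
  distances: (number | null)[][] | null
  metadatas: (Record<string, unknown> | null)[][]
}

interface Hit {
  id: string
  distance: number | null
  metadata: Record<string, unknown> | null
}

// Hypothetical helper: zips the parallel arrays for the first query.
function firstQueryHits(res: ChromaQueryResult): Hit[] {
  const ids = res.ids[0] ?? []
  const distances = res.distances?.[0] ?? []
  const metadatas = res.metadatas[0] ?? []
  return ids.map((id, i) => ({
    id,
    distance: distances[i] ?? null,
    metadata: metadatas[i] ?? null,
  }))
}

// Made-up response used only to show the reshaping.
const sample: ChromaQueryResult = {
  ids: [['doc-1', 'doc-7']],
  distances: [[0.12, 0.31]],
  metadatas: [[{ category: 'tech' }, { category: 'tech' }]],
}
console.log(firstQueryHits(sample))
```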

pgvector

Strengths:

Use existing PostgreSQL skills
Single database for everything
ACID transactions
Familiar SQL interface
Rich ecosystem

Weaknesses:

Not optimized purely for vectors
Scaling requires Postgres expertise
Indexes limited to 2,000 dimensions (for the standard vector type)
Performance ceiling at scale

Best for:

Existing PostgreSQL users
Applications needing transactions
Moderate scale (< 10M vectors)
Hybrid SQL + vector queries

-- Enable the extension (once per database)
CREATE EXTENSION IF NOT EXISTS vector;

-- Create table with vector column
CREATE TABLE documents (
  id SERIAL PRIMARY KEY,
  content TEXT,
  embedding vector(1536),
  category TEXT
);

-- Create index
CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops);

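-- Query ($1 is the query embedding, passed as a bound parameter)
-- <=> is cosine distance, so 1 - distance gives cosine similarity
SELECT id, content,
       1 - (embedding <=> $1::vector) AS similarity
FROM documents
WHERE category = 'tech'
ORDER BY embedding <=> $1::vector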
LIMIT 10;
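The query orders by `<=>`, pgvector's cosine distance operator, and reports `1 - distance` as the similarity. To make that arithmetic concrete, here is a small sketch of the same calculation in TypeScript. It illustrates the formula only and is not how pgvector implements it.

```typescript
// Cosine distance = 1 - cosine similarity.
// Note: the result is NaN if either vector is all zeros.
function cosineDistance(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error('vectors must have the same dimension')
  }
  let dot = 0
  let normA = 0
  let normB = 0
  a.forEach((x, i) => {
    const y = b[i] ?? 0
    dot += x * y
    normA += x * x
    normB += y * y
  })
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// Orthogonal vectors: distance 1, so similarity 0.
const distance = cosineDistance([1, 0], [0, 1])
const similarity = 1 - distance
console.log(distance, similarity)
```

Smaller distance means a closer match, which is why the SQL sorts ascending on the distance instead of descending on the similarity.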

4.4 Decision Framework

Quick Decision Guide

Need managed, production-ready, minimal ops?
  → Pinecone

Need maximum performance, complex filtering?
  → Qdrant

Need GraphQL, built-in ML, multi-modal?
  → Weaviate

Need simple local dev, prototyping?
  → Chroma

Already using PostgreSQL, moderate scale?
  → pgvector

Detailed Decision Matrix

If you...	Consider
Want zero infrastructure	Pinecone
Need open source	Qdrant, Weaviate, Chroma, pgvector
Have limited budget	Self-hosted options
Need complex metadata filters	Qdrant
Want SQL interface	pgvector
Are prototyping	Chroma
Need multi-tenancy	Pinecone, Weaviate
Value simplicity over features	Pinecone, Chroma
Need 10M+ vectors	Pinecone, Qdrant
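If several rows of the matrix apply to you, one simple way to combine them is to count how often each database is suggested. The sketch below encodes the table as data and tallies the votes. The requirement keys and the `shortlist` function are names invented for this example, and reading "Self-hosted options" as the four open-source databases is our assumption.

```typescript
type Requirement =
  | 'zeroInfrastructure'
  | 'openSource'
  | 'limitedBudget'
  | 'complexFilters'
  | 'sqlInterface'
  | 'prototyping'
  | 'multiTenancy'
  | 'simplicity'
  | 'tenMillionPlus'

// Assumption: "Self-hosted options" means the four open-source databases.
const selfHosted = ['Qdrant', 'Weaviate', 'Chroma', 'pgvector']

// One entry per row of the decision matrix.
const decisionMatrix: Record<Requirement, string[]> = {
  zeroInfrastructure: ['Pinecone'],
  openSource: selfHosted,
  limitedBudget: selfHosted,
  complexFilters: ['Qdrant'],
  sqlInterface: ['pgvector'],
  prototyping: ['Chroma'],
  multiTenancy: ['Pinecone', 'Weaviate'],
  simplicity: ['Pinecone', 'Chroma'],
  tenMillionPlus: ['Pinecone', 'Qdrant'],
}

// Counts how many of the given requirements point at each database,
// most-suggested first.
function shortlist(requirements: Requirement[]): Array<[string, number]> {
  const votes: Record<string, number> = {}
  for (const req of requirements) {
    for (const db of decisionMatrix[req]) {
      votes[db] = (votes[db] ?? 0) + 1
    }
  }
  return Object.entries(votes).sort((a, b) => b[1] - a[1])
}

console.log(shortlist(['openSource', 'complexFilters', 'tenMillionPlus']))
```

Treat the tally as a conversation starter. A single hard requirement, such as "must be self-hosted", can outweigh any number of soft preferences.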

4.5 Migration Considerations

Switching Costs

Moving between vector databases involves:

Re-indexing: You can transfer vector data, but indexes must be rebuilt in the new system
Code changes: Different APIs require code updates
Feature parity: Some features may not exist in the new system
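In practice, the transfer step usually means reading records from the old system and writing them to the new one in batches. Here is a minimal sketch of that loop. `VectorRecord`, `toBatches`, and `migrate` are hypothetical names, and the write callback is synchronous to keep the example short, whereas a real client call would be asynchronous.

```typescript
interface VectorRecord {
  id: string
  vector: number[]
  metadata: Record<string, unknown>
}

// Splits a list into consecutive chunks of at most `size` items.
function toBatches<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new Error('batch size must be a positive integer')
  }
  const batches: T[][] = []
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size))
  }
  return batches
}

// Sends every record to the destination one batch at a time and
// returns the number of batches written.
function migrate(
  records: VectorRecord[],
  writeBatch: (batch: VectorRecord[]) => void,
  batchSize = 100
): number {
  const batches = toBatches(records, batchSize)
  for (const batch of batches) {
    writeBatch(batch)
  }
  return batches.length
}

// Stand-in destination: an array instead of a real database.
const destination: VectorRecord[] = []
const source: VectorRecord[] = [
  { id: 'doc-1', vector: [0.1, 0.2], metadata: { category: 'tech' } },
  { id: 'doc-2', vector: [0.3, 0.4], metadata: { category: 'news' } },
  { id: 'doc-3', vector: [0.5, 0.6], metadata: { category: 'tech' } },
]
const written = migrate(source, (batch) => destination.push(...batch), 2)
console.log(written, destination.length)
```

The destination rebuilds its index as the batches arrive, so expect the copy to be the quick part and indexing to dominate the migration time.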

Portability Tips

Abstract your vector DB client

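// Assumed shape of one search hit; adjust to what your application needs
interface Result { id: string; score: number; metadata: object }

interface VectorStore {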
  upsert(id: string, vector: number[], metadata: object): Promise<void>
  query(vector: number[], topK: number, filter?: object): Promise<Result[]>
  delete(id: string): Promise<void>
}
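One payoff of this abstraction is that you can write a tiny in-memory implementation for unit tests and swap in a real database later. The sketch below is one way to do it, using brute-force cosine similarity and exact-match metadata filters. It redeclares the interface so the block stands alone, narrows `object` to `Record<string, unknown>` so fields can be read, and assumes a `Result` shape of id, score, and metadata.

```typescript
// Assumed result shape for this sketch.
interface Result {
  id: string
  score: number
  metadata: Record<string, unknown>
}

interface VectorStore {
  upsert(id: string, vector: number[], metadata: Record<string, unknown>): Promise<void>
  query(vector: number[], topK: number, filter?: Record<string, unknown>): Promise<Result[]>
  delete(id: string): Promise<void>
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('dimension mismatch')
  let dot = 0
  let normA = 0
  let normB = 0
  a.forEach((x, i) => {
    const y = b[i] ?? 0
    dot += x * y
    normA += x * x
    normB += y * y
  })
  return dot / Math.sqrt(normA * normB)
}

// Test double: scans every stored vector, so it only suits small datasets.
class InMemoryVectorStore implements VectorStore {
  private items = new Map<string, { vector: number[]; metadata: Record<string, unknown> }>()

  async upsert(id: string, vector: number[], metadata: Record<string, unknown>): Promise<void> {
    this.items.set(id, { vector, metadata })
  }

  async query(vector: number[], topK: number, filter: Record<string, unknown> = {}): Promise<Result[]> {
    const hits: Result[] = []
    this.items.forEach((item, id) => {
      // Keep the item only if every filter field matches exactly.
      const matches = Object.entries(filter).every(([key, value]) => item.metadata[key] === value)
      if (matches) {
        hits.push({ id, score: cosineSimilarity(vector, item.vector), metadata: item.metadata })
      }
    })
    // Highest similarity first, then cut to topK.
    return hits.sort((a, b) => b.score - a.score).slice(0, topK)
  }

  async delete(id: string): Promise<void> {
    this.items.delete(id)
  }
}
```

Application code that depends only on `VectorStore` can run against this class in tests and against a Pinecone, Qdrant, or pgvector adapter in production.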

Store embeddings separately

- Keep raw embeddings in blob storage
- Allows re-indexing without re-computing

Use consistent metadata schema

- Same field names across databases
- Simplifies migration scripts
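For the blob storage tip, a compact option is to store each embedding as raw 32-bit floats, four bytes per dimension. The sketch below shows one possible encoding. The function names are made up, the bytes use the platform's native byte order (little-endian on virtually all current hardware), and values are rounded to float32 precision on the way in.

```typescript
// Packs an embedding into raw float32 bytes (4 bytes per dimension).
function encodeEmbedding(vector: number[]): Uint8Array {
  const floats = Float32Array.from(vector)
  return new Uint8Array(floats.buffer)
}

// Reverses encodeEmbedding. The bytes are copied first so the
// Float32Array view always starts at an aligned offset.
function decodeEmbedding(bytes: Uint8Array): number[] {
  if (bytes.byteLength % 4 !== 0) {
    throw new Error('byte length must be a multiple of 4')
  }
  const copy = bytes.slice()
  return Array.from(new Float32Array(copy.buffer))
}

const packed = encodeEmbedding([0.5, -1.25, 3])
console.log(packed.byteLength, decodeEmbedding(packed))
```

A 1536-dimension embedding takes 6,144 bytes this way, and you can upload the bytes to any object store under a key derived from the document ID.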

Key Takeaways

No single best choice—it depends on your requirements
Managed vs. self-hosted is often the biggest decision
Start simple with Chroma or pgvector, scale up as needed
Abstract your integration to enable future migration
Evaluate on your data—benchmarks vary by use case
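One common way to evaluate on your own data is recall@k: run the same query through an exact (brute-force) search and through the database's approximate index, then measure how many of the true top-k results the index returned. This module does not prescribe a metric, so treat the sketch below as one reasonable option. `recallAtK` is a name chosen for this example.

```typescript
// Fraction of the true top-k IDs that also appear in the approximate top-k.
function recallAtK(exactIds: string[], approxIds: string[], k: number): number {
  const truth = exactIds.slice(0, k)
  if (truth.length === 0) return 0
  const returned = approxIds.slice(0, k)
  const found = truth.filter((id) => returned.indexOf(id) !== -1).length
  return found / truth.length
}

// Made-up IDs: the index found 2 of the 4 true nearest neighbours.
console.log(recallAtK(['a', 'b', 'c', 'd'], ['a', 'c', 'x', 'y'], 4))
```

Average this over a few hundred representative queries, and record latency alongside it, since index settings typically trade one for the other.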

Exercise: Evaluate for Your Use Case

Answer these questions about your project:

What's your expected data size (number of vectors)?
Do you need managed infrastructure or can you self-host?
What's your latency requirement?
Do you need complex metadata filtering?
What's your budget?
Are you already using PostgreSQL or MongoDB?

Based on your answers, which database would you choose?
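For the data size question, a back-of-the-envelope estimate helps: raw vector storage is roughly the number of vectors times the dimensions times the bytes per value. The sketch below assumes 4-byte floats and ignores index overhead, metadata, and replication, all of which add to the real footprint.

```typescript
// Raw storage for the vectors alone, assuming float32 values by default.
function rawVectorBytes(count: number, dimensions: number, bytesPerValue = 4): number {
  return count * dimensions * bytesPerValue
}

// Example: one million 1536-dimension embeddings.
const bytes = rawVectorBytes(1_000_000, 1536)
console.log((bytes / 1e9).toFixed(2) + ' GB')
```

One million 1536-dimension vectors come to about 6.14 GB before any overhead, which is a useful reference point when you compare that against the instance sizes or pricing tiers you are considering.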

Next up: Module 5 - Setting Up Pinecone
