Module 4: The Vector Database Landscape
Comparing Pinecone, Weaviate, Chroma, pgvector, and Qdrant
Introduction
The vector database market has exploded, and choosing among the options can be overwhelming. This module compares the main ones candidly, weaknesses included, so you can make an informed decision.
By the end of this module, you'll understand:
- The major players and their positioning
- Key differentiators between options
- When to choose each database
- The trade-offs you'll face
4.1 The Major Players
Purpose-Built Vector Databases
Pinecone
- Fully managed, serverless
- Focus: Simplicity and production reliability
- Pricing: Pay per usage
Qdrant
- Open source with managed cloud option
- Focus: Performance and advanced features
- Pricing: Self-hosted free, cloud pay-per-use
Weaviate
- Open source with managed cloud
- Focus: ML integration and GraphQL interface
- Pricing: Self-hosted free, cloud tiered
Chroma
- Open source, embedded-first
- Focus: Developer experience and simplicity
- Pricing: Free (self-hosted only currently)
Database Extensions
pgvector (PostgreSQL)
- Extension for PostgreSQL
- Focus: Integration with existing Postgres
- Pricing: Free (part of your Postgres costs)
Atlas Vector Search (MongoDB)
- Native MongoDB feature
- Focus: Existing MongoDB users
- Pricing: Included with Atlas
4.2 Feature Comparison
| Feature | Pinecone | Qdrant | Weaviate | Chroma | pgvector |
|---|---|---|---|---|---|
| Hosting | Managed only | Both | Both | Self-hosted | Self-hosted or managed Postgres |
| Open Source | No | Yes | Yes | Yes | Yes |
| Max Dimensions | 20,000 | Unlimited | Unlimited | Unlimited | 2,000 (indexed) |
| Metadata Filtering | Yes | Advanced | Yes | Yes | SQL |
| Hybrid Search | Yes | Yes | Yes | Limited | With extensions |
| Multi-tenancy | Namespaces | Collections | Tenants | Collections | Schemas |
| Replication | Auto | Manual | Auto | N/A | Postgres |
| API Style | REST/gRPC | REST/gRPC | GraphQL/REST | Python/JS | SQL |
Index Types
| Database | Supported Indexes |
|---|---|
| Pinecone | Proprietary (HNSW-based) |
| Qdrant | HNSW |
| Weaviate | HNSW |
| Chroma | HNSW |
| pgvector | IVFFlat, HNSW |
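HNSW and IVFFlat are approximate nearest-neighbor indexes: they trade a little recall for a lot of speed. One way to see what that trade costs on your own data is to compare an index's results with an exact scan over a small sample. The sketch below is a minimal illustration of that idea, not part of any database client; `exactTopK`, `recallAtK`, and the Euclidean metric are assumptions chosen for the example.

```typescript
// Hypothetical helpers for measuring recall against an exact scan.
type Doc = { id: string; vector: number[] }
type Distance = (a: number[], b: number[]) => number

// Euclidean (L2) distance, used here only as an example metric.
const euclidean: Distance = (a, b) => {
  if (a.length !== b.length) throw new Error('dimension mismatch')
  let sum = 0
  for (let i = 0; i < a.length; i++) {
    const diff = a[i] - b[i]
    sum += diff * diff
  }
  return Math.sqrt(sum)
}

// Exact search: score every document, sort by ascending distance, keep k ids.
function exactTopK(query: number[], docs: Doc[], k: number, distance: Distance): string[] {
  return docs
    .map((doc) => ({ id: doc.id, d: distance(query, doc.vector) }))
    .sort((x, y) => x.d - y.d)
    .slice(0, k)
    .map((hit) => hit.id)
}

// recall@k: fraction of the exact top-k ids that the index also returned.
function recallAtK(exactIds: string[], approxIds: string[]): number {
  if (exactIds.length === 0) return 1
  const hits = exactIds.filter((id) => approxIds.indexOf(id) !== -1).length
  return hits / exactIds.length
}
```

In practice you would pass the ids returned by the database as `approxIds` and run the exact scan over a sample small enough to fit in memory.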
4.3 Deep Dive: Each Database
Pinecone
Strengths:
- Zero infrastructure management
- Excellent developer experience
- Strong reliability and uptime
- Simple, clean API
- Good documentation
Weaknesses:
- No self-hosted option
- Vendor lock-in
- Can be expensive at scale
- Limited customization
Best for:
- Teams that want managed simplicity
- Production applications needing reliability
- Startups moving fast
```typescript
import { Pinecone } from '@pinecone-database/pinecone'

// With no arguments, the client reads the API key from the
// PINECONE_API_KEY environment variable
const pinecone = new Pinecone()
const index = pinecone.index('my-index')

// Insert
await index.upsert([{
  id: 'doc-1',
  values: embedding,
  metadata: { category: 'tech' }
}])

// Query
const results = await index.query({
  vector: queryEmbedding,
  topK: 10,
  filter: { category: 'tech' }
})
```
Qdrant
Strengths:
- Excellent performance
- Rich filtering capabilities
- Payload (metadata) search
- Active development
- Written in Rust
Weaknesses:
- Smaller community than some alternatives
- Cloud offering relatively new
- Learning curve for advanced features
Best for:
- Performance-critical applications
- Complex filtering requirements
- Teams comfortable with self-hosting
```typescript
import { QdrantClient } from '@qdrant/js-client-rest'

const client = new QdrantClient({ url: 'http://localhost:6333' })

// Insert
// Qdrant point IDs must be unsigned integers or UUIDs, so the
// application-level ID ('doc-1') is kept in the payload instead
await client.upsert('my-collection', {
  points: [{
    id: 1,
    vector: embedding,
    payload: { doc_id: 'doc-1', category: 'tech' }
  }]
})

// Query
const results = await client.search('my-collection', {
  vector: queryEmbedding,
  limit: 10,
  filter: {
    must: [{ key: 'category', match: { value: 'tech' } }]
  }
})
```
Weaviate
Strengths:
- Built-in vectorization (optional)
- GraphQL interface
- Module ecosystem
- Strong ML integration
- Multi-modal support
Weaknesses:
- GraphQL can be complex
- Resource-heavy
- Steeper learning curve
Best for:
- Teams using GraphQL
- Applications needing built-in ML
- Multi-modal search (text, images)
```graphql
# Weaviate GraphQL query
# (valueText replaces the deprecated valueString in recent Weaviate versions)
{
  Get {
    Document(
      nearVector: {
        vector: [0.1, 0.2, ...]
      }
      where: {
        path: ["category"]
        operator: Equal
        valueText: "tech"
      }
      limit: 10
    ) {
      title
      content
      _additional { certainty }
    }
  }
}
```
Chroma
Strengths:
- Incredibly simple to start
- Great for local development
- Python and JavaScript clients
- Embedded mode (no server)
- Active community
Weaknesses:
- Limited production features
- No managed cloud (yet)
- Basic compared to alternatives
- Scaling limitations
Best for:
- Local development and prototyping
- Small to medium datasets
- Learning vector databases
- Simple use cases
```typescript
import { ChromaClient } from 'chromadb'

// The JavaScript client talks to a running Chroma server
// (http://localhost:8000 by default); embedded mode is Python-only
const client = new ChromaClient()
const collection = await client.getOrCreateCollection({
  name: 'my-collection'
})

// Insert
await collection.add({
  ids: ['doc-1'],
  embeddings: [embedding],
  metadatas: [{ category: 'tech' }]
})

// Query
const results = await collection.query({
  queryEmbeddings: [queryEmbedding],
  nResults: 10,
  where: { category: 'tech' }
})
```
pgvector
Strengths:
- Use existing PostgreSQL skills
- Single database for everything
- ACID transactions
- Familiar SQL interface
- Rich ecosystem
Weaknesses:
- Not optimized purely for vectors
- Scaling requires Postgres expertise
- Indexes (HNSW, IVFFlat) limited to 2,000 dimensions for the `vector` type
- Performance ceiling at scale
Best for:
- Existing PostgreSQL users
- Applications needing transactions
- Moderate scale (< 10M vectors)
- Hybrid SQL + vector queries
```sql
-- Enable the extension (once per database)
CREATE EXTENSION IF NOT EXISTS vector;

-- Create table with vector column
CREATE TABLE documents (
  id SERIAL PRIMARY KEY,
  content TEXT,
  embedding vector(1536),
  category TEXT
);

-- Create index
CREATE INDEX ON documents
  USING hnsw (embedding vector_cosine_ops);

-- Query ($1 is the query embedding, bound as a parameter)
SELECT id, content,
  1 - (embedding <=> $1) AS similarity
FROM documents
WHERE category = 'tech'
ORDER BY embedding <=> $1
LIMIT 10;
```
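The query above relies on `<=>` returning cosine distance, which is why similarity is computed as `1 - distance`. pgvector also provides `<->` (Euclidean distance) and `<#>` (negative inner product), and the operator used in `ORDER BY` has to match the index's operator class (`vector_cosine_ops`, `vector_l2_ops`, or `vector_ip_ops`) for the index to be used. The sketch below writes out what each operator computes in plain TypeScript so the numbers are easier to interpret; the function names are made up for this example.

```typescript
// Plain-TypeScript equivalents of pgvector's distance operators.
// Function names are illustrative, not part of any library.
function dot(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('dimension mismatch')
  let sum = 0
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i]
  return sum
}

// <->  Euclidean (L2) distance
function l2Distance(a: number[], b: number[]): number {
  const diff = a.map((x, i) => x - b[i])
  return Math.sqrt(dot(diff, diff))
}

// <#>  negative inner product (negated so that smaller still means closer)
function negativeInnerProduct(a: number[], b: number[]): number {
  return -dot(a, b)
}

// <=>  cosine distance = 1 - cosine similarity (NaN for a zero vector)
function cosineDistance(a: number[], b: number[]): number {
  return 1 - dot(a, b) / (Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b)))
}

// The "similarity" column in the SQL query: 1 - cosine distance
function similarity(a: number[], b: number[]): number {
  return 1 - cosineDistance(a, b)
}
```

Cosine distance ranges from 0 (same direction) to 2 (opposite direction), so the `similarity` column ranges from 1 down to -1 rather than from 1 to 0.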
4.4 Decision Framework
Quick Decision Guide
Need managed, production-ready, minimal ops?
→ Pinecone
Need maximum performance, complex filtering?
→ Qdrant
Need GraphQL, built-in ML, multi-modal?
→ Weaviate
Need simple local dev, prototyping?
→ Chroma
Already using PostgreSQL, moderate scale?
→ pgvector
Detailed Decision Matrix
| If you... | Consider |
|---|---|
| Want zero infrastructure | Pinecone |
| Need open source | Qdrant, Weaviate, Chroma, pgvector |
| Have limited budget | Self-hosted options |
| Need complex metadata filters | Qdrant |
| Want SQL interface | pgvector |
| Are prototyping | Chroma |
| Need multi-tenancy | Pinecone, Weaviate |
| Value simplicity over features | Pinecone, Chroma |
| Need 10M+ vectors | Pinecone, Qdrant |
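When several rows of the matrix apply at once, it can help to tally which databases appear most often. The following is a minimal sketch that encodes the table as data and counts matches; the requirement keys and the `rankCandidates` function are names invented for this example, and the "limited budget" row is mapped to the four open-source databases from the feature table.

```typescript
type Database = 'Pinecone' | 'Qdrant' | 'Weaviate' | 'Chroma' | 'pgvector'

// Hypothetical keys, one per row of the decision matrix.
type Requirement =
  | 'zero-infrastructure'
  | 'open-source'
  | 'limited-budget'
  | 'complex-metadata-filters'
  | 'sql-interface'
  | 'prototyping'
  | 'multi-tenancy'
  | 'simplicity'
  | 'ten-million-plus-vectors'

const matrix: Record<Requirement, Database[]> = {
  'zero-infrastructure': ['Pinecone'],
  'open-source': ['Qdrant', 'Weaviate', 'Chroma', 'pgvector'],
  // "Self-hosted options" in the table, read here as the open-source databases
  'limited-budget': ['Qdrant', 'Weaviate', 'Chroma', 'pgvector'],
  'complex-metadata-filters': ['Qdrant'],
  'sql-interface': ['pgvector'],
  'prototyping': ['Chroma'],
  'multi-tenancy': ['Pinecone', 'Weaviate'],
  'simplicity': ['Pinecone', 'Chroma'],
  'ten-million-plus-vectors': ['Pinecone', 'Qdrant'],
}

const allDatabases: Database[] = ['Pinecone', 'Qdrant', 'Weaviate', 'Chroma', 'pgvector']

// Count how many of the given requirements each database satisfies,
// drop databases that satisfy none, and sort by match count (descending).
function rankCandidates(requirements: Requirement[]): { database: Database; matches: number }[] {
  return allDatabases
    .map((database) => ({
      database,
      matches: requirements.filter((r) => matrix[r].indexOf(database) !== -1).length,
    }))
    .filter((entry) => entry.matches > 0)
    .sort((a, b) => b.matches - a.matches)
}
```

A tally like this is a starting point rather than a verdict. A hard requirement such as "must be open source" should eliminate candidates outright instead of just adding one point.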
4.5 Migration Considerations
Switching Costs
Moving between vector databases involves:
- Re-indexing: You can transfer the raw vectors, but the target database has to rebuild its indexes from them
- Code changes: Different APIs require code updates
- Feature parity: Some features may not exist in the new system
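The "code changes" cost shows up even in something as small as a metadata filter. As a rough illustration, the sketch below takes one database-neutral equality filter and produces the filter shapes used in the Pinecone, Qdrant, and Chroma examples from section 4.3. The `EqualsFilter` type and the three function names are assumptions made for this example, and only the simple equality case is covered.

```typescript
// A database-neutral equality filter (hypothetical type for this sketch).
type EqualsFilter = { field: string; equals: string }

// Pinecone: plain object, as passed to `filter` in index.query()
function toPineconeFilter(f: EqualsFilter): Record<string, string> {
  return { [f.field]: f.equals }
}

// Qdrant: list of conditions under `must`, as passed to `filter` in client.search()
function toQdrantFilter(f: EqualsFilter): { must: { key: string; match: { value: string } }[] } {
  return { must: [{ key: f.field, match: { value: f.equals } }] }
}

// Chroma: plain object, as passed to `where` in collection.query()
function toChromaWhere(f: EqualsFilter): Record<string, string> {
  return { [f.field]: f.equals }
}
```

Keeping translations like these in one place means a migration touches a single module instead of every call site.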
Portability Tips
- Abstract your vector DB client
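```typescript
// Shape of a single search hit (illustrative; adapt the fields to your needs)
type Result = { id: string; score: number; metadata?: object }

interface VectorStore {
  upsert(id: string, vector: number[], metadata: object): Promise<void>
  query(vector: number[], topK: number, filter?: object): Promise<Result[]>
  delete(id: string): Promise<void>
}
```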
- Store embeddings separately
  - Keep raw embeddings in blob storage
  - Allows re-indexing without re-computing
- Use consistent metadata schema
  - Same field names across databases
  - Simplifies migration scripts
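Taken together, these tips make a migration mostly mechanical: read the stored embeddings back and upsert them through the abstraction into the new database. The sketch below shows that flow with an in-memory stand-in for a real adapter. It repeats the `VectorStore` interface so the block runs on its own; `InMemoryStore`, `StoredEmbedding`, `reindex`, and the shape of `Result` are assumptions for illustration, and the store scores by plain dot product and ignores filters.

```typescript
type Result = { id: string; score: number; metadata?: object }

interface VectorStore {
  upsert(id: string, vector: number[], metadata: object): Promise<void>
  query(vector: number[], topK: number, filter?: object): Promise<Result[]>
  delete(id: string): Promise<void>
}

// One record as it might be kept in blob storage (hypothetical shape).
type StoredEmbedding = { id: string; vector: number[]; metadata: object }

// In-memory stand-in for a real database adapter, for demonstration only.
class InMemoryStore implements VectorStore {
  private rows: StoredEmbedding[] = []

  async upsert(id: string, vector: number[], metadata: object): Promise<void> {
    // Replace any existing row with the same id, then append
    this.rows = this.rows.filter((row) => row.id !== id)
    this.rows.push({ id, vector, metadata })
  }

  async query(vector: number[], topK: number): Promise<Result[]> {
    // Dot-product scoring; the optional filter is ignored in this sketch
    return this.rows
      .map((row) => ({
        id: row.id,
        score: row.vector.reduce((sum, x, i) => sum + x * vector[i], 0),
        metadata: row.metadata,
      }))
      .sort((a, b) => b.score - a.score)
      .slice(0, topK)
  }

  async delete(id: string): Promise<void> {
    this.rows = this.rows.filter((row) => row.id !== id)
  }
}

// Re-index: push stored embeddings into any VectorStore without
// recomputing them. Returns the number of records written.
async function reindex(source: StoredEmbedding[], target: VectorStore): Promise<number> {
  for (const record of source) {
    await target.upsert(record.id, record.vector, record.metadata)
  }
  return source.length
}
```

A real adapter would wrap one of the clients from section 4.3 behind the same three methods, and `reindex` would stay unchanged.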
Key Takeaways
- No single best choice—it depends on your requirements
- Managed vs. self-hosted is often the biggest decision
- Start simple with Chroma or pgvector, scale up as needed
- Abstract your integration to enable future migration
- Evaluate on your data—benchmarks vary by use case
Exercise: Evaluate for Your Use Case
Answer these questions about your project:
- What's your expected data size (number of vectors)?
- Do you need managed infrastructure or can you self-host?
- What's your latency requirement?
- Do you need complex metadata filtering?
- What's your budget?
- Are you already using PostgreSQL or MongoDB?
Based on your answers, which database would you choose?
Next up: Module 5 - Setting Up Pinecone

