Module 14: Choosing the Right Database
A Framework for Making the Decision
Introduction
After learning about all these options, how do you actually choose? This module provides a practical framework for making the decision.
By the end of this module, you'll have:
- A decision framework you can apply
- Clear criteria for evaluation
- Migration considerations
- A path forward
14.1 The Decision Framework
Step 1: Define Your Requirements
Start by answering these questions:
interface Requirements {
  // Scale
  currentVectorCount: number
  projectedGrowth: '10x' | '100x' | 'unknown'

  // Performance
  latencyTarget: number      // p99 in ms
  throughputTarget: number   // QPS

  // Features
  needsHybridSearch: boolean
  needsComplexFilters: boolean
  needsMultiTenancy: boolean

  // Operations
  teamSize: number
  opsExpertise: 'none' | 'some' | 'extensive'
  existingInfra: 'postgres' | 'mongodb' | 'none'

  // Constraints
  budget: number             // Monthly
  complianceRequirements: string[]
  dataResidency: string[]    // Regions
}
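To make the template concrete, here is a minimal sketch of a filled-out requirements object. The interface is repeated only so the block runs on its own. Every value belongs to a made-up project, and `openQuestions` is a hypothetical helper (not part of any library) that lists the fields you still need to answer properly.

```typescript
// Same shape as the Requirements interface above, repeated so this block is self-contained.
interface Requirements {
  currentVectorCount: number
  projectedGrowth: '10x' | '100x' | 'unknown'
  latencyTarget: number      // p99 in ms
  throughputTarget: number   // QPS
  needsHybridSearch: boolean
  needsComplexFilters: boolean
  needsMultiTenancy: boolean
  teamSize: number
  opsExpertise: 'none' | 'some' | 'extensive'
  existingInfra: 'postgres' | 'mongodb' | 'none'
  budget: number             // Monthly
  complianceRequirements: string[]
  dataResidency: string[]    // Regions
}

// Hypothetical project: all values are illustrative, not recommendations.
const exampleRequirements: Requirements = {
  currentVectorCount: 250_000,
  projectedGrowth: '10x',
  latencyTarget: 200,
  throughputTarget: 50,
  needsHybridSearch: true,
  needsComplexFilters: false,
  needsMultiTenancy: false,
  teamSize: 4,
  opsExpertise: 'some',
  existingInfra: 'postgres',
  budget: 300,
  complianceRequirements: [],
  dataResidency: ['eu-only'],
}

// Hypothetical helper: names the fields that still hold a placeholder answer.
function openQuestions(req: Requirements): string[] {
  const missing: string[] = []
  if (req.projectedGrowth === 'unknown') missing.push('projectedGrowth')
  if (req.latencyTarget <= 0) missing.push('latencyTarget')
  if (req.throughputTarget <= 0) missing.push('throughputTarget')
  if (req.budget <= 0) missing.push('budget')
  return missing
}
```

If `openQuestions` returns anything, it is usually worth resolving those fields before moving on, since the elimination step depends on them.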
Step 2: Eliminate Incompatible Options
Based on requirements, remove options that don't fit:
function filterOptions(req: Requirements): string[] {
  let options = [
    'pinecone', 'qdrant', 'weaviate',
    'chroma', 'pgvector', 'mongodb-atlas'
  ]

  // Data residency
  if (req.dataResidency.includes('eu-only')) {
    // Keep providers that offer EU regions (verify against current provider docs)
    options = options.filter(o =>
      ['pinecone', 'qdrant', 'weaviate', 'pgvector'].includes(o)
    )
  }

  // Scale
  if (req.currentVectorCount > 10_000_000) {
    options = options.filter(o =>
      ['pinecone', 'qdrant', 'weaviate'].includes(o)
    )
  }

  // Budget
  if (req.budget < 100) {
    options = options.filter(o =>
      ['chroma', 'pgvector'].includes(o)
    )
  }

  // Existing infrastructure: prioritize pgvector, but only if it survived
  // the filters above (a plain unshift would duplicate or re-add it)
  if (req.existingInfra === 'postgres' && options.includes('pgvector')) {
    options = ['pgvector', ...options.filter(o => o !== 'pgvector')]
  }

  return options
}
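When an option disappears, it helps to know which requirement removed it. The sketch below is one possible variant of the elimination step that records a reason per removed option. It reuses the thresholds and provider lists from the function above; the names `EliminationInput`, `EliminationRule`, and `eliminate` are hypothetical.

```typescript
// Only the fields the elimination rules read (a subset of Requirements).
interface EliminationInput {
  dataResidency: string[]
  currentVectorCount: number
  budget: number
}

interface EliminationRule {
  reason: string
  applies: (req: EliminationInput) => boolean
  keep: string[] // options that survive when the rule applies
}

// Same thresholds and lists as filterOptions above.
const eliminationRules: EliminationRule[] = [
  {
    reason: 'EU-only data residency',
    applies: req => req.dataResidency.includes('eu-only'),
    keep: ['pinecone', 'qdrant', 'weaviate', 'pgvector'],
  },
  {
    reason: 'more than 10M vectors',
    applies: req => req.currentVectorCount > 10_000_000,
    keep: ['pinecone', 'qdrant', 'weaviate'],
  },
  {
    reason: 'monthly budget under 100',
    applies: req => req.budget < 100,
    keep: ['chroma', 'pgvector'],
  },
]

function eliminate(req: EliminationInput): {
  remaining: string[]
  removed: Record<string, string>
} {
  let remaining = ['pinecone', 'qdrant', 'weaviate', 'chroma', 'pgvector', 'mongodb-atlas']
  const removed: Record<string, string> = {}

  for (const rule of eliminationRules) {
    if (!rule.applies(req)) continue
    // Record the first rule that removes each option, then drop it.
    for (const option of remaining) {
      if (!rule.keep.includes(option)) removed[option] = rule.reason
    }
    remaining = remaining.filter(option => rule.keep.includes(option))
  }

  return { remaining, removed }
}
```

An empty `remaining` list is a useful signal in itself: with these rules, more than 10M vectors on a budget under 100 per month leaves nothing, which means the requirements conflict and one of them has to give.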
Step 3: Score Remaining Options
interface ScoredOption {
  name: string
  scores: {
    fit: number          // How well it matches requirements
    cost: number         // Lower is better
    risk: number         // Lower is better
    productivity: number // Higher is better
  }
  total: number
}

function scoreOption(
  option: string,
  req: Requirements
): ScoredOption {
  const scores = {
    fit: 0,
    cost: 0,
    risk: 0,
    productivity: 0
  }

  // Score based on fit (1-10)
  // ... scoring logic ...

  // Weighted total (weights sum to 1.0); cost and risk are
  // inverted because lower is better for both
  const total = scores.fit * 0.4 +
    (10 - scores.cost) * 0.25 +
    (10 - scores.risk) * 0.20 +
    scores.productivity * 0.15

  return { name: option, scores, total }
}
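A short worked example shows how the weights behave. The sketch below applies the same formula as `scoreOption` to two hypothetical candidates. The scores are invented for illustration and say nothing about any real product; `weightedTotal` and `rankCandidates` are hypothetical helper names.

```typescript
interface DimensionScores {
  fit: number          // 1-10, higher is better
  cost: number         // 1-10, lower is better
  risk: number         // 1-10, lower is better
  productivity: number // 1-10, higher is better
}

// Same weights as scoreOption above: 0.4 / 0.25 / 0.20 / 0.15.
function weightedTotal(s: DimensionScores): number {
  return s.fit * 0.4 +
    (10 - s.cost) * 0.25 +
    (10 - s.risk) * 0.20 +
    s.productivity * 0.15
}

// Made-up scores for two unnamed candidates.
const sampleCandidates: Record<string, DimensionScores> = {
  'option-a': { fit: 8, cost: 6, risk: 3, productivity: 7 },
  'option-b': { fit: 6, cost: 2, risk: 5, productivity: 8 },
}

// Returns candidates sorted by total, best first.
function rankCandidates(
  candidates: Record<string, DimensionScores>
): Array<{ option: string; total: number }> {
  return Object.keys(candidates)
    .map(option => ({ option, total: weightedTotal(candidates[option]) }))
    .sort((x, y) => y.total - x.total)
}
```

Here option-a comes out at roughly 6.65 and option-b at roughly 6.60. A gap that small mostly reflects the choice of weights, so treat near-ties as a prompt to look closer rather than as a verdict.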
14.2 Decision Trees by Use Case
RAG Application
Building a RAG chatbot?
│
├─ Small scale (< 100K docs)?
│   ├─ Already using Postgres? → pgvector
│   └─ Want simplest setup? → Chroma
│
├─ Medium scale (100K - 1M)?
│   ├─ Need managed simplicity? → Pinecone
│   ├─ Want more control? → Qdrant Cloud
│   └─ Already have Postgres? → pgvector (upgrade instance)
│
└─ Large scale (> 1M)?
    ├─ Have ops expertise? → Self-hosted Qdrant/pgvector
    └─ Prefer managed? → Pinecone/Qdrant Cloud
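If you want the tree in a form you can run against your own numbers, it can be written as a small function. This is a sketch of one reading of the RAG tree: the tree does not say which branch wins when several apply, so the precedence below (existing Postgres first, then managed versus control) is an assumption, and `RagContext` and `recommendForRag` are hypothetical names.

```typescript
interface RagContext {
  docCount: number
  usesPostgres: boolean
  prefersManaged: boolean
  hasOpsExpertise: boolean
}

// Returns candidate options, not a final answer.
function recommendForRag(ctx: RagContext): string[] {
  // Small scale (< 100K docs)
  if (ctx.docCount < 100_000) {
    return ctx.usesPostgres ? ['pgvector'] : ['chroma']
  }

  // Medium scale (100K - 1M)
  if (ctx.docCount <= 1_000_000) {
    if (ctx.usesPostgres) return ['pgvector'] // may need a larger instance
    return ctx.prefersManaged ? ['pinecone'] : ['qdrant-cloud']
  }

  // Large scale (> 1M): self-host only with ops expertise and no
  // stated preference for a managed service
  if (ctx.hasOpsExpertise && !ctx.prefersManaged) {
    return ['qdrant (self-hosted)', 'pgvector (self-hosted)']
  }
  return ['pinecone', 'qdrant-cloud']
}
```

Encoding the tree this way forces you to decide the tie-breaks explicitly, which is often where team disagreements surface.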
Semantic Search
Building semantic search?
│
├─ Need hybrid search (semantic + keyword)?
│   ├─ Using Postgres? → pgvector + full-text search
│   ├─ Want managed? → Pinecone (sparse-dense)
│   └─ Complex filters? → Qdrant
│
├─ Pure semantic search?
│   ├─ Simple use case? → Any option works
│   └─ High performance? → Pinecone or Qdrant
│
└─ Multi-language?
    └─ Use appropriate embedding model first
       → Any vector DB works after
Recommendation System
Building recommendations?
│
├─ User-item recommendations?
│   ├─ Millions of items? → Pinecone, Qdrant
│   └─ Complex item attributes? → Qdrant (advanced filtering)
│
├─ Content-based?
│   └─ Any vector DB works
│
└─ Real-time updates?
    ├─ High write volume? → Qdrant
    └─ Moderate writes? → Any option
14.3 Quick Reference Guide
Choose Pinecone If:
- You want zero infrastructure management
- Reliability is critical
- You're willing to pay for simplicity
- Team lacks ops expertise
Choose Qdrant If:
- You need advanced filtering
- Performance is critical
- You want open-source with cloud option
- You might self-host later
Choose Weaviate If:
- You want built-in vectorization
- GraphQL is preferred
- You need multi-modal search (text + images)
- You need auto-schema
Choose Chroma If:
- You're prototyping
- You're developing locally
- You're running a small production deployment
- You want the simplest possible API
Choose pgvector If:
- Already using PostgreSQL
- Need ACID transactions
- Want familiar SQL
- Moderate scale (< 10M vectors)
Choose MongoDB Atlas If:
- Already using MongoDB
- Want unified database
- Document-centric data model
14.4 Migration Considerations
Planning for Change
Your first choice might not be your final choice:
// Abstract your vector store interface
interface VectorStore {
  upsert(docs: Document[]): Promise<void>
  query(vector: number[], options: QueryOptions): Promise<Result[]>
  delete(ids: string[]): Promise<void>
}

// Implement for each database
class PineconeStore implements VectorStore { ... }
class QdrantStore implements VectorStore { ... }
class ChromaStore implements VectorStore { ... }

// Easy to swap implementations
const store: VectorStore = new PineconeStore()
// Later: const store: VectorStore = new QdrantStore()
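One way to check that your abstraction holds up is to write an in-memory implementation and run your application tests against it. The sketch below is such a stand-in, not a client for any real database. The snippet above leaves the document, options, and result types undefined, so `VectorDocument`, `QueryOptions`, and `QueryResult` are assumed shapes (the name `Document` is avoided because it clashes with the browser's built-in type), and brute-force cosine similarity stands in for a real index.

```typescript
// Assumed shapes; adapt them to whatever your application already uses.
interface VectorDocument {
  id: string
  embedding: number[]
  metadata?: Record<string, unknown>
}
interface QueryOptions { topK: number }
interface QueryResult { id: string; score: number }

interface VectorStore {
  upsert(docs: VectorDocument[]): Promise<void>
  query(vector: number[], options: QueryOptions): Promise<QueryResult[]>
  delete(ids: string[]): Promise<void>
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('dimension mismatch')
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  if (normA === 0 || normB === 0) return 0
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// Brute-force store for tests and local development only.
class InMemoryStore implements VectorStore {
  private docs = new Map<string, VectorDocument>()

  async upsert(docs: VectorDocument[]): Promise<void> {
    for (const doc of docs) this.docs.set(doc.id, doc)
  }

  // Synchronous core: scores every stored document, best first.
  search(vector: number[], topK: number): QueryResult[] {
    const results: QueryResult[] = []
    this.docs.forEach(doc => {
      results.push({ id: doc.id, score: cosineSimilarity(vector, doc.embedding) })
    })
    return results.sort((x, y) => y.score - x.score).slice(0, topK)
  }

  async query(vector: number[], options: QueryOptions): Promise<QueryResult[]> {
    return this.search(vector, options.topK)
  }

  async delete(ids: string[]): Promise<void> {
    for (const id of ids) this.docs.delete(id)
  }
}
```

If application code only ever touches the `VectorStore` interface, swapping this stand-in for a real implementation should be a one-line change, which is the point of the abstraction.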
Migration Steps
- Export vectors and metadata from old system
- Set up new system with same schema
- Import data (with same embeddings)
- Run both systems in parallel
- Compare results on test queries
- Switch traffic gradually
- Decommission old system
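The comparison step is easier to act on with a number attached. A minimal sketch, assuming you have collected the top result IDs from both systems for the same test queries: `overlapAtK` measures what fraction of the old system's top-k results also appear in the new system's top-k. Both function names are hypothetical, and the acceptable threshold is yours to choose.

```typescript
// Fraction of the old system's top-k IDs that the new system also returns in its top-k.
function overlapAtK(oldIds: string[], newIds: string[], k: number): number {
  const oldTop = oldIds.slice(0, k)
  const newTop = new Set(newIds.slice(0, k))
  if (oldTop.length === 0) return newTop.size === 0 ? 1 : 0

  let shared = 0
  for (const id of oldTop) {
    if (newTop.has(id)) shared++
  }
  return shared / oldTop.length
}

// Averages overlap across a set of test queries.
function meanOverlapAtK(
  queryResults: Array<{ oldIds: string[]; newIds: string[] }>,
  k: number
): number {
  if (queryResults.length === 0) throw new Error('no test queries')
  let sum = 0
  for (const pair of queryResults) {
    sum += overlapAtK(pair.oldIds, pair.newIds, k)
  }
  return sum / queryResults.length
}
```

Approximate nearest-neighbor indexes can legitimately return slightly different neighbors, so an overlap a little below 1.0 is not necessarily a bug. A sharp drop, though, usually points to a mismatch in distance metric, filters, or imported data.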
Data Portability
Keep your embeddings portable:
// Store embeddings in your own storage
interface StoredDocument {
  id: string
  content: string
  embedding: number[]    // Keep a copy!
  metadata: object
  embeddingModel: string // Track which model
  embeddedAt: Date
}

// If you switch databases, you have the embeddings
// Only need to re-embed if changing models
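Tracking `embeddingModel` pays off at migration time. The sketch below splits stored documents into those whose embeddings can be imported as-is and those that need re-embedding for a target model. `splitByModel` is a hypothetical helper, and the model names in the usage note are placeholders.

```typescript
// Same shape as StoredDocument above, repeated so this block is self-contained.
interface StoredDocument {
  id: string
  content: string
  embedding: number[]
  metadata: object
  embeddingModel: string
  embeddedAt: Date
}

// Separates documents that can be imported directly from those that need re-embedding.
function splitByModel(
  docs: StoredDocument[],
  targetModel: string
): { reuse: StoredDocument[]; reembed: StoredDocument[] } {
  const reuse: StoredDocument[] = []
  const reembed: StoredDocument[] = []
  for (const doc of docs) {
    if (doc.embeddingModel === targetModel) {
      reuse.push(doc)
    } else {
      reembed.push(doc)
    }
  }
  return { reuse, reembed }
}
```

For example, `splitByModel(allDocs, 'model-a')` would put every document embedded with a different model into `reembed`. Vectors from different models are not comparable, so mixing them in one index tends to produce misleading similarity scores.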
14.5 Red Flags to Watch For
During Evaluation
- No clear pricing: Hidden costs ahead
- No export mechanism: Vendor lock-in
- Limited filtering: Will hit walls later
- Single region only: Latency problems for global apps
- No uptime SLA: Risky for production
In Production
- Latency creeping up: Time to optimize or scale
- Costs growing faster than data: Inefficient queries
- Frequent timeouts: Infrastructure problems
- Missing features blocking progress: Wrong choice
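"Latency creeping up" is easier to catch if you compare p99 against both your target and an earlier baseline. The sketch below is one way to do that with raw latency samples in milliseconds. It uses the nearest-rank percentile definition, and the default 20% allowed growth is an arbitrary assumption; `percentile` and `checkLatency` are hypothetical helpers, not part of any monitoring library.

```typescript
// Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
function percentile(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) throw new Error('no samples')
  const sorted = samplesMs.slice().sort((a, b) => a - b)
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100))
  return sorted[rank - 1]
}

interface LatencyCheck {
  p99: number
  overTarget: boolean // current p99 exceeds the latency target
  creeping: boolean   // current p99 exceeds baseline p99 by more than the allowed growth
}

function checkLatency(
  baselineMs: number[],
  currentMs: number[],
  targetMs: number,
  allowedGrowth: number = 1.2 // assumption: 20% above baseline counts as creeping
): LatencyCheck {
  const baselineP99 = percentile(baselineMs, 99)
  const p99 = percentile(currentMs, 99)
  return {
    p99,
    overTarget: p99 > targetMs,
    creeping: p99 > baselineP99 * allowedGrowth,
  }
}
```

A result that is `creeping` but not yet `overTarget` is the comfortable moment to optimize or scale, before users notice.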
14.6 The Final Checklist
Before committing, verify:
## Technical Fit
- [ ] Supports required vector dimensions
- [ ] Meets latency requirements
- [ ] Handles projected scale
- [ ] Has needed filtering capabilities
- [ ] Supports hybrid search (if needed)
## Operational Fit
- [ ] Team can operate it
- [ ] Monitoring available
- [ ] Backup/restore works
- [ ] Upgrade path clear
## Business Fit
- [ ] Within budget (including growth)
- [ ] Meets compliance requirements
- [ ] Acceptable vendor risk
- [ ] Support available when needed
## Strategic Fit
- [ ] Aligns with tech stack direction
- [ ] Migration path exists
- [ ] Skills transferable
14.7 Recommendation Summary
For Most Projects
Start with Pinecone if:
- You're building a product (not researching)
- Time to market matters
- Budget is reasonable
Start with pgvector if:
- Already using PostgreSQL
- Want to minimize new services
- Moderate scale expected
Start with Chroma if:
- Local development focus
- Experimenting with approaches
- Cost is primary concern
Then Evaluate
After 3-6 months:
- Are costs reasonable?
- Is performance acceptable?
- Are you free of blocking limitations?
If the answer to all three is yes, stay. If not, consider alternatives.
Key Takeaways
- No universally best choice—it depends on your context
- Start with requirements, not features
- Plan for migration—abstract your interface
- Monitor in production—adjust as needed
- Re-evaluate periodically—the landscape changes
Exercise: Make Your Decision
For your project:
- Fill out the requirements template
- Apply the elimination criteria
- Score remaining options
- Make a preliminary choice
- List what would make you reconsider
- Create an abstraction layer for portability
Document your decision and reasoning for future reference.
Next up: Module 15 - Integration with LangChain & AI SDK

