Module 14: Choosing the Right Database

A Framework for Making the Decision

Introduction

After learning about all these options, how do you actually choose? This module provides a practical framework for making the decision.

By the end of this module, you'll have:

A decision framework you can apply
Clear criteria for evaluation
Migration considerations
A path forward

14.1 The Decision Framework

Step 1: Define Your Requirements

Start by answering these questions:

interface Requirements {
  // Scale
  currentVectorCount: number
  projectedGrowth: '10x' | '100x' | 'unknown'

  // Performance
  latencyTarget: number  // p99 in ms
  throughputTarget: number  // QPS

  // Features
  needsHybridSearch: boolean
  needsComplexFilters: boolean
  needsMultiTenancy: boolean

  // Operations
  teamSize: number
  opsExpertise: 'none' | 'some' | 'extensive'
  existingInfra: 'postgres' | 'mongodb' | 'none'

  // Constraints
  budget: number  // Monthly
  complianceRequirements: string[]
  dataResidency: string[]  // Regions
}

Step 2: Eliminate Incompatible Options

Based on requirements, remove options that don't fit:

function filterOptions(req: Requirements): string[] {
  let options = [
    'pinecone', 'qdrant', 'weaviate',
    'chroma', 'pgvector', 'mongodb-atlas'
  ]

  // Data residency
  if (req.dataResidency.includes('eu-only')) {
    // Illustrative allow-list: verify current EU region support
    // with each provider before relying on it
    options = options.filter(o =>
      ['pinecone', 'qdrant', 'weaviate', 'pgvector'].includes(o)
    )
  }

  // Scale
  if (req.currentVectorCount > 10_000_000) {
    options = options.filter(o =>
      ['pinecone', 'qdrant', 'weaviate'].includes(o)
    )
  }

  // Budget
  if (req.budget < 100) {
    options = options.filter(o =>
      ['chroma', 'pgvector'].includes(o)
    )
  }

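  // Existing infrastructure: move pgvector to the front, but only if it
  // survived the filters above (never re-add an eliminated option)
  if (req.existingInfra === 'postgres' && options.includes('pgvector')) {
    options = ['pgvector', ...options.filter(o => o !== 'pgvector')]
  }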

  return options
}
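
One limitation of a chain of filters is that it does not tell you why an option disappeared. The sketch below is one possible variant that records a reason for each eliminated option. `EliminationInput`, `EliminationRule`, `EliminationResult`, and `eliminateWithReasons` are hypothetical names introduced for illustration, and the thresholds and allow-lists simply mirror the ones in `filterOptions`; treat them as placeholders, not verified vendor facts.

```typescript
// Subset of the Requirements fields that the rules below read.
interface EliminationInput {
  currentVectorCount: number
  budget: number  // Monthly
  dataResidency: string[]
}

interface EliminationRule {
  reason: string
  applies: (req: EliminationInput) => boolean
  keep: string[]  // options that survive when the rule applies
}

interface EliminationResult {
  remaining: string[]
  eliminated: Record<string, string>  // option -> reason it was removed
}

// Same illustrative thresholds as filterOptions.
const eliminationRules: EliminationRule[] = [
  {
    reason: 'EU-only data residency',
    applies: req => req.dataResidency.includes('eu-only'),
    keep: ['pinecone', 'qdrant', 'weaviate', 'pgvector']
  },
  {
    reason: 'more than 10M vectors',
    applies: req => req.currentVectorCount > 10_000_000,
    keep: ['pinecone', 'qdrant', 'weaviate']
  },
  {
    reason: 'monthly budget under 100',
    applies: req => req.budget < 100,
    keep: ['chroma', 'pgvector']
  }
]

function eliminateWithReasons(
  req: EliminationInput,
  candidates: string[]
): EliminationResult {
  const eliminated: Record<string, string> = {}
  let remaining = candidates.slice()

  for (const rule of eliminationRules) {
    if (!rule.applies(req)) continue
    // Record the first rule that removes each option
    for (const option of remaining) {
      if (!rule.keep.includes(option)) eliminated[option] = rule.reason
    }
    remaining = remaining.filter(option => rule.keep.includes(option))
  }

  return { remaining, eliminated }
}
```

An empty `remaining` list is a useful signal in itself: with these placeholder rules, a project with more than 10M vectors and a monthly budget under 100 eliminates every option, which means the requirements conflict and one of them has to give.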

Step 3: Score Remaining Options

interface ScoredOption {
  name: string
  scores: {
    fit: number        // How well it matches requirements
    cost: number       // Lower is better
    risk: number       // Lower is better
    productivity: number  // Higher is better
  }
  total: number
}

function scoreOption(
  option: string,
  req: Requirements
): ScoredOption {
  const scores = {
    fit: 0,
    cost: 0,
    risk: 0,
    productivity: 0
  }

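  // Assign each score on a 1-10 scale
  // ... scoring logic ...

  // Weighted total out of 10. Cost and risk are inverted so that
  // a higher total is always better. The weights sum to 1.0.
  const total = scores.fit * 0.4 +
                (10 - scores.cost) * 0.25 +
                (10 - scores.risk) * 0.20 +
                scores.productivity * 0.15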

  return { name: option, scores, total }
}
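
To see how the weights in `scoreOption` behave, it helps to work through the arithmetic once. The sketch below applies the same formula to two made-up score sets; `Scores`, `weightedTotal`, `managedOption`, and `selfHostedOption` are hypothetical names, and the numbers are illustrative, not measurements of any real product.

```typescript
interface Scores {
  fit: number           // 1-10, higher is better
  cost: number          // 1-10, lower is better
  risk: number          // 1-10, lower is better
  productivity: number  // 1-10, higher is better
}

// Same weighting as scoreOption: cost and risk are inverted
// so that every term rewards the better option.
function weightedTotal(s: Scores): number {
  return s.fit * 0.4 +
         (10 - s.cost) * 0.25 +
         (10 - s.risk) * 0.20 +
         s.productivity * 0.15
}

// Hypothetical managed service: expensive, low risk, productive
const managedOption: Scores = { fit: 8, cost: 7, risk: 3, productivity: 9 }
// 8*0.4 + 3*0.25 + 7*0.20 + 9*0.15 = 3.2 + 0.75 + 1.4 + 1.35 = 6.70

// Hypothetical self-hosted setup: cheap, riskier, more work
const selfHostedOption: Scores = { fit: 8, cost: 3, risk: 6, productivity: 5 }
// 8*0.4 + 7*0.25 + 4*0.20 + 5*0.15 = 3.2 + 1.75 + 0.8 + 0.75 = 6.50

// Rank the candidates from best to worst total
const rankedTotals = [
  { label: 'managed', total: weightedTotal(managedOption) },
  { label: 'self-hosted', total: weightedTotal(selfHostedOption) }
].sort((a, b) => b.total - a.total)
```

The two totals land within 0.2 of each other even though the profiles are very different. When totals are this close, the weights are doing most of the deciding, so adjust them to match your own priorities before trusting the ranking.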

14.2 Decision Trees by Use Case

RAG Application

Building a RAG chatbot?
│
├─ Small scale (< 100K docs)?
│  ├─ Already using Postgres? → pgvector
│  └─ Want simplest setup? → Chroma
│
├─ Medium scale (100K - 1M)?
│  ├─ Need managed simplicity? → Pinecone
│  ├─ Want more control? → Qdrant Cloud
│  └─ Already have Postgres? → pgvector (upgrade instance)
│
└─ Large scale (> 1M)?
   ├─ Have ops expertise? → Self-hosted Qdrant/pgvector
   └─ Prefer managed? → Pinecone/Qdrant Cloud
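
If you want the RAG tree in a form you can test or review in a pull request, it can be written as a small function. This is only a sketch: `RagContext` and `recommendForRag` are hypothetical names, and because the branches of the tree are not mutually exclusive, the order of the checks (existing Postgres first, then the managed preference) is an assumption rather than part of the tree.

```typescript
interface RagContext {
  docCount: number
  usesPostgres: boolean
  prefersManaged: boolean
  hasOpsExpertise: boolean
}

function recommendForRag(ctx: RagContext): string[] {
  // Small scale (< 100K docs)
  if (ctx.docCount < 100_000) {
    return ctx.usesPostgres ? ['pgvector'] : ['chroma']
  }

  // Medium scale (100K - 1M)
  if (ctx.docCount <= 1_000_000) {
    if (ctx.usesPostgres) return ['pgvector']  // upgrade the instance
    return ctx.prefersManaged ? ['pinecone'] : ['qdrant-cloud']
  }

  // Large scale (> 1M): self-host only with ops expertise
  // and no stated preference for a managed service
  if (ctx.hasOpsExpertise && !ctx.prefersManaged) {
    return ['qdrant', 'pgvector']  // self-hosted
  }
  return ['pinecone', 'qdrant-cloud']
}
```

Returning a list rather than a single name keeps the tie cases from the tree visible, so the scoring step can break them.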

Semantic Search

Building semantic search?
│
├─ Need hybrid search (semantic + keyword)?
│  ├─ Using Postgres? → pgvector + full-text search
│  ├─ Want managed? → Pinecone (sparse-dense)
│  └─ Complex filters? → Qdrant
│
├─ Pure semantic search?
│  ├─ Simple use case? → Any option works
│  └─ High performance? → Pinecone or Qdrant
│
└─ Multi-language?
   └─ Use appropriate embedding model first
      → Any vector DB works after

Recommendation System

Building recommendations?
│
├─ User-item recommendations?
│  ├─ Millions of items? → Pinecone, Qdrant
│  └─ Complex item attributes? → Qdrant (advanced filtering)
│
├─ Content-based?
│  └─ Any vector DB works
│
└─ Real-time updates?
   ├─ High write volume? → Qdrant
   └─ Moderate writes? → Any option

14.3 Quick Reference Guide

Choose Pinecone If:

You want zero infrastructure management
Reliability is critical
You're willing to pay for simplicity
Team lacks ops expertise

Choose Qdrant If:

You need advanced filtering
Performance is critical
You want open-source with cloud option
You might self-host later

Choose Weaviate If:

You want built-in vectorization
You prefer a GraphQL API
You need multi-modal search (text + images)
You need auto-schema

Choose Chroma If:

You're prototyping
You're developing locally
You're running small production deployments
You want the simplest possible API

Choose pgvector If:

Already using PostgreSQL
Need ACID transactions
Want familiar SQL
Moderate scale (< 10M vectors)

Choose MongoDB Atlas If:

Already using MongoDB
Want unified database
Document-centric data model

14.4 Migration Considerations

Planning for Change

Your first choice might not be your final choice:

// Abstract your vector store interface
interface VectorStore {
  upsert(docs: Document[]): Promise<void>
  query(vector: number[], options: QueryOptions): Promise<Result[]>
  delete(ids: string[]): Promise<void>
}

// Implement for each database
class PineconeStore implements VectorStore { ... }
class QdrantStore implements VectorStore { ... }
class ChromaStore implements VectorStore { ... }

// Easy to swap implementations
const store: VectorStore = new PineconeStore()
// Later: const store: VectorStore = new QdrantStore()
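
A quick way to check that the interface is narrow enough to swap is to write an implementation with no database behind it. The sketch below is a minimal in-memory store using brute-force cosine similarity, useful for unit tests and as a baseline. `VectorDoc`, `QueryOptions`, `QueryResult`, `InMemoryStore`, `cosineSimilarity`, and `rankBySimilarity` are hypothetical names and shapes chosen for this example; the module does not define the `Document`, `QueryOptions`, or `Result` types, so treat these as assumptions.

```typescript
interface VectorDoc {
  id: string
  embedding: number[]
  metadata?: Record<string, unknown>
}
interface QueryOptions { topK: number }
interface QueryResult { id: string; score: number }

interface VectorStore {
  upsert(docs: VectorDoc[]): Promise<void>
  query(vector: number[], options: QueryOptions): Promise<QueryResult[]>
  delete(ids: string[]): Promise<void>
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('dimension mismatch')
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB)
  return denom === 0 ? 0 : dot / denom  // zero vectors match nothing
}

// Exhaustive scan: score every document, keep the best topK
function rankBySimilarity(
  docs: VectorDoc[],
  vector: number[],
  topK: number
): QueryResult[] {
  return docs
    .map(doc => ({ id: doc.id, score: cosineSimilarity(doc.embedding, vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK)
}

class InMemoryStore implements VectorStore {
  private docs = new Map<string, VectorDoc>()

  async upsert(docs: VectorDoc[]): Promise<void> {
    for (const doc of docs) this.docs.set(doc.id, doc)
  }

  async query(vector: number[], options: QueryOptions): Promise<QueryResult[]> {
    return rankBySimilarity(Array.from(this.docs.values()), vector, options.topK)
  }

  async delete(ids: string[]): Promise<void> {
    for (const id of ids) this.docs.delete(id)
  }
}
```

Because the scan is exact, this store also gives you ground-truth results to compare against when a real database uses an approximate index.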

Migration Steps

Export vectors and metadata from old system
Set up new system with same schema
Import data (with same embeddings)
Run both systems in parallel
Compare results on test queries
Switch traffic gradually
Decommission old system
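
For the "compare results on test queries" step, one simple measure is the overlap between the top-k IDs returned by the old and new systems. The sketch below assumes you have already collected both result lists for each test query; `overlapAtK` and `meanOverlap` are hypothetical helper names, and IDs are assumed to be unique within each list.

```typescript
// Fraction of the old system's top-k IDs that also appear
// in the new system's top-k (order within the top-k is ignored).
function overlapAtK(oldIds: string[], newIds: string[], k: number): number {
  if (k <= 0) throw new Error('k must be positive')
  const oldTop = new Set(oldIds.slice(0, k))
  const shared = newIds.slice(0, k).filter(id => oldTop.has(id)).length
  return shared / k
}

interface QueryComparison {
  oldIds: string[]
  newIds: string[]
}

// Average overlap across all test queries
function meanOverlap(comparisons: QueryComparison[], k: number): number {
  if (comparisons.length === 0) return 0
  const sum = comparisons.reduce(
    (acc, c) => acc + overlapAtK(c.oldIds, c.newIds, k),
    0
  )
  return sum / comparisons.length
}
```

Do not expect a perfect score. Systems that use approximate nearest-neighbor indexes can legitimately return slightly different results for the same embeddings, so pick an acceptance threshold before you start and investigate the queries that fall below it.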

Data Portability

Keep your embeddings portable:

// Store embeddings in your own storage
interface StoredDocument {
  id: string
  content: string
  embedding: number[]  // Keep a copy!
  metadata: Record<string, unknown>
  embeddingModel: string  // Track which model
  embeddedAt: Date
}

// If you switch databases, you have the embeddings
// Only need to re-embed if changing models
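
Tracking `embeddingModel` pays off at migration time, because it lets you split your stored documents into those whose embeddings can be copied as-is and those that must be re-embedded. A minimal sketch, assuming the `StoredDocument` shape above; `partitionForMigration` and the model names in the usage comment are hypothetical.

```typescript
interface StoredDocument {
  id: string
  content: string
  embedding: number[]
  metadata: Record<string, unknown>
  embeddingModel: string
  embeddedAt: Date
}

interface MigrationPlan {
  reusable: StoredDocument[]          // copy embeddings unchanged
  needsReembedding: StoredDocument[]  // embed content again with targetModel
}

function partitionForMigration(
  docs: StoredDocument[],
  targetModel: string
): MigrationPlan {
  const plan: MigrationPlan = { reusable: [], needsReembedding: [] }
  for (const doc of docs) {
    if (doc.embeddingModel === targetModel) {
      plan.reusable.push(doc)
    } else {
      plan.needsReembedding.push(doc)
    }
  }
  return plan
}

// Usage: partitionForMigration(allDocs, 'model-a')
// where 'model-a' is a placeholder for your embedding model's name.
```

Embeddings from different models are not comparable, so mixing them in one index would quietly degrade results. The size of `needsReembedding` also gives you a rough estimate of the embedding cost of the migration.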

14.5 Red Flags to Watch For

During Evaluation

No clear pricing: Hidden costs ahead
No export mechanism: Vendor lock-in
Limited filtering: Will hit walls later
Single region only: Latency problems for global apps
No uptime SLA: Risky for production

In Production

Latency creeping up: Time to optimize or scale
Costs growing faster than data: Inefficient queries
Frequent timeouts: Infrastructure problems
Missing features blocking progress: Wrong choice
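
The first two production flags can be turned into simple checks if you already log query latencies and monthly cost. The sketch below uses the nearest-rank method for p99 and compares two time windows. `percentile`, `latencyIsCreeping`, and `costOutpacesData` are hypothetical helpers, and the default 20% tolerance is an arbitrary placeholder to tune for your workload.

```typescript
// Nearest-rank percentile: the smallest sample such that
// at least p percent of samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error('no samples')
  const sorted = samples.slice().sort((a, b) => a - b)
  const rank = Math.ceil((p * sorted.length) / 100)
  return sorted[Math.max(rank, 1) - 1]
}

// True when the current window's p99 exceeds the previous
// window's p99 by more than the tolerance factor.
function latencyIsCreeping(
  previousWindowMs: number[],
  currentWindowMs: number[],
  tolerance = 1.2
): boolean {
  return percentile(currentWindowMs, 99) >
         percentile(previousWindowMs, 99) * tolerance
}

// True when cost grew by a larger factor than the vector count did.
function costOutpacesData(
  previousCost: number,
  currentCost: number,
  previousVectors: number,
  currentVectors: number
): boolean {
  if (previousCost <= 0 || previousVectors <= 0) {
    throw new Error('previous values must be positive')
  }
  return currentCost / previousCost > currentVectors / previousVectors
}
```

Compare windows with similar traffic (for example, the same weekday a week apart), otherwise ordinary load differences will trigger false alarms.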

14.6 The Final Checklist

Before committing, verify:

## Technical Fit
- [ ] Supports required vector dimensions
- [ ] Meets latency requirements
- [ ] Handles projected scale
- [ ] Has needed filtering capabilities
- [ ] Supports hybrid search (if needed)

## Operational Fit
- [ ] Team can operate it
- [ ] Monitoring available
- [ ] Backup/restore works
- [ ] Upgrade path clear

## Business Fit
- [ ] Within budget (including growth)
- [ ] Meets compliance requirements
- [ ] Acceptable vendor risk
- [ ] Support available when needed

## Strategic Fit
- [ ] Aligns with tech stack direction
- [ ] Migration path exists
- [ ] Skills transferable

14.7 Recommendation Summary

For Most Projects

Start with Pinecone if:

You're building a product (not researching)
Time to market matters
Budget is reasonable

Start with pgvector if:

Already using PostgreSQL
Want to minimize new services
Moderate scale expected

Start with Chroma if:

Local development focus
Experimenting with approaches
Cost is primary concern

Then Evaluate

After 3-6 months:

Are costs reasonable?
Is performance acceptable?
Are you free of blocking limitations?

If yes to all, stay. If not, consider alternatives.

Key Takeaways

No universally best choice—it depends on your context
Start with requirements, not features
Plan for migration—abstract your interface
Monitor in production—adjust as needed
Re-evaluate periodically—the landscape changes

Exercise: Make Your Decision

For your project:

Fill out the requirements template
Apply the elimination criteria
Score remaining options
Make a preliminary choice
List what would make you reconsider
Create an abstraction layer for portability

Document your decision and reasoning for future reference.

Next up: Module 15 - Integration with LangChain & AI SDK