Lesson 1.2: SQL vs NoSQL vs Vector Databases
The Database Landscape in 2025
Let's compare based on real production use cases:
| Use Case | Best Choice | Why |
|---|---|---|
| User accounts, orders, inventory | SQL | ACID transactions, complex joins |
| Real-time analytics, time-series | SQL (Timescale, ClickHouse) | Columnar storage, aggregations |
| Document storage (CMS, catalogs) | NoSQL (MongoDB) | Flexible schema, nested data |
| Pure vector similarity search | Vector DB (Pinecone) | Optimized HNSW, no SQL needed |
| RAG with metadata filtering | SQL + pgvector | Hybrid queries, transactions |
| Session store, caching | Redis | In-memory speed |
| Event logs, append-only data | SQL (Postgres) or NoSQL (Cassandra) | Depends on query patterns |
Why PostgreSQL Dominates AI Workloads
Companies using PostgreSQL for AI:
- Instagram: 1 billion+ users, recommendation engine
- Uber: Real-time matching, ML features
- Spotify: Personalization, playlist generation
- Reddit: Content moderation, recommendations
- Discord: Message storage, spam detection
- Robinhood: Trade execution, fraud detection
Why they chose SQL over NoSQL:
- ACID Transactions: Can't lose user data or money
- Complex Joins: AI features require combining many tables
- Flexible Queries: Data science teams need ad-hoc analysis
- Mature Ecosystem: 40+ years of optimization
- pgvector Extension: Vector search built-in
When NoSQL Actually Makes Sense
MongoDB and similar databases excel when:
- Schema is truly dynamic: Different document structures per record
- Deep nesting needed: JSON trees 5+ levels deep
- Horizontal scaling is primary: Sharding is easier than Postgres
- No complex joins required: Single-document queries only
Example: A CMS where each article type has completely different fields. SQL requires messy JSONB columns or complex joins; MongoDB stores each article naturally.
But for AI systems? You almost always need joins, transactions, and hybrid queries—so SQL wins.
Vector Databases: Specialized Tools
Dedicated vector databases (Pinecone, Weaviate, Qdrant, Milvus) are excellent when:
- Pure vector search: No metadata filtering needed
- Billions of vectors: Specialized sharding and indexing
- Managed service: Don't want to run infrastructure
But PostgreSQL + pgvector is better when:
- Hybrid queries: Vector similarity + SQL filters
- Transactions required: Consistency with other data
- Existing Postgres infrastructure: One database for everything
- Cost-sensitive: No per-query or per-vector pricing
The trend: Most companies start with pgvector, only move to dedicated vector databases at massive scale (100M+ vectors).
Key Takeaways
- SQL databases dominate AI workloads due to ACID guarantees and complex joins
- NoSQL excels for truly dynamic schemas and deep nesting, but AI systems rarely need this
- Vector databases are specialized tools for pure vector search at massive scale
- PostgreSQL + pgvector is the best starting point for most AI applications
- The right database choice depends on your specific query patterns and scale requirements

