Lesson 1.4: Case Study - How TikTok Uses SQL
TikTok's Scale
- 1 billion+ monthly active users
- Billions of videos
- Real-time recommendation engine
- Personalized "For You" feed
Their Architecture (Simplified)
Database Layer:
- Primary Database: MySQL clusters (sharded by user_id)
- Read Replicas: Scaled horizontally for recommendations
- Cache Layer: Redis for hot data
- Analytics: ClickHouse for OLAP queries
Why MySQL, not a "modern" database?
- Proven at scale: MySQL has backed some of the largest consumer web services for well over a decade
- Horizontal sharding: Partition users across clusters
- Read replicas: Scale recommendation reads independently
- ACID transactions: Critical for user data integrity
- Mature tooling: Backup, monitoring, migration tools
The Recommendation Pipeline
User opens app
↓
[MySQL: Fetch user preferences, watch history] (5ms)
↓
[ML Model: Generate candidate videos] (20ms)
↓
[MySQL: Fetch video metadata for candidates] (3ms)
↓
[ML Ranking Model: Score videos] (15ms)
↓
[MySQL: Log impression, update metrics] (2ms)
↓
Return personalized feed (45ms total)
Key Observations:
- SQL queries happen 3 times per request
- Total SQL time: ~10ms out of 45ms
- ML models depend on fast SQL for context
- Logging back to SQL closes the feedback loop
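The arithmetic behind these observations can be checked with a short sketch. The stage timings are the illustrative figures from the pipeline above, not measurements; the names PIPELINE and latency_budget are made up for this example.

```python
# Each stage is (kind, description, milliseconds), copied from the pipeline diagram.
PIPELINE = [
    ("mysql", "fetch user preferences, watch history", 5),
    ("ml", "generate candidate videos", 20),
    ("mysql", "fetch video metadata for candidates", 3),
    ("ml", "score videos", 15),
    ("mysql", "log impression, update metrics", 2),
]


def latency_budget(stages):
    """Return (total_ms, sql_ms, sql_calls) for a list of stages."""
    total_ms = sum(ms for _, _, ms in stages)
    sql_ms = sum(ms for kind, _, ms in stages if kind == "mysql")
    sql_calls = sum(1 for kind, _, _ in stages if kind == "mysql")
    return total_ms, sql_ms, sql_calls


total_ms, sql_ms, sql_calls = latency_budget(PIPELINE)
print(f"{sql_calls} SQL calls, {sql_ms} ms of {total_ms} ms ({sql_ms / total_ms:.0%})")
# prints: 3 SQL calls, 10 ms of 45 ms (22%)
```

SQL accounts for roughly a fifth of the request budget in this model, so a slow query on any of the three calls delays the whole feed.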
How They Handle Scale
Sharding Strategy:
-- User data sharded by user_id (plain modulo shown; a hash of user_id is bucketed the same way)
-- Shard 1: user_id % 100 = 0-9
-- Shard 2: user_id % 100 = 10-19
-- ...
-- Shard 10: user_id % 100 = 90-99
-- Video data sharded by video_id hash
-- Cross-shard joins avoided
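As a minimal sketch of the bucket-to-shard mapping above, the application layer might route a user like this. The function names and constants are hypothetical, and the plain modulo follows the comments above rather than any confirmed production scheme.

```python
NUM_BUCKETS = 100        # user_id % 100, as in the scheme above
BUCKETS_PER_SHARD = 10   # buckets 0-9 -> shard 1, 10-19 -> shard 2, ...


def shard_for_user(user_id: int) -> int:
    """Map a user_id to a shard number in the range 1..10."""
    bucket = user_id % NUM_BUCKETS
    return bucket // BUCKETS_PER_SHARD + 1


def group_by_shard(user_ids):
    """Group user_ids so each shard can be queried once, with no cross-shard join."""
    groups = {}
    for user_id in user_ids:
        groups.setdefault(shard_for_user(user_id), []).append(user_id)
    return groups


print(shard_for_user(1234567))          # bucket 67 -> shard 7
print(group_by_shard([5, 15, 105, 99]))
```

Because all rows for one user live on one shard, per-user queries touch a single cluster. Queries that span users are split per shard and merged in the application.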
Read Replica Architecture:
Primary (writes)
↓
├─ Replica 1 (recommendations)
├─ Replica 2 (recommendations)
├─ Replica 3 (user lookups)
└─ Replica 4 (analytics)
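One way to picture this layout is a small router that sends every write to the primary and spreads reads across the replicas assigned to each workload. This is a sketch under assumptions: the class, the server names, and the round-robin policy are illustrative and not taken from the source.

```python
import itertools


class ReplicaRouter:
    """Pick a server name for each query based on workload and read/write type."""

    def __init__(self, primary, replicas_by_workload):
        self.primary = primary
        # One round-robin iterator per workload (hypothetical policy).
        self._cycles = {
            workload: itertools.cycle(names)
            for workload, names in replicas_by_workload.items()
        }

    def route(self, workload, is_write=False):
        if is_write:
            return self.primary  # all writes go to the primary
        return next(self._cycles[workload])


router = ReplicaRouter(
    "primary",
    {
        "recommendations": ["replica-1", "replica-2"],
        "user_lookups": ["replica-3"],
        "analytics": ["replica-4"],
    },
)
```

Calling router.route("recommendations") alternates between replica-1 and replica-2, while router.route("user_lookups", is_write=True) always returns the primary. Replicas lag the primary slightly, so reads that must see a just-committed write are usually sent to the primary as well.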
Caching Layer:
Request → Check Redis
├─ Hit → Return
└─ Miss → MySQL → Cache in Redis → Return
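This flow is the cache-aside pattern. The sketch below uses stand-ins so it runs anywhere: a plain dict plays the role of Redis and an in-memory SQLite database plays the role of MySQL. The table, key format, and function name are assumptions made for illustration.

```python
import sqlite3

# Stand-in for MySQL: an in-memory SQLite database with one row.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE videos (video_id INTEGER PRIMARY KEY, title TEXT)")
db.execute("INSERT INTO videos VALUES (1, 'cat video')")

# Stand-in for Redis: a dict keyed the way a Redis key might be named.
cache = {}


def get_video_title(video_id):
    """Return (title, 'hit' or 'miss'), filling the cache on a miss."""
    key = f"video:{video_id}"
    if key in cache:
        return cache[key], "hit"          # served from the cache
    row = db.execute(
        "SELECT title FROM videos WHERE video_id = ?", (video_id,)
    ).fetchone()
    if row is None:
        return None, "miss"               # unknown id: nothing to cache
    cache[key] = row[0]                   # populate the cache for next time
    return row[0], "miss"
```

The first lookup of video 1 reads the database and reports a miss; the second is answered from the cache. A real deployment would also set an expiry on each cached key and invalidate it when the row changes, since the database remains the source of truth.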
Result: Sub-50ms end-to-end feed latency at billion-user scale, with roughly 10ms of it spent in MySQL.
Lessons for AI Developers
- SQL scales if architected properly: Sharding + replicas + caching
- Keep it simple: A mature, well-understood database like MySQL is enough for most use cases; newer NoSQL systems are not automatically better
- Separate reads and writes: Different replicas for different workloads
- Cache aggressively: Redis for hot data, SQL for source of truth
- Measure everything: Know your p50, p95, p99 latencies
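For the last point, a minimal sketch of computing p50, p95, and p99 from recorded query latencies, using the nearest-rank method. The sample values are invented for illustration, and production systems usually read these figures from a metrics library rather than computing them by hand.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest value with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical query latencies in milliseconds; two slow outliers.
latencies_ms = [12, 15, 11, 48, 13, 14, 16, 90, 12, 13]

for p in (50, 95, 99):
    print(f"p{p}: {percentile(latencies_ms, p)} ms")
# prints:
# p50: 13 ms
# p95: 90 ms
# p99: 90 ms
```

The median looks healthy at 13 ms, yet the tail is 90 ms. That gap is why averages and medians alone hide the requests users actually notice.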
Key Takeaways
- TikTok serves 1 billion+ users with MySQL as its primary database, not a "modern" database
- SQL appears 3 times in every recommendation request
- Proper architecture (sharding, replicas, caching) enables massive scale
- Simple, proven technology often outperforms cutting-edge alternatives
- Measuring and optimizing latency is critical for AI systems at scale

