Lesson 1.4: Case Study - How TikTok Uses SQL

TikTok's Scale

1 billion+ active users
Billions of videos
Real-time recommendation engine
Personalized "For You" feed

Their Architecture (Simplified)

Database Layer:

Primary Database: MySQL clusters (sharded by user_id)
Read Replicas: Scaled horizontally for recommendations
Cache Layer: Redis for hot data
Analytics: ClickHouse for OLAP queries

Why MySQL, not a "modern" database?

Proven at scale: MySQL has run at billion-user scale for decades
Horizontal sharding: Partition users across clusters
Read replicas: Scale recommendation reads independently
ACID transactions: Critical for user data integrity
Mature tooling: Backup, monitoring, migration tools

The Recommendation Pipeline

User opens app
    ↓
[MySQL: Fetch user preferences, watch history] (5ms)
    ↓
[ML Model: Generate candidate videos] (20ms)
    ↓
[MySQL: Fetch video metadata for candidates] (3ms)
    ↓
[ML Ranking Model: Score videos] (15ms)
    ↓
[MySQL: Log impression, update metrics] (2ms)
    ↓
Return personalized feed (45ms total)

Key Observations:

SQL queries happen 3 times per request
Total SQL time: ~10ms out of 45ms
ML models depend on fast SQL for context
Logging back to SQL closes the feedback loop

How They Handle Scale

Sharding Strategy:

-- User data sharded by user_id hash
-- Shard 1: user_id % 100 = 0-9
-- Shard 2: user_id % 100 = 10-19
-- ...
-- Shard 10: user_id % 100 = 90-99

-- Video data sharded by video_id hash
-- Cross-shard joins avoided

Read Replica Architecture:

Primary (writes)
    ↓
├─ Replica 1 (recommendations)
├─ Replica 2 (recommendations)
├─ Replica 3 (user lookups)
└─ Replica 4 (analytics)

Caching Layer:

Request → Check Redis → Miss → MySQL → Cache in Redis
                     ↓ Hit → Return

Result: Sub-50ms database latency at billion-user scale.

Lessons for AI Developers

SQL scales if architected properly: Sharding + replicas + caching
Keep it simple: MySQL from 2005 beats fancy NoSQL for most use cases
Separate reads and writes: Different replicas for different workloads
Cache aggressively: Redis for hot data, SQL for source of truth
Measure everything: Know your p50, p95, p99 latencies

Key Takeaways

TikTok serves 1 billion+ users with MySQL, not a "modern" database
SQL appears 3 times in every recommendation request
Proper architecture (sharding, replicas, caching) enables massive scale
Simple, proven technology often outperforms cutting-edge alternatives
Measuring and optimizing latency is critical for AI systems at scale