Lesson 3.3: Partial Indexes
The Concept
Index only the rows you actually query:
-- Problem: Table has 10M rows, but only 100k are 'active'
SELECT * FROM users WHERE status = 'active' AND city = 'NYC';
-- Regular index: 10M entries
CREATE INDEX users_city_idx ON users(city);
-- Partial index: 100k entries (100x smaller!)
CREATE INDEX users_active_city_idx ON users(city)
WHERE status = 'active';
Real-World Examples
1. Recent data queries:
-- AI logs table: 1B rows, but queries only last 7 days
CREATE INDEX model_logs_recent_idx ON model_logs(created_at, model_version)
WHERE created_at > NOW() - INTERVAL '7 days';
-- Index stays small (auto-expires old data)
-- Much faster than indexing all 1B rows
2. Soft deletes:
-- Only index non-deleted rows
CREATE INDEX documents_active_idx ON documents(user_id, category)
WHERE deleted_at IS NULL;
-- Queries on active documents use small index
SELECT * FROM documents
WHERE user_id = 123 AND deleted_at IS NULL;
3. Status-based queries:
-- E-commerce: Only index pending/processing orders
CREATE INDEX orders_active_idx ON orders(user_id, created_at)
WHERE status IN ('pending', 'processing');
-- Completed orders (99% of data) not indexed
-- Index stays small and fast
Partial Index Strategy for AI Workloads
Feature stores:
-- Index only recent features (last 30 days)
CREATE INDEX features_recent_idx ON user_features(user_id)
INCLUDE (feature_vector)
WHERE computed_at > NOW() - INTERVAL '30 days';
Model predictions:
-- Index only successful predictions
CREATE INDEX predictions_success_idx ON ml_predictions(user_id, model_version)
INCLUDE (prediction)
WHERE status = 'success' AND confidence > 0.7;
Key Takeaways
- Partial indexes only index rows matching a WHERE condition
- Can be 100x smaller than full indexes for selective queries
- Perfect for time-based queries (recent data only)
- Excellent for soft deletes and status-based filtering
- Query must include the partial index condition to use the index
- Dramatically reduces storage and maintenance costs for large tables

