Lesson 4.3: JSONB and Semi-Structured Data
When to Use JSONB
Good use cases:
-- 1. Flexible attributes (vary by row)
CREATE TABLE products (
id BIGSERIAL PRIMARY KEY,
name TEXT NOT NULL,
category TEXT NOT NULL,
attributes JSONB -- Different per category
);
-- Electronics: {"brand": "Apple", "warranty_years": 2}
-- Clothing: {"size": "L", "color": "blue", "material": "cotton"}
-- 2. API responses (store as-is)
CREATE TABLE api_logs (
id BIGSERIAL PRIMARY KEY,
endpoint TEXT,
request JSONB,
response JSONB,
created_at TIMESTAMPTZ
);
-- 3. Model configurations
CREATE TABLE ml_models (
id BIGSERIAL PRIMARY KEY,
name TEXT,
config JSONB -- Hyperparameters, architecture
);
Bad use cases:
-- 1. Structured data that should be normalized
-- Bad:
CREATE TABLE users (
id BIGSERIAL PRIMARY KEY,
data JSONB -- {"email": "...", "name": "...", "age": 30}
);
-- Good:
CREATE TABLE users (
id BIGSERIAL PRIMARY KEY,
email TEXT,
name TEXT,
age INT
);
-- 2. Data you need to JOIN on (put in columns)
-- 3. Data that needs strong typing (use columns)
JSONB Performance
Indexing JSONB:
-- GIN index for containment queries
CREATE INDEX products_attrs_gin ON products USING GIN(attributes);
-- Fast queries
SELECT * FROM products WHERE attributes @> '{"brand": "Apple"}';
SELECT * FROM products WHERE attributes ? 'color';
-- Index specific field
CREATE INDEX products_price_idx ON products((attributes->>'price')::numeric);
SELECT * FROM products WHERE (attributes->>'price')::numeric < 100;
JSONB vs columns:
-- Test data
INSERT INTO products_jsonb SELECT
generate_series(1, 1000000),
'Product ' || generate_series(1, 1000000),
'{"price": 99.99, "brand": "Apple"}'::jsonb;
-- Query JSONB (slower)
SELECT * FROM products_jsonb
WHERE attributes->>'brand' = 'Apple';
-- Time: ~300ms
-- Query column (faster)
SELECT * FROM products_cols WHERE brand = 'Apple';
-- Time: ~50ms (6x faster with same index)
Rule: Use JSONB for truly flexible data. Use columns for anything you query frequently.
Key Takeaways
- JSONB is perfect for flexible attributes that vary by row
- Use JSONB for API responses and model configurations
- Don't use JSONB for structured data that should be in columns
- JSONB queries are 6x slower than column queries even with indexes
- GIN indexes enable fast JSONB containment queries
- Index specific JSONB fields with expression indexes for better performance
- Use columns for frequently queried data, JSONB for everything else

