Indexes - Performance Optimization

Introduction

Imagine searching for a word in a 1000-page book without an index—you'd read every page! Databases face the same problem. Indexes are like a book's index: they speed up data retrieval by creating optimized lookup structures.

Without indexes, queries scan entire tables (slow). With indexes, databases jump straight to relevant rows (fast).

What is an Index?

An index is a data structure (usually a B-tree) that stores a sorted copy of specific columns, pointing to the actual table rows.

Trade-off:

✅ Faster SELECT queries
❌ Slower INSERT/UPDATE/DELETE (index must be updated)
❌ Extra storage space

Creating Indexes

Basic Syntax

CREATE INDEX index_name ON table_name(column_name);

Simple Index Example

-- Without index: Full table scan
SELECT * FROM customers WHERE email = 'alice@example.com';
-- Scans all 1 million rows!

-- Create index
CREATE INDEX idx_customers_email ON customers(email);

-- Now fast!
SELECT * FROM customers WHERE email = 'alice@example.com';
-- Uses index: instant lookup

Index Types

B-Tree Index (Default)

CREATE INDEX idx_products_price ON products(price);

Use for:

Equality searches (=)
Range queries (<, >, BETWEEN)
Sorting (ORDER BY)
Pattern matching starting with text (LIKE 'abc%')

Most common index type.

Unique Index

CREATE UNIQUE INDEX idx_users_email ON users(email);

Enforces uniqueness + speeds up lookups.

Equivalent to:

ALTER TABLE users ADD CONSTRAINT uq_users_email UNIQUE (email);

Partial Index (PostgreSQL)

Index only rows matching a condition:

-- Index only active products
CREATE INDEX idx_active_products ON products(name)
WHERE active = TRUE;

Benefits:

Smaller index
Faster updates
Perfect for filtering

Expression Index

Index on computed values:

-- Index lowercase email for case-insensitive search
CREATE INDEX idx_users_email_lower ON users(LOWER(email));

-- Now fast:
SELECT * FROM users WHERE LOWER(email) = 'alice@example.com';

Multi-Column (Composite) Index

CREATE INDEX idx_orders_customer_date ON orders(customer_id, order_date);

Use for:

Queries filtering on multiple columns
Queries with specific column order

Index works for:

WHERE customer_id = 1
WHERE customer_id = 1 AND order_date = '2024-01-15'

Index doesn't help:

WHERE order_date = '2024-01-15' (order_date not first!)

Rule: Left-to-right prefix matching!

Covering Index

Include extra columns for index-only scans:

CREATE INDEX idx_orders_customer_total ON orders(customer_id)
INCLUDE (total, order_date);

Query can be satisfied entirely from index (no table lookup needed).

When to Create Indexes

Always Index

✅ Primary keys (automatic)
✅ Foreign keys (manual!)
✅ Columns in WHERE clauses
✅ Columns in JOIN conditions
✅ Columns in ORDER BY
✅ Columns in GROUP BY

Example Schema with Indexes

CREATE TABLE customers (
    id SERIAL PRIMARY KEY,  -- Automatic index
    email VARCHAR(255) UNIQUE NOT NULL,  -- Automatic unique index
    name VARCHAR(100),
    city VARCHAR(50),
    created_at TIMESTAMPTZ DEFAULT CURRENT_TIMESTAMP
);

-- Manual indexes
CREATE INDEX idx_customers_city ON customers(city);
CREATE INDEX idx_customers_created ON customers(created_at);

CREATE TABLE orders (
    id SERIAL PRIMARY KEY,  -- Automatic index
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    total DECIMAL(10,2),
    status VARCHAR(20),
    order_date DATE
);

-- Manual indexes
CREATE INDEX idx_orders_customer ON orders(customer_id);  -- FK index!
CREATE INDEX idx_orders_status ON orders(status);
CREATE INDEX idx_orders_date ON orders(order_date);
CREATE INDEX idx_orders_customer_date ON orders(customer_id, order_date);  -- Composite

When NOT to Create Indexes

❌ Small tables (under 1000 rows) - Full scan is fast enough
❌ Columns with low cardinality (few unique values, e.g., boolean)
❌ Columns rarely queried
❌ Tables with many writes - Index overhead slows INSERTs

Checking Index Usage

EXPLAIN Query Plan

EXPLAIN SELECT * FROM customers WHERE email = 'alice@example.com';

Without index:

Seq Scan on customers  (cost=0.00..1000.00 rows=1)
  Filter: (email = 'alice@example.com')

With index:

Index Scan using idx_customers_email on customers  (cost=0.00..8.27 rows=1)
  Index Cond: (email = 'alice@example.com')

EXPLAIN ANALYZE (with execution)

EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 1;

Shows actual execution time!

Listing Indexes

PostgreSQL

-- All indexes on a table
SELECT * FROM pg_indexes WHERE tablename = 'customers';

-- Index sizes
SELECT
    indexname,
    pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_stat_user_indexes
WHERE schemaname = 'public'
ORDER BY pg_relation_size(indexrelid) DESC;

Dropping Indexes

DROP INDEX idx_customers_email;

-- Safe drop
DROP INDEX IF EXISTS idx_customers_email;

Reindexing

Rebuild corrupted or bloated indexes:

-- Reindex a specific index
REINDEX INDEX idx_customers_email;

-- Reindex all indexes on a table
REINDEX TABLE customers;

-- Reindex entire database
REINDEX DATABASE mydb;

Index Maintenance

Autovacuum (PostgreSQL)

Automatically cleans up dead rows and updates statistics.

-- Check autovacuum status
SELECT * FROM pg_stat_user_tables WHERE relname = 'customers';

Manual VACUUM and ANALYZE

-- Reclaim space and update statistics
VACUUM ANALYZE customers;

Common Index Patterns

Pattern 1: Foreign Key Indexes

-- Always index foreign keys!
CREATE TABLE orders (
    id SERIAL PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id)
);

CREATE INDEX idx_orders_customer ON orders(customer_id);

Pattern 2: Composite Index for Queries

-- Query: Find active orders for customer
SELECT * FROM orders
WHERE customer_id = 1 AND status = 'active';

-- Ideal index:
CREATE INDEX idx_orders_customer_status ON orders(customer_id, status);

Pattern 3: Partial Index for Filtering

-- Only index recent orders
CREATE INDEX idx_recent_orders ON orders(customer_id, order_date)
WHERE order_date >= '2024-01-01';

Pattern 4: Text Search Index

-- Full-text search index
CREATE INDEX idx_products_search ON products USING gin(to_tsvector('english', description));

SELECT * FROM products
WHERE to_tsvector('english', description) @@ to_tsquery('laptop');

Performance Tips

1. Measure Before Optimizing

-- Use EXPLAIN ANALYZE
EXPLAIN ANALYZE
SELECT * FROM orders WHERE customer_id = 1;

2. Don't Over-Index

More indexes = slower writes. Balance is key!

3. Use Composite Indexes Wisely

-- ✅ Good: Matches query pattern
CREATE INDEX idx ON orders(customer_id, order_date);
-- Query: WHERE customer_id = 1 AND order_date = '2024-01-15'

-- ❌ Bad: Wrong column order
CREATE INDEX idx ON orders(order_date, customer_id);
-- Doesn't help: WHERE customer_id = 1

4. Monitor Index Usage

SELECT
    schemaname,
    tablename,
    indexname,
    idx_scan AS index_scans,
    idx_tup_read AS tuples_read,
    idx_tup_fetch AS tuples_fetched
FROM pg_stat_user_indexes
WHERE schemaname = 'public'
ORDER BY idx_scan ASC;

Indexes with 0 scans = unused (consider dropping!).

Practice Exercises

Exercise 1

Create indexes for this query: SELECT * FROM products WHERE category = 'Electronics' AND price > 100 ORDER BY price;

Solution

-- Option 1: Composite index
CREATE INDEX idx_products_category_price ON products(category, price);

-- Option 2: Separate indexes (less efficient)
CREATE INDEX idx_products_category ON products(category);
CREATE INDEX idx_products_price ON products(price);

Exercise 2

Create a partial index for active users created in 2024.

Solution

CREATE INDEX idx_users_2024_active ON users(created_at)
WHERE active = TRUE AND EXTRACT(YEAR FROM created_at) = 2024;

Key Takeaways

✅ Indexes speed up SELECT but slow down INSERT/UPDATE/DELETE
✅ B-tree is the default and most common index type
✅ Always index foreign keys manually
✅ Composite indexes for multi-column queries
✅ Partial indexes for filtered queries
✅ Use EXPLAIN to verify index usage
✅ Monitor and drop unused indexes
✅ Don't over-index - balance read vs write performance

Next Steps

You've mastered indexes! In the next lesson, we'll explore Normalization—database design principles for reducing redundancy and ensuring data integrity.

Let's design better schemas!

Indexes - Performance Optimization

Introduction

Without indexes, queries scan entire tables (slow). With indexes, databases jump straight to relevant rows (fast).

What is an Index?

An index is a data structure (usually a B-tree) that stores a sorted copy of specific columns, pointing to the actual table rows.

Trade-off:

✅ Faster SELECT queries
❌ Slower INSERT/UPDATE/DELETE (index must be updated)
❌ Extra storage space

Creating Indexes

Basic Syntax

CREATE INDEX index_name ON table_name(column_name);

Simple Index Example

-- Without index: Full table scan
SELECT * FROM customers WHERE email = 'alice@example.com';
-- Scans all 1 million rows!

-- Create index
CREATE INDEX idx_customers_email ON customers(email);

-- Now fast!
SELECT * FROM customers WHERE email = 'alice@example.com';
-- Uses index: instant lookup

Index Types

B-Tree Index (Default)

CREATE INDEX idx_products_price ON products(price);

Use for:

Equality searches (=)
Range queries (<, >, BETWEEN)
Sorting (ORDER BY)
Pattern matching starting with text (LIKE 'abc%')

Most common index type.

Unique Index

CREATE UNIQUE INDEX idx_users_email ON users(email);

Enforces uniqueness + speeds up lookups.

Equivalent to:

ALTER TABLE users ADD CONSTRAINT uq_users_email UNIQUE (email);

Partial Index (PostgreSQL)

Index only rows matching a condition:

-- Index only active products
CREATE INDEX idx_active_products ON products(name)
WHERE active = TRUE;

Benefits:

Smaller index
Faster updates
Perfect for filtering

Expression Index

Index on computed values:

-- Index lowercase email for case-insensitive search
CREATE INDEX idx_users_email_lower ON users(LOWER(email));

-- Now fast:
SELECT * FROM users WHERE LOWER(email) = 'alice@example.com';

Multi-Column (Composite) Index

CREATE INDEX idx_orders_customer_date ON orders(customer_id, order_date);

Use for:

Queries filtering on multiple columns
Queries with specific column order

Index works for:

WHERE customer_id = 1
WHERE customer_id = 1 AND order_date = '2024-01-15'

Index doesn't help:

WHERE order_date = '2024-01-15' (order_date not first!)

Rule: Left-to-right prefix matching!

Covering Index

Include extra columns for index-only scans:

CREATE INDEX idx_orders_customer_total ON orders(customer_id)
INCLUDE (total, order_date);

Query can be satisfied entirely from index (no table lookup needed).

When to Create Indexes

Always Index

✅ Primary keys (automatic)
✅ Foreign keys (manual!)
✅ Columns in WHERE clauses
✅ Columns in JOIN conditions
✅ Columns in ORDER BY
✅ Columns in GROUP BY

Example Schema with Indexes

CREATE TABLE customers (
    id SERIAL PRIMARY KEY,  -- Automatic index
    email VARCHAR(255) UNIQUE NOT NULL,  -- Automatic unique index
    name VARCHAR(100),
    city VARCHAR(50),
    created_at TIMESTAMPTZ DEFAULT CURRENT_TIMESTAMP
);

-- Manual indexes
CREATE INDEX idx_customers_city ON customers(city);
CREATE INDEX idx_customers_created ON customers(created_at);

CREATE TABLE orders (
    id SERIAL PRIMARY KEY,  -- Automatic index
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    total DECIMAL(10,2),
    status VARCHAR(20),
    order_date DATE
);

-- Manual indexes
CREATE INDEX idx_orders_customer ON orders(customer_id);  -- FK index!
CREATE INDEX idx_orders_status ON orders(status);
CREATE INDEX idx_orders_date ON orders(order_date);
CREATE INDEX idx_orders_customer_date ON orders(customer_id, order_date);  -- Composite

When NOT to Create Indexes

❌ Small tables (under 1000 rows) - Full scan is fast enough
❌ Columns with low cardinality (few unique values, e.g., boolean)
❌ Columns rarely queried
❌ Tables with many writes - Index overhead slows INSERTs

Checking Index Usage

EXPLAIN Query Plan

EXPLAIN SELECT * FROM customers WHERE email = 'alice@example.com';

Without index:

Seq Scan on customers  (cost=0.00..1000.00 rows=1)
  Filter: (email = 'alice@example.com')

With index:

Index Scan using idx_customers_email on customers  (cost=0.00..8.27 rows=1)
  Index Cond: (email = 'alice@example.com')

EXPLAIN ANALYZE (with execution)

EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 1;

Shows actual execution time!

Listing Indexes

PostgreSQL

-- All indexes on a table
SELECT * FROM pg_indexes WHERE tablename = 'customers';

-- Index sizes
SELECT
    indexname,
    pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_stat_user_indexes
WHERE schemaname = 'public'
ORDER BY pg_relation_size(indexrelid) DESC;

Dropping Indexes

DROP INDEX idx_customers_email;

-- Safe drop
DROP INDEX IF EXISTS idx_customers_email;

Reindexing

Rebuild corrupted or bloated indexes:

-- Reindex a specific index
REINDEX INDEX idx_customers_email;

-- Reindex all indexes on a table
REINDEX TABLE customers;

-- Reindex entire database
REINDEX DATABASE mydb;

Index Maintenance

Autovacuum (PostgreSQL)

Automatically cleans up dead rows and updates statistics.

-- Check autovacuum status
SELECT * FROM pg_stat_user_tables WHERE relname = 'customers';

Manual VACUUM and ANALYZE

-- Reclaim space and update statistics
VACUUM ANALYZE customers;

Common Index Patterns

Pattern 1: Foreign Key Indexes

-- Always index foreign keys!
CREATE TABLE orders (
    id SERIAL PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id)
);

CREATE INDEX idx_orders_customer ON orders(customer_id);

Pattern 2: Composite Index for Queries

-- Query: Find active orders for customer
SELECT * FROM orders
WHERE customer_id = 1 AND status = 'active';

-- Ideal index:
CREATE INDEX idx_orders_customer_status ON orders(customer_id, status);

Pattern 3: Partial Index for Filtering

-- Only index recent orders
CREATE INDEX idx_recent_orders ON orders(customer_id, order_date)
WHERE order_date >= '2024-01-01';

Pattern 4: Text Search Index

-- Full-text search index
CREATE INDEX idx_products_search ON products USING gin(to_tsvector('english', description));

SELECT * FROM products
WHERE to_tsvector('english', description) @@ to_tsquery('laptop');

Performance Tips

1. Measure Before Optimizing

-- Use EXPLAIN ANALYZE
EXPLAIN ANALYZE
SELECT * FROM orders WHERE customer_id = 1;

2. Don't Over-Index

More indexes = slower writes. Balance is key!

3. Use Composite Indexes Wisely

-- ✅ Good: Matches query pattern
CREATE INDEX idx ON orders(customer_id, order_date);
-- Query: WHERE customer_id = 1 AND order_date = '2024-01-15'

-- ❌ Bad: Wrong column order
CREATE INDEX idx ON orders(order_date, customer_id);
-- Doesn't help: WHERE customer_id = 1

4. Monitor Index Usage

SELECT
    schemaname,
    tablename,
    indexname,
    idx_scan AS index_scans,
    idx_tup_read AS tuples_read,
    idx_tup_fetch AS tuples_fetched
FROM pg_stat_user_indexes
WHERE schemaname = 'public'
ORDER BY idx_scan ASC;

Indexes with 0 scans = unused (consider dropping!).

Practice Exercises

Exercise 1

Create indexes for this query: SELECT * FROM products WHERE category = 'Electronics' AND price > 100 ORDER BY price;

Solution

-- Option 1: Composite index
CREATE INDEX idx_products_category_price ON products(category, price);

-- Option 2: Separate indexes (less efficient)
CREATE INDEX idx_products_category ON products(category);
CREATE INDEX idx_products_price ON products(price);

Exercise 2

Create a partial index for active users created in 2024.

Solution

CREATE INDEX idx_users_2024_active ON users(created_at)
WHERE active = TRUE AND EXTRACT(YEAR FROM created_at) = 2024;

Key Takeaways

✅ Indexes speed up SELECT but slow down INSERT/UPDATE/DELETE
✅ B-tree is the default and most common index type
✅ Always index foreign keys manually
✅ Composite indexes for multi-column queries
✅ Partial indexes for filtered queries
✅ Use EXPLAIN to verify index usage
✅ Monitor and drop unused indexes
✅ Don't over-index - balance read vs write performance

Next Steps

You've mastered indexes! In the next lesson, we'll explore Normalization—database design principles for reducing redundancy and ensuring data integrity.

Let's design better schemas!

Indexes - Performance Optimization

Introduction

What is an Index?

Creating Indexes

Basic Syntax

Simple Index Example

Index Types

B-Tree Index (Default)

Unique Index

Partial Index (PostgreSQL)

Expression Index

Multi-Column (Composite) Index

Covering Index

When to Create Indexes

Always Index

Example Schema with Indexes

When NOT to Create Indexes

Checking Index Usage

EXPLAIN Query Plan

EXPLAIN ANALYZE (with execution)

Listing Indexes

PostgreSQL

Dropping Indexes

Reindexing

Index Maintenance

Autovacuum (PostgreSQL)

Manual VACUUM and ANALYZE

Common Index Patterns

Pattern 1: Foreign Key Indexes

Pattern 2: Composite Index for Queries

Pattern 3: Partial Index for Filtering

Pattern 4: Text Search Index

Performance Tips

1. Measure Before Optimizing

2. Don't Over-Index

3. Use Composite Indexes Wisely

4. Monitor Index Usage

Practice Exercises

Exercise 1

Exercise 2

Key Takeaways

Next Steps

Quiz

Indexes - Performance Optimization

Introduction

What is an Index?

Creating Indexes

Basic Syntax

Simple Index Example

Index Types

B-Tree Index (Default)

Unique Index

Partial Index (PostgreSQL)

Expression Index

Multi-Column (Composite) Index

Covering Index

When to Create Indexes

Always Index

Example Schema with Indexes

When NOT to Create Indexes

Checking Index Usage

EXPLAIN Query Plan

EXPLAIN ANALYZE (with execution)

Listing Indexes

PostgreSQL

Dropping Indexes

Reindexing

Index Maintenance

Autovacuum (PostgreSQL)

Manual VACUUM and ANALYZE

Common Index Patterns

Pattern 1: Foreign Key Indexes

Pattern 2: Composite Index for Queries

Pattern 3: Partial Index for Filtering

Pattern 4: Text Search Index

Performance Tips

1. Measure Before Optimizing

2. Don't Over-Index

3. Use Composite Indexes Wisely

4. Monitor Index Usage