Normalization - Database Design Principles

Introduction

What if you store customer address in every order? When they move, you'd update hundreds of rows! This is a data anomaly caused by poor schema design.

Normalization is the process of organizing tables to minimize redundancy and dependency. It prevents anomalies and ensures data integrity.

The Problem: Denormalized Data

Bad Design Example

CREATE TABLE orders_bad (
    order_id INTEGER,
    customer_name VARCHAR(100),
    customer_email VARCHAR(255),
    customer_address TEXT,
    product_name VARCHAR(200),
    product_price DECIMAL(10,2),
    quantity INTEGER
);

Problems:

Update Anomaly: Customer moves? Update every order!
Insertion Anomaly: Can't add customer without an order
Deletion Anomaly: Delete last order? Lose customer data!
Data Redundancy: Customer info repeated for every order

The Solution: Normalized Design

CREATE TABLE customers (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    email VARCHAR(255) UNIQUE NOT NULL,
    address TEXT
);

CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    name VARCHAR(200) NOT NULL,
    price DECIMAL(10,2) NOT NULL
);

CREATE TABLE orders (
    id SERIAL PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id),
    order_date TIMESTAMPTZ DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE order_items (
    order_id INTEGER REFERENCES orders(id),
    product_id INTEGER REFERENCES products(id),
    quantity INTEGER NOT NULL,
    price DECIMAL(10,2) NOT NULL,  -- Price at time of purchase
    PRIMARY KEY (order_id, product_id)
);

Benefits:

Customer info stored once
Update address once
No redundancy

Normal Forms

First Normal Form (1NF)

Rule: Each column contains atomic (indivisible) values. No repeating groups.

❌ Violates 1NF:

CREATE TABLE students_bad (
    id INTEGER,
    name VARCHAR(100),
    courses VARCHAR(255)  -- 'Math,Science,History' (non-atomic!)
);

✅ Follows 1NF:

CREATE TABLE students (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100)
);

CREATE TABLE student_courses (
    student_id INTEGER REFERENCES students(id),
    course_name VARCHAR(100),
    PRIMARY KEY (student_id, course_name)
);

Second Normal Form (2NF)

Rule: 1NF + No partial dependencies (all non-key columns depend on the entire primary key).

❌ Violates 2NF:

CREATE TABLE order_items_bad (
    order_id INTEGER,
    product_id INTEGER,
    product_name VARCHAR(200),  -- Depends only on product_id!
    product_price DECIMAL(10,2),  -- Depends only on product_id!
    quantity INTEGER,
    PRIMARY KEY (order_id, product_id)
);

Problem: product_name and product_price depend only on product_id, not the full key!

✅ Follows 2NF:

CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    name VARCHAR(200),
    price DECIMAL(10,2)
);

CREATE TABLE order_items (
    order_id INTEGER,
    product_id INTEGER REFERENCES products(id),
    quantity INTEGER,
    PRIMARY KEY (order_id, product_id)
);

Third Normal Form (3NF)

Rule: 2NF + No transitive dependencies (non-key columns don't depend on other non-key columns).

❌ Violates 3NF:

CREATE TABLE employees_bad (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100),
    department_id INTEGER,
    department_name VARCHAR(100),  -- Depends on department_id!
    department_location VARCHAR(100)  -- Depends on department_id!
);

Problem: department_name depends on department_id (not directly on employee id).

✅ Follows 3NF:

CREATE TABLE departments (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100),
    location VARCHAR(100)
);

CREATE TABLE employees (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100),
    department_id INTEGER REFERENCES departments(id)
);

Practical Normalization Example

Step 1: Denormalized (Spreadsheet-style)

CREATE TABLE sales_denormalized (
    sale_id INTEGER,
    sale_date DATE,
    customer_name VARCHAR(100),
    customer_email VARCHAR(255),
    customer_city VARCHAR(50),
    product_name VARCHAR(200),
    product_category VARCHAR(50),
    quantity INTEGER,
    unit_price DECIMAL(10,2),
    total DECIMAL(10,2)
);

Step 2: Apply 1NF (Atomic Values)

Already atomic—no changes needed.

Step 3: Apply 2NF (Remove Partial Dependencies)

Extract customers and products:

CREATE TABLE customers (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100),
    email VARCHAR(255) UNIQUE,
    city VARCHAR(50)
);

CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    name VARCHAR(200),
    category VARCHAR(50),
    price DECIMAL(10,2)
);

CREATE TABLE sales (
    id SERIAL PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id),
    product_id INTEGER REFERENCES products(id),
    quantity INTEGER,
    unit_price DECIMAL(10,2),  -- Price at time of sale
    sale_date DATE
);

Step 4: Apply 3NF (Remove Transitive Dependencies)

Extract product categories:

CREATE TABLE categories (
    id SERIAL PRIMARY KEY,
    name VARCHAR(50) UNIQUE
);

CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    name VARCHAR(200),
    category_id INTEGER REFERENCES categories(id),
    price DECIMAL(10,2)
);

Final normalized schema:

customers
categories
products
sales

When to Denormalize

Normalization is ideal for data integrity, but sometimes performance requires denormalization.

Denormalization for Reads

-- Normalized (3 joins for a query)
SELECT c.name, cat.name, SUM(s.quantity)
FROM sales s
JOIN customers c ON s.customer_id = c.id
JOIN products p ON s.product_id = p.id
JOIN categories cat ON p.category_id = cat.id
GROUP BY c.name, cat.name;

-- Denormalized (faster, but redundant)
CREATE TABLE sales_denorm (
    id SERIAL PRIMARY KEY,
    customer_name VARCHAR(100),  -- Redundant
    product_name VARCHAR(200),    -- Redundant
    category_name VARCHAR(50),    -- Redundant
    quantity INTEGER,
    unit_price DECIMAL(10,2)
);

Trade-off:

✅ Faster reads (no joins)
❌ Slower writes (more data to update)
❌ Risk of inconsistency

Use cases:

Data warehouses
Read-heavy analytics
Reporting tables

Materialized Views (Best of Both Worlds)

CREATE MATERIALIZED VIEW sales_summary AS
SELECT
    c.name AS customer_name,
    cat.name AS category,
    SUM(s.quantity) AS total_quantity
FROM sales s
JOIN customers c ON s.customer_id = c.id
JOIN products p ON s.product_id = p.id
JOIN categories cat ON p.category_id = cat.id
GROUP BY c.name, cat.name;

-- Refresh periodically
REFRESH MATERIALIZED VIEW sales_summary;

Normalization Checklist

✅ Each table has a primary key
✅ No repeating groups (1NF)
✅ All columns depend on the entire key (2NF)
✅ No transitive dependencies (3NF)
✅ Foreign keys reference parent tables
✅ No redundant data

Common Patterns

Pattern 1: User Profiles (One-to-One)

CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    username VARCHAR(50) UNIQUE,
    password_hash VARCHAR(255)
);

CREATE TABLE user_profiles (
    user_id INTEGER PRIMARY KEY REFERENCES users(id),
    full_name VARCHAR(100),
    bio TEXT,
    avatar_url VARCHAR(255)
);

Pattern 2: Many-to-Many

CREATE TABLE tags (
    id SERIAL PRIMARY KEY,
    name VARCHAR(50) UNIQUE
);

CREATE TABLE posts (
    id SERIAL PRIMARY KEY,
    title VARCHAR(200),
    content TEXT
);

CREATE TABLE post_tags (
    post_id INTEGER REFERENCES posts(id),
    tag_id INTEGER REFERENCES tags(id),
    PRIMARY KEY (post_id, tag_id)
);

Pattern 3: Lookup Tables

CREATE TABLE order_statuses (
    id SERIAL PRIMARY KEY,
    name VARCHAR(20) UNIQUE  -- 'pending', 'shipped', 'delivered'
);

CREATE TABLE orders (
    id SERIAL PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id),
    status_id INTEGER REFERENCES order_statuses(id) DEFAULT 1
);

Practice Exercises

Exercise 1

Normalize this denormalized table:

CREATE TABLE library_bad (
    book_title VARCHAR(200),
    author_name VARCHAR(100),
    author_birth_year INTEGER,
    publisher_name VARCHAR(100),
    publisher_city VARCHAR(50),
    isbn VARCHAR(13),
    genre VARCHAR(50)
);

Solution

CREATE TABLE authors (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100),
    birth_year INTEGER
);

CREATE TABLE publishers (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100),
    city VARCHAR(50)
);

CREATE TABLE books (
    isbn VARCHAR(13) PRIMARY KEY,
    title VARCHAR(200),
    author_id INTEGER REFERENCES authors(id),
    publisher_id INTEGER REFERENCES publishers(id),
    genre VARCHAR(50)
);

Key Takeaways

✅ 1NF: Atomic values, no repeating groups
✅ 2NF: No partial dependencies on composite keys
✅ 3NF: No transitive dependencies
✅ Normalization reduces redundancy and prevents anomalies
✅ Denormalization can improve read performance at cost of integrity
✅ Materialized views offer denormalized performance with normalized integrity

Next Steps

You've mastered normalization! In the final lesson of Module 7, we'll explore Schema Design Best Practices—putting everything together to design robust, scalable databases.

Almost done with database design!

Normalization - Database Design Principles

Introduction

What if you store customer address in every order? When they move, you'd update hundreds of rows! This is a data anomaly caused by poor schema design.

Normalization is the process of organizing tables to minimize redundancy and dependency. It prevents anomalies and ensures data integrity.

The Problem: Denormalized Data

Bad Design Example

CREATE TABLE orders_bad (
    order_id INTEGER,
    customer_name VARCHAR(100),
    customer_email VARCHAR(255),
    customer_address TEXT,
    product_name VARCHAR(200),
    product_price DECIMAL(10,2),
    quantity INTEGER
);

Problems:

Update Anomaly: Customer moves? Update every order!
Insertion Anomaly: Can't add customer without an order
Deletion Anomaly: Delete last order? Lose customer data!
Data Redundancy: Customer info repeated for every order

The Solution: Normalized Design

CREATE TABLE customers (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    email VARCHAR(255) UNIQUE NOT NULL,
    address TEXT
);

CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    name VARCHAR(200) NOT NULL,
    price DECIMAL(10,2) NOT NULL
);

CREATE TABLE orders (
    id SERIAL PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id),
    order_date TIMESTAMPTZ DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE order_items (
    order_id INTEGER REFERENCES orders(id),
    product_id INTEGER REFERENCES products(id),
    quantity INTEGER NOT NULL,
    price DECIMAL(10,2) NOT NULL,  -- Price at time of purchase
    PRIMARY KEY (order_id, product_id)
);

Benefits:

Customer info stored once
Update address once
No redundancy

Normal Forms

First Normal Form (1NF)

Rule: Each column contains atomic (indivisible) values. No repeating groups.

❌ Violates 1NF:

CREATE TABLE students_bad (
    id INTEGER,
    name VARCHAR(100),
    courses VARCHAR(255)  -- 'Math,Science,History' (non-atomic!)
);

✅ Follows 1NF:

CREATE TABLE students (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100)
);

CREATE TABLE student_courses (
    student_id INTEGER REFERENCES students(id),
    course_name VARCHAR(100),
    PRIMARY KEY (student_id, course_name)
);

Second Normal Form (2NF)

Rule: 1NF + No partial dependencies (all non-key columns depend on the entire primary key).

❌ Violates 2NF:

CREATE TABLE order_items_bad (
    order_id INTEGER,
    product_id INTEGER,
    product_name VARCHAR(200),  -- Depends only on product_id!
    product_price DECIMAL(10,2),  -- Depends only on product_id!
    quantity INTEGER,
    PRIMARY KEY (order_id, product_id)
);

Problem: product_name and product_price depend only on product_id, not the full key!

✅ Follows 2NF:

CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    name VARCHAR(200),
    price DECIMAL(10,2)
);

CREATE TABLE order_items (
    order_id INTEGER,
    product_id INTEGER REFERENCES products(id),
    quantity INTEGER,
    PRIMARY KEY (order_id, product_id)
);

Third Normal Form (3NF)

Rule: 2NF + No transitive dependencies (non-key columns don't depend on other non-key columns).

❌ Violates 3NF:

CREATE TABLE employees_bad (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100),
    department_id INTEGER,
    department_name VARCHAR(100),  -- Depends on department_id!
    department_location VARCHAR(100)  -- Depends on department_id!
);

Problem: department_name depends on department_id (not directly on employee id).

✅ Follows 3NF:

CREATE TABLE departments (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100),
    location VARCHAR(100)
);

CREATE TABLE employees (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100),
    department_id INTEGER REFERENCES departments(id)
);

Practical Normalization Example

Step 1: Denormalized (Spreadsheet-style)

CREATE TABLE sales_denormalized (
    sale_id INTEGER,
    sale_date DATE,
    customer_name VARCHAR(100),
    customer_email VARCHAR(255),
    customer_city VARCHAR(50),
    product_name VARCHAR(200),
    product_category VARCHAR(50),
    quantity INTEGER,
    unit_price DECIMAL(10,2),
    total DECIMAL(10,2)
);

Step 2: Apply 1NF (Atomic Values)

Already atomic—no changes needed.

Step 3: Apply 2NF (Remove Partial Dependencies)

Extract customers and products:

CREATE TABLE customers (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100),
    email VARCHAR(255) UNIQUE,
    city VARCHAR(50)
);

CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    name VARCHAR(200),
    category VARCHAR(50),
    price DECIMAL(10,2)
);

CREATE TABLE sales (
    id SERIAL PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id),
    product_id INTEGER REFERENCES products(id),
    quantity INTEGER,
    unit_price DECIMAL(10,2),  -- Price at time of sale
    sale_date DATE
);

Step 4: Apply 3NF (Remove Transitive Dependencies)

Extract product categories:

CREATE TABLE categories (
    id SERIAL PRIMARY KEY,
    name VARCHAR(50) UNIQUE
);

CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    name VARCHAR(200),
    category_id INTEGER REFERENCES categories(id),
    price DECIMAL(10,2)
);

Final normalized schema:

customers
categories
products
sales

When to Denormalize

Normalization is ideal for data integrity, but sometimes performance requires denormalization.

Denormalization for Reads

-- Normalized (3 joins for a query)
SELECT c.name, cat.name, SUM(s.quantity)
FROM sales s
JOIN customers c ON s.customer_id = c.id
JOIN products p ON s.product_id = p.id
JOIN categories cat ON p.category_id = cat.id
GROUP BY c.name, cat.name;

-- Denormalized (faster, but redundant)
CREATE TABLE sales_denorm (
    id SERIAL PRIMARY KEY,
    customer_name VARCHAR(100),  -- Redundant
    product_name VARCHAR(200),    -- Redundant
    category_name VARCHAR(50),    -- Redundant
    quantity INTEGER,
    unit_price DECIMAL(10,2)
);

Trade-off:

✅ Faster reads (no joins)
❌ Slower writes (more data to update)
❌ Risk of inconsistency

Use cases:

Data warehouses
Read-heavy analytics
Reporting tables

Materialized Views (Best of Both Worlds)

CREATE MATERIALIZED VIEW sales_summary AS
SELECT
    c.name AS customer_name,
    cat.name AS category,
    SUM(s.quantity) AS total_quantity
FROM sales s
JOIN customers c ON s.customer_id = c.id
JOIN products p ON s.product_id = p.id
JOIN categories cat ON p.category_id = cat.id
GROUP BY c.name, cat.name;

-- Refresh periodically
REFRESH MATERIALIZED VIEW sales_summary;

Normalization Checklist

✅ Each table has a primary key
✅ No repeating groups (1NF)
✅ All columns depend on the entire key (2NF)
✅ No transitive dependencies (3NF)
✅ Foreign keys reference parent tables
✅ No redundant data

Common Patterns

Pattern 1: User Profiles (One-to-One)

CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    username VARCHAR(50) UNIQUE,
    password_hash VARCHAR(255)
);

CREATE TABLE user_profiles (
    user_id INTEGER PRIMARY KEY REFERENCES users(id),
    full_name VARCHAR(100),
    bio TEXT,
    avatar_url VARCHAR(255)
);

Pattern 2: Many-to-Many

CREATE TABLE tags (
    id SERIAL PRIMARY KEY,
    name VARCHAR(50) UNIQUE
);

CREATE TABLE posts (
    id SERIAL PRIMARY KEY,
    title VARCHAR(200),
    content TEXT
);

CREATE TABLE post_tags (
    post_id INTEGER REFERENCES posts(id),
    tag_id INTEGER REFERENCES tags(id),
    PRIMARY KEY (post_id, tag_id)
);

Pattern 3: Lookup Tables

CREATE TABLE order_statuses (
    id SERIAL PRIMARY KEY,
    name VARCHAR(20) UNIQUE  -- 'pending', 'shipped', 'delivered'
);

CREATE TABLE orders (
    id SERIAL PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id),
    status_id INTEGER REFERENCES order_statuses(id) DEFAULT 1
);

Practice Exercises

Exercise 1

Normalize this denormalized table:

CREATE TABLE library_bad (
    book_title VARCHAR(200),
    author_name VARCHAR(100),
    author_birth_year INTEGER,
    publisher_name VARCHAR(100),
    publisher_city VARCHAR(50),
    isbn VARCHAR(13),
    genre VARCHAR(50)
);

Solution

CREATE TABLE authors (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100),
    birth_year INTEGER
);

CREATE TABLE publishers (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100),
    city VARCHAR(50)
);

CREATE TABLE books (
    isbn VARCHAR(13) PRIMARY KEY,
    title VARCHAR(200),
    author_id INTEGER REFERENCES authors(id),
    publisher_id INTEGER REFERENCES publishers(id),
    genre VARCHAR(50)
);

Key Takeaways

✅ 1NF: Atomic values, no repeating groups
✅ 2NF: No partial dependencies on composite keys
✅ 3NF: No transitive dependencies
✅ Normalization reduces redundancy and prevents anomalies
✅ Denormalization can improve read performance at cost of integrity
✅ Materialized views offer denormalized performance with normalized integrity

Next Steps

You've mastered normalization! In the final lesson of Module 7, we'll explore Schema Design Best Practices—putting everything together to design robust, scalable databases.

Almost done with database design!

Normalization - Database Design Principles

Introduction

The Problem: Denormalized Data

Bad Design Example

The Solution: Normalized Design

Normal Forms

First Normal Form (1NF)

Second Normal Form (2NF)

Third Normal Form (3NF)

Practical Normalization Example

Step 1: Denormalized (Spreadsheet-style)

Step 2: Apply 1NF (Atomic Values)

Step 3: Apply 2NF (Remove Partial Dependencies)

Step 4: Apply 3NF (Remove Transitive Dependencies)

When to Denormalize

Denormalization for Reads

Materialized Views (Best of Both Worlds)

Normalization Checklist

Common Patterns

Pattern 1: User Profiles (One-to-One)

Pattern 2: Many-to-Many

Pattern 3: Lookup Tables

Practice Exercises

Exercise 1

Key Takeaways

Next Steps

Quiz

Normalization - Database Design Principles

Introduction

The Problem: Denormalized Data

Bad Design Example

The Solution: Normalized Design

Normal Forms

First Normal Form (1NF)

Second Normal Form (2NF)

Third Normal Form (3NF)

Practical Normalization Example

Step 1: Denormalized (Spreadsheet-style)

Step 2: Apply 1NF (Atomic Values)

Step 3: Apply 2NF (Remove Partial Dependencies)

Step 4: Apply 3NF (Remove Transitive Dependencies)

When to Denormalize

Denormalization for Reads

Materialized Views (Best of Both Worlds)

Normalization Checklist

Common Patterns

Pattern 1: User Profiles (One-to-One)

Pattern 2: Many-to-Many

Pattern 3: Lookup Tables

Practice Exercises

Exercise 1

Key Takeaways

Next Steps

Quiz