GROUP BY - Grouping Data for Analysis

Introduction

Aggregate functions are powerful, but what if you want to calculate aggregates for each category instead of the entire table?

"How many orders per customer?"
"Total revenue per month?"
"Average price per product category?"

GROUP BY divides rows into groups and calculates aggregate functions for each group separately. It's one of SQL's most powerful features for data analysis and reporting.

The Problem Without GROUP BY

Aggregating Everything

-- Total order count (all customers combined)
SELECT COUNT(*) FROM orders;

Result:

 count
-------
  1000

Question: How many orders does each customer have?

You can't do this without GROUP BY!

GROUP BY Syntax

SELECT
    column1,
    column2,
    AGGREGATE_FUNCTION(column3)
FROM table
GROUP BY column1, column2;

Basic GROUP BY Example

-- Order count per customer
SELECT
    customer_id,
    COUNT(*) AS order_count
FROM orders
GROUP BY customer_id;

Result:

 customer_id | order_count
-------------+-------------
      1      |      12
      2      |       5
      3      |       8
      4      |       0  ← Won't appear (no orders)
     ...

Explanation:

Rows are grouped by customer_id
COUNT(*) is calculated for each group
Each unique customer_id gets one result row

How GROUP BY Works

Execution Order

SELECT customer_id, COUNT(*)
FROM orders
WHERE total > 50
GROUP BY customer_id
ORDER BY COUNT(*) DESC;

Execution steps:

FROM: Get data from orders table
WHERE: Filter orders with total > 50
GROUP BY: Group remaining rows by customer_id
SELECT: Calculate COUNT(*) for each group
ORDER BY: Sort results

Visualizing GROUP BY

Orders Table:
┌────┬─────────────┬───────┐
│ id │ customer_id │ total │
├────┼─────────────┼───────┤
│ 1  │      1      │ 100   │  ┐
│ 2  │      1      │  75   │  ├─ Group 1 (customer_id = 1)
│ 3  │      1      │ 200   │  ┘
│ 4  │      2      │ 150   │  ┐
│ 5  │      2      │  90   │  ├─ Group 2 (customer_id = 2)
│ 6  │      2      │ 120   │  ┘
│ 7  │      3      │  80   │  ──── Group 3 (customer_id = 3)
└────┴─────────────┴───────┘

After GROUP BY customer_id:
┌─────────────┬──────────────┬───────────┐
│ customer_id │ order_count  │ total_sum │
├─────────────┼──────────────┼───────────┤
│      1      │       3      │    375    │
│      2      │       3      │    360    │
│      3      │       1      │     80    │
└─────────────┴──────────────┴───────────┘

Grouping by Single Column

Count Orders per Customer

SELECT
    customer_id,
    COUNT(*) AS order_count
FROM orders
GROUP BY customer_id
ORDER BY order_count DESC;

Sum Revenue per Product

SELECT
    product_id,
    SUM(quantity * price) AS total_revenue
FROM order_items
GROUP BY product_id
ORDER BY total_revenue DESC;

Average Price per Category

SELECT
    category,
    COUNT(*) AS product_count,
    ROUND(AVG(price), 2) AS avg_price
FROM products
GROUP BY category;

Result:

   category   | product_count | avg_price
--------------+---------------+-----------
 Electronics  |      45       |   324.99
 Clothing     |     120       |    45.75
 Books        |      78       |    18.50

Grouping by Multiple Columns

Order Count by Customer and Year

SELECT
    customer_id,
    EXTRACT(YEAR FROM order_date) AS year,
    COUNT(*) AS order_count,
    SUM(total) AS total_spent
FROM orders
GROUP BY customer_id, EXTRACT(YEAR FROM order_date)
ORDER BY customer_id, year;

Result:

 customer_id | year | order_count | total_spent
-------------+------+-------------+-------------
      1      | 2023 |      5      |    487.25
      1      | 2024 |      7      |    687.50
      2      | 2023 |      2      |    150.00
      2      | 2024 |      3      |    425.75

Each unique combination of (customer_id, year) creates a separate group.

Sales by Category and Month

SELECT
    p.category,
    DATE_TRUNC('month', o.order_date) AS month,
    SUM(oi.quantity) AS units_sold,
    SUM(oi.quantity * oi.price) AS revenue
FROM products p
INNER JOIN order_items oi ON p.id = oi.product_id
INNER JOIN orders o ON oi.order_id = o.id
GROUP BY p.category, DATE_TRUNC('month', o.order_date)
ORDER BY category, month;

GROUP BY with JOINs

Customer Purchase Summary

SELECT
    c.id,
    c.name,
    COUNT(o.id) AS order_count,
    COALESCE(SUM(o.total), 0) AS total_spent,
    COALESCE(AVG(o.total), 0) AS avg_order_value
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
GROUP BY c.id, c.name
ORDER BY total_spent DESC;

Result:

 id |      name       | order_count | total_spent | avg_order_value
----+-----------------+-------------+-------------+-----------------
  1 | Alice Smith     |      12     |    1547.82  |      128.99
  2 | Bob Jones       |       5     |     687.25  |      137.45
  3 | Carol White     |       0     |       0.00  |        0.00

Important: Use LEFT JOIN to include customers with 0 orders!

Product Sales by Category

SELECT
    p.category,
    COUNT(DISTINCT p.id) AS product_count,
    COALESCE(SUM(oi.quantity), 0) AS units_sold,
    COALESCE(SUM(oi.quantity * oi.price), 0) AS revenue
FROM products p
LEFT JOIN order_items oi ON p.id = oi.product_id
GROUP BY p.category
ORDER BY revenue DESC;

The GROUP BY Rule

You Must Include Non-Aggregated Columns in GROUP BY

-- ❌ Error: name must be in GROUP BY
SELECT
    customer_id,
    name,           -- Not aggregated, not in GROUP BY!
    COUNT(*)
FROM orders
GROUP BY customer_id;

-- ✅ Correct
SELECT
    customer_id,
    name,
    COUNT(*)
FROM orders
GROUP BY customer_id, name;

Rule: Every column in SELECT must be either:

In the GROUP BY clause, OR
Inside an aggregate function

Valid Examples

-- ✅ All non-aggregates in GROUP BY
SELECT
    customer_id,
    product_id,
    SUM(quantity)
FROM order_items
GROUP BY customer_id, product_id;

-- ✅ Only aggregates in SELECT
SELECT
    COUNT(*),
    SUM(total),
    AVG(total)
FROM orders;

-- ✅ Mix of grouped columns and aggregates
SELECT
    category,
    COUNT(*),
    AVG(price)
FROM products
GROUP BY category;

Grouping by Expressions

Group by Year

SELECT
    EXTRACT(YEAR FROM order_date) AS year,
    COUNT(*) AS order_count,
    SUM(total) AS revenue
FROM orders
GROUP BY EXTRACT(YEAR FROM order_date)
ORDER BY year;

Group by Month

SELECT
    DATE_TRUNC('month', order_date) AS month,
    COUNT(*) AS orders,
    SUM(total) AS revenue
FROM orders
WHERE order_date >= '2024-01-01'
GROUP BY DATE_TRUNC('month', order_date)
ORDER BY month;

Result:

        month        | orders | revenue
---------------------+--------+---------
 2024-01-01 00:00:00 |    87  | 10247.50
 2024-02-01 00:00:00 |    92  | 11580.25
 2024-03-01 00:00:00 |   104  | 13245.75

Group by Price Range

SELECT
    CASE
        WHEN price < 20 THEN 'Budget'
        WHEN price < 100 THEN 'Mid-range'
        ELSE 'Premium'
    END AS price_tier,
    COUNT(*) AS product_count,
    ROUND(AVG(price), 2) AS avg_price
FROM products
GROUP BY
    CASE
        WHEN price < 20 THEN 'Budget'
        WHEN price < 100 THEN 'Mid-range'
        ELSE 'Premium'
    END
ORDER BY avg_price;

Result:

 price_tier | product_count | avg_price
------------+---------------+-----------
 Budget     |      78       |    12.45
 Mid-range  |     120       |    54.80
 Premium    |      45       |   324.99

DISTINCT vs GROUP BY

Finding Unique Values

-- Using DISTINCT
SELECT DISTINCT category
FROM products;

-- Using GROUP BY
SELECT category
FROM products
GROUP BY category;

Both return the same result, but GROUP BY allows aggregates:

-- ✅ With GROUP BY
SELECT
    category,
    COUNT(*) AS product_count
FROM products
GROUP BY category;

-- ❌ Can't do this with DISTINCT
SELECT DISTINCT
    category,
    COUNT(*)  -- Error!
FROM products;

Use DISTINCT for unique values, GROUP BY for aggregates.

Common GROUP BY Patterns

Pattern 1: Top N per Category

-- Top 3 products per category by revenue
WITH product_revenue AS (
    SELECT
        p.category,
        p.name,
        SUM(oi.quantity * oi.price) AS revenue,
        ROW_NUMBER() OVER (PARTITION BY p.category ORDER BY SUM(oi.quantity * oi.price) DESC) AS rank
    FROM products p
    INNER JOIN order_items oi ON p.id = oi.product_id
    GROUP BY p.category, p.id, p.name
)
SELECT category, name, revenue
FROM product_revenue
WHERE rank <= 3
ORDER BY category, rank;

(Window functions covered in advanced courses)

Pattern 2: Percentage of Total

-- Each category's share of total revenue
SELECT
    category,
    SUM(quantity * price) AS revenue,
    ROUND(100.0 * SUM(quantity * price) / (SELECT SUM(quantity * price) FROM order_items), 2) AS pct_of_total
FROM products p
INNER JOIN order_items oi ON p.id = oi.product_id
GROUP BY category
ORDER BY revenue DESC;

Result:

   category   |  revenue  | pct_of_total
--------------+-----------+--------------
 Electronics  |  87500.25 |        45.67
 Clothing     |  52400.75 |        27.34
 Books        |  32100.50 |        16.75

Pattern 3: Cohort Analysis

-- Customer cohorts by signup month
SELECT
    DATE_TRUNC('month', created_at) AS cohort_month,
    COUNT(*) AS new_customers,
    SUM(COUNT(*)) OVER (ORDER BY DATE_TRUNC('month', created_at)) AS cumulative_customers
FROM customers
GROUP BY DATE_TRUNC('month', created_at)
ORDER BY cohort_month;

Pattern 4: Daily/Weekly/Monthly Aggregates

-- Daily order summary
SELECT
    DATE(order_date) AS date,
    COUNT(*) AS orders,
    SUM(total) AS revenue,
    AVG(total) AS avg_order_value
FROM orders
WHERE order_date >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY DATE(order_date)
ORDER BY date DESC;

GROUP BY with NULL Values

NULL Creates Its Own Group

CREATE TABLE employees (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100),
    department VARCHAR(50)
);

INSERT INTO employees VALUES
    (1, 'Alice', 'Sales'),
    (2, 'Bob', 'Sales'),
    (3, 'Carol', NULL),      -- No department
    (4, 'Dave', NULL);       -- No department

SELECT
    department,
    COUNT(*) AS employee_count
FROM employees
GROUP BY department;

Result:

 department | employee_count
------------+----------------
 Sales      |       2
 NULL       |       2        ← NULL is a group

Handling NULL in Groups

SELECT
    COALESCE(department, 'Unassigned') AS department,
    COUNT(*) AS employee_count
FROM employees
GROUP BY department;

Result:

 department  | employee_count
-------------+----------------
 Sales       |       2
 Unassigned  |       2

Performance Tips

Index Grouped Columns

-- If you frequently GROUP BY category
CREATE INDEX idx_products_category ON products(category);

-- Speeds up this query:
SELECT category, COUNT(*)
FROM products
GROUP BY category;

Filter Before Grouping

-- ✅ Better: Filter first
SELECT
    category,
    COUNT(*)
FROM products
WHERE active = TRUE  -- Filter reduces rows before grouping
GROUP BY category;

-- ❌ Slower: Group everything, then filter
SELECT
    category,
    COUNT(*)
FROM products
GROUP BY category
HAVING SUM(CASE WHEN active = TRUE THEN 1 ELSE 0 END) > 0;

Limit Groups with LIMIT

-- Top 10 categories by product count
SELECT
    category,
    COUNT(*) AS product_count
FROM products
GROUP BY category
ORDER BY product_count DESC
LIMIT 10;

Common Mistakes

Mistake 1: Selecting Non-Grouped Columns

-- ❌ Error
SELECT
    customer_id,
    name,           -- Must be in GROUP BY!
    COUNT(*)
FROM orders
GROUP BY customer_id;

Mistake 2: Using WHERE Instead of HAVING

-- ❌ Error: Can't use aggregate in WHERE
SELECT customer_id, COUNT(*)
FROM orders
WHERE COUNT(*) > 5  -- Error!
GROUP BY customer_id;

-- ✅ Use HAVING (next lesson)
SELECT customer_id, COUNT(*)
FROM orders
GROUP BY customer_id
HAVING COUNT(*) > 5;

Mistake 3: Forgetting COALESCE with LEFT JOIN

-- ❌ Wrong: Customers without orders won't show 0
SELECT
    c.name,
    SUM(o.total) AS total_spent
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
GROUP BY c.id, c.name;
-- Carol with no orders: total_spent = NULL

-- ✅ Correct
SELECT
    c.name,
    COALESCE(SUM(o.total), 0) AS total_spent
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
GROUP BY c.id, c.name;
-- Carol with no orders: total_spent = 0

Practice Exercises

Exercise 1

Find the number of products in each category.

Solution

SELECT
    category,
    COUNT(*) AS product_count
FROM products
GROUP BY category
ORDER BY product_count DESC;

Exercise 2

Calculate total revenue per customer (include all customers, even those with $0).

Solution

SELECT
    c.id,
    c.name,
    COALESCE(SUM(o.total), 0) AS total_spent
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
GROUP BY c.id, c.name
ORDER BY total_spent DESC;

Exercise 3

Find monthly revenue for 2024.

Solution

SELECT
    DATE_TRUNC('month', order_date) AS month,
    SUM(total) AS revenue
FROM orders
WHERE EXTRACT(YEAR FROM order_date) = 2024
GROUP BY DATE_TRUNC('month', order_date)
ORDER BY month;

Exercise 4

Count how many products fall into each price tier (Budget < $20, Mid-range $20-$100, Premium > $100).

Solution

SELECT
    CASE
        WHEN price < 20 THEN 'Budget'
        WHEN price <= 100 THEN 'Mid-range'
        ELSE 'Premium'
    END AS price_tier,
    COUNT(*) AS product_count
FROM products
GROUP BY
    CASE
        WHEN price < 20 THEN 'Budget'
        WHEN price <= 100 THEN 'Mid-range'
        ELSE 'Premium'
    END
ORDER BY product_count DESC;

Key Takeaways

✅ GROUP BY divides rows into groups and calculates aggregates per group
✅ Every non-aggregated column in SELECT must be in GROUP BY
✅ Execution order: FROM → WHERE → GROUP BY → SELECT → ORDER BY
✅ Group by multiple columns for finer-grained analysis
✅ Use with JOINs to aggregate across tables
✅ NULL values create their own group
✅ Index grouped columns for better performance
✅ COALESCE handles NULLs in aggregates with LEFT JOIN
✅ Filter before grouping with WHERE (filter groups with HAVING)

Next Steps

You've mastered GROUP BY! In the next lesson, we'll learn HAVING—how to filter groups based on aggregate conditions (e.g., "customers with more than 5 orders" or "categories with average price > $100").

Almost done with Module 5!

GROUP BY - Grouping Data for Analysis

Introduction

Aggregate functions are powerful, but what if you want to calculate aggregates for each category instead of the entire table?

"How many orders per customer?"
"Total revenue per month?"
"Average price per product category?"

GROUP BY divides rows into groups and calculates aggregate functions for each group separately. It's one of SQL's most powerful features for data analysis and reporting.

The Problem Without GROUP BY

Aggregating Everything

-- Total order count (all customers combined)
SELECT COUNT(*) FROM orders;

Result:

 count
-------
  1000

Question: How many orders does each customer have?

You can't do this without GROUP BY!

GROUP BY Syntax

SELECT
    column1,
    column2,
    AGGREGATE_FUNCTION(column3)
FROM table
GROUP BY column1, column2;

Basic GROUP BY Example

-- Order count per customer
SELECT
    customer_id,
    COUNT(*) AS order_count
FROM orders
GROUP BY customer_id;

Result:

 customer_id | order_count
-------------+-------------
      1      |      12
      2      |       5
      3      |       8
      4      |       0  ← Won't appear (no orders)
     ...

Explanation:

Rows are grouped by customer_id
COUNT(*) is calculated for each group
Each unique customer_id gets one result row

How GROUP BY Works

Execution Order

SELECT customer_id, COUNT(*)
FROM orders
WHERE total > 50
GROUP BY customer_id
ORDER BY COUNT(*) DESC;

Execution steps:

FROM: Get data from orders table
WHERE: Filter orders with total > 50
GROUP BY: Group remaining rows by customer_id
SELECT: Calculate COUNT(*) for each group
ORDER BY: Sort results

Visualizing GROUP BY

Orders Table:
┌────┬─────────────┬───────┐
│ id │ customer_id │ total │
├────┼─────────────┼───────┤
│ 1  │      1      │ 100   │  ┐
│ 2  │      1      │  75   │  ├─ Group 1 (customer_id = 1)
│ 3  │      1      │ 200   │  ┘
│ 4  │      2      │ 150   │  ┐
│ 5  │      2      │  90   │  ├─ Group 2 (customer_id = 2)
│ 6  │      2      │ 120   │  ┘
│ 7  │      3      │  80   │  ──── Group 3 (customer_id = 3)
└────┴─────────────┴───────┘

After GROUP BY customer_id:
┌─────────────┬──────────────┬───────────┐
│ customer_id │ order_count  │ total_sum │
├─────────────┼──────────────┼───────────┤
│      1      │       3      │    375    │
│      2      │       3      │    360    │
│      3      │       1      │     80    │
└─────────────┴──────────────┴───────────┘

Grouping by Single Column

Count Orders per Customer

SELECT
    customer_id,
    COUNT(*) AS order_count
FROM orders
GROUP BY customer_id
ORDER BY order_count DESC;

Sum Revenue per Product

SELECT
    product_id,
    SUM(quantity * price) AS total_revenue
FROM order_items
GROUP BY product_id
ORDER BY total_revenue DESC;

Average Price per Category

SELECT
    category,
    COUNT(*) AS product_count,
    ROUND(AVG(price), 2) AS avg_price
FROM products
GROUP BY category;

Result:

   category   | product_count | avg_price
--------------+---------------+-----------
 Electronics  |      45       |   324.99
 Clothing     |     120       |    45.75
 Books        |      78       |    18.50

Grouping by Multiple Columns

Order Count by Customer and Year

SELECT
    customer_id,
    EXTRACT(YEAR FROM order_date) AS year,
    COUNT(*) AS order_count,
    SUM(total) AS total_spent
FROM orders
GROUP BY customer_id, EXTRACT(YEAR FROM order_date)
ORDER BY customer_id, year;

Result:

 customer_id | year | order_count | total_spent
-------------+------+-------------+-------------
      1      | 2023 |      5      |    487.25
      1      | 2024 |      7      |    687.50
      2      | 2023 |      2      |    150.00
      2      | 2024 |      3      |    425.75

Each unique combination of (customer_id, year) creates a separate group.

Sales by Category and Month

SELECT
    p.category,
    DATE_TRUNC('month', o.order_date) AS month,
    SUM(oi.quantity) AS units_sold,
    SUM(oi.quantity * oi.price) AS revenue
FROM products p
INNER JOIN order_items oi ON p.id = oi.product_id
INNER JOIN orders o ON oi.order_id = o.id
GROUP BY p.category, DATE_TRUNC('month', o.order_date)
ORDER BY category, month;

GROUP BY with JOINs

Customer Purchase Summary

SELECT
    c.id,
    c.name,
    COUNT(o.id) AS order_count,
    COALESCE(SUM(o.total), 0) AS total_spent,
    COALESCE(AVG(o.total), 0) AS avg_order_value
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
GROUP BY c.id, c.name
ORDER BY total_spent DESC;

Result:

 id |      name       | order_count | total_spent | avg_order_value
----+-----------------+-------------+-------------+-----------------
  1 | Alice Smith     |      12     |    1547.82  |      128.99
  2 | Bob Jones       |       5     |     687.25  |      137.45
  3 | Carol White     |       0     |       0.00  |        0.00

Important: Use LEFT JOIN to include customers with 0 orders!

Product Sales by Category

SELECT
    p.category,
    COUNT(DISTINCT p.id) AS product_count,
    COALESCE(SUM(oi.quantity), 0) AS units_sold,
    COALESCE(SUM(oi.quantity * oi.price), 0) AS revenue
FROM products p
LEFT JOIN order_items oi ON p.id = oi.product_id
GROUP BY p.category
ORDER BY revenue DESC;

The GROUP BY Rule

You Must Include Non-Aggregated Columns in GROUP BY

-- ❌ Error: name must be in GROUP BY
SELECT
    customer_id,
    name,           -- Not aggregated, not in GROUP BY!
    COUNT(*)
FROM orders
GROUP BY customer_id;

-- ✅ Correct
SELECT
    customer_id,
    name,
    COUNT(*)
FROM orders
GROUP BY customer_id, name;

Rule: Every column in SELECT must be either:

In the GROUP BY clause, OR
Inside an aggregate function

Valid Examples

-- ✅ All non-aggregates in GROUP BY
SELECT
    customer_id,
    product_id,
    SUM(quantity)
FROM order_items
GROUP BY customer_id, product_id;

-- ✅ Only aggregates in SELECT
SELECT
    COUNT(*),
    SUM(total),
    AVG(total)
FROM orders;

-- ✅ Mix of grouped columns and aggregates
SELECT
    category,
    COUNT(*),
    AVG(price)
FROM products
GROUP BY category;

Grouping by Expressions

Group by Year

SELECT
    EXTRACT(YEAR FROM order_date) AS year,
    COUNT(*) AS order_count,
    SUM(total) AS revenue
FROM orders
GROUP BY EXTRACT(YEAR FROM order_date)
ORDER BY year;

Group by Month

SELECT
    DATE_TRUNC('month', order_date) AS month,
    COUNT(*) AS orders,
    SUM(total) AS revenue
FROM orders
WHERE order_date >= '2024-01-01'
GROUP BY DATE_TRUNC('month', order_date)
ORDER BY month;

Result:

        month        | orders | revenue
---------------------+--------+---------
 2024-01-01 00:00:00 |    87  | 10247.50
 2024-02-01 00:00:00 |    92  | 11580.25
 2024-03-01 00:00:00 |   104  | 13245.75

Group by Price Range

SELECT
    CASE
        WHEN price < 20 THEN 'Budget'
        WHEN price < 100 THEN 'Mid-range'
        ELSE 'Premium'
    END AS price_tier,
    COUNT(*) AS product_count,
    ROUND(AVG(price), 2) AS avg_price
FROM products
GROUP BY
    CASE
        WHEN price < 20 THEN 'Budget'
        WHEN price < 100 THEN 'Mid-range'
        ELSE 'Premium'
    END
ORDER BY avg_price;

Result:

 price_tier | product_count | avg_price
------------+---------------+-----------
 Budget     |      78       |    12.45
 Mid-range  |     120       |    54.80
 Premium    |      45       |   324.99

DISTINCT vs GROUP BY

Finding Unique Values

-- Using DISTINCT
SELECT DISTINCT category
FROM products;

-- Using GROUP BY
SELECT category
FROM products
GROUP BY category;

Both return the same result, but GROUP BY allows aggregates:

-- ✅ With GROUP BY
SELECT
    category,
    COUNT(*) AS product_count
FROM products
GROUP BY category;

-- ❌ Can't do this with DISTINCT
SELECT DISTINCT
    category,
    COUNT(*)  -- Error!
FROM products;

Use DISTINCT for unique values, GROUP BY for aggregates.

Common GROUP BY Patterns

Pattern 1: Top N per Category

-- Top 3 products per category by revenue
WITH product_revenue AS (
    SELECT
        p.category,
        p.name,
        SUM(oi.quantity * oi.price) AS revenue,
        ROW_NUMBER() OVER (PARTITION BY p.category ORDER BY SUM(oi.quantity * oi.price) DESC) AS rank
    FROM products p
    INNER JOIN order_items oi ON p.id = oi.product_id
    GROUP BY p.category, p.id, p.name
)
SELECT category, name, revenue
FROM product_revenue
WHERE rank <= 3
ORDER BY category, rank;

(Window functions covered in advanced courses)

Pattern 2: Percentage of Total

-- Each category's share of total revenue
SELECT
    category,
    SUM(quantity * price) AS revenue,
    ROUND(100.0 * SUM(quantity * price) / (SELECT SUM(quantity * price) FROM order_items), 2) AS pct_of_total
FROM products p
INNER JOIN order_items oi ON p.id = oi.product_id
GROUP BY category
ORDER BY revenue DESC;

Result:

   category   |  revenue  | pct_of_total
--------------+-----------+--------------
 Electronics  |  87500.25 |        45.67
 Clothing     |  52400.75 |        27.34
 Books        |  32100.50 |        16.75

Pattern 3: Cohort Analysis

-- Customer cohorts by signup month
SELECT
    DATE_TRUNC('month', created_at) AS cohort_month,
    COUNT(*) AS new_customers,
    SUM(COUNT(*)) OVER (ORDER BY DATE_TRUNC('month', created_at)) AS cumulative_customers
FROM customers
GROUP BY DATE_TRUNC('month', created_at)
ORDER BY cohort_month;

Pattern 4: Daily/Weekly/Monthly Aggregates

-- Daily order summary
SELECT
    DATE(order_date) AS date,
    COUNT(*) AS orders,
    SUM(total) AS revenue,
    AVG(total) AS avg_order_value
FROM orders
WHERE order_date >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY DATE(order_date)
ORDER BY date DESC;

GROUP BY with NULL Values

NULL Creates Its Own Group

CREATE TABLE employees (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100),
    department VARCHAR(50)
);

INSERT INTO employees VALUES
    (1, 'Alice', 'Sales'),
    (2, 'Bob', 'Sales'),
    (3, 'Carol', NULL),      -- No department
    (4, 'Dave', NULL);       -- No department

SELECT
    department,
    COUNT(*) AS employee_count
FROM employees
GROUP BY department;

Result:

 department | employee_count
------------+----------------
 Sales      |       2
 NULL       |       2        ← NULL is a group

Handling NULL in Groups

SELECT
    COALESCE(department, 'Unassigned') AS department,
    COUNT(*) AS employee_count
FROM employees
GROUP BY department;

Result:

 department  | employee_count
-------------+----------------
 Sales       |       2
 Unassigned  |       2

Performance Tips

Index Grouped Columns

-- If you frequently GROUP BY category
CREATE INDEX idx_products_category ON products(category);

-- Speeds up this query:
SELECT category, COUNT(*)
FROM products
GROUP BY category;

Filter Before Grouping

-- ✅ Better: Filter first
SELECT
    category,
    COUNT(*)
FROM products
WHERE active = TRUE  -- Filter reduces rows before grouping
GROUP BY category;

-- ❌ Slower: Group everything, then filter
SELECT
    category,
    COUNT(*)
FROM products
GROUP BY category
HAVING SUM(CASE WHEN active = TRUE THEN 1 ELSE 0 END) > 0;

Limit Groups with LIMIT

-- Top 10 categories by product count
SELECT
    category,
    COUNT(*) AS product_count
FROM products
GROUP BY category
ORDER BY product_count DESC
LIMIT 10;

Common Mistakes

Mistake 1: Selecting Non-Grouped Columns

-- ❌ Error
SELECT
    customer_id,
    name,           -- Must be in GROUP BY!
    COUNT(*)
FROM orders
GROUP BY customer_id;

Mistake 2: Using WHERE Instead of HAVING

-- ❌ Error: Can't use aggregate in WHERE
SELECT customer_id, COUNT(*)
FROM orders
WHERE COUNT(*) > 5  -- Error!
GROUP BY customer_id;

-- ✅ Use HAVING (next lesson)
SELECT customer_id, COUNT(*)
FROM orders
GROUP BY customer_id
HAVING COUNT(*) > 5;

Mistake 3: Forgetting COALESCE with LEFT JOIN

-- ❌ Wrong: Customers without orders won't show 0
SELECT
    c.name,
    SUM(o.total) AS total_spent
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
GROUP BY c.id, c.name;
-- Carol with no orders: total_spent = NULL

-- ✅ Correct
SELECT
    c.name,
    COALESCE(SUM(o.total), 0) AS total_spent
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
GROUP BY c.id, c.name;
-- Carol with no orders: total_spent = 0

Practice Exercises

Exercise 1

Find the number of products in each category.

Solution

SELECT
    category,
    COUNT(*) AS product_count
FROM products
GROUP BY category
ORDER BY product_count DESC;

Exercise 2

Calculate total revenue per customer (include all customers, even those with $0).

Solution

SELECT
    c.id,
    c.name,
    COALESCE(SUM(o.total), 0) AS total_spent
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
GROUP BY c.id, c.name
ORDER BY total_spent DESC;

Exercise 3

Find monthly revenue for 2024.

Solution

SELECT
    DATE_TRUNC('month', order_date) AS month,
    SUM(total) AS revenue
FROM orders
WHERE EXTRACT(YEAR FROM order_date) = 2024
GROUP BY DATE_TRUNC('month', order_date)
ORDER BY month;

Exercise 4

Count how many products fall into each price tier (Budget < $20, Mid-range $20-$100, Premium > $100).

Solution

SELECT
    CASE
        WHEN price < 20 THEN 'Budget'
        WHEN price <= 100 THEN 'Mid-range'
        ELSE 'Premium'
    END AS price_tier,
    COUNT(*) AS product_count
FROM products
GROUP BY
    CASE
        WHEN price < 20 THEN 'Budget'
        WHEN price <= 100 THEN 'Mid-range'
        ELSE 'Premium'
    END
ORDER BY product_count DESC;

Key Takeaways

✅ GROUP BY divides rows into groups and calculates aggregates per group
✅ Every non-aggregated column in SELECT must be in GROUP BY
✅ Execution order: FROM → WHERE → GROUP BY → SELECT → ORDER BY
✅ Group by multiple columns for finer-grained analysis
✅ Use with JOINs to aggregate across tables
✅ NULL values create their own group
✅ Index grouped columns for better performance
✅ COALESCE handles NULLs in aggregates with LEFT JOIN
✅ Filter before grouping with WHERE (filter groups with HAVING)

Next Steps

Almost done with Module 5!

GROUP BY - Grouping Data for Analysis

Introduction

The Problem Without GROUP BY

Aggregating Everything

GROUP BY Syntax

Basic GROUP BY Example

How GROUP BY Works

Execution Order

Visualizing GROUP BY

Grouping by Single Column

Count Orders per Customer

Sum Revenue per Product

Average Price per Category

Grouping by Multiple Columns

Order Count by Customer and Year

Sales by Category and Month

GROUP BY with JOINs

Customer Purchase Summary

Product Sales by Category

The GROUP BY Rule

You Must Include Non-Aggregated Columns in GROUP BY

Valid Examples

Grouping by Expressions

Group by Year

Group by Month

Group by Price Range

DISTINCT vs GROUP BY

Finding Unique Values

Common GROUP BY Patterns

Pattern 1: Top N per Category

Pattern 2: Percentage of Total

Pattern 3: Cohort Analysis

Pattern 4: Daily/Weekly/Monthly Aggregates

GROUP BY with NULL Values

NULL Creates Its Own Group

Handling NULL in Groups

Performance Tips

Index Grouped Columns

Filter Before Grouping

Limit Groups with LIMIT

Common Mistakes

Mistake 1: Selecting Non-Grouped Columns

Mistake 2: Using WHERE Instead of HAVING

Mistake 3: Forgetting COALESCE with LEFT JOIN

Practice Exercises

Exercise 1

Exercise 2

Exercise 3

Exercise 4

Key Takeaways

Next Steps

Quiz

GROUP BY - Grouping Data for Analysis

Introduction

The Problem Without GROUP BY

Aggregating Everything

GROUP BY Syntax

Basic GROUP BY Example

How GROUP BY Works

Execution Order

Visualizing GROUP BY

Grouping by Single Column

Count Orders per Customer

Sum Revenue per Product

Average Price per Category

Grouping by Multiple Columns

Order Count by Customer and Year

Sales by Category and Month

GROUP BY with JOINs

Customer Purchase Summary

Product Sales by Category

The GROUP BY Rule

You Must Include Non-Aggregated Columns in GROUP BY

Valid Examples

Grouping by Expressions

Group by Year

Group by Month

Group by Price Range

DISTINCT vs GROUP BY

Finding Unique Values