GROUP BY - Grouping Data for Analysis
Introduction
Aggregate functions are powerful, but what if you want to calculate aggregates for each category instead of the entire table?
- "How many orders per customer?"
- "Total revenue per month?"
- "Average price per product category?"
GROUP BY divides rows into groups and calculates aggregate functions for each group separately. It's one of SQL's most powerful features for data analysis and reporting.
The Problem Without GROUP BY
Aggregating Everything
-- Total order count (all customers combined)
SELECT COUNT(*) FROM orders;
Result:
count
-------
1000
Question: How many orders does each customer have?
You can't do this without GROUP BY!
GROUP BY Syntax
SELECT
column1,
column2,
AGGREGATE_FUNCTION(column3)
FROM table
GROUP BY column1, column2;
Basic GROUP BY Example
-- Order count per customer
SELECT
customer_id,
COUNT(*) AS order_count
FROM orders
GROUP BY customer_id;
Result:
customer_id | order_count
-------------+-------------
1 | 12
2 | 5
3 | 8
4 | 0 ← Won't appear (no orders)
...
Explanation:
- Rows are grouped by
customer_id COUNT(*)is calculated for each group- Each unique
customer_idgets one result row
How GROUP BY Works
Execution Order
SELECT customer_id, COUNT(*)
FROM orders
WHERE total > 50
GROUP BY customer_id
ORDER BY COUNT(*) DESC;
Execution steps:
- FROM: Get data from orders table
- WHERE: Filter orders with total > 50
- GROUP BY: Group remaining rows by customer_id
- SELECT: Calculate COUNT(*) for each group
- ORDER BY: Sort results
Visualizing GROUP BY
Orders Table:
┌────┬─────────────┬───────┐
│ id │ customer_id │ total │
├────┼─────────────┼───────┤
│ 1 │ 1 │ 100 │ ┐
│ 2 │ 1 │ 75 │ ├─ Group 1 (customer_id = 1)
│ 3 │ 1 │ 200 │ ┘
│ 4 │ 2 │ 150 │ ┐
│ 5 │ 2 │ 90 │ ├─ Group 2 (customer_id = 2)
│ 6 │ 2 │ 120 │ ┘
│ 7 │ 3 │ 80 │ ──── Group 3 (customer_id = 3)
└────┴─────────────┴───────┘
After GROUP BY customer_id:
┌─────────────┬──────────────┬───────────┐
│ customer_id │ order_count │ total_sum │
├─────────────┼──────────────┼───────────┤
│ 1 │ 3 │ 375 │
│ 2 │ 3 │ 360 │
│ 3 │ 1 │ 80 │
└─────────────┴──────────────┴───────────┘
Grouping by Single Column
Count Orders per Customer
SELECT
customer_id,
COUNT(*) AS order_count
FROM orders
GROUP BY customer_id
ORDER BY order_count DESC;
Sum Revenue per Product
SELECT
product_id,
SUM(quantity * price) AS total_revenue
FROM order_items
GROUP BY product_id
ORDER BY total_revenue DESC;
Average Price per Category
SELECT
category,
COUNT(*) AS product_count,
ROUND(AVG(price), 2) AS avg_price
FROM products
GROUP BY category;
Result:
category | product_count | avg_price
--------------+---------------+-----------
Electronics | 45 | 324.99
Clothing | 120 | 45.75
Books | 78 | 18.50
Grouping by Multiple Columns
Order Count by Customer and Year
SELECT
customer_id,
EXTRACT(YEAR FROM order_date) AS year,
COUNT(*) AS order_count,
SUM(total) AS total_spent
FROM orders
GROUP BY customer_id, EXTRACT(YEAR FROM order_date)
ORDER BY customer_id, year;
Result:
customer_id | year | order_count | total_spent
-------------+------+-------------+-------------
1 | 2023 | 5 | 487.25
1 | 2024 | 7 | 687.50
2 | 2023 | 2 | 150.00
2 | 2024 | 3 | 425.75
Each unique combination of (customer_id, year) creates a separate group.
Sales by Category and Month
SELECT
p.category,
DATE_TRUNC('month', o.order_date) AS month,
SUM(oi.quantity) AS units_sold,
SUM(oi.quantity * oi.price) AS revenue
FROM products p
INNER JOIN order_items oi ON p.id = oi.product_id
INNER JOIN orders o ON oi.order_id = o.id
GROUP BY p.category, DATE_TRUNC('month', o.order_date)
ORDER BY category, month;
GROUP BY with JOINs
Customer Purchase Summary
SELECT
c.id,
c.name,
COUNT(o.id) AS order_count,
COALESCE(SUM(o.total), 0) AS total_spent,
COALESCE(AVG(o.total), 0) AS avg_order_value
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
GROUP BY c.id, c.name
ORDER BY total_spent DESC;
Result:
id | name | order_count | total_spent | avg_order_value
----+-----------------+-------------+-------------+-----------------
1 | Alice Smith | 12 | 1547.82 | 128.99
2 | Bob Jones | 5 | 687.25 | 137.45
3 | Carol White | 0 | 0.00 | 0.00
Important: Use LEFT JOIN to include customers with 0 orders!
Product Sales by Category
SELECT
p.category,
COUNT(DISTINCT p.id) AS product_count,
COALESCE(SUM(oi.quantity), 0) AS units_sold,
COALESCE(SUM(oi.quantity * oi.price), 0) AS revenue
FROM products p
LEFT JOIN order_items oi ON p.id = oi.product_id
GROUP BY p.category
ORDER BY revenue DESC;
The GROUP BY Rule
You Must Include Non-Aggregated Columns in GROUP BY
-- ❌ Error: name must be in GROUP BY
SELECT
customer_id,
name, -- Not aggregated, not in GROUP BY!
COUNT(*)
FROM orders
GROUP BY customer_id;
-- ✅ Correct
SELECT
customer_id,
name,
COUNT(*)
FROM orders
GROUP BY customer_id, name;
Rule: Every column in SELECT must be either:
- In the GROUP BY clause, OR
- Inside an aggregate function
Valid Examples
-- ✅ All non-aggregates in GROUP BY
SELECT
customer_id,
product_id,
SUM(quantity)
FROM order_items
GROUP BY customer_id, product_id;
-- ✅ Only aggregates in SELECT
SELECT
COUNT(*),
SUM(total),
AVG(total)
FROM orders;
-- ✅ Mix of grouped columns and aggregates
SELECT
category,
COUNT(*),
AVG(price)
FROM products
GROUP BY category;
Grouping by Expressions
Group by Year
SELECT
EXTRACT(YEAR FROM order_date) AS year,
COUNT(*) AS order_count,
SUM(total) AS revenue
FROM orders
GROUP BY EXTRACT(YEAR FROM order_date)
ORDER BY year;
Group by Month
SELECT
DATE_TRUNC('month', order_date) AS month,
COUNT(*) AS orders,
SUM(total) AS revenue
FROM orders
WHERE order_date >= '2024-01-01'
GROUP BY DATE_TRUNC('month', order_date)
ORDER BY month;
Result:
month | orders | revenue
---------------------+--------+---------
2024-01-01 00:00:00 | 87 | 10247.50
2024-02-01 00:00:00 | 92 | 11580.25
2024-03-01 00:00:00 | 104 | 13245.75
Group by Price Range
SELECT
CASE
WHEN price < 20 THEN 'Budget'
WHEN price < 100 THEN 'Mid-range'
ELSE 'Premium'
END AS price_tier,
COUNT(*) AS product_count,
ROUND(AVG(price), 2) AS avg_price
FROM products
GROUP BY
CASE
WHEN price < 20 THEN 'Budget'
WHEN price < 100 THEN 'Mid-range'
ELSE 'Premium'
END
ORDER BY avg_price;
Result:
price_tier | product_count | avg_price
------------+---------------+-----------
Budget | 78 | 12.45
Mid-range | 120 | 54.80
Premium | 45 | 324.99
DISTINCT vs GROUP BY
Finding Unique Values
-- Using DISTINCT
SELECT DISTINCT category
FROM products;
-- Using GROUP BY
SELECT category
FROM products
GROUP BY category;
Both return the same result, but GROUP BY allows aggregates:
-- ✅ With GROUP BY
SELECT
category,
COUNT(*) AS product_count
FROM products
GROUP BY category;
-- ❌ Can't do this with DISTINCT
SELECT DISTINCT
category,
COUNT(*) -- Error!
FROM products;
Use DISTINCT for unique values, GROUP BY for aggregates.
Common GROUP BY Patterns
Pattern 1: Top N per Category
-- Top 3 products per category by revenue
WITH product_revenue AS (
SELECT
p.category,
p.name,
SUM(oi.quantity * oi.price) AS revenue,
ROW_NUMBER() OVER (PARTITION BY p.category ORDER BY SUM(oi.quantity * oi.price) DESC) AS rank
FROM products p
INNER JOIN order_items oi ON p.id = oi.product_id
GROUP BY p.category, p.id, p.name
)
SELECT category, name, revenue
FROM product_revenue
WHERE rank <= 3
ORDER BY category, rank;
(Window functions covered in advanced courses)
Pattern 2: Percentage of Total
-- Each category's share of total revenue
SELECT
category,
SUM(quantity * price) AS revenue,
ROUND(100.0 * SUM(quantity * price) / (SELECT SUM(quantity * price) FROM order_items), 2) AS pct_of_total
FROM products p
INNER JOIN order_items oi ON p.id = oi.product_id
GROUP BY category
ORDER BY revenue DESC;
Result:
category | revenue | pct_of_total
--------------+-----------+--------------
Electronics | 87500.25 | 45.67
Clothing | 52400.75 | 27.34
Books | 32100.50 | 16.75
Pattern 3: Cohort Analysis
-- Customer cohorts by signup month
SELECT
DATE_TRUNC('month', created_at) AS cohort_month,
COUNT(*) AS new_customers,
SUM(COUNT(*)) OVER (ORDER BY DATE_TRUNC('month', created_at)) AS cumulative_customers
FROM customers
GROUP BY DATE_TRUNC('month', created_at)
ORDER BY cohort_month;
Pattern 4: Daily/Weekly/Monthly Aggregates
-- Daily order summary
SELECT
DATE(order_date) AS date,
COUNT(*) AS orders,
SUM(total) AS revenue,
AVG(total) AS avg_order_value
FROM orders
WHERE order_date >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY DATE(order_date)
ORDER BY date DESC;
GROUP BY with NULL Values
NULL Creates Its Own Group
CREATE TABLE employees (
id SERIAL PRIMARY KEY,
name VARCHAR(100),
department VARCHAR(50)
);
INSERT INTO employees VALUES
(1, 'Alice', 'Sales'),
(2, 'Bob', 'Sales'),
(3, 'Carol', NULL), -- No department
(4, 'Dave', NULL); -- No department
SELECT
department,
COUNT(*) AS employee_count
FROM employees
GROUP BY department;
Result:
department | employee_count
------------+----------------
Sales | 2
NULL | 2 ← NULL is a group
Handling NULL in Groups
SELECT
COALESCE(department, 'Unassigned') AS department,
COUNT(*) AS employee_count
FROM employees
GROUP BY department;
Result:
department | employee_count
-------------+----------------
Sales | 2
Unassigned | 2
Performance Tips
Index Grouped Columns
-- If you frequently GROUP BY category
CREATE INDEX idx_products_category ON products(category);
-- Speeds up this query:
SELECT category, COUNT(*)
FROM products
GROUP BY category;
Filter Before Grouping
-- ✅ Better: Filter first
SELECT
category,
COUNT(*)
FROM products
WHERE active = TRUE -- Filter reduces rows before grouping
GROUP BY category;
-- ❌ Slower: Group everything, then filter
SELECT
category,
COUNT(*)
FROM products
GROUP BY category
HAVING SUM(CASE WHEN active = TRUE THEN 1 ELSE 0 END) > 0;
Limit Groups with LIMIT
-- Top 10 categories by product count
SELECT
category,
COUNT(*) AS product_count
FROM products
GROUP BY category
ORDER BY product_count DESC
LIMIT 10;
Common Mistakes
Mistake 1: Selecting Non-Grouped Columns
-- ❌ Error
SELECT
customer_id,
name, -- Must be in GROUP BY!
COUNT(*)
FROM orders
GROUP BY customer_id;
Mistake 2: Using WHERE Instead of HAVING
-- ❌ Error: Can't use aggregate in WHERE
SELECT customer_id, COUNT(*)
FROM orders
WHERE COUNT(*) > 5 -- Error!
GROUP BY customer_id;
-- ✅ Use HAVING (next lesson)
SELECT customer_id, COUNT(*)
FROM orders
GROUP BY customer_id
HAVING COUNT(*) > 5;
Mistake 3: Forgetting COALESCE with LEFT JOIN
-- ❌ Wrong: Customers without orders won't show 0
SELECT
c.name,
SUM(o.total) AS total_spent
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
GROUP BY c.id, c.name;
-- Carol with no orders: total_spent = NULL
-- ✅ Correct
SELECT
c.name,
COALESCE(SUM(o.total), 0) AS total_spent
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
GROUP BY c.id, c.name;
-- Carol with no orders: total_spent = 0
Practice Exercises
Exercise 1
Find the number of products in each category.
Solution
SELECT
category,
COUNT(*) AS product_count
FROM products
GROUP BY category
ORDER BY product_count DESC;
Exercise 2
Calculate total revenue per customer (include all customers, even those with $0).
Solution
SELECT
c.id,
c.name,
COALESCE(SUM(o.total), 0) AS total_spent
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
GROUP BY c.id, c.name
ORDER BY total_spent DESC;
Exercise 3
Find monthly revenue for 2024.
Solution
SELECT
DATE_TRUNC('month', order_date) AS month,
SUM(total) AS revenue
FROM orders
WHERE EXTRACT(YEAR FROM order_date) = 2024
GROUP BY DATE_TRUNC('month', order_date)
ORDER BY month;
Exercise 4
Count how many products fall into each price tier (Budget < $20, Mid-range $20-$100, Premium > $100).
Solution
SELECT
CASE
WHEN price < 20 THEN 'Budget'
WHEN price <= 100 THEN 'Mid-range'
ELSE 'Premium'
END AS price_tier,
COUNT(*) AS product_count
FROM products
GROUP BY
CASE
WHEN price < 20 THEN 'Budget'
WHEN price <= 100 THEN 'Mid-range'
ELSE 'Premium'
END
ORDER BY product_count DESC;
Key Takeaways
- ✅ GROUP BY divides rows into groups and calculates aggregates per group
- ✅ Every non-aggregated column in SELECT must be in GROUP BY
- ✅ Execution order: FROM → WHERE → GROUP BY → SELECT → ORDER BY
- ✅ Group by multiple columns for finer-grained analysis
- ✅ Use with JOINs to aggregate across tables
- ✅ NULL values create their own group
- ✅ Index grouped columns for better performance
- ✅ COALESCE handles NULLs in aggregates with LEFT JOIN
- ✅ Filter before grouping with WHERE (filter groups with HAVING)
Next Steps
You've mastered GROUP BY! In the next lesson, we'll learn HAVING—how to filter groups based on aggregate conditions (e.g., "customers with more than 5 orders" or "categories with average price > $100").
Almost done with Module 5!

