Aggregates with JOINs - Cross-Table Analysis
Introduction
You've learned aggregate functions, GROUP BY, and HAVING. You've also mastered JOINs for combining tables. Now it's time to combine these powerful techniques!
Aggregates with JOINs enable sophisticated cross-table analysis:
- "Total revenue per customer across all their orders"
- "Average order value per product category"
- "Monthly sales by region"
This lesson shows you how to combine grouping and joining for real-world data analysis.
The Basic Pattern
Aggregate After Joining
SELECT
c.name,
COUNT(o.id) AS order_count,
SUM(o.total) AS total_spent
FROM customers c
INNER JOIN orders o ON c.id = o.customer_id
GROUP BY c.id, c.name;
Pattern:
- JOIN tables
- GROUP BY columns from the primary table
- Aggregate columns from the joined table
Result:
name | order_count | total_spent
-----------------+-------------+-------------
Alice Smith | 12 | 1547.82
Bob Jones | 5 | 687.25
Carol White | 3 | 425.00
LEFT JOIN vs INNER JOIN with Aggregates
INNER JOIN: Only Customers with Orders
SELECT
c.name,
COUNT(o.id) AS order_count,
SUM(o.total) AS total_spent
FROM customers c
INNER JOIN orders o ON c.id = o.customer_id
GROUP BY c.id, c.name;
Result:
name | order_count | total_spent
-----------------+-------------+-------------
Alice Smith | 12 | 1547.82
Bob Jones | 5 | 687.25
Customers without orders (like Dave) are excluded.
LEFT JOIN: All Customers (Including Those with 0 Orders)
SELECT
c.name,
COUNT(o.id) AS order_count,
COALESCE(SUM(o.total), 0) AS total_spent
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
GROUP BY c.id, c.name;
Result:
name | order_count | total_spent
-----------------+-------------+-------------
Alice Smith | 12 | 1547.82
Bob Jones | 5 | 687.25
Dave Wilson | 0 | 0.00 ← Included!
Key differences:
COUNT(o.id)counts only non-NULL values (0 for Dave)COALESCE(SUM(o.total), 0)converts NULL to 0
The COALESCE Pattern
-- ❌ Without COALESCE: NULL for customers with no orders
SELECT
c.name,
SUM(o.total) AS total_spent
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
GROUP BY c.id, c.name;
-- Dave: total_spent = NULL
-- ✅ With COALESCE: 0 for customers with no orders
SELECT
c.name,
COALESCE(SUM(o.total), 0) AS total_spent
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
GROUP BY c.id, c.name;
-- Dave: total_spent = 0
Aggregating Across Multiple Tables
Three-Table JOIN with Aggregates
-- Revenue by product category
SELECT
p.category,
COUNT(DISTINCT o.id) AS order_count,
SUM(oi.quantity) AS units_sold,
SUM(oi.quantity * oi.price) AS total_revenue
FROM products p
INNER JOIN order_items oi ON p.id = oi.product_id
INNER JOIN orders o ON oi.order_id = o.id
GROUP BY p.category
ORDER BY total_revenue DESC;
Result:
category | order_count | units_sold | total_revenue
--------------+-------------+------------+---------------
Electronics | 487 | 1247 | 87500.25
Clothing | 823 | 3456 | 52400.75
Books | 612 | 2187 | 32100.50
Customer Lifetime Value (CLV)
-- Comprehensive customer analysis
SELECT
c.id,
c.name,
c.email,
COUNT(DISTINCT o.id) AS order_count,
COUNT(DISTINCT oi.product_id) AS unique_products,
SUM(oi.quantity) AS total_items,
COALESCE(SUM(o.total), 0) AS lifetime_value,
COALESCE(AVG(o.total), 0) AS avg_order_value,
MAX(o.order_date) AS last_order_date
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
LEFT JOIN order_items oi ON o.id = oi.order_id
GROUP BY c.id, c.name, c.email
ORDER BY lifetime_value DESC;
Result:
id | name | email | order_count | unique_products | total_items | lifetime_value | avg_order_value | last_order_date
----+-----------------+----------------------+-------------+-----------------+-------------+----------------+-----------------+-----------------
1 | Alice Smith | alice@example.com | 12 | 24 | 47 | 1547.82 | 128.99 | 2024-12-15
2 | Bob Jones | bob@example.com | 5 | 12 | 18 | 687.25 | 137.45 | 2024-11-20
4 | Dave Wilson | dave@example.com | 0 | 0 | 0 | 0.00 | 0.00 | NULL
Common Aggregation Patterns
Pattern 1: Sales by Time Period
-- Monthly revenue by category
SELECT
p.category,
DATE_TRUNC('month', o.order_date) AS month,
SUM(oi.quantity * oi.price) AS revenue
FROM products p
INNER JOIN order_items oi ON p.id = oi.product_id
INNER JOIN orders o ON oi.order_id = o.id
WHERE o.order_date >= '2024-01-01'
GROUP BY p.category, DATE_TRUNC('month', o.order_date)
ORDER BY category, month;
Result:
category | month | revenue
--------------+---------------------+-----------
Books | 2024-01-01 00:00:00 | 2547.50
Books | 2024-02-01 00:00:00 | 3125.25
Electronics | 2024-01-01 00:00:00 | 8750.00
Electronics | 2024-02-01 00:00:00 | 9450.75
Pattern 2: Product Popularity
-- Top 10 products by order frequency
SELECT
p.id,
p.name,
COUNT(DISTINCT oi.order_id) AS times_ordered,
SUM(oi.quantity) AS total_units_sold,
SUM(oi.quantity * oi.price) AS total_revenue
FROM products p
INNER JOIN order_items oi ON p.id = oi.product_id
GROUP BY p.id, p.name
ORDER BY times_ordered DESC
LIMIT 10;
Pattern 3: Customer Segmentation
-- Segment customers by spending
SELECT
CASE
WHEN COALESCE(SUM(o.total), 0) = 0 THEN 'No purchases'
WHEN SUM(o.total) < 100 THEN 'Low value'
WHEN SUM(o.total) < 500 THEN 'Medium value'
WHEN SUM(o.total) < 1000 THEN 'High value'
ELSE 'VIP'
END AS segment,
COUNT(DISTINCT c.id) AS customer_count,
COALESCE(SUM(o.total), 0) AS total_revenue
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
GROUP BY
CASE
WHEN COALESCE(SUM(o.total), 0) = 0 THEN 'No purchases'
WHEN SUM(o.total) < 100 THEN 'Low value'
WHEN SUM(o.total) < 500 THEN 'Medium value'
WHEN SUM(o.total) < 1000 THEN 'High value'
ELSE 'VIP'
END
ORDER BY total_revenue DESC;
Result:
segment | customer_count | total_revenue
----------------+----------------+---------------
VIP | 12 | 18750.50
High value | 28 | 19845.25
Medium value | 45 | 15687.75
Low value | 38 | 2547.00
No purchases | 27 | 0.00
Pattern 4: Average Items per Order by Category
SELECT
p.category,
COUNT(DISTINCT o.id) AS order_count,
SUM(oi.quantity) AS total_items,
ROUND(SUM(oi.quantity)::NUMERIC / COUNT(DISTINCT o.id), 2) AS avg_items_per_order
FROM products p
INNER JOIN order_items oi ON p.id = oi.product_id
INNER JOIN orders o ON oi.order_id = o.id
GROUP BY p.category
ORDER BY avg_items_per_order DESC;
DISTINCT in Aggregates with JOINs
Why DISTINCT Matters
-- ❌ Without DISTINCT: Over-counting
SELECT
c.name,
COUNT(o.id) AS order_count
FROM customers c
INNER JOIN orders o ON c.id = o.customer_id
INNER JOIN order_items oi ON o.id = oi.order_id
GROUP BY c.id, c.name;
-- If an order has 3 items, it's counted 3 times!
-- ✅ With DISTINCT: Correct count
SELECT
c.name,
COUNT(DISTINCT o.id) AS order_count
FROM customers c
INNER JOIN orders o ON c.id = o.customer_id
INNER JOIN order_items oi ON o.id = oi.order_id
GROUP BY c.id, c.name;
Multiple DISTINCT Aggregates
-- Customers: distinct orders and distinct products
SELECT
c.name,
COUNT(DISTINCT o.id) AS order_count,
COUNT(DISTINCT oi.product_id) AS unique_products_purchased
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
LEFT JOIN order_items oi ON o.id = oi.order_id
GROUP BY c.id, c.name;
HAVING with JOINs
Filter Groups After Aggregation
-- Customers who spent more than $500
SELECT
c.name,
COUNT(o.id) AS order_count,
SUM(o.total) AS total_spent
FROM customers c
INNER JOIN orders o ON c.id = o.customer_id
GROUP BY c.id, c.name
HAVING SUM(o.total) > 500
ORDER BY total_spent DESC;
Combining WHERE and HAVING
-- Customers with 5+ orders in 2024, each over $50
SELECT
c.name,
COUNT(o.id) AS order_count,
SUM(o.total) AS total_spent
FROM customers c
INNER JOIN orders o ON c.id = o.customer_id
WHERE o.total > 50
AND EXTRACT(YEAR FROM o.order_date) = 2024
GROUP BY c.id, c.name
HAVING COUNT(o.id) >= 5
ORDER BY total_spent DESC;
Complex Multi-Table Aggregations
Revenue by Customer and Product Category
SELECT
c.name AS customer,
p.category,
COUNT(DISTINCT o.id) AS orders,
SUM(oi.quantity) AS items,
SUM(oi.quantity * oi.price) AS revenue
FROM customers c
INNER JOIN orders o ON c.id = o.customer_id
INNER JOIN order_items oi ON o.id = oi.order_id
INNER JOIN products p ON oi.product_id = p.id
GROUP BY c.id, c.name, p.category
ORDER BY c.name, revenue DESC;
Result:
customer | category | orders | items | revenue
-----------------+--------------+--------+-------+-----------
Alice Smith | Electronics | 8 | 15 | 987.50
Alice Smith | Books | 6 | 12 | 325.75
Alice Smith | Clothing | 4 | 8 | 234.57
Bob Jones | Electronics | 3 | 5 | 487.25
Sales Performance by Month and Category
SELECT
DATE_TRUNC('month', o.order_date) AS month,
p.category,
COUNT(DISTINCT o.id) AS order_count,
SUM(oi.quantity) AS units_sold,
SUM(oi.quantity * oi.price) AS revenue,
ROUND(AVG(oi.price), 2) AS avg_unit_price
FROM orders o
INNER JOIN order_items oi ON o.id = oi.order_id
INNER JOIN products p ON oi.product_id = p.id
WHERE o.order_date >= '2024-01-01'
GROUP BY DATE_TRUNC('month', o.order_date), p.category
ORDER BY month, revenue DESC;
Subqueries in Aggregates
Percent of Total Revenue
SELECT
p.category,
SUM(oi.quantity * oi.price) AS revenue,
ROUND(
100.0 * SUM(oi.quantity * oi.price) /
(SELECT SUM(quantity * price) FROM order_items),
2
) AS pct_of_total
FROM products p
INNER JOIN order_items oi ON p.id = oi.product_id
GROUP BY p.category
ORDER BY revenue DESC;
Result:
category | revenue | pct_of_total
--------------+------------+--------------
Electronics | 87500.25 | 45.67
Clothing | 52400.75 | 27.34
Books | 32100.50 | 16.75
Customer Ranking
SELECT
c.name,
SUM(o.total) AS total_spent,
RANK() OVER (ORDER BY SUM(o.total) DESC) AS spending_rank
FROM customers c
INNER JOIN orders o ON c.id = o.customer_id
GROUP BY c.id, c.name
ORDER BY total_spent DESC;
(Window functions covered in advanced courses)
Performance Optimization
Index Foreign Keys
CREATE INDEX idx_orders_customer_id ON orders(customer_id);
CREATE INDEX idx_order_items_order_id ON order_items(order_id);
CREATE INDEX idx_order_items_product_id ON order_items(product_id);
Filter Early with WHERE
-- ✅ Better: Filter before joining
SELECT
c.name,
COUNT(o.id) AS order_count
FROM customers c
INNER JOIN (
SELECT * FROM orders WHERE order_date >= '2024-01-01'
) o ON c.id = o.customer_id
GROUP BY c.id, c.name;
-- Also good: WHERE after FROM
SELECT
c.name,
COUNT(o.id) AS order_count
FROM customers c
INNER JOIN orders o ON c.id = o.customer_id
WHERE o.order_date >= '2024-01-01'
GROUP BY c.id, c.name;
Avoid Cartesian Products
-- ❌ Cartesian product (missing join condition)
SELECT
c.name,
COUNT(*)
FROM customers c, orders o
GROUP BY c.id, c.name;
-- Returns customers × orders rows!
-- ✅ Proper JOIN
SELECT
c.name,
COUNT(o.id)
FROM customers c
INNER JOIN orders o ON c.id = o.customer_id
GROUP BY c.id, c.name;
Common Mistakes
Mistake 1: Forgetting DISTINCT with Multiple JOINs
-- ❌ Over-counts orders (one per item)
SELECT c.name, COUNT(o.id)
FROM customers c
JOIN orders o ON c.id = o.customer_id
JOIN order_items oi ON o.id = oi.order_id
GROUP BY c.id, c.name;
-- ✅ Correct count
SELECT c.name, COUNT(DISTINCT o.id)
FROM customers c
JOIN orders o ON c.id = o.customer_id
JOIN order_items oi ON o.id = oi.order_id
GROUP BY c.id, c.name;
Mistake 2: Using INNER JOIN Instead of LEFT JOIN
-- ❌ Excludes customers with no orders
SELECT c.name, COUNT(o.id)
FROM customers c
INNER JOIN orders o ON c.id = o.customer_id
GROUP BY c.id, c.name;
-- ✅ Includes all customers
SELECT c.name, COUNT(o.id)
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
GROUP BY c.id, c.name;
Mistake 3: Not Using COALESCE with LEFT JOIN
-- ❌ NULL for customers with no orders
SELECT c.name, SUM(o.total)
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
GROUP BY c.id, c.name;
-- ✅ 0 for customers with no orders
SELECT c.name, COALESCE(SUM(o.total), 0)
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
GROUP BY c.id, c.name;
Practice Exercises
Exercise 1
Find the total revenue per category.
Solution
SELECT
p.category,
SUM(oi.quantity * oi.price) AS total_revenue
FROM products p
INNER JOIN order_items oi ON p.id = oi.product_id
GROUP BY p.category
ORDER BY total_revenue DESC;
Exercise 2
Find customers with at least 5 orders, showing their order count and total spent.
Solution
SELECT
c.name,
COUNT(o.id) AS order_count,
SUM(o.total) AS total_spent
FROM customers c
INNER JOIN orders o ON c.id = o.customer_id
GROUP BY c.id, c.name
HAVING COUNT(o.id) >= 5
ORDER BY total_spent DESC;
Exercise 3
Find monthly revenue for 2024, including months with 0 revenue.
Solution
-- Generate all months first (PostgreSQL)
WITH months AS (
SELECT generate_series(
'2024-01-01'::DATE,
'2024-12-01'::DATE,
'1 month'::INTERVAL
)::DATE AS month
)
SELECT
m.month,
COALESCE(SUM(o.total), 0) AS revenue
FROM months m
LEFT JOIN orders o ON DATE_TRUNC('month', o.order_date)::DATE = m.month
GROUP BY m.month
ORDER BY m.month;
Exercise 4
Find the average order value per customer (only customers with orders).
Solution
SELECT
c.name,
COUNT(o.id) AS order_count,
ROUND(AVG(o.total), 2) AS avg_order_value
FROM customers c
INNER JOIN orders o ON c.id = o.customer_id
GROUP BY c.id, c.name
ORDER BY avg_order_value DESC;
Key Takeaways
- ✅ Combine JOINs and GROUP BY for cross-table aggregation
- ✅ Use LEFT JOIN to include rows without matches (with COALESCE for 0)
- ✅ Use INNER JOIN to include only matched rows
- ✅ COUNT(DISTINCT column) avoids over-counting with multiple JOINs
- ✅ GROUP BY all non-aggregated columns
- ✅ HAVING filters groups after aggregation
- ✅ Index foreign keys for performance
- ✅ Filter early with WHERE before GROUP BY
Next Steps
You've completed Module 5 and mastered aggregate functions, grouping, and filtering! In the next lesson, we'll explore Common Aggregation Patterns—real-world examples and advanced techniques for data analysis.
Almost ready for Module 6!

