Aggregate Functions - Summarizing Data
Introduction
So far, you've queried individual rows of data. But what if you want to answer questions like:
- "How many customers do we have?"
- "What's the average order total?"
- "What's the highest price in our inventory?"
Aggregate functions compute a single result from a set of rows. They're essential for data analysis, reporting, and understanding your database at a high level.
The Five Core Aggregate Functions
PostgreSQL provides five fundamental aggregate functions:
| Function | Purpose | Example |
|---|---|---|
COUNT() | Count rows | How many orders? |
SUM() | Add values | Total revenue |
AVG() | Calculate average | Average order value |
MIN() | Find minimum | Cheapest product |
MAX() | Find maximum | Most expensive product |
COUNT() - Counting Rows
COUNT(*) - Count All Rows
-- How many customers do we have?
SELECT COUNT(*) FROM customers;
Result:
count
-------
150
COUNT(*) counts all rows, including those with NULL values.
COUNT(column) - Count Non-NULL Values
-- How many customers have a phone number?
SELECT COUNT(phone) FROM customers;
Result:
count
-------
127
COUNT(column) only counts rows where the column is NOT NULL.
COUNT(DISTINCT column) - Count Unique Values
-- How many different cities do our customers live in?
SELECT COUNT(DISTINCT city) FROM customers;
Result:
count
-------
42
Removes duplicates before counting.
SUM() - Adding Values
Basic SUM
-- What's our total revenue from all orders?
SELECT SUM(total) AS total_revenue
FROM orders;
Result:
total_revenue
---------------
125847.50
SUM with WHERE
-- Total revenue from orders in 2024
SELECT SUM(total) AS revenue_2024
FROM orders
WHERE EXTRACT(YEAR FROM order_date) = 2024;
Result:
revenue_2024
--------------
87520.25
SUM Ignores NULLs
CREATE TABLE donations (
id SERIAL PRIMARY KEY,
donor VARCHAR(100),
amount DECIMAL(10,2)
);
INSERT INTO donations VALUES
(1, 'Alice', 100.00),
(2, 'Bob', NULL), -- No donation yet
(3, 'Carol', 50.00);
SELECT SUM(amount) FROM donations;
Result:
sum
--------
150.00
Bob's NULL is ignored, not treated as 0.
AVG() - Calculating Average
Basic AVG
-- What's the average order total?
SELECT AVG(total) AS avg_order_value
FROM orders;
Result:
avg_order_value
-----------------
125.85
AVG vs SUM/COUNT
-- These are equivalent:
SELECT AVG(total) FROM orders;
SELECT SUM(total) / COUNT(total) FROM orders;
AVG Ignores NULLs
-- Average donation (NULL excluded)
SELECT AVG(amount) FROM donations;
-- Result: 75.00 (not 50.00!)
-- Calculation: (100 + 50) / 2 = 75
Rounding AVG Results
SELECT ROUND(AVG(total), 2) AS avg_order_value
FROM orders;
Result:
avg_order_value
-----------------
125.85
MIN() and MAX() - Finding Extremes
Numeric MIN/MAX
-- Cheapest and most expensive products
SELECT
MIN(price) AS cheapest,
MAX(price) AS most_expensive
FROM products;
Result:
cheapest | most_expensive
----------+----------------
5.99 | 999.99
Date MIN/MAX
-- First and most recent order dates
SELECT
MIN(order_date) AS first_order,
MAX(order_date) AS latest_order
FROM orders;
Result:
first_order | latest_order
-------------+--------------
2023-01-05 | 2024-12-20
Text MIN/MAX
-- First and last customer names alphabetically
SELECT
MIN(name) AS first_alphabetically,
MAX(name) AS last_alphabetically
FROM customers;
Result:
first_alphabetically | last_alphabetically
----------------------+---------------------
Aaron Smith | Zoe Williams
Combining Multiple Aggregates
-- Order statistics
SELECT
COUNT(*) AS total_orders,
SUM(total) AS total_revenue,
AVG(total) AS avg_order_value,
MIN(total) AS smallest_order,
MAX(total) AS largest_order
FROM orders;
Result:
total_orders | total_revenue | avg_order_value | smallest_order | largest_order
--------------+---------------+-----------------+----------------+---------------
1000 | 125847.50 | 125.85 | 5.99 | 999.99
Aggregate Functions with WHERE
Filter Before Aggregating
-- Statistics for orders over $100
SELECT
COUNT(*) AS large_order_count,
AVG(total) AS avg_large_order
FROM orders
WHERE total > 100;
Result:
large_order_count | avg_large_order
-------------------+-----------------
487 | 215.42
Execution order:
- Filter rows with WHERE
- Apply aggregate functions to filtered rows
Common Pattern: Active vs Total
-- Active vs inactive customers
SELECT
COUNT(*) AS total_customers,
COUNT(CASE WHEN active = TRUE THEN 1 END) AS active_customers,
COUNT(CASE WHEN active = FALSE THEN 1 END) AS inactive_customers
FROM customers;
Result:
total_customers | active_customers | inactive_customers
-----------------+------------------+--------------------
150 | 127 | 23
Aggregate Functions with JOINs
Count Related Records
-- Total order count and revenue per customer
SELECT
c.name,
COUNT(o.id) AS order_count,
COALESCE(SUM(o.total), 0) AS total_spent
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
GROUP BY c.id, c.name;
Result:
name | order_count | total_spent
-----------------+-------------+-------------
Alice Smith | 12 | 1547.82
Bob Jones | 5 | 687.25
Carol White | 0 | 0.00
Important: Use LEFT JOIN to include customers with 0 orders!
Aggregate from Multiple Tables
-- Product sales summary
SELECT
p.name,
COUNT(oi.id) AS times_ordered,
COALESCE(SUM(oi.quantity), 0) AS units_sold,
COALESCE(SUM(oi.quantity * oi.price), 0) AS revenue
FROM products p
LEFT JOIN order_items oi ON p.id = oi.product_id
GROUP BY p.id, p.name;
NULL Handling in Aggregates
COUNT(*) vs COUNT(column)
CREATE TABLE employees (
id SERIAL PRIMARY KEY,
name VARCHAR(100),
department VARCHAR(50),
manager_id INTEGER
);
INSERT INTO employees VALUES
(1, 'Alice', 'Sales', NULL), -- No manager (CEO)
(2, 'Bob', 'Sales', 1),
(3, 'Carol', 'Engineering', 1);
-- Count all employees
SELECT COUNT(*) FROM employees;
-- Result: 3
-- Count employees with managers
SELECT COUNT(manager_id) FROM employees;
-- Result: 2 (Alice's NULL excluded)
SUM/AVG/MIN/MAX Ignore NULLs
-- Average with NULLs
SELECT AVG(manager_id) FROM employees;
-- Result: 1.0 (only Bob and Carol counted)
-- NOT 0.67 (which would be if NULL counted as 0)
COALESCE for Default Values
-- Treat NULL as 0
SELECT
name,
COALESCE(manager_id, 0) AS manager_id
FROM employees;
Common Aggregate Patterns
Pattern 1: Data Quality Check
-- How complete is our customer data?
SELECT
COUNT(*) AS total_customers,
COUNT(email) AS has_email,
COUNT(phone) AS has_phone,
COUNT(address) AS has_address,
ROUND(100.0 * COUNT(email) / COUNT(*), 2) AS email_pct,
ROUND(100.0 * COUNT(phone) / COUNT(*), 2) AS phone_pct
FROM customers;
Result:
total_customers | has_email | has_phone | has_address | email_pct | phone_pct
-----------------+-----------+-----------+-------------+-----------+-----------
150 | 150 | 127 | 142 | 100.00 | 84.67
Pattern 2: Revenue Summary
-- Monthly revenue summary for 2024
SELECT
DATE_TRUNC('month', order_date) AS month,
COUNT(*) AS order_count,
SUM(total) AS revenue,
AVG(total) AS avg_order_value,
MIN(total) AS smallest_order,
MAX(total) AS largest_order
FROM orders
WHERE EXTRACT(YEAR FROM order_date) = 2024
GROUP BY DATE_TRUNC('month', order_date)
ORDER BY month;
Pattern 3: Top Performers
-- Top 5 customers by total spending
SELECT
c.name,
COUNT(o.id) AS order_count,
SUM(o.total) AS total_spent
FROM customers c
INNER JOIN orders o ON c.id = o.customer_id
GROUP BY c.id, c.name
ORDER BY total_spent DESC
LIMIT 5;
FILTER Clause (PostgreSQL Extension)
Conditional Aggregates
-- Orders by status
SELECT
COUNT(*) AS total_orders,
COUNT(*) FILTER (WHERE status = 'pending') AS pending,
COUNT(*) FILTER (WHERE status = 'shipped') AS shipped,
COUNT(*) FILTER (WHERE status = 'delivered') AS delivered
FROM orders;
Result:
total_orders | pending | shipped | delivered
--------------+---------+---------+-----------
1000 | 127 | 218 | 655
Alternative (CASE method):
SELECT
COUNT(*) AS total_orders,
COUNT(CASE WHEN status = 'pending' THEN 1 END) AS pending,
COUNT(CASE WHEN status = 'shipped' THEN 1 END) AS shipped,
COUNT(CASE WHEN status = 'delivered' THEN 1 END) AS delivered
FROM orders;
SUM with FILTER
-- Revenue by year
SELECT
SUM(total) FILTER (WHERE EXTRACT(YEAR FROM order_date) = 2023) AS revenue_2023,
SUM(total) FILTER (WHERE EXTRACT(YEAR FROM order_date) = 2024) AS revenue_2024
FROM orders;
Performance Considerations
Aggregates Can Be Slow on Large Tables
-- Slow on 10M+ rows without index
SELECT COUNT(*) FROM huge_table WHERE status = 'active';
Solution: Add index on filtered column:
CREATE INDEX idx_huge_table_status ON huge_table(status);
COUNT(*) vs COUNT(1)
-- These are identical in PostgreSQL
SELECT COUNT(*) FROM orders;
SELECT COUNT(1) FROM orders;
Both are optimized the same way. Use COUNT(*) for clarity.
Approximate Counts for Very Large Tables
-- Exact count (slow on huge tables)
SELECT COUNT(*) FROM orders;
-- Approximate count (fast, uses statistics)
SELECT reltuples::BIGINT AS approx_count
FROM pg_class
WHERE relname = 'orders';
Common Mistakes
Mistake 1: Forgetting GROUP BY
-- ❌ Error: Column must appear in GROUP BY or be used in aggregate
SELECT customer_id, COUNT(*)
FROM orders;
-- ✅ Correct
SELECT customer_id, COUNT(*)
FROM orders
GROUP BY customer_id;
(We'll cover GROUP BY in the next lesson)
Mistake 2: Treating NULL as 0
-- ❌ Wrong assumption
SELECT AVG(rating) FROM reviews;
-- NULL ratings are excluded, not counted as 0!
-- ✅ If you want NULL as 0
SELECT AVG(COALESCE(rating, 0)) FROM reviews;
Mistake 3: Using Aggregate in WHERE
-- ❌ Error: Cannot use aggregate in WHERE
SELECT customer_id
FROM orders
WHERE COUNT(*) > 5;
-- ✅ Use HAVING (covered in Lesson 5.3)
SELECT customer_id
FROM orders
GROUP BY customer_id
HAVING COUNT(*) > 5;
Practice Exercises
Exercise 1
Find the total number of products and the average price.
Solution
SELECT
COUNT(*) AS total_products,
ROUND(AVG(price), 2) AS avg_price
FROM products;
Exercise 2
Find the highest and lowest order totals for customer with ID 5.
Solution
SELECT
MIN(total) AS lowest_order,
MAX(total) AS highest_order
FROM orders
WHERE customer_id = 5;
Exercise 3
Calculate total revenue from all completed orders (status = 'delivered').
Solution
SELECT
COUNT(*) AS delivered_orders,
SUM(total) AS total_revenue
FROM orders
WHERE status = 'delivered';
Exercise 4
Find how many customers have an email address vs how many don't.
Solution
SELECT
COUNT(*) AS total_customers,
COUNT(email) AS has_email,
COUNT(*) - COUNT(email) AS no_email
FROM customers;
Key Takeaways
- ✅ COUNT() counts rows (COUNT(*) includes NULLs, COUNT(column) excludes them)
- ✅ SUM() adds numeric values
- ✅ AVG() calculates the mean (average)
- ✅ MIN() and MAX() find minimum and maximum values
- ✅ All aggregates ignore NULL (except COUNT(*))
- ✅ DISTINCT removes duplicates before aggregating
- ✅ FILTER clause allows conditional aggregates
- ✅ Aggregates work with JOINs for cross-table analysis
- ✅ Use COALESCE to treat NULL as a specific value
Next Steps
You've learned the five core aggregate functions! In the next lesson, we'll explore GROUP BY—how to group rows and calculate aggregates for each group (e.g., "total orders per customer" or "revenue per month").
Get ready to unlock powerful data analysis capabilities!

