Subqueries - Queries Within Queries
Introduction
You've written many SQL queries, but what if you need to use the result of one query inside another query?
Examples:
- "Find customers who spent more than the average"
- "Get products that have never been ordered"
- "Show orders from the customer with the highest total spending"
Subqueries (also called nested queries or inner queries) are queries embedded within another query. They enable complex logic and dynamic filtering that would be difficult or impossible with simple queries.
What is a Subquery?
Basic Structure
SELECT column1
FROM table1
WHERE column2 = (SELECT column3 FROM table2 WHERE condition);
└────────── Subquery ──────────┘
The subquery executes first, then its result is used by the outer query.
Simple Example
-- Find products more expensive than the average
SELECT name, price
FROM products
WHERE price > (SELECT AVG(price) FROM products);
Execution:
- Inner query:
SELECT AVG(price) FROM products→ Returns 45.99 - Outer query:
SELECT name, price FROM products WHERE price > 45.99
Result:
name | price
-------------------+--------
Laptop Pro 15 | 999.99
Wireless Mouse | 79.99
USB-C Hub | 54.99
Types of Subqueries
1. Scalar Subqueries (Single Value)
Returns one row, one column (a single value).
-- Customers who spent more than the average customer
SELECT
name,
(SELECT SUM(total) FROM orders WHERE customer_id = c.id) AS total_spent
FROM customers c
WHERE (SELECT SUM(total) FROM orders WHERE customer_id = c.id) >
(SELECT AVG(total_spent)
FROM (SELECT SUM(total) AS total_spent
FROM orders
GROUP BY customer_id) AS customer_totals);
2. Column Subqueries (Multiple Rows, One Column)
Returns multiple rows, one column (a list of values).
-- Orders from customers in California
SELECT order_id, total
FROM orders
WHERE customer_id IN (
SELECT id
FROM customers
WHERE state = 'CA'
);
3. Row Subqueries (One or More Rows, Multiple Columns)
Returns one or more complete rows.
-- Products with same price and category as product ID 5
SELECT name, price, category
FROM products
WHERE (price, category) = (
SELECT price, category
FROM products
WHERE id = 5
);
4. Table Subqueries (Multiple Rows and Columns)
Returns a full result set (acts like a table).
-- Average order value per customer
SELECT
customer_id,
AVG(total) AS avg_order_value
FROM orders
GROUP BY customer_id
HAVING AVG(total) > (
SELECT AVG(total)
FROM orders
);
WHERE Clause Subqueries
Using = with Scalar Subqueries
-- Find the most expensive product
SELECT name, price
FROM products
WHERE price = (SELECT MAX(price) FROM products);
Using IN with Multiple Values
-- Customers who placed orders in the last 30 days
SELECT name, email
FROM customers
WHERE id IN (
SELECT DISTINCT customer_id
FROM orders
WHERE order_date >= CURRENT_DATE - INTERVAL '30 days'
);
Using NOT IN
-- Customers who have never placed an order
SELECT name, email
FROM customers
WHERE id NOT IN (
SELECT DISTINCT customer_id
FROM orders
WHERE customer_id IS NOT NULL
);
Warning: NOT IN with NULLs can cause unexpected behavior:
-- ❌ Returns no rows if subquery contains NULL
SELECT name FROM customers
WHERE id NOT IN (SELECT customer_id FROM orders);
-- If any customer_id is NULL, entire result is empty!
-- ✅ Better: Use NOT EXISTS or filter NULLs
SELECT name FROM customers
WHERE id NOT IN (
SELECT customer_id FROM orders WHERE customer_id IS NOT NULL
);
Using Comparison Operators with Scalar Subqueries
-- Products cheaper than the average price
SELECT name, price
FROM products
WHERE price < (SELECT AVG(price) FROM products);
-- Orders larger than customer's average order
SELECT o.id, o.total
FROM orders o
WHERE o.total > (
SELECT AVG(total)
FROM orders
WHERE customer_id = o.customer_id
);
FROM Clause Subqueries (Derived Tables)
Basic Derived Table
-- Average order value per customer, then filter
SELECT *
FROM (
SELECT
customer_id,
AVG(total) AS avg_order_value
FROM orders
GROUP BY customer_id
) AS customer_avg
WHERE avg_order_value > 100;
The subquery creates a temporary "table" called customer_avg.
Practical Example: Top Customers
-- Top 10 customers by total spending
SELECT
c.name,
customer_totals.total_spent
FROM customers c
INNER JOIN (
SELECT
customer_id,
SUM(total) AS total_spent
FROM orders
GROUP BY customer_id
ORDER BY total_spent DESC
LIMIT 10
) AS customer_totals ON c.id = customer_totals.customer_id
ORDER BY customer_totals.total_spent DESC;
SELECT Clause Subqueries (Correlated)
Adding Calculated Columns
-- Show each customer with their order count
SELECT
name,
email,
(SELECT COUNT(*) FROM orders WHERE customer_id = c.id) AS order_count
FROM customers c;
Result:
name | email | order_count
-----------------+----------------------+-------------
Alice Smith | alice@example.com | 12
Bob Jones | bob@example.com | 5
Carol White | carol@example.com | 0
Multiple Subqueries in SELECT
SELECT
c.name,
(SELECT COUNT(*) FROM orders WHERE customer_id = c.id) AS order_count,
(SELECT COALESCE(SUM(total), 0) FROM orders WHERE customer_id = c.id) AS total_spent,
(SELECT MAX(order_date) FROM orders WHERE customer_id = c.id) AS last_order_date
FROM customers c;
Correlated Subqueries
What is a Correlated Subquery?
A correlated subquery references columns from the outer query. It executes once for each row of the outer query.
-- Products with above-average price in their category
SELECT name, category, price
FROM products p
WHERE price > (
SELECT AVG(price)
FROM products
WHERE category = p.category -- ← References outer query!
);
Execution:
- For each product, the subquery runs with that product's category
- Compares product's price to its category's average
Correlated vs Non-Correlated
-- Non-correlated: Runs once
SELECT name FROM products
WHERE price > (SELECT AVG(price) FROM products);
-- Correlated: Runs once per row
SELECT name FROM products p
WHERE price > (
SELECT AVG(price)
FROM products
WHERE category = p.category
);
Performance Note
Correlated subqueries can be slow on large tables (runs N times for N rows). Consider using JOINs instead when possible.
EXISTS and NOT EXISTS
EXISTS - Check if Subquery Returns Any Rows
-- Customers who placed at least one order
SELECT name, email
FROM customers c
WHERE EXISTS (
SELECT 1
FROM orders
WHERE customer_id = c.id
);
EXISTS returns TRUE if the subquery returns any rows (even one).
Why SELECT 1? The actual values don't matter—only whether rows exist. SELECT 1 is conventional and efficient.
NOT EXISTS - Check if Subquery Returns No Rows
-- Customers who have never placed an order
SELECT name, email
FROM customers c
WHERE NOT EXISTS (
SELECT 1
FROM orders
WHERE customer_id = c.id
);
EXISTS vs IN
-- These are equivalent:
-- Using IN
SELECT name FROM customers
WHERE id IN (SELECT customer_id FROM orders);
-- Using EXISTS
SELECT name FROM customers c
WHERE EXISTS (
SELECT 1 FROM orders WHERE customer_id = c.id
);
Differences:
- EXISTS stops as soon as it finds one matching row (faster)
- IN evaluates the entire subquery first
- EXISTS handles NULLs better than IN
Recommendation: Use EXISTS for correlated subqueries, IN for simple lists.
ANY and ALL Operators
ANY (or SOME)
-- Products more expensive than ANY product in 'Books' category
SELECT name, price
FROM products
WHERE price > ANY (
SELECT price
FROM products
WHERE category = 'Books'
);
-- Equivalent to: price > (minimum book price)
ALL
-- Products more expensive than ALL products in 'Books' category
SELECT name, price
FROM products
WHERE price > ALL (
SELECT price
FROM products
WHERE category = 'Books'
);
-- Equivalent to: price > (maximum book price)
Practical Example
-- Customers who spent more than any customer from California
SELECT c.name, SUM(o.total) AS total_spent
FROM customers c
INNER JOIN orders o ON c.id = o.customer_id
GROUP BY c.id, c.name
HAVING SUM(o.total) > ANY (
SELECT SUM(total)
FROM orders o2
INNER JOIN customers c2 ON o2.customer_id = c2.id
WHERE c2.state = 'CA'
GROUP BY c2.id
);
Common Subquery Patterns
Pattern 1: Find Maximum with Details
-- Most expensive product (with all details)
SELECT *
FROM products
WHERE price = (SELECT MAX(price) FROM products);
Pattern 2: Above Average
-- Customers who spent more than average
SELECT
c.name,
SUM(o.total) AS total_spent
FROM customers c
INNER JOIN orders o ON c.id = o.customer_id
GROUP BY c.id, c.name
HAVING SUM(o.total) > (
SELECT AVG(total_per_customer)
FROM (
SELECT SUM(total) AS total_per_customer
FROM orders
GROUP BY customer_id
) AS customer_totals
);
Pattern 3: Top N per Group
-- Top 3 most expensive products per category
SELECT *
FROM products p1
WHERE (
SELECT COUNT(*)
FROM products p2
WHERE p2.category = p1.category
AND p2.price > p1.price
) < 3
ORDER BY category, price DESC;
Pattern 4: Latest Record per Group
-- Each customer's most recent order
SELECT c.name, o.order_date, o.total
FROM customers c
INNER JOIN orders o ON c.id = o.customer_id
WHERE o.order_date = (
SELECT MAX(order_date)
FROM orders
WHERE customer_id = c.id
);
Subqueries vs JOINs
Many Subqueries Can Be Rewritten as JOINs
-- Using subquery
SELECT name
FROM customers
WHERE id IN (
SELECT customer_id
FROM orders
WHERE total > 100
);
-- Using JOIN (often faster)
SELECT DISTINCT c.name
FROM customers c
INNER JOIN orders o ON c.id = o.customer_id
WHERE o.total > 100;
When to Use Subqueries
- ✅ When you need a single aggregated value
- ✅ For EXISTS/NOT EXISTS checks
- ✅ When the logic is clearer with nesting
- ✅ For operations on derived tables
When to Use JOINs
- ✅ For better performance on large datasets
- ✅ When combining data from multiple tables
- ✅ When you need columns from both tables
- ✅ For most production queries
Performance Considerations
Subqueries Can Be Slow
-- ❌ Slow: Correlated subquery runs N times
SELECT
c.name,
(SELECT COUNT(*) FROM orders WHERE customer_id = c.id)
FROM customers c;
-- ✅ Faster: Single JOIN + GROUP BY
SELECT
c.name,
COUNT(o.id)
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
GROUP BY c.id, c.name;
Use EXPLAIN to Analyze
EXPLAIN ANALYZE
SELECT name
FROM products
WHERE price > (SELECT AVG(price) FROM products);
Common Mistakes
Mistake 1: Subquery Returns Multiple Rows with =
-- ❌ Error if subquery returns multiple rows
SELECT name FROM products
WHERE price = (SELECT price FROM products WHERE category = 'Books');
-- ✅ Use IN for multiple values
SELECT name FROM products
WHERE price IN (SELECT price FROM products WHERE category = 'Books');
Mistake 2: NOT IN with NULLs
-- ❌ Returns nothing if subquery contains NULL
SELECT name FROM customers
WHERE id NOT IN (SELECT customer_id FROM orders);
-- ✅ Filter NULLs
SELECT name FROM customers
WHERE id NOT IN (SELECT customer_id FROM orders WHERE customer_id IS NOT NULL);
-- ✅ Or use NOT EXISTS
SELECT name FROM customers c
WHERE NOT EXISTS (SELECT 1 FROM orders WHERE customer_id = c.id);
Mistake 3: Forgetting Alias for Derived Tables
-- ❌ Error: Subquery must have an alias
SELECT * FROM (
SELECT customer_id, SUM(total)
FROM orders
GROUP BY customer_id
);
-- ✅ Add alias
SELECT * FROM (
SELECT customer_id, SUM(total) AS total_spent
FROM orders
GROUP BY customer_id
) AS customer_totals;
Practice Exercises
Exercise 1
Find all products more expensive than the average product in their category.
Solution
SELECT name, category, price
FROM products p
WHERE price > (
SELECT AVG(price)
FROM products
WHERE category = p.category
);
Exercise 2
Find customers who have never placed an order.
Solution
SELECT name, email
FROM customers
WHERE id NOT IN (
SELECT DISTINCT customer_id
FROM orders
WHERE customer_id IS NOT NULL
);
-- OR using NOT EXISTS
SELECT name, email
FROM customers c
WHERE NOT EXISTS (
SELECT 1
FROM orders
WHERE customer_id = c.id
);
Exercise 3
Find orders with a total greater than the average order total.
Solution
SELECT id, customer_id, total
FROM orders
WHERE total > (SELECT AVG(total) FROM orders)
ORDER BY total DESC;
Exercise 4
For each customer, show their name and total spent (using a subquery in SELECT).
Solution
SELECT
name,
(SELECT COALESCE(SUM(total), 0)
FROM orders
WHERE customer_id = c.id) AS total_spent
FROM customers c
ORDER BY total_spent DESC;
Key Takeaways
- ✅ Subqueries are queries nested inside other queries
- ✅ Scalar subqueries return a single value (use with comparison operators)
- ✅ Column subqueries return multiple rows (use with IN, ANY, ALL)
- ✅ Correlated subqueries reference the outer query (run once per row)
- ✅ EXISTS checks if subquery returns any rows (efficient)
- ✅ NOT IN with NULLs can cause problems (use NOT EXISTS or filter NULLs)
- ✅ Derived tables (FROM clause subqueries) act like temporary tables
- ✅ JOINs are often faster than subqueries
- ✅ Use EXPLAIN to analyze query performance
Next Steps
You've mastered subqueries! In the next lesson, we'll learn CTEs (Common Table Expressions)—a cleaner, more readable way to write complex queries with multiple subqueries.
Get ready to write more maintainable SQL!

