DISTINCT - Removing Duplicates

Introduction

When querying data, you often get duplicate rows. For example:

-- Get all customer IDs from orders
SELECT customer_id FROM orders;

Result:

 customer_id
-------------
      1
      1       ← Duplicate
      1       ← Duplicate
      2
      3
      3       ← Duplicate

If you want only unique values, use DISTINCT!

DISTINCT Syntax

SELECT DISTINCT column1, column2, ...
FROM table
WHERE condition;

Basic DISTINCT Example

-- Unique customer IDs who placed orders
SELECT DISTINCT customer_id
FROM orders;

Result:

 customer_id
-------------
      1
      2
      3

Duplicates removed!

DISTINCT on Single Column

Unique Categories

SELECT DISTINCT category
FROM products;

Result:

   category
--------------
 Electronics
 Books
 Clothing

Unique Order Dates

SELECT DISTINCT DATE(order_date) AS order_date
FROM orders
ORDER BY order_date DESC;

Result:

 order_date
------------
 2024-12-20
 2024-12-19
 2024-12-18
 ...

DISTINCT on Multiple Columns

When using DISTINCT on multiple columns, it considers the combination of values.

SELECT DISTINCT category, price
FROM products;

Returns unique (category, price) pairs, not unique categories and unique prices separately.

Example

CREATE TABLE products_sample (
    name VARCHAR(100),
    category VARCHAR(50),
    price DECIMAL(10,2)
);

INSERT INTO products_sample VALUES
    ('Laptop A', 'Electronics', 999.99),
    ('Laptop B', 'Electronics', 999.99),  -- Same category + price
    ('Mouse A', 'Electronics', 25.99),
    ('Book A', 'Books', 19.99),
    ('Book B', 'Books', 19.99);           -- Same category + price

SELECT DISTINCT category, price
FROM products_sample;

Result:

   category   | price
--------------+--------
 Electronics  | 999.99  ← Only one row for (Electronics, 999.99)
 Electronics  |  25.99
 Books        |  19.99  ← Only one row for (Books, 19.99)

DISTINCT vs GROUP BY

They Can Produce the Same Result

-- Using DISTINCT
SELECT DISTINCT category
FROM products;

-- Using GROUP BY
SELECT category
FROM products
GROUP BY category;

Both return unique categories.

When to Use Each

Use DISTINCT when:

You only need unique values
No aggregation required
Simpler syntax

Use GROUP BY when:

You need aggregates (COUNT, SUM, AVG, etc.)
Grouping with calculations

Example: COUNT with GROUP BY

-- ❌ Can't do this with DISTINCT
SELECT category, COUNT(*) AS product_count
FROM products
GROUP BY category;

-- ✅ DISTINCT doesn't support aggregates
SELECT DISTINCT category
FROM products;
-- No way to add COUNT(*) here!

COUNT(DISTINCT column)

Count unique values in a column:

-- How many unique customers placed orders?
SELECT COUNT(DISTINCT customer_id) AS unique_customers
FROM orders;

Result:

 unique_customers
------------------
        87

Multiple COUNT(DISTINCT)

SELECT
    COUNT(*) AS total_orders,
    COUNT(DISTINCT customer_id) AS unique_customers,
    COUNT(DISTINCT DATE(order_date)) AS days_with_orders
FROM orders;

Result:

 total_orders | unique_customers | days_with_orders
--------------+------------------+------------------
     1000     |        87        |       124

DISTINCT with JOINs

Unique Customers Who Ordered

SELECT DISTINCT c.id, c.name
FROM customers c
INNER JOIN orders o ON c.id = o.customer_id;

Alternative (often faster):

SELECT c.id, c.name
FROM customers c
WHERE EXISTS (
    SELECT 1 FROM orders WHERE customer_id = c.id
);

Unique Products Sold

SELECT DISTINCT p.name, p.category
FROM products p
INNER JOIN order_items oi ON p.id = oi.product_id;

Caution with Multiple Joins

-- ❌ May not give expected results
SELECT DISTINCT c.name
FROM customers c
INNER JOIN orders o ON c.id = o.customer_id
INNER JOIN order_items oi ON o.id = oi.order_id;
-- Joins can create duplicates; DISTINCT removes them
-- But can be slow on large datasets!

-- ✅ Better: Use EXISTS
SELECT c.name
FROM customers c
WHERE EXISTS (
    SELECT 1 FROM orders WHERE customer_id = c.id
);

DISTINCT in Subqueries

Finding Customers Who Ordered

SELECT name
FROM customers
WHERE id IN (
    SELECT DISTINCT customer_id
    FROM orders
);

Note: DISTINCT inside IN subquery is often redundant (IN already handles duplicates).

-- These are equivalent:
WHERE id IN (SELECT DISTINCT customer_id FROM orders)
WHERE id IN (SELECT customer_id FROM orders)

DISTINCT with NULL

DISTINCT treats all NULL values as identical.

CREATE TABLE test_nulls (
    value INTEGER
);

INSERT INTO test_nulls VALUES (1), (2), (NULL), (NULL), (1);

SELECT DISTINCT value
FROM test_nulls;

Result:

 value
-------
     1
     2
  NULL  ← Only one NULL

DISTINCT with ORDER BY

Ordering Unique Results

SELECT DISTINCT category
FROM products
ORDER BY category;

Result:

   category
--------------
 Books
 Clothing
 Electronics

ORDER BY Non-Selected Columns

-- ❌ Error in most databases
SELECT DISTINCT name
FROM products
ORDER BY price DESC;
-- Error: ORDER BY column must appear in SELECT DISTINCT

-- ✅ Include price in SELECT
SELECT DISTINCT name, price
FROM products
ORDER BY price DESC;

Exception: PostgreSQL allows this if the table has a unique key, but it's not portable.

Performance Considerations

DISTINCT Can Be Expensive

-- Slow on large tables (requires sorting or hashing)
SELECT DISTINCT customer_id
FROM orders;  -- 1 million rows

PostgreSQL uses:

Hash aggregation (faster, more memory)
Sort-based (slower, less memory)

DISTINCT vs EXISTS

-- ❌ Slower: DISTINCT after JOIN
SELECT DISTINCT c.name
FROM customers c
INNER JOIN orders o ON c.id = o.customer_id;

-- ✅ Faster: EXISTS
SELECT c.name
FROM customers c
WHERE EXISTS (SELECT 1 FROM orders WHERE customer_id = c.id);

DISTINCT vs GROUP BY Performance

-- Usually similar performance
SELECT DISTINCT category FROM products;
SELECT category FROM products GROUP BY category;

Check with EXPLAIN ANALYZE for your specific query!

DISTINCT ON (PostgreSQL-Specific)

DISTINCT ON returns the first row for each unique value of specified columns.

Syntax

SELECT DISTINCT ON (column1, column2)
    column1, column2, other_columns
FROM table
ORDER BY column1, column2, sort_column;

Example: Latest Order per Customer

SELECT DISTINCT ON (customer_id)
    customer_id,
    order_date,
    total
FROM orders
ORDER BY customer_id, order_date DESC;

Result:

 customer_id | order_date |  total
-------------+------------+--------
      1      | 2024-12-20 | 157.50  ← Most recent for customer 1
      2      | 2024-12-18 |  87.25  ← Most recent for customer 2
      3      | 2024-12-15 | 247.00  ← Most recent for customer 3

How it works:

Rows are sorted by customer_id, order_date DESC
For each customer_id, only the first row (most recent) is returned

Another Example: Cheapest Product per Category

SELECT DISTINCT ON (category)
    category,
    name,
    price
FROM products
ORDER BY category, price ASC;

Result:

   category   |     name      | price
--------------+---------------+-------
 Books        | SQL Basics    |  9.99  ← Cheapest book
 Clothing     | T-Shirt       | 14.99  ← Cheapest clothing
 Electronics  | USB Cable     |  5.99  ← Cheapest electronics

Note: DISTINCT ON is PostgreSQL-specific. Other databases use ROW_NUMBER() or subqueries.

DISTINCT ON Alternative (Portable)

-- Works in all databases
WITH ranked AS (
    SELECT
        *,
        ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date DESC) AS rn
    FROM orders
)
SELECT customer_id, order_date, total
FROM ranked
WHERE rn = 1;

Common DISTINCT Patterns

Pattern 1: Unique Values Report

-- How many unique values in each column?
SELECT
    COUNT(DISTINCT category) AS unique_categories,
    COUNT(DISTINCT name) AS unique_names,
    COUNT(DISTINCT price) AS unique_prices
FROM products;

Pattern 2: Data Quality Check

-- Find duplicate names
SELECT name, COUNT(*)
FROM products
GROUP BY name
HAVING COUNT(*) > 1;

Pattern 3: Active Users

-- Unique users who logged in last 30 days
SELECT DISTINCT user_id
FROM login_logs
WHERE login_date >= CURRENT_DATE - INTERVAL '30 days';

Pattern 4: Product Variety per Order

-- How many distinct products in each order?
SELECT
    order_id,
    COUNT(DISTINCT product_id) AS unique_products
FROM order_items
GROUP BY order_id;

DISTINCT vs UNION

UNION Automatically Removes Duplicates

SELECT name FROM customers WHERE state = 'CA'
UNION
SELECT name FROM customers WHERE state = 'NY';
-- Duplicates removed automatically!

UNION ALL Keeps Duplicates

SELECT name FROM customers WHERE state = 'CA'
UNION ALL
SELECT name FROM customers WHERE state = 'NY';
-- Duplicates kept (faster than UNION)

Common Mistakes

Mistake 1: DISTINCT on All Columns When Not Needed

-- ❌ Redundant if id is unique
SELECT DISTINCT id, name, email
FROM customers;

-- ✅ Just SELECT (id is already unique)
SELECT id, name, email
FROM customers;

Mistake 2: Using DISTINCT to Fix Bad JOINs

-- ❌ Bad: Using DISTINCT to hide JOIN issues
SELECT DISTINCT c.name
FROM customers c
INNER JOIN orders o ON c.id = o.customer_id
INNER JOIN order_items oi ON o.id = oi.order_id;
-- Better to understand why duplicates occur!

-- ✅ Better: Use appropriate JOIN or EXISTS
SELECT c.name
FROM customers c
WHERE EXISTS (SELECT 1 FROM orders WHERE customer_id = c.id);

Mistake 3: DISTINCT with Aggregates

-- ❌ Doesn't make sense
SELECT DISTINCT customer_id, COUNT(*)
FROM orders
GROUP BY customer_id;
-- GROUP BY already ensures unique customer_id!

-- ✅ No DISTINCT needed
SELECT customer_id, COUNT(*)
FROM orders
GROUP BY customer_id;

Practice Exercises

Exercise 1

Find all unique categories in the products table.

Solution

SELECT DISTINCT category
FROM products
ORDER BY category;

Exercise 2

Count how many unique customers placed orders.

Solution

SELECT COUNT(DISTINCT customer_id) AS unique_customers
FROM orders;

Exercise 3

Find customers who placed orders, showing each customer only once.

Solution

-- Option 1: DISTINCT
SELECT DISTINCT c.id, c.name
FROM customers c
INNER JOIN orders o ON c.id = o.customer_id
ORDER BY c.name;

-- Option 2: EXISTS (often faster)
SELECT c.id, c.name
FROM customers c
WHERE EXISTS (SELECT 1 FROM orders WHERE customer_id = c.id)
ORDER BY c.name;

Exercise 4

For each customer, find their most recent order (PostgreSQL DISTINCT ON).

Solution

-- PostgreSQL DISTINCT ON
SELECT DISTINCT ON (customer_id)
    customer_id,
    order_date,
    total
FROM orders
ORDER BY customer_id, order_date DESC;

-- Portable solution (all databases)
WITH ranked AS (
    SELECT
        customer_id,
        order_date,
        total,
        ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date DESC) AS rn
    FROM orders
)
SELECT customer_id, order_date, total
FROM ranked
WHERE rn = 1;

Key Takeaways

✅ DISTINCT removes duplicate rows from result sets
✅ Works on single or multiple columns (combination of values)
✅ COUNT(DISTINCT column) counts unique values
✅ DISTINCT vs GROUP BY: DISTINCT for unique values, GROUP BY for aggregates
✅ Can be expensive on large datasets (requires sorting/hashing)
✅ EXISTS often faster than DISTINCT with JOINs
✅ DISTINCT ON (PostgreSQL) returns first row per group
✅ All NULLs treated as identical by DISTINCT
✅ UNION removes duplicates, UNION ALL keeps them

Next Steps

You've mastered DISTINCT! In the next lesson, we'll learn LIMIT and OFFSET—controlling result set size and implementing pagination for efficient data retrieval.

Almost done with Module 6!

DISTINCT - Removing Duplicates

Introduction

When querying data, you often get duplicate rows. For example:

-- Get all customer IDs from orders
SELECT customer_id FROM orders;

Result:

 customer_id
-------------
      1
      1       ← Duplicate
      1       ← Duplicate
      2
      3
      3       ← Duplicate

If you want only unique values, use DISTINCT!

DISTINCT Syntax

SELECT DISTINCT column1, column2, ...
FROM table
WHERE condition;

Basic DISTINCT Example

-- Unique customer IDs who placed orders
SELECT DISTINCT customer_id
FROM orders;

Result:

 customer_id
-------------
      1
      2
      3

Duplicates removed!

DISTINCT on Single Column

Unique Categories

SELECT DISTINCT category
FROM products;

Result:

   category
--------------
 Electronics
 Books
 Clothing

Unique Order Dates

SELECT DISTINCT DATE(order_date) AS order_date
FROM orders
ORDER BY order_date DESC;

Result:

 order_date
------------
 2024-12-20
 2024-12-19
 2024-12-18
 ...

DISTINCT on Multiple Columns

When using DISTINCT on multiple columns, it considers the combination of values.

SELECT DISTINCT category, price
FROM products;

Returns unique (category, price) pairs, not unique categories and unique prices separately.

Example

CREATE TABLE products_sample (
    name VARCHAR(100),
    category VARCHAR(50),
    price DECIMAL(10,2)
);

INSERT INTO products_sample VALUES
    ('Laptop A', 'Electronics', 999.99),
    ('Laptop B', 'Electronics', 999.99),  -- Same category + price
    ('Mouse A', 'Electronics', 25.99),
    ('Book A', 'Books', 19.99),
    ('Book B', 'Books', 19.99);           -- Same category + price

SELECT DISTINCT category, price
FROM products_sample;

Result:

   category   | price
--------------+--------
 Electronics  | 999.99  ← Only one row for (Electronics, 999.99)
 Electronics  |  25.99
 Books        |  19.99  ← Only one row for (Books, 19.99)

DISTINCT vs GROUP BY

They Can Produce the Same Result

-- Using DISTINCT
SELECT DISTINCT category
FROM products;

-- Using GROUP BY
SELECT category
FROM products
GROUP BY category;

Both return unique categories.

When to Use Each

Use DISTINCT when:

You only need unique values
No aggregation required
Simpler syntax

Use GROUP BY when:

You need aggregates (COUNT, SUM, AVG, etc.)
Grouping with calculations

Example: COUNT with GROUP BY

-- ❌ Can't do this with DISTINCT
SELECT category, COUNT(*) AS product_count
FROM products
GROUP BY category;

-- ✅ DISTINCT doesn't support aggregates
SELECT DISTINCT category
FROM products;
-- No way to add COUNT(*) here!

COUNT(DISTINCT column)

Count unique values in a column:

-- How many unique customers placed orders?
SELECT COUNT(DISTINCT customer_id) AS unique_customers
FROM orders;

Result:

 unique_customers
------------------
        87

Multiple COUNT(DISTINCT)

SELECT
    COUNT(*) AS total_orders,
    COUNT(DISTINCT customer_id) AS unique_customers,
    COUNT(DISTINCT DATE(order_date)) AS days_with_orders
FROM orders;

Result:

 total_orders | unique_customers | days_with_orders
--------------+------------------+------------------
     1000     |        87        |       124

DISTINCT with JOINs

Unique Customers Who Ordered

SELECT DISTINCT c.id, c.name
FROM customers c
INNER JOIN orders o ON c.id = o.customer_id;

Alternative (often faster):

SELECT c.id, c.name
FROM customers c
WHERE EXISTS (
    SELECT 1 FROM orders WHERE customer_id = c.id
);

Unique Products Sold

SELECT DISTINCT p.name, p.category
FROM products p
INNER JOIN order_items oi ON p.id = oi.product_id;

Caution with Multiple Joins

-- ❌ May not give expected results
SELECT DISTINCT c.name
FROM customers c
INNER JOIN orders o ON c.id = o.customer_id
INNER JOIN order_items oi ON o.id = oi.order_id;
-- Joins can create duplicates; DISTINCT removes them
-- But can be slow on large datasets!

-- ✅ Better: Use EXISTS
SELECT c.name
FROM customers c
WHERE EXISTS (
    SELECT 1 FROM orders WHERE customer_id = c.id
);

DISTINCT in Subqueries

Finding Customers Who Ordered

SELECT name
FROM customers
WHERE id IN (
    SELECT DISTINCT customer_id
    FROM orders
);

Note: DISTINCT inside IN subquery is often redundant (IN already handles duplicates).

-- These are equivalent:
WHERE id IN (SELECT DISTINCT customer_id FROM orders)
WHERE id IN (SELECT customer_id FROM orders)

DISTINCT with NULL

DISTINCT treats all NULL values as identical.

CREATE TABLE test_nulls (
    value INTEGER
);

INSERT INTO test_nulls VALUES (1), (2), (NULL), (NULL), (1);

SELECT DISTINCT value
FROM test_nulls;

Result:

 value
-------
     1
     2
  NULL  ← Only one NULL

DISTINCT with ORDER BY

Ordering Unique Results

SELECT DISTINCT category
FROM products
ORDER BY category;

Result:

   category
--------------
 Books
 Clothing
 Electronics

ORDER BY Non-Selected Columns

-- ❌ Error in most databases
SELECT DISTINCT name
FROM products
ORDER BY price DESC;
-- Error: ORDER BY column must appear in SELECT DISTINCT

-- ✅ Include price in SELECT
SELECT DISTINCT name, price
FROM products
ORDER BY price DESC;

Exception: PostgreSQL allows this if the table has a unique key, but it's not portable.

Performance Considerations

DISTINCT Can Be Expensive

-- Slow on large tables (requires sorting or hashing)
SELECT DISTINCT customer_id
FROM orders;  -- 1 million rows

PostgreSQL uses:

Hash aggregation (faster, more memory)
Sort-based (slower, less memory)

DISTINCT vs EXISTS

-- ❌ Slower: DISTINCT after JOIN
SELECT DISTINCT c.name
FROM customers c
INNER JOIN orders o ON c.id = o.customer_id;

-- ✅ Faster: EXISTS
SELECT c.name
FROM customers c
WHERE EXISTS (SELECT 1 FROM orders WHERE customer_id = c.id);

DISTINCT vs GROUP BY Performance

-- Usually similar performance
SELECT DISTINCT category FROM products;
SELECT category FROM products GROUP BY category;

Check with EXPLAIN ANALYZE for your specific query!

DISTINCT ON (PostgreSQL-Specific)

DISTINCT ON returns the first row for each unique value of specified columns.

Syntax

SELECT DISTINCT ON (column1, column2)
    column1, column2, other_columns
FROM table
ORDER BY column1, column2, sort_column;

Example: Latest Order per Customer

SELECT DISTINCT ON (customer_id)
    customer_id,
    order_date,
    total
FROM orders
ORDER BY customer_id, order_date DESC;

Result:

 customer_id | order_date |  total
-------------+------------+--------
      1      | 2024-12-20 | 157.50  ← Most recent for customer 1
      2      | 2024-12-18 |  87.25  ← Most recent for customer 2
      3      | 2024-12-15 | 247.00  ← Most recent for customer 3

How it works:

Rows are sorted by customer_id, order_date DESC
For each customer_id, only the first row (most recent) is returned

Another Example: Cheapest Product per Category

SELECT DISTINCT ON (category)
    category,
    name,
    price
FROM products
ORDER BY category, price ASC;

Result:

   category   |     name      | price
--------------+---------------+-------
 Books        | SQL Basics    |  9.99  ← Cheapest book
 Clothing     | T-Shirt       | 14.99  ← Cheapest clothing
 Electronics  | USB Cable     |  5.99  ← Cheapest electronics

Note: DISTINCT ON is PostgreSQL-specific. Other databases use ROW_NUMBER() or subqueries.

DISTINCT ON Alternative (Portable)

-- Works in all databases
WITH ranked AS (
    SELECT
        *,
        ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date DESC) AS rn
    FROM orders
)
SELECT customer_id, order_date, total
FROM ranked
WHERE rn = 1;

Common DISTINCT Patterns

Pattern 1: Unique Values Report

-- How many unique values in each column?
SELECT
    COUNT(DISTINCT category) AS unique_categories,
    COUNT(DISTINCT name) AS unique_names,
    COUNT(DISTINCT price) AS unique_prices
FROM products;

Pattern 2: Data Quality Check

-- Find duplicate names
SELECT name, COUNT(*)
FROM products
GROUP BY name
HAVING COUNT(*) > 1;

Pattern 3: Active Users

-- Unique users who logged in last 30 days
SELECT DISTINCT user_id
FROM login_logs
WHERE login_date >= CURRENT_DATE - INTERVAL '30 days';

Pattern 4: Product Variety per Order

-- How many distinct products in each order?
SELECT
    order_id,
    COUNT(DISTINCT product_id) AS unique_products
FROM order_items
GROUP BY order_id;

DISTINCT vs UNION

UNION Automatically Removes Duplicates

SELECT name FROM customers WHERE state = 'CA'
UNION
SELECT name FROM customers WHERE state = 'NY';
-- Duplicates removed automatically!

UNION ALL Keeps Duplicates

SELECT name FROM customers WHERE state = 'CA'
UNION ALL
SELECT name FROM customers WHERE state = 'NY';
-- Duplicates kept (faster than UNION)

Common Mistakes

Mistake 1: DISTINCT on All Columns When Not Needed

-- ❌ Redundant if id is unique
SELECT DISTINCT id, name, email
FROM customers;

-- ✅ Just SELECT (id is already unique)
SELECT id, name, email
FROM customers;

Mistake 2: Using DISTINCT to Fix Bad JOINs

-- ❌ Bad: Using DISTINCT to hide JOIN issues
SELECT DISTINCT c.name
FROM customers c
INNER JOIN orders o ON c.id = o.customer_id
INNER JOIN order_items oi ON o.id = oi.order_id;
-- Better to understand why duplicates occur!

-- ✅ Better: Use appropriate JOIN or EXISTS
SELECT c.name
FROM customers c
WHERE EXISTS (SELECT 1 FROM orders WHERE customer_id = c.id);

Mistake 3: DISTINCT with Aggregates

-- ❌ Doesn't make sense
SELECT DISTINCT customer_id, COUNT(*)
FROM orders
GROUP BY customer_id;
-- GROUP BY already ensures unique customer_id!

-- ✅ No DISTINCT needed
SELECT customer_id, COUNT(*)
FROM orders
GROUP BY customer_id;

Practice Exercises

Exercise 1

Find all unique categories in the products table.

Solution

SELECT DISTINCT category
FROM products
ORDER BY category;

Exercise 2

Count how many unique customers placed orders.

Solution

SELECT COUNT(DISTINCT customer_id) AS unique_customers
FROM orders;

Exercise 3

Find customers who placed orders, showing each customer only once.

Solution

-- Option 1: DISTINCT
SELECT DISTINCT c.id, c.name
FROM customers c
INNER JOIN orders o ON c.id = o.customer_id
ORDER BY c.name;

-- Option 2: EXISTS (often faster)
SELECT c.id, c.name
FROM customers c
WHERE EXISTS (SELECT 1 FROM orders WHERE customer_id = c.id)
ORDER BY c.name;

Exercise 4

For each customer, find their most recent order (PostgreSQL DISTINCT ON).

Solution

-- PostgreSQL DISTINCT ON
SELECT DISTINCT ON (customer_id)
    customer_id,
    order_date,
    total
FROM orders
ORDER BY customer_id, order_date DESC;

-- Portable solution (all databases)
WITH ranked AS (
    SELECT
        customer_id,
        order_date,
        total,
        ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date DESC) AS rn
    FROM orders
)
SELECT customer_id, order_date, total
FROM ranked
WHERE rn = 1;

Key Takeaways

✅ DISTINCT removes duplicate rows from result sets
✅ Works on single or multiple columns (combination of values)
✅ COUNT(DISTINCT column) counts unique values
✅ DISTINCT vs GROUP BY: DISTINCT for unique values, GROUP BY for aggregates
✅ Can be expensive on large datasets (requires sorting/hashing)
✅ EXISTS often faster than DISTINCT with JOINs
✅ DISTINCT ON (PostgreSQL) returns first row per group
✅ All NULLs treated as identical by DISTINCT
✅ UNION removes duplicates, UNION ALL keeps them

Next Steps

You've mastered DISTINCT! In the next lesson, we'll learn LIMIT and OFFSET—controlling result set size and implementing pagination for efficient data retrieval.

Almost done with Module 6!

DISTINCT - Removing Duplicates

Introduction

DISTINCT Syntax

Basic DISTINCT Example

DISTINCT on Single Column

Unique Categories

Unique Order Dates

DISTINCT on Multiple Columns

Example

DISTINCT vs GROUP BY

They Can Produce the Same Result

When to Use Each

Example: COUNT with GROUP BY

COUNT(DISTINCT column)

Multiple COUNT(DISTINCT)

DISTINCT with JOINs

Unique Customers Who Ordered

Unique Products Sold

Caution with Multiple Joins

DISTINCT in Subqueries

Finding Customers Who Ordered

DISTINCT with NULL

DISTINCT with ORDER BY

Ordering Unique Results

ORDER BY Non-Selected Columns

Performance Considerations

DISTINCT Can Be Expensive

DISTINCT vs EXISTS

DISTINCT vs GROUP BY Performance

DISTINCT ON (PostgreSQL-Specific)

Syntax

Example: Latest Order per Customer

Another Example: Cheapest Product per Category

DISTINCT ON Alternative (Portable)

Common DISTINCT Patterns

Pattern 1: Unique Values Report

Pattern 2: Data Quality Check

Pattern 3: Active Users

Pattern 4: Product Variety per Order

DISTINCT vs UNION

UNION Automatically Removes Duplicates

UNION ALL Keeps Duplicates

Common Mistakes

Mistake 1: DISTINCT on All Columns When Not Needed

Mistake 2: Using DISTINCT to Fix Bad JOINs

Mistake 3: DISTINCT with Aggregates

Practice Exercises

Exercise 1

Exercise 2

Exercise 3

Exercise 4

Key Takeaways

Next Steps

Quiz

DISTINCT - Removing Duplicates

Introduction

DISTINCT Syntax

Basic DISTINCT Example

DISTINCT on Single Column

Unique Categories

Unique Order Dates

DISTINCT on Multiple Columns

Example

DISTINCT vs GROUP BY

They Can Produce the Same Result

When to Use Each

Example: COUNT with GROUP BY

COUNT(DISTINCT column)

Multiple COUNT(DISTINCT)

DISTINCT with JOINs

Unique Customers Who Ordered

Unique Products Sold

Caution with Multiple Joins

DISTINCT in Subqueries

Finding Customers Who Ordered

DISTINCT with NULL

DISTINCT with ORDER BY

Ordering Unique Results

ORDER BY Non-Selected Columns

Performance Considerations