String Functions and Pattern Matching

Introduction

Text data is everywhere: names, emails, addresses, descriptions, URLs. SQL provides powerful functions for manipulating and searching strings.

This lesson covers:

String functions (CONCAT, UPPER, LOWER, TRIM, etc.)
Pattern matching with LIKE and ILIKE
Regular expressions
Text search techniques

String Concatenation

CONCAT Function

SELECT CONCAT(first_name, ' ', last_name) AS full_name
FROM users;

Result:

   full_name
----------------
 Alice Smith
 Bob Jones
 Carol White

|| Operator (PostgreSQL, SQLite)

SELECT first_name || ' ' || last_name AS full_name
FROM users;

Handling NULLs

-- ❌ Returns NULL if any part is NULL
SELECT first_name || ' ' || middle_name || ' ' || last_name
FROM users;
-- If middle_name is NULL, entire result is NULL!

-- ✅ Use CONCAT (treats NULL as empty string)
SELECT CONCAT(first_name, ' ', middle_name, ' ', last_name)
FROM users;

-- ✅ Or use COALESCE
SELECT first_name || ' ' || COALESCE(middle_name || ' ', '') || last_name
FROM users;

CONCAT_WS (Concatenate With Separator)

-- Skip NULLs automatically
SELECT CONCAT_WS(' ', first_name, middle_name, last_name) AS full_name
FROM users;
-- NULLs are ignored; no extra spaces

Result:

     full_name
--------------------
 Alice Marie Smith
 Bob Jones          ← No middle name, no extra space
 Carol Ann White

Case Conversion

UPPER and LOWER

SELECT
    name,
    UPPER(name) AS uppercase,
    LOWER(name) AS lowercase
FROM products;

Result:

       name        |   uppercase    |   lowercase
-------------------+----------------+----------------
 Laptop Pro 15     | LAPTOP PRO 15  | laptop pro 15
 Wireless Mouse    | WIRELESS MOUSE | wireless mouse

INITCAP (Title Case)

-- PostgreSQL
SELECT INITCAP('hello world') AS title_case;

Result:

 title_case
------------
 Hello World

Case-Insensitive Comparison

-- ❌ Case-sensitive (in most databases)
SELECT name FROM customers
WHERE name = 'alice smith';
-- Won't match 'Alice Smith'

-- ✅ Case-insensitive
SELECT name FROM customers
WHERE LOWER(name) = LOWER('alice smith');

-- ✅ PostgreSQL: ILIKE
SELECT name FROM customers
WHERE name ILIKE 'alice smith';

String Trimming

TRIM (Remove Leading/Trailing Spaces)

SELECT TRIM('   hello world   ') AS trimmed;

Result:

   trimmed
--------------
 hello world

LTRIM and RTRIM

SELECT
    LTRIM('   hello') AS left_trimmed,   -- 'hello'
    RTRIM('hello   ') AS right_trimmed;  -- 'hello'

Trim Specific Characters

-- Remove leading/trailing dashes
SELECT TRIM('-' FROM '---hello---') AS trimmed;
-- Result: 'hello'

-- Remove multiple characters
SELECT TRIM(BOTH '- ' FROM '--- hello ---') AS trimmed;
-- Result: 'hello'

Substring Extraction

SUBSTRING (or SUBSTR)

-- Extract substring starting at position, for length characters
SELECT SUBSTRING('Hello World', 1, 5) AS substr;
-- Result: 'Hello'

-- PostgreSQL syntax (1-indexed)
SELECT SUBSTRING('Hello World' FROM 7 FOR 5) AS substr;
-- Result: 'World'

-- Extract from position to end
SELECT SUBSTRING('Hello World', 7) AS substr;
-- Result: 'World'

LEFT and RIGHT

SELECT
    LEFT('Hello World', 5) AS left_part,   -- 'Hello'
    RIGHT('Hello World', 5) AS right_part; -- 'World'

Practical Example: Extract Domain from Email

SELECT
    email,
    SUBSTRING(email FROM POSITION('@' IN email) + 1) AS domain
FROM customers;

Result:

        email         |    domain
----------------------+---------------
 alice@example.com    | example.com
 bob@company.org      | company.org

String Length

LENGTH (or CHAR_LENGTH)

SELECT
    name,
    LENGTH(name) AS char_count
FROM products;

Result:

       name        | char_count
-------------------+------------
 Laptop Pro 15     |     13
 Mouse             |      5

OCTET_LENGTH (Byte Length)

-- Important for UTF-8 text
SELECT
    'Hello' AS text,
    LENGTH('Hello') AS char_length,     -- 5
    OCTET_LENGTH('Hello') AS byte_length; -- 5

SELECT
    '你好' AS text,
    LENGTH('你好') AS char_length,         -- 2 characters
    OCTET_LENGTH('你好') AS byte_length;  -- 6 bytes (UTF-8)

String Position and Search

POSITION (or STRPOS)

-- Find position of substring
SELECT POSITION('World' IN 'Hello World') AS pos;
-- Result: 7

-- PostgreSQL: STRPOS
SELECT STRPOS('Hello World', 'World') AS pos;
-- Result: 7

-- Returns 0 if not found
SELECT POSITION('xyz' IN 'Hello World') AS pos;
-- Result: 0

Practical Example: Check Email Validity

SELECT
    email,
    CASE
        WHEN POSITION('@' IN email) > 0
         AND POSITION('.' IN SUBSTRING(email FROM POSITION('@' IN email))) > 0
        THEN 'Valid'
        ELSE 'Invalid'
    END AS email_status
FROM customers;

Pattern Matching with LIKE

LIKE Syntax

SELECT name
FROM products
WHERE name LIKE pattern;

Wildcards:

% matches zero or more characters
_ matches exactly one character

Examples

-- Starts with 'Laptop'
SELECT name FROM products
WHERE name LIKE 'Laptop%';

-- Ends with 'Mouse'
SELECT name FROM products
WHERE name LIKE '%Mouse';

-- Contains 'Pro'
SELECT name FROM products
WHERE name LIKE '%Pro%';

-- Exactly 5 characters
SELECT name FROM products
WHERE name LIKE '_____';

-- Second character is 'a'
SELECT name FROM products
WHERE name LIKE '_a%';

-- Starts with 'A' or 'B' (NOT directly supported, use OR)
SELECT name FROM products
WHERE name LIKE 'A%' OR name LIKE 'B%';

Case-Insensitive LIKE (ILIKE in PostgreSQL)

-- PostgreSQL: ILIKE
SELECT name FROM customers
WHERE name ILIKE 'alice%';
-- Matches 'Alice', 'ALICE', 'alice', etc.

-- Other databases: Use LOWER
SELECT name FROM customers
WHERE LOWER(name) LIKE LOWER('alice%');

Escaping Wildcards

-- Find products with literal '%' in name
SELECT name FROM products
WHERE name LIKE '%\%%' ESCAPE '\';

-- Find products with literal '_'
SELECT name FROM products
WHERE name LIKE '%\_% ESCAPE '\';

NOT LIKE

-- Products NOT starting with 'Laptop'
SELECT name FROM products
WHERE name NOT LIKE 'Laptop%';

Regular Expressions

PostgreSQL: ~ Operator

-- Matches pattern
SELECT name FROM products
WHERE name ~ '^Laptop.*Pro$';
-- ^ = start, .* = any characters, $ = end

-- Case-insensitive: ~*
SELECT name FROM products
WHERE name ~* 'laptop';  -- Matches 'Laptop', 'LAPTOP', etc.

-- Does NOT match: !~
SELECT name FROM products
WHERE name !~ '[0-9]';  -- Products without numbers

REGEXP_MATCHES (PostgreSQL)

-- Extract all numbers from text
SELECT REGEXP_MATCHES('Order #12345 - $99.99', '[0-9]+', 'g') AS numbers;

Result:

  numbers
-----------
 {12345}
 {99}
 {99}

REGEXP_REPLACE

-- Replace all digits with 'X'
SELECT REGEXP_REPLACE('Phone: 555-1234', '[0-9]', 'X', 'g') AS masked;
-- Result: 'Phone: XXX-XXXX'

-- Remove all non-alphanumeric
SELECT REGEXP_REPLACE('Hello, World! 123', '[^a-zA-Z0-9]', '', 'g') AS clean;
-- Result: 'HelloWorld123'

Validate Email with Regex

SELECT
    email,
    email ~ '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$' AS is_valid
FROM customers;

String Replacement

REPLACE

SELECT REPLACE('Hello World', 'World', 'SQL') AS replaced;
-- Result: 'Hello SQL'

-- Remove substring (replace with empty string)
SELECT REPLACE('Hello-World', '-', '') AS cleaned;
-- Result: 'HelloWorld'

Practical Example: Clean Phone Numbers

SELECT
    phone,
    REPLACE(REPLACE(REPLACE(phone, '-', ''), '(', ''), ')', '') AS cleaned
FROM customers;

Result:

     phone      |  cleaned
----------------+------------
 (555) 123-4567 | 5551234567
 555-987-6543   | 5559876543

String Splitting

SPLIT_PART (PostgreSQL)

-- Split string and get Nth part (1-indexed)
SELECT SPLIT_PART('apple,banana,cherry', ',', 2) AS fruit;
-- Result: 'banana'

STRING_TO_ARRAY (PostgreSQL)

SELECT STRING_TO_ARRAY('apple,banana,cherry', ',') AS fruits;
-- Result: {apple,banana,cherry}

Practical Example: Parse CSV Data

SELECT
    SPLIT_PART(csv_row, ',', 1) AS name,
    SPLIT_PART(csv_row, ',', 2) AS email,
    SPLIT_PART(csv_row, ',', 3) AS phone
FROM raw_data;

String Aggregation

STRING_AGG (PostgreSQL)

-- Concatenate values from multiple rows
SELECT
    customer_id,
    STRING_AGG(product_name, ', ' ORDER BY product_name) AS products
FROM order_items
GROUP BY customer_id;

Result:

 customer_id |           products
-------------+--------------------------------
      1      | Laptop Pro 15, Mouse, USB Cable
      2      | Desk, Monitor

Common String Patterns

Pattern 1: Format Full Name

SELECT
    CONCAT_WS(' ', first_name, middle_initial || '.', last_name) AS formatted_name
FROM users;

Pattern 2: Mask Sensitive Data

-- Show last 4 digits of credit card
SELECT
    'XXXX-XXXX-XXXX-' || RIGHT(card_number, 4) AS masked_card
FROM payments;

Pattern 3: Extract Username from Email

SELECT
    email,
    SPLIT_PART(email, '@', 1) AS username
FROM customers;

Pattern 4: Validate Format

-- Find invalid phone numbers (not 10 digits)
SELECT phone
FROM customers
WHERE LENGTH(REGEXP_REPLACE(phone, '[^0-9]', '', 'g')) != 10;

Pattern 5: Search Across Multiple Columns

-- Search in name OR description
SELECT name, description
FROM products
WHERE name ILIKE '%laptop%'
   OR description ILIKE '%laptop%';

-- Better: Use concatenation
SELECT name, description
FROM products
WHERE CONCAT(name, ' ', description) ILIKE '%laptop%';

Full-Text Search (PostgreSQL)

Basic Text Search

-- Convert to tsvector (text search vector)
SELECT
    name,
    to_tsvector('english', description) AS search_vector
FROM products;

-- Search using to_tsquery
SELECT name, description
FROM products
WHERE to_tsvector('english', description) @@ to_tsquery('english', 'laptop & wireless');
-- Matches documents containing both 'laptop' AND 'wireless'

Text Search Operators

-- AND: &
WHERE to_tsvector('english', description) @@ to_tsquery('laptop & mouse');

-- OR: |
WHERE to_tsvector('english', description) @@ to_tsquery('laptop | mouse');

-- NOT: !
WHERE to_tsvector('english', description) @@ to_tsquery('laptop & !expensive');

-- Phrase search
WHERE to_tsvector('english', description) @@ phraseto_tsquery('wireless mouse');

Text Search Ranking

SELECT
    name,
    ts_rank(to_tsvector('english', description), query) AS rank
FROM products,
     to_tsquery('english', 'laptop & wireless') query
WHERE to_tsvector('english', description) @@ query
ORDER BY rank DESC;

Performance Tips

Index for LIKE

-- ❌ Slow: Can't use regular index
SELECT name FROM products
WHERE name LIKE '%laptop%';

-- ✅ Fast: Starts with (uses index)
SELECT name FROM products
WHERE name LIKE 'Laptop%';

CREATE INDEX idx_products_name ON products(name);

Trigram Index for Fuzzy Search (PostgreSQL)

-- Enable pg_trgm extension
CREATE EXTENSION pg_trgm;

-- Create trigram index
CREATE INDEX idx_products_name_trgm ON products USING gin(name gin_trgm_ops);

-- Now LIKE '%pattern%' is fast!
SELECT name FROM products
WHERE name ILIKE '%laptop%';

Full-Text Search Index

-- Add tsvector column
ALTER TABLE products ADD COLUMN search_vector tsvector;

-- Populate search vector
UPDATE products
SET search_vector = to_tsvector('english', COALESCE(name, '') || ' ' || COALESCE(description, ''));

-- Create GIN index
CREATE INDEX idx_products_search ON products USING gin(search_vector);

-- Fast full-text search
SELECT name FROM products
WHERE search_vector @@ to_tsquery('laptop & wireless');

Common Mistakes

Mistake 1: Case-Sensitive Search

-- ❌ Won't match 'Alice'
SELECT name FROM customers
WHERE name LIKE 'alice%';

-- ✅ Case-insensitive
SELECT name FROM customers
WHERE name ILIKE 'alice%';

Mistake 2: NULL Concatenation

-- ❌ Returns NULL if middle_name is NULL
SELECT first_name || ' ' || middle_name || ' ' || last_name
FROM users;

-- ✅ Handle NULLs
SELECT CONCAT_WS(' ', first_name, middle_name, last_name)
FROM users;

Mistake 3: Inefficient Wildcards

-- ❌ Can't use index (starts with %)
WHERE name LIKE '%laptop%'

-- ✅ Uses index (starts with text)
WHERE name LIKE 'laptop%'

Practice Exercises

Exercise 1

Find all customers whose email ends with 'gmail.com'.

Solution

SELECT name, email
FROM customers
WHERE email LIKE '%gmail.com';

-- Or using regex
SELECT name, email
FROM customers
WHERE email ~ '@gmail\.com$';

Exercise 2

Convert all product names to uppercase and remove leading/trailing spaces.

Solution

SELECT
    name AS original,
    UPPER(TRIM(name)) AS cleaned
FROM products;

-- Update in place
UPDATE products
SET name = UPPER(TRIM(name));

Exercise 3

Extract the first word from product names.

Solution

SELECT
    name,
    SPLIT_PART(name, ' ', 1) AS first_word
FROM products;

-- Or using SUBSTRING
SELECT
    name,
    SUBSTRING(name FROM 1 FOR POSITION(' ' IN name) - 1) AS first_word
FROM products;

Exercise 4

Find products with names containing numbers.

Solution

-- Using regex
SELECT name
FROM products
WHERE name ~ '[0-9]';

-- Using LIKE (less precise)
SELECT name
FROM products
WHERE name LIKE '%0%' OR name LIKE '%1%' OR name LIKE '%2%'
   OR name LIKE '%3%' OR name LIKE '%4%' OR name LIKE '%5%'
   OR name LIKE '%6%' OR name LIKE '%7%' OR name LIKE '%8%'
   OR name LIKE '%9%';

Key Takeaways

✅ CONCAT/|| joins strings together
✅ UPPER/LOWER converts case
✅ TRIM/LTRIM/RTRIM removes whitespace
✅ SUBSTRING/LEFT/RIGHT extracts parts of strings
✅ LIKE matches patterns (%, _)
✅ ILIKE is case-insensitive (PostgreSQL)
✅ Regular expressions (~, REGEXP_REPLACE) for advanced patterns
✅ STRING_AGG concatenates rows
✅ Full-text search for complex text queries
✅ Indexes improve LIKE and text search performance

Next Steps

Congratulations! You've completed Module 6 and mastered advanced query techniques including subqueries, CTEs, CASE statements, DISTINCT, LIMIT/pagination, and string functions.

In Module 7, we'll learn Database Design Fundamentals—how to create tables, define constraints, design schemas, and optimize with indexes.

Ready to become a database architect!

String Functions and Pattern Matching

Introduction

Text data is everywhere: names, emails, addresses, descriptions, URLs. SQL provides powerful functions for manipulating and searching strings.

This lesson covers:

String functions (CONCAT, UPPER, LOWER, TRIM, etc.)
Pattern matching with LIKE and ILIKE
Regular expressions
Text search techniques

String Concatenation

CONCAT Function

SELECT CONCAT(first_name, ' ', last_name) AS full_name
FROM users;

Result:

   full_name
----------------
 Alice Smith
 Bob Jones
 Carol White

|| Operator (PostgreSQL, SQLite)

SELECT first_name || ' ' || last_name AS full_name
FROM users;

Handling NULLs

-- ❌ Returns NULL if any part is NULL
SELECT first_name || ' ' || middle_name || ' ' || last_name
FROM users;
-- If middle_name is NULL, entire result is NULL!

-- ✅ Use CONCAT (treats NULL as empty string)
SELECT CONCAT(first_name, ' ', middle_name, ' ', last_name)
FROM users;

-- ✅ Or use COALESCE
SELECT first_name || ' ' || COALESCE(middle_name || ' ', '') || last_name
FROM users;

CONCAT_WS (Concatenate With Separator)

-- Skip NULLs automatically
SELECT CONCAT_WS(' ', first_name, middle_name, last_name) AS full_name
FROM users;
-- NULLs are ignored; no extra spaces

Result:

     full_name
--------------------
 Alice Marie Smith
 Bob Jones          ← No middle name, no extra space
 Carol Ann White

Case Conversion

UPPER and LOWER

SELECT
    name,
    UPPER(name) AS uppercase,
    LOWER(name) AS lowercase
FROM products;

Result:

       name        |   uppercase    |   lowercase
-------------------+----------------+----------------
 Laptop Pro 15     | LAPTOP PRO 15  | laptop pro 15
 Wireless Mouse    | WIRELESS MOUSE | wireless mouse

INITCAP (Title Case)

-- PostgreSQL
SELECT INITCAP('hello world') AS title_case;

Result:

 title_case
------------
 Hello World

Case-Insensitive Comparison

-- ❌ Case-sensitive (in most databases)
SELECT name FROM customers
WHERE name = 'alice smith';
-- Won't match 'Alice Smith'

-- ✅ Case-insensitive
SELECT name FROM customers
WHERE LOWER(name) = LOWER('alice smith');

-- ✅ PostgreSQL: ILIKE
SELECT name FROM customers
WHERE name ILIKE 'alice smith';

String Trimming

TRIM (Remove Leading/Trailing Spaces)

SELECT TRIM('   hello world   ') AS trimmed;

Result:

   trimmed
--------------
 hello world

LTRIM and RTRIM

SELECT
    LTRIM('   hello') AS left_trimmed,   -- 'hello'
    RTRIM('hello   ') AS right_trimmed;  -- 'hello'

Trim Specific Characters

-- Remove leading/trailing dashes
SELECT TRIM('-' FROM '---hello---') AS trimmed;
-- Result: 'hello'

-- Remove multiple characters
SELECT TRIM(BOTH '- ' FROM '--- hello ---') AS trimmed;
-- Result: 'hello'

Substring Extraction

SUBSTRING (or SUBSTR)

-- Extract substring starting at position, for length characters
SELECT SUBSTRING('Hello World', 1, 5) AS substr;
-- Result: 'Hello'

-- PostgreSQL syntax (1-indexed)
SELECT SUBSTRING('Hello World' FROM 7 FOR 5) AS substr;
-- Result: 'World'

-- Extract from position to end
SELECT SUBSTRING('Hello World', 7) AS substr;
-- Result: 'World'

LEFT and RIGHT

SELECT
    LEFT('Hello World', 5) AS left_part,   -- 'Hello'
    RIGHT('Hello World', 5) AS right_part; -- 'World'

Practical Example: Extract Domain from Email

SELECT
    email,
    SUBSTRING(email FROM POSITION('@' IN email) + 1) AS domain
FROM customers;

Result:

        email         |    domain
----------------------+---------------
 alice@example.com    | example.com
 bob@company.org      | company.org

String Length

LENGTH (or CHAR_LENGTH)

SELECT
    name,
    LENGTH(name) AS char_count
FROM products;

Result:

       name        | char_count
-------------------+------------
 Laptop Pro 15     |     13
 Mouse             |      5

OCTET_LENGTH (Byte Length)

-- Important for UTF-8 text
SELECT
    'Hello' AS text,
    LENGTH('Hello') AS char_length,     -- 5
    OCTET_LENGTH('Hello') AS byte_length; -- 5

SELECT
    '你好' AS text,
    LENGTH('你好') AS char_length,         -- 2 characters
    OCTET_LENGTH('你好') AS byte_length;  -- 6 bytes (UTF-8)

String Position and Search

POSITION (or STRPOS)

-- Find position of substring
SELECT POSITION('World' IN 'Hello World') AS pos;
-- Result: 7

-- PostgreSQL: STRPOS
SELECT STRPOS('Hello World', 'World') AS pos;
-- Result: 7

-- Returns 0 if not found
SELECT POSITION('xyz' IN 'Hello World') AS pos;
-- Result: 0

Practical Example: Check Email Validity

SELECT
    email,
    CASE
        WHEN POSITION('@' IN email) > 0
         AND POSITION('.' IN SUBSTRING(email FROM POSITION('@' IN email))) > 0
        THEN 'Valid'
        ELSE 'Invalid'
    END AS email_status
FROM customers;

Pattern Matching with LIKE

LIKE Syntax

SELECT name
FROM products
WHERE name LIKE pattern;

Wildcards:

% matches zero or more characters
_ matches exactly one character

Examples

-- Starts with 'Laptop'
SELECT name FROM products
WHERE name LIKE 'Laptop%';

-- Ends with 'Mouse'
SELECT name FROM products
WHERE name LIKE '%Mouse';

-- Contains 'Pro'
SELECT name FROM products
WHERE name LIKE '%Pro%';

-- Exactly 5 characters
SELECT name FROM products
WHERE name LIKE '_____';

-- Second character is 'a'
SELECT name FROM products
WHERE name LIKE '_a%';

-- Starts with 'A' or 'B' (NOT directly supported, use OR)
SELECT name FROM products
WHERE name LIKE 'A%' OR name LIKE 'B%';

Case-Insensitive LIKE (ILIKE in PostgreSQL)

-- PostgreSQL: ILIKE
SELECT name FROM customers
WHERE name ILIKE 'alice%';
-- Matches 'Alice', 'ALICE', 'alice', etc.

-- Other databases: Use LOWER
SELECT name FROM customers
WHERE LOWER(name) LIKE LOWER('alice%');

Escaping Wildcards

-- Find products with literal '%' in name
SELECT name FROM products
WHERE name LIKE '%\%%' ESCAPE '\';

-- Find products with literal '_'
SELECT name FROM products
WHERE name LIKE '%\_% ESCAPE '\';

NOT LIKE

-- Products NOT starting with 'Laptop'
SELECT name FROM products
WHERE name NOT LIKE 'Laptop%';

Regular Expressions

PostgreSQL: ~ Operator

-- Matches pattern
SELECT name FROM products
WHERE name ~ '^Laptop.*Pro$';
-- ^ = start, .* = any characters, $ = end

-- Case-insensitive: ~*
SELECT name FROM products
WHERE name ~* 'laptop';  -- Matches 'Laptop', 'LAPTOP', etc.

-- Does NOT match: !~
SELECT name FROM products
WHERE name !~ '[0-9]';  -- Products without numbers

REGEXP_MATCHES (PostgreSQL)

-- Extract all numbers from text
SELECT REGEXP_MATCHES('Order #12345 - $99.99', '[0-9]+', 'g') AS numbers;

Result:

  numbers
-----------
 {12345}
 {99}
 {99}

REGEXP_REPLACE

-- Replace all digits with 'X'
SELECT REGEXP_REPLACE('Phone: 555-1234', '[0-9]', 'X', 'g') AS masked;
-- Result: 'Phone: XXX-XXXX'

-- Remove all non-alphanumeric
SELECT REGEXP_REPLACE('Hello, World! 123', '[^a-zA-Z0-9]', '', 'g') AS clean;
-- Result: 'HelloWorld123'

Validate Email with Regex

SELECT
    email,
    email ~ '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$' AS is_valid
FROM customers;

String Replacement

REPLACE

SELECT REPLACE('Hello World', 'World', 'SQL') AS replaced;
-- Result: 'Hello SQL'

-- Remove substring (replace with empty string)
SELECT REPLACE('Hello-World', '-', '') AS cleaned;
-- Result: 'HelloWorld'

Practical Example: Clean Phone Numbers

SELECT
    phone,
    REPLACE(REPLACE(REPLACE(phone, '-', ''), '(', ''), ')', '') AS cleaned
FROM customers;

Result:

     phone      |  cleaned
----------------+------------
 (555) 123-4567 | 5551234567
 555-987-6543   | 5559876543

String Splitting

SPLIT_PART (PostgreSQL)

-- Split string and get Nth part (1-indexed)
SELECT SPLIT_PART('apple,banana,cherry', ',', 2) AS fruit;
-- Result: 'banana'

STRING_TO_ARRAY (PostgreSQL)

SELECT STRING_TO_ARRAY('apple,banana,cherry', ',') AS fruits;
-- Result: {apple,banana,cherry}

Practical Example: Parse CSV Data

SELECT
    SPLIT_PART(csv_row, ',', 1) AS name,
    SPLIT_PART(csv_row, ',', 2) AS email,
    SPLIT_PART(csv_row, ',', 3) AS phone
FROM raw_data;

String Aggregation

STRING_AGG (PostgreSQL)

-- Concatenate values from multiple rows
SELECT
    customer_id,
    STRING_AGG(product_name, ', ' ORDER BY product_name) AS products
FROM order_items
GROUP BY customer_id;

Result:

 customer_id |           products
-------------+--------------------------------
      1      | Laptop Pro 15, Mouse, USB Cable
      2      | Desk, Monitor

Common String Patterns

Pattern 1: Format Full Name

SELECT
    CONCAT_WS(' ', first_name, middle_initial || '.', last_name) AS formatted_name
FROM users;

Pattern 2: Mask Sensitive Data

-- Show last 4 digits of credit card
SELECT
    'XXXX-XXXX-XXXX-' || RIGHT(card_number, 4) AS masked_card
FROM payments;

Pattern 3: Extract Username from Email

SELECT
    email,
    SPLIT_PART(email, '@', 1) AS username
FROM customers;

Pattern 4: Validate Format

-- Find invalid phone numbers (not 10 digits)
SELECT phone
FROM customers
WHERE LENGTH(REGEXP_REPLACE(phone, '[^0-9]', '', 'g')) != 10;

Pattern 5: Search Across Multiple Columns

-- Search in name OR description
SELECT name, description
FROM products
WHERE name ILIKE '%laptop%'
   OR description ILIKE '%laptop%';

-- Better: Use concatenation
SELECT name, description
FROM products
WHERE CONCAT(name, ' ', description) ILIKE '%laptop%';

Full-Text Search (PostgreSQL)

Basic Text Search

-- Convert to tsvector (text search vector)
SELECT
    name,
    to_tsvector('english', description) AS search_vector
FROM products;

-- Search using to_tsquery
SELECT name, description
FROM products
WHERE to_tsvector('english', description) @@ to_tsquery('english', 'laptop & wireless');
-- Matches documents containing both 'laptop' AND 'wireless'

Text Search Operators

-- AND: &
WHERE to_tsvector('english', description) @@ to_tsquery('laptop & mouse');

-- OR: |
WHERE to_tsvector('english', description) @@ to_tsquery('laptop | mouse');

-- NOT: !
WHERE to_tsvector('english', description) @@ to_tsquery('laptop & !expensive');

-- Phrase search
WHERE to_tsvector('english', description) @@ phraseto_tsquery('wireless mouse');

Text Search Ranking

SELECT
    name,
    ts_rank(to_tsvector('english', description), query) AS rank
FROM products,
     to_tsquery('english', 'laptop & wireless') query
WHERE to_tsvector('english', description) @@ query
ORDER BY rank DESC;

Performance Tips

Index for LIKE

-- ❌ Slow: Can't use regular index
SELECT name FROM products
WHERE name LIKE '%laptop%';

-- ✅ Fast: Starts with (uses index)
SELECT name FROM products
WHERE name LIKE 'Laptop%';

CREATE INDEX idx_products_name ON products(name);

Trigram Index for Fuzzy Search (PostgreSQL)

-- Enable pg_trgm extension
CREATE EXTENSION pg_trgm;

-- Create trigram index
CREATE INDEX idx_products_name_trgm ON products USING gin(name gin_trgm_ops);

-- Now LIKE '%pattern%' is fast!
SELECT name FROM products
WHERE name ILIKE '%laptop%';

Full-Text Search Index

-- Add tsvector column
ALTER TABLE products ADD COLUMN search_vector tsvector;

-- Populate search vector
UPDATE products
SET search_vector = to_tsvector('english', COALESCE(name, '') || ' ' || COALESCE(description, ''));

-- Create GIN index
CREATE INDEX idx_products_search ON products USING gin(search_vector);

-- Fast full-text search
SELECT name FROM products
WHERE search_vector @@ to_tsquery('laptop & wireless');

Common Mistakes

Mistake 1: Case-Sensitive Search

-- ❌ Won't match 'Alice'
SELECT name FROM customers
WHERE name LIKE 'alice%';

-- ✅ Case-insensitive
SELECT name FROM customers
WHERE name ILIKE 'alice%';

Mistake 2: NULL Concatenation

-- ❌ Returns NULL if middle_name is NULL
SELECT first_name || ' ' || middle_name || ' ' || last_name
FROM users;

-- ✅ Handle NULLs
SELECT CONCAT_WS(' ', first_name, middle_name, last_name)
FROM users;

Mistake 3: Inefficient Wildcards

-- ❌ Can't use index (starts with %)
WHERE name LIKE '%laptop%'

-- ✅ Uses index (starts with text)
WHERE name LIKE 'laptop%'

Practice Exercises

Exercise 1

Find all customers whose email ends with 'gmail.com'.

Solution

SELECT name, email
FROM customers
WHERE email LIKE '%gmail.com';

-- Or using regex
SELECT name, email
FROM customers
WHERE email ~ '@gmail\.com$';

Exercise 2

Convert all product names to uppercase and remove leading/trailing spaces.

Solution

SELECT
    name AS original,
    UPPER(TRIM(name)) AS cleaned
FROM products;

-- Update in place
UPDATE products
SET name = UPPER(TRIM(name));

Exercise 3

Extract the first word from product names.

Solution

SELECT
    name,
    SPLIT_PART(name, ' ', 1) AS first_word
FROM products;

-- Or using SUBSTRING
SELECT
    name,
    SUBSTRING(name FROM 1 FOR POSITION(' ' IN name) - 1) AS first_word
FROM products;

Exercise 4

Find products with names containing numbers.

Solution

-- Using regex
SELECT name
FROM products
WHERE name ~ '[0-9]';

-- Using LIKE (less precise)
SELECT name
FROM products
WHERE name LIKE '%0%' OR name LIKE '%1%' OR name LIKE '%2%'
   OR name LIKE '%3%' OR name LIKE '%4%' OR name LIKE '%5%'
   OR name LIKE '%6%' OR name LIKE '%7%' OR name LIKE '%8%'
   OR name LIKE '%9%';

Key Takeaways

✅ CONCAT/|| joins strings together
✅ UPPER/LOWER converts case
✅ TRIM/LTRIM/RTRIM removes whitespace
✅ SUBSTRING/LEFT/RIGHT extracts parts of strings
✅ LIKE matches patterns (%, _)
✅ ILIKE is case-insensitive (PostgreSQL)
✅ Regular expressions (~, REGEXP_REPLACE) for advanced patterns
✅ STRING_AGG concatenates rows
✅ Full-text search for complex text queries
✅ Indexes improve LIKE and text search performance

Next Steps

Congratulations! You've completed Module 6 and mastered advanced query techniques including subqueries, CTEs, CASE statements, DISTINCT, LIMIT/pagination, and string functions.

In Module 7, we'll learn Database Design Fundamentals—how to create tables, define constraints, design schemas, and optimize with indexes.

Ready to become a database architect!

String Functions and Pattern Matching

Introduction

String Concatenation

CONCAT Function

|| Operator (PostgreSQL, SQLite)

Handling NULLs

CONCAT_WS (Concatenate With Separator)

Case Conversion

UPPER and LOWER

INITCAP (Title Case)

Case-Insensitive Comparison

String Trimming

TRIM (Remove Leading/Trailing Spaces)

LTRIM and RTRIM

Trim Specific Characters

Substring Extraction

SUBSTRING (or SUBSTR)

LEFT and RIGHT

Practical Example: Extract Domain from Email

String Length

LENGTH (or CHAR_LENGTH)

OCTET_LENGTH (Byte Length)

String Position and Search

POSITION (or STRPOS)

Practical Example: Check Email Validity

Pattern Matching with LIKE

LIKE Syntax

Examples

Case-Insensitive LIKE (ILIKE in PostgreSQL)

Escaping Wildcards

NOT LIKE

Regular Expressions

PostgreSQL: ~ Operator

REGEXP_MATCHES (PostgreSQL)

REGEXP_REPLACE

Validate Email with Regex

String Replacement

REPLACE

Practical Example: Clean Phone Numbers

String Splitting

SPLIT_PART (PostgreSQL)

STRING_TO_ARRAY (PostgreSQL)

Practical Example: Parse CSV Data

String Aggregation

STRING_AGG (PostgreSQL)

Common String Patterns

Pattern 1: Format Full Name

Pattern 2: Mask Sensitive Data

Pattern 3: Extract Username from Email

Pattern 4: Validate Format

Pattern 5: Search Across Multiple Columns

Full-Text Search (PostgreSQL)

Basic Text Search

Text Search Operators

Text Search Ranking

Performance Tips

Index for LIKE

Trigram Index for Fuzzy Search (PostgreSQL)

Full-Text Search Index

Common Mistakes

Mistake 1: Case-Sensitive Search

Mistake 2: NULL Concatenation

Mistake 3: Inefficient Wildcards

Practice Exercises

Exercise 1

Exercise 2

Exercise 3

Exercise 4

Key Takeaways

Next Steps

Quiz

String Functions and Pattern Matching

Introduction

String Concatenation

CONCAT Function

|| Operator (PostgreSQL, SQLite)

Handling NULLs

CONCAT_WS (Concatenate With Separator)

Case Conversion

UPPER and LOWER