String Functions and Pattern Matching
Introduction
Text data is everywhere: names, emails, addresses, descriptions, URLs. SQL provides powerful functions for manipulating and searching strings.
This lesson covers:
- String functions (CONCAT, UPPER, LOWER, TRIM, etc.)
- Pattern matching with LIKE and ILIKE
- Regular expressions
- Text search techniques
String Concatenation
CONCAT Function
SELECT CONCAT(first_name, ' ', last_name) AS full_name
FROM users;
Result:
full_name
----------------
Alice Smith
Bob Jones
Carol White
|| Operator (PostgreSQL, SQLite)
SELECT first_name || ' ' || last_name AS full_name
FROM users;
Handling NULLs
-- ❌ Returns NULL if any part is NULL
SELECT first_name || ' ' || middle_name || ' ' || last_name
FROM users;
-- If middle_name is NULL, entire result is NULL!
-- ✅ Use CONCAT (treats NULL as empty string)
SELECT CONCAT(first_name, ' ', middle_name, ' ', last_name)
FROM users;
-- ✅ Or use COALESCE
SELECT first_name || ' ' || COALESCE(middle_name || ' ', '') || last_name
FROM users;
CONCAT_WS (Concatenate With Separator)
-- Skip NULLs automatically
SELECT CONCAT_WS(' ', first_name, middle_name, last_name) AS full_name
FROM users;
-- NULLs are ignored; no extra spaces
Result:
full_name
--------------------
Alice Marie Smith
Bob Jones ← No middle name, no extra space
Carol Ann White
Case Conversion
UPPER and LOWER
SELECT
name,
UPPER(name) AS uppercase,
LOWER(name) AS lowercase
FROM products;
Result:
name | uppercase | lowercase
-------------------+----------------+----------------
Laptop Pro 15 | LAPTOP PRO 15 | laptop pro 15
Wireless Mouse | WIRELESS MOUSE | wireless mouse
INITCAP (Title Case)
-- PostgreSQL
SELECT INITCAP('hello world') AS title_case;
Result:
title_case
------------
Hello World
Case-Insensitive Comparison
-- ❌ Case-sensitive (in most databases)
SELECT name FROM customers
WHERE name = 'alice smith';
-- Won't match 'Alice Smith'
-- ✅ Case-insensitive
SELECT name FROM customers
WHERE LOWER(name) = LOWER('alice smith');
-- ✅ PostgreSQL: ILIKE
SELECT name FROM customers
WHERE name ILIKE 'alice smith';
String Trimming
TRIM (Remove Leading/Trailing Spaces)
SELECT TRIM(' hello world ') AS trimmed;
Result:
trimmed
--------------
hello world
LTRIM and RTRIM
SELECT
LTRIM(' hello') AS left_trimmed, -- 'hello'
RTRIM('hello ') AS right_trimmed; -- 'hello'
Trim Specific Characters
-- Remove leading/trailing dashes
SELECT TRIM('-' FROM '---hello---') AS trimmed;
-- Result: 'hello'
-- Remove multiple characters
SELECT TRIM(BOTH '- ' FROM '--- hello ---') AS trimmed;
-- Result: 'hello'
Substring Extraction
SUBSTRING (or SUBSTR)
-- Extract substring starting at position, for length characters
SELECT SUBSTRING('Hello World', 1, 5) AS substr;
-- Result: 'Hello'
-- PostgreSQL syntax (1-indexed)
SELECT SUBSTRING('Hello World' FROM 7 FOR 5) AS substr;
-- Result: 'World'
-- Extract from position to end
SELECT SUBSTRING('Hello World', 7) AS substr;
-- Result: 'World'
LEFT and RIGHT
SELECT
LEFT('Hello World', 5) AS left_part, -- 'Hello'
RIGHT('Hello World', 5) AS right_part; -- 'World'
Practical Example: Extract Domain from Email
SELECT
email,
SUBSTRING(email FROM POSITION('@' IN email) + 1) AS domain
FROM customers;
Result:
email | domain
----------------------+---------------
alice@example.com | example.com
bob@company.org | company.org
String Length
LENGTH (or CHAR_LENGTH)
SELECT
name,
LENGTH(name) AS char_count
FROM products;
Result:
name | char_count
-------------------+------------
Laptop Pro 15 | 13
Mouse | 5
OCTET_LENGTH (Byte Length)
-- Important for UTF-8 text
SELECT
'Hello' AS text,
LENGTH('Hello') AS char_length, -- 5
OCTET_LENGTH('Hello') AS byte_length; -- 5
SELECT
'你好' AS text,
LENGTH('你好') AS char_length, -- 2 characters
OCTET_LENGTH('你好') AS byte_length; -- 6 bytes (UTF-8)
String Position and Search
POSITION (or STRPOS)
-- Find position of substring
SELECT POSITION('World' IN 'Hello World') AS pos;
-- Result: 7
-- PostgreSQL: STRPOS
SELECT STRPOS('Hello World', 'World') AS pos;
-- Result: 7
-- Returns 0 if not found
SELECT POSITION('xyz' IN 'Hello World') AS pos;
-- Result: 0
Practical Example: Check Email Validity
SELECT
email,
CASE
WHEN POSITION('@' IN email) > 0
AND POSITION('.' IN SUBSTRING(email FROM POSITION('@' IN email))) > 0
THEN 'Valid'
ELSE 'Invalid'
END AS email_status
FROM customers;
Pattern Matching with LIKE
LIKE Syntax
SELECT name
FROM products
WHERE name LIKE pattern;
Wildcards:
%matches zero or more characters_matches exactly one character
Examples
-- Starts with 'Laptop'
SELECT name FROM products
WHERE name LIKE 'Laptop%';
-- Ends with 'Mouse'
SELECT name FROM products
WHERE name LIKE '%Mouse';
-- Contains 'Pro'
SELECT name FROM products
WHERE name LIKE '%Pro%';
-- Exactly 5 characters
SELECT name FROM products
WHERE name LIKE '_____';
-- Second character is 'a'
SELECT name FROM products
WHERE name LIKE '_a%';
-- Starts with 'A' or 'B' (NOT directly supported, use OR)
SELECT name FROM products
WHERE name LIKE 'A%' OR name LIKE 'B%';
Case-Insensitive LIKE (ILIKE in PostgreSQL)
-- PostgreSQL: ILIKE
SELECT name FROM customers
WHERE name ILIKE 'alice%';
-- Matches 'Alice', 'ALICE', 'alice', etc.
-- Other databases: Use LOWER
SELECT name FROM customers
WHERE LOWER(name) LIKE LOWER('alice%');
Escaping Wildcards
-- Find products with literal '%' in name
SELECT name FROM products
WHERE name LIKE '%\%%' ESCAPE '\';
-- Find products with literal '_'
SELECT name FROM products
WHERE name LIKE '%\_% ESCAPE '\';
NOT LIKE
-- Products NOT starting with 'Laptop'
SELECT name FROM products
WHERE name NOT LIKE 'Laptop%';
Regular Expressions
PostgreSQL: ~ Operator
-- Matches pattern
SELECT name FROM products
WHERE name ~ '^Laptop.*Pro$';
-- ^ = start, .* = any characters, $ = end
-- Case-insensitive: ~*
SELECT name FROM products
WHERE name ~* 'laptop'; -- Matches 'Laptop', 'LAPTOP', etc.
-- Does NOT match: !~
SELECT name FROM products
WHERE name !~ '[0-9]'; -- Products without numbers
REGEXP_MATCHES (PostgreSQL)
-- Extract all numbers from text
SELECT REGEXP_MATCHES('Order #12345 - $99.99', '[0-9]+', 'g') AS numbers;
Result:
numbers
-----------
{12345}
{99}
{99}
REGEXP_REPLACE
-- Replace all digits with 'X'
SELECT REGEXP_REPLACE('Phone: 555-1234', '[0-9]', 'X', 'g') AS masked;
-- Result: 'Phone: XXX-XXXX'
-- Remove all non-alphanumeric
SELECT REGEXP_REPLACE('Hello, World! 123', '[^a-zA-Z0-9]', '', 'g') AS clean;
-- Result: 'HelloWorld123'
Validate Email with Regex
SELECT
email,
email ~ '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$' AS is_valid
FROM customers;
String Replacement
REPLACE
SELECT REPLACE('Hello World', 'World', 'SQL') AS replaced;
-- Result: 'Hello SQL'
-- Remove substring (replace with empty string)
SELECT REPLACE('Hello-World', '-', '') AS cleaned;
-- Result: 'HelloWorld'
Practical Example: Clean Phone Numbers
SELECT
phone,
REPLACE(REPLACE(REPLACE(phone, '-', ''), '(', ''), ')', '') AS cleaned
FROM customers;
Result:
phone | cleaned
----------------+------------
(555) 123-4567 | 5551234567
555-987-6543 | 5559876543
String Splitting
SPLIT_PART (PostgreSQL)
-- Split string and get Nth part (1-indexed)
SELECT SPLIT_PART('apple,banana,cherry', ',', 2) AS fruit;
-- Result: 'banana'
STRING_TO_ARRAY (PostgreSQL)
SELECT STRING_TO_ARRAY('apple,banana,cherry', ',') AS fruits;
-- Result: {apple,banana,cherry}
Practical Example: Parse CSV Data
SELECT
SPLIT_PART(csv_row, ',', 1) AS name,
SPLIT_PART(csv_row, ',', 2) AS email,
SPLIT_PART(csv_row, ',', 3) AS phone
FROM raw_data;
String Aggregation
STRING_AGG (PostgreSQL)
-- Concatenate values from multiple rows
SELECT
customer_id,
STRING_AGG(product_name, ', ' ORDER BY product_name) AS products
FROM order_items
GROUP BY customer_id;
Result:
customer_id | products
-------------+--------------------------------
1 | Laptop Pro 15, Mouse, USB Cable
2 | Desk, Monitor
Common String Patterns
Pattern 1: Format Full Name
SELECT
CONCAT_WS(' ', first_name, middle_initial || '.', last_name) AS formatted_name
FROM users;
Pattern 2: Mask Sensitive Data
-- Show last 4 digits of credit card
SELECT
'XXXX-XXXX-XXXX-' || RIGHT(card_number, 4) AS masked_card
FROM payments;
Pattern 3: Extract Username from Email
SELECT
email,
SPLIT_PART(email, '@', 1) AS username
FROM customers;
Pattern 4: Validate Format
-- Find invalid phone numbers (not 10 digits)
SELECT phone
FROM customers
WHERE LENGTH(REGEXP_REPLACE(phone, '[^0-9]', '', 'g')) != 10;
Pattern 5: Search Across Multiple Columns
-- Search in name OR description
SELECT name, description
FROM products
WHERE name ILIKE '%laptop%'
OR description ILIKE '%laptop%';
-- Better: Use concatenation
SELECT name, description
FROM products
WHERE CONCAT(name, ' ', description) ILIKE '%laptop%';
Full-Text Search (PostgreSQL)
Basic Text Search
-- Convert to tsvector (text search vector)
SELECT
name,
to_tsvector('english', description) AS search_vector
FROM products;
-- Search using to_tsquery
SELECT name, description
FROM products
WHERE to_tsvector('english', description) @@ to_tsquery('english', 'laptop & wireless');
-- Matches documents containing both 'laptop' AND 'wireless'
Text Search Operators
-- AND: &
WHERE to_tsvector('english', description) @@ to_tsquery('laptop & mouse');
-- OR: |
WHERE to_tsvector('english', description) @@ to_tsquery('laptop | mouse');
-- NOT: !
WHERE to_tsvector('english', description) @@ to_tsquery('laptop & !expensive');
-- Phrase search
WHERE to_tsvector('english', description) @@ phraseto_tsquery('wireless mouse');
Text Search Ranking
SELECT
name,
ts_rank(to_tsvector('english', description), query) AS rank
FROM products,
to_tsquery('english', 'laptop & wireless') query
WHERE to_tsvector('english', description) @@ query
ORDER BY rank DESC;
Performance Tips
Index for LIKE
-- ❌ Slow: Can't use regular index
SELECT name FROM products
WHERE name LIKE '%laptop%';
-- ✅ Fast: Starts with (uses index)
SELECT name FROM products
WHERE name LIKE 'Laptop%';
CREATE INDEX idx_products_name ON products(name);
Trigram Index for Fuzzy Search (PostgreSQL)
-- Enable pg_trgm extension
CREATE EXTENSION pg_trgm;
-- Create trigram index
CREATE INDEX idx_products_name_trgm ON products USING gin(name gin_trgm_ops);
-- Now LIKE '%pattern%' is fast!
SELECT name FROM products
WHERE name ILIKE '%laptop%';
Full-Text Search Index
-- Add tsvector column
ALTER TABLE products ADD COLUMN search_vector tsvector;
-- Populate search vector
UPDATE products
SET search_vector = to_tsvector('english', COALESCE(name, '') || ' ' || COALESCE(description, ''));
-- Create GIN index
CREATE INDEX idx_products_search ON products USING gin(search_vector);
-- Fast full-text search
SELECT name FROM products
WHERE search_vector @@ to_tsquery('laptop & wireless');
Common Mistakes
Mistake 1: Case-Sensitive Search
-- ❌ Won't match 'Alice'
SELECT name FROM customers
WHERE name LIKE 'alice%';
-- ✅ Case-insensitive
SELECT name FROM customers
WHERE name ILIKE 'alice%';
Mistake 2: NULL Concatenation
-- ❌ Returns NULL if middle_name is NULL
SELECT first_name || ' ' || middle_name || ' ' || last_name
FROM users;
-- ✅ Handle NULLs
SELECT CONCAT_WS(' ', first_name, middle_name, last_name)
FROM users;
Mistake 3: Inefficient Wildcards
-- ❌ Can't use index (starts with %)
WHERE name LIKE '%laptop%'
-- ✅ Uses index (starts with text)
WHERE name LIKE 'laptop%'
Practice Exercises
Exercise 1
Find all customers whose email ends with 'gmail.com'.
Solution
SELECT name, email
FROM customers
WHERE email LIKE '%gmail.com';
-- Or using regex
SELECT name, email
FROM customers
WHERE email ~ '@gmail\.com$';
Exercise 2
Convert all product names to uppercase and remove leading/trailing spaces.
Solution
SELECT
name AS original,
UPPER(TRIM(name)) AS cleaned
FROM products;
-- Update in place
UPDATE products
SET name = UPPER(TRIM(name));
Exercise 3
Extract the first word from product names.
Solution
SELECT
name,
SPLIT_PART(name, ' ', 1) AS first_word
FROM products;
-- Or using SUBSTRING
SELECT
name,
SUBSTRING(name FROM 1 FOR POSITION(' ' IN name) - 1) AS first_word
FROM products;
Exercise 4
Find products with names containing numbers.
Solution
-- Using regex
SELECT name
FROM products
WHERE name ~ '[0-9]';
-- Using LIKE (less precise)
SELECT name
FROM products
WHERE name LIKE '%0%' OR name LIKE '%1%' OR name LIKE '%2%'
OR name LIKE '%3%' OR name LIKE '%4%' OR name LIKE '%5%'
OR name LIKE '%6%' OR name LIKE '%7%' OR name LIKE '%8%'
OR name LIKE '%9%';
Key Takeaways
- ✅ CONCAT/|| joins strings together
- ✅ UPPER/LOWER converts case
- ✅ TRIM/LTRIM/RTRIM removes whitespace
- ✅ SUBSTRING/LEFT/RIGHT extracts parts of strings
- ✅ LIKE matches patterns (%, _)
- ✅ ILIKE is case-insensitive (PostgreSQL)
- ✅ Regular expressions (~, REGEXP_REPLACE) for advanced patterns
- ✅ STRING_AGG concatenates rows
- ✅ Full-text search for complex text queries
- ✅ Indexes improve LIKE and text search performance
Next Steps
Congratulations! You've completed Module 6 and mastered advanced query techniques including subqueries, CTEs, CASE statements, DISTINCT, LIMIT/pagination, and string functions.
In Module 7, we'll learn Database Design Fundamentals—how to create tables, define constraints, design schemas, and optimize with indexes.
Ready to become a database architect!

