Best Practices for Data Manipulation
Introduction
You've learned INSERT, UPDATE, and DELETE—powerful commands that modify your database. But with great power comes great responsibility! In this lesson, we'll cover best practices to ensure your data operations are safe, efficient, and maintainable.
Always Use Transactions for Critical Operations
What are Transactions?
A transaction is a group of SQL statements that execute as a single unit—either all succeed or all fail.
Basic Transaction Syntax
BEGIN; -- Start transaction
INSERT INTO accounts (name, balance) VALUES ('Alice', 1000);
UPDATE accounts SET balance = balance - 100 WHERE name = 'Alice';
INSERT INTO transactions (account, amount) VALUES ('Alice', -100);
COMMIT; -- Save changes
-- or ROLLBACK; to undo
The ACID Properties
Atomicity: All or nothing
BEGIN;
UPDATE accounts SET balance = balance - 500 WHERE id = 1; -- Debit
UPDATE accounts SET balance = balance + 500 WHERE id = 2; -- Credit
COMMIT; -- Both happen or neither happens
Consistency: Data remains valid
-- If a statement violates a constraint, the whole transaction is rolled back
BEGIN;
INSERT INTO orders (customer_id, total) VALUES (999, 100); -- Invalid customer: fails the foreign key check
COMMIT; -- The aborted transaction is rolled back; nothing is committed
Isolation: Concurrent transactions don't interfere
-- Two users updating the same row
-- Transaction 1
BEGIN;
UPDATE inventory SET stock = stock - 1 WHERE product_id = 5;
COMMIT;
-- Transaction 2 (concurrent)
BEGIN;
UPDATE inventory SET stock = stock - 1 WHERE product_id = 5;
COMMIT;
-- Transaction 2 waits for Transaction 1's row lock, then both updates apply correctly
Durability: Committed data survives crashes
COMMIT; -- After this, data is permanently saved
-- Even if server crashes 1 second later, data is safe
When to Use Transactions
Always use for:
- Financial operations (transfers, payments)
- Multi-table updates
- Critical business logic
- Batch operations
Example: Money Transfer
BEGIN;
-- Check balance
SELECT balance FROM accounts WHERE id = 1 FOR UPDATE; -- Lock the row so the balance can't change mid-transfer
-- Debit sender
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
-- Credit receiver
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
-- Log transaction
INSERT INTO transaction_log (from_account, to_account, amount)
VALUES (1, 2, 100);
COMMIT;
If any step fails, ROLLBACK ensures no partial transfer!
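For a runnable illustration of this all-or-nothing behavior, here is a sketch using Python's built-in sqlite3 module (the table, ids, and amounts are made up; a CHECK constraint stands in for an "insufficient funds" rule, and PostgreSQL syntax differs slightly):

```python
import sqlite3

# In-memory database for a self-contained sketch; the schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts (id, balance) VALUES (?, ?)", [(1, 100), (2, 0)])
conn.commit()

def transfer(conn, from_id, to_id, amount):
    """Move money between accounts; both UPDATEs commit together or not at all."""
    try:
        with conn:  # opens a transaction: commits on success, rolls back on exception
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?",
                         (amount, from_id))
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                         (amount, to_id))
        return True
    except sqlite3.IntegrityError:
        # CHECK (balance >= 0) failed mid-transfer: the debit is undone too
        return False

print(transfer(conn, 1, 2, 500))  # False -- overdraft rejected, balances unchanged
```

Because the failed debit is rolled back along with everything else, the sender's balance is still intact after the rejected transfer.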
Test with SELECT Before Modifying
The Golden Rule
Before UPDATE or DELETE, always run a SELECT with the same WHERE clause:
-- 1. Test what will be affected
SELECT * FROM users WHERE last_login < '2023-01-01';
-- Review results: "Okay, 347 users will be affected"
-- 2. If comfortable, UPDATE
UPDATE users SET active = false WHERE last_login < '2023-01-01';
-- 3. Verify
SELECT COUNT(*) FROM users WHERE active = false;
Use RETURNING for Verification
DELETE FROM spam_accounts
WHERE created_at < CURRENT_DATE - INTERVAL '7 days'
RETURNING id, email, created_at;
-- See exactly what was deleted
Use Explicit Column Names
INSERT: Always Specify Columns
-- ❌ Fragile
INSERT INTO users VALUES (1, 'Alice', 'alice@example.com', 28);
-- Breaks if table structure changes
-- ✅ Robust
INSERT INTO users (id, name, email, age)
VALUES (1, 'Alice', 'alice@example.com', 28);
-- Clear and maintainable
SELECT: Be Specific When Possible
-- ❌ Gets all columns (wasteful)
SELECT * FROM large_table;
-- ✅ Gets only needed columns (efficient)
SELECT id, name, email FROM large_table;
When * is okay:
- Quick exploration/debugging
- You genuinely need all columns
- Small tables
Validate Input Data
Use CHECK Constraints
CREATE TABLE products (
  id SERIAL PRIMARY KEY,
  name VARCHAR(100) NOT NULL,
  price DECIMAL(10,2) CHECK (price > 0),
  stock INTEGER CHECK (stock >= 0),
  rating DECIMAL(3,2) CHECK (rating BETWEEN 0 AND 5)
);
-- ❌ Error: price must be > 0
INSERT INTO products (name, price, stock)
VALUES ('Widget', -5.00, 10);
Use NOT NULL for Required Fields
CREATE TABLE users (
  id SERIAL PRIMARY KEY,
  email VARCHAR(255) NOT NULL, -- Required
  name VARCHAR(100) NOT NULL, -- Required
  bio TEXT -- Optional
);
Use UNIQUE for Unique Values
CREATE TABLE users (
  id SERIAL PRIMARY KEY,
  email VARCHAR(255) UNIQUE NOT NULL,
  username VARCHAR(50) UNIQUE NOT NULL
);
Use Foreign Keys for Referential Integrity
CREATE TABLE orders (
  id SERIAL PRIMARY KEY,
  customer_id INTEGER NOT NULL REFERENCES customers(id),
  product_id INTEGER NOT NULL REFERENCES products(id)
);
-- ❌ Error: customer doesn't exist
INSERT INTO orders (customer_id, product_id)
VALUES (9999, 1);
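The same rejection can be seen in a runnable sketch with Python's sqlite3 (the tables are minimal stand-ins; note that SQLite, unlike PostgreSQL, only enforces foreign keys after PRAGMA foreign_keys = ON):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific; PostgreSQL enforces FKs by default
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customers(id))"
)
conn.execute("INSERT INTO customers (id) VALUES (1)")

conn.execute("INSERT INTO orders (customer_id) VALUES (1)")  # ok: customer 1 exists
try:
    conn.execute("INSERT INTO orders (customer_id) VALUES (9999)")  # no such customer
except sqlite3.IntegrityError as err:
    print(err)  # FOREIGN KEY constraint failed
```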
Batch Operations for Performance
Insert Multiple Rows at Once
-- ❌ Slow: 1000 separate INSERT statements
INSERT INTO logs (message) VALUES ('Log 1');
INSERT INTO logs (message) VALUES ('Log 2');
-- ... 998 more
-- ✅ Fast: Single INSERT with multiple values
INSERT INTO logs (message)
VALUES
  ('Log 1'),
  ('Log 2'),
  ('Log 3'),
  -- ... up to 1000
  ('Log 1000');
Performance difference: multi-row inserts are often 10-100x faster than issuing one INSERT per row.
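From application code, the driver-side analogue is one prepared statement executed over many rows. A sketch with Python's sqlite3, whose executemany does exactly that (table and messages are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (id INTEGER PRIMARY KEY, message TEXT)")

rows = [(f"Log {i}",) for i in range(1, 1001)]

# One prepared statement, executed over all 1000 rows in a single transaction
conn.executemany("INSERT INTO logs (message) VALUES (?)", rows)
conn.commit()

print(conn.execute("SELECT COUNT(*) FROM logs").fetchone()[0])  # 1000
```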
Update in Batches for Large Tables
-- ❌ Locks table for too long
UPDATE huge_table SET status = 'archived'
WHERE created_at < '2020-01-01'; -- 10 million rows!
-- ✅ Update in batches (COMMIT inside a DO block requires PostgreSQL 11+)
DO $$
DECLARE
  batch_size INTEGER := 10000;
  updated INTEGER;
BEGIN
  LOOP
    UPDATE huge_table SET status = 'archived'
    WHERE id IN (
      SELECT id FROM huge_table
      WHERE created_at < '2020-01-01'
        AND status != 'archived'
      LIMIT batch_size
    );
    GET DIAGNOSTICS updated = ROW_COUNT;
    EXIT WHEN updated = 0;
    RAISE NOTICE 'Updated % rows', updated;
    COMMIT; -- Commit each batch to release locks
  END LOOP;
END $$;
Use Appropriate Data Types
Don't Use TEXT for Everything
-- ❌ Inefficient
CREATE TABLE users (
  id TEXT,
  email TEXT,
  age TEXT,
  created_at TEXT
);
-- ✅ Appropriate types
CREATE TABLE users (
  id SERIAL PRIMARY KEY,
  email VARCHAR(255),
  age INTEGER,
  created_at TIMESTAMPTZ
);
Use DECIMAL for Money
-- ❌ Rounding errors
CREATE TABLE orders (
  total REAL -- 19.99 might become 19.989999
);
-- ✅ Exact precision
CREATE TABLE orders (
  total DECIMAL(10,2)
);
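The same pitfall exists in application code. Python's decimal module mirrors SQL's DECIMAL/NUMERIC semantics, which makes the difference easy to demonstrate:

```python
from decimal import Decimal

# Binary floats cannot represent most decimal fractions exactly
print(0.1 + 0.2)         # 0.30000000000000004
print(0.1 + 0.2 == 0.3)  # False

# Decimal keeps exact base-10 precision, like SQL's DECIMAL/NUMERIC
print(Decimal("0.1") + Decimal("0.2"))      # 0.3
print(Decimal("19.99") + Decimal("0.01"))   # 20.00
```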
Handle NULL Values Explicitly
Provide Defaults
CREATE TABLE tasks (
  id SERIAL PRIMARY KEY,
  description TEXT NOT NULL,
  priority INTEGER DEFAULT 3, -- Default if not specified
  completed BOOLEAN DEFAULT FALSE,
  created_at TIMESTAMPTZ DEFAULT CURRENT_TIMESTAMP
);
Use COALESCE in Queries
-- Display "N/A" for missing emails
SELECT
  name,
  COALESCE(email, 'N/A') AS email
FROM users;
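COALESCE behaves the same way across databases; here is a runnable sketch with Python's sqlite3 (the sample users are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT NOT NULL, email TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [("Alice", "alice@example.com"), ("Bob", None)])

# COALESCE returns the first non-NULL argument
rows = conn.execute(
    "SELECT name, COALESCE(email, 'N/A') FROM users ORDER BY name"
).fetchall()
print(rows)  # [('Alice', 'alice@example.com'), ('Bob', 'N/A')]
```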
Use Indexes for Frequently Queried Columns
-- Columns used in WHERE, JOIN, ORDER BY
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_orders_customer_id ON orders(customer_id);
CREATE INDEX idx_posts_created_at ON posts(created_at);
-- Now these are faster:
SELECT * FROM users WHERE email = 'test@example.com';
SELECT * FROM orders WHERE customer_id = 123;
SELECT * FROM posts ORDER BY created_at DESC LIMIT 10;
(We'll cover indexes in detail in Module 7)
Avoid N+1 Query Problems
The Problem
-- ❌ Inefficient: 1 query + N queries
SELECT * FROM customers; -- Returns 100 customers
-- Then in application code:
FOR EACH customer:
SELECT * FROM orders WHERE customer_id = customer.id;
-- Total: 101 queries!
The Solution: JOINs
-- ✅ Efficient: 1 query
SELECT
  c.name,
  COUNT(o.id) AS order_count
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
GROUP BY c.id, c.name;
-- Total: 1 query!
(We'll cover JOINs in Module 4)
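The two approaches can be compared end to end with Python's sqlite3 on a tiny made-up dataset; both return the same counts, but the JOIN issues one query instead of one per customer:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO orders (customer_id) VALUES (1), (1), (2);
""")

# ❌ N+1 pattern: one query to list customers, then one query per customer
queries = 0
counts_n_plus_1 = {}
for cid, name in conn.execute("SELECT id, name FROM customers").fetchall():
    queries += 1
    counts_n_plus_1[name] = conn.execute(
        "SELECT COUNT(*) FROM orders WHERE customer_id = ?", (cid,)
    ).fetchone()[0]

# ✅ JOIN: one query total, same answer
counts_join = dict(conn.execute("""
    SELECT c.name, COUNT(o.id)
    FROM customers c
    LEFT JOIN orders o ON c.id = o.customer_id
    GROUP BY c.id, c.name
""").fetchall())

print(queries)                         # 2 -- grows with the number of customers
print(counts_join == counts_n_plus_1)  # True
```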
Comment Your Complex Queries
/*
* Archive old orders
* - Moves orders older than 2 years to archive table
* - Preserves foreign key relationships
* - Runs in batches to avoid lock timeouts
*/
BEGIN;
INSERT INTO orders_archive
SELECT * FROM orders
WHERE created_at < CURRENT_DATE - INTERVAL '2 years'
LIMIT 10000;
DELETE FROM orders
WHERE id IN (SELECT id FROM orders_archive);
COMMIT;
Use Descriptive Names
Table Names
-- ❌ Unclear
CREATE TABLE usr (id SERIAL, n VARCHAR(100));
-- ✅ Clear
CREATE TABLE users (id SERIAL, name VARCHAR(100));
Column Names
-- ❌ Cryptic
CREATE TABLE orders (
  id SERIAL,
  cid INTEGER, -- customer ID?
  dt DATE, -- date?
  amt DECIMAL -- amount?
);
-- ✅ Descriptive
CREATE TABLE orders (
  id SERIAL,
  customer_id INTEGER,
  order_date DATE,
  total_amount DECIMAL(10,2)
);
Soft Deletes for Important Data
Instead of DELETE, mark as deleted:
CREATE TABLE users (
  id SERIAL PRIMARY KEY,
  email VARCHAR(255),
  deleted_at TIMESTAMPTZ -- NULL = active
);
-- "Delete" user
UPDATE users SET deleted_at = CURRENT_TIMESTAMP WHERE id = 5;
-- Query active users
SELECT * FROM users WHERE deleted_at IS NULL;
-- Restore user
UPDATE users SET deleted_at = NULL WHERE id = 5;
Benefits:
- Data recovery
- Audit trail
- No foreign key issues
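Here is the full soft-delete round trip as a runnable sketch with Python's sqlite3 (schema simplified; SQLite stores the timestamp as text):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, deleted_at TEXT)"
)
conn.executemany("INSERT INTO users (id, email) VALUES (?, ?)",
                 [(5, "old@example.com"), (6, "new@example.com")])

# "Delete": stamp the row instead of removing it
conn.execute("UPDATE users SET deleted_at = CURRENT_TIMESTAMP WHERE id = 5")
active = [row[0] for row in
          conn.execute("SELECT id FROM users WHERE deleted_at IS NULL")]
print(active)  # [6]

# Restore: clear the stamp -- the row was never actually lost
conn.execute("UPDATE users SET deleted_at = NULL WHERE id = 5")
print(conn.execute(
    "SELECT COUNT(*) FROM users WHERE deleted_at IS NULL").fetchone()[0])  # 2
```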
Regular Backups
# Backup database
pg_dump -U postgres -d sql_basics > backup.sql
# Restore database
psql -U postgres -d sql_basics < backup.sql
Automate backups:
- Daily automated backups
- Store off-site (cloud storage)
- Test restores regularly
Security: Prevent SQL Injection
Never Concatenate User Input
# ❌ DANGEROUS: SQL injection vulnerability
user_input = "'; DELETE FROM users; --"
query = "SELECT * FROM users WHERE name = '" + user_input + "'"
# Results in: SELECT * FROM users WHERE name = ''; DELETE FROM users; --'
Always Use Parameterized Queries
# ✅ SAFE: Uses parameters
user_input = "Alice"
cursor.execute(
    "SELECT * FROM users WHERE name = %s",
    (user_input,)
)
// Node.js example (safe)
const { rows } = await pool.query(
  'SELECT * FROM users WHERE name = $1',
  [userInput]
);
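To see why binding is safe, here is a runnable sketch with Python's sqlite3 (which uses ? placeholders instead of %s or $1): the malicious string is bound as an ordinary value and never executed as SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('Alice')")

malicious = "'; DELETE FROM users; --"

# The placeholder binds the string as a plain value; it is never parsed as SQL
rows = conn.execute("SELECT * FROM users WHERE name = ?", (malicious,)).fetchall()
print(rows)  # [] -- nobody is literally named "'; DELETE FROM users; --"
print(conn.execute("SELECT COUNT(*) FROM users").fetchone()[0])  # 1 -- table intact
```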
Monitor Query Performance
-- PostgreSQL: EXPLAIN shows query plan
EXPLAIN ANALYZE
SELECT * FROM users WHERE email = 'test@example.com';
Look for:
- Sequential scans (add indexes if frequent)
- High execution time (optimize query)
- Large row counts (add WHERE filters)
Summary Checklist
Before modifying data:
- Use transaction for multi-step operations
- Test with SELECT first
- Verify WHERE clause is correct
- Check affected row count makes sense
- Have a backup (for production)
- Use RETURNING to see changes
- Consider soft delete instead of hard delete
For table design:
- Use appropriate data types
- Add NOT NULL for required fields
- Add UNIQUE for unique values
- Add CHECK constraints for validation
- Add foreign keys for relationships
- Add indexes for frequently queried columns
- Provide sensible defaults
For queries:
- Specify columns explicitly
- Use parameterized queries (prevent SQL injection)
- Use JOINs to avoid N+1 problems
- Comment complex logic
- Use descriptive names
Key Takeaways
- ✅ Use transactions for critical operations
- ✅ Test with SELECT before UPDATE/DELETE
- ✅ Specify columns explicitly
- ✅ Validate data with constraints
- ✅ Batch operations for performance
- ✅ Use appropriate data types
- ✅ Handle NULL values explicitly
- ✅ Add indexes for frequently queried columns
- ✅ Soft delete important data
- ✅ Regular backups are essential
Next Steps
Congratulations! You've completed Module 3 and learned how to safely and efficiently manipulate data. In Module 4, we'll learn Querying Multiple Tables with JOINs—one of SQL's most powerful features for combining related data.
Get ready to level up your SQL skills!

