Data Persistence Strategies
Choosing the right data persistence strategy is crucial for containerized applications. This lesson covers when to use each approach and best practices for managing persistent data.
The Ephemeral Nature of Containers
By default, a container's writable filesystem layer is deleted along with the container:
# Write data to container
docker run --name test alpine sh -c "echo 'important data' > /data.txt"
# Data exists in container
docker exec test cat /data.txt
# Output: important data
# Remove container
docker rm test
# Data is gone!
docker run --name test2 alpine cat /data.txt
# cat: can't open '/data.txt': No such file or directory
Choosing a Storage Type
┌─────────────────────────────────────────────────────────────────┐
│ Data Persistence Decision Tree │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Need to persist data across container restarts? │
│ └── No → Container filesystem (default) │
│ └── Yes ↓ │
│ │
│ Is this development or production? │
│ └── Development → Bind mounts (edit on host) │
│ └── Production ↓ │
│ │
│ Is data sensitive and should only exist in memory? │
│ └── Yes → tmpfs mount │
│ └── No → Named volume │
│ │
└─────────────────────────────────────────────────────────────────┘
Storage Types Comparison
| Type | Persistence | Performance | Portability | Use Case |
|---|---|---|---|---|
| Container FS | None | Good | High | Temp files |
| Volume | Yes | Best | High | Databases, uploads |
| Bind Mount | Yes | Good | Low | Development |
| tmpfs | Session | Fastest | N/A | Secrets, caches |
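The same choices map directly onto Compose syntax. A minimal sketch, one service per storage type (service, image, and volume names here are illustrative):

```yaml
# docker-compose.yml - one service per storage type (names are examples)
services:
  db:
    image: postgres:15
    volumes:
      - db-data:/var/lib/postgresql/data   # named volume: production data

  devserver:
    image: node:20
    volumes:
      - ./src:/app/src                     # bind mount: edit files on the host

  web:
    image: myapp
    tmpfs:
      - /app/sessions                      # tmpfs: in-memory only

volumes:
  db-data:                                 # declares the named volume
```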
Database Storage Patterns
PostgreSQL
# Create dedicated volume
docker volume create postgres-data
# Run with volume
docker run -d \
--name postgres \
-e POSTGRES_PASSWORD=secret \
-e POSTGRES_DB=myapp \
-v postgres-data:/var/lib/postgresql/data \
-p 5432:5432 \
postgres:15
# Data survives container recreation
docker rm -f postgres
docker run -d \
--name postgres \
-e POSTGRES_PASSWORD=secret \
-v postgres-data:/var/lib/postgresql/data \
-p 5432:5432 \
postgres:15
# All data is intact!
MongoDB
docker volume create mongo-data
docker run -d \
--name mongo \
-v mongo-data:/data/db \
-p 27017:27017 \
mongo:7
Redis with Persistence
docker volume create redis-data
docker run -d \
--name redis \
-v redis-data:/data \
redis:alpine redis-server --appendonly yes
Application File Storage
User Uploads
docker volume create user-uploads
docker run -d \
--name api \
-v user-uploads:/app/uploads \
-p 3000:3000 \
myapi
Static Assets
# For build-time assets, include in image
# For dynamic assets, use volumes
docker volume create static-assets
docker run -d \
--name cdn \
-v static-assets:/app/public/uploads \
mynginx
tmpfs Mounts
tmpfs mounts keep data in host memory only; nothing is written to the container's writable layer, and the contents vanish when the container stops:
# Using --tmpfs
docker run -d \
--tmpfs /app/cache \
myapp
# Using --mount (more options)
docker run -d \
--mount type=tmpfs,target=/app/cache,tmpfs-size=100m \
myapp
Use Cases for tmpfs
# Session storage (no persistence needed)
docker run -d \
--tmpfs /app/sessions \
mywebapp
# Temporary secrets
docker run -d \
--tmpfs /run/secrets \
myapp
# High-performance cache
docker run -d \
--mount type=tmpfs,target=/app/cache,tmpfs-size=256m \
myapp
Multi-Container Data Sharing
Shared Volume Pattern
# Create shared volume
docker volume create shared-data
# Web server writes logs
docker run -d \
--name web \
-v shared-data:/var/log/app \
webapp
# Log processor reads logs
docker run -d \
--name log-processor \
-v shared-data:/logs:ro \
logprocessor
Sidecar Pattern
# Application container
docker run -d \
--name app \
-v app-logs:/app/logs \
myapp
# Logging sidecar
docker run -d \
--name logger \
-v app-logs:/logs:ro \
fluent-bit
# Backup sidecar
docker run -d \
--name backup \
-v app-logs:/data:ro \
-v /backup:/backup \
backup-agent
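Running a sidecar alongside its application is easier to manage in Compose. A sketch of the logging sidecar above (volume name as above; `fluent/fluent-bit` shown without its pipeline configuration):

```yaml
services:
  app:
    image: myapp
    volumes:
      - app-logs:/app/logs          # application writes logs here

  logger:
    image: fluent/fluent-bit
    volumes:
      - app-logs:/logs:ro           # sidecar reads the same volume read-only
    depends_on:
      - app

volumes:
  app-logs:
```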
Backup Strategies
Volume Backup
# Stop container to ensure consistency
docker stop mydb
# Backup volume to tar file
docker run --rm \
-v mydb-data:/source:ro \
-v $(pwd)/backups:/backup \
alpine tar czf /backup/mydb-$(date +%Y%m%d).tar.gz -C /source .
# Restart container
docker start mydb
Automated Backup Script
#!/bin/bash
# backup-volumes.sh - archive each named volume to a timestamped tar.gz
set -euo pipefail
BACKUP_DIR="/backups"
DATE=$(date +%Y%m%d-%H%M%S)
# List of volumes to back up
VOLUMES="postgres-data redis-data uploads"
for vol in $VOLUMES; do
  echo "Backing up $vol..."
  docker run --rm \
    -v "$vol":/source:ro \
    -v "$BACKUP_DIR":/backup \
    alpine tar czf "/backup/${vol}-${DATE}.tar.gz" -C /source .
done
# Clean up old backups (keep the last 7 days)
find "$BACKUP_DIR" -name "*.tar.gz" -mtime +7 -delete
Restore from Backup
# Create new volume
docker volume create mydb-data-restored
# Restore from backup
docker run --rm \
-v mydb-data-restored:/target \
-v $(pwd)/backups:/backup:ro \
alpine tar xzf /backup/mydb-20240115.tar.gz -C /target
# Use restored volume
docker run -d \
--name mydb \
-v mydb-data-restored:/var/lib/postgresql/data \
postgres:15
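The tar round trip itself can be sanity-checked without Docker. A quick sketch using plain directories in place of volumes:

```shell
# Simulate the backup/restore cycle: source dir -> tar.gz -> fresh target dir
workdir=$(mktemp -d)
mkdir -p "$workdir/source" "$workdir/backup" "$workdir/restore"
echo "important data" > "$workdir/source/data.txt"

# Backup: same flags as the volume backup above (-C archives relative paths)
tar czf "$workdir/backup/source.tar.gz" -C "$workdir/source" .

# Restore into an empty target, as the restore container would
tar xzf "$workdir/backup/source.tar.gz" -C "$workdir/restore"

# The restored tree should match the original exactly
diff -r "$workdir/source" "$workdir/restore" && echo "restore verified"
```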
Migration Strategies
Migrate Volume Between Hosts
# On source host: create backup
docker run --rm \
-v mydata:/source:ro \
-v $(pwd):/backup \
alpine tar czf /backup/mydata.tar.gz -C /source .
# Transfer to destination
scp mydata.tar.gz user@newhost:/backups/
# On destination host: restore
docker volume create mydata
docker run --rm \
-v mydata:/target \
-v /backups:/backup:ro \
alpine tar xzf /backup/mydata.tar.gz -C /target
Using rsync for Large Volumes
# Sync volume contents to a remote host
# (install rsync in a throwaway container, then run it against the volume)
docker run --rm \
-v mydata:/source:ro \
-v ~/.ssh:/root/.ssh:ro \
alpine sh -c "apk add --no-cache rsync openssh-client && \
rsync -avz /source/ user@newhost:/path/to/data/"
Best Practices
1. Name Your Volumes
# Bad - anonymous volume gets a random ID, hard to identify later
docker run -v /data myapp
# Good - descriptive name
docker run -v myapp-postgres-data:/var/lib/postgresql/data postgres
2. Use Labels
# Create with labels
docker volume create \
--label project=myapp \
--label environment=production \
myapp-data
# Filter by labels
docker volume ls --filter label=project=myapp
3. Document Mount Points
# In Dockerfile, document expected mounts
# These are the data directories:
# /var/lib/postgresql/data - Database files
# /app/uploads - User uploaded files
# /app/logs - Application logs
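One natural place for this documentation is the Dockerfile itself, next to `VOLUME` declarations. A sketch (paths taken from the comments above; the base image and command are illustrative):

```dockerfile
FROM node:20-alpine
WORKDIR /app
COPY . .

# Expected mounts - documented next to the VOLUME declarations:
#   /app/uploads - user uploaded files (named volume in production)
#   /app/logs    - application logs (shared with a logging sidecar)
VOLUME ["/app/uploads", "/app/logs"]

CMD ["node", "server.js"]
```

Note that `VOLUME` makes Docker create an anonymous volume whenever nothing is mounted at that path, so still mount named volumes explicitly at run time.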
4. Separate Data Types
# Separate volumes by purpose
docker run -d \
-v db-data:/var/lib/postgresql/data \
-v db-logs:/var/log/postgresql \
-v db-backups:/backups \
postgres
5. Regular Cleanup
# Remove unused volumes
docker volume prune
# Remove all dangling volumes (not referenced by any container)
docker volume rm $(docker volume ls -q -f "dangling=true")
Monitoring Volume Usage
# Check disk usage
docker system df
# Detailed volume sizes
docker system df -v
# Volume size in container
docker exec mycontainer du -sh /var/lib/postgresql/data
Key Takeaways
- Containers are ephemeral by default - data needs explicit persistence
- Use named volumes for production data (databases, uploads)
- Use bind mounts for development workflows
- Use tmpfs for sensitive data that shouldn't persist
- Implement regular backup procedures for important volumes
- Label volumes for better organization
- Clean up unused volumes regularly
- Consider data migration strategies before deployment

