Building a Production Agent System
The techniques from previous lessons -- subagents, hooks, MCP servers, and headless mode -- combine into something greater than the sum of their parts: a production agent system. This lesson walks through designing, building, and operating an automated system where Claude Code agents handle real work on a schedule, with monitoring, cost controls, and failure recovery.
What You Will Learn
- Architecture of a Claude Code agent system
- Scheduler, heartbeat, and notification agents
- Content generation pipelines for blogs, news, and courses
- SEO audit automation
- Cost monitoring and token management
Agent System Architecture
A production agent system consists of several layers:
┌─────────────────────────────────────────────┐
│                  Scheduler                  │
│      (cron / systemd / GitHub Actions)      │
├─────────────────────────────────────────────┤
│             Orchestrator Agent              │
│   Reads task queue, delegates to workers    │
├──────────┬──────────┬──────────┬────────────┤
│ Content  │   SEO    │   Test   │   Deploy   │
│  Agent   │  Agent   │  Agent   │   Agent    │
├──────────┴──────────┴──────────┴────────────┤
│            Shared Infrastructure            │
│ Git repos, MCP servers, notification hooks  │
└─────────────────────────────────────────────┘
The scheduler triggers runs at defined intervals. The orchestrator reads a task configuration and spawns specialized worker agents. Each worker has its own tools, model, and constraints. Shared infrastructure provides git access, MCP connections, and notification channels.
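The task configuration the orchestrator reads can be a plain JSON file. A minimal sketch of what it might look like (the file name, field names, and schema here are assumptions for illustration, not a Claude Code convention):

```json
{
  "tasks": [
    {
      "name": "content-agent",
      "prompt": "Generate today's blog post about AI news.",
      "model": "claude-sonnet-4-5",
      "maxTurns": 25
    },
    {
      "name": "seo-agent",
      "prompt": "Audit the 10 most recently modified pages for SEO compliance.",
      "model": "claude-haiku-3-5",
      "maxTurns": 15
    }
  ]
}
```

Keeping tasks in data rather than hardcoding them in the orchestrator script makes it easy to add, disable, or re-budget agents without touching the scheduling logic.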
The Orchestrator Pattern
The orchestrator is a script that launches Claude Code in headless mode with a structured task:
#!/bin/bash
# orchestrator.sh - Run daily agent tasks
TIMESTAMP=$(date +%Y-%m-%d)
LOG_DIR="logs/agents/$TIMESTAMP"
mkdir -p "$LOG_DIR"
# Task 1: Content generation
claude --dangerously-skip-permissions \
-p "You are the content agent. Generate today's blog post about AI news.
Use WebSearch to find current news. Save to src/content/blog/.
Follow the project's MDX format and SEO guidelines." \
--model claude-sonnet-4-5 \
--max-turns 25 \
> "$LOG_DIR/content-agent.log" 2>&1
# Task 2: SEO audit
claude --dangerously-skip-permissions \
-p "You are the SEO agent. Audit the 10 most recently modified pages
for SEO compliance. Output a JSON report to reports/seo-audit.json." \
--model claude-haiku-3-5 \
--max-turns 15 \
> "$LOG_DIR/seo-agent.log" 2>&1
# Task 3: Test health check
claude --dangerously-skip-permissions \
-p "Run the full test suite. If any tests fail, create a GitHub issue
with the failure details and assign it to the on-call developer." \
--model claude-sonnet-4-5 \
--max-turns 10 \
> "$LOG_DIR/test-agent.log" 2>&1
# Send summary notification
node scripts/notify-slack.js "$LOG_DIR"
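The notify-slack.js helper is referenced but not shown. An equivalent can be sketched directly in shell with curl against a Slack incoming webhook (SLACK_WEBHOOK_URL is an assumed environment variable, and the function names are made up):

```shell
# Sketch of a Slack notifier; assumes an incoming-webhook URL in SLACK_WEBHOOK_URL.
# slack_payload is split out so the JSON body can be inspected or tested alone.
# Note: this does not escape quotes inside the message text.
slack_payload() {
  # Build a minimal Slack message payload from the first argument
  printf '{"text": "%s"}' "$1"
}

notify_slack() {
  slack_payload "$1" | curl -s -X POST \
    -H 'Content-Type: application/json' \
    -d @- "$SLACK_WEBHOOK_URL"
}
```

Usage from the orchestrator would be something like `notify_slack "Daily agent run finished; logs in $LOG_DIR"`.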
Content Generation Pipelines
Content pipelines are one of the most practical applications of agent systems. Here is a complete pipeline for blog post generation:
Pipeline Steps
- Research: Agent searches the web for current topics
- Deduplication: Agent checks existing content for overlapping topics
- Writing: Agent creates the post following project templates
- Validation: Script validates MDX compilation and SEO metadata
- Review: Changes go to a PR for human review
- Publication: After approval, the PR is merged and deployed
Implementation
#!/bin/bash
# content-pipeline.sh
set -e
BRANCH="content/auto-$(date +%Y%m%d)"
# Create a feature branch
git checkout -b "$BRANCH" main
# Step 1-3: Research, dedup, and write
claude --dangerously-skip-permissions \
-p "Create a blog post about the latest developments in AI coding assistants.
Before writing:
1. Search the web for news from the past week
2. Check existing posts in src/content/blog/ to avoid duplicating topics
3. If a similar post exists from the last 30 days, choose a different angle
Writing rules:
- Follow the MDX format used by existing posts
- Include proper frontmatter (title, date, description, keywords)
- Create structured-data.json for SEO
- Target 1500-2000 words
- Include at least 3 external source links" \
--model claude-sonnet-4-5 \
--max-turns 30
# Step 4: Validate
node scripts/validate-mdx-compilation.js src/content/blog/
npm test -- blog.test.ts
# Step 5: Create PR for review
git add src/content/blog/
git commit -m "feat: add auto-generated blog post for $(date +%Y-%m-%d)"
git push -u origin "$BRANCH"
gh pr create \
--title "Auto-generated blog post: $(date +%Y-%m-%d)" \
--body "Automated content pipeline. Please review before merging."
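Step 4 can also be backstopped with a cheap shell check before the heavier MDX validation. A sketch that greps each post for the frontmatter keys the prompt requires (the key list mirrors the prompt above; the function name is made up):

```shell
# Sketch: verify each required frontmatter key appears in a post.
# Key list matches the writing rules above: title, date, description, keywords.
check_frontmatter() {
  file="$1"
  for key in title date description keywords; do
    if ! grep -q "^$key:" "$file"; then
      echo "$file: missing frontmatter key '$key'" >&2
      return 1
    fi
  done
}
```

A loop like `for f in src/content/blog/*.mdx; do check_frontmatter "$f" || exit 1; done` would then fail the pipeline early, before running the full test suite.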
Course Generation Pipeline
For a learning platform, course generation follows a similar pattern but with more structure:
claude --dangerously-skip-permissions \
-p "Create a new course about Docker fundamentals.
Follow the exact structure in existing courses:
1. Create course-structure.json with 8 lessons in 3 modules
2. Create MDX lesson files (no frontmatter, escape curly braces)
3. Create quiz JSON files for each lesson (4-5 questions each)
4. Create final-exam-questions.json (22+ questions)
5. Update courses.json with the new course entry
6. Run: node scripts/validate-mdx-compilation.js on the course directory
7. Run: npm test -- courses.test.ts to verify" \
--model claude-opus-4-6 \
--max-turns 50
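The shape of course-structure.json is not shown in this lesson. A hypothetical sketch of what the prompt's "8 lessons in 3 modules" requirement might produce (all field names are assumptions):

```json
{
  "id": "docker-fundamentals",
  "title": "Docker Fundamentals",
  "modules": [
    { "title": "Containers 101", "lessons": ["what-is-docker", "images-vs-containers", "running-containers"] },
    { "title": "Building Images", "lessons": ["dockerfiles", "layers-and-caching", "multi-stage-builds"] },
    { "title": "Operating Containers", "lessons": ["volumes-and-networking", "compose-basics"] }
  ]
}
```

Whatever the real schema is, giving the agent an existing course to copy from (as the prompt does) is more reliable than describing the schema in prose.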
SEO Audit Automation
An automated SEO audit agent can run weekly and report issues:
#!/bin/bash
# seo-audit.sh
REPORT="reports/seo-audit-$(date +%Y-%m-%d).json"
claude --dangerously-skip-permissions \
-p "Perform a comprehensive SEO audit:
1. Check all page components in src/app/ for:
- Meta title length (target: 50-60 characters)
- Meta description length (target: 150-160 characters)
- Open Graph tags (og:title, og:description, og:image)
- Structured data / JSON-LD
- Heading hierarchy (single H1, logical H2-H6 order)
- Image alt text presence
2. Check src/content/ for:
- Missing SEO metadata in frontmatter
- Duplicate titles or descriptions
- Missing keywords
3. Output results as JSON to $REPORT with this structure:
{
  \"timestamp\": \"...\",
  \"totalPages\": N,
  \"issues\": [
    {
      \"file\": \"path\",
      \"severity\": \"critical|warning|info\",
      \"issue\": \"description\",
      \"fix\": \"suggested fix\"
    }
  ]
}" \
--model claude-sonnet-4-5 \
--max-turns 20
# The agent writes $REPORT itself, so do not also redirect the CLI's
# JSON envelope to the same path -- that would overwrite the report.
# Alert if critical issues found
CRITICAL_COUNT=$(node -e "
const r = require('./$REPORT');
console.log((r.issues || []).filter(i => i.severity === 'critical').length);
")
if [ "$CRITICAL_COUNT" -gt 0 ]; then
echo "ALERT: $CRITICAL_COUNT critical SEO issues found"
# Send notification
node scripts/notify-slack.js "SEO Audit: $CRITICAL_COUNT critical issues" "$REPORT"
fi
Cost Monitoring and Token Management
Token costs can escalate quickly with automated agents. Implement these controls:
Per-Run Budget Controls
# Set max-turns to limit token consumption
claude --dangerously-skip-permissions \
-p "..." \
--max-turns 15 # Hard limit on conversation turns
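max-turns caps conversation length but not wall-clock time. Wrapping the call in coreutils `timeout` adds a second ceiling; a sketch (run_with_budget and AGENT_TIMEOUT are made-up names):

```shell
# Sketch: cap wall-clock time on top of --max-turns.
# Uses coreutils `timeout`; AGENT_TIMEOUT is seconds, defaulting to 15 minutes.
run_with_budget() {
  timeout "${AGENT_TIMEOUT:-900}" "$@"
}
```

Usage: `run_with_budget claude --dangerously-skip-permissions -p "..." --max-turns 15`. If the agent overruns, `timeout` kills it and returns exit status 124, which the retry logic below can treat like any other failure.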
Cost Tracking Script
#!/bin/bash
# track-costs.sh - Log per-run metadata for cost tracking
AGENT_NAME=$1
START_TIME=$(date +%s)
mkdir -p logs
# Run agent and capture its output
OUTPUT=$(claude --dangerously-skip-permissions \
  -p "$2" \
  --model "$3" \
  --max-turns "$4" \
  --output-format json 2>&1)
EXIT_CODE=$?
END_TIME=$(date +%s)
DURATION=$((END_TIME - START_TIME))
STATUS="completed"
[ "$EXIT_CODE" -ne 0 ] && STATUS="failed"
# Keep the raw output for debugging
echo "$OUTPUT" > "logs/$AGENT_NAME-last-run.json"
# Append one JSON object per line so the file stays valid JSONL
echo "{\"agent\": \"$AGENT_NAME\", \"timestamp\": \"$(date -u +%Y-%m-%dT%H:%M:%SZ)\", \"model\": \"$3\", \"max_turns\": $4, \"duration_seconds\": $DURATION, \"status\": \"$STATUS\"}" >> logs/agent-costs.jsonl
Cost Optimization Strategies
- Use the cheapest model that works: Haiku for simple tasks, Sonnet for most tasks, Opus only when needed
- Set aggressive max-turns: Most tasks complete in 10-20 turns. Set limits to prevent runaway sessions
- Cache common operations: If multiple agents read the same files, pre-read them and pass as context
- Run at off-peak times: Some providers offer lower rates during off-peak hours
- Monitor weekly trends: Track cost per agent type and investigate spikes
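The weekly-trend point above can start as a one-liner over the cost log. A sketch that counts runs per agent, assuming the one-object-per-line JSONL format written by track-costs.sh (cost_summary is a made-up helper):

```shell
# Sketch: count runs per agent from the JSONL cost log.
# Assumes one JSON object per line, as written by track-costs.sh.
cost_summary() {
  log="${1:-logs/agent-costs.jsonl}"
  # Pull out the "agent" field, then count occurrences of each name
  grep -o '"agent": *"[^"]*"' "$log" \
    | sed 's/.*"\([^"]*\)"$/\1/' \
    | sort | uniq -c | sort -rn
}
```

Running `cost_summary logs/agent-costs.jsonl` prints a run count per agent, most active first. Spikes in run counts or durations are the first signal that an agent needs a tighter max-turns or a cheaper model.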
Failure Recovery
Agents can fail. Build resilience into your system:
#!/bin/bash
# resilient-agent.sh
MAX_RETRIES=3
RETRY_COUNT=0
while [ $RETRY_COUNT -lt $MAX_RETRIES ]; do
claude --dangerously-skip-permissions \
-p "$1" \
--model claude-sonnet-4-5 \
--max-turns 15
if [ $? -eq 0 ]; then
echo "Agent completed successfully"
exit 0
fi
RETRY_COUNT=$((RETRY_COUNT + 1))
echo "Attempt $RETRY_COUNT failed. Retrying..."
sleep 10
done
echo "Agent failed after $MAX_RETRIES attempts"
node scripts/notify-slack.js "Agent failure: $1"
exit 1
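The fixed 10-second sleep above can be generalized to exponential backoff, which is gentler on rate limits. A sketch that retries any command (retry and RETRY_DELAY are made-up names):

```shell
# Sketch: retry any command with exponential backoff.
# RETRY_DELAY sets the initial delay in seconds; it doubles after each failure.
retry() {
  max="$1"; shift
  delay="${RETRY_DELAY:-5}"
  n=0
  while [ "$n" -lt "$max" ]; do
    "$@" && return 0
    n=$((n + 1))
    echo "Attempt $n failed; sleeping ${delay}s" >&2
    sleep "$delay"
    delay=$((delay * 2))
  done
  return 1
}
```

Usage: `retry 3 claude --dangerously-skip-permissions -p "$1" --model claude-sonnet-4-5 --max-turns 15`, followed by the same notification-on-failure step as in resilient-agent.sh.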
Putting It All Together
A production agent system for a content platform might run like this:
Daily (6 AM):
- Content agent: Generate 1 blog post ā PR for review
- Test agent: Run full test suite ā Issue if failures
Weekly (Monday 9 AM):
- SEO agent: Full site audit ā Report + Slack alert
- Dependency agent: Check for updates ā PR if safe updates available
- Analytics agent: Generate weekly metrics summary ā Email to team
Monthly (1st, 9 AM):
- Course agent: Propose new course topics based on search trends
- Link checker: Verify all external links still work
- Performance agent: Run Lighthouse audits on key pages
Each agent runs in a container, logs its actions, and communicates results through PRs, issues, and Slack notifications. The human team reviews outputs and maintains the agent configurations.
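The schedule above maps directly onto crontab entries. A sketch (script paths and log locations are hypothetical):

```
# m h dom mon dow  command
0 6 * * *  /opt/agents/orchestrator.sh  >> /var/log/agents/daily.log 2>&1
0 9 * * 1  /opt/agents/weekly.sh       >> /var/log/agents/weekly.log 2>&1
0 9 1 * *  /opt/agents/monthly.sh      >> /var/log/agents/monthly.log 2>&1
```

Redirecting stdout and stderr matters in cron: without it, failures disappear silently instead of landing in a log the heartbeat or notification agents can inspect.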
Key Takeaways
- A production agent system uses a scheduler, orchestrator, and specialized worker agents
- Content pipelines combine research, deduplication, writing, validation, and PR-based review
- SEO audit agents can run on a schedule and alert on critical issues
- Cost control requires max-turns limits, model selection strategy, and usage tracking
- Build failure recovery with retry logic and notification hooks
- Always route automated changes through PRs for human review before merging to main
- Start small with one agent, prove the pattern, then expand to a full system

