Cost Optimization for Prompt Chains

Production chains need to balance quality with cost. This lesson covers strategies for optimizing token usage and API costs.

Understanding Chain Costs

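Chain Cost = Σ over steps (Input Tokens × Input Price + Output Tokens × Output Price)

Every step is a separate API call that is billed for both its input and its output, and many providers charge more per output token than per input token. Because later steps often receive earlier outputs as part of their input, the same content can be billed several times, so a chain's cost can grow faster than its number of steps.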

Cost Drivers in Chains

Token Accumulation

// Cost grows with each step
const chainCosts = {
  step1: { input: 500, output: 200 },   // 700 tokens
  step2: { input: 800, output: 300 },   // 1,100 tokens (includes step1 output)
  step3: { input: 1200, output: 400 },  // 1,600 tokens (includes step1+2)
  total: 3400  // Total tokens used
};
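Applying the cost formula to these example counts is a short calculation. The sketch below assumes hypothetical prices of $0.003 per 1,000 input tokens and $0.015 per 1,000 output tokens; these are placeholders, not any provider's real rates, and `chainCost` is an illustrative name.

```javascript
// Token counts from the example above
const steps = [
  { input: 500, output: 200 },
  { input: 800, output: 300 },
  { input: 1200, output: 400 },
];

// Hypothetical prices in dollars per 1,000 tokens (placeholders, not real rates)
const price = { input: 0.003, output: 0.015 };

// Price input and output separately for every step, then sum
function chainCost(steps, price) {
  return steps.reduce(
    (sum, s) => sum + (s.input / 1000) * price.input + (s.output / 1000) * price.output,
    0
  );
}

const totalTokens = steps.reduce((sum, s) => sum + s.input + s.output, 0);
console.log(totalTokens);                        // 3400
console.log(chainCost(steps, price).toFixed(4)); // 0.0210
```

Under these assumed prices, the 2,500 input tokens cost $0.0075 and the 900 output tokens cost $0.0135, so the cheaper-looking output side accounts for most of the bill.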

The Context Accumulation Problem
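When every step receives the original input plus all earlier outputs, each step's input keeps growing and the early tokens are billed again at every later step. The sketch below compares that pattern with passing only the previous step's output. The sizes (a 1,000-token document, 100 tokens of instructions and 200 tokens of output per step) are hypothetical, and the function names are illustrative.

```javascript
// Hypothetical chain: each step adds 100 tokens of instructions and emits 200 tokens
const steps = [
  { prompt: 100, output: 200 },
  { prompt: 100, output: 200 },
  { prompt: 100, output: 200 },
];

// Naive: every step sees the document plus all previous outputs
function accumulatedTokens(documentTokens, steps) {
  let context = documentTokens;
  let total = 0;
  for (const { prompt, output } of steps) {
    total += context + prompt + output; // input + output billed for this step
    context += output;                  // output is appended for all later steps
  }
  return total;
}

// Selective: only the first step sees the document; each later step
// sees just the previous step's output
function lastOutputOnlyTokens(documentTokens, steps) {
  let context = documentTokens;
  let total = 0;
  for (const { prompt, output } of steps) {
    total += context + prompt + output;
    context = output;                   // replace the context instead of appending
  }
  return total;
}

console.log(accumulatedTokens(1000, steps));    // 4500
console.log(lastOutputOnlyTokens(1000, steps)); // 2300
```

The gap widens with every added step, because in the naive version each new step re-bills the document and every earlier output.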


Optimization Strategies

1. Selective Context Passing

Only pass what each step needs:

// summarize, extractEntities, analyzeSentiment and generateReport are
// assumed helpers that each make one LLM call
async function optimizedChain(document) {
  // Step 1: Needs the full document
  const summary = await summarize(document);

  // Step 2: Needs the document, not the summary
  const entities = await extractEntities(document);

  // Step 3: Needs only the summary, not the full document
  const sentiment = await analyzeSentiment(summary);

  // Step 4: Needs summary + entities + sentiment, NOT the full document
  const report = await generateReport({
    summary,      // ~1,000 tokens (example size)
    entities,     // ~500 tokens
    sentiment     // ~300 tokens
    // Total: ~1,800 input tokens instead of ~6,800 if a ~5,000-token
    // document were passed along as well
  });

  return report;
}
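One way to make "only pass what each step needs" explicit is to have each step declare the context keys it reads and build its input from those keys alone. This is a sketch: `stepInputs`, `pickContext` and `inputTokens` are illustrative names, and the 5,000-token document size is an assumption chosen so the totals match the 1,800 and 6,800 figures in the example.

```javascript
// Example sizes from the chain above, plus an assumed ~5,000-token document
const contextSizes = { document: 5000, summary: 1000, entities: 500, sentiment: 300 };

// Hypothetical declaration of which context each step reads
const stepInputs = {
  summarize: ['document'],
  extractEntities: ['document'],
  analyzeSentiment: ['summary'],
  generateReport: ['summary', 'entities', 'sentiment'],
};

// Build the input for one step from only the keys it declares
function pickContext(context, keys) {
  return Object.fromEntries(keys.map((key) => [key, context[key]]));
}

// Sum the estimated token sizes of the selected keys
function inputTokens(sizes, keys) {
  return keys.reduce((sum, key) => sum + sizes[key], 0);
}

console.log(inputTokens(contextSizes, stepInputs.generateReport)); // 1800
console.log(inputTokens(contextSizes, Object.keys(contextSizes))); // 6800
console.log(pickContext({ summary: 's', entities: 'e', document: 'd' }, ['summary']));
// { summary: 's' }
```

Declaring inputs per step also makes accidental context growth visible in code review: a step that suddenly needs `document` has to say so.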

2. Model Selection by Task
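A common pattern is to route each step to the cheapest model that handles it well and reserve a stronger, pricier model for steps that need more reasoning. The sketch below assumes two hypothetical tiers named `small-model` and `large-model` with placeholder per-1,000-token prices; the task-to-tier mapping is also an assumption that would need validating against your own quality checks.

```javascript
// Hypothetical model tiers; prices are placeholders in dollars per 1,000 tokens
const MODELS = {
  small: { name: 'small-model', input: 0.0005, output: 0.0015 },
  large: { name: 'large-model', input: 0.01, output: 0.03 },
};

// Assumed mapping from task type to the cheapest tier that is good enough
const TASK_TIER = {
  classification: 'small',
  extraction: 'small',
  formatting: 'small',
  analysis: 'large',
  reportGeneration: 'large',
};

// Unknown tasks fall back to the stronger model to protect quality
function selectModel(task) {
  return MODELS[TASK_TIER[task] ?? 'large'];
}

function estimateCost(model, inputTokens, outputTokens) {
  return (inputTokens / 1000) * model.input + (outputTokens / 1000) * model.output;
}

// The same 1,000-in / 200-out call priced on each tier
console.log(selectModel('classification').name, estimateCost(selectModel('classification'), 1000, 200));
console.log(selectModel('analysis').name, estimateCost(selectModel('analysis'), 1000, 200));
```

Defaulting unknown tasks to the larger model trades some cost for safety; the opposite default saves more money but risks silent quality drops on new step types.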


3. Caching Strategies

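const { createHash } = require('crypto');

// Hash the serialized input so cache keys stay short. JSON.stringify follows
// key insertion order, so build inputs consistently for stable keys.
function hashInput(input) {
  return createHash('sha256').update(JSON.stringify(input)).digest('hex');
}

class ChainCache {
  constructor(ttlSeconds = 3600) {
    this.cache = new Map(); // Note: entries are never evicted, only overwritten
    this.ttl = ttlSeconds * 1000;
  }

  getCacheKey(step, input) {
    // Deterministic key, namespaced by step so steps never share entries
    return `${step}:${hashInput(input)}`;
  }

  async getOrCompute(step, input, computeFn) {
    const key = this.getCacheKey(step, input);
    const cached = this.cache.get(key);

    if (cached && Date.now() - cached.timestamp < this.ttl) {
      console.log(`Cache hit for ${step}`);
      return cached.value;
    }

    // Miss or expired entry: call the model and store the fresh result
    const result = await computeFn(input);
    this.cache.set(key, {
      value: result,
      timestamp: Date.now()
    });

    return result;
  }
}

// Usage in chain (classifyContent and analyzeContent are assumed LLM helpers)
const cache = new ChainCache(3600);

async function cachedChain(input) {
  const step1Result = await cache.getOrCompute(
    'classification',
    input,
    () => classifyContent(input)
  );

  // Keyed on both the input and its classification, so the analysis is
  // reused only when this exact input has been analyzed before
  const step2Result = await cache.getOrCompute(
    'analysis',
    { type: step1Result.type, input },
    () => analyzeContent(input, step1Result)
  );

  return step2Result;
}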
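`JSON.stringify` follows property insertion order, so `{ a: 1, b: 2 }` and `{ b: 2, a: 1 }` serialize differently and would miss each other in the cache. If chain inputs can be assembled in varying key order, a key-sorting serializer makes the cache key deterministic. This is a sketch using Node's built-in `crypto` module; `stableStringify` and `stableHashInput` are illustrative names that could stand in for `hashInput`.

```javascript
const { createHash } = require('crypto');

// Serialize with object keys sorted at every level so logically equal inputs
// produce the same string regardless of key insertion order.
// Handles plain objects, arrays and JSON primitives only.
function stableStringify(value) {
  if (Array.isArray(value)) {
    return `[${value.map(stableStringify).join(',')}]`;
  }
  if (value !== null && typeof value === 'object') {
    const entries = Object.keys(value)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${stableStringify(value[key])}`);
    return `{${entries.join(',')}}`;
  }
  return JSON.stringify(value);
}

// SHA-256 of the stable serialization, usable in place of hashInput
function stableHashInput(input) {
  return createHash('sha256').update(stableStringify(input)).digest('hex');
}

console.log(stableStringify({ b: 1, a: [1, { d: 2, c: 3 }] }));
// {"a":[1,{"c":3,"d":2}],"b":1}
console.log(stableHashInput({ a: 1, b: 2 }) === stableHashInput({ b: 2, a: 1 })); // true
```

Array order is preserved on purpose: reordering the items in a list usually changes what a prompt means, so it should produce a different key.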

4. Prompt Compression
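Some compression is mechanical: extra whitespace, blank lines and repeated instructions all cost tokens without adding meaning. The sketch below handles only those cases; rewriting instructions more tersely still has to be done, and checked for quality, by hand. `estimateTokens` uses the rough rule of thumb of about four characters per token for English text, which is an approximation; a provider's own tokenizer gives real counts. Both function names are illustrative.

```javascript
// Rough estimate: about 4 characters per token for English text.
// Use the provider's tokenizer when real counts matter.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Mechanical compression only: trims each line, collapses runs of spaces and
// tabs, drops blank lines and drops exact duplicate lines. Apply it to
// instruction text, not to data where repeated lines may be meaningful.
function compressPrompt(prompt) {
  const seen = new Set();
  const kept = [];
  for (const raw of prompt.split('\n')) {
    const line = raw.trim().replace(/[ \t]+/g, ' ');
    if (line === '' || seen.has(line)) continue;
    seen.add(line);
    kept.push(line);
  }
  return kept.join('\n');
}

const verbose = `
  You are a helpful assistant.

  Summarize the text below in three bullet points.
  Summarize the text below in three bullet points.
      Keep each bullet under 20 words.
`;

const compact = compressPrompt(verbose);
console.log(compact);
console.log(estimateTokens(verbose), '->', estimateTokens(compact));
```

Dropping blank lines can remove separators a prompt relies on (for example between instructions and a pasted document), so it is worth keeping those sections apart before compressing.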


Cost Monitoring

Tracking Token Usage

class CostTracker {
  // pricing maps model name -> { input, output } in dollars per 1,000 tokens
  constructor(pricing) {
    this.pricing = pricing;
    this.usage = {
      totalInputTokens: 0,
      totalOutputTokens: 0,
      byStep: {},
      byModel: {}
    };
  }

  // Call once per LLM request with the token counts reported by the API
  recordUsage(step, model, inputTokens, outputTokens) {
    // Track totals
    this.usage.totalInputTokens += inputTokens;
    this.usage.totalOutputTokens += outputTokens;

    // Track by step
    if (!this.usage.byStep[step]) {
      this.usage.byStep[step] = { input: 0, output: 0, calls: 0 };
    }
    this.usage.byStep[step].input += inputTokens;
    this.usage.byStep[step].output += outputTokens;
    this.usage.byStep[step].calls++;

    // Track by model
    if (!this.usage.byModel[model]) {
      this.usage.byModel[model] = { input: 0, output: 0 };
    }
    this.usage.byModel[model].input += inputTokens;
    this.usage.byModel[model].output += outputTokens;
  }

  getCost() {
    let totalCost = 0;

    for (const [model, usage] of Object.entries(this.usage.byModel)) {
      const pricing = this.pricing[model];
      if (!pricing) {
        // Fail loudly instead of crashing on an undefined price
        throw new Error(`No pricing configured for model "${model}"`);
      }
      totalCost += (usage.input / 1000) * pricing.input;
      totalCost += (usage.output / 1000) * pricing.output;
    }

    return {
      totalCost,
      breakdown: this.usage
    };
  }
}

Setting Budgets and Alerts

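class BudgetManager {
  // sendAlert defaults to console.warn; pass a Slack/email/pager notifier instead
  constructor(dailyBudget, alertThreshold = 0.8, sendAlert = (msg) => console.warn(msg)) {
    this.dailyBudget = dailyBudget;
    this.alertThreshold = alertThreshold;
    this.sendAlert = sendAlert;
    this.dailySpend = 0;
    this.alertSent = false;
    this.lastReset = new Date().toDateString();
  }

  // Call before each chain run; throws if the run would exceed the budget
  checkBudget(estimatedCost) {
    this.resetIfNewDay();

    const projected = this.dailySpend + estimatedCost;
    if (projected > this.dailyBudget) {
      throw new Error('Daily budget exceeded');
    }

    // Alert once per day when projected spend crosses the threshold
    if (!this.alertSent && projected / this.dailyBudget >= this.alertThreshold) {
      this.sendAlert(`Budget ${this.alertThreshold * 100}% consumed`);
      this.alertSent = true;
    }
  }

  // Call after each run with the actual cost (e.g. from CostTracker)
  recordSpend(cost) {
    this.resetIfNewDay();
    this.dailySpend += cost;
  }

  resetIfNewDay() {
    const today = new Date().toDateString();
    if (today !== this.lastReset) {
      this.dailySpend = 0;
      this.alertSent = false;
      this.lastReset = today;
    }
  }
}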
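`checkBudget` needs an estimated cost before the call is made. One conservative option is to bill the whole prompt and assume the model uses its full output allowance, which gives an upper bound. This sketch uses the rough heuristic of about four characters per token for English text; `estimateCallCost`, `maxOutputTokens` and the prices are hypothetical.

```javascript
// Rough token estimate (~4 characters per token for English text)
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Upper-bound cost of one call: the full prompt plus the maximum allowed
// output. price is in dollars per 1,000 tokens.
function estimateCallCost(promptText, maxOutputTokens, price) {
  const promptTokens = estimateTokens(promptText);
  return (promptTokens / 1000) * price.input + (maxOutputTokens / 1000) * price.output;
}

// Hypothetical placeholder prices
const price = { input: 0.003, output: 0.015 };
const prompt = 'x'.repeat(4000); // ~1,000 prompt tokens under the heuristic
console.log(estimateCallCost(prompt, 500, price)); // 1,000 prompt tokens + up to 500 output tokens
```

Under these assumed prices the bound is $0.003 + $0.0075 = $0.0105. Pass such an estimate to `checkBudget` before the run, then record the actual cost from the provider's usage data with `recordSpend` afterwards, since real output is usually shorter than the maximum.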

Exercise: Optimize a Chain


Key Takeaways

Token costs accumulate across chain steps
Pass only necessary context between steps
Use cheaper models for simpler tasks
Implement caching for repeated operations
Compress prompts without losing clarity
Monitor costs and set budget alerts
Balance cost optimization with quality requirements

Next, we'll explore latency and performance optimization.
