Understanding Context Windows
Context windows cap how much text a model can work with in a single request. Understanding how they behave is essential for building chains that don't break due to context overflow.
What is a Context Window?
The context window is the total amount of text a model can "see" at once. This includes:
- Your system prompt
- Conversation history
- The current user message
- Any injected context (documents, previous outputs)
- The model's response
┌────────────────────────────────────┐
│           CONTEXT WINDOW           │
│        (e.g., 128k tokens)         │
├────────────────────────────────────┤
│ System Prompt           │    1,000 │
│ Chain State             │    5,000 │
│ Previous Step Outputs   │    3,000 │
│ Current Step Input      │    3,000 │
│ ────────────────────────────────── │
│ USED                    │   24,000 │
│ AVAILABLE FOR RESPONSE  │  104,000 │
└────────────────────────────────────┘
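The arithmetic behind this breakdown is simple. A minimal sketch, assuming a 128k-token window and the illustrative component sizes from the diagram:
// Illustrative only: sizes match the diagram above
const CONTEXT_WINDOW = 128000;
const used = {
  systemPrompt: 1000,
  chainState: 5000,
  previousStepOutputs: 15000,
  currentStepInput: 3000
};
const totalUsed = Object.values(used).reduce((sum, n) => sum + n, 0); // 24,000
const availableForResponse = CONTEXT_WINDOW - totalUsed;              // 104,000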
Token Basics
Tokens are the units models use to measure text:
- Average: ~4 characters per token in English
- Words: Most common words are 1-2 tokens
- Code: Often uses more tokens due to special characters
- JSON: braces, quotes, and repeated keys add structural overhead
Context Window Sizes
Different models have different limits:
| Model | Context Window |
|---|---|
| GPT-3.5 | 4k - 16k tokens |
| GPT-4 | 8k - 128k tokens |
| Claude 3 | 200k tokens |
| Gemini | 32k - 1M tokens |
Important: Larger isn't always better. Longer contexts can:
- Increase latency
- Increase cost
- Sometimes decrease quality (lost in the middle)
Context in Chains
Each step in a chain needs context. This accumulates:
Step 1: Input (1000 tokens)
  └→ Output (500 tokens)
Step 2: Input (1000) + Step 1 output (500) = 1500 tokens
  └→ Output (800 tokens)
Step 3: Input (1000) + Step 1 (500) + Step 2 (800) = 2300 tokens
  └→ Output (1000 tokens)
By Step 10, the accumulated context can easily exceed 10,000 tokens.
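A rough sketch of this growth, assuming each step receives the original input plus every previous output (the token counts are the illustrative figures above):
// Illustrative figures matching the example above
const baseInputTokens = 1000;
const stepOutputTokens = [500, 800, 1000]; // outputs of steps 1-3
let carriedTokens = 0;
stepOutputTokens.forEach((outputTokens, i) => {
  const stepInputTokens = baseInputTokens + carriedTokens; // input + all prior outputs
  console.log(`Step ${i + 1}: input ≈ ${stepInputTokens} tokens`);
  carriedTokens += outputTokens; // carried into the next step
});
// Step 1: input ≈ 1000 tokens
// Step 2: input ≈ 1500 tokens
// Step 3: input ≈ 2300 tokens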
The "Lost in the Middle" Problem
Models don't attend equally to all parts of context:
Attention strength:
███████████░░░░░░░░░░░░░░░░░░███████████
^                   ^                  ^
Beginning        Middle              End
(Strong)         (Weak)         (Strong)
This means:
- Put important information at the start and end
- Critical instructions should be in the system prompt
- The most recent context matters most
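One way to apply this when assembling a step's prompt is to keep critical instructions at the very beginning and the task plus the most recent material at the very end. A minimal sketch (the section names are placeholders, not a required structure):
// Order sections so the strongest-attention regions carry the most important content
function assemblePrompt({ instructions, background, recentContext, task }) {
  return [
    instructions,  // critical rules first (strong attention)
    background,    // supporting material in the middle (weakest attention)
    recentContext, // most recent context near the end
    task           // the actual request last (strong attention)
  ].join("\n\n");
}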
Managing Context in Chains
Strategy 1: Minimal Context
Only pass what's absolutely needed:
// Bad: Pass everything
const step2Input = {
originalDocument: fullDocument, // 5000 tokens
step1Analysis: completeAnalysis, // 2000 tokens
metadata: allMetadata // 500 tokens
};
// Good: Pass only what's needed
const step2Input = {
summary: step1Analysis.summary, // 200 tokens
keyEntities: step1Analysis.entities, // 100 tokens
relevantMetadata: { // 50 tokens
documentType: metadata.type
}
};
Strategy 2: Context Budgets
Allocate tokens per step:
const contextBudget = {
systemPrompt: 500,
originalInput: 1000,
previousSteps: 2000,
currentStep: 500,
responseBuffer: 2000,
total: 6000
};
function checkBudget(content, allocation) {
const tokens = estimateTokens(content);
if (tokens > allocation) {
return summarize(content, allocation);
}
return content;
}
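For example, accumulated step outputs can be trimmed to their allocation before being passed forward (summarize stands in for whatever compression step you use):
// Hypothetical usage: trim prior outputs to their 2000-token allocation
const trimmedHistory = checkBudget(previousStepOutputs, contextBudget.previousSteps);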
Strategy 3: Rolling Window
Keep only recent context:
class RollingContextWindow {
constructor(maxTokens) {
this.maxTokens = maxTokens;
this.items = [];
}
add(item) {
this.items.push(item);
// Remove oldest items if over budget
while (this.getTotalTokens() > this.maxTokens) {
this.items.shift();
}
}
getTotalTokens() {
return this.items.reduce((sum, item) => sum + item.tokens, 0);
}
}
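Usage might look like this, with token counts supplied by whatever estimator you use (see Token Estimation below):
// Hypothetical usage: keep roughly the last 4000 tokens of step output
const recentContext = new RollingContextWindow(4000);
recentContext.add({ text: step1Output, tokens: estimateTokens(step1Output) });
recentContext.add({ text: step2Output, tokens: estimateTokens(step2Output) });
// Once the budget is exceeded, the oldest items are dropped automatically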
Measuring Context Usage
Token Estimation
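The examples in this lesson lean on an estimateTokens helper. Exact counts require the model's own tokenizer, but for budgeting, a rough character-based estimate using the ~4 characters per token rule of thumb is usually close enough. A minimal sketch:
// Rough estimate: ~4 characters per token for English prose
// Code and JSON typically run heavier, so treat this as a floor
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}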
Monitoring Token Usage
async function trackedStep(prompt, input) {
const inputTokens = estimateTokens(prompt + JSON.stringify(input));
const startTime = Date.now();
const result = await runPrompt(prompt, input);
const duration = Date.now() - startTime;
const outputTokens = estimateTokens(result);
return {
result,
metrics: {
inputTokens,
outputTokens,
totalTokens: inputTokens + outputTokens,
duration,
tokensPerSecond: outputTokens / (duration / 1000)
}
};
}
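A call site might look like this (summarizePrompt and sourceDocument are placeholders, and runPrompt stands in for your own model call):
// Hypothetical usage inside a chain step
const { result, metrics } = await trackedStep(summarizePrompt, { text: sourceDocument });
console.log(`Used ${metrics.totalTokens} tokens in ${metrics.duration} ms`);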
Key Takeaways
- Context windows limit how much text a model can process
- Tokens are the unit of measurement (~4 chars per token)
- Context accumulates across chain steps
- Models attend more to beginning and end of context
- Pass only necessary context between steps
- Use rolling windows or summaries for long chains
- Monitor token usage to prevent overflow
- Budget tokens across chain steps
Next, we'll explore strategies for accumulating context effectively.

