Understanding Context Windows
The context window is the hard limit on how much text a model can process at once. Understanding how it works is essential for building chains that don't fail from context overflow.
What is a Context Window?
The context window is the total amount of text a model can "see" at once. This includes:
- Your system prompt
- Conversation history
- The current user message
- Any injected context (documents, previous outputs)
- The model's response
┌───────────────────────────────────────────┐
│              CONTEXT WINDOW               │
│            (e.g., 128k tokens)            │
├───────────────────────────────────────────┤
│ System Prompt                      1,000  │
│ Chain State                        5,000  │
│ Previous Step Outputs             15,000  │
│ Current Step Input                 3,000  │
│ ───────────────────────────────────────── │
│ USED                              24,000  │
│ AVAILABLE FOR RESPONSE           104,000  │
└───────────────────────────────────────────┘
Token Basics
Tokens are the units models use to measure text:
- Average: ~4 characters per token in English
- Words: Most common words are 1-2 tokens
- Code: Often uses more tokens due to special characters
- JSON: Structure overhead adds tokens
Context Window Sizes
Different models have different limits:
| Model | Context Window |
|---|---|
| GPT-3.5 | 4k - 16k tokens |
| GPT-4 | 8k - 128k tokens |
| Claude 3 | 200k tokens |
| Gemini | 32k - 1M tokens |
Important: Larger isn't always better. Longer contexts can:
- Increase latency
- Increase cost
- Sometimes decrease quality (lost in the middle)
Context in Chains
Each step in a chain needs context. This accumulates:
Step 1: Input (1,000 tokens)
        └─ Output (500 tokens)
Step 2: Input (1,000) + Step 1 output (500) = 1,500 tokens
        └─ Output (800 tokens)
Step 3: Input (1,000) + Step 1 output (500) + Step 2 output (800) = 2,300 tokens
        └─ Output (1,000 tokens)
By step 10, the accumulated context can easily exceed 10,000 tokens.
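The arithmetic above can be sketched directly (the token counts are the illustrative figures from the example, not measured values):

```javascript
// Each step's input is the original input plus every previous step's output.
// Token counts are the example figures from the chain above.
function cumulativeInputTokens(baseInput, previousOutputs) {
  return previousOutputs.reduce((sum, out) => sum + out, baseInput);
}

console.log(cumulativeInputTokens(1000, []));          // Step 1: 1000
console.log(cumulativeInputTokens(1000, [500]));       // Step 2: 1500
console.log(cumulativeInputTokens(1000, [500, 800]));  // Step 3: 2300
```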
The "Lost in the Middle" Problem
Models don't attend equally to all parts of context:
Attention strength:
████████▓▓▒▒░░░░░░░░░░░░▒▒▓▓████████
^                                  ^
Beginning         Middle          End
(Strong)          (Weak)     (Strong)
This means:
- Put important information at the start and end
- Critical instructions should be in the system prompt
- The most recent context matters most
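One way to act on this is to repeat critical instructions at both ends of the assembled prompt. The helper below is an illustrative sketch, not a standard API:

```javascript
// Put critical instructions where attention is strongest: the very start
// and the very end. Bulk context sits in the weakly-attended middle.
function buildPrompt({ criticalInstructions, bulkContext, userMessage }) {
  return [
    criticalInstructions,                 // start: strong attention
    bulkContext,                          // middle: weak attention
    userMessage,
    `Reminder: ${criticalInstructions}`   // end: strong attention again
  ].join("\n\n");
}

const prompt = buildPrompt({
  criticalInstructions: "Respond only in valid JSON.",
  bulkContext: "...large retrieved document...",
  userMessage: "Summarize the document."
});
```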
Managing Context in Chains
Strategy 1: Minimal Context
Only pass what's absolutely needed:
// Bad: pass everything forward
const step2Input = {
  originalDocument: fullDocument,      // 5,000 tokens
  step1Analysis: completeAnalysis,     // 2,000 tokens
  metadata: allMetadata                // 500 tokens
};

// Good: pass only what the step needs
const step2Input = {
  summary: step1Analysis.summary,      // 200 tokens
  keyEntities: step1Analysis.entities, // 100 tokens
  relevantMetadata: {                  // 50 tokens
    documentType: metadata.type
  }
};
Strategy 2: Context Budgets
Allocate tokens per step:
const contextBudget = {
  systemPrompt: 500,
  originalInput: 1000,
  previousSteps: 2000,
  currentStep: 500,
  responseBuffer: 2000,
  total: 6000
};

// estimateTokens and summarize are helpers defined elsewhere:
// estimateTokens approximates token count; summarize compresses
// content to fit within the given token allocation.
function checkBudget(content, allocation) {
  const tokens = estimateTokens(content);
  if (tokens > allocation) {
    return summarize(content, allocation);
  }
  return content;
}
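checkBudget relies on a summarize helper. A real implementation would call a summarization model; as a crude stand-in, truncation to the character budget (using the ~4 characters-per-token heuristic) works as a sketch:

```javascript
// Placeholder for a real summarizer: truncate to roughly the token budget.
// Uses the ~4 characters-per-token heuristic; a production version would
// call a summarization model rather than cutting text mid-thought.
function summarize(content, maxTokens) {
  const maxChars = maxTokens * 4;
  if (content.length <= maxChars) return content;
  return content.slice(0, maxChars - 3) + "...";
}
```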
Strategy 3: Rolling Window
Keep only recent context:
class RollingContextWindow {
  constructor(maxTokens) {
    this.maxTokens = maxTokens;
    this.items = [];
  }

  // Each item is expected to carry its own token count, e.g. { text, tokens }
  add(item) {
    this.items.push(item);
    // Evict the oldest items until we are back under budget,
    // but always keep at least the newest item
    while (this.getTotalTokens() > this.maxTokens && this.items.length > 1) {
      this.items.shift();
    }
  }

  getTotalTokens() {
    return this.items.reduce((sum, item) => sum + item.tokens, 0);
  }
}
Measuring Context Usage
Token Estimation
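A rough estimator can be built from the ~4 characters-per-token average noted earlier. Real counts vary by model tokenizer, so treat this strictly as an estimate and use your provider's tokenizer library when you need exact numbers:

```javascript
// Rough heuristic: ~4 characters per token in English text.
// Real counts depend on the model's tokenizer; this is only an estimate.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

console.log(estimateTokens("Hello, world!")); // 4
```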
Monitoring Token Usage
// runPrompt and estimateTokens are helpers assumed to be defined elsewhere
async function trackedStep(prompt, input) {
  const inputTokens = estimateTokens(prompt + JSON.stringify(input));
  const startTime = Date.now();

  const result = await runPrompt(prompt, input);

  const duration = Date.now() - startTime;
  const outputTokens = estimateTokens(result);

  return {
    result,
    metrics: {
      inputTokens,
      outputTokens,
      totalTokens: inputTokens + outputTokens,
      duration,
      tokensPerSecond: outputTokens / (duration / 1000)
    }
  };
}
Key Takeaways
- Context windows limit how much text a model can process
- Tokens are the unit of measurement (~4 chars per token)
- Context accumulates across chain steps
- Models attend more to beginning and end of context
- Pass only necessary context between steps
- Use rolling windows or summaries for long chains
- Monitor token usage to prevent overflow
- Budget tokens across chain steps
Next, we'll explore strategies for accumulating context effectively.