Understanding Context Windows
Context windows cap how much text a model can work with in a single request. Understanding how they behave is essential for building chains that don't break due to context overflow.
What is a Context Window?
The context window is the total amount of text a model can "see" at once. This includes:
- Your system prompt
- Conversation history
- The current user message
- Any injected context (documents, previous outputs)
- The model's response
┌────────────────────────────────────┐
│           CONTEXT WINDOW           │
│        (e.g., 128k tokens)         │
├────────────────────────────────────┤
│ System Prompt           │    1,000 │
│ Chain State             │    5,000 │
│ Previous Step Outputs   │    3,000 │
│ Current Step Input      │    3,000 │
│ ────────────────────────────────── │
│ USED                    │   24,000 │
│ AVAILABLE FOR RESPONSE  │  104,000 │
└────────────────────────────────────┘
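The arithmetic behind this breakdown is simple. A minimal sketch, assuming a 128k-token window and the illustrative component sizes from the diagram:
// Illustrative only: sizes match the diagram above
const CONTEXT_WINDOW = 128000;
const used = {
  systemPrompt: 1000,
  chainState: 5000,
  previousStepOutputs: 15000,
  currentStepInput: 3000
};
const totalUsed = Object.values(used).reduce((sum, n) => sum + n, 0); // 24,000
const availableForResponse = CONTEXT_WINDOW - totalUsed;              // 104,000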
Token Basics
Tokens are the units models use to measure text:
- Average: ~4 characters per token in English
- Words: Most common words are 1-2 tokens
- Code: Often uses more tokens due to special characters
- JSON: braces, quotes, and repeated keys add structural overhead
Context Window Sizes
Different models have different limits:
| Model | Context Window |
|---|---|
| GPT-3.5 | 4k - 16k tokens |
| GPT-4 | 8k - 128k tokens |
| Claude 3 | 200k tokens |
| Gemini | 32k - 1M tokens |
Important: Larger isn't always better. Longer contexts can:
- Increase latency
- Increase cost
- Sometimes decrease quality (lost in the middle)
Context in Chains
Each step in a chain needs context. This accumulates:
Step 1: Input (1000 tokens)
  └→ Output (500 tokens)
Step 2: Input (1000) + Step 1 output (500) = 1500 tokens
  └→ Output (800 tokens)
Step 3: Input (1000) + Step 1 (500) + Step 2 (800) = 2300 tokens
  └→ Output (1000 tokens)
By Step 10, the accumulated context can easily exceed 10,000 tokens.
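A rough sketch of this growth, assuming each step receives the original input plus every previous output (the token counts are the illustrative figures above):
// Illustrative figures matching the example above
const baseInputTokens = 1000;
const stepOutputTokens = [500, 800, 1000]; // outputs of steps 1-3
let carriedTokens = 0;
stepOutputTokens.forEach((outputTokens, i) => {
  const stepInputTokens = baseInputTokens + carriedTokens; // input + all prior outputs
  console.log(`Step ${i + 1}: input ≈ ${stepInputTokens} tokens`);
  carriedTokens += outputTokens; // carried into the next step
});
// Step 1: input ≈ 1000 tokens
// Step 2: input ≈ 1500 tokens
// Step 3: input ≈ 2300 tokens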
The "Lost in the Middle" Problem
Models don't attend equally to all parts of context:
Attention strength:
███████████░░░░░░░░░░░░░░░░░░███████████
^                   ^                  ^
Beginning        Middle              End
(Strong)         (Weak)         (Strong)
This means:
- Put important information at the start and end
- Critical instructions should be in the system prompt
- The most recent context matters most
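One way to apply this when assembling a step's prompt is to keep critical instructions at the very beginning and the task plus the most recent material at the very end. A minimal sketch (the section names are placeholders, not a required structure):
// Order sections so the strongest-attention regions carry the most important content
function assemblePrompt({ instructions, background, recentContext, task }) {
  return [
    instructions,  // critical rules first (strong attention)
    background,    // supporting material in the middle (weakest attention)
    recentContext, // most recent context near the end
    task           // the actual request last (strong attention)
  ].join("\n\n");
}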
Managing Context in Chains
Strategy 1: Minimal Context
Only pass what's absolutely needed:
// Bad: Pass everything
const step2Input = {
originalDocument: fullDocument, // 5000 tokens
step1Analysis: completeAnalysis, // 2000 tokens
metadata: allMetadata // 500 tokens
};
// Good: Pass only what's needed
const step2Input = {
summary: step1Analysis.summary, // 200 tokens
keyEntities: step1Analysis.entities, // 100 tokens
relevantMetadata: { // 50 tokens
documentType: metadata.type
}
};
Strategy 2: Context Budgets
Allocate tokens per step:
const contextBudget = {
systemPrompt: 500,
originalInput: 1000,
previousSteps: 2000,
currentStep: 500,
responseBuffer: 2000,
total: 6000
};
function checkBudget(content, allocation) {
const tokens = estimateTokens(content);
if (tokens > allocation) {
return summarize(content, allocation);
}
return content;
}
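For example, accumulated step outputs can be trimmed to their allocation before being passed forward (summarize stands in for whatever compression step you use):
// Hypothetical usage: trim prior outputs to their 2000-token allocation
const trimmedHistory = checkBudget(previousStepOutputs, contextBudget.previousSteps);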
Strategy 3: Rolling Window
Keep only recent context:
class RollingContextWindow {
constructor(maxTokens) {
this.maxTokens = maxTokens;
this.items = [];
}
add(item) {
this.items.push(item);
// Remove oldest items if over budget
while (this.getTotalTokens() > this.maxTokens) {
this.items.shift();
}
}
getTotalTokens() {
return this.items.reduce((sum, item) => sum + item.tokens, 0);
}
}
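Usage might look like this, with token counts supplied by whatever estimator you use (see Token Estimation below):
// Hypothetical usage: keep roughly the last 4000 tokens of step output
const recentContext = new RollingContextWindow(4000);
recentContext.add({ text: step1Output, tokens: estimateTokens(step1Output) });
recentContext.add({ text: step2Output, tokens: estimateTokens(step2Output) });
// Once the budget is exceeded, the oldest items are dropped automatically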
Measuring Context Usage
Token Estimation
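The examples in this lesson lean on an estimateTokens helper. Exact counts require the model's own tokenizer, but for budgeting, a rough character-based estimate using the ~4 characters per token rule of thumb is usually close enough. A minimal sketch:
// Rough estimate: ~4 characters per token for English prose
// Code and JSON typically run heavier, so treat this as a floor
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}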
Monitoring Token Usage
async function trackedStep(prompt, input) {
const inputTokens = estimateTokens(prompt + JSON.stringify(input));
const startTime = Date.now();
const result = await runPrompt(prompt, input);
const duration = Date.now() - startTime;
const outputTokens = estimateTokens(result);
return {
result,
metrics: {
inputTokens,
outputTokens,
totalTokens: inputTokens + outputTokens,
duration,
tokensPerSecond: outputTokens / (duration / 1000)
}
};
}
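A call site might look like this (summarizePrompt and sourceDocument are placeholders, and runPrompt stands in for your own model call):
// Hypothetical usage inside a chain step
const { result, metrics } = await trackedStep(summarizePrompt, { text: sourceDocument });
console.log(`Used ${metrics.totalTokens} tokens in ${metrics.duration} ms`);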
Key Takeaways
- Context windows limit how much text a model can process
- Tokens are the unit of measurement (~4 chars per token)
- Context accumulates across chain steps
- Models attend more to beginning and end of context
- Pass only necessary context between steps
- Use rolling windows or summaries for long chains
- Monitor token usage to prevent overflow
- Budget tokens across chain steps
Next, we'll explore strategies for accumulating context effectively.

