When to Use Extended Thinking
Extended thinking is one of Claude's most powerful — and most misunderstood — capabilities. It's not a magic performance boost for every task. Used in the right situations, it produces dramatically better results. Used in the wrong ones, it just increases cost and latency with no benefit. This lesson explains what extended thinking actually is, where it helps, and how to think about the cost tradeoffs.
What Extended Thinking Actually Is
When you enable extended thinking, Claude gains access to a scratchpad before composing its response. During this phase, Claude works through the problem — exploring approaches, checking reasoning, reconsidering assumptions, and organizing its thoughts — before producing the final answer you receive.
The tokens generated during this phase are called thinking tokens. They consume token budget but are not shown to the end user by default. What you get back is the finished response, informed by that internal reasoning process.
Think of it like the difference between asking someone a hard question and expecting an instant answer versus giving them five minutes to think on paper before they respond. The thinking process itself isn't the output — it's the mechanism that makes the output better.
The Thinking Budget Parameter
When calling the Claude API with extended thinking enabled, you set a thinking budget — the maximum number of tokens Claude can use for internal reasoning. The actual thinking usage will vary by problem complexity.
```json
{
  "model": "claude-opus-4-6",
  "max_tokens": 16000,
  "thinking": {
    "type": "enabled",
    "budget_tokens": 10000
  },
  "messages": [...]
}
```
Key constraints:
- budget_tokens must be at least 1,024
- budget_tokens must be less than max_tokens
- A higher budget doesn't guarantee better output — Claude uses what it needs
Start conservative: a budget of 5,000–8,000 tokens covers most use cases. Increase only if output quality is insufficient.
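These constraints can be checked client-side before a request is sent. A minimal sketch — the function name and error messages are ours, not part of the API:

```python
def validate_thinking_budget(budget_tokens: int, max_tokens: int) -> None:
    """Raise ValueError if a thinking budget violates the documented constraints.

    This is an illustrative client-side check, not an official SDK helper.
    """
    if budget_tokens < 1024:
        raise ValueError("budget_tokens must be at least 1,024")
    if budget_tokens >= max_tokens:
        raise ValueError("budget_tokens must be less than max_tokens")

# A conservative starting point: 5,000-8,000 tokens covers most use cases.
validate_thinking_budget(budget_tokens=8000, max_tokens=16000)
```

Failing fast on an invalid budget locally is cheaper than discovering the error in an API response.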
When Extended Thinking Helps
Extended thinking improves performance on tasks that require holding multiple considerations in mind simultaneously, checking work against constraints, or exploring a problem space before committing to an approach.
Complex mathematical and quantitative reasoning Multi-step calculations, proofs, or optimization problems where errors compound across steps. Claude can verify intermediate results before proceeding.
Multi-step logical problems Puzzles, deductive reasoning chains, or inference tasks where the answer depends on correctly following a sequence of logical steps.
Code architecture decisions Designing a system from scratch, choosing between architectural approaches, or diagnosing performance issues that require reasoning about the entire call chain.
Legal and financial analysis Applying multiple overlapping rules, regulations, or constraints to a specific situation — the kind of analysis where a wrong intermediate step invalidates the entire conclusion.
Debugging complex issues Tracing a bug through multiple layers of code or data pipelines where understanding root cause requires ruling out many candidate explanations.
Strategic planning under constraints Any task with multiple competing requirements that need to be weighed against each other before arriving at a recommendation.
When Extended Thinking Doesn't Help
For many tasks, extended thinking adds cost and latency with no quality improvement. The internal reasoning process doesn't change the output when the task doesn't require it.
Simple factual Q&A — "What is the capital of France?" Thinking won't change the answer.
Creative writing — Writing a poem or short story is not improved by internal deliberation. Claude's creative output is already a generative process.
Translation — Translating text between languages is a pattern-matching task that doesn't benefit from extended reasoning.
Summarization of short content — Summarizing a paragraph or two doesn't require pre-reasoning.
Straightforward instructions — "Format this list alphabetically" or "Make this email more formal" are direct edits that don't require extended thinking.
High-volume, low-complexity tasks — If you're running thousands of classification requests, the cost of thinking tokens adds up quickly for no measurable benefit.
A useful heuristic: if a task could be done well by a thoughtful person in under 30 seconds, extended thinking won't help. If the task would benefit from someone working through it carefully on a whiteboard, it probably will.
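One way to operationalize this heuristic is to route requests by task type, enabling thinking only for categories that need it. A sketch — the category names, budgets, and mapping below are our own illustrative choices, not API values:

```python
# Illustrative mapping from task type to a thinking configuration.
# None means: leave extended thinking off for this category.
THINKING_CONFIG = {
    "math_proof":     {"type": "enabled", "budget_tokens": 10000},
    "debugging":      {"type": "enabled", "budget_tokens": 8000},
    "architecture":   {"type": "enabled", "budget_tokens": 8000},
    "factual_qa":     None,  # simple lookup: thinking won't change the answer
    "translation":    None,  # pattern matching: no benefit from reasoning
    "classification": None,  # high volume: thinking-token cost adds up fast
}

def thinking_for(task_type: str):
    """Return a thinking config for the request, or None to leave it off."""
    return THINKING_CONFIG.get(task_type)
```

The returned dict can be passed as the request's thinking parameter; unknown task types default to no thinking, which matches the advice to enable it selectively.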
Cost Implications
Thinking tokens count toward output token usage and are billed accordingly. A single extended thinking request with a 10,000-token budget can cost significantly more than a standard request.
Practical guidelines:
- Don't enable extended thinking by default across all requests
- Identify the small subset of your use case that genuinely requires it
- For interactive applications, set expectations about latency — extended thinking responses take longer
- Test with the smallest budget that produces acceptable quality before scaling up
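The billing impact is easy to estimate once you treat thinking tokens as output tokens. A sketch — the per-token prices here are illustrative placeholders, not actual rates, so check current published pricing:

```python
def thinking_request_cost(input_tokens: int, visible_output_tokens: int,
                          thinking_tokens: int,
                          input_price_per_mtok: float,
                          output_price_per_mtok: float) -> float:
    """Estimate request cost in dollars, billing thinking tokens as output.

    Prices are parameters because the figures used below are placeholders,
    not real rates.
    """
    output_total = visible_output_tokens + thinking_tokens
    return (input_tokens * input_price_per_mtok
            + output_total * output_price_per_mtok) / 1_000_000

# With placeholder rates of $15/MTok input and $75/MTok output, a fully used
# 10,000-token thinking budget dominates the cost of the visible response.
with_thinking = thinking_request_cost(2000, 1000, 10000, 15.0, 75.0)
without = thinking_request_cost(2000, 1000, 0, 15.0, 75.0)
```

Even with made-up rates, the structure of the calculation shows why enabling thinking by default across all requests is expensive: the budget multiplies the output side of the bill.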
Try It: A Problem Suited for Extended Thinking
The prompt below presents a constraint optimization problem — the kind of task that benefits most from extended thinking. Try it as written, then experiment with simplifying the problem to see when the extra reasoning starts to matter.
Key Takeaways
- Extended thinking gives Claude a reasoning scratchpad before producing its response — thinking tokens are not shown to users
- It helps most for tasks requiring multi-step reasoning, constraint satisfaction, complex debugging, and strategic analysis
- It adds no value for simple Q&A, creative tasks, translation, or straightforward instructions
- Set the budget_tokens parameter conservatively and increase only when output quality is insufficient
- Thinking tokens are billed as output tokens — enable selectively for tasks that genuinely require it