Prompting with Extended Thinking
Knowing when to enable extended thinking is half the battle. The other half is knowing how to write prompts that work with it rather than against it. The counterintuitive finding from working with extended thinking is that the prompting style that works best is often the opposite of what most prompt engineers expect.
The Central Insight: High-Level Goals, Not Step-by-Step Instructions
When extended thinking is enabled, Claude uses its internal scratchpad to plan and reason through the problem. If you've already provided a detailed step-by-step breakdown of how to approach the problem, you've taken that responsibility away from Claude — and often constrained it to a reasoning path that isn't optimal.
Without extended thinking, explicit step-by-step instructions are often helpful. They substitute for Claude's limited working memory in a single forward pass.
With extended thinking, those same instructions can hurt performance. They force Claude to follow a predetermined path rather than discovering the best approach during its internal reasoning phase.
The shift in mental model: instead of writing a procedure for Claude to execute, write a problem for Claude to solve.
What Changes in Your Prompt Style
| Without Extended Thinking | With Extended Thinking |
|---|---|
| Break the task into numbered steps | State the goal and constraints clearly |
| Tell Claude exactly what to do first | Let Claude decide its own approach |
| Specify the reasoning path | Specify the quality criteria for the output |
| "First do X, then do Y, then do Z" | "Solve X. Requirements: Y. Output format: Z." |
This doesn't mean being vague. It means shifting from prescribing the method to describing the problem and the desired outcome.
The "Think Deeply About X" Pattern
One effective pattern is a direct invitation to reason thoroughly, without specifying how. This simple framing signals to Claude that deep analysis is expected and appropriate.
Compare these two approaches for a complex code architecture task:
Constrained (less effective with thinking):
First, list all the components we'll need.
Then, for each component, describe its responsibilities.
Next, define the interfaces between components.
Finally, identify any potential bottlenecks.
Goal-oriented (more effective with thinking):
Think carefully about the best architecture for this system.
Requirements: [list requirements]
Constraints: [list constraints]
Produce a complete design with justification for each major decision.
The second version lets Claude use its thinking budget to explore the design space before committing. The first version locks it into a structure before that exploration happens.
Thinking Budget Optimization
Start smaller than you think you need. Claude doesn't always use its full budget, and a well-scoped problem often requires less thinking than expected.
Practical starting points by task type:
- Constraint optimization (staffing, scheduling): 8,000–12,000 tokens
- Code architecture design: 6,000–10,000 tokens
- Complex debugging: 5,000–8,000 tokens
- Legal or financial analysis: 8,000–16,000 tokens
- Multi-step math proofs: 4,000–8,000 tokens
If the quality of the output is insufficient, increase the budget in steps of 2,000–4,000 tokens. If quality is already good, try reducing — you may be able to get the same result at lower cost.
Using Thinking for Code Generation and Debugging
Extended thinking is particularly valuable for:
Architecture from scratch — designing a system before any code exists, where the thinking phase explores different structural approaches and their tradeoffs.
Root cause analysis — when a bug has multiple plausible causes and systematic elimination is required. The thinking phase works through candidates before settling on an explanation.
Refactoring decisions — evaluating whether to restructure code and what the safest migration path looks like.
For code tasks, your prompt should describe what the code needs to do and the constraints it operates under, not the implementation steps. Let the thinking process explore the implementation.
Known Limitations
No response prefilling with extended thinking. The prefilling technique (providing the start of Claude's response) is incompatible with extended thinking. When thinking is enabled, Claude must start its response from scratch — the internal reasoning process and a pre-seeded response start can't coexist.
Thinking tokens are not streamed by default. In standard API integrations, the internal thinking content is not returned in the response. You receive only the final output. Some integrations support receiving thinking content, but this is not the default behavior.
Extended thinking requires a minimum budget. You cannot set budget_tokens below 1,024. For very fast tasks, this minimum may make extended thinking impractical from a latency standpoint.
Prompt Templates for Extended Thinking
These templates show the high-level, goal-oriented structure that works best.
Try It: A Complex Reasoning Task
The prompt below is structured using the goal-oriented style suited for extended thinking. Run it, then experiment by rewriting it as step-by-step instructions to see how the output changes.
Key Takeaways
- With extended thinking enabled, high-level goal descriptions outperform step-by-step instructions
- The thinking budget is used for internal reasoning — let Claude determine its own approach rather than prescribing it
- Use the pattern: describe the problem, state the constraints, specify what a good output looks like
- Prefilling is incompatible with extended thinking — don't combine the two techniques
- Start with a conservative thinking budget and increase in small increments based on output quality
- Thinking tokens are not returned in the response by default — you receive only the finished output
Discussion
Sign in to join the discussion.

