Response Prefilling for Format Control
Response prefilling is an API-level technique that lets you control exactly how Claude begins its response. By placing text in the assistant turn before Claude generates its reply, you force Claude to continue from that starting point — effectively locking in the format before Claude writes a single word. It is one of the most reliable ways to get structured output without elaborate prompting.
What Response Prefilling Is
In the Claude API, a conversation is a list of messages with alternating user and assistant roles. Normally you send user messages and Claude fills in the assistant messages. Prefilling works by providing a partial assistant message yourself — Claude then completes it.
Here is the structure in the API:
{
  "messages": [
    {
      "role": "user",
      "content": "Analyze the sentiment of this review: 'The product broke after one week.'"
    },
    {
      "role": "assistant",
      "content": "{"
    }
  ]
}
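Claude receives the partial { and continues from it, so the reply starts inside a JSON object instead of opening with a preamble. The text the API returns contains only the continuation, not the prefill, so prepend the { yourself before parsing. This is generally far more reliable than asking Claude to "respond in JSON format" and hoping for compliance, although the reassembled output should still be validated.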
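The request and the reassembly step can be sketched in Python roughly as follows. This is a minimal illustration, not a definitive implementation: build_request and reassemble are hypothetical helper names, the model value is a placeholder, and the reply is simulated so the snippet runs without network access. In a real integration the returned dictionary would be passed to your API client.

```python
import json


def build_request(user_text, prefill, model="MODEL_NAME_PLACEHOLDER"):
    """Build Messages API parameters with a prefilled assistant turn."""
    return {
        "model": model,  # placeholder (assumption); substitute a real model ID
        "max_tokens": 512,
        "messages": [
            {"role": "user", "content": user_text},
            # The partial assistant message is the prefill.
            {"role": "assistant", "content": prefill},
        ],
    }


def reassemble(prefill, continuation):
    """Prepend the prefill to the returned continuation, then parse as JSON."""
    return json.loads(prefill + continuation)


request = build_request(
    "Analyze the sentiment of this review: 'The product broke after one week.'",
    prefill="{",
)

# Simulated continuation, standing in for the text of a real reply.
simulated = '"sentiment": "negative", "confidence": 0.9}'
parsed = reassemble("{", simulated)
print(parsed["sentiment"])
```

json.loads will still raise if the reply is cut off or malformed, so the parse step is worth wrapping in error handling in production code.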
JSON Prefilling
JSON is the most common use case for prefilling. You can prefill at different levels of specificity depending on how much control you need.
Minimal prefill — just the opening brace:
{ "role": "assistant", "content": "{" }
Claude will complete the entire JSON structure. This works well when you want valid JSON but don't need to dictate the exact keys.
Key-anchored prefill — start with a known key:
{ "role": "assistant", "content": "{\"result\":" }
Claude now continues with the value for result. Because the key is part of the prefill that you prepend to the reply, your expected top-level key is always present in the reassembled output.
Deeper prefill — nested structure start:
{ "role": "assistant", "content": "{\"sentiment\": \"" }
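Claude will complete the string value for sentiment. The prefill fixes the position and type of the value, not its content, so to restrict a field to specific values, list the allowed values in the prompt and validate the result.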
Prefilling is especially valuable when downstream code expects a specific JSON schema. A parsing failure at 2 AM because Claude added a markdown explanation above the JSON block is exactly the kind of brittle behavior prefilling is designed to prevent.
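The three levels of specificity above could be generated and checked with a small helper like the one below. It is a sketch under stated assumptions: key_prefill and parse_reply are hypothetical names, the continuation is simulated, and the "score" field is invented purely for illustration.

```python
import json


def key_prefill(key, open_string=False):
    """Return a prefill that opens a JSON object at `key`.

    With open_string=True the prefill also opens a string value,
    so the model continues inside the quotes.
    """
    prefill = "{" + json.dumps(key) + ":"
    if open_string:
        prefill += ' "'
    return prefill


def parse_reply(prefill, continuation, required_keys):
    """Reassemble prefill + continuation and check the expected keys exist."""
    data = json.loads(prefill + continuation)
    missing = [k for k in required_keys if k not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return data


prefill = key_prefill("sentiment", open_string=True)

# Simulated continuation; "score" is an invented field for illustration.
simulated = 'negative", "score": 0.12}'
data = parse_reply(prefill, simulated, required_keys=["sentiment", "score"])
print(data)
```

Building the prefill with json.dumps keeps the key correctly quoted and escaped, which matters if key names ever come from configuration rather than literals.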
Code Block Prefilling
When you need Claude to produce code without any preamble, prefilling with a fenced code block opener pushes the response into code-only mode:
{ "role": "assistant", "content": "```python" }
Claude will continue with a newline and then the Python code. The opener has no trailing \n because the API rejects a final assistant message that ends in whitespace. Without prefilling, Claude often responds with something like "Here is a Python function that..." followed by the code. Prefilling removes that leading conversational wrapper.
You can do the same for other languages:
{ "role": "assistant", "content": "```typescript" }
{ "role": "assistant", "content": "```sql" }
This technique is useful when you are building code generation pipelines and need to extract the code programmatically without first stripping prose explanations. Claude usually still closes the fence, so trim the closing ``` and anything that follows it.
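Extracting the code from a fence-prefilled reply might look like the sketch below. extract_code is a hypothetical helper and the continuation is simulated. The prefill here omits the trailing newline on the assumption that the API rejects prefills ending in whitespace. The split on the closing fence is deliberately naive and would misfire if the generated code itself contained triple backticks.

```python
def extract_code(prefill, continuation):
    """Return the code between the prefilled opening fence and the closing fence."""
    text = prefill + continuation
    # Drop the opening fence line (for example "```python").
    _, _, body = text.partition("\n")
    # Keep everything before the closing fence, if the model emitted one.
    code, _, _ = body.partition("```")
    return code.strip("\n") + "\n"


prefill = "```python"

# Simulated continuation: code, a closing fence, then trailing commentary.
simulated = "\ndef add(a, b):\n    return a + b\n```\nThis function adds two numbers."
code = extract_code(prefill, simulated)
print(code)
```

The same function works for other fence openers, since it only relies on the first newline and the next run of triple backticks.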
XML Prefilling
XML-structured output follows the same pattern. If you want Claude to return a response inside specific XML tags:
{ "role": "assistant", "content": "<analysis>" }
Claude will continue inside the <analysis> tag and will normally close it. The prefill ends at the tag rather than with a trailing \n, because the API rejects a final assistant message that ends in whitespace. You can go further by opening multiple tags:
{ "role": "assistant", "content": "<response>\n<summary>" }
Claude will complete the <summary> element and continue building the XML structure. This is useful for pipelines that parse XML. Claude tends to close the tags you open, but that is a tendency rather than a guarantee, so check for the closing tag before parsing.
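A parsing step for XML-prefilled replies could be sketched as follows, using Python's standard xml.etree.ElementTree module. parse_xml_reply is a hypothetical helper, and both the continuation and the sentiment element are simulated for illustration. The function checks that the root tag was actually closed instead of assuming it.

```python
import xml.etree.ElementTree as ET


def parse_xml_reply(prefill, continuation, root_tag):
    """Reassemble the reply and parse it up to the closing root tag."""
    text = prefill + continuation
    closing = f"</{root_tag}>"
    end = text.find(closing)
    if end == -1:
        raise ValueError(f"reply never closed <{root_tag}>")
    # Ignore anything the model wrote after the closing tag.
    return ET.fromstring(text[: end + len(closing)])


prefill = "<response>\n<summary>"

# Simulated continuation; the <sentiment> element is invented for illustration.
simulated = (
    "Device failed after one week.</summary>\n"
    "<sentiment>negative</sentiment>\n"
    "</response>\nLet me know if you need more detail."
)
root = parse_xml_reply(prefill, simulated, "response")
print(root.findtext("summary"))
```

ET.fromstring raises ParseError on malformed markup, for instance when the generated text contains an unescaped & or <, so that case needs handling too.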
Limitations and Constraints
Prefilling has two important constraints you must know before using it in production:
Extended thinking is incompatible with prefilling. When you enable extended thinking (the feature that shows Claude's reasoning process), you cannot use a prefilled assistant turn. The API will return an error. If you need both structured output and extended thinking, you must rely on prompting alone — not prefilling.
Prefilling is API-only. The claude.ai chat interface does not expose the message list in a way that lets you inject a partial assistant message. Prefilling is a feature of direct API access (or SDK usage). It is not available in the Projects feature, the web UI, or most third-party integrations that abstract over the raw API. If your application uses one of those surfaces, use prompting techniques for format control instead.
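The extended thinking constraint can be caught before a request is sent. The guard below is a minimal sketch: check_prefill_request is a hypothetical helper, and it assumes extended thinking is requested through a thinking parameter shaped like {"type": "enabled", "budget_tokens": ...}. Verify that shape against the current API reference before relying on it.

```python
def check_prefill_request(params):
    """Raise ValueError if the request combines a prefill with extended thinking."""
    messages = params.get("messages", [])
    # A trailing assistant message means the request is using a prefill.
    has_prefill = bool(messages) and messages[-1]["role"] == "assistant"
    thinking = params.get("thinking") or {}
    if has_prefill and thinking.get("type") == "enabled":
        raise ValueError("prefill cannot be combined with extended thinking")
    return params


messages = [
    {"role": "user", "content": "Summarize this contract clause as JSON."},
    {"role": "assistant", "content": "{"},
]

# Passes: prefill without extended thinking.
check_prefill_request({"messages": messages})

# Fails locally, before the API would reject the request.
try:
    check_prefill_request(
        {"messages": messages, "thinking": {"type": "enabled", "budget_tokens": 1024}}
    )
except ValueError as err:
    print(err)
```

Failing early in your own code gives a clearer error message than a rejected API call and keeps the two modes from being mixed by accident.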
Comparing Prompted vs. Prefilled Format Control
With prompting alone, format control depends on instructions such as "Return ONLY the JSON object", which Claude usually follows but can still wrap in a preamble or a markdown fence. A browser-based playground cannot send a partial assistant turn through the raw API, so it can only approximate the prefilling effect through strong prompting of this kind.
In a real API call, you would pair the same prompt with a prefilled assistant turn starting with {, which largely removes the need for the "Return ONLY the JSON object" instruction. The prefill does the work that instruction is attempting to do.
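Prefilling Pattern Reference
JSON object with free-form keys: prefill {
JSON object with a known first key: prefill {"result":
JSON string value for a known key: prefill {"sentiment": "
Code only: prefill an opening fence such as ```python
XML inside known tags: prefill <analysis> or <response>\n<summary>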
Exercise: Design a Prefilled API Prompt
You are building a document processing pipeline. Users submit legal contract excerpts and your application needs structured data about the parties, dates, and key obligations. Design the full API prompt — including the user message and the prefilled assistant turn — to reliably extract this information as JSON.
Key Takeaways
Response prefilling is a surgical tool for format control. It works by injecting a partial assistant message into the API conversation, so that Claude continues from that starting point. Use { for JSON, an opening code fence for code, and XML tags for structured markup. Remember: prefilling does not work with extended thinking, and it requires direct API access. For applications that depend on a fixed output structure in production, prefilling is generally more reliable than format instructions alone, but the output should still be validated before downstream code consumes it.

