Structured Data Extraction Prompts
Extracting structured data from unstructured text is one of the most practically valuable applications of Claude. Whether you are parsing customer feedback into categories, extracting contract terms into a spreadsheet, or converting meeting notes into action items, the right prompt structure determines whether you get clean, usable output or a mess that needs manual cleanup.
The Core Principle: Define the Schema First
The most common mistake in data extraction prompts is describing the task without defining the output structure. Claude will produce something, but field names, nesting, and formatting may vary from one run to the next, which makes the output hard to feed into downstream systems.
Weak approach:
Extract the key information from this job posting.
Strong approach:
Extract the following fields from this job posting. Return a JSON object matching this exact schema. If a field is not present in the posting, use null.
Schema:
{"title": string, "company": string, "location": string, "salary_min": number|null, "salary_max": number|null, "remote": boolean, "required_years_experience": number|null}
The schema-first approach gives Claude an unambiguous target. The structure becomes far more consistent from one item to the next, which is essential when processing multiple items.
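A schema also gives your own code something to check against. The following is a minimal sketch, assuming the reply is a bare JSON object; `JOB_SCHEMA` and `validate_job` are hypothetical names introduced for illustration, and the type mapping follows the schema line literally (only the fields marked `|null` may be null).

```python
import json

# Hypothetical mapping from the job-posting fields to accepted Python types.
# None is allowed only where the schema says "|null".
JOB_SCHEMA = {
    "title": (str,),
    "company": (str,),
    "location": (str,),
    "salary_min": (int, float, type(None)),
    "salary_max": (int, float, type(None)),
    "remote": (bool,),
    "required_years_experience": (int, float, type(None)),
}


def validate_job(raw: str) -> list[str]:
    """Return a list of problems found in a JSON reply; empty means it matches."""
    record = json.loads(raw)
    problems = []
    for field, allowed in JOB_SCHEMA.items():
        if field not in record:
            problems.append(f"missing field: {field}")
            continue
        value = record[field]
        # bool is a subclass of int in Python, so reject it for numeric fields.
        if isinstance(value, bool) and bool not in allowed:
            problems.append(f"wrong type for {field}")
        elif not isinstance(value, allowed):
            problems.append(f"wrong type for {field}")
    for field in record:
        if field not in JOB_SCHEMA:
            problems.append(f"unexpected field: {field}")
    return problems
```

An empty list means the reply can go straight to the downstream system; anything else is a reason to retry or to send the item for review.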
JSON Output Patterns
Flat Objects
For simple extractions, define a flat JSON schema inline:
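As a hedged illustration, a flat schema can be kept as a small mapping in code and rendered onto a single line of the prompt. The review fields, `REVIEW_SCHEMA`, and `flat_schema_prompt` are all hypothetical and stand in for whatever your own extraction needs.

```python
# Hypothetical flat schema for a product review; field names are illustrative only.
REVIEW_SCHEMA = {
    "product": "string",
    "rating": "number|null",
    "sentiment": '"positive" | "neutral" | "negative"',
    "would_recommend": "boolean|null",
}


def flat_schema_prompt(text: str) -> str:
    """Build an extraction prompt with the flat schema written on a single line."""
    fields = ", ".join(f'"{name}": {kind}' for name, kind in REVIEW_SCHEMA.items())
    return (
        "Extract the following fields from this review. "
        "Return a JSON object matching this exact schema. "
        "If a field is not present, use null.\n"
        f"Schema:\n{{{fields}}}\n\n"
        f"Review:\n{text}"
    )
```

Keeping the schema in one place means the prompt and any later validation step cannot drift apart.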
Nested Structures
For more complex extractions, define nested schemas with descriptions:
{
  "contract": {
    "parties": [
      { "name": string, "role": "buyer" | "seller" | "agent" }
    ],
    "effective_date": "YYYY-MM-DD" | null,
    "termination_clauses": [
      { "trigger": string, "notice_days": number | null }
    ],
    "payment_terms": {
      "amount": number | null,
      "currency": string | null,
      "schedule": string | null
    }
  }
}
When using nested schemas, add a note: "Maintain the exact nesting structure. Do not flatten nested objects."
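Whether the nesting survived can be checked mechanically. This is a rough sketch, assuming the reply has already been parsed into a Python dict; `missing_paths` and `CONTRACT_SHAPE` are hypothetical helpers that only check for the presence of keys and nested objects, not value types.

```python
def missing_paths(record: dict, expected: dict, prefix: str = "") -> list[str]:
    """List dotted paths that are absent, or present but not nested as expected."""
    problems = []
    for key, shape in expected.items():
        path = f"{prefix}{key}"
        if key not in record:
            problems.append(path)
        elif isinstance(shape, dict):
            if isinstance(record[key], dict):
                problems.extend(missing_paths(record[key], shape, path + "."))
            else:
                problems.append(path)  # present, but not a nested object
    return problems


# Shape of the contract schema: dicts mark nested objects, None marks leaves and arrays.
CONTRACT_SHAPE = {
    "contract": {
        "parties": None,
        "effective_date": None,
        "termination_clauses": None,
        "payment_terms": {"amount": None, "currency": None, "schedule": None},
    }
}
```

A reply that flattened `payment_terms` into something like `payment_amount` would be reported as missing `contract.payment_terms`.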
Handling Ambiguous Values
Tell Claude explicitly how to handle ambiguity:
If a field could reasonably be interpreted multiple ways, choose the most literal interpretation and add a "_note" field alongside it explaining the ambiguity. For example, if salary is listed as a range, use the midpoint for salary and add "salary_note": "Listed as range 80k-100k, used midpoint".
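On the consuming side, the note fields usually need to be separated from the values before the data is loaded anywhere. A minimal sketch, assuming every note key is the field name followed by `_note` as in the salary example; `split_notes` is a hypothetical helper.

```python
def split_notes(record: dict) -> tuple[dict, dict]:
    """Separate extracted values from their "<field>_note" companions."""
    values, notes = {}, {}
    for key, value in record.items():
        if key.endswith("_note"):
            # Store the note under the name of the field it describes.
            notes[key[: -len("_note")]] = value
        else:
            values[key] = value
    return values, notes
```

The `values` dict can then be stored as usual, while `notes` can be logged or shown to a reviewer.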
CSV Output Patterns
CSV output is useful when you plan to import data into a spreadsheet or database. The key is defining column headers and their expected data types explicitly.
The final instruction "Output only the CSV" is important. Without it, Claude may wrap the CSV in explanation text that breaks automated parsing.
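One way to catch stray explanation text is to check the header row before using the data. The sketch below uses Python's standard `csv` module; the column names in `EXPECTED_HEADER` and the function `parse_csv_reply` are hypothetical, and the real headers should be whatever your prompt specifies.

```python
import csv
import io

# Hypothetical column headers; the real ones come from your own prompt.
EXPECTED_HEADER = ["name", "email", "signup_date"]


def parse_csv_reply(reply: str) -> list[dict]:
    """Parse a CSV-only reply and fail loudly if the header is not the expected one."""
    reader = csv.DictReader(io.StringIO(reply.strip()))
    # fieldnames is read from the first row, so leading prose shows up here.
    if reader.fieldnames != EXPECTED_HEADER:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    return list(reader)
```

If the reply starts with a sentence such as "Here is the CSV:", that sentence becomes the header row and the check raises instead of silently producing bad rows.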
Batch Processing
When you have multiple items to process, structure your prompt to handle them in a single pass. This is usually more efficient than one prompt per item, because the instructions and schema are sent only once.
Batch pattern with XML delimiters:
Process each of the following items and return a JSON array.
Each array element should be the extraction result for one item.
Preserve the order of items.
<item id="1">
[First item text]
</item>
<item id="2">
[Second item text]
</item>
Return format: [{ "id": string, "extracted": { ...your schema } }]
The id field in the output lets you match results back to inputs, which is critical if Claude skips or reorders items (rare, but it can happen with very large batches).
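The batch pattern can be sketched in code as follows, assuming the reply is a bare JSON array in the return format shown. `build_batch_prompt` and `match_results` are hypothetical helpers, and the sketch does not escape item text that itself contains `</item>`.

```python
import json


def build_batch_prompt(items: list[str]) -> str:
    """Wrap each item in <item id="N"> tags, numbered from 1, in input order."""
    blocks = [f'<item id="{i}">\n{text}\n</item>' for i, text in enumerate(items, start=1)]
    return (
        "Process each of the following items and return a JSON array.\n"
        "Each array element should be the extraction result for one item.\n"
        "Preserve the order of items.\n"
        + "\n".join(blocks)
        + '\nReturn format: [{ "id": string, "extracted": { ...your schema } }]'
    )


def match_results(items: list[str], reply: str) -> tuple[dict, list[str]]:
    """Index results by id and report which input ids are missing from the reply."""
    by_id = {str(entry["id"]): entry["extracted"] for entry in json.loads(reply)}
    expected_ids = [str(i) for i in range(1, len(items) + 1)]
    missing = [item_id for item_id in expected_ids if item_id not in by_id]
    return by_id, missing
```

Because results are looked up by id rather than by position, a reordered reply is harmless and a skipped item shows up in `missing` so it can be retried.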
Batch Size Considerations
For batches larger than 20-30 items, consider splitting them. Claude's accuracy can degrade toward the end of very long prompts. A practical test: run the same batch twice and compare outputs for items near the end.
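Both the splitting and the run-twice test are easy to automate. A minimal sketch, assuming each run has already been reduced to a dict keyed by item id; `chunk` and `differing_ids` are hypothetical names, and the default batch size of 20 is only a starting point taken from the range above.

```python
def chunk(items: list, size: int = 20) -> list[list]:
    """Split items into consecutive batches of at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[start:start + size] for start in range(0, len(items), size)]


def differing_ids(run_a: dict, run_b: dict) -> list[str]:
    """Return ids whose extraction differs between two runs of the same batch."""
    all_ids = sorted(set(run_a) | set(run_b))
    return [item_id for item_id in all_ids if run_a.get(item_id) != run_b.get(item_id)]
```

If the differing ids cluster near the end of the batch, that suggests the batch is too large and `size` should be reduced.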
Data Validation: Flagging Uncertain Extractions
Production data extraction requires knowing when Claude is uncertain. Add a confidence or flag mechanism to your schema:
For each extracted field, if you are not confident in the value (because the source text is ambiguous or contradictory, or because the field is inferred rather than stated explicitly), set the field value to null and add an entry to a "flags" array: { "field": "field_name", "issue": "brief description of ambiguity" }.
This pattern separates clean extractions from ones that need human review, which is far more useful than getting plausible-but-wrong values silently.
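Routing on the flags can be as simple as the sketch below, which assumes each parsed record is a dict that may carry a "flags" list. `route` is a hypothetical helper; a missing or empty "flags" entry is treated as clean.

```python
def route(records: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split records into clean ones and ones with a non-empty "flags" array."""
    clean, needs_review = [], []
    for record in records:
        if record.get("flags"):
            needs_review.append(record)
        else:
            clean.append(record)
    return clean, needs_review
```

Clean records can be loaded automatically, while the `needs_review` list goes to a human along with the issue descriptions Claude supplied.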
Combining XML Tags with JSON Output
For complex extraction tasks, use XML tags to organize the input and JSON to structure the output:
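One possible shape for such a prompt is sketched below. The tag names `<instructions>`, `<schema>`, and `<document>` are assumptions chosen for illustration, as are the helpers `build_prompt` and `parse_reply`; the sketch also assumes the reply contains nothing but the JSON object.

```python
import json


def build_prompt(document: str, schema: str) -> str:
    """Put instructions, schema, and source text in separate XML-style tags."""
    return (
        "<instructions>\n"
        "Extract the fields in <schema> from the text in <document>. "
        "Return only a JSON object matching the schema; use null for missing fields.\n"
        "</instructions>\n"
        f"<schema>\n{schema}\n</schema>\n"
        f"<document>\n{document}\n</document>"
    )


def parse_reply(reply: str) -> dict:
    """Parse the reply as JSON, raising ValueError if it is not a JSON object."""
    data = json.loads(reply)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    return data
```

Keeping the source text inside its own tag makes it harder for text in the document to be mistaken for instructions, and the JSON-only reply keeps parsing to a single call.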
Summary
Effective data extraction prompts lead with the schema, specify types and null handling for every field, use batch patterns with IDs for multi-item processing, and include a flagging mechanism for uncertain extractions. The combination of XML tags for input organization and JSON for output structure gives you the most reliable and parseable results for production use cases.

