Structured Data Extraction Prompts

Extracting structured data from unstructured text is one of the most practically valuable applications of Claude. Whether you are parsing customer feedback into categories, extracting contract terms into a spreadsheet, or converting meeting notes into action items, the right prompt structure determines whether you get clean, usable output or a mess that needs manual cleanup.

The Core Principle: Define the Schema First

The most common mistake in data extraction prompts is describing the task without defining the output structure. Claude will still produce output, but field names, nesting, and formats can vary from one item to the next, which makes the results hard to feed into downstream systems.

Weak approach:

Extract the key information from this job posting.

Strong approach:

Extract the following fields from this job posting. Return a JSON object matching this exact schema. If a field is not present in the posting, use null.

Schema: {"title": string, "company": string, "location": string|null, "salary_min": number|null, "salary_max": number|null, "remote": boolean|null, "required_years_experience": number|null}

The schema-first approach gives Claude an unambiguous target, so items come back with consistent field names and types. That consistency is essential when processing multiple items.

JSON Output Patterns

Flat Objects

For simple extractions, define a flat JSON schema inline in the prompt, as in the job-posting example above: one key per field, each with its type and its null handling.
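On the receiving side, it helps to check each response against the schema before it reaches a downstream system. The following is a minimal Python sketch, not a prescribed method: validate_flat and JOB_SCHEMA are hypothetical names, and the type table is an assumption based on the job-posting fields, treating location and remote as nullable.

```python
import json
from typing import Any

# Hypothetical type table for the job-posting schema.
# Each entry lists the allowed Python types; type(None) permits null.
JOB_SCHEMA: dict[str, tuple[type, ...]] = {
    "title": (str,),
    "company": (str,),
    "location": (str, type(None)),
    "salary_min": (int, float, type(None)),
    "salary_max": (int, float, type(None)),
    "remote": (bool, type(None)),
    "required_years_experience": (int, float, type(None)),
}


def validate_flat(raw: str, schema: dict[str, tuple[type, ...]]) -> list[str]:
    """Parse a model response and return schema problems (empty list if valid)."""
    try:
        data: Any = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not an object"]
    problems = [f"missing field: {k}" for k in sorted(schema.keys() - data.keys())]
    problems += [f"unexpected field: {k}" for k in sorted(data.keys() - schema.keys())]
    for key, allowed in schema.items():
        if key not in data:
            continue
        value = data[key]
        # bool is a subclass of int in Python, so reject it for numeric fields explicitly.
        if isinstance(value, bool) and bool not in allowed:
            problems.append(f"{key}: unexpected type bool")
        elif not isinstance(value, allowed):
            problems.append(f"{key}: unexpected type {type(value).__name__}")
    return problems
```

Returning a list of problems rather than raising on the first one makes it easier to log every issue with a response at once.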

Nested Structures

For more complex extractions, define a nested schema and annotate each field with its type or allowed values:

{
  "contract": {
    "parties": [
      { "name": string, "role": "buyer" | "seller" | "agent" }
    ],
    "effective_date": "YYYY-MM-DD" | null,
    "termination_clauses": [
      { "trigger": string, "notice_days": number | null }
    ],
    "payment_terms": {
      "amount": number | null,
      "currency": string | null,
      "schedule": string | null
    }
  }
}

When using nested schemas, add a note: "Maintain the exact nesting structure. Do not flatten nested objects."
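Even with that instruction, it is worth verifying that the nesting survived. The Python sketch below walks a response alongside a template of the contract schema and reports missing or unexpected keys; a key such as "payment_terms_amount" usually means an object was flattened. CONTRACT_TEMPLATE and structure_errors are hypothetical names, and the sketch checks structure only, not value types.

```python
from typing import Any

# Hypothetical template mirroring the contract schema: a dict is a required
# nested object, a one-element list means "array of this shape", and None
# marks a leaf whose value is not checked here.
CONTRACT_TEMPLATE: Any = {
    "contract": {
        "parties": [{"name": None, "role": None}],
        "effective_date": None,
        "termination_clauses": [{"trigger": None, "notice_days": None}],
        "payment_terms": {"amount": None, "currency": None, "schedule": None},
    }
}


def structure_errors(data: Any, template: Any, path: str = "$") -> list[str]:
    """Return the paths where data does not follow the nesting in template."""
    if isinstance(template, dict):
        if not isinstance(data, dict):
            return [f"{path}: expected object"]
        errors: list[str] = []
        for key, sub in template.items():
            if key not in data:
                errors.append(f"{path}.{key}: missing")
            else:
                errors.extend(structure_errors(data[key], sub, f"{path}.{key}"))
        for key in sorted(data.keys() - template.keys()):
            errors.append(f"{path}.{key}: unexpected key")
        return errors
    if isinstance(template, list):
        if not isinstance(data, list):
            return [f"{path}: expected array"]
        errors = []
        for i, item in enumerate(data):
            errors.extend(structure_errors(item, template[0], f"{path}[{i}]"))
        return errors
    return []  # leaf: any value is accepted
```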

Handling Ambiguous Values

Tell Claude explicitly how to handle ambiguity:

If a field could reasonably be interpreted multiple ways, choose the most literal interpretation, or apply the rule specified for that field, and add a "<field>_note" field alongside it explaining the ambiguity. For example, if the schema has a single "salary" field and the posting lists a range, use the midpoint and add "salary_note": "Listed as range 80k-100k, used midpoint".
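Downstream code usually wants the values and the notes apart. A minimal Python sketch, assuming the "<field>_note" naming convention; split_notes is a hypothetical helper, and a key ending in "_note" is only treated as a note when its base field is also present, so a genuine field that happens to end in "_note" is left alone.

```python
from typing import Any


def split_notes(
    record: dict[str, Any], suffix: str = "_note"
) -> tuple[dict[str, Any], dict[str, str]]:
    """Separate '<field>_note' entries from the extracted values.

    Returns (values, notes), where notes maps the base field name to its note.
    """
    values: dict[str, Any] = {}
    notes: dict[str, str] = {}
    for key, value in record.items():
        base = key[: -len(suffix)] if key.endswith(suffix) else None
        if base is not None and base in record:
            notes[base] = value
        else:
            values[key] = value
    return values, notes
```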

CSV Output Patterns

CSV output is useful when you plan to import data into a spreadsheet or database. The key is defining column headers and their expected data types explicitly.

Extract product mentions from the following customer reviews.

Output CSV with these exact columns, and do not add any others:
product_name,sentiment,specific_feature_mentioned,would_recommend

Rules:
- sentiment: must be exactly "positive", "negative", or "mixed"
- specific_feature_mentioned: the single most prominent feature mentioned, or empty if none
- would_recommend: true, false, or unknown
- Wrap any field value that contains a comma in double quotes
- Include the header row
- One row per review

<reviews>
Review 1: [review text]
Review 2: [review text]
Review 3: [review text]
</reviews>

Output only the CSV. Do not include explanation or commentary.

The final instruction "Output only the CSV" is important. Without it, Claude may wrap the CSV in explanation text that breaks automated parsing.
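Because the column list and allowed values are fixed, the output can be checked mechanically with Python's standard csv module. A minimal sketch: parse_review_csv, EXPECTED_COLUMNS, and ALLOWED are hypothetical names mirroring the rules in the prompt, and the sketch raises on the first problem rather than trying to repair the output.

```python
import csv
import io

EXPECTED_COLUMNS = ["product_name", "sentiment", "specific_feature_mentioned", "would_recommend"]
ALLOWED = {
    "sentiment": {"positive", "negative", "mixed"},
    "would_recommend": {"true", "false", "unknown"},
}


def parse_review_csv(text: str) -> list[dict[str, str]]:
    """Parse the model's CSV output, raising ValueError on structural problems."""
    reader = csv.DictReader(io.StringIO(text.strip()))
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    rows = []
    for row_no, row in enumerate(reader, start=1):
        # DictReader puts surplus values under the key None (its default restkey).
        if None in row:
            raise ValueError(f"row {row_no}: too many fields")
        for column, allowed in ALLOWED.items():
            # Short rows get None (the default restval), which also fails this check.
            if row[column] not in allowed:
                raise ValueError(f"row {row_no}: bad {column} value {row[column]!r}")
        rows.append(row)
    return rows
```

Quoted fields such as "Widget, Pro" are handled by the csv module itself, which is why the prompt asks for commas to be wrapped in double quotes.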

Batch Processing

When you have multiple items to process, structure your prompt to handle them in a single pass. This avoids repeating the instructions and schema for every item, so it is usually more efficient than one prompt per item.

Batch pattern with XML delimiters:

Process each of the following items and return a JSON array.
Each array element should be the extraction result for one item.
Preserve the order of items.

<item id="1">
[First item text]
</item>

<item id="2">
[Second item text]
</item>

Return format: [{ "id": string, "extracted": { ...your schema } }]
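If the items come from a list or database, generating the prompt keeps the ids and tags consistent. A minimal Python sketch of the pattern above; build_batch_prompt is a hypothetical helper, and it assumes ids are simple tokens and that item text does not itself contain "</item>".

```python
def build_batch_prompt(items: dict[str, str], schema: str) -> str:
    """Wrap each input in an <item id="..."> tag, in insertion order."""
    lines = [
        "Process each of the following items and return a JSON array.",
        "Each array element should be the extraction result for one item.",
        "Preserve the order of items.",
        "",
    ]
    for item_id, text in items.items():
        # Assumes item_id needs no escaping and text contains no "</item>".
        lines += [f'<item id="{item_id}">', text, "</item>", ""]
    # Doubled braces in the f-string produce literal { and }.
    lines.append(f'Return format: [{{ "id": string, "extracted": {schema} }}]')
    return "\n".join(lines)
```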

The id field in the output lets you match results back to inputs, which is critical if Claude skips or reorders items (rare, but it can happen with very large batches).
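Matching on id rather than array position can be sketched in a few lines of Python. match_results is a hypothetical helper; it assumes the response is a bare JSON array in the return format above, coerces ids to strings in case the model returns 1 instead of "1", and lets a duplicated id overwrite the earlier entry.

```python
import json
from typing import Any


def match_results(raw: str, input_ids: list[str]) -> tuple[dict[str, Any], list[str]]:
    """Index a batch response by id and report inputs that got no result."""
    wanted = set(input_ids)
    results: dict[str, Any] = {}
    for element in json.loads(raw):
        item_id = str(element.get("id"))
        if item_id in wanted:  # ignore ids that were never sent
            results[item_id] = element.get("extracted")
    missing = [i for i in input_ids if i not in results]
    return results, missing
```

Anything in the missing list can be resent in a smaller follow-up batch.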

Batch Size Considerations

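For batches larger than about 20-30 items, consider splitting them; treat that figure as a starting point rather than a hard limit. Claude's accuracy can degrade toward the end of very long prompts. A practical test: run the same batch twice and compare the outputs for items near the end. If those results differ between runs while earlier items match, the batch is probably too large.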
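Both halves of that advice are easy to automate. The Python sketch below splits inputs into fixed-size chunks and compares the tail of two runs of the same batch; chunks and unstable_tail are hypothetical helpers, and the default tail of 5 items is an arbitrary choice. A difference in run length is itself a warning sign worth checking separately.

```python
from collections.abc import Iterator, Sequence
from typing import Any


def chunks(items: Sequence[Any], size: int) -> Iterator[Sequence[Any]]:
    """Yield consecutive slices of at most size items."""
    if size < 1:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start : start + size]


def unstable_tail(run_a: list[Any], run_b: list[Any], tail: int = 5) -> list[int]:
    """Return indices in the last tail positions where two runs of one batch disagree."""
    n = min(len(run_a), len(run_b))
    return [i for i in range(max(0, n - tail), n) if run_a[i] != run_b[i]]
```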

Data Validation: Flagging Uncertain Extractions

Production data extraction requires knowing when Claude is uncertain. Add a confidence or flag mechanism to your schema:

For each extracted field, if you are not fully confident in the value (because the source text is ambiguous or contradictory, or the field is inferred rather than stated explicitly), set the field value to null and add an entry to a "flags" array: { "field": "field_name", "issue": "brief description of ambiguity" }.

This pattern separates clean extractions from ones that need human review, which is far more useful than silently receiving values that look plausible but are wrong.
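In code, the flags array becomes a routing signal. A minimal Python sketch, assuming a flat extraction object with a top-level "flags" list as described above; route is a hypothetical helper that also enforces the convention that every flagged field was set to null.

```python
from typing import Any


def route(extraction: dict[str, Any]) -> str:
    """Return "review" if the extraction carries flags, otherwise "clean"."""
    flags = extraction.get("flags") or []
    for flag in flags:
        field = flag.get("field")
        # The prompt asks for flagged fields to be null; a value here breaks that rule.
        if extraction.get(field) is not None:
            raise ValueError(f"flagged field {field!r} should be null")
    return "review" if flags else "clean"
```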

Combining XML Tags with JSON Output

For complex extraction tasks, use XML tags to organize the input and JSON to structure the output:

You are extracting structured data from sales call transcripts.

<schema>
{
  "call_id": string,
  "duration_minutes": number | null,
  "prospect_company": string | null,
  "decision_maker_present": boolean | null,
  "pain_points": string[],
  "objections_raised": string[],
  "next_steps": string[],
  "deal_stage": "discovery" | "evaluation" | "negotiation" | "closed_won" | "closed_lost" | "unknown",
  "flags": [{ "field": string, "issue": string }]
}
</schema>

<instructions>
- Extract only what is explicitly stated in the transcript. Do not infer.
- If you are uncertain about a field, use null and add a flag.
- pain_points, objections_raised, and next_steps should be direct quotes or close paraphrases, not summaries.
- Output only valid JSON. No preamble or explanation.
</instructions>

<transcript id="CALL-2024-0847">
[Transcript content here]
</transcript>
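A response to this prompt can be checked against the enumerated deal stages, the list fields, and the transcript id it was given. A minimal Python sketch: check_call_record, DEAL_STAGES, and LIST_FIELDS are hypothetical names, and invalid JSON is deliberately left to raise json.JSONDecodeError so the caller can retry.

```python
import json
from typing import Any

DEAL_STAGES = {"discovery", "evaluation", "negotiation", "closed_won", "closed_lost", "unknown"}
LIST_FIELDS = ("pain_points", "objections_raised", "next_steps")


def check_call_record(raw: str, expected_id: str) -> list[str]:
    """Return problems in one sales-call extraction (empty list if none)."""
    record: Any = json.loads(raw)  # raises json.JSONDecodeError on invalid JSON
    if not isinstance(record, dict):
        return ["top-level value is not an object"]
    problems: list[str] = []
    # The call_id should echo the id attribute of the <transcript> tag.
    if record.get("call_id") != expected_id:
        problems.append(f"call_id {record.get('call_id')!r} does not match {expected_id!r}")
    if record.get("deal_stage") not in DEAL_STAGES:
        problems.append(f"deal_stage not allowed: {record.get('deal_stage')!r}")
    for name in LIST_FIELDS:
        value = record.get(name)
        if not (isinstance(value, list) and all(isinstance(s, str) for s in value)):
            problems.append(f"{name} must be a list of strings")
    for flag in record.get("flags") or []:
        if not (isinstance(flag, dict) and {"field", "issue"} <= flag.keys()):
            problems.append(f"malformed flag: {flag!r}")
    return problems
```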


Summary

Effective data extraction prompts lead with the schema, specify types and null handling for every field, use batch patterns with IDs for multi-item processing, and include a flagging mechanism for uncertain extractions. Combining XML tags for input organization with JSON for output structure is a reliable way to get parseable results in production use cases.
