Extracting Data from Documents
Finance teams deal with a constant flow of documents (invoices, contracts, bank statements, financial reports, supplier quotes) containing data that has to be extracted, categorised, and entered into systems. AI can handle a significant portion of this extraction work.
What Document Extraction Looks Like
The basic workflow:
- You have a document with data in unstructured or semi-structured form
- You describe what you want extracted
- AI returns the data in a structured format (table, JSON, CSV-ready)
- You verify and import
This doesn't require any technical setup — you can do it today with Claude or ChatGPT by copying and pasting document text.
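The import step can stay manual, but if you ask for JSON output, a short script can turn the reply into CSV. The following is a minimal sketch, assuming the AI returned a JSON array with one object per invoice. The key names and sample values are made up for illustration, and `json_to_csv` is a hypothetical helper, not part of any tool.

```python
import csv
import io
import json

# Hypothetical AI reply: a JSON array with one object per invoice.
ai_reply = """
[
  {"supplier": "Example Supplies Ltd", "invoice_number": "INV-001", "total": "120.00"},
  {"supplier": "Sample Traders", "invoice_number": "A-17", "total": "58.50"}
]
"""

def json_to_csv(reply_text: str) -> str:
    """Convert a JSON array of flat objects into CSV text.

    Assumes at least one row and that every row has the same keys.
    """
    rows = json.loads(reply_text)
    fieldnames = list(rows[0].keys())
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

csv_text = json_to_csv(ai_reply)
print(csv_text)
```

The CSV text can then be saved to a file and checked by eye before it goes anywhere near a finance system.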
Invoice Extraction
For individual invoices or small batches:
"Extract all relevant data from this invoice and return it as a structured table:
Required fields: Supplier name, Supplier address, Invoice number, Invoice date, Due date, PO number (if present), Line items (description, quantity, unit price, line total), Subtotal, VAT rate, VAT amount, Total amount due, Payment terms, Bank details (if present).
If any field is not present in the invoice, write 'Not found'.
[paste invoice text]"
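Once the extracted fields are in hand, it helps to list which ones need a human look. Here is a rough sketch, assuming the result has been copied into a Python dictionary keyed by field name. The shortened field list, the sample values, and the helper `fields_needing_review` are all illustrative.

```python
# A subset of the fields named in the prompt above, used here for illustration.
REQUIRED_FIELDS = [
    "Supplier name", "Invoice number", "Invoice date", "Due date",
    "Subtotal", "VAT rate", "VAT amount", "Total amount due",
]

def fields_needing_review(extracted: dict) -> list:
    """Return required fields that are absent, blank, or marked 'Not found'."""
    flagged = []
    for field in REQUIRED_FIELDS:
        value = str(extracted.get(field, "")).strip()
        if not value or value.lower() == "not found":
            flagged.append(field)
    return flagged

# Made-up sample values, not taken from a real invoice.
sample = {
    "Supplier name": "Example Supplies Ltd",
    "Invoice number": "INV-001",
    "Invoice date": "2024-03-01",
    "Due date": "Not found",
    "Subtotal": "100.00",
    "VAT rate": "20%",
    "VAT amount": "20.00",
}

print(fields_needing_review(sample))
```

In this sample the due date is marked 'Not found' and the total is missing altogether, so both are flagged for checking against the original invoice.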
For processing multiple invoices:
"I'm going to paste 5 invoice texts one after another. For each one, extract: Supplier, Invoice Number, Date, Total Amount, VAT. Return as a single table with one row per invoice."
Contract Data Extraction
Contracts contain key financial terms buried in legal language:
"Extract the following financial and commercial terms from this contract:
- Contract value or pricing structure
- Payment terms
- Renewal/expiry dates
- Termination clauses (notice period, financial penalties)
- Price escalation provisions (index-linked increases, fixed escalators)
- Any caps, minimums, or volume commitments
Format as a summary table. Flag any terms that represent a financial risk.
[paste contract text]"
Bank Statement Analysis
"Here is a bank statement for [month]. Categorise each transaction into these categories: [list your categories]. Then produce a summary table showing total spend per category. Flag any transactions you cannot categorise with confidence.
[paste bank statement data]"
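The summary table is easy to recompute independently, which is a quick way to confirm the AI's totals. Below is a small sketch, assuming the categorised transactions have been copied out as (description, amount, category) rows. The transactions and category names are invented for illustration.

```python
from collections import defaultdict
from decimal import Decimal

# Made-up categorised transactions: (description, amount, category).
transactions = [
    ("Office rent", "1200.00", "Premises"),
    ("Cloud hosting", "89.99", "Software"),
    ("Design tool subscription", "15.01", "Software"),
    ("Unknown card payment", "42.50", "Uncategorised"),
]

def spend_per_category(rows):
    """Sum amounts per category using Decimal to avoid float rounding."""
    totals = defaultdict(Decimal)  # Decimal() starts each category at 0
    for _description, amount, category in rows:
        totals[category] += Decimal(amount)
    return dict(totals)

totals = spend_per_category(transactions)
for category, total in sorted(totals.items()):
    print(f"{category}: {total}")
```

If these figures differ from the AI's summary table, the statement needs a closer look before the numbers are used.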
Annual Report Data Extraction
For competitor or investor research:
"Extract the following financial data from this annual report excerpt:
- 5-year revenue trend (if available)
- Latest year gross margin and operating margin
- Net debt position
- Any stated financial targets or guidance
- Key risk factors mentioned with financial implications
[paste annual report section]"
Extracting from PDFs
Most AI tools can read PDF content if you do one of the following:
- Copy and paste the text (works for digital PDFs)
- Upload the file directly (Claude and ChatGPT support file uploads)
- Run the file through OCR (optical character recognition) first if the PDF is an image scan, since a scan has no text layer for a plain PDF-to-text converter to read
For scanned documents (image-based PDFs), you can also skip the OCR step: Claude's vision capabilities or ChatGPT's vision mode can read them directly from screenshots or uploaded images.
"This is a screenshot of an invoice. Extract all the data and return it as a structured summary."
Building Extraction Templates
For documents you process regularly, build a standard prompt:
"I process [type of document] every [frequency]. The format is always the same. Here is an example document:
[paste example]
Create a reusable extraction prompt I can use each time I receive one of these documents, with clear placeholders for the parts that change."
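If you prefer to keep the reusable prompt in a script, the placeholders can be filled in programmatically. The sketch below uses `string.Template` from the Python standard library. The prompt wording, the placeholder names `$period` and `$document_text`, and the helper `fill_prompt` are assumptions made for illustration.

```python
from string import Template

# Hypothetical reusable prompt; $period and $document_text are the parts that change.
EXTRACTION_PROMPT = Template(
    "Extract Supplier, Invoice Number, Date, Total Amount and VAT from this "
    "invoice for $period. If any field is not present, write 'Not found'.\n\n"
    "$document_text"
)

def fill_prompt(period: str, document_text: str) -> str:
    # substitute() raises KeyError if a placeholder is left unfilled,
    # so a half-completed prompt is never sent by accident.
    return EXTRACTION_PROMPT.substitute(period=period, document_text=document_text)

prompt = fill_prompt("March", "[paste invoice text]")
print(prompt)
```

Keeping the fixed wording in one place means every document of that type is extracted with exactly the same instructions.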
Quality Control
Always verify extracted data, especially:
- Numbers (AI can transpose digits or misread currency symbols)
- Dates (format inconsistencies can cause errors)
- Totals that should add up — verify the arithmetic
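The date risk is easy to see with a numeric date that could be read day-first or month-first. The short sketch below lists every reading a date string allows. The helper `possible_dates` and the two formats tried are assumptions chosen for illustration.

```python
from datetime import datetime

def possible_dates(text):
    """Return the distinct ISO dates a numeric date string could mean."""
    results = set()
    for fmt in ("%d/%m/%Y", "%m/%d/%Y"):  # day-first, then month-first
        try:
            results.add(datetime.strptime(text, fmt).date().isoformat())
        except ValueError:
            pass  # this reading is not a valid calendar date
    return sorted(results)

print(possible_dates("03/04/2024"))  # two valid readings, so ambiguous
print(possible_dates("25/04/2024"))  # only the day-first reading is valid
```

A date with more than one valid reading should be checked against the source document or the supplier's usual format.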
A useful check prompt:
"I've extracted this data from an invoice. Check that the line item totals add up to the subtotal, and that the subtotal plus VAT equals the total amount due. Flag any discrepancies.
[paste extracted data]"
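The same arithmetic check can be run locally rather than asking the AI to mark its own work. This is a minimal sketch, assuming the amounts have been copied out as plain decimal strings. The function name `check_invoice_totals` and the figures are illustrative.

```python
from decimal import Decimal

def check_invoice_totals(line_totals, subtotal, vat_amount, total_due):
    """Return a list of discrepancy messages; an empty list means the sums agree."""
    problems = []
    line_sum = sum((Decimal(x) for x in line_totals), Decimal("0"))
    if line_sum != Decimal(subtotal):
        problems.append(f"Line items sum to {line_sum}, but subtotal is {subtotal}")
    expected_total = Decimal(subtotal) + Decimal(vat_amount)
    if expected_total != Decimal(total_due):
        problems.append(
            f"Subtotal plus VAT is {expected_total}, but total due is {total_due}"
        )
    return problems

# Made-up figures: the second call has two digits transposed in the total.
print(check_invoice_totals(["40.00", "60.00"], "100.00", "20.00", "120.00"))
print(check_invoice_totals(["40.00", "60.00"], "100.00", "20.00", "102.00"))
```

`Decimal` is used instead of floating-point numbers so that amounts such as 0.10 and 0.20 add up exactly.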
Your Turn
Find 3 recent invoices or expense receipts in your email or finance system. Copy the text from each and use the invoice extraction prompt. Notice how much manual re-keying this could eliminate over a month.

