Pre-Translation Document Analysis with AI
The single most underrated productivity gain in modern translation is the 20 minutes you spend with AI before you open your CAT tool. A well-prepared source document yields a translation that flows; a poorly prepared one breaks your rhythm every 200 words.
What You'll Learn
- How to use AI to analyze and triage a new source document
- The pre-translation questions you should ask before quoting or starting
- How to extract structure, references, and domain context automatically
- How to spot source-text problems before they become target-text problems
Why Pre-Analysis Pays for Itself
Every experienced translator knows this scenario: you start a job, work for two hours, and only then realize the source contains a recurring proper noun you can't identify, three undefined acronyms, two equivalent terms used interchangeably for what may or may not be the same concept, and a section that appears to be lifted from a regulation you've never read.
You stop, you research, you backfill — losing momentum and confidence. The job runs over budget. Worse, if it's a fixed-price contract, you lose money.
Pre-translation analysis surfaces these problems before you start producing translated text. AI makes that analysis fast.
The Pre-Translation Brief
Before opening your CAT tool, paste the entire source (or its table of contents plus a representative chapter) into Claude or Gemini, both of which handle long inputs well, and run this:
"You are helping me prepare to translate the following document from [SOURCE] to [TARGET]. Produce a one-page pre-translation brief covering:
- Document type and likely purpose (regulatory filing? user manual? marketing brochure? academic paper?)
- Target audience and reading level
- Estimated word count (be approximate)
- Structural overview (headings, tables, lists, footnotes, references)
- The 5–10 most important domain-specific terms
- Any acronyms or abbreviations and what they likely stand for
- Proper nouns (people, organizations, products, places) that will recur
- Any references to external documents, regulations, or standards that I might need to consult
- Sections that look poorly written, ambiguous, or inconsistent in the source
- Three questions I should ask the client before starting"
Save this output. It is your project handover document, your starting point for the glossary, and your evidence to the client if scope changes mid-project.
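If you run this brief on every project, it may be worth templating it. The following is a minimal Python sketch, not a definitive implementation: the names `BRIEF_ITEMS`, `count_words`, and `build_brief_prompt` are hypothetical, and the word count is a plain whitespace split that will not match your CAT tool's count exactly. It gives you a local figure to compare against the AI's approximate one.

```python
# Hypothetical helper: assembles the pre-translation brief prompt from a template.
BRIEF_ITEMS = [
    "Document type and likely purpose",
    "Target audience and reading level",
    "Estimated word count (be approximate)",
    "Structural overview (headings, tables, lists, footnotes, references)",
    "The 5-10 most important domain-specific terms",
    "Any acronyms or abbreviations and what they likely stand for",
    "Proper nouns that will recur",
    "References to external documents, regulations, or standards",
    "Sections that look poorly written, ambiguous, or inconsistent",
    "Three questions I should ask the client before starting",
]


def count_words(text: str) -> int:
    """Count whitespace-separated tokens (an approximation, not a CAT-tool count)."""
    return len(text.split())


def build_brief_prompt(document: str, source_lang: str, target_lang: str) -> str:
    """Return the full prompt text, ready to paste into a chat window."""
    items = "\n".join(f"- {item}" for item in BRIEF_ITEMS)
    return (
        "You are helping me prepare to translate the following document "
        f"from {source_lang} to {target_lang}. "
        "Produce a one-page pre-translation brief covering:\n"
        f"{items}\n\n"
        f"My local word count is {count_words(document)}.\n\n"
        f"DOCUMENT:\n{document}"
    )
```

Calling `build_brief_prompt(text, "German", "English")` returns a single string you can paste into the chat. Including your own word count lets you see how far the model's estimate drifts from it.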
Spotting Source-Text Problems Early
Translators inherit the source's problems and the source's reputation. If the German source has a typo in a chemical formula and you faithfully translate the formula into English, you look incompetent.
A specific prompt for source-quality screening:
"Read the following French source. Identify any segments where: (a) a sentence appears incomplete or grammatically broken, (b) a term is used inconsistently across the document, (c) a number, date, or unit appears mismatched with another instance, (d) there is an obvious typo or OCR artifact, (e) a cross-reference points to a section that doesn't exist, (f) a footnote number is duplicated or missing. List each issue with its location."
Send the resulting list to the client before you start. This:
- Builds trust
- Documents that you flagged the issue (CYA)
- Often improves the source before you translate, saving you rework
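Some items in the screening prompt can be cross-checked locally before you email the client. As one illustration, here is a rough sketch for item (e), broken cross-references. It assumes, purely for the example, that headings start a line with a dotted number ("4.2 Results") and that references are written as "Section 4.2" in English; adapt both patterns to the source language and layout. The function name `dangling_section_refs` is hypothetical.

```python
import re

# Assumption: a heading is a line that starts with a dotted number, e.g. "4.2 Results".
HEADING_RE = re.compile(r"^(\d+(?:\.\d+)*)\s+\S", re.MULTILINE)
# Assumption: cross-references look like "Section 4.2" or "section 4.2".
XREF_RE = re.compile(r"\b[Ss]ection\s+(\d+(?:\.\d+)*)")


def dangling_section_refs(text: str) -> list[str]:
    """Return section numbers that are referenced but never appear as a heading."""
    headings = set(HEADING_RE.findall(text))
    referenced = set(XREF_RE.findall(text))
    return sorted(referenced - headings)
```

A body line that happens to begin with a number will be mistaken for a heading, so treat the result as a list of candidates to verify by eye, in the same way you would treat the AI's list.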
Reference Tracking
Long professional documents reference external sources — laws, standards, prior reports, company policies. You need to know which ones have official target-language translations.
"Identify every external reference cited in the source text below (laws, standards, ISO/EN/DIN numbers, regulatory bodies, treaties, prior reports). For each, tell me: (a) the full official name, (b) whether an official target-language version exists, (c) where I would find the official target-language title (e.g., on EUR-Lex, ISO catalog, national gazette)."
This is a five-minute prompt that keeps you from inventing your own translation of a law that already has an official title in the target language.
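Because a model can miss or misquote a reference, a quick local sweep for standard identifiers is a useful second opinion. The sketch below is an assumption-laden illustration: the regular expression covers only a handful of common prefixes (ISO, IEC, EN, DIN and some combinations) and will not catch laws, treaties, or reports. The name `find_standards` is hypothetical.

```python
import re

# Longer prefixes come first so "DIN EN ISO 12100" is not cut down to "ISO 12100".
STANDARD_RE = re.compile(
    r"\b(?:DIN EN ISO|EN ISO|ISO/IEC|DIN EN|ISO|IEC|EN|DIN)"
    r"\s?\d+(?:-\d+)*(?::\d{4})?"
)


def find_standards(text: str) -> list[str]:
    """Return unique standard identifiers in order of first appearance."""
    seen: list[str] = []
    for match in STANDARD_RE.finditer(text):
        identifier = match.group(0)
        if identifier not in seen:
            seen.append(identifier)
    return seen
```

Compare this list with the AI's answer. Anything that appears in one list but not the other deserves a manual look before you search EUR-Lex or the ISO catalog.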
Estimating Effort Before Quoting
For freelancers, pre-analysis is also where you produce a defensible quote.
"Analyze the following 8,000-word source document. Estimate:
- Effective word count for pricing (counting tables, captions, footnotes as appropriate)
- Domain difficulty (1–5)
- Repetition level (rough fuzzy match potential)
- Total hours for a competent translator working EN → IT at a sustainable pace of 250 finished words per hour for this domain
- Risk factors that justify a rush surcharge, complexity surcharge, or additional QA charge
Plus a draft client email summarizing the analysis and confirming the scope before I commit to a deadline."
Now you have a quote with reasoning behind it — not just a per-word number you pulled from intuition.
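It helps to check the AI's hours figure against your own arithmetic. At the pace named in the prompt, 8,000 words divided by 250 words per hour is 32 hours before any repetition discount. The sketch below makes that calculation explicit; the `repetition_weight` of 0.3 is a purely illustrative assumption, not an industry figure, and `estimate_hours` is a hypothetical name.

```python
def estimate_hours(
    new_words: int,
    repeated_words: int = 0,
    pace_per_hour: float = 250.0,
    repetition_weight: float = 0.3,  # assumption: a repeated word costs 30% of a new one
) -> float:
    """Estimate translation hours from weighted word counts and an hourly pace."""
    effective_words = new_words + repeated_words * repetition_weight
    return effective_words / pace_per_hour


# 8,000 fully new words at 250 words per hour
print(estimate_hours(8000))  # 32.0
```

If the model's total differs sharply from this baseline, ask it to show which words it discounted and why before the number goes into a client email.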
Structural Extraction for Format-Heavy Documents
For documents heavy in tables, captions, footnotes, or numbered lists — common in legal, technical, scientific work — AI can produce a structural skeleton:
"Read the following document. Produce a hierarchical outline showing: top-level heading, sub-headings, presence and approximate row count of any tables, lists, code blocks, callout boxes, footnotes. Number each element so I can refer to it later."
This becomes your map. When the client asks "did you translate the table in section 4.2?", you have a structural reference.
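When the source can be exported to Markdown, you can tally the structural elements yourself and compare the totals with the AI's outline. This sketch assumes Markdown conventions (`#` headings, pipe tables, `[^1]:` footnote definitions), which your source may not follow; `structure_counts` is a hypothetical name, and table separator rows such as `|---|---|` are counted as rows.

```python
import re
from collections import Counter

# Assumption: the source has been exported to Markdown.
PATTERNS = {
    "heading": re.compile(r"^#{1,6}\s"),
    "table_row": re.compile(r"^\|.*\|\s*$"),
    "list_item": re.compile(r"^\s*(?:[-*+]|\d+\.)\s"),
    "footnote": re.compile(r"^\[\^[^\]]+\]:"),
}


def structure_counts(markdown_text: str) -> dict[str, int]:
    """Count lines of each structural type; each line is counted at most once."""
    counts: Counter[str] = Counter()
    for line in markdown_text.splitlines():
        for name, pattern in PATTERNS.items():
            if pattern.match(line):
                counts[name] += 1
                break
    return dict(counts)
```

A mismatch between these counts and the AI's outline (for example, a table it did not mention) tells you where to look more closely.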
Acronym and Named-Entity Sweeps
The single biggest source of "wait, what does this mean?" interruptions during translation is undefined acronyms. Resolve them up front:
"Extract every acronym and abbreviation in the document. For each, list: the most likely expanded form, the domain it belongs to, and your confidence level (1–5). Flag any acronym where you would recommend I ask the client for confirmation."
This sweep prevents the mid-project panic when "PMS" turns out to be the client's internal "Project Management System" rather than the broader meaning the AI guessed.
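A local sweep can confirm the AI did not skip an acronym and show which ones the source never defines. The sketch below is a simple heuristic under stated assumptions: it treats any run of two to six capital letters as an acronym, and it considers an acronym "defined" only if it appears in parentheses somewhere in the text. The name `acronym_report` is hypothetical.

```python
import re
from collections import Counter

# Assumption: acronyms are runs of 2-6 uppercase ASCII letters.
ACRONYM_RE = re.compile(r"\b[A-Z]{2,6}\b")


def acronym_report(text: str) -> dict[str, dict[str, object]]:
    """Map each acronym to its frequency and whether '(ACRONYM)' occurs in the text."""
    counts = Counter(ACRONYM_RE.findall(text))
    return {
        acronym: {"count": n, "defined_in_text": f"({acronym})" in text}
        for acronym, n in counts.items()
    }
```

Acronyms with `defined_in_text` set to `False` are the natural candidates for your client questions, whatever expansion the AI proposed.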
Bilingual Document Comparison
If the client provides a previous version of the source plus its translation as reference material, you have gold:
"Below are two columns: source paragraphs from the previous version of this document, and the approved English translations. Build me a quick reference of: (1) recurring terms and their approved translations, (2) recurring phrases and their approved renderings, (3) any stylistic patterns I should match (e.g., always use 'we' instead of 'our company', always 'must' instead of 'shall')."
Now you have an instant style guide derived from approved prior work — invaluable for repeat clients.
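Before you rely on the term list the AI builds, you can check each proposed pair against the aligned segments themselves. This is a minimal sketch under simple assumptions: segments are already aligned one to one, and matching is a case-insensitive substring test, which ignores inflection and compounds. The name `verify_term_pairs` is hypothetical.

```python
def verify_term_pairs(
    aligned_segments: list[tuple[str, str]],
    term_pairs: list[tuple[str, str]],
) -> dict[tuple[str, str], tuple[int, int]]:
    """For each (source_term, target_term), return (confirmed, total) where
    total is the number of segments containing the source term and confirmed
    is how many of those also contain the proposed target term."""
    results = {}
    for src_term, tgt_term in term_pairs:
        hits = [
            (src, tgt)
            for src, tgt in aligned_segments
            if src_term.lower() in src.lower()
        ]
        confirmed = sum(1 for _, tgt in hits if tgt_term.lower() in tgt.lower())
        results[(src_term, tgt_term)] = (confirmed, len(hits))
    return results
```

A pair confirmed in only some of its segments suggests the previous translator used more than one rendering, which is worth raising with the client before you standardize on one.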
A Sample Pre-Project Routine
For any project over 2,000 words, before opening the CAT tool:
- Run the pre-translation brief prompt. Save output.
- Run the source-quality screening prompt. Email client with any findings.
- Run the acronym sweep. Add unknowns to your client questions.
- Run the reference tracking prompt. Look up authoritative target-language versions of any external references.
- Run the effort estimate if quoting, or skip if already priced.
- Send the client a short email summarizing scope and asking any open questions.
- Now open the CAT tool.
Total time: 20–40 minutes. Translation time saved: typically 2–4x that, because you're not interrupting mid-flow to research what you should have researched up front.
Key Takeaways
- 20–40 minutes of AI-assisted pre-analysis can save hours of mid-project interruptions and rework.
- Flag source-text problems to the client before you start translating; doing so protects your reputation and your time.
- Track external references early so you use the official target-language versions, not your own translations.
- Build a quick reference from any prior approved bilingual material the client provides.

