Building Glossaries & Terminology with AI
Terminology work is the part of a translator's job that scales worst with human time. Building a 200-term glossary by hand for a new client used to mean a day of work. AI does the first 80% of that work in 20 minutes, leaving you the remaining 20%, verification, which is the part that actually requires your expertise.
What You'll Learn
- How to extract a domain glossary from a source document
- How to enrich a glossary with definitions, context, and usage notes
- How to convert AI output into a CAT-tool-ready termbase
- The verification habit that keeps AI glossaries trustworthy
Why Terminology Is Worth Your AI Time
Bad terminology is the leading cause of poor large-project quality. The same English term gets translated inconsistently because:
- You translated some segments on Monday and some on Friday
- A different translator worked on chapters 3-6
- The client did not provide a glossary
- The reviewer disagrees with the translator's choices
A solid project glossary, agreed with the client before you start, eliminates 60% of avoidable revisions. AI lets you build one in a single afternoon.
Step 1: Term Extraction from the Source
Paste a representative chunk of the source (an executive summary, the first chapter, or the table of contents plus introduction) into Claude or ChatGPT.
"You are a senior terminologist. I am translating a [DOMAIN] document from [SOURCE] to [TARGET] for a [CLIENT TYPE] audience. Below is the executive summary of the source. Extract the 40 most important domain-specific terms and proper nouns that will need consistent translation throughout the project. Exclude common everyday words. For each term, list:
- The exact source-language term
- Why it is important (concept / product name / regulation / acronym / technical jargon)
- Whether it should be left untranslated, translated literally, or adapted
Format as a Markdown table."
A typical output: 30-50 terms with rationale. Now you have the spine of your glossary.
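If you want the extracted terms in a form you can sort, filter, and carry through the later steps, the Markdown table can be parsed into rows. The following is a minimal sketch, not a full Markdown parser: the function name `parse_markdown_table` and the sample column names are assumptions made for illustration, and cells that contain an escaped pipe character are not handled.

```python
import re

# Matches one cell of the |---|:---:| separator row under the header.
SEPARATOR_CELL = re.compile(r"^:?-+:?$")


def parse_markdown_table(text):
    """Turn a Markdown pipe table into a list of dicts keyed by header."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip any prose the model wrote around the table
        inner = line[1:-1] if line.endswith("|") else line[1:]
        cells = [cell.strip() for cell in inner.split("|")]
        if all(SEPARATOR_CELL.match(cell) for cell in cells):
            continue  # skip the separator row
        rows.append(cells)
    if not rows:
        return []
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


# Illustrative sample, shaped like the output the extraction prompt asks for.
sample = """
Here are the terms:

| Term | Type | Handling |
|------|------|----------|
| load factor | technical jargon | adapt |
| GDPR | regulation | leave untranslated |
"""

terms = parse_markdown_table(sample)
```

Each element of `terms` is a dictionary such as `{"Term": "GDPR", "Type": "regulation", "Handling": "leave untranslated"}`, which is easier to review and extend than pasted chat text.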
Step 2: Generate Target-Language Equivalents
For each extracted term, get a candidate translation with context:
"For each row in the table above, propose two candidate translations into [TARGET LANGUAGE, e.g., Brazilian Portuguese] for a [DOMAIN] audience. Include:
- Suggested target term
- Alternative
- One sentence of usage guidance for me as the translator
- A 12-15 word example sentence in the target language showing the term in use"
This is the moment where AI is most useful and most dangerous. The fluency of the suggestions can lure you into accepting plausible but incorrect terms. Always verify against authoritative sources before adding to your termbase.
Step 3: Verification Against Authoritative Sources
For every term in your AI-generated glossary, check at least one authoritative source:
- EU terminology: IATE (iate.europa.eu), which is multilingual, vetted, and free
- UN terminology: UNTERM (unterm.un.org)
- Canadian government: TERMIUM Plus (free, EN/FR/ES/PT)
- Spanish-specific: Fundéu, RAE
- German-specific: DWDS, Duden
- Medical: WHO terminology, MedDRA, ICD-11
- Legal: National legal dictionaries (Black's Law Dictionary for US English; Cornu for French)
- Industry: Client's existing materials, competitor websites in the target market, industry body publications
A clean workflow: keep two browser windows open. AI suggestion on the left, authoritative source on the right. Accept, reject, or modify each term in 10-30 seconds.
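The accept, reject, or modify decision is easier to enforce if every candidate carries its review status and the name of the source it was checked against. Below is one possible sketch of such a review log. The names `Candidate`, `Decision`, `review`, and `termbase_ready` are hypothetical, and the sample values are placeholders rather than real lookup results.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    PENDING = "pending"
    ACCEPTED = "accepted"
    MODIFIED = "modified"
    REJECTED = "rejected"


@dataclass
class Candidate:
    source_term: str
    ai_target: str
    decision: Decision = Decision.PENDING
    final_target: str = ""
    checked_in: str = ""  # name of the authoritative source consulted


def review(candidate, decision, checked_in, final_target=""):
    """Record the outcome of checking one AI suggestion."""
    if decision is Decision.PENDING:
        raise ValueError("a review must end in accepted, modified, or rejected")
    if not checked_in:
        raise ValueError("name the source you checked")
    if decision is Decision.MODIFIED and not final_target:
        raise ValueError("a modified term needs the corrected target")
    candidate.decision = decision
    candidate.checked_in = checked_in
    if decision is Decision.ACCEPTED:
        candidate.final_target = candidate.ai_target
    elif decision is Decision.MODIFIED:
        candidate.final_target = final_target
    else:
        candidate.final_target = ""
    return candidate


def termbase_ready(candidates):
    """Only verified terms (accepted or modified) may enter the termbase."""
    keep = {Decision.ACCEPTED, Decision.MODIFIED}
    return [c for c in candidates if c.decision in keep]


# Placeholder data, not real lookup results.
queue = [
    Candidate("source term 1", "AI candidate 1"),
    Candidate("source term 2", "AI candidate 2"),
    Candidate("source term 3", "AI candidate 3"),
]
review(queue[0], Decision.ACCEPTED, checked_in="IATE")
review(queue[1], Decision.MODIFIED, checked_in="client style guide",
       final_target="corrected term 2")
# queue[2] has not been checked yet, so it stays out of the termbase.
ready = termbase_ready(queue)
```

The design choice here is that an unchecked suggestion can never reach the termbase by accident: only entries with a recorded decision and a named source pass through `termbase_ready`.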
Step 4: Enrich with Context and Forbidden Terms
A working termbase is more than equivalents. Ask the AI to expand each entry:
"For each accepted term, add the following fields: (1) part of speech in the target language, (2) grammatical gender (if applicable), (3) common collocations (3 examples), (4) forbidden synonyms or false friends to avoid, (5) plural form, (6) a one-sentence definition in the target language."
You now have a termbase that rivals what a senior terminologist would produce in a week, in under an hour.
Step 5: Export to Your CAT Tool
Most CAT tools (Trados Studio, memoQ, Phrase, Smartcat) accept termbases as:
- TBX (TermBase eXchange, XML format)
- CSV with specific column headers
- Excel with mapped fields
Have the AI format it for you:
"Export the accepted terms as a CSV with these columns, ready for memoQ import: Source | Target | Part of speech | Domain | Definition | Forbidden synonyms | Notes. Quote any field that contains a comma. Use UTF-8."
Save the output as a .csv file and import into your CAT tool.
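If you keep the verified terms as structured data, you can also write the CSV yourself instead of copying it out of a chat window, which avoids quoting and encoding slips. This is a minimal sketch using Python's standard `csv` module. The column names are the ones from the prompt above; the function names and the sample entry are illustrative assumptions, and the exact headers your CAT tool expects should be checked against its import dialog.

```python
import csv
import io

COLUMNS = [
    "Source", "Target", "Part of speech", "Domain",
    "Definition", "Forbidden synonyms", "Notes",
]


def termbase_to_csv(entries):
    """Serialize term entries to CSV text, quoting fields only when needed."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, quoting=csv.QUOTE_MINIMAL)
    writer.writeheader()
    for entry in entries:
        # Missing fields become empty cells instead of raising an error.
        writer.writerow({column: entry.get(column, "") for column in COLUMNS})
    return buffer.getvalue()


def save_termbase(entries, path):
    """Write the CSV as UTF-8; newline='' keeps the csv module's line endings."""
    with open(path, "w", encoding="utf-8", newline="") as handle:
        handle.write(termbase_to_csv(entries))


# Illustrative English-French entry; the definition contains a comma on purpose.
entry = {
    "Source": "data controller",
    "Target": "responsable du traitement",
    "Part of speech": "noun",
    "Domain": "data protection",
    "Definition": "Personne ou organisme qui détermine les finalités et "
                  "les moyens du traitement, seul ou avec d'autres.",
    "Forbidden synonyms": "contrôleur de données",
}

csv_text = termbase_to_csv([entry])
```

Calling `save_termbase([entry], "glossary.csv")` writes the file to disk. The definition is wrapped in quotes automatically because it contains a comma, and the accented characters survive because the file is written as UTF-8.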
Multilingual Glossaries in One Pass
If you work into multiple targets, as is common for software localization or institutional clients, you can build a multilingual glossary in one prompt:
"For each of the 40 source terms above, generate equivalents in: French (France), Spanish (Latin America), Italian, German, Brazilian Portuguese, Simplified Chinese, Japanese. Format as one row per source term, with one column per language. Highlight any term where you are less than 80% confident in the equivalent and explain why."
The "highlight uncertainty" instruction is what makes the output safe: you know which rows to scrutinize first.
When the Client Has an Existing Glossary
Don't throw it away. Feed it to the AI as authoritative context:
"Below is the client's existing English-French glossary (50 terms). Below that is the source text I will translate (3,000 words). Identify: (a) any source terms not yet in the glossary that should be added, (b) any glossary terms that appear in the source so I know they're relevant, (c) any inconsistencies in the existing glossary I should raise with the client."
This gap-analysis prompt is gold for repeat clients. It surfaces missing terms, flags drift in the existing termbase, and gives you something to send the client to demonstrate diligence.
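Parts (b) and (c) of that prompt can also be cross-checked mechanically, which is a useful sanity check on the AI's answer. The sketch below assumes the glossary is available as a list of (source, target) pairs; the function names and sample data are illustrative. It matches whole words only, so inflected forms such as plurals are not caught, and part (a), spotting terms that are missing from the glossary, still needs judgment.

```python
import re
from collections import defaultdict


def term_occurs(term, text):
    """Case-insensitive whole-word (or whole-phrase) match."""
    pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def glossary_terms_in_source(glossary, source_text):
    """Glossary source terms that actually appear in the text to translate."""
    found = []
    seen = set()
    for source_term, _target in glossary:
        key = source_term.casefold()
        if key in seen:
            continue  # report each term once, ignoring case differences
        seen.add(key)
        if term_occurs(source_term, source_text):
            found.append(source_term)
    return found


def conflicting_entries(glossary):
    """Source terms that are listed with more than one target equivalent."""
    targets = defaultdict(set)
    for source_term, target in glossary:
        targets[source_term.casefold()].add(target)
    return {term: found for term, found in targets.items() if len(found) > 1}


# Illustrative English-French glossary with one deliberate inconsistency.
glossary = [
    ("data controller", "responsable du traitement"),
    ("data subject", "personne concernée"),
    ("Data subject", "sujet des données"),
    ("processor", "sous-traitant"),
]
source_text = "The data controller must inform each data subject before processing begins."

relevant = glossary_terms_in_source(glossary, source_text)
conflicts = conflicting_entries(glossary)
```

Here `relevant` contains "data controller" and "data subject" ("processor" does not match "processing"), and `conflicts` flags "data subject" as having two competing French equivalents, which is exactly the kind of drift worth raising with the client.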
Interpreter-Specific Glossaries
Interpreters need glossaries that look different from translators':
- Phonetic guides for difficult names
- Acronym expansions
- Quick equivalents arranged for booth-readability, not alphabetically
"I will simultaneous-interpret EN β ES at a 4-hour conference on offshore wind farms. Here is the program. Generate a glossary of 40 likely terms with: (1) English term, (2) Spanish equivalent, (3) IPA pronunciation for any English name or acronym, (4) Spanish pronunciation note where Spanish-speaking listeners may stumble. Order by likely frequency in the agenda."
A Cautionary Tale
In 2024 a freelance translator working on a Mongolian-English medical glossary trusted ChatGPT's "veterinary equivalent" output for a complex pharmacological term, and shipped the term to a client who used it in regulatory paperwork. The term was wrong. The client lost the filing window. Verification matters. AI gives you speed; your authoritative sources give you safety. Use both.
Key Takeaways
- AI extracts and proposes terminology dramatically faster than manual work, but it is not authoritative.
- Always verify AI suggestions against IATE, UNTERM, TERMIUM, client materials, or industry sources.
- Enrich termbases with collocations, forbidden synonyms, and definitions for richer CAT-tool support.
- Build interpreter glossaries differently from translator glossaries: booth-readability and pronunciation matter.

