Data Privacy and PII in Insurance AI
Insurance professionals routinely handle some of the most sensitive personal data in the economy: Social Security numbers, driver's license numbers, medical records, financial statements, and detailed loss histories. Before you paste any of this into an AI tool, you need to understand the privacy rules that apply.
What You'll Learn
- The categories of sensitive data that appear in insurance work
- Which AI tools are safe for which kinds of data
- HIPAA, GLBA, NAIC Model 668, and state privacy law basics
- Practical de-identification techniques you can use today
The Sensitive Data You Touch Every Day
A typical insurance file contains a stack of regulated information. The big categories:
- PII (Personally Identifiable Information): Name, address, date of birth, SSN, driver's license, account numbers
- PHI (Protected Health Information): Diagnosis codes, treatment plans, medical narratives, prescription history — covered by HIPAA when handled in conjunction with health insurance or ERISA self-funded plans
- Financial information: Bank accounts, income, tax returns, credit reports — covered under GLBA
- Biometric data: Voice prints, photographs of injured insureds, dashcam video
- Loss and litigation history: Prior claims, court filings, settlement amounts
Pasting any of this into a public AI tool without the right safeguards can be a regulatory event.
The Privacy Laws That Matter
HIPAA (Health Insurance Portability and Accountability Act)
If you work with health insurance, disability claims, workers' comp, or any product that touches PHI, HIPAA applies. You can only share PHI with vendors who have signed a Business Associate Agreement (BAA). Most consumer AI tools do not offer a BAA on the free or basic tier.
GLBA (Gramm-Leach-Bliley Act)
GLBA covers financial information. It applies to most insurance products. The Safeguards Rule requires a written information security program and reasonable controls over how customer financial data is shared.
NAIC Model 668 — Insurance Data Security Model Law
NAIC Model 668 has been adopted in some form by more than 25 US states. It requires carriers and producers to implement and maintain an information security program, conduct risk assessments, and notify the state insurance commissioner of a cybersecurity event within 72 hours.
State Privacy Laws
California (CCPA/CPRA), Virginia (VCDPA), Colorado (CPA), Texas (TDPSA), and a growing list of other states have consumer privacy laws that grant rights over personal data. Insurance data is often partially exempt because it is already regulated under GLBA, but customer interactions that fall outside policy administration may not be.
GDPR
If you write any business in the EU or UK, GDPR applies. Personal data must be processed lawfully and for a defined purpose, and data subjects have the right to access, correct, and delete it.
Which AI Tools Are Safe for Which Data
Free / personal tier of ChatGPT, Claude, Gemini, Perplexity: Treat as public. Do not paste PII, PHI, GLBA-covered data, or anything you would not post on social media.
ChatGPT Team / Enterprise, Claude Team / Enterprise, Gemini for Workspace, Microsoft 365 Copilot: Generally do not train on your data. Some offer BAAs and stronger contractual protections. Still, your carrier's IT and compliance team should approve before you upload regulated content.
Carrier-procured AI platforms (Guidewire AI, Duck Creek AI, vendor-built tools): These have been procured through security and compliance review and are usually safe for the data they were designed to handle.
When in doubt, ask. Your privacy officer or CISO would much rather answer "can I use Tool X for Y?" than handle a breach notification.
Practical De-identification Techniques
You can still use AI for sensitive workflows without violating policy. The trick is to remove identifiers before pasting. A few practical patterns:
Replace names and identifiers with placeholders
Instead of "John Smith, SSN 123-45-6789, claim 2026-44812," use "Insured A, SSN [REDACTED], claim [CLAIM_NUMBER]." The AI can still analyze the structure and suggest improvements without ever seeing the real values.
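This substitution step can be partly automated before anything leaves your machine. Below is a minimal sketch in Python: the regex patterns, placeholder labels, and the example name mapping are all illustrative assumptions, not a complete PII detector, and any output still needs human review before pasting.

```python
import re

# Illustrative patterns only -- real files contain identifiers these will miss.
PATTERNS = {
    "[SSN]": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "[CLAIM_NUMBER]": re.compile(r"\b\d{4}-\d{5}\b"),   # e.g. 2026-44812
    "[PHONE]": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

def redact(text, names=None):
    """Replace known identifier patterns and named individuals with placeholders."""
    for placeholder, pattern in PATTERNS.items():
        text = pattern.sub(placeholder, text)
    # Names can't be caught reliably by regex; pass an explicit mapping,
    # e.g. {"John Smith": "Insured A"}.
    for name, alias in (names or {}).items():
        text = text.replace(name, alias)
    return text

sample = "John Smith, SSN 123-45-6789, claim 2026-44812, reported the loss."
print(redact(sample, names={"John Smith": "Insured A"}))
# -> Insured A, SSN [SSN], claim [CLAIM_NUMBER], reported the loss.
```

Keeping the name-to-alias mapping local (never pasted) also lets you re-identify the AI's output afterward.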
Keep facts, drop identifiers
For a medical narrative, you might paste: "45-year-old male presenting with lumbar strain following lifting injury at work. MRI shows L4-L5 disc bulge. Conservative treatment recommended." That is enough for analysis without name, DOB, address, or treating physician.
Use the model to redact for you
Below is a claim narrative. Produce a de-identified version that:
- Replaces names with [INSURED], [CLAIMANT], [WITNESS_1]
- Replaces dates with relative dates (e.g., "DOL", "DOL+5")
- Replaces specific addresses with [ADDRESS]
- Keeps clinical and factual content intact
Narrative: [paste]
Run this once and use the redacted version for further analysis.
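The relative-date idea in the prompt can also be applied locally before pasting. A sketch, assuming dates appear as MM/DD/YYYY and that you know the date of loss; other date formats would need additional patterns.

```python
import re
from datetime import date

DATE_RE = re.compile(r"\b(\d{2})/(\d{2})/(\d{4})\b")

def relativize_dates(text, date_of_loss):
    """Rewrite absolute MM/DD/YYYY dates as offsets from the date of loss (DOL)."""
    def repl(m):
        d = date(int(m.group(3)), int(m.group(1)), int(m.group(2)))
        offset = (d - date_of_loss).days
        return "DOL" if offset == 0 else f"DOL{offset:+d}"  # e.g. DOL+5
    return DATE_RE.sub(repl, text)

narrative = "Injury on 03/10/2026; MRI performed 03/15/2026."
print(relativize_dates(narrative, date_of_loss=date(2026, 3, 10)))
# -> Injury on DOL; MRI performed DOL+5.
```

The timeline stays analyzable (treatment five days after the loss) while the calendar dates, which can help re-identify a claimant, never leave your machine.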
Summarize locally before sharing externally
If your carrier provides an internal AI tool, do the heavy lifting there. Use external tools only for generic drafting work that does not require the underlying data.
Red Flags That Should Stop You
If any of these apply, do not paste:
- The data includes a Social Security number, driver's license, or account number
- The data includes medical narratives or diagnosis codes
- The data includes named individuals connected to a specific event
- You are using the free or personal tier of a consumer AI tool
- Your employer has not approved the tool for the data category
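The first two red flags are pattern-detectable, so a pre-paste scan can serve as a tripwire. A rough sketch; the patterns (including the ICD-10-style diagnosis-code regex) are assumptions and will miss things, so a hit means stop, but a miss does not mean the text is safe.

```python
import re

# Illustrative red-flag patterns -- a tripwire, not a complete detector.
RED_FLAGS = {
    "possible SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "possible account number": re.compile(r"\b\d{10,16}\b"),
    "diagnosis code (ICD-10-like)": re.compile(r"\b[A-TV-Z]\d{2}(?:\.\d{1,4})?\b"),
}

def scan_for_red_flags(text):
    """Return the red-flag categories found in the text."""
    return [label for label, pattern in RED_FLAGS.items() if pattern.search(text)]

hits = scan_for_red_flags("Insured SSN 123-45-6789, dx M54.5 lumbar strain.")
print(hits)
# -> ['possible SSN', 'diagnosis code (ICD-10-like)']
```

Named individuals and employer approval, the other two red flags, cannot be checked by regex; those remain judgment calls.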
What to Do Instead
- Ask your IT or compliance team about the approved AI tools at your organization
- Build a habit of de-identifying before you paste
- Use AI for the parts of the workflow that do not require sensitive data (templates, communications, training scenarios, research)
- Document which tool you used and what data went in if your carrier requires audit trails
Key Takeaways
- Insurance work touches PII, PHI, GLBA-covered financial data, and biometrics — all regulated.
- HIPAA, GLBA, NAIC Model 668, and state privacy laws shape what you can and cannot share with AI tools.
- Free tiers of consumer AI tools are not appropriate for regulated insurance data. Enterprise tiers and carrier-procured tools may be acceptable with the right contracts.
- De-identification — placeholders, redaction prompts, structural summaries — lets you still get value from AI without exposing protected data.