What AI Can and Cannot Do with Your Data
Before you let AI touch your company's data, you need a clear mental model of what these tools actually do, what they cannot do, and where they fail silently. An analyst who understands the failure modes ships trustworthy work. An analyst who does not ends up explaining a wrong number in a Monday QBR.
This lesson draws the lines.
What You'll Learn
- What AI models do well and where they are genuinely excellent
- Four failure modes every analyst should know by name
- Privacy and data-handling rules for analyst AI use
- A pre-flight checklist before sending data to any AI tool
Where AI Genuinely Excels
For analyst work, AI is reliably good at the following:
- Syntactic translation. Converting SQL between dialects, rewriting a pandas chain into a SQL query, converting a Tableau calc into a Power BI DAX measure. The output is almost always structurally correct.
- Pattern-heavy code. Boilerplate pandas pipelines, standard Jupyter setup cells, typical matplotlib or seaborn charts.
- Summarization. Turning a 500-row result set into a two-paragraph summary, or reducing a 12-page methodology doc into an exec brief.
- Explaining code. Given a 200-line SQL query you inherited, AI can explain what it does, CTE by CTE, faster than any human.
- Brainstorming. Suggesting which chart type to use, which segmentation to try, which metric to track.
- First drafts of everything. Emails, report narratives, data dictionaries, runbooks. The first draft may be 70% — but 70% is a huge head start.
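The "pattern-heavy code" point is worth seeing concretely. A group-and-sum rollup is exactly the kind of boilerplate AI drafts reliably; the sketch below uses only the standard library, and the column and region names are illustrative, not from any real dataset.

```python
import csv
import io
from collections import defaultdict

# Pattern-heavy boilerplate of the kind AI drafts well:
# a group-and-sum over CSV rows. Columns are illustrative.
raw = io.StringIO(
    "region,revenue\n"
    "EMEA,1200\n"
    "NA,800\n"
    "EMEA,950\n"
)

totals = defaultdict(float)
for row in csv.DictReader(raw):
    totals[row["region"]] += float(row["revenue"])

print(dict(totals))  # {'EMEA': 2150.0, 'NA': 800.0}
```

Code at this level of routineness is where AI saves the most time, precisely because there is little ambiguity for it to get wrong.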
The Four Failure Modes
These are the four ways AI will hurt your analyst work if you are not paying attention.
1. Hallucinated columns, tables, and functions
AI does not know your warehouse. If you ask for a query without giving it schema, it will make up plausible column names (user_signup_date, customer_type) that do not exist in your database. The query will fail to run, which is annoying but safe — you will catch it.
The dangerous version: the AI guesses a column name that happens to exist in your database but means something different from what it assumes. Your query runs, returns a number, and looks fine. Only a careful spot-check reveals the number is wrong.
Defense: always paste schema. Always run the query and compare against a known value.
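The "compare against a known value" defense can be one assertion in a scratch script. The sketch below uses an in-memory SQLite table; the table name, columns, and the known total are all hypothetical stand-ins for your own warehouse and an independently sourced figure.

```python
import sqlite3

# Hypothetical mini-warehouse to stand in for your real one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 100.0), (2, 250.0), (3, 50.0)],
)

# The AI-generated query you are about to trust:
total = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]

# A known value from an independent source (e.g. a finance report):
known_total = 400.0
assert abs(total - known_total) < 0.01, f"Mismatch: query says {total}"
```

If the assertion fails, you have caught the wrong number before a stakeholder did.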
2. Silent numerical drift
AI sometimes gets arithmetic subtly wrong when reasoning about data in-chat (as opposed to running actual code). It may say "the average of these 10 numbers is 42" when the correct answer is 46. Without a calculator, LLMs estimate. For small numbers they estimate well. For mixed units, long number lists, or multi-step arithmetic, they drift.
Defense: when a number matters, run the calculation in a code interpreter or reproduce it yourself in SQL or pandas. Never quote an AI's in-chat arithmetic to a stakeholder.
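Reproducing in-chat arithmetic is usually a one-liner. Taking the scenario above, where the model claims an average of 42 but the true value is 46, the numbers below are an illustrative list constructed to average exactly 46:

```python
from statistics import mean

# Never quote in-chat arithmetic: reproduce it in real code.
values = [38, 41, 52, 47, 44, 49, 51, 43, 46, 49]
print(mean(values))  # exact mean, not an LLM estimate: 46
```

Ten seconds of code removes the entire failure mode.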
3. Confident-sounding bad statistics
"This result is statistically significant" is a phrase AI will generate even when the test was not appropriate, the sample was biased, or the effect size was meaningless. Statistical reasoning has edge cases that language models mangle: peeking at p-values, multiple-comparison adjustments, base rate effects, small-sample t-tests.
Defense: treat AI statistical claims as a hypothesis to check, not a conclusion. Paste the numbers into a proper stats function and read the result yourself.
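One way to check a "statistically significant" claim yourself, without any statistics library, is a permutation test on the difference in group means. The sketch below is a minimal standard-library version; the two groups are made-up illustrative data, not a recommendation of this test over a t-test your team already uses.

```python
import random
from statistics import mean

# Illustrative groups, e.g. conversion times for variants A and B.
random.seed(42)
a = [12.1, 13.4, 11.8, 12.9, 13.1, 12.5]
b = [13.9, 14.2, 13.5, 14.8, 13.7, 14.1]

observed = mean(b) - mean(a)
pooled = a + b
trials = 10_000
n_extreme = 0
for _ in range(trials):
    # Shuffle the pooled data and re-split into two fake groups.
    random.shuffle(pooled)
    diff = mean(pooled[len(a):]) - mean(pooled[:len(a)])
    if abs(diff) >= abs(observed):
        n_extreme += 1

p_value = n_extreme / trials
print(f"observed diff = {observed:.2f}, p = {p_value:.4f}")
```

Reading the p-value off your own run, rather than off a model's prose, is the whole point of the defense.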
4. Outdated knowledge
Most AI tools have a training cutoff. If you ask "what's the latest version of pandas?" or "what new chart type did Tableau release last month?" you may get stale information. Worse, you may get a confident description of a feature that was renamed or removed.
Defense: for any version-sensitive question, check the official docs. For industry benchmarks, use Perplexity and cite the source.
Privacy and Data-Handling Rules
Not every AI tool handles your data the same way. Before sending data, confirm:
- Is the tool a business / enterprise tier? Consumer ChatGPT, consumer Claude, and free Gemini tiers may use your data for training. ChatGPT Team, ChatGPT Enterprise, Claude for Work, and Gemini for Google Workspace do not train on your data by default.
- Does your company have an approved AI tool list? Many employers now maintain one. Use the tool on the list — do not copy customer PII into a tool that is not vetted.
- Is the data you are about to paste regulated? PHI, payment card data, EU personal data under GDPR, and similar categories have specific rules. When in doubt, anonymize or aggregate first.
- Are you in the cloud tenant you think you are? Copilot inside Microsoft 365 respects your tenant's data boundaries; a personal ChatGPT account does not.
Safer substitutes
If you need AI help with sensitive data, these tricks usually work:
- Replace real values with dummies. Change real customer names to Customer_A, Customer_B. The AI's logic will still be correct.
- Share schema, not rows. AI can write the query from a table description alone.
- Use synthetic samples. Generate a plausible fake dataset that has the same columns and distribution, work with that, then run the final logic on the real data yourself.
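The "replace real values with dummies" trick is easy to do consistently in a few lines. The sketch below assigns Customer_A, Customer_B, and so on, reusing the same alias for repeat names so the AI can still reason about the structure; the input rows are illustrative.

```python
# Illustrative rows of (customer name, revenue) to be anonymized
# before pasting into a chat.
rows = [
    ("Acme Corp", 1200.0),
    ("Globex", 800.0),
    ("Acme Corp", 950.0),
]

alias = {}
safe_rows = []
for name, revenue in rows:
    # Assign Customer_A, Customer_B, ... consistently per real name.
    alias.setdefault(name, f"Customer_{chr(ord('A') + len(alias))}")
    safe_rows.append((alias[name], revenue))

print(safe_rows)
# [('Customer_A', 1200.0), ('Customer_B', 800.0), ('Customer_A', 950.0)]
```

Keep the alias mapping on your side so you can translate the AI's answer back to real names afterwards.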
Pre-Flight Checklist
Run this checklist before every AI task that involves company data:
- Is this tool approved by my employer?
- Is the tier I am using covered by a "no training" agreement?
- Am I about to paste anything personally identifiable, financial, or regulated?
- Have I scrubbed customer names, emails, and account IDs that are not needed?
- Do I have a way to verify the answer (a known number, a SQL count, a spot-check)?
- Will I read the generated code before running it in production?
If you cannot say yes to all six, stop. Either switch to an approved tool, anonymize the data, or do the task manually.
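The scrubbing item in the checklist can be partly automated. The sketch below redacts email addresses and account-ID-shaped tokens before text is pasted anywhere; the ACC-###### pattern is a hypothetical ID format, so adapt the regexes to your own identifiers and treat this as a first pass, not a guarantee.

```python
import re

# Illustrative patterns; tune to your own ID and email formats.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
ACCOUNT_ID = re.compile(r"\bACC-\d{6}\b")  # hypothetical ID format

text = "Refund for jane.doe@example.com on account ACC-104422 pending."
scrubbed = ACCOUNT_ID.sub("[ACCOUNT_ID]", EMAIL.sub("[EMAIL]", text))
print(scrubbed)
# Refund for [EMAIL] on account [ACCOUNT_ID] pending.
```

A regex pass catches the obvious identifiers; you still need to eyeball the result for names and anything your patterns miss.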
When Not to Use AI at All
There are cases where AI is the wrong tool:
- Regulatory reporting where provenance of every number must be auditable.
- Executive decisions on tiny samples — if n is small, the signal is fragile; an AI summary may smooth over nuance the exec needs.
- Root-cause investigations during incidents where time to correct answer is critical; AI can lead you down a plausible wrong path for 20 minutes.
Key Takeaways
- AI is excellent at translation, pattern code, summarization, and first drafts
- Watch for hallucinated schema, silent arithmetic drift, bad statistics, outdated docs
- Use business-tier tools; never paste sensitive data into consumer tiers
- Anonymize, aggregate, or synthetic-sample if the data is regulated
- Run the six-item pre-flight checklist before every data-involving prompt
- AI is a collaborator, not an authority — you are still the accountable analyst

