Your First AI Prompts as a Data Analyst
Writing a useful prompt as a data analyst is different from writing a useful prompt as a marketer or a content creator. Your output needs to be technically correct. A prompt that says "write me a SQL query for user retention" will get you syntactically valid SQL that may or may not match how your company actually calculates retention.
This lesson gives you a framework designed for analyst work and eight ready-to-use prompt templates you can copy today.
What You'll Learn
- The DATA framework for analyst-specific prompting
- How to give AI the schema, business rules, and dialect context it needs
- Eight copy-paste templates for common analyst tasks
- How to iterate when the first output is close but not quite right
The DATA Framework
Generic prompt frameworks (STAR, RICE, and so on) do not cover what analysts need. Here is one built for us:
D — Dialect and dataset: Which SQL dialect? Which library version? What is the shape and schema of the data?
A — Assumptions and business rules: How is "active user" defined? What counts as a churned account? Which time zone is canonical?
T — Task: What exact output do you need — a query, a chart, a pandas one-liner, an explanation?
A — Acceptance criteria: How will you know the output is correct? Which rows should it match? What should the total equal?
DATA in Action
Compare these two prompts for the same question.
Weak prompt:
Write a SQL query to find top customers.
DATA prompt:
Dialect: BigQuery Standard SQL.
Dataset: Table
sales.orderswith columnsorder_id STRING,customer_id STRING,order_date DATE,revenue NUMERIC,status STRING. About 12 million rows. Refreshed daily at 02:00 UTC.Assumptions: A "customer" is any
customer_idthat has at least one row withstatus = 'completed'. We ignorestatus IN ('cancelled', 'refunded'). "Top" means highest total completed revenue in the last full calendar quarter (Q1 2026: Jan 1 to Mar 31).Task: Return the top 10 customers by total completed revenue for Q1 2026, with columns
customer_id,total_revenue,order_count.Acceptance: Results should sum to at most the known Q1 company revenue of $4.2M. A known top customer
cust_8821should appear in the list.
The second prompt produces a working BigQuery query with the right filters, the right grain, and a sanity check built in. The first prompt produces a guess.
Giving AI the Right Context
The top three mistakes analysts make when prompting:
- Skipping schema. The AI will hallucinate column names. Always paste a
DESCRIBE tableor a 5-rowSELECT *sample. - Skipping the dialect. PostgreSQL, BigQuery, Snowflake, Redshift, and MySQL have different functions (
DATE_TRUNCvsTRUNCvsDATETIME_TRUNC). Tell the AI which one. - Skipping business rules. Every company has its own definition of "MAU," "ARR," or "session." Paste yours.
The Analyst Context Block
Drop this at the top of any new AI conversation:
Context:
- Stack: Snowflake, dbt, Tableau
- Data warehouse: analytics.prod
- My role: marketing analyst on the growth team
- Canonical time zone: UTC
- Business rules:
- "Active user" = any user with at least one session in the last 28 days
- "Paying customer" = any account with subscription_status = 'active' AND plan != 'free'
- Fiscal year starts February 1
- When writing SQL: use CTEs, lowercase keywords, trailing commas, and explain each CTE's purpose in a comment.
Save this block somewhere. Paste it at the start of every new thread.
Eight Ready-to-Use Analyst Prompts
1. SQL query generation
Using the schema I pasted above, write a Snowflake SQL query that returns monthly active users for the last 12 full months. Use CTEs. Add a comment above each CTE explaining what it does.
2. SQL debugging
Here is a query that returns 14,223 rows but should return around 10,000. [Paste query]. [Paste 5 sample rows that show the duplication.] Identify the join that is causing fan-out and suggest a fix.
3. Data cleaning plan
I have a CSV with columns
signup_date,country. [Paste the first 20 rows.] List every data-quality issue you can see (missing values, inconsistent formats, suspected duplicates) and propose a cleaning plan with the specific pandas code for each step.
4. Pandas one-liner
In pandas 2.2, I have a DataFrame
dfwith columnsuser_id,event_name,timestamp. Give me a single line that returns the 10 users with the highest number ofpurchaseevents in the last 30 days.
5. Excel formula
In Google Sheets, I have columns A (date), B (product), C (revenue). Write a formula for cell E2 that returns the total revenue for product "Pro" in the month of March 2026.
6. Chart recommendation
I want to show how customer retention differs across three signup cohorts (Q1, Q2, Q3 2025) over their first 12 weeks. Recommend the right chart type and tell me exactly how to build it in Tableau.
7. Stakeholder email
Summarize this analysis for the CFO in a 120-word email. Lead with the number, explain the driver in plain language, and end with one recommendation. No jargon. [Paste your finding bullets.]
8. Statistical interpretation
I ran a two-sample t-test. Group A conversion = 3.2% (n=4,210), Group B = 3.7% (n=4,198), p-value = 0.048. Explain what this means for a non-technical PM, including the 95% confidence interval for the lift, and whether I should be cautious.
Iterating When the Output is Close but Wrong
AI rarely nails it on the first try. Three iteration moves that work:
"Show your work." Ask the AI to explain each step. Often the bug is obvious once it narrates the logic.
"Give me three variants." Useful for chart recommendations and SQL approaches. You see trade-offs and pick.
"Test against this known value." Tell the AI, "This query should return 1,847 for March 2025. Debug until it does."
Key Takeaways
- Use the DATA framework: Dialect, Assumptions, Task, Acceptance criteria
- Always paste schema, 5 sample rows, and your business rules
- Save an analyst context block and reuse it in every conversation
- Eight templates cover 80% of analyst tasks — adapt them, do not rewrite from scratch
- Iterate by asking the AI to show work, produce variants, or match a known value

