Your First AI Prompts as a Data Analyst

Writing a useful prompt as a data analyst is different from writing a useful prompt as a marketer or a content creator. Your output needs to be technically correct. A prompt that says "write me a SQL query for user retention" will get you syntactically valid SQL that may or may not match how your company actually calculates retention.

This lesson gives you a framework designed for analyst work and eight ready-to-use prompt templates you can copy today.

What You'll Learn

The DATA framework for analyst-specific prompting
How to give AI the schema, business rules, and dialect context it needs
Eight copy-paste templates for common analyst tasks
How to iterate when the first output is close but not quite right

The DATA Framework

Generic prompt frameworks (STAR, RICE, and so on) do not cover what analysts need. Here is one built for us:

D — Dialect and dataset: Which SQL dialect? Which library version? What is the shape and schema of the data?

A — Assumptions and business rules: How is "active user" defined? What counts as a churned account? Which time zone is canonical?

T — Task: What exact output do you need — a query, a chart, a pandas one-liner, an explanation?

A — Acceptance criteria: How will you know the output is correct? Which rows should it match? What should the total equal?

DATA in Action

Compare these two prompts for the same question.

Weak prompt:

Write a SQL query to find top customers.

DATA prompt:

Dialect: BigQuery Standard SQL.

Dataset: Table sales.orders with columns order_id STRING, customer_id STRING, order_date DATE, revenue NUMERIC, status STRING. About 12 million rows. Refreshed daily at 02:00 UTC.

Assumptions: A "customer" is any customer_id that has at least one row with status = 'completed'. We ignore status IN ('cancelled', 'refunded'). "Top" means highest total completed revenue in the last full calendar quarter (Q1 2026: Jan 1 to Mar 31).

Task: Return the top 10 customers by total completed revenue for Q1 2026, with columns customer_id, total_revenue, order_count.

Acceptance: Results should sum to at most the known Q1 company revenue of $4.2M. A known top customer cust_8821 should appear in the list.

The second prompt produces a working BigQuery query with the right filters, the right grain, and a sanity check built in. The first prompt produces a guess.

Giving AI the Right Context

The top three mistakes analysts make when prompting:

Skipping schema. The AI will hallucinate column names. Always paste a DESCRIBE table or a 5-row SELECT * sample.
Skipping the dialect. PostgreSQL, BigQuery, Snowflake, Redshift, and MySQL have different functions (DATE_TRUNC vs TRUNC vs DATETIME_TRUNC). Tell the AI which one.
Skipping business rules. Every company has its own definition of "MAU," "ARR," or "session." Paste yours.

The Analyst Context Block

Drop this at the top of any new AI conversation:

Context:
- Stack: Snowflake, dbt, Tableau
- Data warehouse: analytics.prod
- My role: marketing analyst on the growth team
- Canonical time zone: UTC
- Business rules:
  - "Active user" = any user with at least one session in the last 28 days
  - "Paying customer" = any account with subscription_status = 'active' AND plan != 'free'
  - Fiscal year starts February 1
- When writing SQL: use CTEs, lowercase keywords, trailing commas, and explain each CTE's purpose in a comment.

Save this block somewhere. Paste it at the start of every new thread.

Eight Ready-to-Use Analyst Prompts

1. SQL query generation

Using the schema I pasted above, write a Snowflake SQL query that returns monthly active users for the last 12 full months. Use CTEs. Add a comment above each CTE explaining what it does.

2. SQL debugging

Here is a query that returns 14,223 rows but should return around 10,000. [Paste query]. [Paste 5 sample rows that show the duplication.] Identify the join that is causing fan-out and suggest a fix.

3. Data cleaning plan

I have a CSV with columns email, signup_date, country. [Paste the first 20 rows.] List every data-quality issue you can see (missing values, inconsistent formats, suspected duplicates) and propose a cleaning plan with the specific pandas code for each step.

4. Pandas one-liner

In pandas 2.2, I have a DataFrame df with columns user_id, event_name, timestamp. Give me a single line that returns the 10 users with the highest number of purchase events in the last 30 days.

5. Excel formula

In Google Sheets, I have columns A (date), B (product), C (revenue). Write a formula for cell E2 that returns the total revenue for product "Pro" in the month of March 2026.

6. Chart recommendation

I want to show how customer retention differs across three signup cohorts (Q1, Q2, Q3 2025) over their first 12 weeks. Recommend the right chart type and tell me exactly how to build it in Tableau.

7. Stakeholder email

Summarize this analysis for the CFO in a 120-word email. Lead with the number, explain the driver in plain language, and end with one recommendation. No jargon. [Paste your finding bullets.]

8. Statistical interpretation

I ran a two-sample t-test. Group A conversion = 3.2% (n=4,210), Group B = 3.7% (n=4,198), p-value = 0.048. Explain what this means for a non-technical PM, including the 95% confidence interval for the lift, and whether I should be cautious.

Iterating When the Output is Close but Wrong

AI rarely nails it on the first try. Three iteration moves that work:

"Show your work." Ask the AI to explain each step. Often the bug is obvious once it narrates the logic.

"Give me three variants." Useful for chart recommendations and SQL approaches. You see trade-offs and pick.

"Test against this known value." Tell the AI, "This query should return 1,847 for March 2025. Debug until it does."

Key Takeaways

Use the DATA framework: Dialect, Assumptions, Task, Acceptance criteria
Always paste schema, 5 sample rows, and your business rules
Save an analyst context block and reuse it in every conversation
Eight templates cover 80% of analyst tasks — adapt them, do not rewrite from scratch
Iterate by asking the AI to show work, produce variants, or match a known value

Your First AI Prompts as a Data Analyst

This lesson gives you a framework designed for analyst work and eight ready-to-use prompt templates you can copy today.

What You'll Learn

The DATA framework for analyst-specific prompting
How to give AI the schema, business rules, and dialect context it needs
Eight copy-paste templates for common analyst tasks
How to iterate when the first output is close but not quite right

The DATA Framework

Generic prompt frameworks (STAR, RICE, and so on) do not cover what analysts need. Here is one built for us:

D — Dialect and dataset: Which SQL dialect? Which library version? What is the shape and schema of the data?

A — Assumptions and business rules: How is "active user" defined? What counts as a churned account? Which time zone is canonical?

T — Task: What exact output do you need — a query, a chart, a pandas one-liner, an explanation?

A — Acceptance criteria: How will you know the output is correct? Which rows should it match? What should the total equal?

DATA in Action

Compare these two prompts for the same question.

Weak prompt:

Write a SQL query to find top customers.

DATA prompt:

Dialect: BigQuery Standard SQL.

Dataset: Table sales.orders with columns order_id STRING, customer_id STRING, order_date DATE, revenue NUMERIC, status STRING. About 12 million rows. Refreshed daily at 02:00 UTC.

Assumptions: A "customer" is any customer_id that has at least one row with status = 'completed'. We ignore status IN ('cancelled', 'refunded'). "Top" means highest total completed revenue in the last full calendar quarter (Q1 2026: Jan 1 to Mar 31).

Task: Return the top 10 customers by total completed revenue for Q1 2026, with columns customer_id, total_revenue, order_count.

Acceptance: Results should sum to at most the known Q1 company revenue of $4.2M. A known top customer cust_8821 should appear in the list.

The second prompt produces a working BigQuery query with the right filters, the right grain, and a sanity check built in. The first prompt produces a guess.

Giving AI the Right Context

The top three mistakes analysts make when prompting:

Skipping schema. The AI will hallucinate column names. Always paste a DESCRIBE table or a 5-row SELECT * sample.
Skipping the dialect. PostgreSQL, BigQuery, Snowflake, Redshift, and MySQL have different functions (DATE_TRUNC vs TRUNC vs DATETIME_TRUNC). Tell the AI which one.
Skipping business rules. Every company has its own definition of "MAU," "ARR," or "session." Paste yours.

The Analyst Context Block

Drop this at the top of any new AI conversation:

Context:
- Stack: Snowflake, dbt, Tableau
- Data warehouse: analytics.prod
- My role: marketing analyst on the growth team
- Canonical time zone: UTC
- Business rules:
  - "Active user" = any user with at least one session in the last 28 days
  - "Paying customer" = any account with subscription_status = 'active' AND plan != 'free'
  - Fiscal year starts February 1
- When writing SQL: use CTEs, lowercase keywords, trailing commas, and explain each CTE's purpose in a comment.

Save this block somewhere. Paste it at the start of every new thread.

Eight Ready-to-Use Analyst Prompts

1. SQL query generation

Using the schema I pasted above, write a Snowflake SQL query that returns monthly active users for the last 12 full months. Use CTEs. Add a comment above each CTE explaining what it does.

2. SQL debugging

Here is a query that returns 14,223 rows but should return around 10,000. [Paste query]. [Paste 5 sample rows that show the duplication.] Identify the join that is causing fan-out and suggest a fix.

3. Data cleaning plan

I have a CSV with columns email, signup_date, country. [Paste the first 20 rows.] List every data-quality issue you can see (missing values, inconsistent formats, suspected duplicates) and propose a cleaning plan with the specific pandas code for each step.

4. Pandas one-liner

In pandas 2.2, I have a DataFrame df with columns user_id, event_name, timestamp. Give me a single line that returns the 10 users with the highest number of purchase events in the last 30 days.

5. Excel formula

In Google Sheets, I have columns A (date), B (product), C (revenue). Write a formula for cell E2 that returns the total revenue for product "Pro" in the month of March 2026.

6. Chart recommendation

I want to show how customer retention differs across three signup cohorts (Q1, Q2, Q3 2025) over their first 12 weeks. Recommend the right chart type and tell me exactly how to build it in Tableau.

7. Stakeholder email

Summarize this analysis for the CFO in a 120-word email. Lead with the number, explain the driver in plain language, and end with one recommendation. No jargon. [Paste your finding bullets.]

8. Statistical interpretation

I ran a two-sample t-test. Group A conversion = 3.2% (n=4,210), Group B = 3.7% (n=4,198), p-value = 0.048. Explain what this means for a non-technical PM, including the 95% confidence interval for the lift, and whether I should be cautious.

Iterating When the Output is Close but Wrong

AI rarely nails it on the first try. Three iteration moves that work:

"Show your work." Ask the AI to explain each step. Often the bug is obvious once it narrates the logic.

"Give me three variants." Useful for chart recommendations and SQL approaches. You see trade-offs and pick.

"Test against this known value." Tell the AI, "This query should return 1,847 for March 2025. Debug until it does."

Key Takeaways

Use the DATA framework: Dialect, Assumptions, Task, Acceptance criteria
Always paste schema, 5 sample rows, and your business rules
Save an analyst context block and reuse it in every conversation
Eight templates cover 80% of analyst tasks — adapt them, do not rewrite from scratch
Iterate by asking the AI to show work, produce variants, or match a known value

Your First AI Prompts as a Data Analyst

What You'll Learn

The DATA Framework

DATA in Action

Giving AI the Right Context

The Analyst Context Block

Eight Ready-to-Use Analyst Prompts

1. SQL query generation

2. SQL debugging

3. Data cleaning plan

4. Pandas one-liner

5. Excel formula

6. Chart recommendation

7. Stakeholder email

8. Statistical interpretation

Iterating When the Output is Close but Wrong

Key Takeaways

Quiz

Questions & Answers

Your First AI Prompts as a Data Analyst

What You'll Learn

The DATA Framework

DATA in Action

Giving AI the Right Context

The Analyst Context Block

Eight Ready-to-Use Analyst Prompts

1. SQL query generation

2. SQL debugging

3. Data cleaning plan

4. Pandas one-liner

5. Excel formula

6. Chart recommendation

7. Stakeholder email

8. Statistical interpretation

Iterating When the Output is Close but Wrong

Key Takeaways

Quiz

Questions & Answers