Turning Data into Insight
You can load data, and you can call a model. This lesson connects them. The trick is not to throw raw rows at the AI. Instead you build a small, information-dense text summary of the data and send that, along with a clear instruction. Done well, the model returns a genuinely useful written analysis. Done badly, it hallucinates or chokes on noise.
The summarizing step is pure pandas and string work, so you can run it right here in the playground. The actual AI call still happens on your machine (it needs your key), so that part stays as code to read.
What You'll Learn
- Why you summarize data before sending it rather than dumping raw rows
- How to turn a DataFrame into a compact text brief
- A reliable prompt pattern for data analysis
- How to combine the summary and the prompt into one call
- How to keep the model honest about what the data does and does not say
Why summarize first
Three reasons you almost never send raw rows:
- Cost and speed. Large inputs cost more and take longer. A summary is tiny.
- Focus. Models reason better over a clean brief than over thousands of noisy rows.
- Truthfulness. If you hand the model the exact numbers (totals, averages, top groups), it has less room to invent. You are doing the arithmetic in pandas, which is reliable, and asking the model only to interpret.
So the division of labor is: pandas computes the facts, the model explains them.
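To make the cost argument concrete, here is a rough sketch that compares the size of a raw dump with the size of a pandas summary. The table is synthetic and its column names (region, units, revenue) are assumptions for illustration, not data from this course.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 5,000 synthetic sales rows (made up for this comparison).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "region": rng.choice(["North", "South", "East", "West"], size=5000),
    "units": rng.integers(1, 50, size=5000),
    "revenue": rng.uniform(10, 500, size=5000).round(2),
})

# What "dumping raw rows" would send: every row as CSV text.
raw_text = df.to_csv(index=False)

# What a brief would send: statistics pandas has already computed.
brief_text = df.describe().round(2).to_string()

print(f"raw rows: {len(raw_text):,} characters")
print(f"brief:    {len(brief_text):,} characters")
```

Character count is only a stand-in for what a model provider actually bills, but the gap between the two strings shows why the brief is the cheaper input.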
Build a text summary of the data
Here is a summarize_dataframe function. It pulls together the shape, the column types, the numeric statistics, and a few sample rows into one readable string. Run it and read the output: that string is what the model will see.
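A minimal sketch of such a function follows. The n_sample parameter, the section labels, and the small sales table are assumptions chosen for illustration; adapt them to your own data.

```python
import pandas as pd


def summarize_dataframe(df, n_sample=3):
    """Return a compact, plain-text brief describing a DataFrame."""
    # Shape and column types
    lines = [f"Rows: {df.shape[0]}, Columns: {df.shape[1]}", "", "Column types:"]
    for name, dtype in df.dtypes.items():
        lines.append(f"- {name}: {dtype}")

    # Numeric statistics (count, mean, std, min, quartiles, max)
    numeric = df.select_dtypes(include="number")
    if not numeric.empty:
        lines += ["", "Numeric statistics:", numeric.describe().round(2).to_string()]

    # A few real records so the model sees what a row looks like
    lines += ["", f"Sample rows (first {n_sample}):",
              df.head(n_sample).to_string(index=False)]
    return "\n".join(lines)


# Hypothetical sales data, used only to demonstrate the function
df = pd.DataFrame({
    "region": ["North", "South", "East", "West", "North", "South"],
    "units": [12, 7, 15, 9, 11, 5],
    "revenue": [1200.0, 650.0, 1580.0, 910.0, 1105.0, 480.0],
})

brief = summarize_dataframe(df)
print(brief)
```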
That brief is small, readable, and full of real numbers. It is exactly what you want to hand a model. Notice how much signal is packed into a few lines: counts, means, ranges, and a sample of the actual records.
Add a group breakdown when it helps
For tabular data, a per-group total often unlocks the best analysis. A single groupby line gives the model the comparison it needs.
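As a sketch, assuming the same hypothetical sales table with region and revenue columns, the breakdown could look like this:

```python
import pandas as pd

# Hypothetical sales data (column names are assumptions for illustration)
df = pd.DataFrame({
    "region": ["North", "South", "East", "West", "North", "South"],
    "units": [12, 7, 15, 9, 11, 5],
    "revenue": [1200.0, 650.0, 1580.0, 910.0, 1105.0, 480.0],
})

# Total revenue per region, largest first
revenue_by_region = (
    df.groupby("region")["revenue"].sum().sort_values(ascending=False)
)

# Turn the result into labeled text that can sit inside a brief
breakdown = "Revenue by region:\n" + revenue_by_region.to_string()
print(breakdown)
```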
You can fold this straight into the brief so the model sees the breakdown alongside the overall stats.
The prompt pattern for data analysis
A good data prompt has four parts, in this order:
- Role. Tell the model what it is. ("You are a precise data analyst.")
- Task. State exactly what you want. ("Summarize the key findings in 4 to 6 bullet points.")
- The data brief. Paste in the summary you built.
- Guardrails. Tell it what not to do. ("Only use numbers present in the brief. If something is not in the data, say so.")
Putting the guardrail in writing is what keeps the analysis grounded. Without it, models tend to fill gaps with plausible-sounding but invented detail.
Assemble the prompt
This function takes your data brief and builds the full prompt string. It is pure string work, so it runs in the playground. Read the assembled prompt: this is the precise text the model receives.
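One possible shape for build_analysis_prompt is sketched below. It follows the four-part pattern (role, task, data brief, guardrails) with the example wording quoted earlier; the section labels DATA BRIEF and RULES and the stand-in brief are assumptions for illustration.

```python
def build_analysis_prompt(brief, question="What are the key findings?"):
    """Assemble role, task, data brief, and guardrails into one prompt string."""
    role = "You are a precise data analyst."
    task = (
        f"Question: {question}\n"
        "Summarize the key findings in 4 to 6 bullet points."
    )
    guardrails = (
        "Only use numbers present in the brief. "
        "If something is not in the data, say so."
    )
    # Keep the four parts in a fixed order, separated by blank lines
    return "\n\n".join(
        [role, task, "DATA BRIEF:\n" + brief, "RULES:\n" + guardrails]
    )


# Hypothetical stand-in for a real brief
sample_brief = (
    "Rows: 6, Columns: 3\n"
    "Revenue by region: North 2305, East 1580, South 1130, West 910"
)
prompt = build_analysis_prompt(sample_brief, "Which region is underperforming?")
print(prompt)
```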
Make the AI call (on your machine)
Now combine everything with the ask_ai function from the previous lesson. This part needs your key, so run it locally rather than in the playground.
# Assumes ask_ai(prompt, system=...) from the previous lesson is available,
# and summarize_dataframe / build_analysis_prompt from above.
def analyze_dataframe(df, question="What are the key findings?"):
    brief = summarize_dataframe(df)
    prompt = build_analysis_prompt(brief, question)
    return ask_ai(
        prompt,
        system="You are a precise data analyst. Never invent numbers.",
        max_tokens=600,
    )

# Example
analysis = analyze_dataframe(df, question="Which region is underperforming and why?")
print(analysis)
That is the whole insight engine: summarize_dataframe builds the facts, build_analysis_prompt frames the request with guardrails, and ask_ai does the call. You now have the two functions the app needs: a clean DataFrame in, a written analysis out.
Keep the model honest
A few habits that pay off:
- Compute, do not ask. Any number that matters (totals, averages, growth) should be computed in pandas and placed in the brief, not left for the model to derive.
- Name the unknowns. If a question cannot be answered from the data, you want the model to say so. The guardrail line makes that the expected behavior.
- Keep briefs small. If your data is huge, summarize harder (more groupby, fewer sample rows) rather than sending more text.
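The "compute, do not ask" habit can be sketched as follows: derive totals, averages, and growth in pandas, then write them into the brief as plain statements. The monthly table and its values are hypothetical.

```python
import pandas as pd

# Hypothetical monthly revenue (values invented for illustration)
monthly = pd.DataFrame({
    "month": ["2024-01", "2024-02", "2024-03"],
    "revenue": [1000.0, 1100.0, 990.0],
})

# Month-over-month growth, computed by pandas rather than left to the model
monthly["growth_pct"] = (monthly["revenue"].pct_change() * 100).round(1)

facts = [
    f"Total revenue: {monthly['revenue'].sum():.2f}",
    f"Average monthly revenue: {monthly['revenue'].mean():.2f}",
]
# The first month has no previous month, so its growth is NaN and is skipped
for row in monthly.dropna(subset=["growth_pct"]).itertuples(index=False):
    facts.append(
        f"{row.month}: revenue {row.revenue:.2f}, "
        f"growth {row.growth_pct:+.1f}% vs previous month"
    )

facts_text = "\n".join(facts)
print(facts_text)
```

Because every figure is already stated in the text, the model only has to interpret the numbers, not derive them.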
Key Takeaways
- Summarize the data into a compact text brief; never dump raw rows at the model.
- Let pandas compute the facts and let the model interpret them.
- Use the four-part prompt pattern: role, task, data brief, guardrails.
- A groupby breakdown often gives the model exactly the comparison it needs.
- The insight engine is three functions: summarize_dataframe, build_analysis_prompt, and ask_ai.

