Loading Real Data with pandas

The app starts with data. In this lesson you load a dataset into pandas, look at it, and clean it just enough to be useful. We are not going deep on pandas here. We use only the handful of methods the app needs, because pandas is a tool in service of the project, not the subject of the project.

You can run every example below right in the page. The first run loads the Python engine, so give it a few seconds.

What You'll Learn

Read a CSV into a pandas DataFrame
Inspect a dataset quickly with head, shape, info, and describe
Select and filter the columns and rows you care about
Handle the two most common messes: missing values and wrong types
Build a small, reusable function that loads and cleans data

A DataFrame is just a table

A DataFrame is pandas' name for a table: rows and columns, like a spreadsheet. In the real app you will read it from a file with pd.read_csv("yourfile.csv"). Here in the playground there is no CSV file to read, so we create the same kind of table directly from a dictionary. Everything you learn applies identically to a loaded CSV.
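As a minimal sketch of building a table from a dictionary, the snippet below uses a small sales dataset. The column names and values are invented for illustration and are not the lesson's original data.

```python
import pandas as pd

# Hypothetical sales data, invented for illustration.
data = {
    "rep": ["Ana", "Ben", "Chen", "Dee"],
    "region": ["North", "South", "North", "West"],
    "units": [12, 7, 15, 9],
    "revenue": [1200.0, 650.0, 1500.0, 900.0],
}

# Each dictionary key becomes a column; each list holds that column's values.
df = pd.DataFrame(data)
print(df)
```

Every list must have the same length, because each position across the lists forms one row.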

Look before you leap

Before you do anything with data, look at it. Four tools cover almost every first glance.

df.head() shows the first few rows.
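df.shape gives (rows, columns). It is an attribute, so it takes no parentheses.
df.info() lists the columns, their types, and how many non-null values each one has, which tells you how many are missing.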
df.describe() gives quick statistics for the numeric columns.
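Here is one way the four first-glance tools might look on a small table. The dataset is hypothetical and exists only so the snippet runs by itself.

```python
import pandas as pd

# Hypothetical sales data, invented for illustration.
df = pd.DataFrame({
    "rep": ["Ana", "Ben", "Chen", "Dee"],
    "region": ["North", "South", "North", "West"],
    "units": [12, 7, 15, 9],
    "revenue": [1200.0, 650.0, 1500.0, 900.0],
})

print(df.head())      # first rows (up to five by default)
print(df.shape)       # a (rows, columns) tuple; an attribute, so no parentheses
df.info()             # prints column names, dtypes, and non-null counts
print(df.describe())  # count, mean, std, min, quartiles, max for numeric columns
```

Note that df.info() prints its report directly, so it does not need to be wrapped in print().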

That describe() output (count, mean, min, max, and so on) is gold for the AI step later. A compact statistical summary is exactly the kind of small, information-dense slice you want to send to a model.
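To make that concrete, the sketch below turns a describe() table into a short block of plain text. The data is hypothetical, and rounding to two decimals is just one reasonable choice for keeping the text small.

```python
import pandas as pd

# Hypothetical sales data, invented for illustration.
df = pd.DataFrame({
    "units": [12, 7, 15, 9],
    "revenue": [1200.0, 650.0, 1500.0, 900.0],
})

# describe() returns a DataFrame; to_string() renders it as plain text
# that can later be placed inside a prompt.
summary = df.describe().round(2).to_string()
print(summary)
```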

Select the columns you care about

You rarely need every column. Pick the ones that matter by passing a list of names.
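A short sketch of column selection, using hypothetical data. The outer brackets index the DataFrame and the inner brackets are the Python list of names.

```python
import pandas as pd

# Hypothetical sales data, invented for illustration.
df = pd.DataFrame({
    "rep": ["Ana", "Ben", "Chen", "Dee"],
    "region": ["North", "South", "North", "West"],
    "units": [12, 7, 15, 9],
    "revenue": [1200.0, 650.0, 1500.0, 900.0],
})

# A list of names returns a smaller DataFrame with just those columns.
subset = df[["rep", "revenue"]]
print(subset)

# A single name (no inner list) returns one column as a Series.
revenue = df["revenue"]
print(revenue)
```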

Filter the rows you care about

Filtering uses a condition inside the square brackets. The condition produces True/False for each row, and pandas keeps the True rows.
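The sketch below splits filtering into two steps so you can see the True/False values before they are used. The data and the 1000 threshold are made up for illustration.

```python
import pandas as pd

# Hypothetical sales data, invented for illustration.
df = pd.DataFrame({
    "rep": ["Ana", "Ben", "Chen", "Dee"],
    "region": ["North", "South", "North", "West"],
    "revenue": [1200.0, 650.0, 1500.0, 900.0],
})

# Step 1: the condition yields one True/False value per row.
mask = df["revenue"] > 1000
print(mask)

# Step 2: indexing with that mask keeps only the True rows.
big_sales = df[mask]
print(big_sales)
```

In everyday code the two steps are usually written together as df[df["revenue"] > 1000].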

Clean the two most common messes

Real CSVs are messy. The two problems you will hit constantly are missing values and numbers stored as text.

Missing values show up as NaN. You can drop those rows with dropna() or fill them with fillna().

Wrong types happen when a number column arrives as text (often because of stray symbols). pd.to_numeric converts the column to numbers, and errors="coerce" turns anything it cannot convert into NaN instead of raising an error.
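A minimal sketch of both cleanups follows. The table is a hypothetical stand-in, built so that one row has a missing region and another has a revenue value that cannot be parsed. The lesson's original playground data is not available here.

```python
import pandas as pd

# Hypothetical messy data: Ben has no region, Chen's revenue is not a number.
raw = pd.DataFrame({
    "rep": ["Ana", "Ben", "Chen", "Dee"],
    "region": ["North", None, "North", "West"],
    "revenue": ["1200", "650", "n/a", "900"],  # numbers stored as text
})

clean = raw.copy()

# Convert text to numbers; "n/a" cannot be converted, so it becomes NaN.
clean["revenue"] = pd.to_numeric(clean["revenue"], errors="coerce")

# Drop any row that is missing a region or a revenue value.
clean = clean.dropna(subset=["region", "revenue"])
print(clean)

# Alternative to dropping: fill the gap with a placeholder value.
filled_region = raw["region"].fillna("Unknown")
print(filled_region)
```

Whether to drop or fill depends on the question you are asking. Dropping keeps only rows you can fully trust, while filling keeps the row count intact.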

Notice the "Chen" row disappeared (its revenue could not be parsed) and the "Ben" row disappeared (missing region). That is the point: cleaning decides what the rest of your app gets to trust.

Wrap it in a function

Professional code packages a job into a function so you can reuse and test it. Here is a small load_and_clean that takes a DataFrame, fixes types, drops bad rows, and returns the clean result. In the real app, the first line would read from a file instead.
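One possible shape for such a function is sketched below. The parameter names numeric_cols and required_cols are assumptions made for this example, as is the sample data; the lesson's actual signature may differ.

```python
import pandas as pd

def load_and_clean(raw, numeric_cols, required_cols):
    """Return a cleaned copy of raw; the input DataFrame is left unchanged.

    numeric_cols and required_cols are hypothetical parameter names
    chosen for this sketch.
    """
    df = raw.copy()
    # Fix types: unparseable values become NaN rather than raising an error.
    for col in numeric_cols:
        df[col] = pd.to_numeric(df[col], errors="coerce")
    # Drop rows missing any value the rest of the app depends on.
    df = df.dropna(subset=list(required_cols))
    # Renumber the rows so the index has no gaps.
    return df.reset_index(drop=True)

# Hypothetical messy data, invented for illustration.
raw = pd.DataFrame({
    "rep": ["Ana", "Ben", "Chen", "Dee"],
    "region": ["North", None, "North", "West"],
    "revenue": ["1200", "650", "n/a", "900"],
})

clean = load_and_clean(raw, numeric_cols=["revenue"],
                       required_cols=["region", "revenue"])
print(clean)
```

Working on a copy means the caller's original table is never modified, which makes the function easier to test.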

In your real project, the only change is the source. Instead of building raw by hand you would write raw = pd.read_csv("sales.csv") and then call load_and_clean(raw, ...) exactly as above. That clean DataFrame is what we will summarize and hand to the AI model in the next lessons.
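To try the file-based path without a real file, pd.read_csv also accepts a file-like object. The sketch below feeds it CSV text through io.StringIO. The CSV contents are hypothetical, and the cleaning steps are written inline so the snippet runs by itself; in the project you would call load_and_clean instead.

```python
import io
import pandas as pd

# Hypothetical CSV contents standing in for a file such as sales.csv.
csv_text = """rep,region,revenue
Ana,North,1200
Ben,,650
Chen,North,unknown
Dee,West,900
"""

# read_csv accepts a path or a file-like object; StringIO wraps the text.
raw = pd.read_csv(io.StringIO(csv_text))
print(raw)

# Same cleanup as before: coerce revenue to numbers, then drop incomplete rows.
clean = raw.copy()
clean["revenue"] = pd.to_numeric(clean["revenue"], errors="coerce")
clean = clean.dropna(subset=["region", "revenue"])
print(clean)
```

The empty region field is read as a missing value automatically, while the word "unknown" keeps the revenue column as text until pd.to_numeric converts it.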

Key Takeaways

A DataFrame is a table; load real files with pd.read_csv("file.csv").
Always look first: head, shape, info, describe.
Select columns with df[["a", "b"]] and filter rows with a boolean condition like df[df["x"] > 0].
The two big cleanups are missing values (dropna / fillna) and bad types (pd.to_numeric(..., errors="coerce")).
Package load-and-clean into a function so the rest of the app gets clean, trustworthy data.

