Reducing DataFrame Memory
By default pandas reads numbers into wide types: 64-bit integers and 64-bit floats. Many real columns do not need that range. Downcasting numeric columns to smaller types, alongside the category dtype for repetitive text, can cut a DataFrame's memory footprint substantially, which usually means faster operations and room to load larger datasets.
What You'll Learn
- How to inspect a DataFrame's memory usage
- How to downcast integer and float columns safely
- How to combine downcasting with the category dtype
- A repeatable routine for shrinking any DataFrame
Inspecting Memory Usage
Start by measuring. info(memory_usage='deep') shows per-column dtypes and the total footprint, and memory_usage(deep=True) returns the bytes per column.
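As a minimal sketch of both calls, the frame below is made up for illustration: the column names id and score follow the text, but the values are assumptions.

```python
import pandas as pd

# Hypothetical sample data: 1,000 small integers and 1,000 floats.
df = pd.DataFrame({
    "id": range(1, 1001),
    "score": [i / 10 for i in range(1000)],
})

# Prints each column's dtype plus the total memory footprint.
df.info(memory_usage="deep")

# Returns a Series of byte counts, one entry per column plus the index.
per_column = df.memory_usage(deep=True)
print(per_column)
```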
The id column is stored as int64 and score as float64, both far wider than this data needs.
Downcasting Numbers
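pd.to_numeric with downcast='integer' or downcast='float' picks the smallest type of that kind that can hold the column's values. An int64 column with small values can often drop to int16 or int8, with no loss of information. Floats go no smaller than float32, which keeps roughly seven significant decimal digits, so confirm that precision is enough before downcasting a float column.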
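A short sketch of both downcasts, using made-up data chosen so the outcome is easy to predict: the integers run from 1 to 100, and the floats are multiples of 0.5, which float32 stores exactly.

```python
import pandas as pd

# Hypothetical sample data; values are assumptions for illustration.
df = pd.DataFrame({
    "id": range(1, 101),                     # 1..100 fits in int8
    "score": [i * 0.5 for i in range(100)],  # exact in float32
})

# Replace each column with its downcast version.
df["id"] = pd.to_numeric(df["id"], downcast="integer")
df["score"] = pd.to_numeric(df["score"], downcast="float")

print(df.dtypes)
```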
The integer column shrinks to the smallest signed type that fits, and the float column moves from float64 to float32.
Combining With Category
The biggest wins come from downcasting numbers and converting low-cardinality text to category together. Here is the before-and-after on a small frame.
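The comparison might look like the sketch below. The frame is hypothetical, including the city column and its four repeated values, so the exact byte counts will differ on real data.

```python
import pandas as pd

# Hypothetical frame: two numeric columns and one low-cardinality text column.
df = pd.DataFrame({
    "id": range(1, 1001),
    "score": [i * 0.5 for i in range(1000)],
    "city": ["Lima", "Oslo", "Pune", "Kyiv"] * 250,
})
before = df.memory_usage(deep=True).sum()

df["id"] = pd.to_numeric(df["id"], downcast="integer")      # 1..1000 fits in int16
df["score"] = pd.to_numeric(df["score"], downcast="float")  # float64 -> float32
df["city"] = df["city"].astype("category")                  # 4 distinct values
after = df.memory_usage(deep=True).sum()

print(f"before: {before} bytes, after: {after} bytes")
print(df.dtypes)
```

Summing memory_usage(deep=True) before and after gives one number to compare, and deep=True matters here because it counts the actual string storage rather than just the pointers.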
A Repeatable Shrink Routine
For any DataFrame, a simple loop downcasts every numeric column and converts low-cardinality object columns. Adjust the cardinality threshold to taste.
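One way to write such a loop is sketched here. The function name shrink, the max_unique_ratio parameter, and its default of 0.5 are assumptions made for illustration, not part of pandas.

```python
import pandas as pd
from pandas.api import types as ptypes


def shrink(df: pd.DataFrame, max_unique_ratio: float = 0.5) -> pd.DataFrame:
    """Return a copy with numeric columns downcast and repetitive text as category.

    max_unique_ratio is a hypothetical threshold: a text column is converted
    only if its share of distinct values is at or below this ratio.
    """
    out = df.copy()
    for col in out.columns:
        s = out[col]
        if ptypes.is_integer_dtype(s):
            out[col] = pd.to_numeric(s, downcast="integer")
        elif ptypes.is_float_dtype(s):
            out[col] = pd.to_numeric(s, downcast="float")
        elif ptypes.is_object_dtype(s) or ptypes.is_string_dtype(s):
            if len(s) > 0 and s.nunique() / len(s) <= max_unique_ratio:
                out[col] = s.astype("category")
    return out


# Usage on a made-up frame: "city" repeats, "name" is unique per row.
df = pd.DataFrame({
    "id": range(1, 1001),
    "score": [i * 0.5 for i in range(1000)],
    "city": ["Lima", "Oslo", "Pune", "Kyiv"] * 250,
    "name": [f"user_{i}" for i in range(1000)],
})
small = shrink(df)
print(small.dtypes)
```

Working on a copy keeps the original frame intact, so the two can be compared before the smaller one replaces it. A ratio is used rather than a fixed count so the threshold scales with the number of rows.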
Watch the Range
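Downcasting is only safe when the smaller type can hold every value the column will ever contain. An int8, for example, covers only -128 to 127. A later value outside that range will not fit, and integer arithmetic that crosses the limit can wrap around silently instead of raising an error. Downcast based on the realistic maximum the column can hold, not just the current sample.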
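The sketch below shows the wraparound on a NumPy int8 array and one possible guard. The helper fits_in and the assumed bounds of 0 to 120 for an age column are hypothetical.

```python
import numpy as np
import pandas as pd

# int8 arithmetic on arrays wraps around instead of raising.
a = np.array([127], dtype=np.int8)
b = np.array([1], dtype=np.int8)
wrapped = a + b
print(wrapped)  # [-128]


def fits_in(dtype, low: int, high: int) -> bool:
    """Hypothetical helper: True if the range [low, high] fits in an integer dtype."""
    info = np.iinfo(dtype)
    return info.min <= low and high <= info.max


# Choose the type from the realistic range (assumed here: ages 0..120),
# not from whatever values happen to be in the current sample.
ages = pd.Series([23, 41, 67])
target = np.int8 if fits_in(np.int8, 0, 120) else np.int16
ages = ages.astype(target)
print(ages.dtype)
```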
Exercise: Downcast an Integer Column
Exercise: Measure the Savings
Key Points
- Measure first with info(memory_usage='deep') or memory_usage(deep=True)
- pd.to_numeric(col, downcast='integer'|'float') picks the smallest safe numeric type
- Combine numeric downcasting with the category dtype for the biggest savings
- A simple per-column loop makes the routine repeatable across any DataFrame
- Downcast based on the realistic value range, not just the current sample, to avoid overflow

