Vectorization vs Loops and apply

The single biggest speed difference in Pandas comes from how you compute a column. Looping row by row, or even using apply, runs Python code once per row. Vectorized operations push the work down into fast compiled code that processes the whole column at once. For large data, the gap is enormous. This lesson shows the hierarchy of approaches and how to recognize when each belongs.

What You'll Learn

Why vectorized operations are far faster than row loops
How to rewrite a loop or apply as a vectorized expression
How to vectorize conditional logic
When apply is still a reasonable choice

The Speed Hierarchy

From slowest to fastest, the common ways to compute a new column are:

iterrows loop: slowest
apply(axis=1): faster
Vectorized: fastest

The reason is overhead. A loop and apply both call into Python once per row, paying interpreter cost every time. A vectorized expression hands the entire column to optimized array code that runs in a single pass.

The Slow Way: A Row Loop

This works, but it is the pattern to move away from on anything but tiny data.
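
As a minimal sketch of the row-loop pattern, assume a small order table with hypothetical price and quantity columns (the names and values are illustrative, not from the lesson's playground). Each iteration pulls one row out as a Series and multiplies two scalars in Python.

```python
import pandas as pd

# Hypothetical order data; column names and values are assumptions for illustration.
df = pd.DataFrame({
    "price": [10.0, 20.0, 5.5],
    "quantity": [3, 1, 4],
})

# Row loop: iterrows yields (index, row) pairs, and each row is a Series.
totals = []
for _, row in df.iterrows():
    totals.append(row["price"] * row["quantity"])

df["total"] = totals
print(df)
```

Every pass through the loop body is ordinary Python, so the cost grows with the number of rows.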

The Fast Way: Vectorize

The same calculation as a single column expression. Pandas multiplies the two columns element-wise in compiled code.
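
A sketch of the vectorized form, again assuming hypothetical price and quantity columns chosen for illustration:

```python
import pandas as pd

# Hypothetical order data; column names and values are assumptions for illustration.
df = pd.DataFrame({
    "price": [10.0, 20.0, 5.5],
    "quantity": [3, 1, 4],
})

# One expression over whole columns; there is no explicit Python loop.
df["total"] = df["price"] * df["quantity"]
print(df)
```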

The result is identical, the code is shorter, and on a million rows it is dramatically faster.
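
One way to check both claims on your own machine is to time the three approaches on synthetic data. The sketch below assumes made-up price and quantity columns and a hypothetical timed helper. Exact timings depend on hardware and library versions, so no numbers are quoted here.

```python
import time

import numpy as np
import pandas as pd

# Synthetic data; the size and column names are assumptions for illustration.
rng = np.random.default_rng(0)
n = 20_000
df = pd.DataFrame({
    "price": rng.uniform(1, 100, n),
    "quantity": rng.integers(1, 10, n),
})


def timed(func):
    """Hypothetical helper: return (result, elapsed seconds) for one call."""
    start = time.perf_counter()
    result = func()
    return result, time.perf_counter() - start


# 1. Explicit row loop with iterrows.
loop_total, loop_s = timed(
    lambda: pd.Series(
        [row["price"] * row["quantity"] for _, row in df.iterrows()],
        index=df.index,
    )
)

# 2. Row-wise apply.
apply_total, apply_s = timed(
    lambda: df.apply(lambda row: row["price"] * row["quantity"], axis=1)
)

# 3. Vectorized column expression.
vec_total, vec_s = timed(lambda: df["price"] * df["quantity"])

# Confirm that all three approaches produce the same values.
same = np.allclose(loop_total, vec_total) and np.allclose(apply_total, vec_total)
print(f"results match: {same}")
print(f"iterrows loop: {loop_s:.4f} s")
print(f"apply(axis=1): {apply_s:.4f} s")
print(f"vectorized:    {vec_s:.4f} s")
```

Increasing n makes the gap easier to see, at the cost of a longer wait for the loop.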

Vectorizing Conditional Logic

A common reason people reach for apply is per-row if/else logic. You usually do not need it. np.where handles a single condition, and np.select handles several.
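
A brief sketch of both functions, assuming a hypothetical score column and illustrative grade thresholds:

```python
import numpy as np
import pandas as pd

# Hypothetical scores; the column name and thresholds are assumptions.
df = pd.DataFrame({"score": [95, 72, 58, 85]})

# One condition: np.where(condition, value_if_true, value_if_false).
df["passed"] = np.where(df["score"] >= 60, "pass", "fail")

# Several conditions: np.select checks them in order and, for each row,
# takes the choice for the first condition that is True.
conditions = [
    df["score"] >= 90,
    df["score"] >= 80,
    df["score"] >= 60,
]
choices = ["A", "B", "C"]
df["grade"] = np.select(conditions, choices, default="F")

print(df)
```

Because np.select uses the first matching condition, the conditions are listed from most to least restrictive.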

Both run vectorized across the whole column, replacing what would otherwise be a slow per-row function.

String and Date Operations Vectorize Too

The .str and .dt accessors are vectorized. Use them instead of applying a Python function per row.
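
For instance, cleaning names and extracting date parts might look like the sketch below. The name and signup columns are hypothetical.

```python
import pandas as pd

# Hypothetical customer data; column names and values are assumptions.
df = pd.DataFrame({
    "name": ["  alice smith", "BOB JONES ", "carol  "],
    "signup": pd.to_datetime(["2024-01-15", "2024-06-30", "2025-03-01"]),
})

# .str methods operate on every string in the column and can be chained.
df["name_clean"] = df["name"].str.strip().str.title()

# .dt exposes datetime components for the whole column.
df["signup_year"] = df["signup"].dt.year
df["signup_month"] = df["signup"].dt.month_name()

print(df)
```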

When apply Is Still Reasonable

apply is not forbidden. It is a fair choice when:

The logic genuinely cannot be expressed with vectorized operations or np.where / np.select.
The data is small enough that speed does not matter.
You are calling an external function (like a custom parser) that only works on one value at a time.

Even then, prefer applying to a single Series (df['col'].apply(func)) over apply(axis=1) across rows, which is the slowest form of apply.
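
As an illustration of the third case, the sketch below applies a one-value-at-a-time parser to a single column. The parse_size function, the UNITS table, and the size column are all hypothetical stand-ins for whatever external function you need to call.

```python
import pandas as pd

# Hypothetical unit table for the illustrative parser below.
UNITS = {"KB": 1_000, "MB": 1_000_000, "GB": 1_000_000_000}


def parse_size(text: str) -> int:
    """Hypothetical single-value parser: '1.5 MB' -> 1500000."""
    number, unit = text.split()
    return int(float(number) * UNITS[unit])


df = pd.DataFrame({"size": ["10 KB", "1.5 MB", "2 GB"]})

# Series.apply passes one value at a time, which is all the parser needs.
df["size_bytes"] = df["size"].apply(parse_size)

print(df)
```

Only the one column the function needs is passed in, so pandas does not have to build a full row object for each call.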

Key Points

Vectorized column expressions run in fast compiled code; row loops and apply run Python per row
Rewrite arithmetic as direct column operations (df['a'] * df['b'])
Use np.where for one condition and np.select for several instead of per-row if/else
The .str and .dt accessors are vectorized; prefer them over apply
Reserve apply for logic that truly cannot be vectorized, and prefer Series apply over apply(axis=1)
