Comprehensions That Read Like English
Comprehensions are Python's signature feature for transforming and filtering data in a single, readable line. Once they click, you will write less code that does more, and your data-wrangling steps will read almost like a sentence. This lesson takes you past the basics into the patterns you actually use on real data.
What You'll Learn
- List, dict, and set comprehensions and when to use each
- How to add filtering with if
- The readability line you should not cross
- Why a generator expression saves memory on big data
From Loop to Comprehension
Any simple "build a new list from an old one" loop can become a comprehension. Compare:
# The loop way
scores = [0.71, 0.86, 0.92, 0.55]
passed = []
for s in scores:
    if s >= 0.8:
        passed.append(s)
# The comprehension way
passed = [s for s in scores if s >= 0.8]
print(passed) # [0.86, 0.92]
Read it left to right: "give me s, for each s in scores, if s >= 0.8." The structure is always expression first, then for, then an optional if filter.
Transform and Filter at Once
The expression on the left can do work, not just copy:
names = [" Alice ", "BOB", " carol"]
clean = [n.strip().title() for n in names]
print(clean) # ['Alice', 'Bob', 'Carol']
This is the bread and butter of data cleaning: normalize, strip, convert types, all in one pass.
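As a hedged sketch of transforming and filtering in the same pass, the example below converts raw text fields to integers while skipping blanks and placeholders. The raw_ages list and its values are made up for illustration.

```python
# Hypothetical raw input, e.g. one column read from a CSV file as text
raw_ages = ["42", " 7 ", "", "n/a", "13"]

# Left side transforms (strip, then convert to int);
# the trailing `if` keeps only strings made entirely of digits
ages = [int(a.strip()) for a in raw_ages if a.strip().isdigit()]

print(ages)  # [42, 7, 13]
```

The filter runs before the expression, so int() is never called on a value such as "n/a" that would raise an error.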
Dict and Set Comprehensions
The same syntax builds sets and dicts: swap the square brackets for curly braces, and for a dict write a key: value pair as the expression.
words = ["model", "data", "model", "ai", "data"]
# set comprehension: unique values
unique = {w for w in words}
print(unique) # {'model', 'data', 'ai'} (sets are unordered, so the order may vary)
# dict comprehension: word -> length
lengths = {w: len(w) for w in words}
print(lengths) # {'model': 5, 'data': 4, 'ai': 2}
Dict comprehensions are perfect for building lookup tables: map an ID to a record, a label to a count, a column name to a cleaned value.
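A minimal sketch of the lookup-table pattern, assuming a small list of record dicts. The field names id, name, and score are hypothetical, chosen only for illustration.

```python
# Hypothetical records, e.g. rows loaded from a file or an API response
records = [
    {"id": 101, "name": "Alice", "score": 0.86},
    {"id": 102, "name": "Bob", "score": 0.55},
    {"id": 103, "name": "Carol", "score": 0.92},
]

# Lookup table: ID -> full record
by_id = {r["id"]: r for r in records}

# Lookup table with a filter: name -> score, passing scores only
passing = {r["name"]: r["score"] for r in records if r["score"] >= 0.8}

print(by_id[102]["name"])  # Bob
print(passing)             # {'Alice': 0.86, 'Carol': 0.92}
```

If two records shared the same key, the later one would silently overwrite the earlier one, so this pattern fits best when keys are known to be unique.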
Generator Expressions: The Memory-Saver
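Swap the square brackets for parentheses and you get a generator expression. It does not build the whole list in memory; it produces values one at a time, on demand. On data too large to hold in memory at once, this can be the difference between finishing the job and crashing. When a generator expression is the only argument to a function, the call's own parentheses are enough.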
# Builds the entire list in memory:
total = sum([x * 2 for x in range(1_000_000)])
# Streams one value at a time, far less memory:
total = sum(x * 2 for x in range(1_000_000))
When you only need to consume the values once (summing, looping, feeding another function), prefer the generator form. You will go deeper on this in the next lesson.
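One rough way to see the difference is to compare container sizes with sys.getsizeof. This is a sketch, not a full memory profile: the exact byte counts depend on the Python version and platform, so the comments give orders of magnitude only.

```python
import sys

n = 1_000_000

as_list = [x * 2 for x in range(n)]   # every value is stored at once
as_gen = (x * 2 for x in range(n))    # values are produced on demand

# getsizeof reports the container only, not the int objects it points to,
# so the list's real footprint is larger than the number printed here
print(sys.getsizeof(as_list))  # several megabytes on CPython
print(sys.getsizeof(as_gen))   # a few hundred bytes at most, regardless of n

# Both forms give the same result
print(sum(as_list) == sum(as_gen))  # True

# A generator is exhausted after one pass, so a second sum sees nothing
print(sum(as_gen))  # 0
```

The last line shows the trade-off: a generator can be consumed only once, which is why the list form is still the right choice when the values are needed more than once.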
The Readability Line
Comprehensions are great until they are not. If you find yourself nesting two for clauses plus an if, or the line wraps past your screen, switch back to a regular loop. Clever one-liners that nobody can read are a net loss.
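To make that threshold concrete, here is a hedged comparison using a made-up nested list named runs. Both versions produce the same result; the question is which one a teammate can read at a glance.

```python
# Hypothetical nested data: one list of scores per experiment run
runs = [[0.71, 0.86], [0.92, 0.55], [0.88]]

# Legal, but two `for` clauses plus an `if` takes a moment to parse
high_dense = [s for run in runs for s in run if s >= 0.8]

# The same logic as a loop: longer, but each step is on its own line
high_loop = []
for run in runs:
    for s in run:
        if s >= 0.8:
            high_loop.append(s)

print(high_dense)               # [0.86, 0.92, 0.88]
print(high_dense == high_loop)  # True
```

Note that the for clauses in the comprehension appear in the same order as in the nested loop, outermost first.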
Decision
Should this be a comprehension?
- Simple map and/or one filter: yes, use a comprehension.
- Only consumed once, large data: use a generator expression (parentheses).
- Nested loops plus conditions, or side effects: use a regular for-loop instead.
Readability beats cleverness.
Try It
Run this, then change the filter so it keeps only words longer than 3 letters.
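One possible starting point, sketched with a made-up words list. Both the list and the initial filter condition are assumptions for illustration.

```python
# Hypothetical word list for the exercise
words = ["model", "data", "ai", "gpu", "token", "loss"]

# Starting filter: keeps words of at least 2 letters (all of them here).
# Exercise: change the condition so only words longer than 3 letters remain.
kept = [w for w in words if len(w) >= 2]

print(kept)  # ['model', 'data', 'ai', 'gpu', 'token', 'loss']
```

After the change, 'ai' and 'gpu' should drop out of the printed list.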
Key Takeaways
- Comprehensions follow the shape: expression, then for, then an optional if.
- Change brackets to build lists [], sets {}, or dicts {k: v}.
- The left expression can transform values, which makes comprehensions ideal for data cleaning.
- A generator expression (...) streams values one at a time and saves memory on big data.
- If a comprehension stops being readable, use a regular loop instead.

