Handling Duplicates
Finding and removing duplicate rows is an important data cleaning step.
Finding Duplicates
Use duplicated() to identify duplicate rows. It returns a boolean Series that is True for every row that repeats an earlier row.
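As a minimal sketch of the call described above, assuming a small made-up DataFrame (the column names and values are illustrative, not from the original playground):

```python
import pandas as pd

# Hypothetical sample data: row 2 repeats row 0 exactly.
df = pd.DataFrame({
    "name": ["Alice", "Bob", "Alice", "Carol"],
    "city": ["Paris", "Rome", "Paris", "Oslo"],
})

# True for each row that repeats an earlier row
mask = df.duplicated()
print(mask.tolist())  # [False, False, True, False]

# Summing the boolean mask counts the duplicate rows
print(mask.sum())     # 1
```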
Keeping First vs Last
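The keep parameter controls which occurrence is marked as a duplicate: keep='first' (the default) marks later occurrences, keep='last' marks earlier occurrences, and keep=False marks every row that has a duplicate.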
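The three keep options can be compared side by side. This is a sketch on an assumed single-column DataFrame in which the value 1 appears three times:

```python
import pandas as pd

# Hypothetical data: the value 1 appears at positions 0, 2 and 3.
df = pd.DataFrame({"id": [1, 2, 1, 1]})

# Default: the first occurrence is kept unmarked, later ones are flagged
first = df.duplicated(keep="first")
print(first.tolist())   # [False, False, True, True]

# The last occurrence is kept unmarked, earlier ones are flagged
last = df.duplicated(keep="last")
print(last.tolist())    # [True, False, True, False]

# Every row that has a duplicate anywhere is flagged
every = df.duplicated(keep=False)
print(every.tolist())   # [True, False, True, True]
```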
Checking Specific Columns
Pass subset= to check for duplicates in only some of the columns, ignoring the values in the rest.
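A short sketch of the difference, assuming made-up data in which a name repeats but the full row does not:

```python
import pandas as pd

# Hypothetical data: "Alice" appears twice, but with different cities.
df = pd.DataFrame({
    "name": ["Alice", "Bob", "Alice"],
    "city": ["Paris", "Rome", "Lyon"],
})

# Comparing whole rows finds no duplicates
full_row = df.duplicated()
print(full_row.tolist())   # [False, False, False]

# Comparing only the "name" column flags the second "Alice"
by_name = df.duplicated(subset=["name"])
print(by_name.tolist())    # [False, False, True]
```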
Dropping Duplicates
Remove duplicate rows with drop_duplicates(). By default it keeps the first occurrence of each row and returns a new DataFrame, leaving the original unchanged.
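A minimal sketch with an assumed DataFrame that contains one exact repeat:

```python
import pandas as pd

# Hypothetical data: row 2 repeats row 0 exactly.
df = pd.DataFrame({
    "name": ["Alice", "Bob", "Alice", "Carol"],
    "score": [90, 85, 90, 70],
})

# Returns a new DataFrame; the first occurrence of each row is kept
deduped = df.drop_duplicates()
print(deduped.index.tolist())  # [0, 1, 3]

# The original DataFrame still has all four rows
print(len(df))                 # 4
```

The result keeps the original index labels, so there is a gap where row 2 was removed. Chaining reset_index(drop=True) renumbers the rows if that matters.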
Dropping by Specific Columns
Pass subset= to drop_duplicates() to keep one row for each unique combination of the selected columns. The keep parameter decides which row survives.
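A sketch of deduplicating on one column, assuming made-up data in which the same name appears with two different scores:

```python
import pandas as pd

# Hypothetical data: "Alice" has two rows with different scores.
df = pd.DataFrame({
    "name": ["Alice", "Bob", "Alice"],
    "score": [90, 85, 95],
})

# One row per name, keeping the first row seen for each
first_per_name = df.drop_duplicates(subset=["name"])
print(first_per_name["score"].tolist())  # [90, 85]

# One row per name, keeping the last row seen for each
last_per_name = df.drop_duplicates(subset=["name"], keep="last")
print(last_per_name["score"].tolist())   # [85, 95]
```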
Getting Duplicate Rows
Use the boolean mask from duplicated() to select the duplicate rows themselves. With keep=False, every copy is returned, including the first occurrence.
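Boolean indexing with the mask can be sketched as follows, again on assumed sample data:

```python
import pandas as pd

# Hypothetical data: row 2 repeats row 0 exactly.
df = pd.DataFrame({
    "name": ["Alice", "Bob", "Alice", "Carol"],
    "city": ["Paris", "Rome", "Paris", "Oslo"],
})

# Only the repeated copies (the first occurrence is excluded)
repeats = df[df.duplicated()]
print(repeats.index.tolist())     # [2]

# Every row involved in a duplicate, including the first occurrence
all_copies = df[df.duplicated(keep=False)]
print(all_copies.index.tolist())  # [0, 2]
```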
Key Points
- duplicated() returns a boolean Series
- keep='first' marks later occurrences (default)
- keep='last' marks earlier occurrences
- keep=False marks all duplicates
- subset= checks specific columns only
- drop_duplicates() removes duplicate rows

