Dataclasses and Dunder Methods
Writing a class just to hold a few fields gets tedious fast: you repeat every field name three times in __init__, then again to print it nicely. Python's dataclasses remove that boilerplate, and dunder methods ("double underscore" methods like __repr__) let your objects behave like built-in types. Together they make data-heavy code dramatically cleaner.
What You'll Learn
- How @dataclass removes boilerplate from data-holding classes
- What dunder methods are and the most useful ones
- How to give objects sensible defaults and equality
- When a dataclass beats a plain dict
The Problem Dataclasses Solve
Here is a normal class that just holds data about a dataset:
class Dataset:
    def __init__(self, name, rows, source):
        self.name = name
        self.rows = rows
        self.source = source

d = Dataset("sales", 10000, "csv")
print(d)  # <__main__.Dataset object at 0x10a...> (useless)
That output tells you nothing about the data, and you typed every field name three times. A dataclass fixes both:
from dataclasses import dataclass

@dataclass
class Dataset:
    name: str
    rows: int
    source: str = "csv"  # default value

d = Dataset("sales", 10000)
print(d)  # Dataset(name='sales', rows=10000, source='csv')
The @dataclass decorator auto-generates __init__, a readable __repr__, and __eq__ (so two datasets with the same fields compare as equal). You write the fields once. This is the go-to for config objects, records, and any "bag of named values" in data and AI pipelines.
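To see the generated __eq__ in action, here is a minimal sketch that reuses the Dataset class above. The instance names a, b, and c are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    name: str
    rows: int
    source: str = "csv"  # default value


a = Dataset("sales", 10000)
b = Dataset("sales", 10000, "csv")      # same field values as a
c = Dataset("sales", 10000, "parquet")  # differs in one field

print(a == b)  # True: the generated __eq__ compares fields
print(a == c)  # False: source differs
print(a is b)  # False: still two separate objects
```

Equality here is about field values, not identity, which is usually what you want for records and configs.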
Dunder Methods: How Objects Talk to Python
Dunder methods let Python's built-in syntax work on your objects. You rarely call them directly; Python calls them for you when you use print(), ==, len(), [], and so on.
You write the method; Python calls it from ordinary syntax.
| Dunder method | Purpose | Triggered by |
|---|---|---|
| __init__ | Creating the object | Dataset(...) |
| __repr__ | Showing the object | print(d), repr(d) |
| __eq__ | Comparing | d1 == d2 |
| __len__ | Length | len(d) |
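As a rough illustration of the table, the sketch below writes __repr__ and __eq__ by hand on a plain class. PlainDataset is a hypothetical name used only for this example, and the bodies approximate what @dataclass generates rather than reproducing it exactly.

```python
class PlainDataset:
    """Hypothetical hand-written class, for illustration only."""

    def __init__(self, name, rows):
        # Called by PlainDataset(...)
        self.name = name
        self.rows = rows

    def __repr__(self):
        # Called by repr(d); print(d) falls back to it when __str__ is not defined
        return f"PlainDataset(name={self.name!r}, rows={self.rows!r})"

    def __eq__(self, other):
        # Called by d1 == d2
        if not isinstance(other, PlainDataset):
            return NotImplemented
        return (self.name, self.rows) == (other.name, other.rows)


d1 = PlainDataset("sales", 10000)
d2 = PlainDataset("sales", 10000)

print(d1)        # PlainDataset(name='sales', rows=10000)
print(d1 == d2)  # True
```

None of the dunder methods is called by name; print() and == reach them on your behalf.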
Adding __len__ to a class lets len() work on it:
@dataclass
class Dataset:
    name: str
    rows: int

    def __len__(self):
        return self.rows

print(len(Dataset("sales", 10000)))  # 10000
Defaults, Order, and Immutability
Two practical features you will use constantly:
- Fields with defaults must come after fields without them, exactly like function arguments.
- frozen=True makes instances read-only, which is great for config you never want mutated by accident.
@dataclass(frozen=True)
class Config:
    learning_rate: float = 0.001
    epochs: int = 10

cfg = Config(epochs=20)
# cfg.epochs = 5  # raises FrozenInstanceError: frozen dataclasses can't be changed
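Both rules can be checked directly. The sketch below catches the two errors instead of letting them stop the script. BadOrder is a hypothetical class that exists only to trigger the ordering error, and the variable names frozen_error and order_error are illustrative.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class Config:
    learning_rate: float = 0.001
    epochs: int = 10


cfg = Config(epochs=20)

# Rule 1: a frozen instance rejects attribute assignment.
try:
    cfg.epochs = 5
except FrozenInstanceError as exc:
    frozen_error = exc
    print("mutation blocked:", type(exc).__name__)

# Rule 2: a field without a default may not follow one with a default.
# The error is raised when the class is defined, before any instance exists.
try:
    @dataclass
    class BadOrder:  # hypothetical class, deliberately wrong
        source: str = "csv"
        rows: int
except TypeError as exc:
    order_error = exc
    print("definition rejected:", type(exc).__name__)
```

The ordering mistake surfaces as soon as the module is imported, so it cannot slip through to runtime unnoticed.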
For mutable defaults like lists, do not write tags: list = []; @dataclass rejects it with a ValueError. Use field:
from dataclasses import dataclass, field

@dataclass
class Record:
    tags: list = field(default_factory=list)
default_factory=list gives each new Record its own fresh list instead of sharing one.
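A short sketch of what happens if you try the list literal anyway. BadRecord is a hypothetical class used only to trigger the error, and mutable_default_error is an illustrative variable name.

```python
from dataclasses import dataclass, field

# A bare list as a default is refused at class-definition time.
try:
    @dataclass
    class BadRecord:  # hypothetical class, deliberately wrong
        tags: list = []
except ValueError as exc:
    mutable_default_error = exc
    print("rejected:", type(exc).__name__)


# The supported form: the factory is called once per new instance.
@dataclass
class Record:
    tags: list = field(default_factory=list)


r = Record()
print(r.tags)  # []
```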
Dataclass vs Dict
A dict is fine for throwaway data. A dataclass wins the moment you want named fields you can autocomplete, a fixed shape, type hints your editor checks, and clean printing. In AI pipelines, a typed config dataclass rejects a misspelled epcohs keyword with a TypeError at construction, while a dict silently accepts it.
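The misspelling case can be sketched as follows, assuming the same Config fields used earlier. The names config_dict and typo_error are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Config:
    learning_rate: float = 0.001
    epochs: int = 10


# A dict accepts any key, so the typo goes unnoticed.
config_dict = {"learning_rate": 0.001, "epcohs": 20}
print("epochs" in config_dict)  # False: the intended key was never set

# The generated __init__ only accepts the declared field names.
try:
    Config(epcohs=20)
except TypeError as exc:
    typo_error = exc
    print("rejected:", type(exc).__name__)
```

This check happens at construction. Assigning cfg.epcohs = 5 on a non-frozen instance would still create a new attribute, which is one more reason to consider frozen=True for configs.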
Try It
Run this, then add frozen=True to the decorator and watch the mutation line fail.
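A minimal sketch to experiment with, assuming a hypothetical Record class with a name field and a mutable tags list. The field values and the choice of a.name as the mutated attribute are assumptions made for this exercise.

```python
from dataclasses import dataclass, field


@dataclass  # try changing this to @dataclass(frozen=True)
class Record:
    name: str
    tags: list = field(default_factory=list)


a = Record("train")
b = Record("test")

a.tags.append("clean")  # changes the list inside a only
print(a)  # Record(name='train', tags=['clean'])
print(b)  # Record(name='test', tags=[])

a.name = "validation"  # the mutation line: fails once the class is frozen
print(a.name)
```

With frozen=True the append still works, because it changes the list object rather than reassigning the field; only the a.name assignment is blocked.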
Notice a and b have separate tags lists thanks to default_factory.
Key Takeaways
- @dataclass auto-generates __init__, __repr__, and __eq__ so you write each field once.
- Dunder methods like __repr__, __eq__, and __len__ let built-in syntax work on your objects.
- Defaults follow non-default fields; use field(default_factory=list) for mutable defaults.
- frozen=True makes a dataclass read-only, ideal for config.
- Prefer a dataclass over a dict when you want named fields, a fixed shape, and editor checks.

