Dataclasses and Dunder Methods

Writing a class just to hold a few fields gets tedious fast: you repeat every field name three times in __init__, then again to print it nicely. Python's dataclasses remove that boilerplate, and dunder methods ("double underscore" methods like __repr__) let your objects behave like built-in types. Together they make data-heavy code dramatically cleaner.

What You'll Learn

How @dataclass removes boilerplate from data-holding classes
What dunder methods are and the most useful ones
How to give objects sensible defaults and equality
When a dataclass beats a plain dict

The Problem Dataclasses Solve

Here is a normal class that just holds data about a dataset:

class Dataset:
    def __init__(self, name, rows, source):
        self.name = name
        self.rows = rows
        self.source = source

d = Dataset("sales", 10000, "csv")
print(d)   # <__main__.Dataset object at 0x10a...>  (useless)

That output tells you nothing about the data, and you typed every field name three times. A dataclass fixes both:

from dataclasses import dataclass

@dataclass
class Dataset:
    name: str
    rows: int
    source: str = "csv"     # default value

d = Dataset("sales", 10000)
print(d)   # Dataset(name='sales', rows=10000, source='csv')

The @dataclass decorator auto-generates __init__, a readable __repr__, and __eq__ (so two datasets with the same fields compare as equal). You write the fields once. This is the go-to for config objects, records, and any "bag of named values" in data and AI pipelines.
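To make the generated methods concrete, here is a minimal sketch that reuses the Dataset class from above. The variable names a, b, and c are only illustrative.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    name: str
    rows: int
    source: str = "csv"


a = Dataset("sales", 10000)
b = Dataset("sales", 10000, "csv")
c = Dataset("sales", 10000, "parquet")

print(a == b)    # True: same class and same field values
print(a == c)    # False: source differs
print(a is b)    # False: they are still two separate objects
print(repr(a))   # Dataset(name='sales', rows=10000, source='csv')
```

The generated __eq__ compares field values, so equality here means "same data", not "same object in memory".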

Dunder Methods: How Objects Talk to Python

Dunder methods let Python's built-in syntax work on your objects. You rarely call them directly; Python calls them for you when you use print(), ==, len(), [], and so on.

You write the method; Python calls it from ordinary syntax.

Dunder method	Purpose	Triggered by
__init__	Creating the object	Dataset(...)
__repr__	Showing the object	print(d), repr(d)
__eq__	Comparing	d1 == d2
__len__	Length	len(d)
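As a rough illustration of the table above, the sketch below writes the same four methods by hand on a plain class. Batch and its items attribute are hypothetical names chosen for this example; they do not come from the lesson.

```python
class Batch:
    """Hypothetical container used only to illustrate the four dunder methods."""

    def __init__(self, items):
        # Called by Batch(...)
        self.items = list(items)

    def __repr__(self):
        # Called by repr(b), and by print(b) because no __str__ is defined
        return f"Batch(items={self.items!r})"

    def __eq__(self, other):
        # Called by b1 == b2
        if not isinstance(other, Batch):
            return NotImplemented
        return self.items == other.items

    def __len__(self):
        # Called by len(b)
        return len(self.items)


b1 = Batch([1, 2, 3])
b2 = Batch([1, 2, 3])

print(b1)         # Batch(items=[1, 2, 3])
print(b1 == b2)   # True
print(len(b1))    # 3
```

Nothing in the calling code mentions a dunder name: ordinary syntax is enough.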

Adding __len__ to a class lets len() work on it:

from dataclasses import dataclass

@dataclass
class Dataset:
    name: str
    rows: int

    def __len__(self):
        # len(dataset) reports the row count
        return self.rows

print(len(Dataset("sales", 10000)))   # 10000
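The [] syntax mentioned earlier works the same way, through __getitem__. The sketch below is one possible variation, assuming a hypothetical records field that holds the rows themselves instead of a row count.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    name: str
    records: list   # hypothetical field for this sketch: the rows themselves

    def __len__(self):
        return len(self.records)

    def __getitem__(self, index):
        # Called for ds[index]; delegates to the underlying list
        return self.records[index]


ds = Dataset("sales", [{"id": 1}, {"id": 2}])
print(len(ds))   # 2
print(ds[0])     # {'id': 1}
```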

Defaults, Order, and Immutability

Two practical features you will use constantly:

Defaults must come after non-default fields, exactly like function arguments.
frozen=True makes instances read-only, which is great for config you never want mutated by accident.

from dataclasses import dataclass

@dataclass(frozen=True)
class Config:
    learning_rate: float = 0.001
    epochs: int = 10

cfg = Config(epochs=20)
# cfg.epochs = 5   # raises dataclasses.FrozenInstanceError: fields of a frozen instance can't be reassigned
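Both rules can be checked directly. The following sketch reuses Config from above; Broken, longer, and the two boolean flags are hypothetical names added for the demonstration. It also shows dataclasses.replace, which is a common way to get a modified copy of a frozen instance.

```python
from dataclasses import dataclass, FrozenInstanceError, replace


@dataclass(frozen=True)
class Config:
    learning_rate: float = 0.001
    epochs: int = 10


cfg = Config(epochs=20)

# Assigning to a field of a frozen instance raises FrozenInstanceError.
frozen_blocked = False
try:
    cfg.epochs = 5
except FrozenInstanceError:
    frozen_blocked = True
print(frozen_blocked)   # True

# replace() builds a new instance with the selected fields changed.
longer = replace(cfg, epochs=50)
print(cfg.epochs, longer.epochs)   # 20 50

# A non-default field after a default one is rejected when the class
# is defined, before any instance is created.
order_rejected = False
try:
    @dataclass
    class Broken:
        source: str = "csv"
        name: str
except TypeError:
    order_rejected = True
print(order_rejected)   # True
```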

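For mutable defaults like lists, do not write tags: list = []. The decorator rejects it with a ValueError, because every instance would otherwise share the same list. Use field with default_factory instead: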

from dataclasses import dataclass, field

@dataclass
class Record:
    tags: list = field(default_factory=list)

default_factory=list gives each new Record its own fresh list instead of sharing one.
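A short sketch of that behavior, reusing Record from above. BadRecord and the default_rejected flag are hypothetical names used only to show the error.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    tags: list = field(default_factory=list)


a = Record()
b = Record()
a.tags.append("train")

print(a.tags)             # ['train']
print(b.tags)             # []
print(a.tags is b.tags)   # False: each instance got its own list

# A bare list default is refused when the class is defined.
default_rejected = False
try:
    @dataclass
    class BadRecord:
        tags: list = []
except ValueError:
    default_rejected = True
print(default_rejected)   # True
```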

Dataclass vs Dict

A dict is fine for throwaway data. A dataclass wins the moment you want named fields you can autocomplete, a fixed shape, type hints your editor checks, and clean printing. In AI pipelines, a config dataclass rejects a misspelled keyword such as epcohs with a TypeError as soon as the object is constructed, while a dict silently accepts the wrong key.
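The misspelling case can be sketched as follows, assuming the same Config fields used earlier. cfg_dict and typo_caught are hypothetical names for this example.

```python
from dataclasses import dataclass


@dataclass
class Config:
    learning_rate: float = 0.001
    epochs: int = 10


# A dict accepts the misspelled key without complaint.
cfg_dict = {"learning_rate": 0.001, "epcohs": 20}
print(cfg_dict.get("epochs", 10))   # 10: the intended value of 20 is silently ignored

# The dataclass constructor rejects an unknown keyword right away.
typo_caught = False
try:
    Config(epcohs=20)
except TypeError:
    typo_caught = True
print(typo_caught)   # True
```

The check happens in the generated __init__, so it covers construction. Type hints themselves are not enforced at runtime; they are there for your editor and type checker.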

Try It

Run this, then add frozen=True to the decorator and watch the mutation line fail.

Notice a and b have separate tags lists thanks to default_factory.
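The exercise can also be run as a standalone script. The following is a minimal sketch, assuming a hypothetical Record class with name and tags fields; it is an illustration in the spirit of the exercise, not the playground's own code.

```python
from dataclasses import dataclass, field


@dataclass   # exercise: change this to @dataclass(frozen=True)
class Record:
    name: str
    tags: list = field(default_factory=list)


a = Record("a")
b = Record("b")
a.tags.append("train")

print(a)   # Record(name='a', tags=['train'])
print(b)   # Record(name='b', tags=[])

a.name = "renamed"   # the mutation line: fails once frozen=True is set
print(a.name)        # renamed
```

Note that frozen=True blocks reassigning a field, not changing the contents of a list stored in one, so a.tags.append(...) keeps working either way.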

Key Takeaways

@dataclass auto-generates __init__, __repr__, and __eq__ so you write each field once.
Dunder methods like __repr__, __eq__, and __len__ let built-in syntax work on your objects.
Defaults follow non-default fields; use field(default_factory=list) for mutable defaults.
frozen=True makes a dataclass read-only, ideal for config.
Prefer a dataclass over a dict when you want named fields, a fixed shape, and editor checks.
