How to write efficient data classes in Python

# Introduction

With default settings, data classes store attributes in instance dictionaries. Their instances are not hashable unless you implement hashing manually, and they compare all fields by default. This default behavior is sensible, but it is not optimized for applications that create many instances or need objects as cache keys.

Data classes address these limitations through configuration rather than custom code. You can use parameters to change how instances behave and how much memory they consume. Field-level settings also allow you to exclude attributes from comparisons, define safe defaults for mutable values, or control how initialization works.

This article focuses on the key capabilities of data classes that improve performance and maintainability without increasing complexity.

You can find the code on GitHub.

# 1. Frozen data classes for hashability and safety

Making data classes immutable provides hashability. This allows you to use instances as dictionary keys or store them in sets, as shown below:

from dataclasses import dataclass

@dataclass(frozen=True)
class CacheKey:
    user_id: int
    resource_type: str
    timestamp: int
    
cache = {}
key = CacheKey(user_id=42, resource_type="profile", timestamp=1698345600)
cache[key] = {"data": "expensive_computation_result"}

The frozen=True parameter makes all fields immutable after initialization and automatically implements __hash__(). Without it, you would get a TypeError when trying to use instances as dictionary keys.

This pattern is essential for building caching layers, deduplication logic, or any data structure that requires hashable types. Immutability also prevents entire categories of bugs in which state is modified unexpectedly.
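As a minimal sketch of both properties, the snippet below reuses a class shaped like CacheKey (the values are illustrative) to show that assigning to a field after construction raises dataclasses.FrozenInstanceError, and that instances with equal field values collapse to a single entry in a set:

```python
from dataclasses import dataclass, FrozenInstanceError

@dataclass(frozen=True)
class CacheKey:
    user_id: int
    resource_type: str
    timestamp: int

key = CacheKey(user_id=42, resource_type="profile", timestamp=1698345600)

# Assigning to a field of a frozen instance raises FrozenInstanceError.
try:
    key.user_id = 7
    mutated = True
except FrozenInstanceError:
    mutated = False

# Equal field values give equal hashes, so a set keeps a single copy.
duplicate = CacheKey(user_id=42, resource_type="profile", timestamp=1698345600)
unique_keys = {key, duplicate}
print(mutated, len(unique_keys))
```

The same equality-based hashing is what lets a freshly built key find an entry that was stored under an earlier, equal instance.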

# 2. Slots for improved memory performance

When you instantiate thousands of objects, memory overhead adds up quickly. Here is an example:

from dataclasses import dataclass

@dataclass(slots=True)
class Measurement:
    sensor_id: int
    temperature: float
    humidity: float

The slots=True parameter (available from Python 3.10) eliminates the per-instance __dict__ that Python usually creates. Instead of storing attributes in a dictionary, slots use a more compact, fixed layout.

For a simple data class like this, you save memory on each instance and typically get faster attribute access. The trade-off is that you cannot add new attributes dynamically.

# 3. Custom equality with field parameters

Often, not every field needs to participate in equality checks. This is especially true for metadata or timestamps, as in the following example:

from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class User:
    user_id: int
    email: str
    last_login: datetime = field(compare=False)
    login_count: int = field(compare=False, default=0)

user1 = User(1, "alice@example.com", datetime.now(), 5)
user2 = User(1, "alice@example.com", datetime.now(), 10)
print(user1 == user2)

Output:

True

The compare=False parameter on a field excludes it from the automatically generated __eq__() method.

In this case, two users are considered equal if they have the same ID and email address, regardless of when they logged in and how many times. This prevents false inequality when comparing objects that represent the same logical entity but have different tracking metadata.
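Field-level comparison settings also interact with hashing: when a field's hash option is left at its default, it follows compare, so a field excluded from equality is excluded from the generated hash as well. A minimal sketch, using a trimmed-down frozen variant of User (an assumption made here so that instances are hashable):

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class User:
    user_id: int
    email: str
    login_count: int = field(compare=False, default=0)

first = User(1, "alice@example.com", login_count=5)
second = User(1, "alice@example.com", login_count=10)
other = User(2, "bob@example.com")

# login_count is ignored by both __eq__ and __hash__, so the first two
# instances count as the same set member.
distinct_users = {first, second, other}
print(first == second, len(distinct_users))
```

This keeps deduplication consistent with equality: two records that compare equal never end up as separate set entries or dictionary keys.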

# 4. Factory functions with default_factory

Using mutable default values in function signatures is a classic Python gotcha. Data classes provide a clean solution:

from dataclasses import dataclass, field

@dataclass
class ShoppingCart:
    user_id: int
    items: list[str] = field(default_factory=list)
    metadata: dict = field(default_factory=dict)

cart1 = ShoppingCart(user_id=1)
cart2 = ShoppingCart(user_id=2)
cart1.items.append("laptop")
print(cart2.items)

The default_factory parameter accepts a callable that generates a new default value for each instance. In a plain class, items: list = [] would create one shared list for all instances, the classic mutable default bug! Data classes guard against this by raising a ValueError for such a default and pointing you to default_factory instead.

This pattern works for lists, dictionaries, sets, and any mutable type. You can also pass custom factory functions for more complex initialization logic.
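To illustrate a custom factory, the sketch below adds a hypothetical default_settings function and a settings field (neither is part of the original example). It also shows that a bare mutable default is rejected when the class is defined:

```python
from dataclasses import dataclass, field

def default_settings() -> dict:
    # Hypothetical factory: returns a fresh dict on every call.
    return {"currency": "USD", "gift_wrap": False}

@dataclass
class ShoppingCart:
    user_id: int
    items: list[str] = field(default_factory=list)
    settings: dict = field(default_factory=default_settings)

cart1 = ShoppingCart(user_id=1)
cart2 = ShoppingCart(user_id=2)
cart1.settings["gift_wrap"] = True  # does not leak into cart2

# A bare mutable default is rejected at class-definition time.
try:
    @dataclass
    class BrokenCart:
        items: list = []
    rejected = False
except ValueError:
    rejected = True

print(cart2.settings["gift_wrap"], rejected)
```

Any zero-argument callable works as a factory, so a lambda or a class such as collections.defaultdict can be used in the same position.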

# 5. Post-initialization processing

Sometimes you need to derive fields or validate data after the automatically generated __init__ runs. Here’s how you can achieve this with the __post_init__ hook:

from dataclasses import dataclass, field

@dataclass
class Rectangle:
    width: float
    height: float
    area: float = field(init=False)
    
    def __post_init__(self):
        # Validate first so area is never computed from invalid dimensions.
        if self.width <= 0 or self.height <= 0:
            raise ValueError("Dimensions must be positive")
        self.area = self.width * self.height

rect = Rectangle(5.0, 3.0)
print(rect.area)

The __post_init__ method runs immediately after the generated __init__ finishes. The init=False parameter on area prevents it from becoming an __init__ parameter.

This pattern is ideal for calculated fields, validation logic, or normalizing input. It can also be used to transform fields or establish invariants that depend on multiple fields.
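Input normalization follows the same shape. The Contact class below is a hypothetical example, not taken from the article's code, and its "@" check is only a placeholder for real validation:

```python
from dataclasses import dataclass

@dataclass
class Contact:
    name: str
    email: str

    def __post_init__(self):
        # Normalize input before validating it.
        self.name = self.name.strip()
        self.email = self.email.strip().lower()
        if "@" not in self.email:
            raise ValueError(f"Invalid email address: {self.email!r}")

contact = Contact(name="  Alice ", email=" Alice@Example.COM ")
print(contact)
```

Note that plain assignment inside __post_init__ only works on non-frozen data classes; a frozen class would need object.__setattr__ for the same step.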

# 6. Ordering with the order parameter

Sometimes you need instances of data classes to be sortable. Here is an example:

from dataclasses import dataclass

@dataclass(order=True)
class Task:
    priority: int
    name: str
    
tasks = [
    Task(priority=3, name="Low priority task"),
    Task(priority=1, name="Critical bug fix"),
    Task(priority=2, name="Feature request")
]

sorted_tasks = sorted(tasks)
for task in sorted_tasks:
    print(f"{task.priority}: {task.name}")

Output:

1: Critical bug fix
2: Feature request
3: Low priority task

The order=True parameter generates comparison methods (__lt__, __le__, __gt__, __ge__) based on the order of the fields. Fields are compared from left to right, so priority takes precedence over name in this example.

This feature allows you to sort your collection naturally without having to write custom comparison logic or key functions.
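Ordering can be combined with field(compare=False) when only some fields should decide the order. The sketch below is one possible variation on Task, assuming you want the name ignored in comparisons, and uses the standard heapq module as a priority queue:

```python
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class Task:
    priority: int
    name: str = field(compare=False)

queue: list[Task] = []
heapq.heappush(queue, Task(priority=3, name="Low priority task"))
heapq.heappush(queue, Task(priority=1, name="Critical bug fix"))
heapq.heappush(queue, Task(priority=2, name="Feature request"))

# heappop relies on the generated __lt__, so the lowest priority
# number comes out first.
first = heapq.heappop(queue)
print(first.name)
```

Keep in mind that compare=False also removes name from equality, so two tasks with the same priority compare equal in this variant.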

# 7. Field ordering and InitVar

When the initialization logic requires values that should not become instance attributes, you can use InitVar, as shown below:

from dataclasses import dataclass, field, InitVar

@dataclass
class DatabaseConnection:
    host: str
    port: int
    ssl: InitVar[bool] = True
    connection_string: str = field(init=False)
    
    def __post_init__(self, ssl: bool):
        protocol = "https" if ssl else "http"
        self.connection_string = f"{protocol}://{self.host}:{self.port}"

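conn = DatabaseConnection("localhost", 5432, ssl=True)
print(conn.connection_string)
# Check the instance's own attributes: because ssl has a default, the class
# still carries ssl=True, so hasattr(conn, 'ssl') would report True.
print('ssl' in vars(conn))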

Output:

https://localhost:5432
False

The InitVar type hint marks a parameter that is passed to __init__ and __post_init__ but does not become a field. This keeps the instance clean while still allowing complex initialization logic. The ssl flag affects how we build the connection string, but it doesn't need to persist afterwards.
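One way to confirm that an InitVar is not a real field is to inspect the class with dataclasses.fields(). The sketch below repeats the class above and passes ssl=False purely for illustration:

```python
from dataclasses import dataclass, field, fields, InitVar

@dataclass
class DatabaseConnection:
    host: str
    port: int
    ssl: InitVar[bool] = True
    connection_string: str = field(init=False)

    def __post_init__(self, ssl: bool):
        protocol = "https" if ssl else "http"
        self.connection_string = f"{protocol}://{self.host}:{self.port}"

conn = DatabaseConnection("localhost", 5432, ssl=False)

# fields() lists real fields only; the InitVar pseudo-field is absent.
field_names = [f.name for f in fields(conn)]
print(field_names)
print(conn)
```

Because ssl is not a field, it is also left out of the generated __repr__ and __eq__.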

# When not to use data classes

Data classes are not always the right tool. Don't use data classes when:

- You need complex inheritance hierarchies with custom __init__ logic at multiple levels
- You are building classes with significant behavior and methods (use regular classes for domain objects)
- You need the validation, serialization, or parsing features that libraries like Pydantic or attrs provide
- You are working with classes that have complex state management or lifecycle requirements

Data classes are best used as lightweight data containers rather than full-featured domain objects.

# Conclusion

Writing efficient data classes is about understanding how their options interact, not memorizing them all. Knowing when and why to use each feature matters more than remembering every parameter.

As discussed in the article, features such as immutability, slots, field customization, and post-initialization hooks enable you to write Python objects that are lean, predictable, and safe. These patterns help prevent bugs and reduce memory overhead without increasing complexity.

With this approach, data classes enable you to write clean, efficient, and maintainable code. Happy coding!

Bala Priya C is a software developer and technical writer from India. She likes working at the intersection of mathematics, programming, data analytics, and content creation. Her areas of interest and specialization include DevOps, data analytics, and natural language processing. She enjoys reading, writing, coding, and coffee! She is currently working on learning and sharing her knowledge with the developer community by writing tutorials, guides, reviews, and more. Bala also creates engaging resource overviews and coding tutorials.
