
# Introduction
Standard Python objects store attributes in instance dictionaries. Plain class instances hash and compare by identity unless you implement `__eq__` and `__hash__` manually, while data classes compare all fields by default and are unhashable unless you configure them otherwise. These defaults are reasonable, but they are not optimized for applications that create many instances or need objects as cache keys.
Data classes address these limitations through configuration rather than custom code. You can use decorator parameters to change how instances behave and how much memory they consume. Field-level settings also let you exclude attributes from comparisons, define safe defaults for mutable values, or control how initialization works.
This article focuses on the key capabilities of data classes that improve performance and maintainability without increasing complexity.
You can find the code on GitHub.
# 1. Frozen data classes for hashability and safety
Making a data class immutable also makes it hashable. This lets you use instances as dictionary keys or store them in sets, as shown below:

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class CacheKey:
        user_id: int
        resource_type: str
        timestamp: int

    cache = {}
    key = CacheKey(user_id=42, resource_type="profile", timestamp=1698345600)
    cache[key] = {"data": "expensive_computation_result"}

The `frozen=True` parameter makes all fields immutable after initialization and automatically generates `__hash__()`. Without it, you would get a `TypeError` when trying to use instances as dictionary keys.
This pattern is essential for building caching layers, deduplication logic, or any data structure that requires hashable types. Immutability also prevents entire categories of bugs in which state is modified unexpectedly.
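
To make the immutability and hashing behavior concrete, here is a minimal sketch that reuses the `CacheKey` class from above. The use of `dataclasses.replace()` to derive a modified copy is an addition for illustration, not something the example above relies on.

```python
from dataclasses import dataclass, FrozenInstanceError, replace

@dataclass(frozen=True)
class CacheKey:
    user_id: int
    resource_type: str
    timestamp: int

key = CacheKey(user_id=42, resource_type="profile", timestamp=1698345600)

# Assigning to a field of a frozen instance raises FrozenInstanceError.
try:
    key.timestamp = 0
    mutation_blocked = False
except FrozenInstanceError:
    mutation_blocked = True

# Equal field values give equal hashes, so duplicates collapse in a set.
duplicate = CacheKey(user_id=42, resource_type="profile", timestamp=1698345600)
unique_keys = {key, duplicate}

# replace() builds a modified copy and leaves the original untouched.
refreshed = replace(key, timestamp=1698349200)

print(mutation_blocked)                    # True
print(len(unique_keys))                    # 1
print(refreshed.timestamp, key.timestamp)  # 1698349200 1698345600
```

Deriving a new instance with `replace()` is usually the simplest way to "update" a frozen object without giving up hashability.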
# 2. Slots for improved memory performance
When you instantiate thousands of objects, memory overhead adds up quickly. Here is an example:

    from dataclasses import dataclass

    @dataclass(slots=True)
    class Measurement:
        sensor_id: int
        temperature: float
        humidity: float

The `slots=True` parameter (available since Python 3.10) eliminates the per-instance `__dict__` that Python normally creates. Instead of storing attributes in a dictionary, slots use a more compact, fixed-size layout.
For a simple data class like this, you save the per-instance dictionary overhead and get faster attribute access. The tradeoff is that you cannot add new attributes dynamically.
# 3. Custom equality with field parameters
Often, not every field should participate in equality checks. This is especially true for metadata or timestamps, as in the following example:

    from dataclasses import dataclass, field
    from datetime import datetime

    @dataclass
    class User:
        user_id: int
        email: str
        last_login: datetime = field(compare=False)
        login_count: int = field(compare=False, default=0)

    user1 = User(1, "alice@example.com", datetime.now(), 5)
    user2 = User(1, "alice@example.com", datetime.now(), 10)
    print(user1 == user2)

Output:

    True

The `compare=False` parameter on a field excludes it from the automatically generated `__eq__()` method.
In this case, two users are considered equal if they have the same ID and email address, regardless of when they logged in and how many times. This prevents false inequality when comparing objects that represent the same logical unit but have different tracking metadata.
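
When a data class is also frozen, fields marked `compare=False` are left out of the generated `__hash__()` by default, so the same idea carries over to sets and dictionary keys. The following is a small sketch using a hypothetical `UserRecord` class with fixed timestamps so the result is reproducible.

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass(frozen=True)
class UserRecord:  # hypothetical class for illustration
    user_id: int
    email: str
    last_login: datetime = field(compare=False)

morning = UserRecord(1, "alice@example.com", datetime(2024, 1, 1, 9, 0))
evening = UserRecord(1, "alice@example.com", datetime(2024, 1, 1, 18, 0))

# last_login is ignored by both __eq__ and the generated __hash__.
print(morning == evening)              # True
print(hash(morning) == hash(evening))  # True
print(len({morning, evening}))         # 1
```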
# 4. Factory functions with default_factory
Using mutable default values in function signatures is a well-known Python gotcha. Data classes provide a clean solution:

    from dataclasses import dataclass, field

    @dataclass
    class ShoppingCart:
        user_id: int
        items: list[str] = field(default_factory=list)
        metadata: dict = field(default_factory=dict)

    cart1 = ShoppingCart(user_id=1)
    cart2 = ShoppingCart(user_id=2)
    cart1.items.append("laptop")
    print(cart2.items)  # [] (cart2 has its own list)

The `default_factory` parameter accepts a callable that generates a new default value for each instance. Without it, a single default list would be shared by all instances, a classic mutable default bug. Data classes guard against this: declaring `items: list = []` raises a `ValueError` when the class is defined.
This pattern works for lists, dictionaries, sets, and any other mutable type. You can also pass custom factory functions for more complex initialization logic.
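
As a sketch of a custom factory, the example below generates a fresh identifier per instance. The `Session` class and the `new_session_id` helper are hypothetical names introduced for illustration. The last part shows that the `dataclass` decorator rejects a bare list default outright.

```python
from dataclasses import dataclass, field
from uuid import uuid4

def new_session_id() -> str:
    """Hypothetical factory: returns a fresh random identifier on each call."""
    return uuid4().hex

@dataclass
class Session:
    user_id: int
    session_id: str = field(default_factory=new_session_id)
    tags: set[str] = field(default_factory=set)

first = Session(user_id=1)
second = Session(user_id=1)
first.tags.add("admin")

print(first.session_id != second.session_id)  # True
print(second.tags)                            # set()

# A mutable literal as a default is rejected when the class is defined.
try:
    @dataclass
    class BrokenCart:
        items: list = []
    rejected = False
except ValueError:
    rejected = True

print(rejected)  # True
```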
# 5. Post-initialization processing
Sometimes you need to derive fields or validate data after the automatically generated `__init__` runs. Here's how you can achieve this with the `__post_init__` hook:

    from dataclasses import dataclass, field

    @dataclass
    class Rectangle:
        width: float
        height: float
        area: float = field(init=False)

        def __post_init__(self):
            # Validate first so an invalid rectangle never gets an area.
            if self.width <= 0 or self.height <= 0:
                raise ValueError("Dimensions must be positive")
            self.area = self.width * self.height

    rect = Rectangle(5.0, 3.0)
    print(rect.area)  # 15.0

The `__post_init__` method runs immediately after the generated `__init__` finishes. The `init=False` parameter on `area` prevents it from becoming an `__init__` parameter.
This pattern is ideal for calculated fields, validation logic, or normalizing input. It can also be used to transform fields or establish invariants that depend on multiple fields.
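
Normalizing input follows the same pattern. The sketch below uses a hypothetical `EmailAddress` class with a deliberately simplistic validity check. Because the class is frozen, ordinary assignment inside `__post_init__` would fail, so it falls back on `object.__setattr__`.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EmailAddress:  # hypothetical class for illustration
    value: str

    def __post_init__(self):
        normalized = self.value.strip().lower()
        # Simplistic check, for illustration only.
        if "@" not in normalized:
            raise ValueError(f"Not an email address: {self.value!r}")
        # Frozen instances block normal assignment, so bypass it explicitly.
        object.__setattr__(self, "value", normalized)

address = EmailAddress("  Alice@Example.COM ")
print(address.value)  # alice@example.com

try:
    EmailAddress("not-an-email")
    invalid_rejected = False
except ValueError:
    invalid_rejected = True

print(invalid_rejected)  # True
```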
# 6. Ordering with the order parameter
Sometimes you need instances of data classes to be sortable. Here is an example:

    from dataclasses import dataclass

    @dataclass(order=True)
    class Task:
        priority: int
        name: str

    tasks = [
        Task(priority=3, name="Low priority task"),
        Task(priority=1, name="Critical bug fix"),
        Task(priority=2, name="Feature request"),
    ]

    sorted_tasks = sorted(tasks)
    for task in sorted_tasks:
        print(f"{task.priority}: {task.name}")

Output:

    1: Critical bug fix
    2: Feature request
    3: Low priority task

The `order=True` parameter generates comparison methods (`__lt__`, `__le__`, `__gt__`, `__ge__`) based on the order of the fields. Fields are compared from left to right, so `priority` takes precedence over `name` in this example.
This feature allows you to sort your collection naturally without having to write custom comparison logic or key functions.
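
Generated ordering also works with the standard library's `heapq` module. The sketch below uses a hypothetical `Job` class whose `payload` field is excluded with `compare=False`. Without that, two jobs that tied on `priority` and `name` would fall through to comparing dictionaries, which raises `TypeError`.

```python
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class Job:  # hypothetical class for illustration
    priority: int
    name: str
    # Excluded from both equality and ordering comparisons.
    payload: dict = field(default_factory=dict, compare=False)

queue = []
heapq.heappush(queue, Job(2, "send report", {"to": "team"}))
heapq.heappush(queue, Job(1, "fix outage", {"service": "api"}))
heapq.heappush(queue, Job(2, "archive logs", {"days": 30}))

# Jobs come out by priority first, then by name.
processed = [heapq.heappop(queue).name for _ in range(3)]
print(processed)  # ['fix outage', 'archive logs', 'send report']
```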
# 7. Init-only fields with InitVar
When the initialization logic requires values that should not become instance attributes, you can use `InitVar`, as shown below:

    from dataclasses import dataclass, field, InitVar

    @dataclass
    class DatabaseConnection:
        host: str
        port: int
        # No default on the InitVar: a default would remain as a class
        # attribute, and hasattr(conn, 'ssl') would then return True.
        ssl: InitVar[bool]
        connection_string: str = field(init=False)

        def __post_init__(self, ssl: bool):
            protocol = "https" if ssl else "http"
            self.connection_string = f"{protocol}://{self.host}:{self.port}"

    conn = DatabaseConnection("localhost", 5432, ssl=True)
    print(conn.connection_string)
    print(hasattr(conn, 'ssl'))

Output:

    https://localhost:5432
    False

The `InitVar` type hint marks a parameter that is passed to `__init__` and `__post_init__` but does not become a field. This keeps the instance clean while still allowing complex initialization logic. The `ssl` flag affects how we build the connection string, but it does not need to persist afterwards.
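
Another situation where an init-only value helps is accepting a secret that should not be stored as given. The sketch below uses a hypothetical `Account` class. The unsalted SHA-256 digest is for illustration only and is not a recommendation for real password storage.

```python
import hashlib
from dataclasses import dataclass, field, InitVar

@dataclass
class Account:  # hypothetical class for illustration
    username: str
    password: InitVar[str]
    password_digest: str = field(init=False, repr=False)

    def __post_init__(self, password: str):
        # Illustration only: real systems should use a salted, slow hash.
        self.password_digest = hashlib.sha256(password.encode("utf-8")).hexdigest()

account = Account("alice", password="s3cret")

print(account)                       # Account(username='alice')
print(hasattr(account, "password"))  # False
print(len(account.password_digest))  # 64
```

Only the digest is stored on the instance, and `repr=False` keeps it out of printed output.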
# When not to use data classes
Data classes are not always the right tool. Don't use data classes when:
- You need complex inheritance hierarchies with custom `__init__` logic at multiple levels
- You are building classes with significant behavior and methods (use regular classes for domain objects)
- You need the validation, serialization, or parsing features that libraries like Pydantic or attrs provide
- You are working with classes that have complex state management or lifecycle requirements
Data classes are best used as lightweight data containers rather than full-featured domain objects.
# Conclusion
Writing effective data classes is about understanding how their options interact, not memorizing them all. Knowing when and why to use each feature matters more than remembering every parameter.
As discussed in the article, features such as immutability, slots, field customization, and post-initialization hooks let you write Python objects that are lean, predictable, and safe. These patterns help prevent bugs and reduce memory overhead without adding complexity.
With this approach, data classes let you write clean, efficient, and maintainable code. Happy coding!
Bala Priya C is a software developer and technical writer from India. She likes working at the intersection of mathematics, programming, data analytics, and content creation. Her areas of interest and expertise include DevOps, data analytics, and natural language processing. She enjoys reading, writing, coding, and coffee! She is currently working on learning and sharing her knowledge with the developer community by writing tutorials, guides, reviews, and more. Bala also creates engaging resource overviews and coding tutorials.
