Data Objects
We often pass around data in dictionaries. Typing these dictionaries is simple, but very limited. We can specify a type for the keys and a type for the values:
movie_ratings: dict[str, float] = {
    "The Matrix": 8.7,
    "The Matrix Reloaded": 7.2,
    "The Matrix Revolutions": 6.7,
    "The Matrix Resurrections": 5.7
}
This only works well for uniform dictionaries because a single type has to describe every possible value. It's also not possible to say which keys must (or must not) exist: accessing a missing key raises a KeyError at runtime, and the type checker cannot predict it.
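To make the limitation concrete, here is a minimal sketch. The key "The Matrix 5" is a made-up example of a key that is not in the dictionary. A type checker accepts the lookup because the annotation only describes key and value types, yet the lookup fails at runtime.

```python
movie_ratings: dict[str, float] = {
    "The Matrix": 8.7,
    "The Matrix Reloaded": 7.2,
}

# The annotation says nothing about which keys exist, so a type checker
# has no objection to this lookup. "The Matrix 5" is a hypothetical key.
lookup_failed = False
try:
    movie_ratings["The Matrix 5"]
except KeyError:
    lookup_failed = True

# dict.get() makes the possible absence explicit: it returns None
# instead of raising when the key is missing.
safe_rating = movie_ratings.get("The Matrix 5")

print(lookup_failed)  # True
print(safe_rating)    # None
```

Using dict.get() moves the "key might be missing" case into the type (float or None), which a type checker can then ask us to handle.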
Beyond Dictionaries
There are better tools available when the shape of our data becomes more specific:
TypedDict
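TypedDict is especially useful because it works wherever dicts are already being used. It is structurally typed: any dict literal with the right keys and value types is compatible, so no constructor call is required at the point of creation. A type checker does need to see the literal in a context where the TypedDict is expected (an annotated variable or a direct argument); otherwise it infers an ordinary dict type.
A TypedDict has no runtime behavior of its own, so it cannot specify a default value for a key, nor can it include methods.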
from typing import TypedDict

class Movie(TypedDict):
    name: str
    year: int
    rating: float

def get_movie_title(movie: Movie) -> str:
    return f"{movie['name']} ({movie['year']})"

# The annotation lets the type checker verify this literal against Movie.
the_matrix: Movie = {
    "name": "The Matrix",
    "year": 1999,
    "rating": 8.7
}
get_movie_title(the_matrix) #> The Matrix (1999)
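As a small illustration of "it works wherever dicts are already being used", the sketch below shows that a TypedDict value is an ordinary dict at runtime, whether it is written as a literal or created by calling the class. The variable name reloaded is introduced here only for the example.

```python
from typing import TypedDict


class Movie(TypedDict):
    name: str
    year: int
    rating: float


# A plain literal, checked against Movie because of the annotation.
the_matrix: Movie = {"name": "The Matrix", "year": 1999, "rating": 8.7}

# Calling the class is also allowed, but it still builds an ordinary dict.
reloaded = Movie(name="The Matrix Reloaded", year=2003, rating=7.2)

print(type(the_matrix) is dict)  # True
print(type(reloaded) is dict)    # True
print(reloaded["year"])          # 2003
```

Because no wrapper object exists at runtime, these values can be passed to any existing code that expects a dict, such as json.dumps().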
(Data)Class
Classes can also provide a way to pass around well-typed data:
class Movie:
    def __init__(self, name: str, year: int, rating: float = 0):
        self.name = name
        self.year = year
        self.rating = rating
The dataclass decorator makes this a little easier by automatically generating a typed __init__() method for a class. The generated __init__() is equivalent to the one above:
from dataclasses import dataclass, asdict
@dataclass
class Movie:
    name: str
    year: int
    rating: float = 0
Dataclasses also provide a __repr__() method for better debugging and logging:
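# MovieClass and MovieDataclass are the two definitions above, renamed to tell them apart.
print(MovieClass("The Matrix", 1999))
#> <__main__.MovieClass object at 0x104ba3eb0>
print(MovieDataclass("The Matrix", 1999))
#> MovieDataclass(name='The Matrix', year=1999, rating=0)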
Unlike TypedDicts, dataclasses support runtime behavior; after all, they are just classes. Default values and methods are fair game.
from dataclasses import dataclass, asdict
@dataclass
class Movie:
    name: str
    year: int
    rating: float = 0 # default value

    @property
    def title(self) -> str:
        return f"{self.name} ({self.year})"

the_matrix = Movie(
    name="The Matrix",
    year=1999
)
the_matrix.title #> The Matrix (1999)
As described in the subtyping section, classes are nominally typed. A value is accepted where a dataclass is expected only if it is an instance of that dataclass or of one of its subclasses; having the same fields is not enough.
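A short sketch of what nominal typing means in practice. Film and Sequel are hypothetical classes introduced only for this example: Film has exactly the same fields as Movie but is unrelated to it, while Sequel inherits from Movie.

```python
from dataclasses import dataclass


@dataclass
class Movie:
    name: str
    year: int
    rating: float = 0


@dataclass
class Film:  # hypothetical look-alike: same fields, no relation to Movie
    name: str
    year: int
    rating: float = 0


@dataclass
class Sequel(Movie):  # hypothetical subclass of Movie
    part: int = 2


def get_movie_title(movie: Movie) -> str:
    return f"{movie.name} ({movie.year})"


sequel = Sequel("The Matrix Reloaded", 2003)
film = Film("The Matrix", 1999)

# Accepted: a Sequel is a Movie by inheritance.
title = get_movie_title(sequel)
print(title)  # The Matrix Reloaded (2003)

# A type checker rejects get_movie_title(film) even though the fields match,
# and the runtime agrees that film is not a Movie:
print(isinstance(sequel, Movie))  # True
print(isinstance(film, Movie))    # False
```

Note that Python itself would still run get_movie_title(film) without complaint; the rejection comes from the type checker, not from the interpreter.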
Dataclasses can be converted to dictionaries, but the type information is lost:
from dataclasses import asdict
reveal_type(asdict(the_matrix)) #> dict[str, Any]
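If the dictionary form needs to stay typed, one possible workaround is to pair the dataclass with a matching TypedDict and use typing.cast(). This is a sketch under assumptions, not a recommended pattern: MovieDict is a hypothetical TypedDict that we must keep in sync with Movie by hand, and cast() performs no checking at all.

```python
from dataclasses import asdict, dataclass
from typing import TypedDict, cast


@dataclass
class Movie:
    name: str
    year: int
    rating: float = 0


class MovieDict(TypedDict):  # hypothetical mirror of Movie's fields
    name: str
    year: int
    rating: float


the_matrix = Movie(name="The Matrix", year=1999)

# cast() does nothing at runtime; it only tells the type checker to treat
# the dict[str, Any] returned by asdict() as a MovieDict.
movie_dict = cast(MovieDict, asdict(the_matrix))

print(movie_dict)  # {'name': 'The Matrix', 'year': 1999, 'rating': 0}
```

The trade-off is that a mismatch between Movie and MovieDict goes unnoticed by the type checker, so this only helps when the two definitions are maintained together.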
TypedDict vs Dataclass
In comparison:
| | TypedDict | Dataclass |
|---|---|---|
| subtyping | structural | nominal |
| default values | no | yes |
| methods | no | yes |
| type errors | when used | when constructed |
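The last row describes what a static type checker reports. As a rough runtime analogue (an illustration of the same timing difference, not what the type checker itself does), the sketch below omits the year field in both styles: the plain dict is created without complaint and only fails when the key is read, while the dataclass fails as soon as it is constructed. The errors list is a helper introduced for the example.

```python
from dataclasses import dataclass


@dataclass
class Movie:
    name: str
    year: int
    rating: float = 0


errors: list[str] = []

# Dict: the incomplete value is created without any error ...
incomplete = {"name": "The Matrix"}
try:
    incomplete["year"]  # ... and fails only when the missing key is used.
except KeyError:
    errors.append("KeyError on use")

# Dataclass: leaving out a required field fails at construction time.
try:
    Movie(name="The Matrix")  # type: ignore[call-arg]
except TypeError:
    errors.append("TypeError on construction")

print(errors)  # ['KeyError on use', 'TypeError on construction']
```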
TypedDicts are easier to use with existing code because they are just like the dictionaries that are probably already being used. If that compatibility isn't a concern, the runtime capabilities of dataclasses may be compelling.