Data | typy

Data Objects

We often pass around data in dictionaries. Typing these dictionaries is simple, but very limited. We can specify a type for the keys and a type for the values:

movie_ratings: dict[str, float] = {
  "The Matrix": 8.7,
  "The Matrix Reloaded": 7.2,
  "The Matrix Revolutions": 6.7,
  "The Matrix Resurrections": 5.7
}

This only works well for uniform dictionaries because we have to describe all the possible values with a single type. It's also not possible to indicate the (non)existence of certain keys. Accessing a non-existent key raises a KeyError at runtime, and the type checker cannot predict it.
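As a small illustration of that limitation, the sketch below looks up a title that is not in the dictionary. The helper name `lookup_rating` and the `None` fallback are assumptions made for this example. A type checker accepts both calls, because any `str` is a valid key for `dict[str, float]`.

```python
from typing import Optional

movie_ratings: dict[str, float] = {
    "The Matrix": 8.7,
    "The Matrix Reloaded": 7.2,
}


def lookup_rating(ratings: dict[str, float], title: str) -> Optional[float]:
    """Hypothetical helper: return the rating, or None if the title is missing."""
    try:
        return ratings[title]  # type-checks for any str key
    except KeyError:
        # The checker could not warn us: the set of keys is not part of the type.
        return None


known = lookup_rating(movie_ratings, "The Matrix")
missing = lookup_rating(movie_ratings, "The Matrix 5")
print(known, missing)  # 8.7 None
```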

Beyond Dictionaries

There are better tools available when the shape of our data becomes more specific:

TypedDict

TypedDict is especially useful because it works wherever dicts are already being used. It is structurally typed, so no constructor call is required at the point of creation: a dict literal with the right keys and value types is checked against the TypedDict wherever that type is expected.

A TypedDict is a plain dict at runtime and does not allow runtime behavior, so it cannot specify a default value for a key, nor can it include methods.

from typing import TypedDict

class Movie(TypedDict):
    name: str
    year: int
    rating: float

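def get_movie_title(movie: Movie) -> str:
    return f"{movie['name']} ({movie['year']})"

# Annotate the literal so the type checker validates it as a Movie.
# Without the annotation, checkers infer a general dict type and reject the call.
the_matrix: Movie = {
    "name": "The Matrix",
    "year": 1999,
    "rating": 8.7,
}

get_movie_title(the_matrix)  #> The Matrix (1999)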
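To make the "no runtime behavior" point concrete, here is a minimal sketch showing that a TypedDict value is an ordinary dict at runtime. It redefines `Movie` so the block is self-contained; the variable names are illustrative only.

```python
from typing import TypedDict


class Movie(TypedDict):
    name: str
    year: int
    rating: float


# Annotating the variable lets the checker validate the literal against Movie.
the_matrix: Movie = {"name": "The Matrix", "year": 1999, "rating": 8.7}

# Calling the class is optional and simply builds a plain dict.
also_matrix = Movie(name="The Matrix", year=1999, rating=8.7)

print(type(the_matrix) is dict)   # True
print(type(also_matrix) is dict)  # True
print(the_matrix == also_matrix)  # True
```

Because nothing but a dict exists at runtime, there is no place to attach defaults or methods; the class body only informs the type checker.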

(Data)Class

Classes can also provide a way to pass around well-typed data:

class Movie:
    def __init__(self, name: str, year: int, rating: float = 0):
        self.name = name
        self.year = year
        self.rating = rating

The dataclass decorator makes this a little easier by automatically generating a typed __init__() method for a class. The following dataclass gets the same __init__() as the class above:

from dataclasses import dataclass

@dataclass
class Movie:
    name: str
    year: int
    rating: float = 0

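Dataclasses also provide a __repr__() method for better debugging and logging. Below, assume MovieClass and MovieDataclass are the plain-class and dataclass versions of Movie shown above:

print(MovieClass("The Matrix", 1999))
#> <__main__.MovieClass object at 0x104ba3eb0>
print(MovieDataclass("The Matrix", 1999))
#> MovieDataclass(name='The Matrix', year=1999, rating=0)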

Unlike TypedDicts, dataclasses support runtime behavior; after all, they are just classes. Default values and methods are fair game.

from dataclasses import dataclass

@dataclass
class Movie:
    name: str
    year: int
    rating: float = 0  # default value

    @property
    def title(self) -> str:
        return f"{self.name} ({self.year})"

the_matrix = Movie(
    name="The Matrix",
    year=1999
)
the_matrix.title  #> The Matrix (1999)

As described in the subtyping section, classes are nominally typed. A value is accepted where a dataclass is expected only if it is an instance of that dataclass or of one of its subclasses; merely having the same fields is not enough.
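A brief sketch of what nominal typing means in practice follows. `Sequel`, `Book`, and `describe` are hypothetical names introduced only for this example; `Book` deliberately has the same fields as `Movie` but no inheritance relationship to it.

```python
from dataclasses import dataclass


@dataclass
class Movie:
    name: str
    year: int
    rating: float = 0


@dataclass
class Sequel(Movie):  # hypothetical subclass: nominally a Movie
    part: int = 2


@dataclass
class Book:  # hypothetical look-alike: same fields, unrelated class
    name: str
    year: int
    rating: float = 0


def describe(movie: Movie) -> str:
    return f"{movie.name} ({movie.year})"


reloaded = Sequel(name="The Matrix Reloaded", year=2003)
neuromancer = Book(name="Neuromancer", year=1984)

print(describe(reloaded))              # accepted: Sequel subclasses Movie
print(isinstance(reloaded, Movie))     # True
print(isinstance(neuromancer, Movie))  # False
# describe(neuromancer) would run, but a type checker rejects it
# because Book is not declared as a subclass of Movie.
```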

Dataclasses can be converted to dictionaries, but the type information is lost:

from dataclasses import asdict

reveal_type(asdict(the_matrix))  #> dict[str, Any]
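The runtime side of this conversion can be sketched as below. `MovieDict` is a hypothetical TypedDict mirroring the dataclass fields, and using `cast` to reattach key-level types is just one possible approach, shown here as an assumption rather than a recommendation from the original text.

```python
from dataclasses import asdict, dataclass
from typing import TypedDict, cast


@dataclass
class Movie:
    name: str
    year: int
    rating: float = 0


class MovieDict(TypedDict):  # hypothetical TypedDict mirroring Movie's fields
    name: str
    year: int
    rating: float


the_matrix = Movie(name="The Matrix", year=1999)

raw = asdict(the_matrix)  # typed as dict[str, Any]
print(raw)  # {'name': 'The Matrix', 'year': 1999, 'rating': 0}

# cast() does nothing at runtime; it only tells the checker to trust us.
typed = cast(MovieDict, raw)
print(typed["year"] + 1)  # 2000
```

Note that `cast` performs no validation, so the checker simply trusts that the two shapes are kept in sync by hand.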

TypedDict vs Dataclass

In comparison:

                 TypedDict     Dataclass
subtyping        structural    nominal
default values   no            yes
methods          no            yes
type errors      when used     when constructed
TypedDicts are easier to use with existing code because they are just like the dictionaries that are probably already being used. If that compatibility isn't a concern, the runtime capabilities of dataclasses may be compelling.
