Why use Pydantic?¶
Today, Pydantic is downloaded many times a month and used by some of the largest and most recognisable organisations in the world.
It's hard to know why so many people have adopted Pydantic since its inception six years ago, but here are a few guesses.
Type hints powering schema validation¶
The schema that Pydantic validates against is generally defined by Python type hints.
Type hints are great for this since, if you're writing modern Python, you already know how to use them. Using type hints also means that Pydantic integrates well with static typing tools (like mypy and Pyright) and IDEs (like PyCharm and VSCode).
Example - just type hints
from typing import Annotated, Literal
from annotated_types import Gt
from pydantic import BaseModel
class Fruit(BaseModel):
name: str # (1)!
color: Literal['red', 'green'] # (2)!
weight: Annotated[float, Gt(0)] # (3)!
bazam: dict[str, list[tuple[int, bool, float]]] # (4)!
print(
Fruit(
name='Apple',
color='red',
weight=4.2,
bazam={'foobar': [(1, True, 0.1)]},
)
)
#> name='Apple' color='red' weight=4.2 bazam={'foobar': [(1, True, 0.1)]}
- The
name
field is simply annotated withstr
— any string is allowed. - The
Literal
type is used to enforce thatcolor
is either'red'
or'green'
. - Even when we want to apply constraints not encapsulated in Python types, we can use
Annotated
andannotated-types
to enforce constraints while still keeping typing support. - I'm not claiming "bazam" is really an attribute of fruit, but rather to show that arbitrarily complex types can easily be validated.
Learn more
See the documentation on supported types.
Performance¶
Pydantic's core validation logic is implemented in a separate package (pydantic-core
),
where validation for most types is implemented in Rust.
As a result, Pydantic is among the fastest data validation libraries for Python.
Performance Example - Pydantic vs. dedicated code
In general, dedicated code should be much faster than a general-purpose validator, but in this example Pydantic is >300% faster than dedicated code when parsing JSON and validating URLs.
import json
import timeit
from urllib.parse import urlparse
import requests
from pydantic import HttpUrl, TypeAdapter
reps = 7
number = 100
r = requests.get('https://api.github.com/emojis')
r.raise_for_status()
emojis_json = r.content
def emojis_pure_python(raw_data):
data = json.loads(raw_data)
output = {}
for key, value in data.items():
assert isinstance(key, str)
url = urlparse(value)
assert url.scheme in ('https', 'http')
output[key] = url
emojis_pure_python_times = timeit.repeat(
'emojis_pure_python(emojis_json)',
globals={
'emojis_pure_python': emojis_pure_python,
'emojis_json': emojis_json,
},
repeat=reps,
number=number,
)
print(f'pure python: {min(emojis_pure_python_times) / number * 1000:0.2f}ms')
#> pure python: 5.32ms
type_adapter = TypeAdapter(dict[str, HttpUrl])
emojis_pydantic_times = timeit.repeat(
'type_adapter.validate_json(emojis_json)',
globals={
'type_adapter': type_adapter,
'HttpUrl': HttpUrl,
'emojis_json': emojis_json,
},
repeat=reps,
number=number,
)
print(f'pydantic: {min(emojis_pydantic_times) / number * 1000:0.2f}ms')
#> pydantic: 1.54ms
print(
f'Pydantic {min(emojis_pure_python_times) / min(emojis_pydantic_times):0.2f}x faster'
)
#> Pydantic 3.45x faster
Unlike other performance-centric libraries written in compiled languages, Pydantic also has excellent support for customizing validation via functional validators.
Learn more
Samuel Colvin's talk at PyCon 2023 explains how pydantic-core
works and how it integrates with Pydantic.
Serialization¶
Pydantic provides functionality to serialize model in three ways:
- To a Python
dict
made up of the associated Python objects. - To a Python
dict
made up only of "jsonable" types. - To a JSON string.
In all three modes, the output can be customized by excluding specific fields, excluding unset fields, excluding default values, and excluding None
values.
Example - Serialization 3 ways
from datetime import datetime
from pydantic import BaseModel
class Meeting(BaseModel):
when: datetime
where: bytes
why: str = 'No idea'
m = Meeting(when='2020-01-01T12:00', where='home')
print(m.model_dump(exclude_unset=True))
#> {'when': datetime.datetime(2020, 1, 1, 12, 0), 'where': b'home'}
print(m.model_dump(exclude={'where'}, mode='json'))
#> {'when': '2020-01-01T12:00:00', 'why': 'No idea'}
print(m.model_dump_json(exclude_defaults=True))
#> {"when":"2020-01-01T12:00:00","where":"home"}
Learn more
See the documentation on serialization.
JSON Schema¶
A JSON Schema can be generated for any Pydantic schema — allowing self-documenting APIs and integration with a wide variety of tools which support the JSON Schema format.
Example - JSON Schema
from datetime import datetime
from pydantic import BaseModel
class Address(BaseModel):
street: str
city: str
zipcode: str
class Meeting(BaseModel):
when: datetime
where: Address
why: str = 'No idea'
print(Meeting.model_json_schema())
"""
{
'$defs': {
'Address': {
'properties': {
'street': {'title': 'Street', 'type': 'string'},
'city': {'title': 'City', 'type': 'string'},
'zipcode': {'title': 'Zipcode', 'type': 'string'},
},
'required': ['street', 'city', 'zipcode'],
'title': 'Address',
'type': 'object',
}
},
'properties': {
'when': {'format': 'date-time', 'title': 'When', 'type': 'string'},
'where': {'$ref': '#/$defs/Address'},
'why': {'default': 'No idea', 'title': 'Why', 'type': 'string'},
},
'required': ['when', 'where'],
'title': 'Meeting',
'type': 'object',
}
"""
Pydantic is compliant with the latest version of JSON Schema specification (2020-12), which is compatible with OpenAPI 3.1.
Learn more
See the documentation on JSON Schema.
Strict mode and data coercion¶
By default, Pydantic is tolerant to common incorrect types and coerces data to the right type —
e.g. a numeric string passed to an int
field will be parsed as an int
.
Pydantic also has as strict mode, where types are not coerced and a validation error is raised unless the input data exactly matches the expected schema.
But strict mode would be pretty useless when validating JSON data since JSON doesn't have types matching
many common Python types like datetime
, UUID
or bytes
.
To solve this, Pydantic can parse and validate JSON in one step. This allows sensible data conversion
(e.g. when parsing strings into datetime
objects). Since the JSON parsing is
implemented in Rust, it's also very performant.
Example - Strict mode that's actually useful
from datetime import datetime
from pydantic import BaseModel, ValidationError
class Meeting(BaseModel):
when: datetime
where: bytes
m = Meeting.model_validate({'when': '2020-01-01T12:00', 'where': 'home'})
print(m)
#> when=datetime.datetime(2020, 1, 1, 12, 0) where=b'home'
try:
m = Meeting.model_validate(
{'when': '2020-01-01T12:00', 'where': 'home'}, strict=True
)
except ValidationError as e:
print(e)
"""
2 validation errors for Meeting
when
Input should be a valid datetime [type=datetime_type, input_value='2020-01-01T12:00', input_type=str]
where
Input should be a valid bytes [type=bytes_type, input_value='home', input_type=str]
"""
m_json = Meeting.model_validate_json(
'{"when": "2020-01-01T12:00", "where": "home"}'
)
print(m_json)
#> when=datetime.datetime(2020, 1, 1, 12, 0) where=b'home'
Learn more
See the documentation on strict mode.
Dataclasses, TypedDicts, and more¶
Pydantic provides four ways to create schemas and perform validation and serialization:
BaseModel
— Pydantic's own super class with many common utilities available via instance methods.- Pydantic dataclasses — a wrapper around standard dataclasses with additional validation performed.
TypeAdapter
— a general way to adapt any type for validation and serialization. This allows types likeTypedDict
andNamedTuple
to be validated as well as simple types (likeint
ortimedelta
) — all types supported can be used withTypeAdapter
.validate_call
— a decorator to perform validation when calling a function.
Example - schema based on a TypedDict
from datetime import datetime
from typing_extensions import NotRequired, TypedDict
from pydantic import TypeAdapter
class Meeting(TypedDict):
when: datetime
where: bytes
why: NotRequired[str]
meeting_adapter = TypeAdapter(Meeting)
m = meeting_adapter.validate_python( # (1)!
{'when': '2020-01-01T12:00', 'where': 'home'}
)
print(m)
#> {'when': datetime.datetime(2020, 1, 1, 12, 0), 'where': b'home'}
meeting_adapter.dump_python(m, exclude={'where'}) # (2)!
print(meeting_adapter.json_schema()) # (3)!
"""
{
'properties': {
'when': {'format': 'date-time', 'title': 'When', 'type': 'string'},
'where': {'format': 'binary', 'title': 'Where', 'type': 'string'},
'why': {'title': 'Why', 'type': 'string'},
},
'required': ['when', 'where'],
'title': 'Meeting',
'type': 'object',
}
"""
TypeAdapter
for aTypedDict
performing validation, it can also validate JSON data directly withvalidate_json
.dump_python
to serialise aTypedDict
to a python object, it can also serialise to JSON withdump_json
.