In modern architecture, the frontend and backend are often separated and maintained by different teams. To cooperate, the backend exposes its services as API endpoints with carefully designed data models for both requests and responses. In Python, there are numerous ways to accomplish this, such as WTForms and marshmallow. There are also frameworks designed for building API servers, such as FastAPI and Connexion, both of which are built around the OpenAPI specification. In this article, I will introduce Pydantic, a validation and serialization library for Python, and use it to build and enforce API request and response models. The web framework I chose is Flask, but Pydantic is framework-agnostic and can also be used in non-web applications.
Define response model
After pip install pydantic, let’s define a simple response model to return the currently logged-in user:
from pydantic import BaseModel
Then use httpie to test the API:
% http localhost:5000/current-user
- We create a Pydantic model by extending BaseModel, which is the basic approach. There are other ways, such as dataclass, TypeAdapter, or dynamic creation of models.
- Model fields are simply defined by class attributes and type annotations. Unlike other SerDe libraries, Pydantic is natively built on Python type hints. If you are not familiar with them, please check out my previous blog post.
- In the API, we manually create a model instance, user. Usually we create them from the request body or database models, which will be demonstrated later.
- Then we serialize, or “dump”, the model into a Python dict, which Flask in turn transforms into a JSON string. We can also use user.model_dump_json(), which returns the JSON string directly, but then the response header needs to be manually set to application/json, so we would rather let Flask do the job.
- mode="json" tells Pydantic to serialize field values into JSON-representable types. For instance, datetime and Decimal will be converted to strings. Flask can also do this conversion, but we prefer keeping serialization in the Pydantic model for clarity and ease of change.
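The points above can be tried without a running Flask app. The following is a minimal sketch, assuming a User model with the id, username, and last_login fields used elsewhere in this article; the fixed timestamp is only there to make the output predictable.

```python
from datetime import datetime

from pydantic import BaseModel


class User(BaseModel):
    id: int
    username: str
    last_login: datetime


# A fixed timestamp (an assumption for illustration) keeps the output stable.
user = User(id=1, username="jizhang", last_login=datetime(2024, 1, 25, 10, 0, 0))

# Python mode keeps rich types such as datetime.
python_payload = user.model_dump()

# JSON mode converts values into JSON-representable types.
json_payload = user.model_dump(mode="json")
print(json_payload)
```

In a Flask view, the dict produced by model_dump(mode="json") is what would be returned, leaving the JSON encoding and the Content-Type header to Flask.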
Create from SQLAlchemy model
Using the model constructor is one way to create an instance. We can also create one from a Python dictionary:
user = User.model_validate({'id': 1, 'username': 'jizhang', 'last_login': datetime.now()})
Or an arbitrary class instance:
class UserDto:
UserDto can also be a Python dataclass. You may notice the from_attributes parameter, which means field values are extracted from the object’s attributes instead of dictionary key-value pairs. If the model is always created from objects, we can add this configuration to the model:
class User(BaseModel):
This is actually how we integrate with SQLAlchemy, creating a Pydantic model instance from a SQLAlchemy model instance:
from sqlalchemy.orm import Mapped, mapped_column
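The from_attributes behavior can be tried without a database. In this sketch a plain dataclass stands in for the ORM entity; the UserDto fields and the UserFromObject subclass are illustrative names, not code from the original application.

```python
from dataclasses import dataclass
from datetime import datetime

from pydantic import BaseModel, ConfigDict


@dataclass
class UserDto:
    """Stand-in for an ORM entity: any object with matching attributes."""
    id: int
    username: str
    last_login: datetime


class User(BaseModel):
    id: int
    username: str
    last_login: datetime


class UserFromObject(User):
    # Enable attribute-based creation for every validation of this model.
    model_config = ConfigDict(from_attributes=True)


dto = UserDto(id=1, username="jizhang", last_login=datetime(2024, 1, 25, 10, 0, 0))

# Option 1: pass from_attributes for a single call.
user_a = User.model_validate(dto, from_attributes=True)

# Option 2: rely on the model-level configuration.
user_b = UserFromObject.model_validate(dto)
```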
Wouldn’t it be nice if our model were both a Pydantic model and a SQLAlchemy model? SQLModel is designed exactly for this purpose:
from sqlmodel import SQLModel, Field
Personally, however, I am not in favor of this approach, because it mixes classes from two layers, the domain layer and the presentation layer. The class now has two reasons to change, which violates the single responsibility principle. Use it judiciously.
Nested models
To return a list of users, we can either create a dedicated response model:
class UserListResponse(BaseModel):
Or, if you prefer to return a list, we can create a custom type with TypeAdapter:
from pydantic import TypeAdapter
I recommend the first approach, since it makes it easier to add model attributes in the future.
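To compare the two shapes side by side, here is a small sketch. The simplified User model and the sample data are assumptions for illustration.

```python
from pydantic import BaseModel, TypeAdapter


class User(BaseModel):
    id: int
    username: str


class UserListResponse(BaseModel):
    users: list[User]


users = [User(id=1, username="jizhang"), User(id=2, username="guest")]

# Approach 1: a dedicated response model, dumped as an object with a "users" key.
wrapped = UserListResponse(users=users).model_dump(mode="json")

# Approach 2: a TypeAdapter for the bare list type, dumped as a JSON array.
user_list_adapter = TypeAdapter(list[User])
bare = user_list_adapter.dump_python(users, mode="json")
```

With the wrapped form, adding a field such as a total count later does not change the top-level shape that clients already parse.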
Custom serialization
By default, a datetime object is serialized into an ISO 8601 string. If you prefer a different representation, custom serializers can be added. There are several ways to accomplish this. Decorate a model method with field_serializer:
from pydantic import field_serializer
Create a new type with custom serializer:
from typing import Annotated
Annotated is widely used in Pydantic to attach extra information, like custom serialization and validation, to an existing type. In this example, we use a PlainSerializer, which takes a function or lambda to serialize the field. There is also a WrapSerializer, which can be used to apply transformations before and after the default serializer.
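The two field-level options can be sketched as follows. The date format, the helper format_datetime, and the model names are assumptions chosen for illustration.

```python
from datetime import datetime
from typing import Annotated

from pydantic import BaseModel, PlainSerializer, field_serializer


def format_datetime(value: datetime) -> str:
    # Hypothetical target format; pick whatever the client expects.
    return value.strftime("%Y-%m-%d %H:%M:%S")


# A reusable type that carries its own serializer.
CustomDatetime = Annotated[datetime, PlainSerializer(format_datetime, return_type=str)]


class UserWithDecorator(BaseModel):
    last_login: datetime

    @field_serializer("last_login")
    def serialize_last_login(self, value: datetime) -> str:
        return format_datetime(value)


class UserWithAnnotated(BaseModel):
    last_login: CustomDatetime


moment = datetime(2024, 1, 25, 10, 30, 0)
decorated = UserWithDecorator(last_login=moment).model_dump()
annotated = UserWithAnnotated(last_login=moment).model_dump()
```

The annotated type is convenient when the same format applies to many fields, while the decorator keeps one-off rules next to the model they belong to.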
Finally, there is the model_serializer decorator, which can be used to transform the whole model as well as individual fields.
from pydantic import model_serializer
Now UserListResponse will be dumped into a list instead of a dictionary.
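A possible shape for such a model serializer is sketched below. The method name and the choice to return plain dictionaries are assumptions; the original implementation may differ.

```python
from typing import Any

from pydantic import BaseModel, model_serializer


class User(BaseModel):
    id: int
    username: str


class UserListResponse(BaseModel):
    users: list[User]

    @model_serializer
    def serialize_model(self) -> list[dict[str, Any]]:
        # Replace the default {"users": [...]} output with the bare list.
        return [user.model_dump() for user in self.users]


response = UserListResponse(users=[User(id=1, username="jizhang")])
dumped = response.model_dump()
```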
Field alias
Sometimes we want to change the key name in serialized data. For instance, change users to userList:
from pydantic import Field
serialization_alias indicates that the alias is only used for serialization. When creating models, we still use users as the key. To change both keys to userList, use Field(alias='userList'). If this conversion is universal, say you want all your request and response data to use camelCase for keys, add these configurations to your model:
from pydantic.alias_generators import to_camel
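Here is a hedged sketch of both alias styles. The CamelModel class and its last_login field are assumptions for illustration. Note that in Pydantic v2 aliases appear in the output only when by_alias=True is passed to model_dump.

```python
from pydantic import BaseModel, ConfigDict, Field
from pydantic.alias_generators import to_camel


class UserListResponse(BaseModel):
    # The alias applies to output only; input still uses "users".
    users: list[str] = Field(serialization_alias="userList")


class CamelModel(BaseModel):
    # Generate camelCase aliases for all fields, and still accept
    # the original snake_case names when creating the model.
    model_config = ConfigDict(alias_generator=to_camel, populate_by_name=True)

    last_login: str


response = UserListResponse(users=["jizhang"])
with_alias = response.model_dump(by_alias=True)
without_alias = response.model_dump()

from_camel = CamelModel.model_validate({"lastLogin": "2024-01-25"})
from_snake = CamelModel(last_login="2024-01-25")
camel_output = from_camel.model_dump(by_alias=True)
```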
Computed fields
Fields may derive from other fields:
from flask import url_for
If the field requires extra information, we can add a private attribute to the model. The attribute’s name starts with an underscore, and Pydantic will ignore it in serialization and validation.
from pydantic import PrivateAttr
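The combination of a computed field and a private attribute might look like the sketch below. To stay independent of Flask it builds the URL by hand instead of calling url_for; the _base_url attribute, the profile_url field, and the example hosts are all hypothetical.

```python
from pydantic import BaseModel, PrivateAttr, computed_field


class User(BaseModel):
    id: int
    username: str

    # Private attribute: not validated, not serialized.
    _base_url: str = PrivateAttr(default="https://example.com")

    @computed_field
    @property
    def profile_url(self) -> str:
        # Derived from another field plus the private attribute.
        return f"{self._base_url}/user/{self.id}"


user = User(id=1, username="jizhang")
user._base_url = "https://intranet.example.com"
dumped = user.model_dump()
```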
Define request model
from pydantic import Field
- model_validate takes the dictionary returned by get_json, validates it, and constructs a model instance. There is also a model_validate_json method that accepts a JSON string.
- The validated form data is then passed to an ORM model. Usually this is done by manual assignments, because fields like password need to be properly encrypted.
- Field(exclude=True) indicates that this field will be excluded in serialization. This is helpful when you do not want some information leaking to the client.
% http localhost:5000/create-user username=jizhang password=password
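Outside of Flask, the request model behavior described above can be sketched as follows. The UserForm name and its two fields are assumptions based on the httpie call; the dictionary stands in for the result of get_json.

```python
from pydantic import BaseModel, Field


class UserForm(BaseModel):
    username: str
    # Accepted on input, but never included in serialized output.
    password: str = Field(exclude=True)


# Stand-in for the dictionary returned by Flask's request.get_json().
form = UserForm.model_validate({"username": "jizhang", "password": "password"})

# The same data arriving as a raw JSON string.
same_form = UserForm.model_validate_json('{"username": "jizhang", "password": "password"}')

dumped = form.model_dump()
```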
Query parameters can be modeled in a similar way:
class SearchForm(BaseModel):
Use == to pass the values as query string parameters, so that httpie sends a GET request:
% http localhost:5000/article/search tags==a,b,c keyword==test
Custom deserialization
Let’s see how to parse the tags string into a list of tags:
class SearchForm(BaseModel):
field_validator is used to compose custom validation rules, which will be discussed in a later section. Normally it executes after Pydantic has done the default validation. In this case, tags is declared as list[str], and Pydantic would raise an error when a string is passed to it. So we use mode='before' to apply this function to the raw input data and transform it into a list of tags.
There are also annotated validators and model_validator:
from pydantic import BeforeValidator
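As a sketch of the "before" validators discussed above, the following parses a comma-separated string in two ways. The helper split_tags, the TagList type, and the AnnotatedSearchForm model are illustrative names.

```python
from typing import Annotated, Any

from pydantic import BaseModel, BeforeValidator, field_validator


def split_tags(value: Any) -> Any:
    # Only transform raw strings; let other inputs reach the default validation.
    if isinstance(value, str):
        return [tag for tag in value.split(",") if tag]
    return value


class SearchForm(BaseModel):
    tags: list[str]

    @field_validator("tags", mode="before")
    @classmethod
    def parse_tags(cls, value: Any) -> Any:
        return split_tags(value)


# The same rule attached to a reusable type.
TagList = Annotated[list[str], BeforeValidator(split_tags)]


class AnnotatedSearchForm(BaseModel):
    tags: TagList


decorated = SearchForm.model_validate({"tags": "a,b,c"})
annotated = AnnotatedSearchForm.model_validate({"tags": "a,b,c"})
already_list = SearchForm.model_validate({"tags": ["x", "y"]})
```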
Required field and default value
By default, all model attributes are required. Though keyword is defined as Optional, Pydantic will still raise an error if keyword is missing from the input data.
SearchForm.model_validate_json('{"tags": "a,b,c"}')
There are several ways to provide a default value for missing keys:
from typing import Optional, Annotated
default_factory is useful when the default value is dynamically generated. For list and dict, it is okay to use the literals [] and {}, because Pydantic will make a deep copy of them.
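The difference between nullable and optional fields, and a few ways to declare defaults, can be sketched like this. The model and field names are assumptions for illustration.

```python
from datetime import datetime
from typing import Optional

from pydantic import BaseModel, Field, ValidationError


class RequiredForm(BaseModel):
    # Optional means "may be None", not "may be omitted".
    keyword: Optional[str]


class DefaultsForm(BaseModel):
    keyword: Optional[str] = None                                # plain default
    page: int = Field(default=1)                                 # default via Field
    tags: list[str] = []                                         # copied per instance
    created_at: datetime = Field(default_factory=datetime.now)   # generated per instance


try:
    RequiredForm.model_validate({})
    missing_error_type = None
except ValidationError as exc:
    missing_error_type = exc.errors()[0]["type"]

first = DefaultsForm()
second = DefaultsForm()
first.tags.append("python")  # does not leak into other instances
```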
Type conversion
For GET requests, input data are always of type dict[str, str]. For POST requests, although the client can send different types of values via JSON, such as booleans and numbers, some types are not representable in JSON, datetime for example. When creating models, Pydantic performs the proper type conversion. This is actually part of validation, ensuring that the client provides correct data.
class ConversionForm(BaseModel):
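A minimal sketch of this conversion, assuming a ConversionForm with three illustrative fields and string-only input as it would arrive from a query string:

```python
from datetime import datetime

from pydantic import BaseModel, ValidationError


class ConversionForm(BaseModel):
    int_value: int
    bool_value: bool
    datetime_value: datetime


# All values arrive as strings and are converted during validation.
form = ConversionForm.model_validate({
    "int_value": "10",
    "bool_value": "true",
    "datetime_value": "2024-01-28T07:58:00",
})

# Values that cannot be converted are rejected.
try:
    ConversionForm.model_validate({
        "int_value": "ten",
        "bool_value": "true",
        "datetime_value": "2024-01-28T07:58:00",
    })
    rejected = False
except ValidationError:
    rejected = True
```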
As a side note, if you create a model with the constructor and pass a value whose type does not match the model definition, mypy will report an error:
ConversionForm(int_value='10')
To fix this, you need to enable Pydantic’s mypy plugin in pyproject.toml:
[tool.mypy]
Data validation
Type conversion works as the first step of data validation. Pydantic makes sure the model it creates contains attributes with the correct type. For further validation, Pydantic provides some builtin validators, and users are free to create new ones.
Builtin validators
Here are three ways to ensure username contains 3 to 10 characters, with builtin validators:
# Field definition
Some useful builtin validators are listed below. For the annotated-types package, please check its repository for more.
- String constraints
  - min_length
  - max_length
  - pattern: Regular expression, e.g. r'^[0-9]+$'
- Numeric constraints
  - gt: Greater than
  - lt: Less than
  - ge: Greater than or equal to
  - le: Less than or equal to
- Decimal constraints
  - max_digits
  - decimal_places
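The length constraint on username could be declared in the following three styles. This is a sketch under the assumption that the three ways are a Field default, an Annotated Field, and the MinLen and MaxLen markers from annotated-types; the model names and the is_valid helper are hypothetical.

```python
from typing import Annotated

from annotated_types import MaxLen, MinLen
from pydantic import BaseModel, Field, ValidationError


class FormWithField(BaseModel):
    username: str = Field(min_length=3, max_length=10)


class FormWithAnnotatedField(BaseModel):
    username: Annotated[str, Field(min_length=3, max_length=10)]


class FormWithAnnotatedTypes(BaseModel):
    username: Annotated[str, MinLen(3), MaxLen(10)]


def is_valid(model: type[BaseModel], username: str) -> bool:
    """Return True when the model accepts the given username."""
    try:
        model(username=username)
    except ValidationError:
        return False
    return True


ALL_FORMS = [FormWithField, FormWithAnnotatedField, FormWithAnnotatedTypes]
```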
In addition, Pydantic defines several special types for validation. For instance:
from pydantic import PositiveInt
int_1 and int_2 are equivalent; they both accept an integer that is greater than 0. Other useful predefined types are:
- NegativeInt, NonPositiveInt, NonNegativeFloat, etc.
- StrictInt: Only accepts integer values like 10 and -20. Raises an error for the string "10" or the float 10.0. Strict mode can be enabled at the field level, the model level, or per validation.
- AwareDatetime: The datetime must contain timezone information, e.g. 2024-01-28T07:58:00+08:00
- AnyUrl: Accepts a valid URL, and the user can access properties like scheme, host, path, etc.
- EmailStr: Accepts a valid email address. This requires an extra package, i.e. pip install "pydantic[email]"
- IPvAnyAddress: Accepts a valid IPv4 or IPv6 address.
- Json: Accepts a JSON string and converts it to a Python object. For example:
from pydantic import Json
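A few of these predefined types are combined in the sketch below. The SpecialTypesForm model, its field names, and the accepts helper are assumptions for illustration.

```python
from typing import Annotated

from pydantic import BaseModel, Field, Json, PositiveInt, StrictInt, ValidationError


class SpecialTypesForm(BaseModel):
    int_1: PositiveInt
    int_2: Annotated[int, Field(gt=0)]   # equivalent to PositiveInt
    strict: StrictInt                    # no conversion from str or float
    payload: Json[dict[str, int]]        # parsed from a JSON string


VALID_INPUT = {"int_1": "1", "int_2": 2, "strict": 3, "payload": '{"a": 1}'}
form = SpecialTypesForm.model_validate(VALID_INPUT)


def accepts(**overrides: object) -> bool:
    """Validate VALID_INPUT with some values replaced."""
    try:
        SpecialTypesForm.model_validate({**VALID_INPUT, **overrides})
    except ValidationError:
        return False
    return True
```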
Choices
Another common use case for validation is to only accept certain values for a field. This can be done with the Literal type:
from typing import Literal
Or Enum:
from enum import Enum
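Both styles are shown together in this sketch. The ChoiceForm model, the Color enum, and the allowed values are made up for illustration.

```python
from enum import Enum
from typing import Literal

from pydantic import BaseModel, ValidationError


class Color(str, Enum):
    RED = "red"
    GREEN = "green"


class ChoiceForm(BaseModel):
    size: Literal["small", "large"]   # fixed set of plain values
    color: Color                      # converted to an enum member


form = ChoiceForm.model_validate({"size": "small", "color": "red"})
json_ready = form.model_dump(mode="json")

try:
    ChoiceForm.model_validate({"size": "medium", "color": "red"})
    rejected = False
except ValidationError:
    rejected = True
```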
Custom validator
As shown in the previous section, there are three ways to define a validator. But this time we want to apply custom logic after the default validation.
# Field decorator
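The three "after" styles might look like the sketch below. The reserved-name rule, the password confirmation rule, and all class and function names are hypothetical examples of custom logic.

```python
from typing import Annotated

from pydantic import AfterValidator, BaseModel, ValidationError, field_validator, model_validator


def check_not_reserved(value: str) -> str:
    # Runs after Pydantic has confirmed the value is a string.
    if value.lower() in {"admin", "root"}:
        raise ValueError("username is reserved")
    return value


class DecoratedForm(BaseModel):
    username: str

    @field_validator("username")
    @classmethod
    def validate_username(cls, value: str) -> str:
        return check_not_reserved(value)


class AnnotatedForm(BaseModel):
    username: Annotated[str, AfterValidator(check_not_reserved)]


class PasswordForm(BaseModel):
    password: str
    confirm: str

    @model_validator(mode="after")
    def passwords_match(self) -> "PasswordForm":
        # Model-level rule that involves more than one field.
        if self.password != self.confirm:
            raise ValueError("passwords do not match")
        return self


def error_message(model: type[BaseModel], data: dict) -> str:
    """Return the first error message, or an empty string when valid."""
    try:
        model.model_validate(data)
    except ValidationError as exc:
        return exc.errors()[0]["msg"]
    return ""
```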
Handle validation error
All validation errors, including the ValueError we raise in a custom validator, are wrapped in Pydantic’s ValidationError. So a common practice is to set up a global error handler for it. Take Flask for instance:
from flask import Response, jsonify
ValidationError provides a full description of all errors. Here we only take the first error and return the field name and error message:
% http localhost:5000/create-user username=a password=password
To further customize the validation error, one can construct a PydanticCustomError:
# In field validator
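The error-handling logic can be sketched independently of Flask. Here first_error is a hypothetical helper that a global error handler could call, and the weak_password error type and its message template are invented for illustration.

```python
from pydantic import BaseModel, Field, ValidationError, field_validator
from pydantic_core import PydanticCustomError


class UserForm(BaseModel):
    username: str = Field(min_length=3)
    password: str

    @field_validator("password")
    @classmethod
    def check_password(cls, value: str) -> str:
        if len(value) < 8:
            # Custom error type, message template, and template context.
            raise PydanticCustomError(
                "weak_password",
                "Password needs at least {min_length} characters",
                {"min_length": 8},
            )
        return value


def first_error(exc: ValidationError) -> dict:
    """Reduce a ValidationError to the first field, type, and message."""
    error = exc.errors()[0]
    return {
        "field": ".".join(str(part) for part in error["loc"]),
        "type": error["type"],
        "message": error["msg"],
    }


def validate(data: dict) -> dict:
    try:
        UserForm.model_validate(data)
    except ValidationError as exc:
        return first_error(exc)
    return {}
```

In a Flask application, the body of validate's except branch would live in a handler registered for ValidationError, with the dictionary passed to jsonify and an appropriate 4xx status code.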
Validate routing variables
Pydantic provides a decorator to validate function calls. This can be used to validate Flask’s routing variables as well. For instance, Flask’s int converter accepts non-negative integers, while Pydantic’s PositiveInt requires the value to be greater than 0.
from pydantic import validate_call, PositiveInt
Validation result:
% http localhost:5000/user/0
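Without the Flask route, the decorator can be tried on a plain function. The get_user function and its return value are assumptions for illustration.

```python
from pydantic import PositiveInt, ValidationError, validate_call


@validate_call
def get_user(user_id: PositiveInt) -> dict:
    # Arguments are validated (and converted) before the body runs.
    return {"id": user_id}


try:
    get_user(0)
    error_type = None
except ValidationError as exc:
    error_type = exc.errors()[0]["type"]
```

In a Flask app, validate_call would sit between the route decorator and the view function, so the routing variable is checked on every request.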
Integrate with OpenAPI
The quickest way is to use a framework that is built on Pydantic and OpenAPI, namely FastAPI. But if you are using a different framework, or maintaining an existing project, there are several options.
Export model to JSON schema
Pydantic provides the facility to export models as JSON schema. We can write a Flask command to save them into a file:
from pydantic.json_schema import models_json_schema
The generated schemas.json would be:
{
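The export step can be sketched without the Flask command wrapper. The two models here are simplified stand-ins, and the default reference template is used; the original command may pass different options and writes the result to a file.

```python
import json

from pydantic import BaseModel
from pydantic.json_schema import models_json_schema


class User(BaseModel):
    id: int
    username: str


class UserForm(BaseModel):
    username: str
    password: str


# Each entry pairs a model with the mode its schema should describe:
# "serialization" for responses, "validation" for requests.
_, schemas = models_json_schema([
    (User, "serialization"),
    (UserForm, "validation"),
])

# The definitions are collected under the "$defs" key.
schema_text = json.dumps(schemas, indent=2)
print(schema_text)
```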
Then we create an openapi.yaml file to use these schemas:
openapi: 3.0.2
Open it in an OpenAPI viewer, e.g. the OpenAPI extension for VS Code.
Create OpenAPI specification in Python
Install openapi-pydantic, and define OpenAPI like a Pydantic model:
from openapi_pydantic import OpenAPI
The generated file is similar to the previous one, except that it is written in JSON and the schemas are embedded in the components section.
Decorate API endpoints
SpecTree provides facilities to decorate Flask view functions with Pydantic models. It serves the generated OpenAPI docs at http://localhost:5000/apidoc/swagger.
app = Flask(__name__)
- The Pydantic BaseModel needs to be imported from pydantic.v1, for compatibility reasons.
- Validation errors are returned to the client with HTTP status 422 and detailed information:
% http localhost:5000/create-user username=a password=password