Validating Complex JSON in One Shot with Pydantic

Takanori Suzuki

BPStyle 179 / 2025 Nov 4

Background and motivation

A project required validating complex JSON
Until now we had used JSON Schema
Maintaining the JSON Schema looked like a chore
Moving to Pydantic worked out well

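System overview

MANAVIRIA: digital learning materials for tablets

Various answer form formats

Written, multiple choice, ordering, etc.

Various answer formats

Editors create the materials on an editor screen

Different settings for each form format

Written
- Display format: form width
- Answer field: correct answer, alternative answers, placeholder
Multiple choice
- Display format: button or select box, choice labels
- Answer field: list of choices, list of correct answers
Ordering, and others

Converted to JSON and stored in the DB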

{
    "question": "Which of these are new features in Python 3.14?",
    "answer_format": "choices",
    "display": {"choices_selector": "button",
                "choices_label": "ABC"},
    "body": {
        "answers": [
            {"answer": "t-string",
             "is_correct": true},
            {"answer": "safe external debugger",
             "is_correct": true},
            {"answer": "lazy import",
             "is_correct": false},
            {"answer": "deferred evaluation of annotations",
             "is_correct": true}
        ]
    }
}

Validate the JSON on save

Prevents invalid data from getting in

Validation with JSON Schema

JSON Schema

json-schema.org
Defines the structure of JSON data in JSON itself
A Python library (jsonschema) is available

JSON Schema sample [1]

{"productId": 5, "productName": "MANAVIRIA"}

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "https://example.com/product.schema.json",
  "title": "Product",
  "description": "A product from Acme's catalog",
  "type": "object",
  "properties": {
    "productId": {
      "description": "The unique identifier for a product",
      "type": "integer"
    },
    "productName": {
      "description": "Name of the product",
      "type": "string"
    }
  }
}
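
A minimal sketch of how the schema above might be checked with the Python jsonschema library. The descriptions are omitted for brevity, and the invalid instance and variable names are illustrative assumptions rather than anything from the original project.

```python
from jsonschema import ValidationError, validate

# The sample schema above as a Python dict (descriptions omitted)
product_schema = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "type": "object",
    "properties": {
        "productId": {"type": "integer"},
        "productName": {"type": "string"},
    },
}

# A valid instance: validate() returns None and raises nothing
validate(instance={"productId": 5, "productName": "MANAVIRIA"}, schema=product_schema)

# A hypothetical invalid instance: productId is a string, not an integer
try:
    validate(instance={"productId": "five"}, schema=product_schema)
except ValidationError as err:
    error_message = err.message
    print(error_message)
```

Note that validate() stops at the first problem it finds; collecting every error takes extra work with a validator object.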

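Pain points of the JSON Schema implementation (personal opinion)

Schemas are long and hard to scan
Definitions are written in JSON, so they are hard to read
- Long dicts sit in the middle of Python code
Validation has to switch per form format
- Python if statements end up mixed with JSON Schema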
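
To make the last point concrete, here is a rough sketch of the pattern being described: Python branching picks a schema, then jsonschema validates against it. The schemas are hypothetical and heavily simplified, and validate_form is an assumed helper name, not the project's actual code.

```python
from jsonschema import ValidationError, validate

# Hypothetical, heavily simplified schemas; real ones would be far longer
WRITTEN_SCHEMA = {
    "type": "object",
    "required": ["question", "answer_format", "body"],
    "properties": {
        "answer_format": {"const": "written"},
        "body": {
            "type": "object",
            "required": ["answers", "placeholder"],
            "properties": {
                "answers": {"type": "array", "items": {"type": "string"}},
                "placeholder": {"type": "string"},
            },
        },
    },
}
CHOICES_SCHEMA = {
    "type": "object",
    "required": ["question", "answer_format", "body"],
    "properties": {
        "answer_format": {"const": "choices"},
        "body": {
            "type": "object",
            "required": ["answers"],
            "properties": {
                "answers": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "required": ["answer", "is_correct"],
                        "properties": {
                            "answer": {"type": "string"},
                            "is_correct": {"type": "boolean"},
                        },
                    },
                },
            },
        },
    },
}


def validate_form(data: dict) -> None:
    """Pick a schema with Python branching, then validate with JSON Schema."""
    answer_format = data.get("answer_format")
    if answer_format == "written":
        validate(instance=data, schema=WRITTEN_SCHEMA)
    elif answer_format == "choices":
        validate(instance=data, schema=CHOICES_SCHEMA)
    else:
        raise ValueError(f"unknown answer_format: {answer_format!r}")
```

Every new form format means another long dict plus another branch, which is where the maintenance cost comes from.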

Isn't there a nice way to validate JSON using only Python?

Validation with Pydantic

Pydantic

docs.pydantic.dev
A data validation library for Python
Can validate dataclasses, TypedDicts, and more
Rules are defined with type hints
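
A small sketch of the dataclass and TypedDict support mentioned above, assuming Pydantic v2. ProductDict and Product are hypothetical types made up for this illustration.

```python
from pydantic import TypeAdapter, ValidationError
from pydantic.dataclasses import dataclass
from typing_extensions import TypedDict  # works across Python versions


class ProductDict(TypedDict):  # hypothetical TypedDict
    product_id: int
    product_name: str


@dataclass
class Product:  # hypothetical Pydantic-enabled dataclass
    product_id: int
    product_name: str


# TypeAdapter validates types that are not BaseModel subclasses
product_adapter = TypeAdapter(ProductDict)
product_dict = product_adapter.validate_python(
    {"product_id": 5, "product_name": "MANAVIRIA"}
)

# A Pydantic dataclass validates its fields when it is instantiated
product = Product(product_id=5, product_name="MANAVIRIA")

try:
    Product(product_id="not a number", product_name="MANAVIRIA")
except ValidationError as err:
    dataclass_error_count = err.error_count()
```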

Conclusion on validating with Pydantic

It worked out really well (if I do say so myself)

Pydantic basics

$ pip install "pydantic"
$ pip install "pydantic[email]"  # if you need email validation

Validating JSON data [2]

{
    "name": "John Doe",
    "age": 30,
    "email": "john@example.com"
}

from pydantic import BaseModel, EmailStr, PositiveInt

class Person(BaseModel):  # inherit from BaseModel
    name: str
    age: PositiveInt  # positive integer
    email: EmailStr  # email address

Validating JSON data (continued)

Validating correct JSON

from pathlib import Path
from example_model import Person

json_string = Path('person.json').read_text()
person = Person.model_validate_json(json_string)
print(person)
#> name='John Doe' age=30 email='john@example.com'

Validating incorrect JSON

name is missing
age is negative
email is not an email address

{
    "age": -30,
    "email": "not-an-email-address"
}

The errors are really helpful

from pydantic import ValidationError

json_string = Path("person_wrong.json").read_text()
try:
    person = Person.model_validate_json(json_string)
except ValidationError as err:
    print(err)

name
  Field required [type=missing, input_value={'age': -30, 'email': 'not-an-email-address'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.12/v/missing
age
  Input should be greater than 0 [type=greater_than, input_value=-30, input_type=int]
    For further information visit https://errors.pydantic.dev/2.12/v/greater_than
email
  value is not a valid email address: An email address must have an @-sign. [type=value_error, input_value='not-an-email-address', input_type=str]
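
Besides the printed report, the same errors are available as structured data through ValidationError.errors(). The sketch below assumes Pydantic v2; collect_errors is a hypothetical helper, and the email field is left out so the example needs no extra dependency.

```python
from pydantic import BaseModel, PositiveInt, ValidationError


class Person(BaseModel):
    name: str
    age: PositiveInt  # positive integer


def collect_errors(json_string: str) -> list[tuple[str, str]]:
    """Return (field, error type) pairs, or an empty list when the JSON is valid."""
    try:
        Person.model_validate_json(json_string)
    except ValidationError as err:
        # Each entry is a dict with keys such as "loc", "type" and "msg"
        return [(str(e["loc"][0]), e["type"]) for e in err.errors()]
    return []


print(collect_errors('{"name": "John Doe", "age": 30}'))
#> []
print(collect_errors('{"age": -30}'))
#> [('name', 'missing'), ('age', 'greater_than')]
```

A structure like this is handy when the errors need to be returned to an editor screen rather than printed.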

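Validating complex JSON with Pydantic

Combining multiple models with Unions

Each form format (written, multiple choice, etc.) needs its own Pydantic model
Unions make it possible to match any one of them
Unions - Pydantic Validation

Combining multiple models with Unions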

from typing import Literal
from pydantic import BaseModel, Field

class Cat(BaseModel):
    pet_type: Literal['cat']
    meows: int

class Dog(BaseModel):
    pet_type: Literal['dog']
    barks: float

class Model(BaseModel):  # pet_type tells the models apart
    pet: Cat | Dog = Field(discriminator='pet_type')

print(Model(pet={'pet_type': 'dog', 'barks': 3.14}))
#> pet=Dog(pet_type='dog', barks=3.14)

Combining multiple forms with Unions

---
title: Model class structure
---
classDiagram
    BaseForm <|-- WrittenForm
    BaseForm <|-- ChoicesForm
    WrittenForm <-- AnswerForm
    ChoicesForm <-- AnswerForm
    class BaseForm["BaseForm (defines the common elements)"] {
        str: question
        str: answer_format
        object: display
        object: body
    }
    class WrittenForm["WrittenForm (written form)"] {
        WrittenDisplay: display
        WrittenBody: body
    }
    class ChoicesForm["ChoicesForm (multiple-choice form)"] {
        ChoicesDisplay: display
        ChoicesBody: body
    }
    class AnswerForm["AnswerForm (model combining the forms)"] {
        WrittenForm_or_ChoicesForm: answer_form
    }

Combining multiple forms with Unions

Define the base class for forms

"""Try whether Pydantic can handle a Union of multiple models nicely"""
from typing import Literal

from pydantic import BaseModel, Field, PositiveInt

class BaseForm(BaseModel):
    """Base class for forms"""
    question: str  # question text
    answer_format: str  # answer form format
    display: object  # display settings, specific to each form format
    body: object  # body, specific to each form format

Define the written form model

class WrittenDisplay(BaseModel):
    """Display format for the written form"""
    text_input_format: PositiveInt = Field(le=3)

class WrittenBody(BaseModel):
    """Body of the written form"""
    answers: list[str]
    placeholder: str

class WrittenForm(BaseForm):
    """Written form model"""
    answer_format: Literal["written"]  # matches only "written"
    display: WrittenDisplay
    body: WrittenBody

Define the multiple-choice form model

class ChoicesDisplay(BaseModel):
    """Display format for the multiple-choice form"""
    choices_selector: str  # button or select box
    choices_label: str  # label style such as ABC

class ChoicesAnswer(BaseModel):
    """A single choice in the multiple-choice form"""
    answer: str  # choice text
    is_correct: bool  # correct-answer flag

class ChoicesBody(BaseModel):
    """Body of the multiple-choice form"""
    answers: list[ChoicesAnswer]

class ChoicesForm(BaseForm):
    """Multiple-choice form model"""
    answer_format: Literal["choices"]  # matches only "choices"
    display: ChoicesDisplay
    body: ChoicesBody

Combine multiple forms into one with Unions

class WrittenForm(BaseForm):
    """Written form model"""
    answer_format: Literal["written"]  # matches only "written"
    display: WrittenDisplay
    body: WrittenBody

class ChoicesForm(BaseForm):
    """Multiple-choice form model"""
    answer_format: Literal["choices"]  # matches only "choices"
    display: ChoicesDisplay
    body: ChoicesBody

class AnswerForm(BaseModel):
    """Model that matches any one of the form formats"""
    answer_form: WrittenForm | ChoicesForm = Field(discriminator="answer_format")

Validating the written form

# Sample written form
written = {
    "question": "Who created Python?",  # question text
    "answer_format": "written",  # written form
    "display": {
        "text_input_format": 1,
    },
    "body": {
        "answers": ["Guido van Rossum"],
        "placeholder": "Write the name in Latin letters",
    },
}

written_form = AnswerForm(answer_form=written)
print(written_form)

Validating the multiple-choice form

# Sample multiple-choice form
choices = {
    "question": "Which of these are new features in Python 3.14?",
    "answer_format": "choices",
    "display": {
        "choices_selector": "button",
        "choices_label": "ABC",
    },
    "body": {
        "answers": [
            {"answer": "t-string", "is_correct": True},
            {"answer": "safe external debugger", "is_correct": True},
            {"answer": "lazy import", "is_correct": False},
        ],
    },
}

choices_form = AnswerForm(answer_form=choices)
print(choices_form)

Validation works properly!

# Line breaks added for readability
answer_form=WrittenForm(
    question='Who created Python?',
    answer_format='written',
    display=WrittenDisplay(text_input_format=1),
    body=WrittenBody(answers=['Guido van Rossum'], placeholder='Write the name in Latin letters'))
answer_form=ChoicesForm(
    question='Which of these are new features in Python 3.14?',
    answer_format='choices',
    display=ChoicesDisplay(choices_selector='button', choices_label='ABC'),
    body=ChoicesBody(answers=[
        ChoicesAnswer(answer='t-string', is_correct=True),
        ChoicesAnswer(answer='safe external debugger', is_correct=True),
        ChoicesAnswer(answer='lazy import', is_correct=False)
    ]))

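Looks like Pydantic can validate it all in one shot!

Generating Pydantic code from the schema

The real JSON Schema is far more complex
There are also six form formats
Writing the Pydantic code by hand looked like a lot of work

datamodel-code-generator

koxudaxi.github.io/datamodel-code-generator
Generates Python code from various data definitions
Input: OpenAPI, JSON Schema, YAML, GraphQL, Python dicts, etc.
Output: Pydantic, dataclass, TypedDict, etc.

datamodel-code-generator

Basic usage
In practice, we created one JSON file per form format and generated the model code from each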

% pip install datamodel-code-generator
% datamodel-codegen --input schema.json \
  --input-file-type jsonschema \
  --output-model-type pydantic_v2.BaseModel \
  --output model.py
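
For a sense of what the result looks like, here is a hand-written approximation of a Pydantic model for the Product schema shown earlier. This is not actual generator output, which may differ in imports, layout, and extras. Both fields are optional because the schema declares no required properties.

```python
from typing import Optional

from pydantic import BaseModel, Field


class Product(BaseModel):
    """A product from Acme's catalog"""

    productId: Optional[int] = Field(
        None, description="The unique identifier for a product"
    )
    productName: Optional[str] = Field(None, description="Name of the product")


product = Product.model_validate_json('{"productId": 5, "productName": "MANAVIRIA"}')
print(product)
#> productId=5 productName='MANAVIRIA'
```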

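The generated code gave us a Pydantic model for each form!

Adding more validation rules

We want to check data by interpreting its meaning
We want to check combinations of multiple fields
→ Add Constraints and write Validators

Allowing only specific values

Only values defined in an Enum are accepted [3]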

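from enum import Enum

from pydantic import BaseModel

class TextInputFormat(Enum):
    """Text input format for the written form"""
    HALF_WIDTH = 1  # 50% width
    FULL_WIDTH = 2  # 100% width (one line)

class WrittenDisplay(BaseModel):
    """Display format for the written form"""
    text_input_format: TextInputFormat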
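
A quick sketch of how this behaves at validation time, assuming Pydantic v2. Raw values from JSON are converted to Enum members, and a value with no matching member is rejected. The input value 3 is an illustrative assumption.

```python
from enum import Enum

from pydantic import BaseModel, ValidationError


class TextInputFormat(Enum):
    HALF_WIDTH = 1
    FULL_WIDTH = 2


class WrittenDisplay(BaseModel):
    text_input_format: TextInputFormat


# The raw value 2 becomes the matching Enum member
display = WrittenDisplay.model_validate_json('{"text_input_format": 2}')
print(display.text_input_format)
#> TextInputFormat.FULL_WIDTH

# 3 has no matching member, so validation fails
try:
    WrittenDisplay.model_validate_json('{"text_input_format": 3}')
except ValidationError as err:
    enum_error_type = err.errors()[0]["type"]
    print(enum_error_type)
    #> enum
```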

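Specifying numeric ranges and string lengths

Field() accepts conditions such as numeric ranges [4] and string lengths [5]

from pydantic import BaseModel, Field, PositiveInt

class WrittenDisplay(BaseModel):
    # upper bound for the number
    max_length: PositiveInt = Field(..., le=100)
    # allowed range for the string length
    question: str = Field(..., min_length=20, max_length=500)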
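
The sketch below shows what these constraints report when they fail, assuming Pydantic v2. QuestionSettings and failed_fields are hypothetical names used only for this illustration.

```python
from pydantic import BaseModel, Field, PositiveInt, ValidationError


class QuestionSettings(BaseModel):  # hypothetical model for this sketch
    max_length: PositiveInt = Field(..., le=100)
    question: str = Field(..., min_length=20, max_length=500)


def failed_fields(data: dict) -> dict[str, str]:
    """Map each failing field name to its error type (empty when valid)."""
    try:
        QuestionSettings(**data)
    except ValidationError as err:
        return {str(e["loc"][0]): e["type"] for e in err.errors()}
    return {}


print(failed_fields({"max_length": 101, "question": "Too short"}))
#> {'max_length': 'less_than_equal', 'question': 'string_too_short'}
```

All violations are collected in a single pass, so one save attempt can surface every problem at once.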

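Checking that the choices include a correct answer

Define a validator with @model_validator [6]

from typing import Self

from pydantic import BaseModel, model_validator

class ChoicesAnswer(BaseModel):  # a single choice in the multiple-choice form
    answer: str  # choice text
    is_correct: bool  # correct-answer flag

class ChoicesBody(BaseModel):  # multiple-choice form
    answers: list[ChoicesAnswer]  # multiple choices

    @model_validator(mode="after")
    def at_least_one_correct(self) -> Self:
        """Check that answers has at least one is_correct: True"""
        if not any(a.is_correct for a in self.answers):
            raise ValueError("No correct choice found")
        return self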
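
A brief sketch of this validator in action, assuming Pydantic v2. The sample choices and the English error message are illustrative. A ValueError raised inside the validator surfaces as an ordinary ValidationError.

```python
from pydantic import BaseModel, ValidationError, model_validator
from typing_extensions import Self  # typing.Self also works on Python 3.11+


class ChoicesAnswer(BaseModel):
    answer: str
    is_correct: bool


class ChoicesBody(BaseModel):
    answers: list[ChoicesAnswer]

    @model_validator(mode="after")
    def at_least_one_correct(self) -> Self:
        """Runs after field validation, so self.answers is already typed."""
        if not any(a.is_correct for a in self.answers):
            raise ValueError("no correct choice")
        return self


# At least one correct choice: passes
valid_body = ChoicesBody(
    answers=[
        {"answer": "t-string", "is_correct": True},
        {"answer": "lazy import", "is_correct": False},
    ]
)

# No correct choice: the ValueError is wrapped in a ValidationError
try:
    ChoicesBody(answers=[{"answer": "lazy import", "is_correct": False}])
except ValidationError as err:
    first_error = err.errors()[0]
    print(first_error["type"], "|", first_error["msg"])
    #> value_error | Value error, no correct choice
```

An empty answers list is rejected by the same check, since any() over an empty sequence is False.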

Pydantic can do a lot more, so see the documentation for details

docs.pydantic.dev

Need to validate complex data?
→ Consider Pydantic!

Thank You

slides.takanory.net 20251204bpstyle/code

takanory
