# Pydantic — Data Validation for AI Agent Pipelines

> Python's most popular data validation library, essential for AI agent tool definitions. Pydantic enforces type safety in LLM structured outputs, API schemas, and config files.

## Install

Save the content below to `.claude/skills/` or append to your `CLAUDE.md`:

## Quick Use

```bash
pip install pydantic
```

```python
from typing import Optional

from pydantic import BaseModel, Field, ValidationError

class UserProfile(BaseModel):
    name: str = Field(description="Full name")
    age: int = Field(ge=0, le=150, description="Age in years")
    email: str = Field(pattern=r'^[\w.-]+@[\w.-]+\.\w+$')
    bio: Optional[str] = None

# Valid input: the model is constructed and can be serialized to JSON
user = UserProfile(name="Alice", age=30, email="alice@example.com")
print(user.model_dump_json())

# Invalid input: age is below 0 and email does not match the pattern
try:
    UserProfile(name="Bob", age=-5, email="not-email")
except ValidationError as e:
    print(e)  # reports two errors: one for age, one for email
```

## What is Pydantic?

Pydantic is Python's most popular data validation library, with 200M+ monthly downloads. In the AI ecosystem it is foundational: it is used to define LLM tool schemas, validate structured outputs, configure agents, and build API contracts. If you are building AI agents in Python, you likely already depend on Pydantic, either directly or through a framework.

**Answer-Ready**: Pydantic is Python's #1 data validation library (200M+ downloads/month). Essential for AI: defines LLM tool schemas, validates structured outputs, configures agents. Used by FastAPI, LangChain, Instructor, DSPy, and many other AI frameworks. V2 is 5-50x faster than V1. 22k+ GitHub stars.

**Best for**: Python developers building AI agents, APIs, or data pipelines. **Works with**: Most Python AI frameworks. **Setup time**: Under 1 minute.

## Why Pydantic Matters for AI

### 1. LLM Tool Definitions

```python
from pydantic import BaseModel, Field

class SearchTool(BaseModel):
    query: str = Field(description="Search query")
    max_results: int = Field(default=5, ge=1, le=20)
    language: str = Field(default="en")

# Auto-generates JSON Schema for LLM tool calling
print(SearchTool.model_json_schema())
```
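As a rough sketch of how that schema might be used, the block below wraps it in a tool payload and validates the arguments a model sends back. The `tool_definition` key names (`name`, `description`, `input_schema`) and the `run_search` helper are assumptions for illustration; each provider's tool-calling API defines its own payload shape.

```python
from pydantic import BaseModel, Field

class SearchTool(BaseModel):
    """Search the web and return the top results."""
    query: str = Field(description="Search query")
    max_results: int = Field(default=5, ge=1, le=20)
    language: str = Field(default="en")

schema = SearchTool.model_json_schema()

# Hypothetical tool payload: key names differ between providers.
tool_definition = {
    "name": "search",
    "description": SearchTool.__doc__,
    "input_schema": schema,
}

def run_search(arguments: dict) -> SearchTool:
    """Hypothetical handler: validate the arguments the model supplied."""
    return SearchTool.model_validate(arguments)

# Fields with defaults are optional in the schema, so the model may omit them.
call = run_search({"query": "pydantic v2", "max_results": 3})
print(schema["required"])  # ['query']
print(call.language)       # en
```

The same class serves both directions: it describes the parameters to the model and checks what the model returns, so the two cannot drift apart.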

### 2. Structured Output Validation

```python
from pydantic import BaseModel, Field

class ExtractedEntity(BaseModel):
    name: str
    entity_type: str = Field(description="person, org, or location")
    confidence: float = Field(ge=0, le=1)

# Validate LLM output
raw = {"name": "Anthropic", "entity_type": "org", "confidence": 0.95}
entity = ExtractedEntity.model_validate(raw)
```
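When the model's reply is invalid, a common pattern is to feed the validation errors back and ask again. The following is a minimal sketch of that loop, not a definitive implementation: `fake_llm` is a hypothetical stand-in for a real model call, `extract_entity` is an illustrative helper, and `entity_type` is narrowed to a `Literal` here so that unexpected labels are rejected.

```python
from typing import Literal

from pydantic import BaseModel, Field, ValidationError

class ExtractedEntity(BaseModel):
    name: str
    entity_type: Literal["person", "org", "location"]
    confidence: float = Field(ge=0, le=1)

def fake_llm(prompt: str, attempt: int) -> str:
    """Hypothetical model call: the first reply is invalid, later ones are valid."""
    replies = [
        '{"name": "Anthropic", "entity_type": "company", "confidence": 1.2}',
        '{"name": "Anthropic", "entity_type": "org", "confidence": 0.95}',
    ]
    return replies[min(attempt, len(replies) - 1)]

def extract_entity(prompt: str, max_attempts: int = 3) -> ExtractedEntity:
    feedback = ""
    for attempt in range(max_attempts):
        raw = fake_llm(prompt + feedback, attempt)
        try:
            # Parse and validate the raw JSON string in one step
            return ExtractedEntity.model_validate_json(raw)
        except ValidationError as exc:
            # Summarize the failing fields so the next prompt can request a fix
            problems = "; ".join(
                f"{'.'.join(str(part) for part in err['loc'])}: {err['msg']}"
                for err in exc.errors()
            )
            feedback = f"\nPrevious reply was invalid ({problems}). Return corrected JSON."
    raise ValueError("no valid entity after retries")

entity = extract_entity("Extract the entity from: 'Anthropic released a model.'")
print(entity)
```

Capping the number of attempts keeps a persistently malformed reply from looping forever.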

### 3. Agent Configuration

```python
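from pathlib import Path

from pydantic import BaseModel, Field

class AgentConfig(BaseModel):
    model: str = "claude-sonnet-4-20250514"
    temperature: float = Field(default=0.7, ge=0, le=2)
    max_tokens: int = Field(default=4096, ge=1)
    tools: list[str] = []
    system_prompt: str = ""

# Read the file via Path so the handle is closed, then parse and validate
config = AgentConfig.model_validate_json(Path("config.json").read_text())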
```
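To illustrate what validation buys you for config files, here is a small sketch that parses config JSON from inline strings instead of a file. `StrictAgentConfig` and the `extra="forbid"` setting are additions for this example, not part of the config above; forbidding extra keys is one way to catch misspelled option names.

```python
from pydantic import BaseModel, ConfigDict, Field, ValidationError

class StrictAgentConfig(BaseModel):
    # Assumption for this sketch: reject unknown keys instead of ignoring them
    model_config = ConfigDict(extra="forbid")

    model: str = "claude-sonnet-4-20250514"
    temperature: float = Field(default=0.7, ge=0, le=2)
    max_tokens: int = Field(default=4096, ge=1)
    tools: list[str] = []
    system_prompt: str = ""

# A partial config is fine: omitted fields fall back to their defaults
config = StrictAgentConfig.model_validate_json('{"temperature": 0.2, "tools": ["search"]}')
print(config.max_tokens)  # 4096

# A misspelled key and an out-of-range value are both reported
bad_fields = []
try:
    StrictAgentConfig.model_validate_json('{"temprature": 0.2, "max_tokens": 0}')
except ValidationError as exc:
    bad_fields = [err["loc"][0] for err in exc.errors()]
print(sorted(bad_fields))  # ['max_tokens', 'temprature']
```

Without `extra="forbid"`, Pydantic's default is to ignore unknown keys, so the misspelled `temprature` would silently leave `temperature` at 0.7.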

### 4. API Contracts (FastAPI)

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class ChatRequest(BaseModel):
    message: str
    model: str = "claude-sonnet-4-20250514"

class ChatResponse(BaseModel):
    reply: str
    tokens_used: int

@app.post("/chat", response_model=ChatResponse)
async def chat(req: ChatRequest):
    ...
```

## Pydantic V2 Performance

| Operation | Approximate V2 speedup over V1 |
|-----------|--------------------------------|
| Model creation | 5-10x |
| JSON parsing | 10-50x |
| Serialization | 5-20x |

V2 moves validation and serialization into a Rust core (`pydantic-core`), which accounts for most of the speedup.
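One practical consequence is that `model_validate_json` parses and validates a raw JSON string inside the Rust core, avoiding a separate `json.loads` pass in Python. The sketch below compares the two paths on a hypothetical `Event` model. It is an informal timing, not a benchmark, and the numbers will vary by machine and payload.

```python
import json
import timeit

from pydantic import BaseModel

class Event(BaseModel):
    """Hypothetical model used only for this comparison."""
    id: int
    name: str
    tags: list[str]

payload = json.dumps({"id": 1, "name": "launch", "tags": ["a", "b", "c"]})

# Both paths produce an equal model instance
via_core = Event.model_validate_json(payload)
via_python = Event.model_validate(json.loads(payload))

# Informal timing of each path over the same payload
runs = 2_000
t_core = timeit.timeit(lambda: Event.model_validate_json(payload), number=runs)
t_python = timeit.timeit(lambda: Event.model_validate(json.loads(payload)), number=runs)
print(f"model_validate_json: {t_core:.4f}s")
print(f"json.loads + model_validate: {t_python:.4f}s")
```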

## AI Frameworks Using Pydantic

| Framework | How It Uses Pydantic |
|-----------|---------------------|
| LangChain | Tool definitions, output parsers |
| Instructor | Structured output validation |
| DSPy | Signature definitions |
| FastAPI | Request/response models |
| Pydantic AI | Agent framework built on Pydantic |
| Guardrails AI | Validator definitions |

## FAQ

**Q: V1 or V2?**
A: Use V2 for new projects. It is 5-50x faster and the ecosystem has migrated. V1 is in maintenance mode.

**Q: How does it relate to JSON Schema?**
A: Pydantic models generate JSON Schema via `model_json_schema()`. Tool-calling APIs accept that schema as the parameter specification, which is how the LLM learns each tool's argument names, types, and constraints.

**Q: Can I use it for runtime config?**
A: Yes. The separate `pydantic-settings` package (`pip install pydantic-settings`) loads values from env vars, `.env` files, and config files with full validation.

## Source & Thanks

> Created by [Samuel Colvin](https://github.com/pydantic). Licensed under MIT.
>
> [pydantic/pydantic](https://github.com/pydantic/pydantic) — 22k+ stars

<!-- ZH -->


---
Source: https://tokrepo.com/en/workflows/pydantic-data-validation-ai-agent-pipelines-1960042c
Author: Pydantic