Instructor — Structured LLM Outputs with Pydantic
Extract structured data from LLMs using Pydantic models. Works with OpenAI, Anthropic, Gemini, and local models. The simplest way to get reliable JSON from any LLM.
Review-first install path
This asset needs a review step. The copied prompt tells the agent to dry-run, show the writes, then proceed only after confirmation.
npx -y tokrepo@latest install 9301dfb7-b047-4c15-94d2-47d349a77865 --target codexDry-run first, confirm the writes, then run this command.
What it is
Instructor is a Python library that patches LLM client libraries to return structured data validated by Pydantic models. Instead of parsing raw text or hoping for valid JSON, you define a Pydantic schema and Instructor ensures the LLM output matches it. It works with OpenAI, Anthropic, Google Gemini, and local models.
It targets developers building applications that need reliable, typed data extraction from LLM responses — classification, entity extraction, data transformation, and structured summarization.
How it saves time or tokens
Instructor handles retries, validation, and schema enforcement automatically. When the LLM returns malformed data, Instructor re-prompts with the validation error, fixing the output without manual intervention. This reduces debugging time and wasted tokens on malformed responses. Estimated token usage is around 4,200 tokens.
How to use
- Install Instructor:
pip install instructor
- Patch your client and define a model:
import instructor
from openai import OpenAI
from pydantic import BaseModel
client = instructor.from_openai(OpenAI())
class User(BaseModel):
name: str
age: int
user = client.chat.completions.create(
model='gpt-4o',
response_model=User,
messages=[{'role': 'user', 'content': 'Extract: John is 25 years old'}]
)
print(user.name, user.age)
- The response is a validated Pydantic object, not raw text.
Example
import instructor
from openai import OpenAI
from pydantic import BaseModel
from typing import List
client = instructor.from_openai(OpenAI())
class Contact(BaseModel):
name: str
email: str
company: str
class ContactList(BaseModel):
contacts: List[Contact]
result = client.chat.completions.create(
model='gpt-4o',
response_model=ContactList,
messages=[{'role': 'user', 'content': 'Extract contacts from this email thread...'}]
)
for c in result.contacts:
print(f'{c.name} at {c.company}')
Related on TokRepo
- AI Tools for Coding — Developer tools for LLM application development
- AI Tools for API — API integration and structured output tools
Key considerations
When evaluating Instructor for your workflow, consider the following factors. First, assess whether your team has the technical prerequisites to adopt this tool effectively. Second, evaluate the maintenance burden against the productivity gains. Third, check community activity and documentation quality to ensure long-term viability. Integration with your existing toolchain matters more than feature count alone. Start with a small pilot project before rolling out across the organization. Monitor resource usage during the initial adoption phase to identify bottlenecks early. Document your configuration decisions so team members can onboard independently.
Common pitfalls
- Complex nested Pydantic models increase the chance of validation failures; keep schemas as flat as possible.
- Retry logic consumes additional tokens; set a max_retries limit to avoid runaway costs.
- Not all model providers support function calling natively; Instructor uses different strategies (JSON mode, tool calling) depending on the provider.
Frequently Asked Questions
Instructor works with OpenAI, Anthropic Claude, Google Gemini, Mistral, Cohere, and local models via Ollama or LiteLLM. Each provider uses the appropriate structured output mechanism available.
Instructor validates the LLM response against your Pydantic model. If validation fails (wrong type, missing field, constraint violation), it automatically re-prompts the LLM with the error message for correction.
Yes. Instructor supports streaming partial objects as they are generated. You get incremental updates to your Pydantic model as tokens arrive, useful for long-running extractions.
Yes. Instructor provides async patching for OpenAI and other async client libraries. Use instructor.from_openai(AsyncOpenAI()) to enable async structured outputs.
Instructor retries up to the configured max_retries (default: 1). Each retry includes the validation error in the prompt. If all retries fail, it raises a validation exception that you handle in your application.
Citations (3)
- Instructor GitHub— Structured LLM outputs using Pydantic models
- Instructor Documentation— Supports OpenAI, Anthropic, Gemini, and local models
- Instructor Retry Docs— Automatic retry with validation errors
Related on TokRepo
Source & Thanks
- GitHub: jxnl/instructor (8k+ stars)
- Docs: python.useinstructor.com
- Author: Jason Liu
Discussion
Related Assets
Instructor — Structured Outputs from LLMs
Get structured, validated outputs from LLMs using Pydantic models. Works with OpenAI, Anthropic, Google, Ollama, and more. Retry logic, streaming, partial responses. 12.6K+ stars.
Instructor — Typed Structured Outputs for LLMs
Instructor turns LLM replies into validated Pydantic models with retries. `pip install instructor`, then extract typed objects across major providers.
Outlines — Structured Outputs with Any Model
Outlines generates structured outputs (Pydantic types, enums, ints) from LLMs. `pip install outlines`, connect a backend, then request typed results.
Pydantic — Data Validation for AI Agent Pipelines
Python's most popular data validation library, essential for AI agent tool definitions. Pydantic enforces type safety in LLM structured outputs, API schemas, and config files.