SkillsApr 7, 2026·1 min read

Instructor — Structured LLM Outputs with Pydantic

Extract structured data from LLMs using Pydantic models. Works with OpenAI, Anthropic, Gemini, and local models. The simplest way to get reliable JSON from any LLM.

Pydantic · Community

Agent ready

Review-first install path

This asset needs a review step. The copied prompt tells the agent to dry-run, show the writes, then proceed only after confirmation.

Needs Confirmation · 66/100Policy: confirm

Agent surface

Any MCP/CLI agent

Kind

Skill

Install

Single

Trust

Trust: Community

Entrypoint

Instructor — Structured LLM Outputs with Pydantic

Review-first command

npx -y tokrepo@latest install 9301dfb7-b047-4c15-94d2-47d349a77865 --target codex

Dry-run first, confirm the writes, then run this command.

TL;DR

Instructor uses Pydantic models to extract structured, validated JSON from any LLM including OpenAI, Anthropic, and local models.

§01

What it is

Instructor is a Python library that patches LLM client libraries to return structured data validated by Pydantic models. Instead of parsing raw text or hoping for valid JSON, you define a Pydantic schema and Instructor ensures the LLM output matches it. It works with OpenAI, Anthropic, Google Gemini, and local models.

It targets developers building applications that need reliable, typed data extraction from LLM responses — classification, entity extraction, data transformation, and structured summarization.

§02

How it saves time or tokens

Instructor handles retries, validation, and schema enforcement automatically. When the LLM returns malformed data, Instructor re-prompts with the validation error, fixing the output without manual intervention. This reduces debugging time and wasted tokens on malformed responses. Estimated token usage is around 4,200 tokens.

§03

How to use

Install Instructor:

pip install instructor

Patch your client and define a model:

import instructor
from openai import OpenAI
from pydantic import BaseModel

client = instructor.from_openai(OpenAI())

class User(BaseModel):
    name: str
    age: int

user = client.chat.completions.create(
    model='gpt-4o',
    response_model=User,
    messages=[{'role': 'user', 'content': 'Extract: John is 25 years old'}]
)
print(user.name, user.age)

The response is a validated Pydantic object, not raw text.

§04

Example

import instructor
from openai import OpenAI
from pydantic import BaseModel
from typing import List

client = instructor.from_openai(OpenAI())

class Contact(BaseModel):
    name: str
    email: str
    company: str

class ContactList(BaseModel):
    contacts: List[Contact]

result = client.chat.completions.create(
    model='gpt-4o',
    response_model=ContactList,
    messages=[{'role': 'user', 'content': 'Extract contacts from this email thread...'}]
)
for c in result.contacts:
    print(f'{c.name} at {c.company}')

§05

Related on TokRepo

AI Tools for Coding — Developer tools for LLM application development
AI Tools for API — API integration and structured output tools

Key considerations

When evaluating Instructor for your workflow, consider the following factors. First, assess whether your team has the technical prerequisites to adopt this tool effectively. Second, evaluate the maintenance burden against the productivity gains. Third, check community activity and documentation quality to ensure long-term viability. Integration with your existing toolchain matters more than feature count alone. Start with a small pilot project before rolling out across the organization. Monitor resource usage during the initial adoption phase to identify bottlenecks early. Document your configuration decisions so team members can onboard independently.

§06

Common pitfalls

Complex nested Pydantic models increase the chance of validation failures; keep schemas as flat as possible.
Retry logic consumes additional tokens; set a max_retries limit to avoid runaway costs.
Not all model providers support function calling natively; Instructor uses different strategies (JSON mode, tool calling) depending on the provider.

Frequently Asked Questions

Which LLM providers does Instructor support?+

Instructor works with OpenAI, Anthropic Claude, Google Gemini, Mistral, Cohere, and local models via Ollama or LiteLLM. Each provider uses the appropriate structured output mechanism available.

How does validation work?+

Instructor validates the LLM response against your Pydantic model. If validation fails (wrong type, missing field, constraint violation), it automatically re-prompts the LLM with the error message for correction.

Can I use Instructor with streaming?+

Yes. Instructor supports streaming partial objects as they are generated. You get incremental updates to your Pydantic model as tokens arrive, useful for long-running extractions.

Does Instructor work with async code?+

Yes. Instructor provides async patching for OpenAI and other async client libraries. Use instructor.from_openai(AsyncOpenAI()) to enable async structured outputs.

What happens when the LLM cannot match the schema?+

Instructor retries up to the configured max_retries (default: 1). Each retry includes the validation error in the prompt. If all retries fail, it raises a validation exception that you handle in your application.

Citations (3)

Instructor GitHub— Structured LLM outputs using Pydantic models
Instructor Documentation— Supports OpenAI, Anthropic, Gemini, and local models
Instructor Retry Docs— Automatic retry with validation errors

Related on TokRepo

AI coding tools API tools Featured workflows

🙏

Source & Thanks

GitHub: jxnl/instructor (8k+ stars)
Docs: python.useinstructor.com
Author: Jason Liu

Discussion

No comments yet. Be the first to share your thoughts.

Related Assets

Instructor — Structured Outputs from LLMs

Get structured, validated outputs from LLMs using Pydantic models. Works with OpenAI, Anthropic, Google, Ollama, and more. Retry logic, streaming, partial responses. 12.6K+ stars.

Skills

Script Depot

Instructor — Typed Structured Outputs for LLMs

Instructor turns LLM replies into validated Pydantic models with retries. `pip install instructor`, then extract typed objects across major providers.

Skills

Agent Toolkit

Outlines — Structured Outputs with Any Model

Outlines generates structured outputs (Pydantic types, enums, ints) from LLMs. `pip install outlines`, connect a backend, then request typed results.

Skills

Agent Toolkit

Pydantic AI — Production AI Agent Framework

Build production-ready AI agents in Python with type-safe structured outputs, dependency injection, and multi-model support. By the creators of Pydantic.

Skills

Pydantic