Cette page est affichée en anglais. Une traduction française est en cours.
SkillsApr 7, 2026·1 min de lecture

Instructor — Structured LLM Outputs with Pydantic

Extract structured data from LLMs using Pydantic models. Works with OpenAI, Anthropic, Gemini, and local models. The simplest way to get reliable JSON from any LLM.

Pydantic
Pydantic · Community
Prêt pour agents

Installation avec revue préalable

Cet actif nécessite une revue. Le prompt copié demande un dry-run, affiche les écritures, puis continue seulement après confirmation.

Needs Confirmation · 66/100Policy : confirmer
Surface agent
Tout agent MCP/CLI
Type
Skill
Installation
Single
Confiance
Confiance : Community
Point d'entrée
Instructor — Structured LLM Outputs with Pydantic
Commande avec revue préalable
npx -y tokrepo@latest install 9301dfb7-b047-4c15-94d2-47d349a77865 --target codex

Dry-run d'abord, confirmez les écritures, puis lancez cette commande.

TL;DR
Instructor uses Pydantic models to extract structured, validated JSON from any LLM including OpenAI, Anthropic, and local models.
§01

What it is

Instructor is a Python library that patches LLM client libraries to return structured data validated by Pydantic models. Instead of parsing raw text or hoping for valid JSON, you define a Pydantic schema and Instructor ensures the LLM output matches it. It works with OpenAI, Anthropic, Google Gemini, and local models.

It targets developers building applications that need reliable, typed data extraction from LLM responses — classification, entity extraction, data transformation, and structured summarization.

§02

How it saves time or tokens

Instructor handles retries, validation, and schema enforcement automatically. When the LLM returns malformed data, Instructor re-prompts with the validation error, fixing the output without manual intervention. This reduces debugging time and wasted tokens on malformed responses. Estimated token usage is around 4,200 tokens.

§03

How to use

  1. Install Instructor:
pip install instructor
  1. Patch your client and define a model:
import instructor
from openai import OpenAI
from pydantic import BaseModel

client = instructor.from_openai(OpenAI())

class User(BaseModel):
    name: str
    age: int

user = client.chat.completions.create(
    model='gpt-4o',
    response_model=User,
    messages=[{'role': 'user', 'content': 'Extract: John is 25 years old'}]
)
print(user.name, user.age)
  1. The response is a validated Pydantic object, not raw text.
§04

Example

import instructor
from openai import OpenAI
from pydantic import BaseModel
from typing import List

client = instructor.from_openai(OpenAI())

class Contact(BaseModel):
    name: str
    email: str
    company: str

class ContactList(BaseModel):
    contacts: List[Contact]

result = client.chat.completions.create(
    model='gpt-4o',
    response_model=ContactList,
    messages=[{'role': 'user', 'content': 'Extract contacts from this email thread...'}]
)
for c in result.contacts:
    print(f'{c.name} at {c.company}')
§05

Related on TokRepo

Key considerations

When evaluating Instructor for your workflow, consider the following factors. First, assess whether your team has the technical prerequisites to adopt this tool effectively. Second, evaluate the maintenance burden against the productivity gains. Third, check community activity and documentation quality to ensure long-term viability. Integration with your existing toolchain matters more than feature count alone. Start with a small pilot project before rolling out across the organization. Monitor resource usage during the initial adoption phase to identify bottlenecks early. Document your configuration decisions so team members can onboard independently.

§06

Common pitfalls

  • Complex nested Pydantic models increase the chance of validation failures; keep schemas as flat as possible.
  • Retry logic consumes additional tokens; set a max_retries limit to avoid runaway costs.
  • Not all model providers support function calling natively; Instructor uses different strategies (JSON mode, tool calling) depending on the provider.

Questions fréquentes

Which LLM providers does Instructor support?+

Instructor works with OpenAI, Anthropic Claude, Google Gemini, Mistral, Cohere, and local models via Ollama or LiteLLM. Each provider uses the appropriate structured output mechanism available.

How does validation work?+

Instructor validates the LLM response against your Pydantic model. If validation fails (wrong type, missing field, constraint violation), it automatically re-prompts the LLM with the error message for correction.

Can I use Instructor with streaming?+

Yes. Instructor supports streaming partial objects as they are generated. You get incremental updates to your Pydantic model as tokens arrive, useful for long-running extractions.

Does Instructor work with async code?+

Yes. Instructor provides async patching for OpenAI and other async client libraries. Use instructor.from_openai(AsyncOpenAI()) to enable async structured outputs.

What happens when the LLM cannot match the schema?+

Instructor retries up to the configured max_retries (default: 1). Each retry includes the validation error in the prompt. If all retries fail, it raises a validation exception that you handle in your application.

Sources citées (3)
🙏

Source et remerciements

Fil de discussion

Connectez-vous pour rejoindre la discussion.
Aucun commentaire pour l'instant. Soyez le premier à partager votre avis.

Actifs similaires