Skills2026年4月7日·1 分钟阅读

Instructor — Structured LLM Outputs with Pydantic

Extract structured data from LLMs using Pydantic models. Works with OpenAI, Anthropic, Gemini, and local models. The simplest way to get reliable JSON from any LLM.

Agent 就绪

先审查再安装

这个资产需要先审查。复制的指令会要求 Agent dry-run、列出写入项,确认后再继续。

Needs Confirmation · 66/100策略:需确认
Agent 入口
任意 MCP/CLI Agent
类型
Skill
安装
Single
信任
信任等级:Community
入口
Instructor — Structured LLM Outputs with Pydantic
先审查命令
npx -y tokrepo@latest install 9301dfb7-b047-4c15-94d2-47d349a77865 --target codex

先 dry-run,确认写入项后再运行此命令。

TL;DR
Instructor uses Pydantic models to extract structured, validated JSON from any LLM including OpenAI, Anthropic, and local models.
§01

What it is

Instructor is a Python library that patches LLM client libraries to return structured data validated by Pydantic models. Instead of parsing raw text or hoping for valid JSON, you define a Pydantic schema and Instructor ensures the LLM output matches it. It works with OpenAI, Anthropic, Google Gemini, and local models.

It targets developers building applications that need reliable, typed data extraction from LLM responses — classification, entity extraction, data transformation, and structured summarization.

§02

How it saves time or tokens

Instructor handles retries, validation, and schema enforcement automatically. When the LLM returns malformed data, Instructor re-prompts with the validation error, fixing the output without manual intervention. This reduces debugging time and wasted tokens on malformed responses. Estimated token usage is around 4,200 tokens.

§03

How to use

  1. Install Instructor:
pip install instructor
  1. Patch your client and define a model:
import instructor
from openai import OpenAI
from pydantic import BaseModel

client = instructor.from_openai(OpenAI())

class User(BaseModel):
    name: str
    age: int

user = client.chat.completions.create(
    model='gpt-4o',
    response_model=User,
    messages=[{'role': 'user', 'content': 'Extract: John is 25 years old'}]
)
print(user.name, user.age)
  1. The response is a validated Pydantic object, not raw text.
§04

Example

import instructor
from openai import OpenAI
from pydantic import BaseModel
from typing import List

client = instructor.from_openai(OpenAI())

class Contact(BaseModel):
    name: str
    email: str
    company: str

class ContactList(BaseModel):
    contacts: List[Contact]

result = client.chat.completions.create(
    model='gpt-4o',
    response_model=ContactList,
    messages=[{'role': 'user', 'content': 'Extract contacts from this email thread...'}]
)
for c in result.contacts:
    print(f'{c.name} at {c.company}')
§05

Related on TokRepo

Key considerations

When evaluating Instructor for your workflow, consider the following factors. First, assess whether your team has the technical prerequisites to adopt this tool effectively. Second, evaluate the maintenance burden against the productivity gains. Third, check community activity and documentation quality to ensure long-term viability. Integration with your existing toolchain matters more than feature count alone. Start with a small pilot project before rolling out across the organization. Monitor resource usage during the initial adoption phase to identify bottlenecks early. Document your configuration decisions so team members can onboard independently.

§06

Common pitfalls

  • Complex nested Pydantic models increase the chance of validation failures; keep schemas as flat as possible.
  • Retry logic consumes additional tokens; set a max_retries limit to avoid runaway costs.
  • Not all model providers support function calling natively; Instructor uses different strategies (JSON mode, tool calling) depending on the provider.

常见问题

Which LLM providers does Instructor support?+

Instructor works with OpenAI, Anthropic Claude, Google Gemini, Mistral, Cohere, and local models via Ollama or LiteLLM. Each provider uses the appropriate structured output mechanism available.

How does validation work?+

Instructor validates the LLM response against your Pydantic model. If validation fails (wrong type, missing field, constraint violation), it automatically re-prompts the LLM with the error message for correction.

Can I use Instructor with streaming?+

Yes. Instructor supports streaming partial objects as they are generated. You get incremental updates to your Pydantic model as tokens arrive, useful for long-running extractions.

Does Instructor work with async code?+

Yes. Instructor provides async patching for OpenAI and other async client libraries. Use instructor.from_openai(AsyncOpenAI()) to enable async structured outputs.

What happens when the LLM cannot match the schema?+

Instructor retries up to the configured max_retries (default: 1). Each retry includes the validation error in the prompt. If all retries fail, it raises a validation exception that you handle in your application.

引用来源 (3)
🙏

来源与感谢

讨论

登录后参与讨论。
还没有评论,来写第一条吧。

相关资产