Prompts2026年4月8日·1 分钟阅读

Structured Outputs — Force LLMs to Return Valid JSON

Complete guide to getting reliable structured JSON from LLMs. Covers OpenAI structured outputs, Claude tool use, Instructor library, and Outlines for guaranteed valid responses.

Agent 就绪

先审查再安装

这个资产需要先审查。复制的指令会要求 Agent dry-run、列出写入项,确认后再继续。

Needs Confirmation · 64/100策略:需确认
Agent 入口
任意 MCP/CLI Agent
类型
Prompt
安装
Single
信任
信任等级:Community
入口
Structured Outputs — Force LLMs to Return Valid JSON
先审查命令
npx -y tokrepo@latest install 26c0617e-28c8-4a26-8a87-b765d3921208 --target codex

先 dry-run,确认写入项后再运行此命令。

TL;DR
A guide to techniques for getting reliable, validated JSON output from any LLM.
§01

What it is

This guide covers techniques for forcing LLMs to return valid structured JSON output. It explains OpenAI's structured outputs mode, Claude's tool use for structured responses, the Instructor library for Pydantic-based extraction, and Outlines for guided generation. The goal is eliminating parsing failures from malformed LLM output.

This resource is for developers building production LLM applications where output reliability matters. If your pipeline breaks when the LLM returns markdown instead of JSON, these techniques solve that problem.

§02

How it saves time or tokens

Without structured outputs, developers write fragile regex parsers and retry loops to extract data from LLM responses. Structured output techniques guarantee valid JSON on every call, eliminating retries and parsing failures. This reduces both token waste (no retry calls) and engineering time (no custom parsers). The estimated token cost for applying these techniques is around 4,000 tokens.

§03

How to use

  1. Choose a structured output technique based on your LLM provider.
  2. Define your output schema (JSON Schema, Pydantic model, or type definition).
  3. Pass the schema to the LLM along with your prompt.
  4. Receive validated, parsed output directly.
# OpenAI Structured Outputs
from openai import OpenAI
from pydantic import BaseModel

class ExtractedData(BaseModel):
    name: str
    age: int
    skills: list[str]

client = OpenAI()
response = client.beta.chat.completions.parse(
    model='gpt-4o',
    messages=[{'role': 'user', 'content': 'Extract: John is 30, knows Python and Go'}],
    response_format=ExtractedData
)

data = response.choices[0].message.parsed
print(data.name)    # 'John'
print(data.age)     # 30
print(data.skills)  # ['Python', 'Go']
§04

Example

Using Claude's tool use for structured output:

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model='claude-sonnet-4-20250514',
    max_tokens=1024,
    tools=[{
        'name': 'extract_data',
        'description': 'Extract structured data from text',
        'input_schema': {
            'type': 'object',
            'properties': {
                'name': {'type': 'string'},
                'age': {'type': 'integer'},
                'skills': {'type': 'array', 'items': {'type': 'string'}}
            },
            'required': ['name', 'age', 'skills']
        }
    }],
    tool_choice={'type': 'tool', 'name': 'extract_data'},
    messages=[{'role': 'user', 'content': 'Extract: John is 30, knows Python and Go'}]
)

# response.content[0].input contains validated JSON
§05

Related on TokRepo

§06

Common pitfalls

  • Structured output modes add latency compared to free-form generation. The model must constrain its output at each token.
  • Complex nested schemas may increase error rates. Keep schemas as flat as possible.
  • OpenAI and Claude handle structured outputs differently. Code written for one provider needs adaptation for the other.
  • Enum fields work well for classification but the model may hallucinate values not in the enum if the schema is not strict.
  • The Instructor library adds a dependency but provides a unified interface across providers, which simplifies multi-provider setups.

常见问题

What is the difference between structured outputs and JSON mode?+

JSON mode guarantees valid JSON syntax but does not enforce a specific schema. Structured outputs guarantee both valid JSON and conformance to a defined schema with specific fields, types, and required properties.

Does Claude support structured outputs natively?+

Claude uses tool use as its structured output mechanism. Define a tool with the desired output schema and force the model to call it. The tool input will contain validated structured data.

What is the Instructor library?+

Instructor is a Python library that adds Pydantic-based structured output to any LLM provider. Define a Pydantic model, and Instructor handles schema conversion, validation, and retry logic across OpenAI, Anthropic, and others.

How does Outlines differ from API-based structured outputs?+

Outlines uses guided generation at the token level for local models. It constrains the model's vocabulary at each step to guarantee valid output. This works with open-source models where API-level structured output is not available.

Can structured outputs handle optional fields?+

Yes. JSON Schema supports optional fields, default values, and nullable types. Both OpenAI and Claude handle optional fields correctly when the schema specifies them.

引用来源 (3)
🙏

来源与感谢

讨论

登录后参与讨论。
还没有评论,来写第一条吧。

相关资产