Prompts · April 8, 2026 · 1 min read

Structured Outputs — Force LLMs to Return Valid JSON

Complete guide to getting reliable structured JSON from LLMs. Covers OpenAI structured outputs, Claude tool use, Instructor library, and Outlines for guaranteed valid responses.

Prompt Lab · Community

Review before installing

npx -y tokrepo@latest install 26c0617e-28c8-4a26-8a87-b765d3921208 --target codex

Do a dry run first and confirm what will be written before running this command.

TL;DR

A guide to techniques for getting reliable, validated JSON output from any LLM.

§01

What it is

This guide covers techniques for forcing LLMs to return valid structured JSON output. It explains OpenAI's structured outputs mode, Claude's tool use for structured responses, the Instructor library for Pydantic-based extraction, and Outlines for guided generation. The goal is eliminating parsing failures from malformed LLM output.

This resource is for developers building production LLM applications where output reliability matters. If your pipeline breaks when the LLM returns markdown instead of JSON, these techniques solve that problem.

§02

How it saves time or tokens

Without structured outputs, developers write fragile regex parsers and retry loops to extract data from LLM responses. Structured output techniques constrain or validate every response against a schema, which removes most retries and parsing failures. Edge cases such as a refusal or a reply truncated by the token limit still need handling. The result is less token waste (fewer retry calls) and less engineering time (no custom parsers). The estimated token cost for applying these techniques is around 4,000 tokens.
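For contrast, the hand-rolled parsing described above tends to look like the sketch below. `parse_llm_json` is a hypothetical helper, not part of any library. It tries a strict parse first and then falls back to pulling JSON out of a markdown code fence, which is the fragile step structured outputs aim to make unnecessary.

```python
import json
import re

FENCE = "`" * 3  # a markdown code fence, built this way to keep the example readable

# Matches the body of the first fenced block, with an optional "json" language tag.
FENCED_BLOCK = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)


def parse_llm_json(raw: str) -> dict:
    """Best-effort parse of a free-form LLM reply (hypothetical helper)."""
    try:
        return json.loads(raw)  # happy path: the reply is bare JSON
    except json.JSONDecodeError:
        pass
    match = FENCED_BLOCK.search(raw)
    if match is None:
        raise ValueError("no JSON found in reply")
    return json.loads(match.group(1))  # may still raise on malformed JSON


# A typical chatty reply that wraps the JSON in prose and a code fence.
reply = "Sure! Here is the data:\n" + FENCE + 'json\n{"name": "John", "age": 30}\n' + FENCE
print(parse_llm_json(reply))
```

Every new reply style (a different fence tag, trailing commentary, two JSON blocks) needs another special case here, which is why the schema-based techniques in this guide are usually the better investment.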

§03

How to use

1. Choose a structured output technique based on your LLM provider.
2. Define your output schema (JSON Schema, Pydantic model, or type definition).
3. Pass the schema to the LLM along with your prompt.
4. Receive validated, parsed output directly.

# OpenAI Structured Outputs
from openai import OpenAI
from pydantic import BaseModel

class ExtractedData(BaseModel):
    name: str
    age: int
    skills: list[str]

client = OpenAI()
response = client.beta.chat.completions.parse(
    model='gpt-4o',
    messages=[{'role': 'user', 'content': 'Extract: John is 30, knows Python and Go'}],
    response_format=ExtractedData
)

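message = response.choices[0].message
if message.refusal:
    # On a refusal, .parsed is None and .refusal holds the explanation
    raise RuntimeError(message.refusal)

data = message.parsed  # an ExtractedData instance
print(data.name)    # 'John'
print(data.age)     # 30
print(data.skills)  # ['Python', 'Go']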

§04

Example

Using Claude's tool use for structured output:

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model='claude-sonnet-4-20250514',
    max_tokens=1024,
    tools=[{
        'name': 'extract_data',
        'description': 'Extract structured data from text',
        'input_schema': {
            'type': 'object',
            'properties': {
                'name': {'type': 'string'},
                'age': {'type': 'integer'},
                'skills': {'type': 'array', 'items': {'type': 'string'}}
            },
            'required': ['name', 'age', 'skills']
        }
    }],
    tool_choice={'type': 'tool', 'name': 'extract_data'},
    messages=[{'role': 'user', 'content': 'Extract: John is 30, knows Python and Go'}]
)

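# With a forced tool choice, the reply carries a tool_use block whose .input is a dict
tool_use = next(block for block in response.content if block.type == 'tool_use')
data = tool_use.input  # follows the schema in practice; validate before relying on it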
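Because the tool input arrives as a plain dict, a small check before use can catch the occasional missing or mistyped field. The sketch below is a minimal, standard-library stand-in for full JSON Schema validation. `validate_extracted` and the sample dicts are hypothetical, and a real project might use Pydantic or a JSON Schema validator instead.

```python
# Expected top-level fields and their Python types, mirroring the tool's input_schema.
EXPECTED_FIELDS = {"name": str, "age": int, "skills": list}


def validate_extracted(tool_input: dict) -> list[str]:
    """Return a list of problems; an empty list means the input looks valid."""
    problems = []
    for field, expected_type in EXPECTED_FIELDS.items():
        if field not in tool_input:
            problems.append(f"missing field: {field}")
        elif isinstance(tool_input[field], bool) or not isinstance(tool_input[field], expected_type):
            # bool is a subclass of int in Python, so reject it explicitly
            problems.append(f"wrong type for {field}")
    skills = tool_input.get("skills")
    if isinstance(skills, list) and not all(isinstance(s, str) for s in skills):
        problems.append("skills must contain only strings")
    return problems


# Stand-ins for what the tool_use block's input might hold.
good = {"name": "John", "age": 30, "skills": ["Python", "Go"]}
bad = {"name": "John", "age": "thirty"}

print(validate_extracted(good))  # []
print(validate_extracted(bad))   # ['wrong type for age', 'missing field: skills']
```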

§05

Related on TokRepo

Prompt library — Prompt engineering techniques
AI coding tools — Development tools for LLM apps

§06

Common pitfalls

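Structured output modes can add latency compared to free-form generation, because the decoder has to restrict the allowed tokens at every step. With OpenAI, the first request that uses a new schema is also slower while the schema is processed.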
Complex nested schemas may increase error rates. Keep schemas as flat as possible.
OpenAI and Claude handle structured outputs differently. Code written for one provider needs adaptation for the other.
Enum fields work well for classification but the model may hallucinate values not in the enum if the schema is not strict.
The Instructor library adds a dependency but provides a unified interface across providers, which simplifies multi-provider setups.
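The enum pitfall above can be guarded against with a membership check after parsing. This is a minimal sketch under assumptions: the `Sentiment` labels and the `parse_label` helper are invented for illustration, and the check only matters when the provider does not enforce the enum strictly.

```python
from enum import Enum


class Sentiment(Enum):
    # Hypothetical classification labels for illustration.
    POSITIVE = "positive"
    NEGATIVE = "negative"
    NEUTRAL = "neutral"


def parse_label(payload: dict) -> Sentiment:
    """Convert the model's label to the enum, rejecting values outside it."""
    raw = payload.get("sentiment")
    try:
        return Sentiment(raw)
    except ValueError:
        allowed = [member.value for member in Sentiment]
        raise ValueError(f"unexpected sentiment {raw!r}; allowed: {allowed}") from None


print(parse_label({"sentiment": "positive"}))  # Sentiment.POSITIVE
```

Failing loudly on an unknown label is a deliberate choice here: a silent fallback to a default class would hide exactly the hallucinated values this pitfall warns about.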

FAQ

What is the difference between structured outputs and JSON mode?

JSON mode guarantees valid JSON syntax but does not enforce a specific schema. Structured outputs guarantee both valid JSON and conformance to a defined schema with specific fields, types, and required properties.

Does Claude support structured outputs natively?

Claude uses tool use as its structured output mechanism. Define a tool whose input schema is the desired output shape and force the model to call it with tool_choice. The tool input then arrives as structured data that follows the schema; validate it in your own code when correctness is critical.

What is the Instructor library?

Instructor is a Python library that adds Pydantic-based structured output to any LLM provider. Define a Pydantic model, and Instructor handles schema conversion, validation, and retry logic across OpenAI, Anthropic, and others.
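The validate-and-retry idea can be sketched without any provider at all. The code below is a conceptual illustration only, not Instructor's implementation or API: `call_with_retries`, `fake_llm`, and `validate` are hypothetical names, and the fake model returns canned replies so the loop can run offline.

```python
import json
from typing import Callable


def call_with_retries(generate: Callable[[str], str],
                      validate: Callable[[dict], None],
                      prompt: str,
                      max_retries: int = 2) -> dict:
    """Call generate, parse and validate the reply, and re-ask with the error on failure."""
    current_prompt = prompt
    for _ in range(max_retries + 1):
        raw = generate(current_prompt)
        try:
            data = json.loads(raw)
            validate(data)
            return data
        except ValueError as exc:  # json.JSONDecodeError is a ValueError subclass
            # Feed the error back so the next attempt can correct itself.
            current_prompt = f"{prompt}\n\nYour last reply was invalid: {exc}. Return only JSON."
    raise RuntimeError(f"no valid reply after {max_retries + 1} attempts")


def validate(data: dict) -> None:
    """Hypothetical validator: require an integer age."""
    if not isinstance(data, dict) or not isinstance(data.get("age"), int):
        raise ValueError("age must be an integer")


# Canned replies standing in for a real model: the first is invalid, the second is fine.
replies = iter(['{"name": "John", "age": "thirty"}', '{"name": "John", "age": 30}'])


def fake_llm(prompt: str) -> str:
    return next(replies)


print(call_with_retries(fake_llm, validate, "Extract: John is 30"))
```

Each retry is a full extra model call, so a low `max_retries` keeps the worst-case token cost bounded.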

How does Outlines differ from API-based structured outputs?

Outlines uses guided generation at the token level for local models. It constrains the model's vocabulary at each step to guarantee valid output. This works with open-source models where API-level structured output is not available.
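To make token-level guidance concrete, here is a toy sketch of the masking idea. It is not the Outlines API: the tiny vocabulary, the hand-written state machine, and the fake scores are all assumptions chosen so the example runs offline. At each step only the tokens the grammar allows are eligible, so the highest-scoring allowed token wins even when the model would prefer something else.

```python
# Toy grammar for the output {"ok":true} or {"ok":false}, one token per step.
# Each state maps an allowed token to the next state.
GRAMMAR = {
    "start": {"{": "key"},
    "key": {'"ok"': "colon"},
    "colon": {":": "value"},
    "value": {"true": "close", "false": "close"},
    "close": {"}": "done"},
}


def fake_scores(state: str) -> dict[str, float]:
    """Stand-in for model logits: this fake model always prefers chatty text."""
    return {"Sure!": 5.0, "{": 1.0, '"ok"': 1.0, ":": 1.0,
            "true": 2.0, "false": 1.5, "}": 1.0}


def guided_decode() -> str:
    state, output = "start", []
    while state != "done":
        allowed = GRAMMAR[state]
        scores = fake_scores(state)
        # Mask: consider only tokens the grammar permits in this state.
        token = max(allowed, key=lambda t: scores[t])
        output.append(token)
        state = allowed[token]
    return "".join(output)


print(guided_decode())  # {"ok":true}
```

The unconstrained favorite, "Sure!", is never emitted because it is never in the allowed set, which is the sense in which guided generation makes invalid output impossible rather than merely unlikely.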

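Can structured outputs handle optional fields?

Yes, with a caveat. JSON Schema supports optional fields and nullable types, and a Claude tool schema can simply leave a property out of its required list. OpenAI's strict structured outputs require every property to be listed as required, so an optional field is expressed there as a nullable type (for example, a union of string and null) rather than by omitting it.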
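As an illustration of nullable types, the sketch below builds a schema in which every property is required but one accepts null. It assumes OpenAI's documented strict-mode rules (every property required, additionalProperties set to false). The field names are hypothetical, and `is_strict_ready` is a simplified helper written for this example, not a provider API; it only inspects the top level of the schema.

```python
import json

# "nickname" is optional in meaning but still listed as required:
# the model must emit the key, and may set it to null when nothing applies.
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "nickname": {"type": ["string", "null"]},
    },
    "required": ["name", "nickname"],
    "additionalProperties": False,
}


def is_strict_ready(s: dict) -> bool:
    """Top-level check: all properties required and no extra keys allowed."""
    return (set(s.get("required", [])) == set(s.get("properties", {}))
            and s.get("additionalProperties") is False)


print(is_strict_ready(schema))                            # True
print(json.dumps({"name": "John", "nickname": None}))     # {"name": "John", "nickname": null}
```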

References (3)

OpenAI Structured Outputs Docs: OpenAI structured outputs for guaranteed JSON schema conformance
Anthropic Tool Use Docs: Claude tool use for structured responses
Instructor GitHub: Instructor library for Pydantic-based LLM outputs

Sources and thanks

OpenAI Docs | Anthropic Docs | Instructor | Outlines
