Structured Outputs — Force LLMs to Return Valid JSON
Complete guide to getting reliable structured JSON from LLMs. Covers OpenAI structured outputs, Claude tool use, Instructor library, and Outlines for guaranteed valid responses.
What it is
This guide covers techniques for forcing LLMs to return valid structured JSON output. It explains OpenAI's structured outputs mode, Claude's tool use for structured responses, the Instructor library for Pydantic-based extraction, and Outlines for guided generation. The goal is eliminating parsing failures from malformed LLM output.
This resource is for developers building production LLM applications where output reliability matters. If your pipeline breaks when the LLM returns markdown instead of JSON, these techniques solve that problem.
How it saves time or tokens
Without structured outputs, developers write fragile regex parsers and retry loops to extract data from LLM responses. Structured output techniques guarantee valid JSON on every call, eliminating retries and parsing failures. This reduces both token waste (no retry calls) and engineering time (no custom parsers). The estimated token cost for applying these techniques is around 4,000 tokens.
How to use
- Choose a structured output technique based on your LLM provider.
- Define your output schema (JSON Schema, Pydantic model, or type definition).
- Pass the schema to the LLM along with your prompt.
- Receive validated, parsed output directly.
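One way to carry out the schema-definition step above: define a Pydantic model and inspect the JSON Schema it generates, which is what the provider ultimately receives. The `Ticket` model here is purely illustrative:

```python
from pydantic import BaseModel

class Ticket(BaseModel):
    title: str
    priority: int
    tags: list[str]

# model_json_schema() shows the JSON Schema sent to the provider
schema = Ticket.model_json_schema()
print(schema["required"])  # ['title', 'priority', 'tags']
```

Fields without defaults become required properties, so the model cannot omit them in a schema-conformant response.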
# OpenAI Structured Outputs
from openai import OpenAI
from pydantic import BaseModel

class ExtractedData(BaseModel):
    name: str
    age: int
    skills: list[str]

client = OpenAI()
response = client.beta.chat.completions.parse(
    model='gpt-4o',
    messages=[{'role': 'user', 'content': 'Extract: John is 30, knows Python and Go'}],
    response_format=ExtractedData,
)
data = response.choices[0].message.parsed
print(data.name)    # 'John'
print(data.age)     # 30
print(data.skills)  # ['Python', 'Go']
Example
Using Claude's tool use for structured output:
import anthropic

client = anthropic.Anthropic()
response = client.messages.create(
    model='claude-sonnet-4-20250514',
    max_tokens=1024,
    tools=[{
        'name': 'extract_data',
        'description': 'Extract structured data from text',
        'input_schema': {
            'type': 'object',
            'properties': {
                'name': {'type': 'string'},
                'age': {'type': 'integer'},
                'skills': {'type': 'array', 'items': {'type': 'string'}}
            },
            'required': ['name', 'age', 'skills']
        }
    }],
    tool_choice={'type': 'tool', 'name': 'extract_data'},
    messages=[{'role': 'user', 'content': 'Extract: John is 30, knows Python and Go'}]
)
# response.content[0] is a tool_use block; its .input holds the extracted data
# as a dict matching the schema
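Even with the API validating tool input against the schema, a client-side Pydantic check is a cheap safety net against drift between the tool schema and your own types. This sketch hardcodes a sample tool input rather than calling the API:

```python
from pydantic import BaseModel, ValidationError

class ExtractedData(BaseModel):
    name: str
    age: int
    skills: list[str]

# Stand-in for response.content[0].input (hardcoded here for illustration)
tool_input = {'name': 'John', 'age': 30, 'skills': ['Python', 'Go']}

try:
    data = ExtractedData(**tool_input)
    print(data.age)  # 30
except ValidationError as e:
    print(e)  # schema drift surfaces here instead of deeper in the pipeline
```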
Related on TokRepo
- Prompt library — Prompt engineering techniques
- AI coding tools — Development tools for LLM apps
Common pitfalls
- Structured output modes add latency compared to free-form generation. The model must constrain its output at each token.
- Complex nested schemas may increase error rates. Keep schemas as flat as possible.
- OpenAI and Claude handle structured outputs differently. Code written for one provider needs adaptation for the other.
- Enum fields work well for classification but the model may hallucinate values not in the enum if the schema is not strict.
- The Instructor library adds a dependency but provides a unified interface across providers, which simplifies multi-provider setups.
Frequently Asked Questions
What is the difference between JSON mode and structured outputs?
JSON mode guarantees valid JSON syntax but does not enforce a specific schema. Structured outputs guarantee both valid JSON and conformance to a defined schema with specific fields, types, and required properties.
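For OpenAI's API, the difference shows up in the `response_format` parameter. The schema below is a minimal illustration, not a complete recipe:

```python
# JSON mode: syntactically valid JSON of any shape
response_format_json_mode = {"type": "json_object"}

# Structured outputs: valid JSON that must also match the schema
response_format_structured = {
    "type": "json_schema",
    "json_schema": {
        "name": "extracted_data",
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {"name": {"type": "string"}},
            "required": ["name"],
            "additionalProperties": False,
        },
    },
}
```

With `strict: True`, the model's decoding is constrained so the response cannot add fields or omit required ones.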
How do I get structured output from Claude?
Claude uses tool use as its structured output mechanism. Define a tool with the desired output schema and force the model to call it. The tool input will contain validated structured data.
What is the Instructor library?
Instructor is a Python library that adds Pydantic-based structured output to any LLM provider. Define a Pydantic model, and Instructor handles schema conversion, validation, and retry logic across OpenAI, Anthropic, and others.
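A minimal Instructor sketch with the OpenAI backend. The API call is gated behind an environment check so the model definition stands alone; the `Person` model and prompt are illustrative:

```python
import os
from pydantic import BaseModel

class Person(BaseModel):
    name: str
    age: int

# Requires OPENAI_API_KEY and the instructor package to actually run
if os.environ.get("OPENAI_API_KEY"):
    import instructor
    from openai import OpenAI

    # from_openai wraps the client so create() accepts response_model
    client = instructor.from_openai(OpenAI())
    person = client.chat.completions.create(
        model="gpt-4o",
        response_model=Person,
        messages=[{"role": "user", "content": "Extract: Jane is 25"}],
    )
    print(person)  # a validated Person instance
```

The same `response_model` pattern works across Instructor's other provider backends, which is what makes it useful in multi-provider setups.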
What is Outlines and when should I use it?
Outlines uses guided generation at the token level for local models. It constrains the model's vocabulary at each step to guarantee valid output. This works with open-source models where API-level structured output is not available.
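A sketch of guided generation with Outlines, assuming its pre-1.0 API (`outlines.models.transformers` / `outlines.generate.json`). Loading a local model is heavyweight, so it is gated behind an illustrative env var, and the model name is an assumption:

```python
import os
from pydantic import BaseModel

class Person(BaseModel):
    name: str
    age: int

# Downloads and loads a local model; only runs when explicitly enabled
if os.environ.get("RUN_OUTLINES_DEMO"):
    import outlines

    model = outlines.models.transformers("microsoft/phi-2")  # illustrative model
    generator = outlines.generate.json(model, Person)
    person = generator("Extract: Jane is 25. Return JSON.")
    print(person)  # guaranteed to parse into Person
```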
Can structured output schemas include optional fields?
Yes. JSON Schema supports optional fields, default values, and nullable types. Both OpenAI and Claude handle optional fields correctly when the schema specifies them.
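On the Pydantic side, giving a field a default is enough: only fields without defaults end up in the schema's `required` list. The `Profile` model is illustrative:

```python
from typing import Optional
from pydantic import BaseModel

class Profile(BaseModel):
    name: str                       # no default -> required
    nickname: Optional[str] = None  # default -> optional, nullable

schema = Profile.model_json_schema()
print(schema["required"])  # ['name']
```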
Citations (3)
- OpenAI Structured Outputs Docs — OpenAI structured outputs for guaranteed JSON schema conformance
- Anthropic Tool Use Docs — Claude tool use for structured responses
- Instructor GitHub — Instructor library for Pydantic-based LLM outputs
Source & Thanks
References:
- OpenAI Structured Outputs
- Anthropic Tool Use
- Instructor — 9k+ stars
- Outlines — 10k+ stars