Prompts · Apr 8, 2026 · 3 min read

Structured Outputs — Force LLMs to Return Valid JSON

Complete guide to getting reliable structured JSON from LLMs. Covers OpenAI structured outputs, Claude tool use, Instructor library, and Outlines for guaranteed valid responses.

TL;DR
A guide to techniques for getting reliable, validated JSON output from any LLM.
§01

What it is

This guide covers techniques for forcing LLMs to return valid structured JSON output. It explains OpenAI's structured outputs mode, Claude's tool use for structured responses, the Instructor library for Pydantic-based extraction, and Outlines for guided generation. The goal is eliminating parsing failures from malformed LLM output.

This resource is for developers building production LLM applications where output reliability matters. If your pipeline breaks when the LLM returns markdown instead of JSON, these techniques solve that problem.

§02

How it saves time or tokens

Without structured outputs, developers write fragile regex parsers and retry loops to extract data from LLM responses. Structured output techniques guarantee valid JSON on every call, eliminating retries and parsing failures. This reduces both token waste (no retry calls) and engineering time (no custom parsers). The estimated token cost for applying these techniques is around 4,000 tokens.
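For contrast, the fragile approach this replaces typically looks like the sketch below: scan the reply for something JSON-shaped, parse it, and hope. The helper name and sample reply are illustrative, not from any library.

```python
import json
import re

def extract_json(reply: str) -> dict:
    """Fragile fallback parsing: hunt for a JSON object in free-form LLM text."""
    # Grab everything from the first '{' to the last '}' in case the model
    # wrapped its answer in prose or a markdown code fence
    match = re.search(r'\{.*\}', reply, re.DOTALL)
    if match is None:
        raise ValueError('no JSON object found in reply')
    return json.loads(match.group(0))  # still raises on malformed JSON

# A typical "almost JSON" reply that breaks a naive json.loads(reply)
reply = 'Sure! Here is the data:\n```json\n{"name": "John", "age": 30}\n```'
print(extract_json(reply))  # {'name': 'John', 'age': 30}
```

Every failure mode of this helper (no braces, truncated output, trailing commentary inside the braces) becomes a retry call; structured outputs make the whole function unnecessary.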

§03

How to use

  1. Choose a structured output technique based on your LLM provider.
  2. Define your output schema (JSON Schema, Pydantic model, or type definition).
  3. Pass the schema to the LLM along with your prompt.
  4. Receive validated, parsed output directly.
# OpenAI structured outputs: pass a Pydantic model as response_format
from openai import OpenAI
from pydantic import BaseModel

class ExtractedData(BaseModel):
    name: str
    age: int
    skills: list[str]

client = OpenAI()
response = client.beta.chat.completions.parse(
    model='gpt-4o',
    messages=[{'role': 'user', 'content': 'Extract: John is 30, knows Python and Go'}],
    response_format=ExtractedData
)

# .parsed is an ExtractedData instance, already validated against the model
data = response.choices[0].message.parsed
print(data.name)    # 'John'
print(data.age)     # 30
print(data.skills)  # ['Python', 'Go']
§04

Example

Using Claude's tool use for structured output:

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model='claude-sonnet-4-20250514',
    max_tokens=1024,
    tools=[{
        'name': 'extract_data',
        'description': 'Extract structured data from text',
        'input_schema': {
            'type': 'object',
            'properties': {
                'name': {'type': 'string'},
                'age': {'type': 'integer'},
                'skills': {'type': 'array', 'items': {'type': 'string'}}
            },
            'required': ['name', 'age', 'skills']
        }
    }],
    tool_choice={'type': 'tool', 'name': 'extract_data'},
    messages=[{'role': 'user', 'content': 'Extract: John is 30, knows Python and Go'}]
)

# With a forced tool call, the first content block is a tool_use block:
data = response.content[0].input  # dict conforming to the input_schema
§05

Common pitfalls

  • Structured output modes add latency compared to free-form generation: the first request with a new schema may pay a one-time schema-processing cost, and constrained decoding restricts sampling at every token.
  • Complex nested schemas may increase error rates. Keep schemas as flat as possible.
  • OpenAI and Claude handle structured outputs differently. Code written for one provider needs adaptation for the other.
  • Enum fields work well for classification but the model may hallucinate values not in the enum if the schema is not strict.
  • The Instructor library adds a dependency but provides a unified interface across providers, which simplifies multi-provider setups.

Frequently Asked Questions

What is the difference between structured outputs and JSON mode?

JSON mode guarantees valid JSON syntax but does not enforce a specific schema. Structured outputs guarantee both valid JSON and conformance to a defined schema with specific fields, types, and required properties.
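The distinction is visible without any API call: json.loads only checks syntax, while schema conformance needs a separate check. A minimal local sketch, with a hand-rolled validator standing in for the schema enforcement the API does for you:

```python
import json

# A tiny stand-in schema: required field names mapped to expected types
schema_fields = {'name': str, 'age': int}

def conforms(data: dict) -> bool:
    """Check required fields and types against the stand-in schema."""
    return all(
        key in data and isinstance(data[key], typ)
        for key, typ in schema_fields.items()
    )

partial = json.loads('{"name": "John"}')  # JSON mode stops here: valid syntax
print(conforms(partial))                  # False: required 'age' is missing

full = json.loads('{"name": "John", "age": 30}')
print(conforms(full))                     # True: syntax AND schema both hold
```

Structured outputs collapse both checks into the API call itself, so the second case is the only one you ever receive.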

Does Claude support structured outputs natively?

Claude uses tool use as its structured output mechanism. Define a tool with the desired output schema and force the model to call it. The tool input will contain validated structured data.

What is the Instructor library?

Instructor is a Python library that adds Pydantic-based structured output to any LLM provider. Define a Pydantic model, and Instructor handles schema conversion, validation, and retry logic across OpenAI, Anthropic, and others.
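A minimal sketch of the Instructor pattern against OpenAI, assuming a recent instructor release and an API key in the environment. The client is patched once; the only new parameter in the call is response_model:

```python
import instructor
from openai import OpenAI
from pydantic import BaseModel

class ExtractedData(BaseModel):
    name: str
    age: int
    skills: list[str]

# Patch the OpenAI client; the chat API is otherwise unchanged
client = instructor.from_openai(OpenAI())

data = client.chat.completions.create(
    model='gpt-4o',
    response_model=ExtractedData,  # Instructor validates and retries for you
    messages=[{'role': 'user', 'content': 'Extract: John is 30, knows Python and Go'}],
)
# data is an ExtractedData instance, not a raw response object
```

Swapping providers means swapping the patched client (e.g. instructor.from_anthropic); the Pydantic model and the call shape stay the same.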

How does Outlines differ from API-based structured outputs?

Outlines uses guided generation at the token level for local models. It constrains the model's vocabulary at each step to guarantee valid output. This works with open-source models where API-level structured output is not available.

Can structured outputs handle optional fields?

Yes. JSON Schema supports optional fields, default values, and nullable types. Note one wrinkle: OpenAI's strict mode lists every property as required and expresses optionality as a nullable type (a union with null, such as str | None in Pydantic), while Claude's tool schemas accept ordinary optional properties.
