Quick Use
- Use `response_format={"type": "json_schema", "json_schema": {"schema": {...}}}` for strict mode
- Or `response_format={"type": "json_object"}` for loose JSON
- Tools via standard OpenAI `tools=[...]`
Intro
Fireworks AI exposes OpenAI-compatible structured-output features (`response_format` with the `json_object` and `json_schema` types, plus tool/function calling) on Llama 3.3 70B, Mixtral 8×22B, Qwen 2.5 72B, and DeepSeek-V3. Same client code as OpenAI, at 5-10× lower cost. Best for: data extraction pipelines, structured agents that don't justify gpt-4o cost, regression-tested classifiers. Works with: openai-python, openai-node, LangChain `with_structured_output`, Vercel AI SDK. Setup time: 5 minutes.
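The examples below use the OpenAI Python SDK, but the endpoint is plain HTTPS, so any client works. As a sketch, here is roughly what the JSON body of a strict-schema request looks like (a hypothetical hand-built payload mirroring the SDK examples below; consult the Fireworks docs for the authoritative wire format):

```python
import json

# Illustrative payload for POST https://api.fireworks.ai/inference/v1/chat/completions,
# sent with an "Authorization: Bearer $FIREWORKS_API_KEY" header.
payload = {
    "model": "accounts/fireworks/models/llama-v3p3-70b-instruct",
    "messages": [{"role": "user", "content": "Extract: Jane Smith works at Stripe"}],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "Person",
            "schema": {
                "type": "object",
                "properties": {"name": {"type": "string"}, "company": {"type": "string"}},
                "required": ["name", "company"],
            },
        },
    },
}
body = json.dumps(payload)  # what actually goes over the wire
```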
JSON Schema mode (strict structured output)
```python
import json
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key=os.environ["FIREWORKS_API_KEY"],
)

schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "salary": {"type": "number"},
        "company": {"type": "string"},
        "is_remote": {"type": "boolean"},
    },
    "required": ["name", "company"],
}

resp = client.chat.completions.create(
    model="accounts/fireworks/models/llama-v3p3-70b-instruct",
    messages=[{"role": "user", "content": "Extract: Jane Smith earns $180K at Stripe and works fully remote"}],
    response_format={"type": "json_schema", "json_schema": {"name": "Person", "schema": schema}},
)
data = json.loads(resp.choices[0].message.content)
print(data)  # {'name': 'Jane Smith', 'salary': 180000, 'company': 'Stripe', 'is_remote': True}
```

Plain JSON mode (less strict, no schema)
```python
resp = client.chat.completions.create(
    model="accounts/fireworks/models/mixtral-8x22b-instruct",
    messages=[{"role": "user", "content": "Return a JSON object with three coding tips"}],
    response_format={"type": "json_object"},
)
```

Tool calling
```python
tools = [{
    "type": "function",
    "function": {
        "name": "create_invoice",
        "description": "Create a Stripe invoice for a customer",
        "parameters": {
            "type": "object",
            "properties": {
                "customer_email": {"type": "string"},
                "amount_usd": {"type": "number"},
                "description": {"type": "string"},
            },
            "required": ["customer_email", "amount_usd"],
        },
    },
}]

resp = client.chat.completions.create(
    model="accounts/fireworks/models/llama-v3p3-70b-instruct",
    messages=[{"role": "user", "content": "Bill jane@acme.com $5000 for May retainer"}],
    tools=tools,
)
print(resp.choices[0].message.tool_calls)
```

Why not just gpt-4o?
| Item | Llama 3.3 70B (Fireworks) | gpt-4o |
|---|---|---|
| Cost / 1M input tokens | $0.90 | $5.00 |
| Cost / 1M output tokens | $0.90 | $15.00 |
| JSON Schema strict | Yes | Yes |
| Tool calling | Yes | Yes |
| Latency p50 | ~700ms | ~600ms |
| Quality on extraction | ~95% of gpt-4o | 100% baseline |
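The "5-10× less" claim in the intro follows from the per-token prices in the table; a quick back-of-envelope sketch, assuming a typical extraction call of 1,000 input and 500 output tokens:

```python
def request_cost(in_tokens: int, out_tokens: int, in_price: float, out_price: float) -> float:
    """Cost in USD for one request, with prices quoted per 1M tokens."""
    return in_tokens / 1e6 * in_price + out_tokens / 1e6 * out_price

fireworks = request_cost(1_000, 500, 0.90, 0.90)   # Llama 3.3 70B on Fireworks
gpt4o = request_cost(1_000, 500, 5.00, 15.00)      # gpt-4o
print(f"${fireworks:.5f} vs ${gpt4o:.5f} -> {gpt4o / fireworks:.1f}x")  # $0.00135 vs $0.01250 -> 9.3x
```

The more output-heavy the workload, the closer the ratio gets to the output-price gap.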
FAQ
Q: Is JSON Schema mode strict like OpenAI's? A: Yes — Fireworks compiles the schema into a constrained-decoding grammar. Output is guaranteed valid against the schema. Prompt the model normally; structure is enforced at decode time.
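Decode-time enforcement covers complete responses, but a defensive parse still helps against edge cases like a response truncated by `max_tokens`. A minimal stdlib-only sketch; `parse_strict` is a hypothetical helper, not part of any SDK:

```python
import json

def parse_strict(content: str, schema: dict) -> dict:
    """Parse model output and verify the schema's required keys survived."""
    data = json.loads(content)  # raises on truncated or invalid JSON
    missing = [k for k in schema.get("required", []) if k not in data]
    if missing:
        raise ValueError(f"missing required keys: {missing}")
    return data

schema = {"type": "object", "required": ["name", "company"]}
person = parse_strict('{"name": "Jane Smith", "company": "Stripe"}', schema)
```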
Q: Which models support tool calling? A: Llama 3.x (8B + 70B), Mixtral 8×22B, Qwen 2.5, DeepSeek-V3. The model card on docs.fireworks.ai marks tool support per model. Llama 3.3 70B is the production default.
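Tool calls arrive with a function name and a JSON-encoded argument string, so executing them is a small dispatch step. A sketch, where the registry and the stub handler are illustrative, not part of the SDK:

```python
import json

def create_invoice(customer_email: str, amount_usd: float, description: str = "") -> str:
    # Stand-in for a real Stripe call.
    return f"invoice: {customer_email} ${amount_usd:.2f}"

REGISTRY = {"create_invoice": create_invoice}

def dispatch(tool_call) -> str:
    """Route one entry of resp.choices[0].message.tool_calls to a local handler."""
    handler = REGISTRY[tool_call.function.name]
    args = json.loads(tool_call.function.arguments)  # arguments arrive as a JSON string
    return handler(**args)
```

Feed the return value back to the model as a `role: "tool"` message to continue the conversation.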
Q: How does this compare to Outlines / Instructor? A: Both are client-side libraries: Outlines constrains decoding for models you host yourself, while Instructor validates parsed output and re-prompts on failure. Fireworks' JSON Schema mode runs server-side via constrained decoding, so there are no extra round trips and no token budget spent on retries. You can still layer them on top of Fireworks, e.g. Instructor for binding responses to Pydantic classes.
Source & Thanks
Built by Fireworks AI. Structured output docs at docs.fireworks.ai/structured-responses.
Open SDKs at github.com/fw-ai