Skills · May 8, 2026 · 4 min read

Fireworks JSON Mode + Function Calling on Open Models

Fireworks supports OpenAI-compatible JSON mode, JSON Schema, and tool calling on Llama 3.3, Mixtral, and Qwen. Same code, cheaper open weights.

Agent ready

This asset can be read and installed directly by agents. TokRepo exposes a universal CLI command, an install contract, metadata JSON, an adapter-aware plan, and raw content links so agents can judge fit, risk, and next actions.

Stage only · 17/100
Agent surface: Any MCP/CLI agent
Kind: Skill
Install: Stage only
Trust: New
Entrypoint: Asset

Universal CLI install command:
npx tokrepo install e0bbff9c-bb67-4574-bb3e-d7b9375ed44b
Intro

Fireworks AI exposes OpenAI's structured-output features — response_format='json_object', response_format='json_schema', and tool/function calling — on Llama 3.3 70B, Mixtral 8×22B, Qwen 2.5 72B, and DeepSeek-V3. Same code as OpenAI, costs 5-10× less. Best for: data extraction pipelines, structured agents that don't justify gpt-4o cost, regression-tested classifiers. Works with: openai-python, openai-node, LangChain with_structured_output, Vercel AI SDK. Setup time: 5 minutes.


JSON Schema mode (strict structured output)

import os
import json

from openai import OpenAI

client = OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key=os.environ["FIREWORKS_API_KEY"],
)

schema = {
    "type": "object",
    "properties": {
        "name":     {"type": "string"},
        "salary":   {"type": "number"},
        "company":  {"type": "string"},
        "is_remote":{"type": "boolean"},
    },
    "required": ["name", "company"],
}

resp = client.chat.completions.create(
    model="accounts/fireworks/models/llama-v3p3-70b-instruct",
    messages=[{"role": "user", "content": "Extract: Jane Smith earns $180K at Stripe and works fully remote"}],
    response_format={"type": "json_schema", "json_schema": {"name": "Person", "schema": schema}},
)
data = json.loads(resp.choices[0].message.content)
print(data)  # e.g. {'name': 'Jane Smith', 'salary': 180000, 'company': 'Stripe', 'is_remote': True}

Plain JSON mode (less strict, no schema)

resp = client.chat.completions.create(
    model="accounts/fireworks/models/mixtral-8x22b-instruct",
    messages=[{"role": "user", "content": "Return a JSON object with three coding tips"}],
    response_format={"type": "json_object"},
)
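Loose mode guarantees syntactically valid JSON but not any particular shape, so it is worth checking the top level before indexing into it. A small hedged helper (`parse_object` is a name invented here, not a Fireworks API):

```python
import json

def parse_object(raw: str) -> dict:
    """Parse loose-mode output and insist on a top-level JSON object."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError(f"expected a JSON object, got {type(data).__name__}")
    return data

# tips = parse_object(resp.choices[0].message.content)
```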

Tool calling

tools = [{
    "type": "function",
    "function": {
        "name": "create_invoice",
        "description": "Create a Stripe invoice for a customer",
        "parameters": {
            "type": "object",
            "properties": {
                "customer_email": {"type": "string"},
                "amount_usd":     {"type": "number"},
                "description":    {"type": "string"},
            },
            "required": ["customer_email", "amount_usd"],
        },
    },
}]

resp = client.chat.completions.create(
    model="accounts/fireworks/models/llama-v3p3-70b-instruct",
    messages=[{"role": "user", "content": "Bill jane@acme.com $5000 for May retainer"}],
    tools=tools,
)
print(resp.choices[0].message.tool_calls)
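A tool call is only half the loop: the model returns a function name plus JSON-encoded arguments, and your code executes them and feeds the result back as a "tool" message. A minimal dispatch sketch under that assumption — `create_invoice` here is a local stub, and the registry pattern is a common convention, not a Fireworks requirement:

```python
import json

def create_invoice(customer_email: str, amount_usd: float, description: str = "") -> dict:
    # Stub: swap in a real billing (e.g. Stripe) call here.
    return {"status": "created", "customer_email": customer_email, "amount_usd": amount_usd}

TOOL_REGISTRY = {"create_invoice": create_invoice}

def run_tool_call(tool_call) -> str:
    """Execute one tool_call from the response; return a JSON string for the tool message."""
    fn = TOOL_REGISTRY[tool_call.function.name]
    args = json.loads(tool_call.function.arguments)
    return json.dumps(fn(**args))

# For each call, append the result and re-invoke the model:
# messages.append(resp.choices[0].message)
# for tc in resp.choices[0].message.tool_calls:
#     messages.append({"role": "tool", "tool_call_id": tc.id, "content": run_tool_call(tc)})
```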

Why not just gpt-4o?

Item                   Llama 3.3 70B (Fireworks)   gpt-4o
Cost / 1M input        $0.90                       $5.00
Cost / 1M output       $0.90                       $15.00
JSON Schema strict     Yes                         Yes
Tool calling           Yes                         Yes
Latency p50            ~700 ms                     ~600 ms
Quality on extraction  ~95% of gpt-4o              100% (baseline)
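Using the table's list prices (assumptions that can drift — check current pricing before budgeting), a quick back-of-envelope for a workload of 10M input and 2M output tokens per month:

```python
def monthly_cost(input_mtok: float, output_mtok: float, in_rate: float, out_rate: float) -> float:
    """Cost in USD, given millions of tokens and $/1M-token rates."""
    return input_mtok * in_rate + output_mtok * out_rate

fireworks = monthly_cost(10, 2, 0.90, 0.90)   # Llama 3.3 70B on Fireworks
gpt4o = monthly_cost(10, 2, 5.00, 15.00)      # gpt-4o at the table's rates
print(f"Fireworks: ${fireworks:.2f}/mo vs gpt-4o: ${gpt4o:.2f}/mo")
# → Fireworks: $10.80/mo vs gpt-4o: $80.00/mo
```

At these rates the gap is roughly 7×, consistent with the intro's 5-10× claim; output-heavy workloads skew further toward Fireworks because of the $15 vs $0.90 output rate.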

FAQ

Q: Is JSON Schema mode strict like OpenAI's? A: Yes — Fireworks compiles the schema into a constrained-decoding grammar. Output is guaranteed valid against the schema. Prompt the model normally; structure is enforced at decode time.

Q: Which models support tool calling? A: Llama 3.x (8B + 70B), Mixtral 8×22B, Qwen 2.5, DeepSeek-V3. The model card on docs.fireworks.ai marks tool support per model. Llama 3.3 70B is the production default.

Q: How does this compare to Outlines / Instructor? A: Outlines and Instructor are client-side libraries that re-prompt or post-process. Fireworks' JSON Schema runs server-side via constrained decoding — fewer round trips, lower latency, no token budget for retries. Use them in addition to Fireworks for things like Pydantic class binding (Instructor).


Quick Use

  1. Use response_format={'type': 'json_schema', 'json_schema': {'name': '...', 'schema': {...}}} for strict mode
  2. Or response_format={'type':'json_object'} for loose JSON
  3. Tools via standard OpenAI tools=[...]

Source & Thanks

Built by Fireworks AI. Structured output docs at docs.fireworks.ai/structured-responses.

Open SDKs at github.com/fw-ai

🙏
