Skills · May 8, 2026 · 4 min read

Fireworks JSON Mode + Function Calling on Open Models

Fireworks supports OpenAI-compatible JSON mode, JSON Schema, and tool calling on Llama 3.3, Mixtral, and Qwen. Same code, cheaper open weights.

Agent ready

This asset can be read and installed directly by agents. TokRepo exposes a universal CLI command, an install contract, metadata JSON, an adapter-aware plan, and raw content links so agents can judge fit, risk, and next actions.

Stage only · 17/100
Agent surface: Any MCP/CLI agent
Kind: Skill
Install: Stage only
Trust: New
Entrypoint: Asset

Universal CLI install command:
npx tokrepo install e0bbff9c-bb67-4574-bb3e-d7b9375ed44b
Intro

Fireworks AI exposes OpenAI's structured-output features — response_format='json_object', response_format='json_schema', and tool/function calling — on Llama 3.3 70B, Mixtral 8×22B, Qwen 2.5 72B, and DeepSeek-V3. Same code as OpenAI, costs 5-10× less. Best for: data extraction pipelines, structured agents that don't justify gpt-4o cost, regression-tested classifiers. Works with: openai-python, openai-node, LangChain with_structured_output, Vercel AI SDK. Setup time: 5 minutes.


JSON Schema mode (strict structured output)

import os
import json

from openai import OpenAI

client = OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key=os.environ["FIREWORKS_API_KEY"],
)

schema = {
    "type": "object",
    "properties": {
        "name":     {"type": "string"},
        "salary":   {"type": "number"},
        "company":  {"type": "string"},
        "is_remote":{"type": "boolean"},
    },
    "required": ["name", "company"],
}

resp = client.chat.completions.create(
    model="accounts/fireworks/models/llama-v3p3-70b-instruct",
    messages=[{"role": "user", "content": "Extract: Jane Smith earns $180K at Stripe and works fully remote"}],
    response_format={"type": "json_schema", "json_schema": {"name": "Person", "schema": schema}},
)
data = json.loads(resp.choices[0].message.content)
print(data)  # e.g. {'name': 'Jane Smith', 'salary': 180000, 'company': 'Stripe', 'is_remote': True}

Plain JSON mode (less strict, no schema)

resp = client.chat.completions.create(
    model="accounts/fireworks/models/mixtral-8x22b-instruct",
    messages=[{"role": "user", "content": "Return a JSON object with three coding tips"}],
    response_format={"type": "json_object"},
)
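Loose mode guarantees syntactically valid JSON but not any particular shape, so it is worth checking the top level before indexing into it. A small hedged helper (`parse_object` is a name invented here, not a Fireworks API):

```python
import json

def parse_object(raw: str) -> dict:
    """Parse loose-mode output and insist on a top-level JSON object."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError(f"expected a JSON object, got {type(data).__name__}")
    return data

# tips = parse_object(resp.choices[0].message.content)
```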

Tool calling

tools = [{
    "type": "function",
    "function": {
        "name": "create_invoice",
        "description": "Create a Stripe invoice for a customer",
        "parameters": {
            "type": "object",
            "properties": {
                "customer_email": {"type": "string"},
                "amount_usd":     {"type": "number"},
                "description":    {"type": "string"},
            },
            "required": ["customer_email", "amount_usd"],
        },
    },
}]

resp = client.chat.completions.create(
    model="accounts/fireworks/models/llama-v3p3-70b-instruct",
    messages=[{"role": "user", "content": "Bill jane@acme.com $5000 for May retainer"}],
    tools=tools,
)
print(resp.choices[0].message.tool_calls)
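A tool call is only half the loop: the model returns a function name plus JSON-encoded arguments, and your code executes them and feeds the result back as a "tool" message. A minimal dispatch sketch under that assumption — `create_invoice` here is a local stub, and the registry pattern is a common convention, not a Fireworks requirement:

```python
import json

def create_invoice(customer_email: str, amount_usd: float, description: str = "") -> dict:
    # Stub: swap in a real billing (e.g. Stripe) call here.
    return {"status": "created", "customer_email": customer_email, "amount_usd": amount_usd}

TOOL_REGISTRY = {"create_invoice": create_invoice}

def run_tool_call(tool_call) -> str:
    """Execute one tool_call from the response; return a JSON string for the tool message."""
    fn = TOOL_REGISTRY[tool_call.function.name]
    args = json.loads(tool_call.function.arguments)
    return json.dumps(fn(**args))

# For each call, append the result and re-invoke the model:
# messages.append(resp.choices[0].message)
# for tc in resp.choices[0].message.tool_calls:
#     messages.append({"role": "tool", "tool_call_id": tc.id, "content": run_tool_call(tc)})
```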

Why not just gpt-4o?

Item                   Llama 3.3 70B (Fireworks)   gpt-4o
Cost / 1M input        $0.90                       $5.00
Cost / 1M output       $0.90                       $15.00
JSON Schema strict     Yes                         Yes
Tool calling           Yes                         Yes
Latency p50            ~700 ms                     ~600 ms
Quality on extraction  ~95% of gpt-4o              100% (baseline)
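Using the table's list prices (assumptions that can drift — check current pricing before budgeting), a quick back-of-envelope for a workload of 10M input and 2M output tokens per month:

```python
def monthly_cost(input_mtok: float, output_mtok: float, in_rate: float, out_rate: float) -> float:
    """Cost in USD, given millions of tokens and $/1M-token rates."""
    return input_mtok * in_rate + output_mtok * out_rate

fireworks = monthly_cost(10, 2, 0.90, 0.90)   # Llama 3.3 70B on Fireworks
gpt4o = monthly_cost(10, 2, 5.00, 15.00)      # gpt-4o at the table's rates
print(f"Fireworks: ${fireworks:.2f}/mo vs gpt-4o: ${gpt4o:.2f}/mo")
# → Fireworks: $10.80/mo vs gpt-4o: $80.00/mo
```

At these rates the gap is roughly 7×, consistent with the intro's 5-10× claim; output-heavy workloads skew further toward Fireworks because of the $15 vs $0.90 output rate.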

FAQ

Q: Is JSON Schema mode strict like OpenAI's? A: Yes — Fireworks compiles the schema into a constrained-decoding grammar. Output is guaranteed valid against the schema. Prompt the model normally; structure is enforced at decode time.

Q: Which models support tool calling? A: Llama 3.x (8B + 70B), Mixtral 8×22B, Qwen 2.5, DeepSeek-V3. The model card on docs.fireworks.ai marks tool support per model. Llama 3.3 70B is the production default.

Q: How does this compare to Outlines / Instructor? A: Outlines and Instructor are client-side libraries that re-prompt or post-process. Fireworks' JSON Schema runs server-side via constrained decoding — fewer round trips, lower latency, no token budget for retries. Use them in addition to Fireworks for things like Pydantic class binding (Instructor).


Quick Use

  1. Use response_format={'type': 'json_schema', 'json_schema': {'name': '...', 'schema': {...}}} for strict mode
  2. Or response_format={'type':'json_object'} for loose JSON
  3. Tools via standard OpenAI tools=[...]

Source & Thanks

Built by Fireworks AI. Structured output docs at docs.fireworks.ai/structured-responses.

Open SDKs at github.com/fw-ai

🙏
