Skills · May 8, 2026 · 4 min read

Fireworks JSON Mode + Function Calling on Open Models

Fireworks supports OpenAI-compatible JSON mode, JSON Schema, and tool calling on Llama 3.3, Mixtral, and Qwen. Same code, cheaper open weights.

Agent-ready

This asset can be read and installed directly by agents.

TokRepo exposes a universal CLI command, an install contract, JSON metadata, an adapter-specific plan, and the raw content to help agents judge fit, risk, and next steps.

Stage only · 17/100
Agent surface: Any MCP/CLI agent
Type: Skill
Installation: Stage only
Trust: New
Entry point: Asset

Universal CLI command
npx tokrepo install e0bbff9c-bb67-4574-bb3e-d7b9375ed44b
Introduction

Fireworks AI exposes OpenAI's structured-output features — response_format='json_object', response_format='json_schema', and tool/function calling — on Llama 3.3 70B, Mixtral 8×22B, Qwen 2.5 72B, and DeepSeek-V3. Same code as with OpenAI, at 5-10× lower cost. Best for: data extraction pipelines, structured agents that don't justify gpt-4o cost, regression-tested classifiers. Works with: openai-python, openai-node, LangChain with_structured_output, Vercel AI SDK. Setup time: 5 minutes.


JSON Schema mode (strict structured output)

from openai import OpenAI
import json
import os

client = OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key=os.environ["FIREWORKS_API_KEY"],
)

schema = {
    "type": "object",
    "properties": {
        "name":     {"type": "string"},
        "salary":   {"type": "number"},
        "company":  {"type": "string"},
        "is_remote":{"type": "boolean"},
    },
    "required": ["name", "company"],
}

resp = client.chat.completions.create(
    model="accounts/fireworks/models/llama-v3p3-70b-instruct",
    messages=[{"role": "user", "content": "Extract: Jane Smith earns $180K at Stripe and works fully remote"}],
    response_format={"type": "json_schema", "json_schema": {"name": "Person", "schema": schema}},
)
data = json.loads(resp.choices[0].message.content)
print(data)  # e.g. {'name': 'Jane Smith', 'salary': 180000, 'company': 'Stripe', 'is_remote': True}
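Strict mode guarantees structural validity, but a tiny client-side check is still handy in regression tests. A minimal sketch (not a full JSON Schema validator) that mirrors the schema above and covers only the subset of JSON Schema it uses:

```python
# Client-side mirror of the schema check (sketch only; Fireworks already
# enforces the schema server-side via constrained decoding).
schema = {
    "type": "object",
    "properties": {
        "name":      {"type": "string"},
        "salary":    {"type": "number"},
        "company":   {"type": "string"},
        "is_remote": {"type": "boolean"},
    },
    "required": ["name", "company"],
}

def check_against_schema(data, schema):
    """Return a list of problems; an empty list means the dict fits."""
    type_map = {"string": str, "number": (int, float), "boolean": bool}
    errors = []
    for key in schema["required"]:
        if key not in data:
            errors.append(f"missing required field: {key}")
    for key, value in data.items():
        spec = schema["properties"].get(key)
        if spec is None:
            continue
        # bool is a subclass of int in Python, so reject it for "number".
        if spec["type"] == "number" and isinstance(value, bool):
            errors.append(f"{key}: expected number, got boolean")
        elif not isinstance(value, type_map[spec["type"]]):
            errors.append(f"{key}: expected {spec['type']}")
    return errors

print(check_against_schema({"name": "Jane Smith", "company": "Stripe"}, schema))  # []
```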

Plain JSON mode (less strict, no schema)

resp = client.chat.completions.create(
    model="accounts/fireworks/models/mixtral-8x22b-instruct",
    messages=[{"role": "user", "content": "Return a JSON object with three coding tips"}],
    response_format={"type": "json_object"},
)
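json_object mode guarantees well-formed JSON but nothing about its shape, so it pays to normalize before downstream use. A minimal defensive parser, assuming nothing about the model's output beyond syntactic validity:

```python
import json

def parse_loose_json(content):
    """json_object mode guarantees syntactically valid JSON, but not any
    particular shape - normalize defensively before downstream use."""
    data = json.loads(content)
    if not isinstance(data, dict):
        # The model may emit a bare array or scalar; wrap it so callers
        # always get a dict back.
        data = {"result": data}
    return data

print(parse_loose_json('{"tips": ["test early", "log inputs", "read diffs"]}'))
```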

Tool calling

tools = [{
    "type": "function",
    "function": {
        "name": "create_invoice",
        "description": "Create a Stripe invoice for a customer",
        "parameters": {
            "type": "object",
            "properties": {
                "customer_email": {"type": "string"},
                "amount_usd":     {"type": "number"},
                "description":    {"type": "string"},
            },
            "required": ["customer_email", "amount_usd"],
        },
    },
}]

resp = client.chat.completions.create(
    model="accounts/fireworks/models/llama-v3p3-70b-instruct",
    messages=[{"role": "user", "content": "Bill jane@acme.com $5000 for May retainer"}],
    tools=tools,
)
print(resp.choices[0].message.tool_calls)
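tool_calls only completes the first leg: your code still has to execute the function and send the result back as a role='tool' message so the model can answer in prose. A sketch of the full round trip, where create_invoice is a hypothetical local stub standing in for your real implementation:

```python
import json

# Hypothetical local implementation of the tool the model may request.
def create_invoice(customer_email, amount_usd, description=None):
    return {"invoice_id": "in_123", "status": "draft"}  # placeholder result

def run_tool_round_trip(client, model, messages, tools):
    """Let the model request a tool, execute it, feed the result back
    as a role='tool' message, and return the model's final answer."""
    first = client.chat.completions.create(model=model, messages=messages, tools=tools)
    msg = first.choices[0].message
    if not msg.tool_calls:
        return msg.content
    messages.append(msg)  # keep the assistant turn that carries tool_calls
    for call in msg.tool_calls:
        args = json.loads(call.function.arguments)
        result = create_invoice(**args)  # real code would dispatch on call.function.name
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": json.dumps(result),
        })
    second = client.chat.completions.create(model=model, messages=messages, tools=tools)
    return second.choices[0].message.content
```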

Why not just gpt-4o?

Item                     Llama 3.3 70B (Fireworks)    gpt-4o
Cost / 1M input          $0.90                        $5.00
Cost / 1M output         $0.90                        $15.00
JSON Schema strict       Yes                          Yes
Tool calling             Yes                          Yes
Latency p50              ~700 ms                      ~600 ms
Quality on extraction    ~95% of gpt-4o               100% (baseline)
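Plugged into a concrete workload, the per-token gap compounds. A back-of-envelope calculation using the table's prices and a hypothetical monthly volume (200M input / 50M output tokens):

```python
# Per-1M-token prices from the table above (USD).
llama_in, llama_out = 0.90, 0.90
gpt4o_in, gpt4o_out = 5.00, 15.00

# Hypothetical monthly extraction workload.
input_tokens = 200_000_000   # 200M in
output_tokens = 50_000_000   # 50M out

def monthly_cost(price_in, price_out):
    return (input_tokens / 1e6) * price_in + (output_tokens / 1e6) * price_out

llama_cost = monthly_cost(llama_in, llama_out)    # 180 + 45   = $225
gpt4o_cost = monthly_cost(gpt4o_in, gpt4o_out)    # 1000 + 750 = $1750
print(f"${llama_cost:.0f} vs ${gpt4o_cost:.0f}")  # roughly a 7.8x gap
```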

FAQ

Q: Is JSON Schema mode strict like OpenAI's? A: Yes — Fireworks compiles the schema into a constrained-decoding grammar. Output is guaranteed valid against the schema. Prompt the model normally; structure is enforced at decode time.

Q: Which models support tool calling? A: Llama 3.x (8B + 70B), Mixtral 8×22B, Qwen 2.5, DeepSeek-V3. The model card on docs.fireworks.ai marks tool support per model. Llama 3.3 70B is the production default.

Q: How does this compare to Outlines / Instructor? A: Outlines and Instructor are client-side libraries that re-prompt or post-process. Fireworks' JSON Schema runs server-side via constrained decoding — fewer round trips, lower latency, no token budget for retries. Use them in addition to Fireworks for things like Pydantic class binding (Instructor).


Quick Use

  1. Use response_format={'type':'json_schema','json_schema':{'name':'...','schema':{...}}} for strict mode
  2. Or response_format={'type':'json_object'} for loose JSON
  3. Tools via standard OpenAI tools=[...]



Source & Thanks

Built by Fireworks AI. Structured output docs at docs.fireworks.ai/structured-responses.

Open SDKs at github.com/fw-ai

🙏

