Quick Use
- Use `response_format={"type": "json_schema", "json_schema": {"schema": {...}}}` for strict mode
- Or `response_format={"type": "json_object"}` for loose JSON
- Tools via standard OpenAI `tools=[...]`
Intro
Fireworks AI exposes OpenAI-compatible structured-output features (`response_format` with the `json_object` and `json_schema` types, plus tool/function calling) on Llama 3.3 70B, Mixtral 8×22B, Qwen 2.5 72B, and DeepSeek-V3. Same client code as OpenAI, at 5-10× lower cost. Best for: data extraction pipelines, structured agents that don't justify gpt-4o cost, regression-tested classifiers. Works with: openai-python, openai-node, LangChain `with_structured_output`, Vercel AI SDK. Setup time: 5 minutes.
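The examples below use the OpenAI Python SDK, but the endpoint is plain HTTPS, so any client works. As a sketch, here is roughly what the JSON body of a strict-schema request looks like (a hypothetical hand-built payload mirroring the SDK examples below; consult the Fireworks docs for the authoritative wire format):

```python
import json

# Illustrative payload for POST https://api.fireworks.ai/inference/v1/chat/completions,
# sent with an "Authorization: Bearer $FIREWORKS_API_KEY" header.
payload = {
    "model": "accounts/fireworks/models/llama-v3p3-70b-instruct",
    "messages": [{"role": "user", "content": "Extract: Jane Smith works at Stripe"}],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "Person",
            "schema": {
                "type": "object",
                "properties": {"name": {"type": "string"}, "company": {"type": "string"}},
                "required": ["name", "company"],
            },
        },
    },
}
body = json.dumps(payload)  # what actually goes over the wire
```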
JSON Schema mode (strict structured output)
```python
import json
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key=os.environ["FIREWORKS_API_KEY"],
)

schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "salary": {"type": "number"},
        "company": {"type": "string"},
        "is_remote": {"type": "boolean"},
    },
    "required": ["name", "company"],
}

resp = client.chat.completions.create(
    model="accounts/fireworks/models/llama-v3p3-70b-instruct",
    messages=[{"role": "user", "content": "Extract: Jane Smith earns $180K at Stripe and works fully remote"}],
    response_format={"type": "json_schema", "json_schema": {"name": "Person", "schema": schema}},
)
data = json.loads(resp.choices[0].message.content)
print(data)  # {'name': 'Jane Smith', 'salary': 180000, 'company': 'Stripe', 'is_remote': True}
```

Plain JSON mode (less strict, no schema)
```python
resp = client.chat.completions.create(
    model="accounts/fireworks/models/mixtral-8x22b-instruct",
    messages=[{"role": "user", "content": "Return a JSON object with three coding tips"}],
    response_format={"type": "json_object"},
)
```

Tool calling
```python
tools = [{
    "type": "function",
    "function": {
        "name": "create_invoice",
        "description": "Create a Stripe invoice for a customer",
        "parameters": {
            "type": "object",
            "properties": {
                "customer_email": {"type": "string"},
                "amount_usd": {"type": "number"},
                "description": {"type": "string"},
            },
            "required": ["customer_email", "amount_usd"],
        },
    },
}]

resp = client.chat.completions.create(
    model="accounts/fireworks/models/llama-v3p3-70b-instruct",
    messages=[{"role": "user", "content": "Bill jane@acme.com $5000 for May retainer"}],
    tools=tools,
)
print(resp.choices[0].message.tool_calls)
```

Why not just gpt-4o?
| Item | Llama 3.3 70B (Fireworks) | gpt-4o |
|---|---|---|
| Cost / 1M input tokens | $0.90 | $5.00 |
| Cost / 1M output tokens | $0.90 | $15.00 |
| JSON Schema strict | Yes | Yes |
| Tool calling | Yes | Yes |
| Latency p50 | ~700ms | ~600ms |
| Quality on extraction | ~95% of gpt-4o | 100% baseline |
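The "5-10× less" claim in the intro follows from the per-token prices in the table; a quick back-of-envelope sketch, assuming a typical extraction call of 1,000 input and 500 output tokens:

```python
def request_cost(in_tokens: int, out_tokens: int, in_price: float, out_price: float) -> float:
    """Cost in USD for one request, with prices quoted per 1M tokens."""
    return in_tokens / 1e6 * in_price + out_tokens / 1e6 * out_price

fireworks = request_cost(1_000, 500, 0.90, 0.90)   # Llama 3.3 70B on Fireworks
gpt4o = request_cost(1_000, 500, 5.00, 15.00)      # gpt-4o
print(f"${fireworks:.5f} vs ${gpt4o:.5f} -> {gpt4o / fireworks:.1f}x")  # $0.00135 vs $0.01250 -> 9.3x
```

The more output-heavy the workload, the closer the ratio gets to the output-price gap.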
FAQ
Q: Is JSON Schema mode strict like OpenAI's? A: Yes — Fireworks compiles the schema into a constrained-decoding grammar. Output is guaranteed valid against the schema. Prompt the model normally; structure is enforced at decode time.
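Decode-time enforcement covers complete responses, but a defensive parse still helps against edge cases like a response truncated by `max_tokens`. A minimal stdlib-only sketch; `parse_strict` is a hypothetical helper, not part of any SDK:

```python
import json

def parse_strict(content: str, schema: dict) -> dict:
    """Parse model output and verify the schema's required keys survived."""
    data = json.loads(content)  # raises on truncated or invalid JSON
    missing = [k for k in schema.get("required", []) if k not in data]
    if missing:
        raise ValueError(f"missing required keys: {missing}")
    return data

schema = {"type": "object", "required": ["name", "company"]}
person = parse_strict('{"name": "Jane Smith", "company": "Stripe"}', schema)
```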
Q: Which models support tool calling? A: Llama 3.x (8B + 70B), Mixtral 8×22B, Qwen 2.5, DeepSeek-V3. The model card on docs.fireworks.ai marks tool support per model. Llama 3.3 70B is the production default.
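Tool calls arrive with a function name and a JSON-encoded argument string, so executing them is a small dispatch step. A sketch, where the registry and the stub handler are illustrative, not part of the SDK:

```python
import json

def create_invoice(customer_email: str, amount_usd: float, description: str = "") -> str:
    # Stand-in for a real Stripe call.
    return f"invoice: {customer_email} ${amount_usd:.2f}"

REGISTRY = {"create_invoice": create_invoice}

def dispatch(tool_call) -> str:
    """Route one entry of resp.choices[0].message.tool_calls to a local handler."""
    handler = REGISTRY[tool_call.function.name]
    args = json.loads(tool_call.function.arguments)  # arguments arrive as a JSON string
    return handler(**args)
```

Feed the return value back to the model as a `role: "tool"` message to continue the conversation.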
Q: How does this compare to Outlines / Instructor? A: Both are client-side libraries: Outlines constrains decoding for models you host yourself, while Instructor validates parsed output and re-prompts on failure. Fireworks' JSON Schema mode runs server-side via constrained decoding, so there are no extra round trips and no token budget spent on retries. You can still layer them on top of Fireworks, e.g. Instructor for binding responses to Pydantic classes.
Source & Thanks
Built by Fireworks AI. Structured output docs at docs.fireworks.ai/structured-responses.
Open SDKs at github.com/fw-ai