Skills · May 8, 2026 · 4 min read

Fireworks JSON Mode + Function Calling on Open Models

Fireworks supports OpenAI-compatible JSON mode, JSON Schema, and tool calling on Llama 3.3, Mixtral, and Qwen. Same code, cheaper open weights.

Universal CLI command
npx tokrepo install e0bbff9c-bb67-4574-bb3e-d7b9375ed44b
Introduction

Fireworks AI exposes OpenAI's structured-output features — response_format='json_object', response_format='json_schema', and tool/function calling — on Llama 3.3 70B, Mixtral 8×22B, Qwen 2.5 72B, and DeepSeek-V3. Same code as OpenAI, costs 5-10× less. Best for: data extraction pipelines, structured agents that don't justify gpt-4o cost, regression-tested classifiers. Works with: openai-python, openai-node, LangChain with_structured_output, Vercel AI SDK. Setup time: 5 minutes.


JSON Schema mode (strict structured output)

import os
import json

from openai import OpenAI

client = OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key=os.environ["FIREWORKS_API_KEY"],
)

schema = {
    "type": "object",
    "properties": {
        "name":     {"type": "string"},
        "salary":   {"type": "number"},
        "company":  {"type": "string"},
        "is_remote":{"type": "boolean"},
    },
    "required": ["name", "company"],
}

resp = client.chat.completions.create(
    model="accounts/fireworks/models/llama-v3p3-70b-instruct",
    messages=[{"role": "user", "content": "Extract: Jane Smith earns $180K at Stripe and works fully remote"}],
    response_format={"type": "json_schema", "json_schema": {"name": "Person", "schema": schema}},
)
data = json.loads(resp.choices[0].message.content)
print(data)  # {'name': 'Jane Smith', 'salary': 180000, 'company': 'Stripe', 'is_remote': True}
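Note that only "name" and "company" are listed in the schema's required array, so a strict-mode response may legitimately omit the other two fields. A minimal sketch of reading them defensively (the raw string below simulates such a response; it is not real API output):

```python
import json

# Simulated strict-mode output where the model omitted both optional fields
# (only "name" and "company" are required by the schema above).
raw = '{"name": "Jane Smith", "company": "Stripe"}'
data = json.loads(raw)

salary = data.get("salary")            # None when absent, instead of a KeyError
is_remote = data.get("is_remote", False)
print(salary, is_remote)  # None False
```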

Plain JSON mode (less strict, no schema)

resp = client.chat.completions.create(
    model="accounts/fireworks/models/mixtral-8x22b-instruct",
    messages=[{"role": "user", "content": "Return a JSON object with three coding tips"}],
    response_format={"type": "json_object"},
)
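Unlike schema mode, json_object mode only guarantees syntactically valid JSON, not any particular shape, so it pays to validate structure before using the result. A small sketch (the raw string stands in for a model response):

```python
import json

# json_object mode guarantees valid JSON syntax, but not any particular
# shape -- check the structure yourself before trusting it.
raw = '{"tips": ["use type hints", "write tests", "read tracebacks"]}'
payload = json.loads(raw)

tips = payload.get("tips")
if not isinstance(tips, list):
    raise ValueError(f"expected a list of tips, got {type(tips).__name__}")
print(len(tips))  # 3
```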

Tool calling

tools = [{
    "type": "function",
    "function": {
        "name": "create_invoice",
        "description": "Create a Stripe invoice for a customer",
        "parameters": {
            "type": "object",
            "properties": {
                "customer_email": {"type": "string"},
                "amount_usd":     {"type": "number"},
                "description":    {"type": "string"},
            },
            "required": ["customer_email", "amount_usd"],
        },
    },
}]

resp = client.chat.completions.create(
    model="accounts/fireworks/models/llama-v3p3-70b-instruct",
    messages=[{"role": "user", "content": "Bill jane@acme.com $5000 for May retainer"}],
    tools=tools,
)
print(resp.choices[0].message.tool_calls)
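Each tool call delivers its arguments as a JSON string, so a dispatch loop has to json.loads them before invoking your handler. A minimal sketch of that loop — create_invoice here is a stand-in stub, not a real Stripe call, and the call objects are simplified to plain dicts rather than the SDK's typed objects:

```python
import json

def create_invoice(customer_email, amount_usd, description=""):
    # Stand-in stub; a real implementation would hit your billing system.
    return {"invoice_for": customer_email, "amount": amount_usd}

DISPATCH = {"create_invoice": create_invoice}

def run_tool_calls(tool_calls):
    results = []
    for call in tool_calls:
        fn = DISPATCH[call["function"]["name"]]
        # "arguments" arrives as a JSON *string*, not a dict.
        args = json.loads(call["function"]["arguments"])
        results.append(fn(**args))
    return results

# Shape mirrors what the API returns, reduced to plain dicts for illustration.
calls = [{"function": {"name": "create_invoice",
                       "arguments": '{"customer_email": "jane@acme.com", "amount_usd": 5000}'}}]
print(run_tool_calls(calls))
```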

Why not just gpt-4o?

Item                     Llama 3.3 70B (Fireworks)   gpt-4o
Cost / 1M input tokens   $0.90                       $5.00
Cost / 1M output tokens  $0.90                       $15.00
JSON Schema strict       Yes                         Yes
Tool calling             Yes                         Yes
Latency (p50)            ~700 ms                     ~600 ms
Extraction quality       ~95% of gpt-4o              100% (baseline)
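To make the rates concrete, here is the arithmetic for a hypothetical daily extraction workload (1M input tokens plus 200K output tokens — the workload numbers are an illustration, only the per-million rates come from the table):

```python
# Per-1M-token rates from the table above (USD).
LLAMA_IN, LLAMA_OUT = 0.90, 0.90
GPT4O_IN, GPT4O_OUT = 5.00, 15.00

# Hypothetical daily workload: 1M input tokens, 0.2M output tokens.
in_m, out_m = 1.0, 0.2

llama_cost = in_m * LLAMA_IN + out_m * LLAMA_OUT
gpt4o_cost = in_m * GPT4O_IN + out_m * GPT4O_OUT
print(f"Llama: ${llama_cost:.2f}/day, gpt-4o: ${gpt4o_cost:.2f}/day, "
      f"ratio: {gpt4o_cost / llama_cost:.1f}x")  # Llama: $1.08/day, gpt-4o: $8.00/day, ratio: 7.4x
```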

FAQ

Q: Is JSON Schema mode strict like OpenAI's? A: Yes — Fireworks compiles the schema into a constrained-decoding grammar. Output is guaranteed valid against the schema. Prompt the model normally; structure is enforced at decode time.

Q: Which models support tool calling? A: Llama 3.x (8B + 70B), Mixtral 8×22B, Qwen 2.5, DeepSeek-V3. The model card on docs.fireworks.ai marks tool support per model. Llama 3.3 70B is the production default.

Q: How does this compare to Outlines / Instructor? A: Outlines and Instructor are client-side libraries that re-prompt or post-process. Fireworks' JSON Schema runs server-side via constrained decoding — fewer round trips, lower latency, no token budget for retries. Use them in addition to Fireworks for things like Pydantic class binding (Instructor).


Quick Use

  1. Use response_format={'type':'json_schema','json_schema':{'name':'...','schema':{...}}} for strict mode
  2. Or response_format={'type':'json_object'} for loose JSON
  3. Tools via standard OpenAI tools=[...]


Source & Thanks

Built by Fireworks AI. Structured output docs at docs.fireworks.ai/structured-responses.

Open SDKs at github.com/fw-ai

