What is Fireworks JSON Mode + Function Calling on Open Models?

Fireworks supports OpenAI-compatible JSON mode, JSON Schema output, and tool calling on open models such as Llama 3.3, Mixtral, and Qwen. The client code stays the same; the open-weight models cost less to run.

Is Fireworks JSON Mode + Function Calling on Open Models free to use?

Yes. Fireworks JSON Mode + Function Calling on Open Models is freely available on TokRepo. Check the Source & Thanks section on the asset page for the specific open-source license.

How do I install Fireworks JSON Mode + Function Calling on Open Models?

Visit the asset page on TokRepo and click "Copy for agent" to get the installation instructions. Most assets can be installed with a single command.

Fireworks JSON Mode + Function Calling on Open Models

Introduction

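Fireworks AI exposes OpenAI's structured output features on Llama 3.3 70B, Mixtral 8×22B, Qwen 2.5 72B, and DeepSeek-V3: response_format={"type": "json_object"}, response_format={"type": "json_schema", ...}, and tool/function calling. The code is the same as with OpenAI, at 5-10× lower cost. It suits data extraction pipelines, structured agents that do not justify gpt-4o pricing, and regression-tested classifiers. It is compatible with openai-python, openai-node, LangChain with_structured_output, and the Vercel AI SDK. Setup takes about 5 minutes.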
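
Because the endpoint is OpenAI-compatible, switching providers can come down to changing the base URL, the API key, and the model name. The sketch below illustrates that idea with a hypothetical helper, `client_settings`, and a hypothetical `PROVIDERS` table. The Fireworks values are the ones used on this page; the `OPENAI_API_KEY` variable name and the OpenAI entry are assumptions added for comparison.

```python
import os

# Hypothetical provider table. The Fireworks base URL and model ID are the ones
# used elsewhere on this page; the OpenAI entry is included for comparison.
PROVIDERS = {
    "fireworks": {
        "base_url": "https://api.fireworks.ai/inference/v1",
        "api_key_env": "FIREWORKS_API_KEY",
        "model": "accounts/fireworks/models/llama-v3p3-70b-instruct",
    },
    "openai": {
        "base_url": "https://api.openai.com/v1",
        "api_key_env": "OPENAI_API_KEY",
        "model": "gpt-4o",
    },
}


def client_settings(provider: str) -> dict:
    """Return constructor kwargs for openai.OpenAI plus the model name to use."""
    cfg = PROVIDERS[provider]
    return {
        "client_kwargs": {
            "base_url": cfg["base_url"],
            # Empty string if the variable is unset; a real script should fail fast.
            "api_key": os.environ.get(cfg["api_key_env"], ""),
        },
        "model": cfg["model"],
    }
```

With openai-python installed, `OpenAI(**client_settings("fireworks")["client_kwargs"])` would build the client, and the rest of the calling code would stay unchanged.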

JSON Schema mode (strict structured output)

import json
import os

from openai import OpenAI

# Fireworks exposes an OpenAI-compatible endpoint, so the stock client works.
client = OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key=os.environ["FIREWORKS_API_KEY"],
)

schema = {
    "type": "object",
    "properties": {
        "name":     {"type": "string"},
        "salary":   {"type": "number"},
        "company":  {"type": "string"},
        "is_remote":{"type": "boolean"},
    },
    "required": ["name", "company"],
}

resp = client.chat.completions.create(
    model="accounts/fireworks/models/llama-v3p3-70b-instruct",
    messages=[{"role": "user", "content": "Extract: Jane Smith earns $180k a year at Stripe, fully remote"}],
    response_format={"type": "json_schema", "json_schema": {"name": "Person", "schema": schema}},
)
data = json.loads(resp.choices[0].message.content)
print(data)  # e.g. {'name': 'Jane Smith', 'salary': 180000, 'company': 'Stripe', 'is_remote': True}
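
Even with server-side schema enforcement, it can be worth checking the parsed payload before it enters a pipeline, for instance in case generation stops early. The following is a minimal standard-library sketch, not part of the Fireworks API: `check_person`, `FIELD_TYPES`, and `REQUIRED` are hypothetical names that mirror the schema above.

```python
import json

# Expected Python types for the fields in the schema above.
FIELD_TYPES = {
    "name": str,
    "salary": (int, float),
    "company": str,
    "is_remote": bool,
}
REQUIRED = ("name", "company")


def check_person(raw: str) -> dict:
    """Parse a JSON string and verify required keys and basic field types."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [key for key in REQUIRED if key not in data]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    for key, expected in FIELD_TYPES.items():
        if key not in data:
            continue
        value = data[key]
        # bool is a subclass of int in Python, so reject it for non-boolean fields.
        if expected is not bool and isinstance(value, bool):
            raise ValueError(f"field {key!r} has unexpected type bool")
        if not isinstance(value, expected):
            raise ValueError(f"field {key!r} has unexpected type")
    return data
```

In the example above, `check_person(resp.choices[0].message.content)` could replace the bare `json.loads` call.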

Plain JSON mode (looser, no schema)

resp = client.chat.completions.create(
    model="accounts/fireworks/models/mixtral-8x22b-instruct",
    messages=[{"role": "user", "content": "Return a JSON object containing three coding tips"}],
    response_format={"type": "json_object"},
)
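
Plain JSON mode does not pin down the shape of the object, so the caller has to cope with whatever keys the model chooses. The sketch below shows one defensive way to pull the tips out of such a reply; `extract_tips` is a hypothetical helper, and the two reply shapes it handles are assumptions about what a model might return.

```python
import json


def extract_tips(raw: str) -> list[str]:
    """Parse a JSON-mode reply and return the first list of strings found in it."""
    obj = json.loads(raw)  # raises json.JSONDecodeError on invalid JSON
    if not isinstance(obj, dict):
        raise ValueError("expected a JSON object at the top level")
    # Preferred shape: {"tips": ["...", "...", "..."]} under any key name.
    for value in obj.values():
        if isinstance(value, list) and value and all(isinstance(item, str) for item in value):
            return value
    # Fallback shape: {"tip1": "...", "tip2": "..."}.
    strings = [value for value in obj.values() if isinstance(value, str)]
    if strings:
        return strings
    raise ValueError("no list of strings found in the reply")
```

It would be called as `extract_tips(resp.choices[0].message.content)`. When the shape matters, JSON Schema mode avoids this guesswork.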

Tool calling

tools = [{
    "type": "function",
    "function": {
        "name": "create_invoice",
        "description": "Create a Stripe invoice for a customer",
        "parameters": {
            "type": "object",
            "properties": {
                "customer_email": {"type": "string"},
                "amount_usd":     {"type": "number"},
                "description":    {"type": "string"},
            },
            "required": ["customer_email", "amount_usd"],
        },
    },
}]

resp = client.chat.completions.create(
    model="accounts/fireworks/models/llama-v3p3-70b-instruct",
    messages=[{"role": "user", "content": "Bill jane@acme.com $5,000 for the May retainer"}],
    tools=tools,
)
print(resp.choices[0].message.tool_calls)
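
The call above only returns the model's request to use a tool; running the tool is up to the caller. A minimal sketch of that dispatch step follows. It assumes the tool-call objects have the `id`, `function.name`, and `function.arguments` attributes used by openai-python. The local `create_invoice` handler, `HANDLERS`, and `run_tool_calls` are hypothetical, and `fake_calls` stands in for a real response.

```python
import json
from types import SimpleNamespace


def create_invoice(customer_email: str, amount_usd: float, description: str = "") -> dict:
    """Hypothetical local handler; a real one would call a billing API."""
    return {
        "customer_email": customer_email,
        "amount_usd": amount_usd,
        "description": description,
        "status": "draft",
    }


HANDLERS = {"create_invoice": create_invoice}


def run_tool_calls(tool_calls) -> list[dict]:
    """Execute each tool call and build the `tool` messages to send back."""
    messages = []
    for call in tool_calls or []:  # tool_calls is None when no tool was requested
        handler = HANDLERS.get(call.function.name)
        if handler is None:
            result = {"error": f"unknown tool {call.function.name}"}
        else:
            # Arguments arrive as a JSON-encoded string, not a dict.
            args = json.loads(call.function.arguments)
            result = handler(**args)
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": json.dumps(result),
        })
    return messages


# Stand-in for resp.choices[0].message.tool_calls.
fake_calls = [SimpleNamespace(
    id="call_1",
    function=SimpleNamespace(
        name="create_invoice",
        arguments='{"customer_email": "jane@acme.com", "amount_usd": 5000}',
    ),
)]
tool_messages = run_tool_calls(fake_calls)
```

The resulting messages would be appended to the conversation, after the assistant message that carried the tool calls, and sent in a follow-up request so the model can produce its final answer.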

Why not just use gpt-4o?

Item	Llama 3.3 70B (Fireworks)	gpt-4o
Input / 1M tokens	$0.90	$5.00
Output / 1M tokens	$0.90	$15.00
Strict JSON Schema	Yes	Yes
Tool calling	Yes	Yes
p50 latency	~700 ms	~600 ms
Extraction quality	About 95% of gpt-4o	100% (baseline)
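
To see what the per-token prices mean for a concrete job, the arithmetic can be sketched as below. The prices are copied from the table above. The workload (10,000 requests with 1,500 input and 200 output tokens each) is an assumption chosen for illustration, and `job_cost` is a hypothetical helper.

```python
# Prices in USD per million tokens, copied from the table above.
PRICES = {
    "llama-3.3-70b (Fireworks)": {"input": 0.90, "output": 0.90},
    "gpt-4o": {"input": 5.00, "output": 15.00},
}


def job_cost(model: str, requests: int, input_tokens: int, output_tokens: int) -> float:
    """Total USD cost for `requests` calls with the given per-call token counts."""
    price = PRICES[model]
    total_in = requests * input_tokens / 1_000_000    # millions of input tokens
    total_out = requests * output_tokens / 1_000_000  # millions of output tokens
    return total_in * price["input"] + total_out * price["output"]


fireworks = job_cost("llama-3.3-70b (Fireworks)", 10_000, 1_500, 200)
gpt4o = job_cost("gpt-4o", 10_000, 1_500, 200)
print(f"Fireworks: ${fireworks:.2f}, gpt-4o: ${gpt4o:.2f}, ratio: {gpt4o / fireworks:.1f}x")
# prints: Fireworks: $15.30, gpt-4o: $105.00, ratio: 6.9x
```

The ratio depends on the input/output mix: output-heavy jobs move it toward the upper end of the 5-10× range, because the output price gap is larger.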

FAQ

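Q: Is JSON Schema mode as strict as OpenAI's? A: Yes. Fireworks compiles the schema into a grammar for constrained decoding, so the output is always valid against the schema. You prompt the model as usual, and the structure is enforced at decode time.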
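
As a rough intuition for what constrained decoding means, the toy below masks out every token the grammar forbids at each step and picks the best-scoring token among the rest. It is an illustration of the general idea only, not Fireworks' implementation. The grammar, the token list, and `fake_scores` (a stand-in for a model) are all made up for the example.

```python
import json

# Toy grammar for the object {"ok": <boolean>}: allowed tokens at each step.
ALLOWED_BY_STEP = [
    {"{"},
    {'"ok"'},
    {":"},
    {"true", "false"},
    {"}"},
]


def fake_scores(prefix: list[str]) -> dict[str, float]:
    """Stand-in for a model: it always prefers the free-text token 'Sure!'."""
    return {"Sure!": 5.0, "{": 1.0, '"ok"': 1.0, ":": 1.0,
            "true": 2.0, "false": 1.5, "}": 1.0}


def constrained_decode() -> str:
    tokens: list[str] = []
    for allowed in ALLOWED_BY_STEP:
        scores = fake_scores(tokens)
        # Mask out every token the grammar forbids, then pick the best survivor.
        legal = {tok: score for tok, score in scores.items() if tok in allowed}
        tokens.append(max(legal, key=legal.get))
    return "".join(tokens)


output = constrained_decode()
print(output)              # {"ok":true}
print(json.loads(output))  # {'ok': True}
```

Unconstrained, this fake model would start with "Sure!". With the mask applied it can only emit text that parses as the target object, which is why no retry loop is needed.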

Q: Which models support tool calling? A: Llama 3.x (8B and 70B), Mixtral 8×22B, Qwen 2.5, and DeepSeek-V3. The model cards on docs.fireworks.ai mark tool support for each model. Llama 3.3 70B is the default choice for production.

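Q: How does it compare with Outlines / Instructor? A: Outlines and Instructor are client-side libraries that rely on re-prompting or post-processing. Fireworks JSON Schema mode applies constrained decoding on the server: fewer round trips, lower latency, and no tokens spent on retries. You can still layer Instructor on top for Pydantic class binding.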


Related assets

Fireworks Inference — 100+ Open Models on OpenAI-Compat API

Replicate — Run AI Models via Simple API Calls

Quicktype — Generate Type-Safe Code from JSON, Schema, and GraphQL

tiktoken — Fast BPE Tokenizer for OpenAI Models