# Fireworks JSON Mode + Function Calling on Open Models

> Fireworks supports OpenAI-compatible JSON mode, JSON Schema, and tool calling on Llama 3.3, Mixtral, and Qwen. Same code, cheaper open weights.

## Install

Save the content below to `.claude/skills/` or append to your `CLAUDE.md`:

## Quick Use

1. Use `response_format={'type': 'json_schema', 'json_schema': {'schema': {...}}}` for strict mode
2. Use `response_format={'type': 'json_object'}` for loose JSON
3. Pass tools via the standard OpenAI `tools=[...]` parameter

---

## Intro

Fireworks AI exposes OpenAI's structured-output features (`response_format='json_object'`, `response_format='json_schema'`, and tool/function calling) on Llama 3.3 70B, Mixtral 8×22B, Qwen 2.5 72B, and DeepSeek-V3. Same client code as OpenAI, at 5-10× lower cost.

Best for: data extraction pipelines, structured agents that don't justify gpt-4o cost, regression-tested classifiers.

Works with: openai-python, openai-node, LangChain `with_structured_output`, Vercel AI SDK. Setup time: 5 minutes.

---

### JSON Schema mode (strict structured output)

```python
import json
import os

from openai import OpenAI

# Point the standard OpenAI client at the Fireworks endpoint
client = OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key=os.environ["FIREWORKS_API_KEY"],
)

schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "salary": {"type": "number"},
        "company": {"type": "string"},
        "is_remote": {"type": "boolean"},
    },
    "required": ["name", "company"],
}

resp = client.chat.completions.create(
    model="accounts/fireworks/models/llama-v3p3-70b-instruct",
    messages=[{"role": "user", "content": "Extract: Jane Smith earns $180K at Stripe and works fully remote"}],
    response_format={"type": "json_schema", "json_schema": {"name": "Person", "schema": schema}},
)
data = json.loads(resp.choices[0].message.content)
print(data)
# {"name": "Jane Smith", "salary": 180000, "company": "Stripe", "is_remote": true}
```

### Plain JSON mode (less strict, no schema)

```python
resp = client.chat.completions.create(
    model="accounts/fireworks/models/mixtral-8x22b-instruct",
    messages=[{"role": "user", "content": "Return a JSON object with three coding tips"}],
    response_format={"type": "json_object"},
)
```

### Tool calling

```python
tools = [{
    "type": "function",
    "function": {
        "name": "create_invoice",
        "description": "Create a Stripe invoice for a customer",
        "parameters": {
            "type": "object",
            "properties": {
                "customer_email": {"type": "string"},
                "amount_usd": {"type": "number"},
                "description": {"type": "string"},
            },
            "required": ["customer_email", "amount_usd"],
        },
    },
}]

resp = client.chat.completions.create(
    model="accounts/fireworks/models/llama-v3p3-70b-instruct",
    messages=[{"role": "user", "content": "Bill jane@acme.com $5000 for May retainer"}],
    tools=tools,
)
print(resp.choices[0].message.tool_calls)
```

### Why not just gpt-4o?

| Item | Llama 3.3 70B (Fireworks) | gpt-4o |
|---|---|---|
| Cost / 1M input tokens | $0.90 | $5.00 |
| Cost / 1M output tokens | $0.90 | $15.00 |
| JSON Schema strict mode | Yes | Yes |
| Tool calling | Yes | Yes |
| Latency (p50) | ~700 ms | ~600 ms |
| Quality on extraction | ~95% of gpt-4o | 100% (baseline) |

---

### FAQ

**Q: Is JSON Schema mode strict like OpenAI's?**
A: Yes. Fireworks compiles the schema into a constrained-decoding grammar, so output is guaranteed to validate against the schema. Prompt the model normally; structure is enforced at decode time.

**Q: Which models support tool calling?**
A: Llama 3.x (8B and 70B), Mixtral 8×22B, Qwen 2.5, and DeepSeek-V3. The model card on docs.fireworks.ai marks tool support per model. Llama 3.3 70B is the production default.

**Q: How does this compare to Outlines / Instructor?**
A: Outlines and Instructor are client-side libraries that re-prompt or post-process. Fireworks' JSON Schema mode runs server-side via constrained decoding: fewer round trips, lower latency, and no token budget spent on retries. They are complementary rather than competing; for example, Instructor can still provide Pydantic model binding on top of a Fireworks client.
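### Executing tool calls

The tool-calling example above prints `resp.choices[0].message.tool_calls`, but the model never runs anything itself: each entry carries a function name plus a JSON-encoded `arguments` string that your code must parse and route to a local implementation. A minimal dispatcher sketch, run here against a simulated tool call (built with `SimpleNamespace` to mirror the SDK object's attribute shape) rather than a live response; the `create_invoice` handler is a hypothetical stand-in, not a real Stripe integration:

```python
import json
from types import SimpleNamespace

# Hypothetical local implementation standing in for a real Stripe call
def create_invoice(customer_email, amount_usd, description=None):
    return {"status": "created", "customer": customer_email, "amount": amount_usd}

# Route tool names to local handlers
HANDLERS = {"create_invoice": create_invoice}

def dispatch(tool_call):
    """Execute one entry from resp.choices[0].message.tool_calls."""
    # arguments arrives as a JSON string, not a dict -- parse before calling
    args = json.loads(tool_call.function.arguments)
    return HANDLERS[tool_call.function.name](**args)

# Simulated tool call mirroring the attribute shape of the SDK response
simulated = SimpleNamespace(
    id="call_1",
    type="function",
    function=SimpleNamespace(
        name="create_invoice",
        arguments='{"customer_email": "jane@acme.com", "amount_usd": 5000}',
    ),
)
print(dispatch(simulated))
# {'status': 'created', 'customer': 'jane@acme.com', 'amount': 5000}
```

In a real loop you would append the handler's result back to `messages` as a `role: "tool"` message and call the model again so it can compose a final answer.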
---

## Source & Thanks

> Built by [Fireworks AI](https://github.com/fw-ai). Structured output docs at [docs.fireworks.ai/structured-responses](https://docs.fireworks.ai/structured-responses).
>
> Open SDKs at [github.com/fw-ai](https://github.com/fw-ai)

---

Source: https://tokrepo.com/en/workflows/fireworks-json-mode-function-calling-on-open-models
Author: Fireworks AI