# Fireworks JSON Mode + Function Calling on Open Models

> Fireworks supports OpenAI-compatible JSON mode, JSON Schema, and tool calling on Llama 3.3, Mixtral, and Qwen. Same code, cheaper open weights.

## Install

Save the content below to `.claude/skills/` or append to your `CLAUDE.md`:

## Quick Use

1. Use `response_format={'type': 'json_schema', 'json_schema': {'schema': {...}}}` for strict mode
2. Use `response_format={'type': 'json_object'}` for loose JSON
3. Pass tools via the standard OpenAI `tools=[...]` parameter

---

## Intro

Fireworks AI exposes OpenAI's structured-output features (`response_format='json_object'`, `response_format='json_schema'`, and tool/function calling) on Llama 3.3 70B, Mixtral 8×22B, Qwen 2.5 72B, and DeepSeek-V3. Same client code as OpenAI, at 5-10× lower cost.

Best for: data extraction pipelines, structured agents that don't justify gpt-4o cost, regression-tested classifiers.

Works with: openai-python, openai-node, LangChain `with_structured_output`, Vercel AI SDK. Setup time: 5 minutes.

---

### JSON Schema mode (strict structured output)

```python
import json
import os

from openai import OpenAI

# Point the standard OpenAI client at the Fireworks endpoint
client = OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key=os.environ["FIREWORKS_API_KEY"],
)

schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "salary": {"type": "number"},
        "company": {"type": "string"},
        "is_remote": {"type": "boolean"},
    },
    "required": ["name", "company"],
}

resp = client.chat.completions.create(
    model="accounts/fireworks/models/llama-v3p3-70b-instruct",
    messages=[{"role": "user", "content": "Extract: Jane Smith earns $180K at Stripe and works fully remote"}],
    response_format={"type": "json_schema", "json_schema": {"name": "Person", "schema": schema}},
)
data = json.loads(resp.choices[0].message.content)
print(data)
# {"name": "Jane Smith", "salary": 180000, "company": "Stripe", "is_remote": true}
```

### Plain JSON mode (less strict, no schema)

```python
resp = client.chat.completions.create(
    model="accounts/fireworks/models/mixtral-8x22b-instruct",
    messages=[{"role": "user", "content": "Return a JSON object with three coding tips"}],
    response_format={"type": "json_object"},
)
```

### Tool calling

```python
tools = [{
    "type": "function",
    "function": {
        "name": "create_invoice",
        "description": "Create a Stripe invoice for a customer",
        "parameters": {
            "type": "object",
            "properties": {
                "customer_email": {"type": "string"},
                "amount_usd": {"type": "number"},
                "description": {"type": "string"},
            },
            "required": ["customer_email", "amount_usd"],
        },
    },
}]

resp = client.chat.completions.create(
    model="accounts/fireworks/models/llama-v3p3-70b-instruct",
    messages=[{"role": "user", "content": "Bill jane@acme.com $5000 for May retainer"}],
    tools=tools,
)
print(resp.choices[0].message.tool_calls)
```

### Why not just gpt-4o?

| Item | Llama 3.3 70B (Fireworks) | gpt-4o |
|---|---|---|
| Cost / 1M input tokens | $0.90 | $5.00 |
| Cost / 1M output tokens | $0.90 | $15.00 |
| JSON Schema strict mode | Yes | Yes |
| Tool calling | Yes | Yes |
| Latency (p50) | ~700 ms | ~600 ms |
| Quality on extraction | ~95% of gpt-4o | 100% (baseline) |

---

### FAQ

**Q: Is JSON Schema mode strict like OpenAI's?**
A: Yes. Fireworks compiles the schema into a constrained-decoding grammar, so output is guaranteed to validate against the schema. Prompt the model normally; structure is enforced at decode time.

**Q: Which models support tool calling?**
A: Llama 3.x (8B and 70B), Mixtral 8×22B, Qwen 2.5, and DeepSeek-V3. The model card on docs.fireworks.ai marks tool support per model. Llama 3.3 70B is the production default.

**Q: How does this compare to Outlines / Instructor?**
A: Outlines and Instructor are client-side libraries that re-prompt or post-process. Fireworks' JSON Schema mode runs server-side via constrained decoding: fewer round trips, lower latency, and no token budget spent on retries. They are complementary rather than competing; for example, Instructor can still provide Pydantic model binding on top of a Fireworks client.
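### Executing tool calls

The tool-calling example above prints `resp.choices[0].message.tool_calls`, but the model never runs anything itself: each entry carries a function name plus a JSON-encoded `arguments` string that your code must parse and route to a local implementation. A minimal dispatcher sketch, run here against a simulated tool call (built with `SimpleNamespace` to mirror the SDK object's attribute shape) rather than a live response; the `create_invoice` handler is a hypothetical stand-in, not a real Stripe integration:

```python
import json
from types import SimpleNamespace

# Hypothetical local implementation standing in for a real Stripe call
def create_invoice(customer_email, amount_usd, description=None):
    return {"status": "created", "customer": customer_email, "amount": amount_usd}

# Route tool names to local handlers
HANDLERS = {"create_invoice": create_invoice}

def dispatch(tool_call):
    """Execute one entry from resp.choices[0].message.tool_calls."""
    # arguments arrives as a JSON string, not a dict -- parse before calling
    args = json.loads(tool_call.function.arguments)
    return HANDLERS[tool_call.function.name](**args)

# Simulated tool call mirroring the attribute shape of the SDK response
simulated = SimpleNamespace(
    id="call_1",
    type="function",
    function=SimpleNamespace(
        name="create_invoice",
        arguments='{"customer_email": "jane@acme.com", "amount_usd": 5000}',
    ),
)
print(dispatch(simulated))
# {'status': 'created', 'customer': 'jane@acme.com', 'amount': 5000}
```

In a real loop you would append the handler's result back to `messages` as a `role: "tool"` message and call the model again so it can compose a final answer.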
---

## Source & Thanks

> Built by [Fireworks AI](https://github.com/fw-ai). Structured output docs at [docs.fireworks.ai/structured-responses](https://docs.fireworks.ai/structured-responses).
>
> Open SDKs at [github.com/fw-ai](https://github.com/fw-ai)

---

Source: https://tokrepo.com/en/workflows/fireworks-json-mode-function-calling-on-open-models
Author: Fireworks AI