# Groq Tool Use — Llama 3.3 Function Calling at 280 tok/s

> Groq runs Llama 3.3 70B with native tool calling at 280 tok/sec. Multi-turn loops in 1-2 sec. Drop-in OpenAI format. Parallel calls supported.

## Install

Save the content below to `.claude/skills/` or append to your `CLAUDE.md`:

## Quick Use

1. Use `model='llama-3.3-70b-versatile'` for production tool calling
2. Pass `tools=[...]` in OpenAI format — the same code as the `openai` SDK
3. Set `parallel_tool_calls=True` for multi-tool comparisons

---

## Intro

Llama 3.3 70B on Groq supports the OpenAI tool-calling spec — `tools=[...]`, `tool_choice`, `parallel_tool_calls` — and runs the whole tool loop at 280 tokens/sec. A typical 3-turn agent (model → tool → model → tool → model) finishes in 1.5-2 seconds end to end, fast enough for interactive UIs without spinners.

Best for: real-time agents, latency-sensitive copilots, voice agents with tool use, anything where slow inference forced you off Llama before.

Works with: openai-python, openai-node, LangChain `bind_tools`, Vercel AI SDK.

Setup time: 5 minutes.
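The 3-turn agent described above can be sketched as a generic driver: keep calling the model, execute every tool it requests (one or several, so parallel calls work too), feed results back, and stop when it answers in plain text. A minimal sketch; `run_tool_loop` and `tool_impls` are illustrative names, not part of the Groq SDK, and the message shape assumed is the standard OpenAI chat-completions response:

```python
import json


def run_tool_loop(client, model, messages, tools, tool_impls, max_turns=5):
    """Drive a multi-turn tool loop: call the model, execute every tool it
    requests, append the results, and stop when it answers in plain text."""
    for _ in range(max_turns):
        resp = client.chat.completions.create(
            model=model, messages=messages, tools=tools
        )
        msg = resp.choices[0].message
        if not msg.tool_calls:
            return msg.content  # plain-text answer: the loop is done
        messages.append(msg)  # keep the assistant turn that holds the tool_calls
        for call in msg.tool_calls:  # one iteration per call handles parallel calls
            args = json.loads(call.function.arguments)
            result = tool_impls[call.function.name](**args)
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,
                "content": json.dumps(result),
            })
    raise RuntimeError(f"no final answer after {max_turns} turns")
```

With a real client this would be called as `run_tool_loop(client, "llama-3.3-70b-versatile", messages, tools, {"get_stock_price": get_stock_price})`, reusing the client and tool defined in the single-turn example below.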
---

### Single-turn tool call

```python
import json
import os

import requests
from openai import OpenAI

client = OpenAI(
    base_url="https://api.groq.com/openai/v1",
    api_key=os.environ["GROQ_API_KEY"],
)

tools = [{
    "type": "function",
    "function": {
        "name": "get_stock_price",
        "description": "Get current stock price by ticker",
        "parameters": {
            "type": "object",
            "properties": {"ticker": {"type": "string"}},
            "required": ["ticker"],
        },
    },
}]

def get_stock_price(ticker):
    return requests.get(f"https://example-finance.com/{ticker}").json()

messages = [{"role": "user", "content": "What's NVDA trading at?"}]
resp = client.chat.completions.create(
    model="llama-3.3-70b-versatile", messages=messages, tools=tools
)

# Execute the requested tool, then feed the result back.
call = resp.choices[0].message.tool_calls[0]
args = json.loads(call.function.arguments)
result = get_stock_price(args["ticker"])

messages.append(resp.choices[0].message)
messages.append({"role": "tool", "tool_call_id": call.id, "content": json.dumps(result)})

final = client.chat.completions.create(
    model="llama-3.3-70b-versatile", messages=messages, tools=tools
)
print(final.choices[0].message.content)
```

### Parallel tool calls

```python
resp = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[{"role": "user", "content": "Compare NVDA, AMD, and INTC current prices"}],
    tools=tools,
    parallel_tool_calls=True,
)
# resp.choices[0].message.tool_calls is now a list of 3 calls run in parallel.
```

### Forcing a specific tool

```python
client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=messages,
    tools=tools,
    tool_choice={"type": "function", "function": {"name": "get_stock_price"}},  # required
)
```

### Best practices on Groq

- Keep tool descriptions short and verb-led — Llama 3.3 picks tools by name + first sentence of description.
- Set `temperature=0` for tool-heavy agents — reduces tool-name hallucination.
- Use `llama-3.3-70b-versatile` for serious tool use; `llama-3.1-8b-instant` works only for simple parameter extraction.
- For agentic loops, batch tool results into one assistant turn rather than many — fewer round trips.

---

### FAQ

**Q: How does Llama 3.3 tool quality compare to GPT-4o?**
A: Roughly on par for typical tools (1-3 args, clear names). GPT-4o still leads on long-tail tools with overlapping descriptions. The 280 tok/s speed often wins on net latency even when GPT-4o finishes one round sooner.

**Q: Does Groq support structured outputs / JSON mode?**
A: Yes — `response_format={'type': 'json_object'}` works on Llama 3.3 70B. JSON schema mode (`json_schema`) was added in 2025; check console.groq.com/docs for the current support level on your model.

**Q: What about agent frameworks?**
A: LangChain `ChatOpenAI(base_url='...groq.com/openai/v1')` works directly. CrewAI, AutoGen, and the OpenAI Agents SDK all work via OpenAI-compatible config. The Vercel AI SDK has a first-class `@ai-sdk/groq` provider.

---

## Source & Thanks

> Built by [Groq](https://github.com/groq). Tool-use docs at [console.groq.com/docs/tool-use](https://console.groq.com/docs/tool-use).
>
> [groq/groq-python](https://github.com/groq/groq-python) — official SDK
---

Source: https://tokrepo.com/en/workflows/groq-tool-use-llama-3-3-function-calling-at-280-tok-s
Author: Groq