# Groq Tool Use — Llama 3.3 Function Calling at 280 tok/s

> Groq runs Llama 3.3 70B with native tool calling at 280 tok/sec. Multi-turn loops in 1-2 sec. Drop-in OpenAI format. Parallel calls supported.

## Install

Save the content below to `.claude/skills/` or append to your `CLAUDE.md`:

## Quick Use

1. Use `model='llama-3.3-70b-versatile'` for production tool calling
2. Pass `tools=[...]` in OpenAI format — the same code as the `openai` SDK
3. Set `parallel_tool_calls=True` for multi-tool comparisons

---

## Intro

Llama 3.3 70B on Groq supports the OpenAI tool-calling spec — `tools=[...]`, `tool_choice`, `parallel_tool_calls` — and runs the whole tool loop at 280 tokens/sec. A typical 3-turn agent (model → tool → model → tool → model) finishes in 1.5-2 seconds end to end, fast enough for interactive UIs without spinners.

Best for: real-time agents, latency-sensitive copilots, voice agents with tool use, anything where slow inference forced you off Llama before.

Works with: openai-python, openai-node, LangChain `bind_tools`, Vercel AI SDK.

Setup time: 5 minutes.
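The 3-turn agent described above can be sketched as a generic driver: keep calling the model, execute every tool it requests (one or several, so parallel calls work too), feed results back, and stop when it answers in plain text. A minimal sketch; `run_tool_loop` and `tool_impls` are illustrative names, not part of the Groq SDK, and the message shape assumed is the standard OpenAI chat-completions response:

```python
import json


def run_tool_loop(client, model, messages, tools, tool_impls, max_turns=5):
    """Drive a multi-turn tool loop: call the model, execute every tool it
    requests, append the results, and stop when it answers in plain text."""
    for _ in range(max_turns):
        resp = client.chat.completions.create(
            model=model, messages=messages, tools=tools
        )
        msg = resp.choices[0].message
        if not msg.tool_calls:
            return msg.content  # plain-text answer: the loop is done
        messages.append(msg)  # keep the assistant turn that holds the tool_calls
        for call in msg.tool_calls:  # one iteration per call handles parallel calls
            args = json.loads(call.function.arguments)
            result = tool_impls[call.function.name](**args)
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,
                "content": json.dumps(result),
            })
    raise RuntimeError(f"no final answer after {max_turns} turns")
```

With a real client this would be called as `run_tool_loop(client, "llama-3.3-70b-versatile", messages, tools, {"get_stock_price": get_stock_price})`, reusing the client and tool defined in the single-turn example below.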
---

### Single-turn tool call

```python
import json
import os

import requests
from openai import OpenAI

client = OpenAI(
    base_url="https://api.groq.com/openai/v1",
    api_key=os.environ["GROQ_API_KEY"],
)

tools = [{
    "type": "function",
    "function": {
        "name": "get_stock_price",
        "description": "Get current stock price by ticker",
        "parameters": {
            "type": "object",
            "properties": {"ticker": {"type": "string"}},
            "required": ["ticker"],
        },
    },
}]

def get_stock_price(ticker):
    return requests.get(f"https://example-finance.com/{ticker}").json()

messages = [{"role": "user", "content": "What's NVDA trading at?"}]
resp = client.chat.completions.create(
    model="llama-3.3-70b-versatile", messages=messages, tools=tools
)

# Execute the requested tool, then feed the result back.
call = resp.choices[0].message.tool_calls[0]
args = json.loads(call.function.arguments)
result = get_stock_price(args["ticker"])

messages.append(resp.choices[0].message)
messages.append({"role": "tool", "tool_call_id": call.id, "content": json.dumps(result)})

final = client.chat.completions.create(
    model="llama-3.3-70b-versatile", messages=messages, tools=tools
)
print(final.choices[0].message.content)
```

### Parallel tool calls

```python
resp = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[{"role": "user", "content": "Compare NVDA, AMD, and INTC current prices"}],
    tools=tools,
    parallel_tool_calls=True,
)
# resp.choices[0].message.tool_calls is now a list of 3 calls run in parallel.
```

### Forcing a specific tool

```python
client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=messages,
    tools=tools,
    tool_choice={"type": "function", "function": {"name": "get_stock_price"}},  # required
)
```

### Best practices on Groq

- Keep tool descriptions short and verb-led — Llama 3.3 picks tools by name + first sentence of description.
- Set `temperature=0` for tool-heavy agents — reduces tool-name hallucination.
- Use `llama-3.3-70b-versatile` for serious tool use; `llama-3.1-8b-instant` works only for simple parameter extraction.
- For agentic loops, batch tool results into one assistant turn rather than many — fewer round trips.

---

### FAQ

**Q: How does Llama 3.3 tool quality compare to GPT-4o?**
A: Roughly on par for typical tools (1-3 args, clear names). GPT-4o still leads on long-tail tools with overlapping descriptions. The 280 tok/s speed often wins on net latency even when GPT-4o finishes one round sooner.

**Q: Does Groq support structured outputs / JSON mode?**
A: Yes — `response_format={'type': 'json_object'}` works on Llama 3.3 70B. JSON schema mode (`json_schema`) was added in 2025; check console.groq.com/docs for the current support level on your model.

**Q: What about agent frameworks?**
A: LangChain `ChatOpenAI(base_url='...groq.com/openai/v1')` works directly. CrewAI, AutoGen, and the OpenAI Agents SDK all work via OpenAI-compatible config. The Vercel AI SDK has a first-class `@ai-sdk/groq` provider.

---

## Source & Thanks

> Built by [Groq](https://github.com/groq). Tool-use docs at [console.groq.com/docs/tool-use](https://console.groq.com/docs/tool-use).
>
> [groq/groq-python](https://github.com/groq/groq-python) — official SDK
---

Source: https://tokrepo.com/en/workflows/groq-tool-use-llama-3-3-function-calling-at-280-tok-s
Author: Groq