What is Groq Tool Use — Llama 3.3 Function Calling at 280 tok/s?

Groq runs Llama 3.3 70B with native tool calling at 280 tok/sec. Multi-turn tool loops finish in 1-2 sec. The API is a drop-in for the OpenAI format, and parallel tool calls are supported.

Is Groq Tool Use — Llama 3.3 Function Calling at 280 tok/s free to use?

Yes. Groq Tool Use — Llama 3.3 Function Calling at 280 tok/s is freely available on TokRepo. Check the Source & Thanks section on the asset page for the specific open-source license.

How do I install Groq Tool Use — Llama 3.3 Function Calling at 280 tok/s?

Visit the asset page on TokRepo and click "Copy for agent" to get the installation instructions. Most assets can be installed with a single command.

Groq Tool Use — Llama 3.3 Function Calling at 280 tok/s

Name: Groq Tool Use — Llama 3.3 Function Calling at 280 tok/s
Author: Groq

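Introduction

Llama 3.3 70B on Groq supports the OpenAI tool-calling spec (tools=[...], tool_choice, parallel_tool_calls), and the whole tool loop runs at 280 tokens/sec. A typical 3-turn agent (model → tool → model → tool → model) completes end to end in 1.5-2 seconds, so an interactive UI does not have to show a spinner. It suits real-time agents, latency-sensitive copilots, voice agents with tool use, and any use case that previously moved away from Llama because inference was too slow. It is compatible with openai-python, openai-node, LangChain bind_tools, and the Vercel AI SDK. Setup takes about 5 minutes.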

Single-turn tool call

import json
import os

import requests
from openai import OpenAI

# Groq exposes an OpenAI-compatible endpoint, so the stock OpenAI client works.
client = OpenAI(base_url="https://api.groq.com/openai/v1", api_key=os.environ["GROQ_API_KEY"])

# Tool schema in the OpenAI function-calling format.
tools = [{
    "type": "function",
    "function": {
        "name": "get_stock_price",
        "description": "Get the current stock price by ticker",
        "parameters": {
            "type": "object",
            "properties": {"ticker": {"type": "string"}},
            "required": ["ticker"],
        },
    },
}]

def get_stock_price(ticker):
    # example-finance.com is the placeholder endpoint used by this example.
    return requests.get(f"https://example-finance.com/{ticker}", timeout=10).json()

messages = [{"role": "user", "content": "What is NVDA trading at right now?"}]
resp = client.chat.completions.create(model="llama-3.3-70b-versatile", messages=messages, tools=tools)

# This assumes the model chose to call the tool; tool_calls is None otherwise.
call = resp.choices[0].message.tool_calls[0]
args = json.loads(call.function.arguments)  # arguments arrive as a JSON string
result = get_stock_price(args["ticker"])

# Send back the assistant turn plus the tool result, then ask for the final answer.
messages.append(resp.choices[0].message)
messages.append({"role": "tool", "tool_call_id": call.id, "content": json.dumps(result)})
final = client.chat.completions.create(model="llama-3.3-70b-versatile", messages=messages, tools=tools)
print(final.choices[0].message.content)
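
The snippet above assumes the model always returns a tool call with valid JSON arguments. A minimal sketch of a more defensive extraction step might look like the following. It assumes the response message exposes the same `tool_calls` and `function.arguments` attributes used above; the helper name `extract_first_call` and the `fake_message` stand-in are hypothetical and exist only so the sketch runs offline.

```python
import json
from types import SimpleNamespace


def extract_first_call(message):
    """Return (call_id, name, args) for the first tool call, or None.

    Hypothetical helper: None means the model answered in plain text
    or produced arguments that are not a JSON object.
    """
    tool_calls = getattr(message, "tool_calls", None)
    if not tool_calls:
        return None  # the model answered directly, no tool requested
    call = tool_calls[0]
    try:
        args = json.loads(call.function.arguments)
    except json.JSONDecodeError:
        return None  # malformed arguments string
    if not isinstance(args, dict):
        return None  # valid JSON, but not an object of named arguments
    return call.id, call.function.name, args


# Stand-in for resp.choices[0].message, so the sketch runs without an API key.
fake_message = SimpleNamespace(
    tool_calls=[SimpleNamespace(
        id="call_1",
        function=SimpleNamespace(name="get_stock_price", arguments='{"ticker": "NVDA"}'),
    )]
)
print(extract_first_call(fake_message))
```

When the helper returns None, the caller can fall back to printing `message.content` instead of indexing into an empty tool-call list.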

Parallel tool calls

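resp = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[{"role": "user", "content": "Compare the current stock prices of NVDA, AMD, and INTC"}],
    tools=tools,
    parallel_tool_calls=True,
)
# resp.choices[0].message.tool_calls is a list of the calls the model requested
# in this single turn (typically 3 here, one per ticker). Your code executes them.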
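
The request above only returns the list of requested calls; running them and replying is left to the caller. A minimal sketch of that step, assuming each call has the `id`, `function.name`, and `function.arguments` attributes used in the single-turn example, could look like this. The names `run_tool_calls`, `registry`, and `fake_get_stock_price` are hypothetical, and the calls run sequentially here for simplicity.

```python
import json
from types import SimpleNamespace


def run_tool_calls(tool_calls, registry):
    """Execute each requested call and build one tool message per call."""
    messages = []
    for call in tool_calls:
        func = registry.get(call.function.name)
        if func is None:
            # Report unknown tools back to the model instead of crashing.
            result = {"error": f"unknown tool: {call.function.name}"}
        else:
            result = func(**json.loads(call.function.arguments))
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,  # ties each result to its request
            "content": json.dumps(result),
        })
    return messages


def fake_get_stock_price(ticker):
    # Placeholder value for an offline demo, not real market data.
    return {"ticker": ticker, "price": 100.0}


# Stand-ins for the three calls the model might return.
fake_calls = [
    SimpleNamespace(
        id=f"call_{i}",
        function=SimpleNamespace(name="get_stock_price", arguments=json.dumps({"ticker": t})),
    )
    for i, t in enumerate(["NVDA", "AMD", "INTC"])
]
tool_messages = run_tool_calls(fake_calls, {"get_stock_price": fake_get_stock_price})
print(len(tool_messages))
```

In a real loop, the assistant message is appended first, then every message in `tool_messages`, and only then is the model called again.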

Forcing a specific tool

client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=messages,
    tools=tools,
    tool_choice={"type": "function", "function": {"name": "get_stock_price"}},  # force this tool
)
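
Forcing a tool that is not declared in `tools` is an easy mistake to make. The sketch below builds the `tool_choice` value and checks the name first; `force_tool` is a hypothetical helper, not part of any SDK. In the OpenAI format `tool_choice` can also be one of the strings "auto", "none", or "required"; whether a given Groq model honors each of them is worth confirming in Groq's docs.

```python
def force_tool(tools, name):
    """Build a tool_choice value for `name`, checking it is declared in `tools`."""
    declared = {t["function"]["name"] for t in tools if t.get("type") == "function"}
    if name not in declared:
        raise ValueError(f"{name!r} is not declared in tools: {sorted(declared)}")
    return {"type": "function", "function": {"name": name}}


# Minimal tool list for the demo; the schema body is trimmed for brevity.
demo_tools = [{
    "type": "function",
    "function": {"name": "get_stock_price", "parameters": {"type": "object", "properties": {}}},
}]
choice = force_tool(demo_tools, "get_stock_price")
print(choice)
```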

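Best practices on Groq

Keep tool descriptions short and start them with a verb: Llama 3.3 picks tools based on the name plus the first sentence of the description.
For tool-heavy agents, set temperature=0 to reduce hallucinated tool names.
For serious tool use, use llama-3.3-70b-versatile; llama-3.1-8b-instant is only suited to simple argument extraction.
In multi-step agent loops, return all tool results for one assistant turn together before calling the model again, to cut round trips.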
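
The loop-related practices above can be sketched as one small function. This is an illustration under stated assumptions, not Groq's reference implementation: `run_agent`, `registry`, and `max_turns` are hypothetical names, and `client` is assumed to be an OpenAI-compatible client like the one created in the single-turn example.

```python
import json


def run_agent(client, model, messages, tools, registry, max_turns=5):
    """Loop until the model answers in plain text or max_turns is reached."""
    for _ in range(max_turns):
        resp = client.chat.completions.create(
            model=model,
            messages=messages,
            tools=tools,
            temperature=0,  # fewer hallucinated tool names
        )
        msg = resp.choices[0].message
        if not msg.tool_calls:
            return msg.content  # final answer, no more tools requested
        messages.append(msg)
        # Answer every call from this assistant turn before the next request,
        # so several tool results cost a single round trip.
        for call in msg.tool_calls:
            func = registry[call.function.name]
            result = func(**json.loads(call.function.arguments))
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,
                "content": json.dumps(result),
            })
    raise RuntimeError("agent did not finish within max_turns")
```

A call might look like `run_agent(client, "llama-3.3-70b-versatile", messages, tools, {"get_stock_price": get_stock_price})`. The `max_turns` cap is a design choice that stops a confused model from looping indefinitely.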

FAQ

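Q: How does Llama 3.3 tool-calling quality compare with GPT-4o? A: Roughly on par for typical tools (1-3 parameters, clear names). GPT-4o still leads on long-tail tools with overlapping descriptions. At 280 tok/sec, Groq often wins on net latency even when GPT-4o needs one fewer turn.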

Q: Does Groq support structured output / JSON mode? A: Yes. response_format={'type': 'json_object'} works on Llama 3.3 70B. JSON schema mode (json_schema) was added in 2025; check console.groq.com/docs for the support level on current models.
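
A minimal sketch of a JSON-mode request follows. The helper names `json_mode_kwargs` and `parse_json_reply` are hypothetical, and the system prompt that asks for JSON is an assumption carried over from how OpenAI's JSON mode is normally used. The actual API call is left commented out so the sketch runs without a key.

```python
import json


def json_mode_kwargs(model, question):
    """Build chat.completions.create kwargs that request a JSON object reply."""
    return {
        "model": model,
        "messages": [
            # Assumption: the prompt itself should also ask for JSON.
            {"role": "system", "content": "Reply with a JSON object only."},
            {"role": "user", "content": question},
        ],
        "response_format": {"type": "json_object"},
    }


def parse_json_reply(content):
    """Parse the reply text; raise ValueError if it is not a JSON object."""
    data = json.loads(content)  # JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    return data


kwargs = json_mode_kwargs("llama-3.3-70b-versatile", "Give the ticker symbol for Nvidia.")
# resp = client.chat.completions.create(**kwargs)  # needs a configured client
# data = parse_json_reply(resp.choices[0].message.content)
```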

Q: What about agent frameworks? A: LangChain ChatOpenAI(base_url='...groq.com/openai/v1') works directly. CrewAI, AutoGen, and the OpenAI Agents SDK all go through an OpenAI-compatible configuration. The Vercel AI SDK has a first-class @ai-sdk/groq provider.
