Skills · May 8, 2026 · 4 min read

Groq Tool Use — Llama 3.3 Function Calling at 280 tok/s

Groq runs Llama 3.3 70B with native tool calling at 280 tok/sec. Multi-turn loops in 1-2 sec. Drop-in OpenAI format. Parallel calls supported.

Agent-ready

This asset can be read and installed directly by agents.

TokRepo exposes a universal CLI command, an install contract, JSON metadata, an adapter-specific plan, and raw content so agents can evaluate compatibility, risk, and next steps.

Stage only · 17/100
Agent surface: Any MCP/CLI agent
Type: Skill
Install: Stage only
Trust: New
Input: Asset
Universal CLI command:
npx tokrepo install d7a23b2b-55ce-4312-a36e-568346d1fdb3
Introduction

Llama 3.3 70B on Groq supports the OpenAI tool-calling spec — tools=[...], tool_choice, parallel_tool_calls — and runs the whole tool loop at 280 tokens/sec. A typical 3-turn agent (model → tool → model → tool → model) finishes in 1.5-2 seconds end-to-end, fast enough for interactive UIs without spinners. Best for: real-time agents, latency-sensitive copilots, voice agents with tool use, anything where slow inference forced you off Llama before. Works with: openai-python, openai-node, LangChain bind_tools, Vercel AI SDK. Setup time: 5 minutes.


Single-turn tool call

from openai import OpenAI
import json, os, requests

client = OpenAI(base_url="https://api.groq.com/openai/v1", api_key=os.environ["GROQ_API_KEY"])

tools = [{
    "type": "function",
    "function": {
        "name": "get_stock_price",
        "description": "Get current stock price by ticker",
        "parameters": {
            "type": "object",
            "properties": {"ticker": {"type": "string"}},
            "required": ["ticker"],
        },
    },
}]

def get_stock_price(ticker):
    return requests.get(f"https://example-finance.com/{ticker}").json()

messages = [{"role": "user", "content": "What's NVDA trading at?"}]
resp = client.chat.completions.create(model="llama-3.3-70b-versatile", messages=messages, tools=tools)

call = resp.choices[0].message.tool_calls[0]
args = json.loads(call.function.arguments)
result = get_stock_price(args["ticker"])

messages.append(resp.choices[0].message)
messages.append({"role": "tool", "tool_call_id": call.id, "content": json.dumps(result)})
final = client.chat.completions.create(model="llama-3.3-70b-versatile", messages=messages, tools=tools)
print(final.choices[0].message.content)
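
The single-turn pattern above generalizes to the multi-turn loop the intro times at 1.5-2 seconds: keep calling the model until it stops requesting tools. A minimal sketch, where `run_tool_loop` and the `handlers` mapping are illustrative names, not part of the Groq SDK:

```python
import json

def run_tool_loop(client, messages, tools, handlers,
                  model="llama-3.3-70b-versatile"):
    """Loop model -> tools -> model until the model answers in plain text.

    `handlers` maps tool name -> local Python function.
    """
    while True:
        resp = client.chat.completions.create(
            model=model, messages=messages, tools=tools)
        msg = resp.choices[0].message
        if not msg.tool_calls:          # final answer, no more tools requested
            return msg.content
        messages.append(msg)            # echo the assistant turn back verbatim
        for call in msg.tool_calls:     # run every tool the model asked for
            args = json.loads(call.function.arguments)
            result = handlers[call.function.name](**args)
            messages.append({"role": "tool",
                             "tool_call_id": call.id,
                             "content": json.dumps(result)})
```

With the `get_stock_price` tool above: `run_tool_loop(client, messages, tools, {"get_stock_price": get_stock_price})`.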

Parallel tool calls

resp = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[{"role": "user", "content": "Compare NVDA, AMD, and INTC current prices"}],
    tools=tools,
    parallel_tool_calls=True,
)
# resp.choices[0].message.tool_calls now holds three calls emitted in one
# assistant turn; executing them (concurrently or otherwise) is up to your code.
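
Executing those calls concurrently is the caller's job. A thread-pool sketch, where `execute_calls` and the `handlers` dict are illustrative helpers, not Groq SDK names:

```python
import json
from concurrent.futures import ThreadPoolExecutor

def execute_calls(tool_calls, handlers):
    """Run each requested tool in a worker thread; return tool messages
    in the same order the model emitted the calls."""
    def run(call):
        args = json.loads(call.function.arguments)
        result = handlers[call.function.name](**args)
        return {"role": "tool",
                "tool_call_id": call.id,
                "content": json.dumps(result)}
    with ThreadPoolExecutor() as pool:
        return list(pool.map(run, tool_calls))
```

Append the returned messages to `messages` and call the model once more for the comparison answer.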

Forcing a specific tool

client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=messages,
    tools=tools,
    tool_choice={"type": "function", "function": {"name": "get_stock_price"}},  # force this exact tool
)
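
For reference, `tool_choice` follows the OpenAI spec, so the other values look like this (whether every value is supported on a given Groq model is worth confirming against console.groq.com/docs):

```python
# OpenAI-spec values that tool_choice accepts:
tool_choice_options = [
    "auto",      # default: the model decides whether to call a tool
    "none",      # never call a tool; answer in plain text
    "required",  # must call at least one tool
    {"type": "function", "function": {"name": "get_stock_price"}},  # force this one
]
```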

Best practices on Groq

  • Keep tool descriptions short and verb-led — Llama 3.3 picks tools by name + first sentence of description.
  • Set temperature=0 for tool-heavy agents — reduces tool-name hallucination.
  • Use llama-3.3-70b-versatile for serious tool use; llama-3.1-8b-instant works only for simple parameter extraction.
  • For agentic loops, append all tool results before the next model call rather than calling the model after each one — fewer round trips.

FAQ

Q: How does Llama 3.3 tool quality compare to GPT-4o? A: Roughly on par for typical tools (1-3 args, clear names). GPT-4o still leads on long-tail tools with overlapping descriptions. The 280 tok/s speed often wins net latency even when GPT-4o is one round shorter.

Q: Does Groq support structured outputs / JSON mode? A: Yes — response_format={'type': 'json_object'} works on Llama 3.3 70B. JSON schema mode (json_schema) was added in 2025; check console.groq.com/docs for current support level on your model.
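
A minimal `json_object` request looks like the sketch below; the live call is left as a comment. Reminding the model in the prompt to emit JSON, alongside the flag, is a common precaution:

```python
# Request payload for JSON mode on Groq; the live call is sketched in comments.
request = dict(
    model="llama-3.3-70b-versatile",
    messages=[
        {"role": "system", "content": "Reply only with a JSON object."},
        {"role": "user", "content": "Return NVDA's ticker and exchange as JSON."},
    ],
    response_format={"type": "json_object"},
)
# resp = client.chat.completions.create(**request)
# data = json.loads(resp.choices[0].message.content)
```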

Q: What about agent frameworks? A: LangChain ChatOpenAI(base_url='...groq.com/openai/v1') works directly. CrewAI, AutoGen, OpenAI Agents SDK all work via OpenAI-compatible config. Vercel AI SDK has a first-class @ai-sdk/groq provider.


Quick Use

  1. Use model='llama-3.3-70b-versatile' for production tool calling
  2. Pass tools=[...] in OpenAI format — the same code you'd write against the OpenAI API
  3. Set parallel_tool_calls=True for multi-tool comparisons


Source & Thanks

Built by Groq. Tool-use docs at console.groq.com/docs/tool-use.

groq/groq-python — official SDK
