Skills · May 8, 2026 · 4 min read

Groq Tool Use — Llama 3.3 Function Calling at 280 tok/s

Groq runs Llama 3.3 70B with native tool calling at 280 tok/sec. Multi-turn loops in 1-2 sec. Drop-in OpenAI format. Parallel calls supported.

Agent ready

This asset can be read and installed directly by agents

TokRepo exposes a universal CLI command, install contract, metadata JSON, adapter-aware plan, and raw content links so agents can judge fit, risk, and next actions.

Stage only · 17/100
Agent surface: Any MCP/CLI agent
Kind: Skill
Install: Stage only
Trust: New
Entrypoint: Asset
Universal CLI install command:
npx tokrepo install d7a23b2b-55ce-4312-a36e-568346d1fdb3
Intro

Llama 3.3 70B on Groq supports the OpenAI tool-calling spec — tools=[...], tool_choice, parallel_tool_calls — and runs the whole tool loop at 280 tokens/sec. A typical 3-turn agent (model → tool → model → tool → model) finishes in 1.5-2 seconds end-to-end, fast enough for interactive UIs without spinners. Best for: real-time agents, latency-sensitive copilots, voice agents with tool use, anything where slow inference forced you off Llama before. Works with: openai-python, openai-node, LangChain bind_tools, Vercel AI SDK. Setup time: 5 minutes.


Single-turn tool call

from openai import OpenAI
import os, json, requests

client = OpenAI(base_url="https://api.groq.com/openai/v1", api_key=os.environ["GROQ_API_KEY"])

tools = [{
    "type": "function",
    "function": {
        "name": "get_stock_price",
        "description": "Get current stock price by ticker",
        "parameters": {
            "type": "object",
            "properties": {"ticker": {"type": "string"}},
            "required": ["ticker"],
        },
    },
}]

def get_stock_price(ticker):
    # Placeholder endpoint for the example; swap in a real quote API.
    return requests.get(f"https://example-finance.com/{ticker}").json()

messages = [{"role": "user", "content": "What's NVDA trading at?"}]
resp = client.chat.completions.create(model="llama-3.3-70b-versatile", messages=messages, tools=tools)

call = resp.choices[0].message.tool_calls[0]
args = json.loads(call.function.arguments)
result = get_stock_price(args["ticker"])

messages.append(resp.choices[0].message)
messages.append({"role": "tool", "tool_call_id": call.id, "content": json.dumps(result)})
final = client.chat.completions.create(model="llama-3.3-70b-versatile", messages=messages, tools=tools)
print(final.choices[0].message.content)
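
Multi-turn agent loop

The intro's three-turn agent is the same pattern in a loop: call the model, run whatever tools it asks for, feed the results back, and stop when it answers in plain text. A minimal sketch reusing client, tools, and get_stock_price from above; the TOOL_IMPLS dispatch dict and the max_turns cap are illustrative additions, not part of the Groq API:

TOOL_IMPLS = {"get_stock_price": get_stock_price}

def run_agent(messages, max_turns=5):
    for _ in range(max_turns):
        resp = client.chat.completions.create(
            model="llama-3.3-70b-versatile", messages=messages, tools=tools
        )
        msg = resp.choices[0].message
        if not msg.tool_calls:       # no tool requests: this is the final answer
            return msg.content
        messages.append(msg)         # keep the assistant turn that carries the tool calls
        for call in msg.tool_calls:  # execute every requested tool
            args = json.loads(call.function.arguments)
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,
                "content": json.dumps(TOOL_IMPLS[call.function.name](**args)),
            })
    raise RuntimeError("agent did not finish within max_turns")

print(run_agent([{"role": "user", "content": "What's NVDA trading at?"}]))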

Parallel tool calls

resp = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[{"role": "user", "content": "Compare NVDA, AMD, and INTC current prices"}],
    tools=tools,
    parallel_tool_calls=True,
)
# resp.choices[0].message.tool_calls now holds three calls (one per ticker),
# all emitted in a single assistant turn. The API does not execute them;
# your code does, as sketched below.
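
A sketch of the fan-out: execute each call, append one tool message per call, then make a single follow-up request. This is also the batching the best-practices list below recommends:

msg = resp.choices[0].message
messages = [
    {"role": "user", "content": "Compare NVDA, AMD, and INTC current prices"},
    msg,  # keep the assistant turn that carries the tool calls
]
for call in msg.tool_calls:  # one call per ticker
    args = json.loads(call.function.arguments)
    messages.append({
        "role": "tool",
        "tool_call_id": call.id,  # lets the model match each result to its call
        "content": json.dumps(get_stock_price(args["ticker"])),
    })

# One follow-up request with all three results batched into a single turn.
final = client.chat.completions.create(
    model="llama-3.3-70b-versatile", messages=messages, tools=tools
)
print(final.choices[0].message.content)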

Forcing a specific tool

client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=messages,
    tools=tools,
    # Forces this exact tool; tool_choice="required" forces some tool call,
    # "auto" (the default) lets the model decide, "none" disables tools.
    tool_choice={"type": "function", "function": {"name": "get_stock_price"}},
)

Best practices on Groq

  • Keep tool descriptions short and verb-led — Llama 3.3 picks tools by name + first sentence of description.
  • Set temperature=0 for tool-heavy agents — reduces tool-name hallucination.
  • Use llama-3.3-70b-versatile for serious tool use; llama-3.1-8b-instant works only for simple parameter extraction.
  • For agentic loops, return all tool results after a single assistant turn (one tool message per call, as in the parallel example above) rather than looping one call at a time; fewer round trips.

FAQ

Q: How does Llama 3.3 tool quality compare to GPT-4o? A: Roughly on par for typical tools (1-3 args, clear names). GPT-4o still leads on long-tail tools with overlapping descriptions. The 280 tok/s throughput often wins on net latency even when GPT-4o needs one fewer round trip.

Q: Does Groq support structured outputs / JSON mode? A: Yes — response_format={'type': 'json_object'} works on Llama 3.3 70B. JSON schema mode (json_schema) was added in 2025; check console.groq.com/docs for current support level on your model.
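
A minimal sketch of JSON mode, assuming it follows the OpenAI convention that the prompt itself must mention JSON:

resp = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[{"role": "user",
               "content": "Return NVDA's company name and exchange as a JSON object."}],
    response_format={"type": "json_object"},  # guarantees syntactically valid JSON
)
data = json.loads(resp.choices[0].message.content)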

Q: What about agent frameworks? A: LangChain ChatOpenAI(base_url='...groq.com/openai/v1') works directly. CrewAI, AutoGen, OpenAI Agents SDK all work via OpenAI-compatible config. Vercel AI SDK has a first-class @ai-sdk/groq provider.
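
For the LangChain route, a sketch assuming langchain-openai is installed; the tools list is the same OpenAI-format dict defined earlier:

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="llama-3.3-70b-versatile",
    base_url="https://api.groq.com/openai/v1",
    api_key=os.environ["GROQ_API_KEY"],
).bind_tools(tools)  # accepts the same OpenAI-format tool dicts

ai_msg = llm.invoke("What's NVDA trading at?")
print(ai_msg.tool_calls)  # e.g. [{'name': 'get_stock_price', 'args': {'ticker': 'NVDA'}, ...}]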


Quick Use

  1. Use model='llama-3.3-70b-versatile' for production tool calling
  2. Pass tools=[...] in OpenAI format; the same code works against the stock openai SDK
  3. Set parallel_tool_calls=True for multi-tool comparisons (see the starter below)
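
Putting the three steps together, a copy-paste starter (reuses the client and tools from the examples above):

# 1) production model  2) OpenAI-format tools  3) parallel calls on
resp = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[{"role": "user", "content": "Compare NVDA and AMD prices"}],
    tools=tools,
    parallel_tool_calls=True,
)
for call in resp.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)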

Source & Thanks

Built by Groq. Tool-use docs at console.groq.com/docs/tool-use.

groq/groq-python — official SDK

🙏
