Skills · May 8, 2026 · 4 min read

Groq Tool Use — Llama 3.3 Function Calling at 280 tok/s

Groq runs Llama 3.3 70B with native tool calling at 280 tok/sec. Multi-turn loops in 1-2 sec. Drop-in OpenAI format. Parallel calls supported.

Agent ready

This asset can be read and installed directly by agents

TokRepo exposes a universal CLI command, install contract, metadata JSON, adapter-aware plan, and raw content links so agents can judge fit, risk, and next actions.

Stage only · 17/100
Agent surface: Any MCP/CLI agent
Kind: Skill
Install: Stage only
Trust: New
Entrypoint: Asset
Universal CLI install command:
npx tokrepo install d7a23b2b-55ce-4312-a36e-568346d1fdb3
Intro

Llama 3.3 70B on Groq supports the OpenAI tool-calling spec — tools=[...], tool_choice, parallel_tool_calls — and runs the whole tool loop at 280 tokens/sec. A typical 3-turn agent (model → tool → model → tool → model) finishes in 1.5-2 seconds end-to-end, fast enough for interactive UIs without spinners. Best for: real-time agents, latency-sensitive copilots, voice agents with tool use, anything where slow inference forced you off Llama before. Works with: openai-python, openai-node, LangChain bind_tools, Vercel AI SDK. Setup time: 5 minutes.


Single-turn tool call

from openai import OpenAI
import os, json, requests

client = OpenAI(base_url="https://api.groq.com/openai/v1", api_key=os.environ["GROQ_API_KEY"])

tools = [{
    "type": "function",
    "function": {
        "name": "get_stock_price",
        "description": "Get current stock price by ticker",
        "parameters": {
            "type": "object",
            "properties": {"ticker": {"type": "string"}},
            "required": ["ticker"],
        },
    },
}]

def get_stock_price(ticker):
    # Placeholder endpoint for the example; swap in a real quote API.
    return requests.get(f"https://example-finance.com/{ticker}").json()

messages = [{"role": "user", "content": "What's NVDA trading at?"}]
resp = client.chat.completions.create(model="llama-3.3-70b-versatile", messages=messages, tools=tools)

call = resp.choices[0].message.tool_calls[0]
args = json.loads(call.function.arguments)
result = get_stock_price(args["ticker"])

messages.append(resp.choices[0].message)
messages.append({"role": "tool", "tool_call_id": call.id, "content": json.dumps(result)})
final = client.chat.completions.create(model="llama-3.3-70b-versatile", messages=messages, tools=tools)
print(final.choices[0].message.content)
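
Multi-turn agent loop

The intro's three-turn agent is the same pattern in a loop: call the model, run whatever tools it asks for, feed the results back, and stop when it answers in plain text. A minimal sketch reusing client, tools, and get_stock_price from above; the TOOL_IMPLS dispatch dict and the max_turns cap are illustrative additions, not part of the Groq API:

TOOL_IMPLS = {"get_stock_price": get_stock_price}

def run_agent(messages, max_turns=5):
    for _ in range(max_turns):
        resp = client.chat.completions.create(
            model="llama-3.3-70b-versatile", messages=messages, tools=tools
        )
        msg = resp.choices[0].message
        if not msg.tool_calls:       # no tool requests: this is the final answer
            return msg.content
        messages.append(msg)         # keep the assistant turn that carries the tool calls
        for call in msg.tool_calls:  # execute every requested tool
            args = json.loads(call.function.arguments)
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,
                "content": json.dumps(TOOL_IMPLS[call.function.name](**args)),
            })
    raise RuntimeError("agent did not finish within max_turns")

print(run_agent([{"role": "user", "content": "What's NVDA trading at?"}]))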

Parallel tool calls

resp = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[{"role": "user", "content": "Compare NVDA, AMD, and INTC current prices"}],
    tools=tools,
    parallel_tool_calls=True,
)
# resp.choices[0].message.tool_calls now holds three calls (one per ticker),
# all emitted in a single assistant turn. The API does not execute them;
# your code does, as sketched below.
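
A sketch of the fan-out: execute each call, append one tool message per call, then make a single follow-up request. This is also the batching the best-practices list below recommends:

msg = resp.choices[0].message
messages = [
    {"role": "user", "content": "Compare NVDA, AMD, and INTC current prices"},
    msg,  # keep the assistant turn that carries the tool calls
]
for call in msg.tool_calls:  # one call per ticker
    args = json.loads(call.function.arguments)
    messages.append({
        "role": "tool",
        "tool_call_id": call.id,  # lets the model match each result to its call
        "content": json.dumps(get_stock_price(args["ticker"])),
    })

# One follow-up request with all three results batched into a single turn.
final = client.chat.completions.create(
    model="llama-3.3-70b-versatile", messages=messages, tools=tools
)
print(final.choices[0].message.content)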

Forcing a specific tool

client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=messages,
    tools=tools,
    # Forces this exact tool; tool_choice="required" forces some tool call,
    # "auto" (the default) lets the model decide, "none" disables tools.
    tool_choice={"type": "function", "function": {"name": "get_stock_price"}},
)

Best practices on Groq

  • Keep tool descriptions short and verb-led — Llama 3.3 picks tools by name + first sentence of description.
  • Set temperature=0 for tool-heavy agents — reduces tool-name hallucination.
  • Use llama-3.3-70b-versatile for serious tool use; llama-3.1-8b-instant works only for simple parameter extraction.
  • For agentic loops, return all tool results after a single assistant turn (one tool message per call, as in the parallel example above) rather than looping one call at a time; fewer round trips.

FAQ

Q: How does Llama 3.3 tool quality compare to GPT-4o? A: Roughly on par for typical tools (1-3 args, clear names). GPT-4o still leads on long-tail tools with overlapping descriptions. The 280 tok/s throughput often wins on net latency even when GPT-4o needs one fewer round trip.

Q: Does Groq support structured outputs / JSON mode? A: Yes — response_format={'type': 'json_object'} works on Llama 3.3 70B. JSON schema mode (json_schema) was added in 2025; check console.groq.com/docs for current support level on your model.
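
A minimal sketch of JSON mode, assuming it follows the OpenAI convention that the prompt itself must mention JSON:

resp = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[{"role": "user",
               "content": "Return NVDA's company name and exchange as a JSON object."}],
    response_format={"type": "json_object"},  # guarantees syntactically valid JSON
)
data = json.loads(resp.choices[0].message.content)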

Q: What about agent frameworks? A: LangChain ChatOpenAI(base_url='...groq.com/openai/v1') works directly. CrewAI, AutoGen, OpenAI Agents SDK all work via OpenAI-compatible config. Vercel AI SDK has a first-class @ai-sdk/groq provider.
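
For the LangChain route, a sketch assuming langchain-openai is installed; the tools list is the same OpenAI-format dict defined earlier:

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="llama-3.3-70b-versatile",
    base_url="https://api.groq.com/openai/v1",
    api_key=os.environ["GROQ_API_KEY"],
).bind_tools(tools)  # accepts the same OpenAI-format tool dicts

ai_msg = llm.invoke("What's NVDA trading at?")
print(ai_msg.tool_calls)  # e.g. [{'name': 'get_stock_price', 'args': {'ticker': 'NVDA'}, ...}]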


Quick Use

  1. Use model='llama-3.3-70b-versatile' for production tool calling
  2. Pass tools=[...] in OpenAI format; the same code works against the stock openai SDK
  3. Set parallel_tool_calls=True for multi-tool comparisons (see the starter below)
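
Putting the three steps together, a copy-paste starter (reuses the client and tools from the examples above):

# 1) production model  2) OpenAI-format tools  3) parallel calls on
resp = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[{"role": "user", "content": "Compare NVDA and AMD prices"}],
    tools=tools,
    parallel_tool_calls=True,
)
for call in resp.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)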

Source & Thanks

Built by Groq. Tool-use docs at console.groq.com/docs/tool-use.

groq/groq-python — official SDK

🙏
