Skills · May 8, 2026 · 4 min read

Groq Tool Use — Llama 3.3 Function Calling at 280 tok/s

Groq runs Llama 3.3 70B with native tool calling at 280 tok/sec. Multi-turn loops in 1-2 sec. Drop-in OpenAI format. Parallel calls supported.

Agent-ready

This asset can be read and installed directly by agents.

TokRepo exposes a universal CLI command, an install contract, JSON metadata, an adapter-specific plan, and raw content so agents can evaluate compatibility, risk, and next steps.

Stage only · 17/100
Agent surface: Any MCP/CLI agent
Type: Skill
Install: Stage only
Trust: New
Input: Asset
Universal CLI command:
npx tokrepo install d7a23b2b-55ce-4312-a36e-568346d1fdb3
Introduction

Llama 3.3 70B on Groq supports the OpenAI tool-calling spec — tools=[...], tool_choice, parallel_tool_calls — and runs the whole tool loop at 280 tokens/sec. A typical 3-turn agent (model → tool → model → tool → model) finishes in 1.5-2 seconds end-to-end, fast enough for interactive UIs without spinners. Best for: real-time agents, latency-sensitive copilots, voice agents with tool use, anything where slow inference forced you off Llama before. Works with: openai-python, openai-node, LangChain bind_tools, Vercel AI SDK. Setup time: 5 minutes.


Single-turn tool call

from openai import OpenAI
import json, os, requests

client = OpenAI(base_url="https://api.groq.com/openai/v1", api_key=os.environ["GROQ_API_KEY"])

tools = [{
    "type": "function",
    "function": {
        "name": "get_stock_price",
        "description": "Get current stock price by ticker",
        "parameters": {
            "type": "object",
            "properties": {"ticker": {"type": "string"}},
            "required": ["ticker"],
        },
    },
}]

def get_stock_price(ticker):
    return requests.get(f"https://example-finance.com/{ticker}").json()

messages = [{"role": "user", "content": "What's NVDA trading at?"}]
resp = client.chat.completions.create(model="llama-3.3-70b-versatile", messages=messages, tools=tools)

call = resp.choices[0].message.tool_calls[0]
args = json.loads(call.function.arguments)
result = get_stock_price(args["ticker"])

messages.append(resp.choices[0].message)
messages.append({"role": "tool", "tool_call_id": call.id, "content": json.dumps(result)})
final = client.chat.completions.create(model="llama-3.3-70b-versatile", messages=messages, tools=tools)
print(final.choices[0].message.content)
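
The single-turn pattern above generalizes to the multi-turn loop the intro times at 1.5-2 seconds: keep calling the model until it stops requesting tools. A minimal sketch, where `run_tool_loop` and the `handlers` mapping are illustrative names, not part of the Groq SDK:

```python
import json

def run_tool_loop(client, messages, tools, handlers,
                  model="llama-3.3-70b-versatile"):
    """Loop model -> tools -> model until the model answers in plain text.

    `handlers` maps tool name -> local Python function.
    """
    while True:
        resp = client.chat.completions.create(
            model=model, messages=messages, tools=tools)
        msg = resp.choices[0].message
        if not msg.tool_calls:          # final answer, no more tools requested
            return msg.content
        messages.append(msg)            # echo the assistant turn back verbatim
        for call in msg.tool_calls:     # run every tool the model asked for
            args = json.loads(call.function.arguments)
            result = handlers[call.function.name](**args)
            messages.append({"role": "tool",
                             "tool_call_id": call.id,
                             "content": json.dumps(result)})
```

With the `get_stock_price` tool above: `run_tool_loop(client, messages, tools, {"get_stock_price": get_stock_price})`.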

Parallel tool calls

resp = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[{"role": "user", "content": "Compare NVDA, AMD, and INTC current prices"}],
    tools=tools,
    parallel_tool_calls=True,
)
# resp.choices[0].message.tool_calls now holds three calls emitted in one
# assistant turn; executing them (concurrently or otherwise) is up to your code.
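
Executing those calls concurrently is the caller's job. A thread-pool sketch, where `execute_calls` and the `handlers` dict are illustrative helpers, not Groq SDK names:

```python
import json
from concurrent.futures import ThreadPoolExecutor

def execute_calls(tool_calls, handlers):
    """Run each requested tool in a worker thread; return tool messages
    in the same order the model emitted the calls."""
    def run(call):
        args = json.loads(call.function.arguments)
        result = handlers[call.function.name](**args)
        return {"role": "tool",
                "tool_call_id": call.id,
                "content": json.dumps(result)}
    with ThreadPoolExecutor() as pool:
        return list(pool.map(run, tool_calls))
```

Append the returned messages to `messages` and call the model once more for the comparison answer.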

Forcing a specific tool

client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=messages,
    tools=tools,
    tool_choice={"type": "function", "function": {"name": "get_stock_price"}},  # force this exact tool
)
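
For reference, `tool_choice` follows the OpenAI spec, so the other values look like this (whether every value is supported on a given Groq model is worth confirming against console.groq.com/docs):

```python
# OpenAI-spec values that tool_choice accepts:
tool_choice_options = [
    "auto",      # default: the model decides whether to call a tool
    "none",      # never call a tool; answer in plain text
    "required",  # must call at least one tool
    {"type": "function", "function": {"name": "get_stock_price"}},  # force this one
]
```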

Best practices on Groq

  • Keep tool descriptions short and verb-led — Llama 3.3 picks tools by name + first sentence of description.
  • Set temperature=0 for tool-heavy agents — reduces tool-name hallucination.
  • Use llama-3.3-70b-versatile for serious tool use; llama-3.1-8b-instant works only for simple parameter extraction.
  • For agentic loops, append all tool results before the next model call rather than calling the model after each one — fewer round trips.

FAQ

Q: How does Llama 3.3 tool quality compare to GPT-4o? A: Roughly on par for typical tools (1-3 args, clear names). GPT-4o still leads on long-tail tools with overlapping descriptions. The 280 tok/s speed often wins net latency even when GPT-4o is one round shorter.

Q: Does Groq support structured outputs / JSON mode? A: Yes — response_format={'type': 'json_object'} works on Llama 3.3 70B. JSON schema mode (json_schema) was added in 2025; check console.groq.com/docs for current support level on your model.
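
A minimal `json_object` request looks like the sketch below; the live call is left as a comment. Reminding the model in the prompt to emit JSON, alongside the flag, is a common precaution:

```python
# Request payload for JSON mode on Groq; the live call is sketched in comments.
request = dict(
    model="llama-3.3-70b-versatile",
    messages=[
        {"role": "system", "content": "Reply only with a JSON object."},
        {"role": "user", "content": "Return NVDA's ticker and exchange as JSON."},
    ],
    response_format={"type": "json_object"},
)
# resp = client.chat.completions.create(**request)
# data = json.loads(resp.choices[0].message.content)
```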

Q: What about agent frameworks? A: LangChain ChatOpenAI(base_url='...groq.com/openai/v1') works directly. CrewAI, AutoGen, OpenAI Agents SDK all work via OpenAI-compatible config. Vercel AI SDK has a first-class @ai-sdk/groq provider.


Quick Use

  1. Use model='llama-3.3-70b-versatile' for production tool calling
  2. Pass tools=[...] in OpenAI format — the same code you'd write against the OpenAI API
  3. Set parallel_tool_calls=True for multi-tool comparisons


Source & Thanks

Built by Groq. Tool-use docs at console.groq.com/docs/tool-use.

groq/groq-python — official SDK
