Quick Use
- Use `model="llama-3.3-70b-versatile"` for production tool calling
- Pass `tools=[...]` in OpenAI format — same code as `openai`
- Set `parallel_tool_calls=True` for multi-tool comparisons
Intro
Llama 3.3 70B on Groq supports the OpenAI tool-calling spec — tools=[...], tool_choice, parallel_tool_calls — and runs the whole tool loop at 280 tokens/sec. A typical 3-turn agent (model → tool → model → tool → model) finishes in 1.5-2 seconds end-to-end, fast enough for interactive UIs without spinners. Best for: real-time agents, latency-sensitive copilots, voice agents with tool use, anything where slow inference forced you off Llama before. Works with: openai-python, openai-node, LangChain bind_tools, Vercel AI SDK. Setup time: 5 minutes.
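That 3-turn agent is just the standard OpenAI tool-calling loop. As a minimal sketch (assuming the `client` and `tools` defined in the next section, plus a `run_tool` dispatcher of your own that maps tool names to functions), the whole agent reduces to: call the model, execute any tool calls it returns, append the results, repeat until it answers in plain text.

# Minimal agent-loop sketch. `run_tool(name, args)` is a hypothetical
# dispatcher you write; `client` and `tools` are defined below.
import json

def agent_loop(client, messages, tools, max_turns=5):
    for _ in range(max_turns):
        resp = client.chat.completions.create(
            model="llama-3.3-70b-versatile", messages=messages, tools=tools
        )
        msg = resp.choices[0].message
        if not msg.tool_calls:       # plain-text answer: the loop is done
            return msg.content
        messages.append(msg)         # keep the assistant turn in history
        for call in msg.tool_calls:  # execute every requested tool
            result = run_tool(call.function.name, json.loads(call.function.arguments))
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,
                "content": json.dumps(result),
            })
    raise RuntimeError("agent did not converge")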
Single-turn tool call
from openai import OpenAI
import json, os, requests

client = OpenAI(base_url="https://api.groq.com/openai/v1", api_key=os.environ["GROQ_API_KEY"])

tools = [{
    "type": "function",
    "function": {
        "name": "get_stock_price",
        "description": "Get current stock price by ticker",
        "parameters": {
            "type": "object",
            "properties": {"ticker": {"type": "string"}},
            "required": ["ticker"],
        },
    },
}]

def get_stock_price(ticker):
    return requests.get(f"https://example-finance.com/{ticker}").json()

messages = [{"role": "user", "content": "What's NVDA trading at?"}]

# Turn 1: the model decides to call the tool and returns structured arguments.
resp = client.chat.completions.create(model="llama-3.3-70b-versatile", messages=messages, tools=tools)
call = resp.choices[0].message.tool_calls[0]
args = json.loads(call.function.arguments)

# Run the tool, then send the result back so the model can answer in prose.
result = get_stock_price(args["ticker"])
messages.append(resp.choices[0].message)
messages.append({"role": "tool", "tool_call_id": call.id, "content": json.dumps(result)})

# Turn 2: the model folds the tool result into a final answer.
final = client.chat.completions.create(model="llama-3.3-70b-versatile", messages=messages, tools=tools)
print(final.choices[0].message.content)

Parallel tool calls
resp = client.chat.completions.create(
model="llama-3.3-70b-versatile",
messages=[{"role": "user", "content": "Compare NVDA, AMD, and INTC current prices"}],
tools=tools,
parallel_tool_calls=True,
)
# resp.choices[0].message.tool_calls now contains all 3 calls in a single
# assistant turn; executing them (concurrently or not) is up to your code,
# which must return one tool message per tool_call_id.
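Executing those calls is your job: the model only names them. A sketch of fanning them out with a thread pool, assuming each tool maps to a local Python function via a hypothetical `TOOL_REGISTRY`:

# Sketch: run parallel tool calls concurrently, then send all results back
# in one follow-up request. TOOL_REGISTRY is a hypothetical name->function map.
import json
from concurrent.futures import ThreadPoolExecutor

TOOL_REGISTRY = {"get_stock_price": get_stock_price}

messages = [{"role": "user", "content": "Compare NVDA, AMD, and INTC current prices"}]  # same user turn as above
calls = resp.choices[0].message.tool_calls
with ThreadPoolExecutor() as pool:
    results = list(pool.map(
        lambda c: TOOL_REGISTRY[c.function.name](**json.loads(c.function.arguments)),
        calls,
    ))

messages.append(resp.choices[0].message)
for call, result in zip(calls, results):
    messages.append({"role": "tool", "tool_call_id": call.id, "content": json.dumps(result)})

final = client.chat.completions.create(
    model="llama-3.3-70b-versatile", messages=messages, tools=tools
)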
Forcing a specific tool
client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=messages,
    tools=tools,
    # Forces the model to call get_stock_price instead of choosing freely.
    tool_choice={"type": "function", "function": {"name": "get_stock_price"}},
)

Best practices on Groq
- Keep tool descriptions short and verb-led — Llama 3.3 picks tools by name + first sentence of description.
- Set `temperature=0` for tool-heavy agents — reduces tool-name hallucination.
- Use `llama-3.3-70b-versatile` for serious tool use; `llama-3.1-8b-instant` works only for simple parameter extraction.
- For agentic loops, batch tool results into one assistant turn rather than many — fewer round trips (see the sketch below).
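The last bullet, sketched out: append every tool result from the assistant turn, then make one follow-up call instead of one call per result. This assumes `client`, `messages`, `tools`, and `json` from the examples above, plus a hypothetical `tool_results` list of `(tool_call_id, result)` pairs your executor produced.

# Anti-pattern: one model call per tool result (N round trips).
# for call_id, result in tool_results:
#     messages.append({"role": "tool", "tool_call_id": call_id, "content": json.dumps(result)})
#     client.chat.completions.create(...)  # called N times

# Pattern: append everything, then make a single model call.
for call_id, result in tool_results:
    messages.append({"role": "tool", "tool_call_id": call_id, "content": json.dumps(result)})

final = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=messages,
    tools=tools,
    temperature=0,  # per the bullet above: steadier tool-name selection
)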
FAQ
Q: How does Llama 3.3 tool quality compare to GPT-4o?
A: Roughly on par for typical tools (1-3 args, clear names). GPT-4o still leads on long-tail tools with overlapping descriptions. At 280 tok/s, Llama often wins on net latency even when GPT-4o finishes in one fewer round trip.
Q: Does Groq support structured outputs / JSON mode?
A: Yes — response_format={'type': 'json_object'} works on Llama 3.3 70B. JSON schema mode (json_schema) was added in 2025; check console.groq.com/docs for current support level on your model.
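A minimal JSON-mode sketch, reusing `client` and `json` from above. Note that JSON mode on OpenAI-compatible APIs typically requires the word "JSON" to appear somewhere in the prompt; check the Groq docs for the exact requirement.

# JSON-mode sketch: constrains the model's output to valid JSON.
resp = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[{"role": "user", "content": "Return NVDA's ticker and exchange as JSON."}],
    response_format={"type": "json_object"},
)
data = json.loads(resp.choices[0].message.content)  # output should parse cleanly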
Q: What about agent frameworks?
A: LangChain ChatOpenAI(base_url='...groq.com/openai/v1') works directly. CrewAI, AutoGen, OpenAI Agents SDK all work via OpenAI-compatible config. Vercel AI SDK has a first-class @ai-sdk/groq provider.
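A minimal LangChain sketch, assuming `langchain-openai` is installed and reusing the `tools` list defined earlier:

# LangChain sketch: point ChatOpenAI at Groq's OpenAI-compatible endpoint.
import os
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="llama-3.3-70b-versatile",
    base_url="https://api.groq.com/openai/v1",
    api_key=os.environ["GROQ_API_KEY"],
)
llm_with_tools = llm.bind_tools(tools)  # same OpenAI-format tool schemas as above
msg = llm_with_tools.invoke("What's NVDA trading at?")
print(msg.tool_calls)  # LangChain-normalized calls: [{"name": ..., "args": ...}]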
Source & Thanks
Built by Groq. Tool-use docs at console.groq.com/docs/tool-use.
groq/groq-python — official SDK