Groq runs Llama 3.3 70B with native tool calling at 280 tok/sec. Multi-turn loops in 1-2 sec. Drop-in OpenAI format. Parallel calls supported.
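A minimal sketch of what "drop-in OpenAI format" tool calling looks like, assuming the model id `llama-3.3-70b-versatile` and the base URL `https://api.groq.com/openai/v1` (check Groq's current docs); the `get_weather` tool is hypothetical, and the block only builds the request body rather than sending it.

```python
import json

GROQ_BASE = "https://api.groq.com/openai/v1"   # assumed base URL
MODEL = "llama-3.3-70b-versatile"              # assumed model id

# OpenAI-format tool schema: one hypothetical weather-lookup function.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",                 # hypothetical tool
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

def build_payload(user_msg: str) -> dict:
    """Chat-completions request body in OpenAI format."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": user_msg}],
        "tools": tools,
        "tool_choice": "auto",  # model decides; parallel calls come back as a list
    }

payload = build_payload("What's the weather in Oslo?")
# POST this as JSON to f"{GROQ_BASE}/chat/completions" with your API key;
# tool calls appear under choices[0].message.tool_calls.
print(json.dumps(payload)[:60])
```

Because the format matches OpenAI's, the official `openai` SDK works unchanged by pointing `base_url` at Groq.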
Whisper-large-v3 on Groq transcribes at ~166× realtime: a 60-sec clip in under 400 ms. OpenAI-compatible audio.transcriptions endpoint, useful for voice agents.
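A sketch of the transcription call, again assuming the `https://api.groq.com/openai/v1` base URL and the model id `whisper-large-v3` on Groq (verify against current docs); the block assembles the request fields without hitting the network.

```python
GROQ_BASE = "https://api.groq.com/openai/v1"   # assumed base URL

def transcription_request(audio_path: str) -> dict:
    """Fields for the OpenAI-compatible audio.transcriptions endpoint."""
    return {
        "url": f"{GROQ_BASE}/audio/transcriptions",
        "model": "whisper-large-v3",           # assumed model id on Groq
        "file": audio_path,                    # sent as multipart form data
        "response_format": "json",
    }

req = transcription_request("clip.wav")
# With the openai SDK this is roughly:
#   client.audio.transcriptions.create(model="whisper-large-v3",
#                                      file=open("clip.wav", "rb"))
```

At sub-400 ms per minute of audio, the transcription step fits inside a voice agent's turn budget alongside the LLM call.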
GroqCloud runs Llama 3.3 70B at 250+ tok/sec on LPU silicon. OpenAI-compatible API. Free tier, sub-second TTFT, ideal for streaming.
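For streaming, the OpenAI-compatible endpoint sends server-sent-event chunks when `stream: true` is set. A small offline sketch of the consumer side, with two fake chunks shaped like the API's stream (the chunk shape is the standard OpenAI one; exact fields may vary):

```python
import json

def iter_deltas(sse_lines):
    """Parse OpenAI-style SSE chunks and yield content deltas as they arrive."""
    for line in sse_lines:
        if not line.startswith("data: ") or line == "data: [DONE]":
            continue
        chunk = json.loads(line[len("data: "):])
        delta = chunk["choices"][0]["delta"].get("content")
        if delta:
            yield delta

# Offline demo with fabricated chunks mimicking the stream format:
fake = [
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    "data: [DONE]",
]
print("".join(iter_deltas(fake)))  # → Hello
```

With sub-second TTFT, the first delta lands almost immediately, so printing deltas as they arrive is what makes the 250+ tok/sec rate visible to the user.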