Introduction
Gorilla is a research project from UC Berkeley that fine-tunes LLMs to generate accurate API and function calls from natural language instructions. It addresses the hallucination problem where standard LLMs fabricate API parameters or call nonexistent endpoints, providing a reliable bridge between natural language and programmatic tool use.
What Gorilla Does
- Generates syntactically correct function calls from natural language descriptions
- Supports structured output in multiple formats including OpenAI, Anthropic, and raw JSON
- Provides the Berkeley Function Calling Leaderboard for evaluating tool-use models
- Includes OpenFunctions models fine-tuned on thousands of real-world API specifications
- Handles multi-turn conversations with function call chaining and parallel execution
Architecture Overview
Gorilla models are fine-tuned from base LLMs using a curated dataset of API documentation paired with natural language queries and correct function calls. The training pipeline uses retrieval-augmented fine-tuning where API documentation is injected into the context during training to ground the model in real specifications. This approach lets the model generalize to unseen APIs by learning the mapping pattern rather than memorizing specific endpoints.
Self-Hosting & Configuration
- Run locally with Python 3.10+ and a CUDA-capable GPU for inference
- Download model weights from Hugging Face or use the hosted API endpoint
- Configure the serving backend with vLLM or Hugging Face Transformers
- Set temperature and sampling parameters through the API server
- Models range from 7B to 13B parameters depending on the variant
Key Features
- Reduces API call hallucination compared to general-purpose LLMs
- Supports 1,600+ real-world APIs in training data from major cloud providers
- Open evaluation framework with reproducible benchmarks across models
- Compatible with the OpenAI function calling format for drop-in replacement
- Actively maintained with regular model updates and expanded API coverage
Comparison with Similar Tools
- GPT-4 Function Calling — Proprietary and closed; Gorilla provides an open-source alternative with competitive accuracy
- LangChain Tools — A framework for chaining tools; Gorilla handles the model-level function call generation
- Instructor — Focuses on structured output extraction; Gorilla specifically targets API call generation
- NexusRaven — Similar function-calling model but with a narrower API coverage
FAQ
Q: Does Gorilla work with custom APIs not in the training set? A: Yes, when provided with API documentation in the prompt, Gorilla can generalize to unseen APIs.
Q: What GPU is required to run Gorilla locally? A: The 7B model runs on a single GPU with 16 GB VRAM; larger variants need 24+ GB.
Q: Can Gorilla replace the OpenAI function calling API? A: It supports the same format and can serve as an open-source alternative for function call generation.
Q: How is Gorilla evaluated? A: Through the Berkeley Function Calling Leaderboard, which tests accuracy on real API calls across multiple categories.