Text Generation WebUI — Local LLM Chat Interface
Text Generation WebUI is a Gradio interface for running LLMs locally. 46.4K+ GitHub stars. Multiple backends, vision, training, image gen, OpenAI-compatible API. 100% offline.
What it is
Text Generation WebUI (by oobabooga) is a Gradio-based web interface for running large language models locally. It supports multiple backends including llama.cpp, ExLlamaV2, Transformers, and AutoGPTQ. Features include text chat, instruction-following, vision (image input), LoRA training, image generation, and an OpenAI-compatible API server. Everything runs on your hardware with no data leaving your machine.
This tool is for developers and AI enthusiasts who want to run open-source LLMs locally for privacy, experimentation, or offline use. If you want a ChatGPT-like interface for models like LLaMA, Mistral, or Qwen without cloud dependencies, this is the standard choice.
How it saves time or tokens
Text Generation WebUI provides a unified interface for multiple model backends, eliminating the need to set up separate environments for each. The one-click installer handles Python dependencies, CUDA drivers, and model downloading. The OpenAI-compatible API means existing tools and scripts that target OpenAI can point to your local instance without code changes. Running locally also means zero API costs -- no token billing.
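For example, a script that already uses the official openai Python client only needs its base URL pointed at the local server. This is a minimal sketch; the port and the placeholder model name assume the default --api setup shown in the Example section, and the API key is a dummy value since the local server does not check it by default:
# Redirect an existing OpenAI client to the local server
from openai import OpenAI

client = OpenAI(base_url="http://localhost:5000/v1", api_key="not-needed")  # key is unused locally
response = client.chat.completions.create(
    model="local-model",  # the server answers with whichever model is currently loaded
    messages=[{"role": "user", "content": "Explain quantization in one sentence."}],
)
print(response.choices[0].message.content)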
How to use
- Download the one-click installer from the GitHub releases page for your OS (Windows, macOS, Linux).
- Run the installer, which sets up a Python environment with all dependencies.
- Download a model from HuggingFace through the UI or place model files in the models directory (a scripted download sketch follows this list).
- Load the model from the Model tab and start chatting, or launch the server with the API flag for programmatic access.
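If you prefer to script the model download instead of using the UI, the huggingface_hub package can place a file straight into the models folder. This is a sketch assuming you want a GGUF quantization; the repository and filename below are examples to substitute with your own choices:
# Download a quantized GGUF file into the web UI's models directory
from huggingface_hub import hf_hub_download

hf_hub_download(
    repo_id="TheBloke/Mistral-7B-Instruct-v0.2-GGUF",  # example repository, pick your own
    filename="mistral-7b-instruct-v0.2.Q4_K_M.gguf",   # choose a quantization that fits your hardware
    local_dir="text-generation-webui/models",          # the UI scans this folder for models
)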
Example
# Manual installation
git clone https://github.com/oobabooga/text-generation-webui.git
cd text-generation-webui
pip install -r requirements.txt
# Start the web UI
python server.py
# Start with OpenAI-compatible API
python server.py --api --listen
# The API is now available at http://localhost:5000/v1
# Use it with any OpenAI client:
curl http://localhost:5000/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model": "local-model", "messages": [{"role": "user", "content": "Hello"}]}'
Related on TokRepo
- Local LLM tools -- Text Generation WebUI deep dive
- Self-hosted AI tools -- run AI tools on your own infrastructure
Common pitfalls
- Not having enough VRAM for the chosen model. A 7B parameter model needs roughly 6GB of VRAM in 4-bit quantization. Check model requirements before downloading (see the rough estimate sketch after this list).
- Using the wrong backend for your hardware. llama.cpp works well on CPU and Apple Silicon. ExLlamaV2 is optimized for NVIDIA GPUs. The UI lets you switch backends in settings.
- Running the API server without authentication on a public network. The OpenAI-compatible API has no built-in auth. Use a reverse proxy or firewall rules if exposing it beyond localhost.
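The VRAM guideline above follows from simple arithmetic: weight memory is roughly parameter count times bits per parameter divided by eight, and the KV cache plus runtime overhead add a couple of gigabytes on top. A back-of-the-envelope sketch using rule-of-thumb figures, not exact requirements:
# Rough weight-memory estimate for quantized models (excludes KV cache and overhead)
def weight_memory_gb(params_billions: float, bits_per_param: float) -> float:
    return params_billions * 1e9 * bits_per_param / 8 / 1024**3

for params in (7, 13, 70):
    print(f"{params}B at 4-bit: ~{weight_memory_gb(params, 4):.1f} GB of weights")
# 7B at 4-bit comes to ~3.3 GB of weights, which lands near the 6 GB guideline once cache and overhead are added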
Frequently Asked Questions
What models can I run?
You can run most open-source LLMs including LLaMA, Mistral, Qwen, Phi, Gemma, and hundreds of fine-tuned variants. The UI downloads models directly from HuggingFace. GGUF, GPTQ, AWQ, and EXL2 quantized formats are all supported.
How much memory do I need?
For 7B models in 4-bit quantization, you need 6-8GB VRAM (NVIDIA GPU) or 8GB RAM (CPU/Apple Silicon via llama.cpp). Larger models (13B, 30B, 70B) need proportionally more memory. CPU inference works but is significantly slower than GPU.
Can it process images?
Yes. Text Generation WebUI supports multimodal models that accept image inputs. Models like LLaVA and other vision-language models can process images alongside text prompts through the chat interface.
Can I fine-tune models with it?
Yes. The training tab supports LoRA fine-tuning with your own datasets. You can create custom fine-tunes of base models using conversational data, instruction datasets, or raw text directly through the web interface.
How do I use the OpenAI-compatible API?
Start the server with the --api flag and it exposes endpoints at /v1/chat/completions and /v1/completions that accept the same JSON format as the OpenAI API. Any client library or tool designed for OpenAI works without modification.
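The legacy text-completion endpoint behaves the same way. A short sketch reusing the client configuration from the earlier example; the prompt and max_tokens values are arbitrary, and the local server does not validate the API key by default:
# Plain text completion against the local /v1/completions endpoint
from openai import OpenAI

client = OpenAI(base_url="http://localhost:5000/v1", api_key="not-needed")
completion = client.completions.create(
    model="local-model",                       # informational only when a single model is loaded
    prompt="List three uses for a local LLM:",
    max_tokens=128,
)
print(completion.choices[0].text)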
Citations (3)
- Text Generation WebUI GitHub — Text Generation WebUI is a Gradio interface for LLMs with 46K+ stars
- Text Generation WebUI Wiki — Supports llama.cpp, ExLlamaV2, Transformers, AutoGPTQ backends
- Text Generation WebUI API Docs — OpenAI-compatible API for local model serving
Source & Thanks
Created by oobabooga. Open source. oobabooga/text-generation-webui — 46,400+ GitHub stars
Related Assets
Conda — Cross-Platform Package and Environment Manager
Install, update, and manage packages and isolated environments for Python, R, C/C++, and hundreds of other languages from a single tool.
Sphinx — Python Documentation Generator
Generate professional documentation from reStructuredText and Markdown with cross-references, API autodoc, and multiple output formats.
Neutralinojs — Lightweight Cross-Platform Desktop Apps
Build desktop applications with HTML, CSS, and JavaScript using a tiny native runtime instead of bundling Chromium.