Configs · Mar 31, 2026 · 2 min read

Text Generation WebUI — Local LLM Chat Interface

Text Generation WebUI is a Gradio interface for running LLMs locally (46.4K+ GitHub stars). Multiple backends, vision support, LoRA training, image generation, an OpenAI-compatible API, and 100% offline operation.

TL;DR
Text Generation WebUI runs LLMs locally with a Gradio interface, multiple backends, vision support, and an OpenAI-compatible API.
§01

What it is

Text Generation WebUI (by oobabooga) is a Gradio-based web interface for running large language models locally. It supports multiple backends including llama.cpp, ExLlamaV2, Transformers, and AutoGPTQ. Features include text chat, instruction-following, vision (image input), LoRA training, image generation, and an OpenAI-compatible API server. Everything runs on your hardware with no data leaving your machine.

This tool is for developers and AI enthusiasts who want to run open-source LLMs locally for privacy, experimentation, or offline use. If you want a ChatGPT-like interface for models like LLaMA, Mistral, or Qwen without cloud dependencies, this is the standard choice.

§02

How it saves time or tokens

Text Generation WebUI provides a unified interface for multiple model backends, eliminating the need to set up a separate environment for each. The one-click installer handles Python dependencies, CUDA drivers, and model downloading. Because the API is OpenAI-compatible, existing tools and scripts that target OpenAI can point to your local instance without code changes. Running locally also means zero API cost: there is no per-token billing.
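A minimal sketch of the "no code changes" point: the official OpenAI SDKs read their endpoint and key from environment variables, so redirecting an existing tool can be as simple as two exports (whether a given third-party tool honors these variables is an assumption to verify per tool):

```shell
# Point OpenAI-SDK-based tools at the local server instead of api.openai.com.
export OPENAI_BASE_URL="http://localhost:5000/v1"
export OPENAI_API_KEY="dummy"   # the local API ignores the key, but SDKs require one
```

After this, a script that previously called the hosted OpenAI API will send its requests to the local instance instead.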

§03

How to use

  1. Download the one-click installer from the GitHub releases page for your OS (Windows, macOS, or Linux).
  2. Run the installer, which sets up a Python environment with all dependencies.
  3. Download a model from Hugging Face through the UI, or place model files in the models directory.
  4. Launch the UI with the start script for your OS (e.g. start_linux.sh) and open http://localhost:7860 in your browser.
§04

Example

# Manual installation
git clone https://github.com/oobabooga/text-generation-webui.git
cd text-generation-webui
pip install -r requirements.txt

# Start the web UI
python server.py

# Start with OpenAI-compatible API
python server.py --api --listen

# The API is now available at http://localhost:5000/v1
# Use it with any OpenAI client:
curl http://localhost:5000/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model": "local-model", "messages": [{"role": "user", "content": "Hello"}]}'
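The curl call above returns a full chat-completions JSON object. A small sketch of pulling just the assistant's reply out of it (the response body is inlined here so the snippet runs without a server; with the server running you would pipe the curl output into the same command):

```shell
# Extract the assistant's message from a chat-completions response
# using only the Python standard library (no jq required).
response='{"choices": [{"message": {"role": "assistant", "content": "Hello there!"}}]}'
reply=$(printf '%s' "$response" | python3 -c \
  'import json, sys; print(json.load(sys.stdin)["choices"][0]["message"]["content"])')
echo "$reply"   # Hello there!
```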
§05

Common pitfalls

  • Not having enough VRAM for the chosen model. A 7B parameter model needs roughly 6GB VRAM in 4-bit quantization. Check model requirements before downloading.
  • Using the wrong backend for your hardware. llama.cpp works well on CPU and Apple Silicon. ExLlamaV2 is optimized for NVIDIA GPUs. The UI lets you switch backends in settings.
  • Running the API server without authentication on a public network. The OpenAI-compatible API has no built-in auth. Use a reverse proxy or firewall rules if exposing it beyond localhost.
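The VRAM rule of thumb in the first pitfall can be sketched as a quick calculation: quantized weights take roughly params × bits/8 bytes per parameter, plus a fixed overhead for activations and KV cache (the 2.5 GB overhead here is an assumption for typical context sizes, not an exact figure):

```shell
# Rough VRAM estimate in GB: params (billions) * bits / 8, plus overhead.
estimate_vram_gb() {
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * b / 8 + 2.5 }'
}

estimate_vram_gb 7 4    # 7B model at 4-bit -> about 6.0 GB
estimate_vram_gb 13 4   # 13B model at 4-bit -> about 9.0 GB
```

This is a sanity check, not a guarantee: long contexts and some backends need noticeably more headroom.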

Frequently Asked Questions

What models can I run with Text Generation WebUI?

You can run most open-source LLMs including LLaMA, Mistral, Qwen, Phi, Gemma, and hundreds of fine-tuned variants. The UI downloads models directly from HuggingFace. GGUF, GPTQ, AWQ, and EXL2 quantized formats are all supported.

What hardware do I need?

For 7B models in 4-bit quantization, you need 6-8GB VRAM (NVIDIA GPU) or 8GB RAM (CPU/Apple Silicon via llama.cpp). Larger models (13B, 30B, 70B) need proportionally more memory. CPU inference works but is significantly slower than GPU.

Does it support image/vision models?

Yes. Text Generation WebUI supports multimodal models that accept image inputs. Models like LLaVA and other vision-language models can process images alongside text prompts through the chat interface.

Can I fine-tune models with this tool?

Yes. The training tab supports LoRA fine-tuning with your own datasets. You can create custom fine-tunes of base models using conversational data, instruction datasets, or raw text directly through the web interface.

How does the OpenAI-compatible API work?

Start the server with the --api flag and it exposes endpoints at /v1/chat/completions and /v1/completions that accept the same JSON format as the OpenAI API. Any client library or tool designed for OpenAI works without modification.


Source & Thanks

Created by oobabooga. Open source: oobabooga/text-generation-webui (46,400+ GitHub stars).
