Configs · Apr 14, 2026 · 3 min read

text-generation-webui — A Gradio Web UI for Local LLMs

oobabooga's text-generation-webui is the "AUTOMATIC1111 of LLMs": a feature-rich Gradio interface for chatting with and serving local language models. It supports multiple backends, including llama.cpp, Transformers, and ExLlamaV2, and dozens of model formats.

TL;DR
Feature-rich Gradio web UI for running local language models with llama.cpp, Transformers, and ExLlamaV2 backends.
§01

What it is

text-generation-webui (commonly called "oobabooga") is a feature-rich Gradio web interface for chatting with and serving local language models. It supports multiple backends including llama.cpp, Transformers, ExLlamaV2, and dozens of model formats. The one-line installer detects your hardware (CUDA, ROCm, MPS, CPU) and configures the appropriate backend automatically.

The project targets users who want to run LLMs locally with a user-friendly web interface. It provides chat, notebook, and API modes, model management, LoRA loading, and extension support.

§02

How it saves time or tokens

text-generation-webui eliminates the need to write Python scripts for local model inference. The web UI provides a chat interface, parameter tuning, model comparison, and API endpoints without any code. The one-line installer handles Python environments, CUDA dependencies, and backend compilation. For experimentation with different models and parameters, the UI approach is faster than editing scripts.

§03

How to use

  1. Clone and run the installer:
git clone https://github.com/oobabooga/text-generation-webui
cd text-generation-webui
./start_linux.sh   # or start_windows.bat / start_macos.sh
  2. Select your hardware during setup (CUDA, ROCm, MPS, or CPU).
  3. Download a model from the Model tab and start chatting. To confirm the API endpoint is live, see the sketch below.
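
Once the server is running with the API enabled (in recent versions this is the --api launch flag; treat the flag and the default port 5000 as assumptions to verify against your install), a quick Python check confirms the endpoint responds:

import requests

# List the models the server reports via its OpenAI-compatible API.
# Assumes the API is enabled and listening on the default port 5000.
resp = requests.get('http://localhost:5000/v1/models')
resp.raise_for_status()
print([m['id'] for m in resp.json()['data']])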
§04

Example

Using the API for programmatic access:

import requests

# Send a single chat completion request to the local server's
# OpenAI-compatible endpoint (default port 5000).
response = requests.post(
    'http://localhost:5000/v1/chat/completions',
    json={
        'model': 'loaded-model',  # placeholder; the server answers with whichever model is loaded
        'messages': [
            {'role': 'user', 'content': 'Explain quantum computing briefly'}
        ],
        'temperature': 0.7,   # sampling randomness
        'max_tokens': 200,    # cap on generated tokens
    }
)
response.raise_for_status()

# Extract the assistant's reply from the OpenAI-style response body.
print(response.json()['choices'][0]['message']['content'])

The API follows the OpenAI chat completions format, making it a drop-in replacement for API-based workflows.
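
For interactive use you may want tokens as they arrive rather than one final response. A minimal streaming sketch, assuming the endpoint honors the standard OpenAI stream flag and emits server-sent events in the usual "data:" format:

import json
import requests

# Stream a chat completion token by token. Assumes OpenAI-style
# server-sent events terminated by a "data: [DONE]" line.
with requests.post(
    'http://localhost:5000/v1/chat/completions',
    json={
        'model': 'loaded-model',
        'messages': [{'role': 'user', 'content': 'Explain quantum computing briefly'}],
        'stream': True,
    },
    stream=True,
) as response:
    for line in response.iter_lines():
        if not line or not line.startswith(b'data: '):
            continue
        payload = line[len(b'data: '):]
        if payload == b'[DONE]':
            break
        chunk = json.loads(payload)
        delta = chunk['choices'][0]['delta']
        print(delta.get('content', ''), end='', flush=True)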

§05


Common pitfalls

  • The installer creates a large Python environment (several GB). Ensure sufficient disk space before installation.
  • VRAM requirements vary by model and quantization. A 7B model at 4-bit quantization needs roughly 6GB VRAM. Check model requirements before downloading.
  • Some model formats (GPTQ, AWQ, EXL2) require specific backends. Not all backends are compatible with all formats.
  • Backends, launch flags, and defaults change between releases. Check the project's release notes before upgrading, especially in production environments.
  • For team deployments, pin a specific release and share launch flags and settings so results are reproducible across machines.
  • Model quantization levels (4-bit, 8-bit, 16-bit) trade quality for speed and memory usage. Start with 4-bit quantization for testing and increase precision when output quality matters; the sketch after this list gives a rough VRAM estimate.
  • The web UI exposes an API endpoint by default. In shared environments, configure authentication or restrict access to localhost to prevent unauthorized model usage.
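
As a back-of-the-envelope check on the VRAM figures above: weight memory is roughly parameters × bits per weight ÷ 8. The sketch below uses 4.5 bits per weight as an approximation of common 4-bit GGUF quants (an assumption); KV cache and runtime buffers come on top of the weight footprint.

# Rough heuristic for weight memory, not an exact figure.
def estimate_weight_vram_gb(params_billion: float, bits_per_weight: float) -> float:
    # 1B parameters at 8 bits per weight occupy roughly 1 GB.
    return params_billion * bits_per_weight / 8

# ~3.9 GB of weights for a 7B model at 4-bit; KV cache and buffers
# push the total toward the ~6 GB cited above.
print(f"~{estimate_weight_vram_gb(7, 4.5):.1f} GB weights")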

Frequently Asked Questions

What model formats does text-generation-webui support?

It supports GGUF (llama.cpp), GPTQ, AWQ, EXL2 (ExLlamaV2), and standard Hugging Face Transformers format. Each format has different performance characteristics and VRAM requirements.

Does it provide an OpenAI-compatible API?

Yes. The built-in API server follows the OpenAI chat completions format. This means you can use it as a local replacement for OpenAI's API in applications that support custom endpoints.
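
Because the endpoint follows the OpenAI schema, the official openai Python client (v1+) can point at it directly. A minimal sketch, assuming the default port and no authentication configured:

from openai import OpenAI

# The api_key is a placeholder; the local endpoint ignores it
# unless authentication has been set up.
client = OpenAI(base_url='http://localhost:5000/v1', api_key='not-needed')

reply = client.chat.completions.create(
    model='loaded-model',  # placeholder name
    messages=[{'role': 'user', 'content': 'Summarize the GGUF format in one sentence'}],
    max_tokens=100,
)
print(reply.choices[0].message.content)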

What hardware is required?

The UI works on NVIDIA GPUs (CUDA), AMD GPUs (ROCm), Apple Silicon (MPS), and CPU-only setups. GPU acceleration dramatically improves inference speed. A minimum of 8GB VRAM is recommended for 7B parameter models.

Can I load LoRA adapters?

Yes. The UI supports loading LoRA adapters on top of base models. This lets you use fine-tuned models without merging the adapters, saving disk space and enabling quick switching.

How does text-generation-webui compare to Ollama?

Ollama provides a simpler CLI-focused experience for running models. text-generation-webui offers a richer web UI with more parameter controls, multiple backends, extension support, and model comparison features. Ollama is easier to set up; text-generation-webui provides more flexibility.
