# text-generation-webui — A Gradio Web UI for Local LLMs

> oobabooga's text-generation-webui is the "AUTOMATIC1111 of LLMs": a feature-rich Gradio interface for chatting with and serving local language models. It supports llama.cpp, Transformers, ExLlamaV2, and dozens of model formats.

## Quick Use

```bash
# One-command installer (selects CUDA/ROCm/Metal/CPU at first run)
git clone https://github.com/oobabooga/text-generation-webui
cd text-generation-webui
./start_linux.sh   # or start_windows.bat / start_macos.sh
# Pick your hardware backend when prompted, then open http://127.0.0.1:7860
```

## Introduction

text-generation-webui (often called "oobabooga" after its maintainer) is the most popular GUI for running language models locally. With over 46,000 GitHub stars, it provides tabs for chat, notebook-style completion, model loading, LoRA training, and an OpenAI-compatible API. It supports virtually every local model format: GGUF (llama.cpp), GPTQ/AWQ/EXL2 (ExLlamaV2), HF Transformers (FP16/INT8/4-bit), plus LangChain-style integrations. Pick a model and the UI selects a sensible loader for it.

## What It Does

The web UI provides:

- **Chat tab** — system prompts, character cards, multi-turn conversation
- **Default tab** — raw completion
- **Notebook tab** — long-form writing
- **Parameters tab** — sampling controls, instruction templates
- **Model tab** — browse/download from Hugging Face, hot-load models
- **Training tab** — LoRA/QLoRA training

API mode mimics the OpenAI API for easy integration.
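Character cards mentioned above are plain YAML files dropped into the `characters/` folder. A minimal sketch of one (the persona itself is invented for illustration; if the field names differ in your version, follow the bundled example card instead):

```yaml
# characters/Rin.yaml — a hypothetical persona card
name: Rin
greeting: |-
  Hello! I'm Rin, your terse technical assistant. What are we debugging today?
context: |-
  Rin is a concise, no-nonsense software engineer. She answers with short,
  precise explanations and prefers code snippets over long prose.
```

After saving the file, the card appears in the Chat tab's character selector, and its `context` is injected as the system prompt for every turn.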
## Architecture Overview

```
[Gradio UI]
      |
Chat / Notebook / Default tabs
      |
[Parameter dispatch]
  sampling, templates, character cards
      |
[Backend Loader]
  +--- llama.cpp (GGUF)
  +--- Transformers (HF, FP16/INT8)
  +--- ExLlamaV2 (GPTQ/AWQ/EXL2)
  +--- HQQ, AQLM
      |
[OpenAI-Compatible API]
  /v1/chat/completions
  /v1/completions
  /v1/embeddings
      |
[Extensions]
  sd-api-pictures (images), coqui-tts (voice), memoir+, openai_emb, ...
```

## Self-Hosting & Configuration

```bash
# Common settings
./start_linux.sh \
  --listen \
  --listen-port 7860 \
  --api \
  --api-port 5000 \
  --model Qwen2.5-7B-Instruct-GGUF \
  --loader llama.cpp \
  --gpu-layers 35 \
  --threads 8
```

```python
# Use the OpenAI-compatible API from any OpenAI client
from openai import OpenAI

client = OpenAI(base_url="http://localhost:5000/v1", api_key="local")
resp = client.chat.completions.create(
    model="Qwen2.5-7B-Instruct",
    messages=[{"role": "user", "content": "Explain MoE briefly."}],
)
print(resp.choices[0].message.content)
```

## Key Features

- **Multi-backend** — llama.cpp, Transformers, ExLlamaV2, HQQ, AQLM, and more
- **Chat / Notebook / Default modes** — a UX for any text-generation workflow
- **Character cards** — preset personas with system prompts
- **OpenAI-compatible API** — drop-in replacement for tools expecting OpenAI
- **Training tab** — fine-tune via LoRA/QLoRA in the same UI
- **Extensions ecosystem** — voice (TTS/STT), images (SD), memory, RAG
- **Cross-platform** — single launcher for Linux/macOS/Windows
- **Active community** — frequent updates tracking the model release cycle

## Comparison with Similar Tools

| Feature | text-gen-webui | LM Studio | Ollama | KoboldCpp | Open WebUI |
|---|---|---|---|---|---|
| Open source | Yes | No (closed) | Yes | Yes | Yes |
| Backend choice | Many | One (llama.cpp) | One (llama.cpp) | One (llama.cpp) | Connects to any |
| Built-in chat UI | Yes | Yes | No (CLI) | Yes (RP focus) | Yes (best) |
| OpenAI API | Yes | Yes | Yes | Yes | Yes |
| Fine-tuning | Yes (LoRA) | No | No | No | No |
| Roleplay focus | Yes | Limited | No | Strong | Limited |
| Best for | Power users | Beginners on macOS/Windows | CLI / scripting | Roleplay | Polished chat UI |

## FAQ

**Q: text-gen-webui vs Open WebUI?**
A: Open WebUI is a polished chat front-end that talks to any backend (Ollama, OpenAI, llama.cpp). text-gen-webui bundles the backend, model loading, and fine-tuning in one app. Power users often run text-gen-webui as the backend with Open WebUI as the chat UI.

**Q: Does it support image generation?**
A: Not natively, but the `sd-api-pictures` extension lets the chat call out to a Stable Diffusion server.

**Q: How do I update?**
A: `git pull`, then re-run the start script (it updates Python dependencies automatically). The maintainer ships breaking changes occasionally, so read the changelog before major updates.

**Q: VRAM requirements?**
A: Depends on model size and quantization. A 7B Q4_K_M GGUF runs on 6 GB of VRAM (or on CPU). Llama 3.1 70B at Q4 needs ~40 GB of VRAM, or can be split between CPU and GPU with `--gpu-layers`.

## Sources

- GitHub: https://github.com/oobabooga/text-generation-webui
- Maintainer: oobabooga
- License: AGPL-3.0
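The VRAM figures in the FAQ follow from a simple back-of-envelope rule: parameter count × effective bits per weight ÷ 8, plus headroom for the KV cache and activations. A rough sketch (the 1.2× overhead factor and the ~4.5 effective bits for Q4_K_M are assumptions, not measured values):

```python
def estimate_vram_gb(params_billion: float, bits_per_weight: float,
                     overhead: float = 1.2) -> float:
    """Back-of-envelope VRAM estimate for a quantized LLM.

    params_billion  -- model size in billions of parameters (7 for a 7B)
    bits_per_weight -- effective bits after quantization (~4.5 for Q4_K_M)
    overhead        -- multiplier for KV cache / activations (assumed value)
    """
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9


# 7B at ~4.5 effective bits: about 4.7 GB, comfortably inside the 6 GB figure
print(f"{estimate_vram_gb(7, 4.5):.1f} GB")

# 70B at the same quantization: ~39 GB of weights alone, ~47 GB with overhead
print(f"{estimate_vram_gb(70, 4.5):.1f} GB")
```

Note that the KV cache grows linearly with context length, so long-context sessions need more headroom than a constant overhead factor suggests.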