Introduction
Text-generation-webui (often called "oobabooga" after its maintainer) is the most popular GUI for running language models locally. With over 46,000 GitHub stars, it gives you tabs for chat, notebook-style completion, model loading, training (LoRA), and an OpenAI-compatible API.
It supports virtually every local model format: GGUF (llama.cpp), GPTQ/AWQ/EXL2 (ExLlamaV2), HF Transformers (FP16/INT8/4-bit), and even LangChain-style integrations. Pick a model in the Model tab and the UI suggests a suitable backend to load it with.
What It Does
The Web UI provides:
- Chat tab — system prompts, character cards, multi-turn conversations
- Default tab — raw text completion
- Notebook tab — long-form writing
- Parameters tab — sampling controls, instruction templates
- Model tab — browse/download models from Hugging Face, hot-load
- Training tab — LoRA/QLoRA training

API mode mimics the OpenAI API for easy integration.
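The Default tab's raw-completion behavior is also reachable over the API. Below is a minimal, stdlib-only sketch of hitting the `/v1/completions` endpoint; it assumes the server was started with `--api` on the default port 5000, and the `complete` helper is illustrative, not part of the project.

```python
# Hedged sketch: raw (non-chat) completion via the OpenAI-compatible
# /v1/completions endpoint, mirroring the Default tab's raw-prompt mode.
import json
import urllib.request

# Assumed default: server launched with --api --api-port 5000
API_URL = "http://localhost:5000/v1/completions"

def complete(prompt: str, max_tokens: int = 64, temperature: float = 0.7) -> str:
    """POST a raw prompt and return the generated continuation."""
    payload = {
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    # Requires a running text-generation-webui instance with a model loaded.
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["text"]
```

Calling `complete("Once upon a time")` only works against a running instance with a model loaded; the chat-style equivalent via the official `openai` client is shown later in this article.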
Architecture Overview
```
[Gradio UI]
      |
Chat / Default / Notebook tabs
      |
[Parameter dispatch]
   sampling, templates, character cards
      |
[Backend Loader]
   +--- llama.cpp (GGUF)
   +--- Transformers (HF, FP16/INT8)
   +--- ExLlamaV2 (GPTQ/AWQ/EXL2)
   +--- HQQ, AQLM
      |
[OpenAI-Compatible API]
   /v1/chat/completions
   /v1/completions
   /v1/embeddings
      |
[Extensions]
   sd-api-pictures (auto-img),
   coqui-tts (voice),
   memoir+, openai_emb, ...
```

Self-Hosting & Configuration
```shell
# Common settings
./start_linux.sh \
  --listen \
  --listen-port 7860 \
  --api \
  --api-port 5000 \
  --model Qwen2.5-7B-Instruct-GGUF \
  --loader llama.cpp \
  --gpu-layers 35 \
  --threads 8
```

```python
# Use the OpenAI-compatible API from any OpenAI client
from openai import OpenAI

client = OpenAI(base_url="http://localhost:5000/v1", api_key="local")
resp = client.chat.completions.create(
    model="Qwen2.5-7B-Instruct",
    messages=[{"role": "user", "content": "Explain MoE briefly."}],
)
print(resp.choices[0].message.content)
```

Key Features
- Multi-backend — llama.cpp, Transformers, ExLlamaV2, HQQ, AQLM, more
- Chat / Notebook / Default modes — UX for any text-gen workflow
- Character cards — preset personas with system prompts
- OpenAI-compatible API — drop-in for tools expecting OpenAI
- Training tab — fine-tune via LoRA/QLoRA in the same UI
- Extensions ecosystem — voice (TTS/STT), images (SD), memory, RAG
- Cross-platform — single launcher for Linux/macOS/Windows
- Active community — frequent updates following the model release cycle
Comparison with Similar Tools
| Feature | text-gen-webui | LM Studio | Ollama | KoboldCpp | Open WebUI |
|---|---|---|---|---|---|
| Open source | Yes | No (closed) | Yes | Yes | Yes |
| Backend choice | Many | One (llama.cpp) | One (llama.cpp) | One (llama.cpp) | Connects to any |
| Built-in chat UI | Yes | Yes | No (CLI) | Yes (RP focus) | Yes (best) |
| OpenAI API | Yes | Yes | Yes | Yes | Yes |
| Fine-tuning | Yes (LoRA) | No | No | No | No |
| Roleplay focus | Yes | Limited | No | Strong | Limited |
| Best For | Power users | Beginners on macOS/Windows | CLI / scripting | Roleplay | Polished chat UI |
FAQ
Q: text-gen-webui vs Open WebUI? A: Open WebUI is a polished chat front-end that talks to any backend (Ollama, OpenAI, llama.cpp). text-gen-webui bundles the backend, model loading, and fine-tuning in one app. Power users often run text-gen-webui as the backend with Open WebUI as the chat UI.
Q: Does it support image generation?
A: Not natively, but the sd-api-pictures extension lets the chat call out to a Stable Diffusion server.
Q: How do I update?
A: `git pull`, then re-run the start script (it updates Python dependencies automatically). The maintainer ships breaking changes occasionally — read the changelog before major updates.
Q: VRAM requirements?
A: Depends on model size and quantization. A 7B model at Q4_K_M (GGUF) runs in ~6 GB of VRAM (or on CPU). Llama 3.1 70B at Q4 needs ~40 GB of VRAM, or can be split between CPU and GPU with `--gpu-layers`.
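The sizing rule of thumb above can be sketched as a quick back-of-envelope estimator. The bits-per-weight constants and the ~20% overhead factor below are rough assumptions (real usage also depends on context length and KV-cache size), not measured values:

```python
# Hedged sketch: rough VRAM estimate for a quantized model.
# BITS_PER_WEIGHT values are approximations; actual usage also depends on
# context length, KV cache, and loader overhead (assumed ~20% here).
BITS_PER_WEIGHT = {"Q4_K_M": 4.8, "Q8_0": 8.5, "FP16": 16.0}

def estimate_vram_gb(params_billion: float, quant: str,
                     overhead: float = 1.2) -> float:
    """Weights-only size times an assumed overhead factor, in GB."""
    bytes_per_weight = BITS_PER_WEIGHT[quant] / 8
    return round(params_billion * bytes_per_weight * overhead, 1)

print(estimate_vram_gb(7, "Q4_K_M"))   # → 5.0 (consistent with the ~6 GB figure)
print(estimate_vram_gb(70, "Q4_K_M"))  # → 50.4 (offload layers via --gpu-layers)
```

The 70B estimate lands above the FAQ's ~40 GB because the overhead factor is deliberately conservative; with aggressive quantization and partial CPU offload the effective VRAM footprint drops.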
Sources
- GitHub: https://github.com/oobabooga/text-generation-webui
- Maintainer: oobabooga
- License: AGPL-3.0