# text-generation-webui — A Gradio Web UI for Local LLMs

> oobabooga's text-generation-webui is the "AUTOMATIC1111 of LLMs": a feature-rich Gradio interface for chatting with and serving local language models. It supports llama.cpp, Transformers, ExLlamaV2, and dozens of model formats.

## Quick Use

```bash
# One-command installer (selects CUDA/ROCm/Metal/CPU at first run)
git clone https://github.com/oobabooga/text-generation-webui
cd text-generation-webui
./start_linux.sh   # or start_windows.bat / start_macos.sh
# Pick your hardware backend when prompted, then open http://127.0.0.1:7860
```

## Introduction

text-generation-webui (often called "oobabooga" after its maintainer) is the most popular GUI for running language models locally. With over 46,000 GitHub stars, it provides tabs for chat, notebook-style completion, model loading, LoRA training, and an OpenAI-compatible API. It supports virtually every local model format: GGUF (llama.cpp), GPTQ/AWQ/EXL2 (ExLlamaV2), HF Transformers (FP16/INT8/4-bit), plus LangChain-style integrations. Pick a model and the UI selects a sensible loader for it.

## What It Does

The web UI provides:

- **Chat tab** — system prompts, character cards, multi-turn conversation
- **Default tab** — raw completion
- **Notebook tab** — long-form writing
- **Parameters tab** — sampling controls, instruction templates
- **Model tab** — browse/download from Hugging Face, hot-load models
- **Training tab** — LoRA/QLoRA training

API mode mimics the OpenAI API for easy integration.
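Character cards mentioned above are plain YAML files dropped into the `characters/` folder. A minimal sketch of one (the persona itself is invented for illustration; if the field names differ in your version, follow the bundled example card instead):

```yaml
# characters/Rin.yaml — a hypothetical persona card
name: Rin
greeting: |-
  Hello! I'm Rin, your terse technical assistant. What are we debugging today?
context: |-
  Rin is a concise, no-nonsense software engineer. She answers with short,
  precise explanations and prefers code snippets over long prose.
```

After saving the file, the card appears in the Chat tab's character selector, and its `context` is injected as the system prompt for every turn.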
## Architecture Overview

```
[Gradio UI]
      |
Chat / Notebook / Default tabs
      |
[Parameter dispatch]
  sampling, templates, character cards
      |
[Backend Loader]
  +--- llama.cpp (GGUF)
  +--- Transformers (HF, FP16/INT8)
  +--- ExLlamaV2 (GPTQ/AWQ/EXL2)
  +--- HQQ, AQLM
      |
[OpenAI-Compatible API]
  /v1/chat/completions
  /v1/completions
  /v1/embeddings
      |
[Extensions]
  sd-api-pictures (images), coqui-tts (voice), memoir+, openai_emb, ...
```

## Self-Hosting & Configuration

```bash
# Common settings
./start_linux.sh \
  --listen \
  --listen-port 7860 \
  --api \
  --api-port 5000 \
  --model Qwen2.5-7B-Instruct-GGUF \
  --loader llama.cpp \
  --gpu-layers 35 \
  --threads 8
```

```python
# Use the OpenAI-compatible API from any OpenAI client
from openai import OpenAI

client = OpenAI(base_url="http://localhost:5000/v1", api_key="local")
resp = client.chat.completions.create(
    model="Qwen2.5-7B-Instruct",
    messages=[{"role": "user", "content": "Explain MoE briefly."}],
)
print(resp.choices[0].message.content)
```

## Key Features

- **Multi-backend** — llama.cpp, Transformers, ExLlamaV2, HQQ, AQLM, and more
- **Chat / Notebook / Default modes** — a UX for any text-generation workflow
- **Character cards** — preset personas with system prompts
- **OpenAI-compatible API** — drop-in replacement for tools expecting OpenAI
- **Training tab** — fine-tune via LoRA/QLoRA in the same UI
- **Extensions ecosystem** — voice (TTS/STT), images (SD), memory, RAG
- **Cross-platform** — single launcher for Linux/macOS/Windows
- **Active community** — frequent updates tracking the model release cycle

## Comparison with Similar Tools

| Feature | text-gen-webui | LM Studio | Ollama | KoboldCpp | Open WebUI |
|---|---|---|---|---|---|
| Open source | Yes | No (closed) | Yes | Yes | Yes |
| Backend choice | Many | One (llama.cpp) | One (llama.cpp) | One (llama.cpp) | Connects to any |
| Built-in chat UI | Yes | Yes | No (CLI) | Yes (RP focus) | Yes (best) |
| OpenAI API | Yes | Yes | Yes | Yes | Yes |
| Fine-tuning | Yes (LoRA) | No | No | No | No |
| Roleplay focus | Yes | Limited | No | Strong | Limited |
| Best for | Power users | Beginners on macOS/Windows | CLI / scripting | Roleplay | Polished chat UI |

## FAQ

**Q: text-gen-webui vs Open WebUI?**
A: Open WebUI is a polished chat front-end that talks to any backend (Ollama, OpenAI, llama.cpp). text-gen-webui bundles the backend, model loading, and fine-tuning in one app. Power users often run text-gen-webui as the backend with Open WebUI as the chat UI.

**Q: Does it support image generation?**
A: Not natively, but the `sd-api-pictures` extension lets the chat call out to a Stable Diffusion server.

**Q: How do I update?**
A: `git pull`, then re-run the start script (it updates Python dependencies automatically). The maintainer ships breaking changes occasionally, so read the changelog before major updates.

**Q: VRAM requirements?**
A: Depends on model size and quantization. A 7B Q4_K_M GGUF runs on 6 GB of VRAM (or on CPU). Llama 3.1 70B at Q4 needs ~40 GB of VRAM, or can be split between CPU and GPU with `--gpu-layers`.

## Sources

- GitHub: https://github.com/oobabooga/text-generation-webui
- Maintainer: oobabooga
- License: AGPL-3.0
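The VRAM figures in the FAQ follow from a simple back-of-envelope rule: parameter count × effective bits per weight ÷ 8, plus headroom for the KV cache and activations. A rough sketch (the 1.2× overhead factor and the ~4.5 effective bits for Q4_K_M are assumptions, not measured values):

```python
def estimate_vram_gb(params_billion: float, bits_per_weight: float,
                     overhead: float = 1.2) -> float:
    """Back-of-envelope VRAM estimate for a quantized LLM.

    params_billion  -- model size in billions of parameters (7 for a 7B)
    bits_per_weight -- effective bits after quantization (~4.5 for Q4_K_M)
    overhead        -- multiplier for KV cache / activations (assumed value)
    """
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9


# 7B at ~4.5 effective bits: about 4.7 GB, comfortably inside the 6 GB figure
print(f"{estimate_vram_gb(7, 4.5):.1f} GB")

# 70B at the same quantization: ~39 GB of weights alone, ~47 GB with overhead
print(f"{estimate_vram_gb(70, 4.5):.1f} GB")
```

Note that the KV cache grows linearly with context length, so long-context sessions need more headroom than a constant overhead factor suggests.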