Introduction
Text-generation-webui (often called "oobabooga" after its maintainer) is the most popular GUI for running language models locally. With over 46,000 GitHub stars, it gives you tabs for chat, notebook-style completion, model loading, training (LoRA), and an OpenAI-compatible API.
It supports virtually every local model format: GGUF (llama.cpp), GPTQ/AWQ/EXL2 (ExLlamaV2), HF Transformers (FP16/INT8/4-bit), and even LangChain-style integrations. Pick a model in the Model tab and the UI suggests a suitable backend to load it with.
What It Does
The Web UI provides:
- Chat tab — system prompts, character cards, multi-turn conversations
- Default tab — raw text completion
- Notebook tab — long-form writing
- Parameters tab — sampling controls, instruction templates
- Model tab — browse/download models from Hugging Face, hot-load
- Training tab — LoRA/QLoRA training

API mode mimics the OpenAI API for easy integration.
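The Default tab's raw-completion behavior is also reachable over the API. Below is a minimal, stdlib-only sketch of hitting the `/v1/completions` endpoint; it assumes the server was started with `--api` on the default port 5000, and the `complete` helper is illustrative, not part of the project.

```python
# Hedged sketch: raw (non-chat) completion via the OpenAI-compatible
# /v1/completions endpoint, mirroring the Default tab's raw-prompt mode.
import json
import urllib.request

# Assumed default: server launched with --api --api-port 5000
API_URL = "http://localhost:5000/v1/completions"

def complete(prompt: str, max_tokens: int = 64, temperature: float = 0.7) -> str:
    """POST a raw prompt and return the generated continuation."""
    payload = {
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    # Requires a running text-generation-webui instance with a model loaded.
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["text"]
```

Calling `complete("Once upon a time")` only works against a running instance with a model loaded; the chat-style equivalent via the official `openai` client is shown later in this article.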
Architecture Overview
```
[Gradio UI]
      |
Chat / Default / Notebook tabs
      |
[Parameter dispatch]
   sampling, templates, character cards
      |
[Backend Loader]
   +--- llama.cpp (GGUF)
   +--- Transformers (HF, FP16/INT8)
   +--- ExLlamaV2 (GPTQ/AWQ/EXL2)
   +--- HQQ, AQLM
      |
[OpenAI-Compatible API]
   /v1/chat/completions
   /v1/completions
   /v1/embeddings
      |
[Extensions]
   sd-api-pictures (auto-img),
   coqui-tts (voice),
   memoir+, openai_emb, ...
```

Self-Hosting & Configuration
```shell
# Common settings
./start_linux.sh \
  --listen \
  --listen-port 7860 \
  --api \
  --api-port 5000 \
  --model Qwen2.5-7B-Instruct-GGUF \
  --loader llama.cpp \
  --gpu-layers 35 \
  --threads 8
```

```python
# Use the OpenAI-compatible API from any OpenAI client
from openai import OpenAI

client = OpenAI(base_url="http://localhost:5000/v1", api_key="local")
resp = client.chat.completions.create(
    model="Qwen2.5-7B-Instruct",
    messages=[{"role": "user", "content": "Explain MoE briefly."}],
)
print(resp.choices[0].message.content)
```

Key Features
- Multi-backend — llama.cpp, Transformers, ExLlamaV2, HQQ, AQLM, more
- Chat / Notebook / Default modes — UX for any text-gen workflow
- Character cards — preset personas with system prompts
- OpenAI-compatible API — drop-in for tools expecting OpenAI
- Training tab — fine-tune via LoRA/QLoRA in the same UI
- Extensions ecosystem — voice (TTS/STT), images (SD), memory, RAG
- Cross-platform — single launcher for Linux/macOS/Windows
- Active community — frequent updates following the model release cycle
Comparison with Similar Tools
| Feature | text-gen-webui | LM Studio | Ollama | KoboldCpp | Open WebUI |
|---|---|---|---|---|---|
| Open source | Yes | No (closed) | Yes | Yes | Yes |
| Backend choice | Many | One (llama.cpp) | One (llama.cpp) | One (llama.cpp) | Connects to any |
| Built-in chat UI | Yes | Yes | No (CLI) | Yes (RP focus) | Yes (best) |
| OpenAI API | Yes | Yes | Yes | Yes | Yes |
| Fine-tuning | Yes (LoRA) | No | No | No | No |
| Roleplay focus | Yes | Limited | No | Strong | Limited |
| Best For | Power users | Beginners on macOS/Windows | CLI / scripting | Roleplay | Polished chat UI |
FAQ
Q: text-gen-webui vs Open WebUI? A: Open WebUI is a polished chat front-end that talks to any backend (Ollama, OpenAI, llama.cpp). text-gen-webui bundles the backend, model loading, and fine-tuning in one app. Power users often run text-gen-webui as the backend with Open WebUI as the chat UI.
Q: Does it support image generation?
A: Not natively, but the sd-api-pictures extension lets the chat call out to a Stable Diffusion server.
Q: How do I update?
A: `git pull`, then re-run the start script (it updates Python dependencies automatically). The maintainer ships breaking changes occasionally — read the changelog before major updates.
Q: VRAM requirements?
A: Depends on model size and quantization. A 7B model at Q4_K_M (GGUF) runs in ~6 GB of VRAM (or on CPU). Llama 3.1 70B at Q4 needs ~40 GB of VRAM, or can be split between CPU and GPU with `--gpu-layers`.
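The sizing rule of thumb above can be sketched as a quick back-of-envelope estimator. The bits-per-weight constants and the ~20% overhead factor below are rough assumptions (real usage also depends on context length and KV-cache size), not measured values:

```python
# Hedged sketch: rough VRAM estimate for a quantized model.
# BITS_PER_WEIGHT values are approximations; actual usage also depends on
# context length, KV cache, and loader overhead (assumed ~20% here).
BITS_PER_WEIGHT = {"Q4_K_M": 4.8, "Q8_0": 8.5, "FP16": 16.0}

def estimate_vram_gb(params_billion: float, quant: str,
                     overhead: float = 1.2) -> float:
    """Weights-only size times an assumed overhead factor, in GB."""
    bytes_per_weight = BITS_PER_WEIGHT[quant] / 8
    return round(params_billion * bytes_per_weight * overhead, 1)

print(estimate_vram_gb(7, "Q4_K_M"))   # → 5.0 (consistent with the ~6 GB figure)
print(estimate_vram_gb(70, "Q4_K_M"))  # → 50.4 (offload layers via --gpu-layers)
```

The 70B estimate lands above the FAQ's ~40 GB because the overhead factor is deliberately conservative; with aggressive quantization and partial CPU offload the effective VRAM footprint drops.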
Sources
- GitHub: https://github.com/oobabooga/text-generation-webui
- Maintainer: oobabooga
- License: AGPL-3.0