Text Generation WebUI — Local LLM Chat Interface
Text Generation WebUI is a Gradio interface for running LLMs locally. 46.4K+ GitHub stars. Multiple backends, vision, training, image gen, OpenAI-compatible API. 100% offline.
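Agent-installable
This asset can be installed by an agent: the agent first selects the current runtime, reviews the install plan, and then runs the matching command.
npx -y tokrepo@latest install 11107806-c69a-4b75-8360-d0504ff602d7 --target codex
Do a dry run first to confirm the install plan, then run this command.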
What it is
Text Generation WebUI (by oobabooga) is a Gradio-based web interface for running large language models locally. It supports multiple backends including llama.cpp, ExLlamaV2, Transformers, and AutoGPTQ. Features include text chat, instruction-following, vision (image input), LoRA training, image generation, and an OpenAI-compatible API server. Everything runs on your hardware with no data leaving your machine.
This tool is for developers and AI enthusiasts who want to run open-source LLMs locally for privacy, experimentation, or offline use. If you want a ChatGPT-like interface for models like LLaMA, Mistral, or Qwen without cloud dependencies, this is one of the most widely used options.
How it saves time or tokens
Text Generation WebUI provides a unified interface for multiple model backends, eliminating the need to set up a separate environment for each. The one-click installer sets up the Python dependencies, including the GPU-specific packages; the GPU driver itself must already be installed on the system, and models are downloaded separately. Because the API is OpenAI-compatible, existing tools and scripts that target OpenAI can usually be pointed at your local instance by changing only the base URL. Running locally also means zero API costs -- no token billing.
How to use
- Download the one-click installer from the GitHub releases page for your OS (Windows, macOS, Linux).
- Run the installer, which sets up a Python environment with all dependencies.
- Download a model from HuggingFace through the UI or place model files in the models directory.
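After placing files by hand, it can help to confirm that they landed where the UI will look. The sketch below lists weight files under a models directory. The directory path is passed in by the caller, and both the helper name `list_model_files` and the suffix set are assumptions made for illustration, not part of the project.

```python
from pathlib import Path

# Suffixes treated as model weights here; this set is an illustrative assumption.
MODEL_SUFFIXES = {".gguf", ".safetensors"}


def list_model_files(models_dir):
    """Return sorted, POSIX-style relative paths of weight files under models_dir."""
    root = Path(models_dir)
    if not root.is_dir():
        # A missing directory simply yields no models.
        return []
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in MODEL_SUFFIXES
    )
```

Calling `list_model_files("models")` from the repository root would return an empty list if nothing has been downloaded yet.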
Example
# Manual installation
git clone https://github.com/oobabooga/text-generation-webui.git
cd text-generation-webui
pip install -r requirements.txt
# Start the web UI
python server.py
# Start with the OpenAI-compatible API
# (--listen also makes the server reachable from other machines on your network)
python server.py --api --listen
# The API is now available at http://localhost:5000/v1
# Use it with any OpenAI client:
curl http://localhost:5000/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model": "local-model", "messages": [{"role": "user", "content": "Hello"}]}'
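The same request as the curl call above can be sketched in Python using only the standard library. The helper names `build_chat_request` and `send` are hypothetical, and the base URL assumes the default port shown in the example.

```python
import json
import urllib.request

# Assumed default from the example above; change it if your server uses another port.
BASE_URL = "http://localhost:5000/v1"


def build_chat_request(prompt, base_url=BASE_URL, model="local-model"):
    """Build (but do not send) a POST request for the chat completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=60):
    """Send the request and return the decoded JSON body (needs a running server)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `send(build_chat_request("Hello"))` should return the parsed JSON response. Keeping the request construction separate from the network call makes the payload easy to inspect without a live server.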
Related on TokRepo
- Local LLM tools -- Text Generation WebUI deep dive
- Self-hosted AI tools -- run AI tools on your own infrastructure
Common pitfalls
- Not having enough VRAM for the chosen model. A 7B parameter model needs roughly 6GB VRAM in 4-bit quantization. Check model requirements before downloading.
- Using the wrong backend for your hardware. llama.cpp works well on CPU and Apple Silicon. ExLlamaV2 is optimized for NVIDIA GPUs. The UI lets you choose the loader when you load a model.
- Running the API server without authentication on a public network. The OpenAI-compatible API does not require authentication by default. Set an API key with the --api-key flag, and use a reverse proxy or firewall rules if exposing it beyond localhost.
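The backend rule of thumb from the list above can be written down as a small helper. This is only a sketch of that heuristic: the function `suggest_backend` is hypothetical, and it does not detect hardware, so the caller supplies the flags.

```python
def suggest_backend(has_nvidia_gpu, is_apple_silicon=False):
    """Return (backend, reason) following the rule of thumb stated above."""
    if has_nvidia_gpu:
        return ("ExLlamaV2", "optimized for NVIDIA GPUs")
    if is_apple_silicon:
        return ("llama.cpp", "works well on Apple Silicon")
    # Anything else is treated as a CPU-only machine.
    return ("llama.cpp", "works well on CPU")
```

For example, `suggest_backend(False, is_apple_silicon=True)` returns the llama.cpp entry.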
FAQ
You can run most open-source LLMs including LLaMA, Mistral, Qwen, Phi, Gemma, and hundreds of fine-tuned variants. The UI downloads models directly from HuggingFace. GGUF, GPTQ, AWQ, and EXL2 quantized formats are all supported.
For 7B models in 4-bit quantization, you need 6-8GB VRAM (NVIDIA GPU) or 8GB RAM (CPU/Apple Silicon via llama.cpp). Larger models (13B, 30B, 70B) need proportionally more memory. CPU inference works but is significantly slower than GPU.
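A back-of-the-envelope estimate shows where figures like these come from: the weights take roughly parameters × bits / 8 bytes, plus some headroom for the context cache and runtime buffers. The helper below is a rough sketch, and the 2 GB default overhead is an assumption chosen for illustration. Real usage depends on the backend and context length.

```python
def estimate_memory_gb(params_billion, bits_per_weight=4, overhead_gb=2.0):
    """Rough memory estimate in GB: quantized weights plus an assumed fixed overhead."""
    # 1 billion parameters at 8 bits per weight is about 1 GB of weights.
    weights_gb = params_billion * bits_per_weight / 8
    return weights_gb + overhead_gb


# 7B at 4-bit: 3.5 GB of weights + 2.0 GB assumed overhead = 5.5 GB
seven_b = estimate_memory_gb(7)
```

Under the same assumptions, a 13B model at 4-bit comes out at 8.5 GB, which is why larger models quickly outgrow consumer GPUs.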
Vision is supported. Text Generation WebUI works with multimodal models that accept image inputs: LLaVA and other vision-language models can process images alongside text prompts through the chat interface.
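For programmatic use, the OpenAI chat format represents an image as an `image_url` content part, often carrying a base64 data URI. The sketch below builds such a message. Whether a given version of the local server and the loaded model accept this shape is an assumption to verify, and `build_image_message` is a hypothetical helper.

```python
import base64


def build_image_message(image_bytes, question, mime_type="image/png"):
    """Build an OpenAI-style user message that pairs a text question with an image."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {
                "type": "image_url",
                "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
            },
        ],
    }
```

The returned dictionary would go into the `messages` list of a chat completions request.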
Fine-tuning is supported. The training tab offers LoRA fine-tuning with your own datasets, so you can create custom fine-tunes of base models from conversational data, instruction datasets, or raw text directly in the web interface.
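An instruction dataset is typically a JSON list of records. The sketch below writes one using Alpaca-style keys (`instruction`, `input`, `output`). That key layout is an assumption here, so check which dataset formats your version of the training tab expects. Both helper names are hypothetical.

```python
import json


def make_instruction_dataset(pairs):
    """Turn (instruction, output) pairs into Alpaca-style records, skipping empty ones."""
    return [
        {"instruction": instruction.strip(), "input": "", "output": output.strip()}
        for instruction, output in pairs
        if instruction.strip() and output.strip()
    ]


def save_dataset(records, path):
    """Write the records to path as UTF-8 JSON."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(records, handle, ensure_ascii=False, indent=2)
```

The resulting file can then be selected as the dataset when configuring a LoRA run.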
Start the server with the --api flag and it exposes endpoints at /v1/chat/completions and /v1/completions that accept the same JSON format as the OpenAI API. Most client libraries and tools designed for OpenAI work once their base URL points at the local server.
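The two endpoints return slightly different shapes in the OpenAI format: chat completions put the text under `choices[0].message.content`, while plain completions use `choices[0].text`. A minimal sketch of reading either one follows, with `extract_reply` as a hypothetical helper operating on an already-parsed JSON response.

```python
def extract_reply(response):
    """Return the generated text from an OpenAI-style chat or completion response."""
    choices = response.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    first = choices[0]
    if "message" in first:
        # Shape returned by /v1/chat/completions
        return first["message"]["content"]
    # Shape returned by /v1/completions
    return first["text"]
```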
Source and credits
Created by oobabooga. Open source. Repository: oobabooga/text-generation-webui (46,400+ GitHub stars).
Related assets
HuggingFace Chat UI — Open-Source AI Chat Interface
Chat UI is Hugging Face's open-source web interface for conversational AI, powering HuggingChat and supporting any text-generation model via TGI, Ollama, or OpenAI-compatible APIs with features like web search, tool use, and multimodal input.
text-generation-webui — A Gradio Web UI for Local LLMs
oobabooga's text-generation-webui is the "AUTOMATIC1111 of LLMs": a feature-rich Gradio interface for chatting with and serving local language models. It supports llama.cpp, Transformers, ExLlamaV2, and dozens of model formats.
Unsloth — 2x Faster Local LLM Training & Inference
Unsloth is a unified local interface for running and training AI models. 58.7K+ GitHub stars. 2x faster training with 70% less VRAM across 500+ models including Qwen, DeepSeek, Llama, Gemma. Web UI wi
Open WebUI — Self-Hosted AI Chat Interface
User-friendly, self-hosted AI chat interface. Supports Ollama, OpenAI, Anthropic, and any OpenAI-compatible API. RAG, web search, voice, image gen, and plugins. 129K+ stars.