Skills · March 31, 2026 · 1 min read

Text Generation WebUI — Local LLM Chat Interface

Text Generation WebUI is a Gradio interface for running LLMs locally. 46.4K+ GitHub stars. Multiple backends, vision, training, image gen, OpenAI-compatible API. 100% offline.

AI Open Source · Community

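Agent ready

Directly installable by agents

This asset is installable. The agent first selects the current runtime, checks the install plan, and then runs the matching command.

Native · 98/100 · Policy: allowed

Agent entry point: any MCP/CLI agent

Type: Skill

Install: Single

Trust level: Established

Entry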

Text Generation WebUI — Local LLM Chat Interface

Direct install command

npx -y tokrepo@latest install 11107806-c69a-4b75-8360-d0504ff602d7 --target codex

Do a dry run to confirm the install plan before running this command.

TL;DR

Text Generation WebUI runs LLMs locally with a Gradio interface, multiple backends, vision support, and an OpenAI-compatible API.

§01

What it is

Text Generation WebUI (by oobabooga) is a Gradio-based web interface for running large language models locally. It supports multiple backends including llama.cpp, ExLlamaV2, Transformers, and AutoGPTQ. Features include text chat, instruction-following, vision (image input), LoRA training, image generation, and an OpenAI-compatible API server. Everything runs on your hardware with no data leaving your machine.

This tool is for developers and AI enthusiasts who want to run open-source LLMs locally for privacy, experimentation, or offline use. If you want a ChatGPT-like interface for models like LLaMA, Mistral, or Qwen without cloud dependencies, this is a widely used choice.

§02

How it saves time or tokens

Text Generation WebUI provides a unified interface for multiple model backends, eliminating the need to set up separate environments for each. The one-click installer sets up an isolated Python environment with the dependencies for your hardware, and models can be downloaded through the UI. Because the API is OpenAI-compatible, existing tools and scripts that target OpenAI can usually be pointed at your local instance by changing only the base URL. Running locally also means no per-token API billing.
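
As a rough illustration of pointing an existing OpenAI-based script at the local server, the sketch below sets two environment variables. It assumes the script uses an official OpenAI SDK, which reads OPENAI_BASE_URL and OPENAI_API_KEY, and that the server listens on the default local port 5000. The key value is a placeholder, not something the source specifies.

```shell
# Assumption: the client reads these variables (the official OpenAI SDKs do).
export OPENAI_BASE_URL="http://localhost:5000/v1"
# Placeholder value: the SDK expects a key to be set even if the local server ignores it.
export OPENAI_API_KEY="local-placeholder"

echo "OpenAI clients started from this shell will call: $OPENAI_BASE_URL"
```

Tools that do not read these variables typically expose a base URL setting of their own.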

§03

How to use

Download the one-click installer from the GitHub releases page for your OS (Windows, macOS, Linux).
Run the installer, which sets up a Python environment with all dependencies.
Download a model from HuggingFace through the UI or place model files in the models directory.
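
The steps above might look like this from a terminal. This is only a sketch: it assumes the start scripts shipped in the repository (start_linux.sh, start_macos.sh, start_windows.bat), whose names may differ between releases, and pick_start_script is a hypothetical helper introduced for illustration.

```shell
# Hypothetical helper: map `uname -s` output to the matching start script.
pick_start_script() {
  case "$1" in
    Linux)  echo "start_linux.sh" ;;
    Darwin) echo "start_macos.sh" ;;
    *)      echo "start_windows.bat" ;;  # fallback for anything else (assumed to be Windows)
  esac
}

script="$(pick_start_script "$(uname -s)")"
echo "Run $script from the text-generation-webui folder."
echo "Then download a model in the UI, or copy model files into the models directory."
```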

§04

Example

# Manual installation
git clone https://github.com/oobabooga/text-generation-webui.git
cd text-generation-webui
pip install -r requirements.txt

# Start the web UI
python server.py

# Start with OpenAI-compatible API
python server.py --api --listen

# The API is now available at http://localhost:5000/v1
# Use it with any OpenAI client:
curl http://localhost:5000/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model": "local-model", "messages": [{"role": "user", "content": "Hello"}]}'
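
For scripting, the request body can be built by a small function instead of being typed inline. The sketch below is one way to do that; chat_request_body is a hypothetical helper, and max_tokens is assumed to be honored by the local server in the same way as by the OpenAI API.

```shell
# Hypothetical helper: build a chat-completions body for one user message.
# The message is inserted verbatim, so it must not contain double quotes or backslashes.
# Usage: chat_request_body <message> [max_tokens, default 200]
chat_request_body() {
  printf '{"model": "local-model", "messages": [{"role": "user", "content": "%s"}], "max_tokens": %d}\n' "$1" "${2:-200}"
}

# Print the body so it can be inspected before sending.
chat_request_body "Hello" 50

# To send it (requires the server started with --api to be running):
#   chat_request_body "Hello" 50 | curl -s http://localhost:5000/v1/chat/completions \
#     -H 'Content-Type: application/json' -d @-
```

For messages with arbitrary characters, a JSON-aware tool is a safer way to build the body than printf.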

§05

Related on TokRepo

Local LLM tools -- Text Generation WebUI deep dive
Self-hosted AI tools -- run AI tools on your own infrastructure

§06

Common pitfalls

Not having enough VRAM for the chosen model. A 7B parameter model needs roughly 6GB VRAM in 4-bit quantization. Check model requirements before downloading.
Using the wrong backend for your hardware. llama.cpp works well on CPU and Apple Silicon. ExLlamaV2 is optimized for NVIDIA GPUs. You choose the backend (loader) in the UI when loading a model.
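Running the API server without authentication on a public network. With --listen, the server accepts connections from other machines, and the API does not ask for a key unless you configure one. Set an API key, or put the server behind a reverse proxy or firewall rules, if exposing it beyond localhost.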
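
One way to reduce the exposure described in the last pitfall is to keep the server on localhost and require a key. The sketch below assumes your version of the server accepts an --api-key flag (check python server.py --help before relying on it) and serves a /v1/models endpoint. TGW_API_KEY, auth_header, and list_models are hypothetical names introduced for illustration.

```shell
# Assumption: the server was started without --listen and with a key, e.g.
#   python server.py --api --api-key "$TGW_API_KEY"
# so it only answers local requests that carry the key.

# Hypothetical helper: format the bearer header that OpenAI-style clients send.
auth_header() {
  printf 'Authorization: Bearer %s' "$1"
}

# Hypothetical helper: list the models the local server reports.
list_models() {
  curl -s http://localhost:5000/v1/models -H "$(auth_header "$TGW_API_KEY")"
}

# Show the header format with a dummy key.
auth_header "example-key"; echo
```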

FAQ

What models can I run with Text Generation WebUI?

You can run most open-source LLMs including LLaMA, Mistral, Qwen, Phi, Gemma, and hundreds of fine-tuned variants. The UI downloads models directly from HuggingFace. GGUF, GPTQ, AWQ, and EXL2 quantized formats are all supported.

What hardware do I need?

For 7B models in 4-bit quantization, you need 6-8GB VRAM (NVIDIA GPU) or 8GB RAM (CPU/Apple Silicon via llama.cpp). Larger models (13B, 30B, 70B) need proportionally more memory. CPU inference works but is significantly slower than GPU.
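
The memory figures above can be sanity-checked with simple arithmetic: the weights alone take roughly (parameters × bits per weight ÷ 8) bytes. The sketch below computes that lower bound. weights_gb is a hypothetical helper, and attributing the rest of the 6-8GB figure to context cache and runtime overhead is an assumption, since real quantized formats and context lengths vary.

```shell
# Hypothetical helper: approximate size of the weights alone, in GB.
# Usage: weights_gb <parameters in billions> <bits per weight>
weights_gb() {
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * b / 8 }'
}

weights_gb 7 4    # 7B model at 4-bit
weights_gb 13 4   # 13B model at 4-bit
weights_gb 70 4   # 70B model at 4-bit
```

Treat the result as a floor when choosing a model, and leave headroom for the context window.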

Does it support image/vision models?

Yes. Text Generation WebUI supports multimodal models that accept image inputs. Models like LLaVA and other vision-language models can process images alongside text prompts through the chat interface.

Can I fine-tune models with this tool?

Yes. The training tab supports LoRA fine-tuning with your own datasets. You can create custom fine-tunes of base models using conversational data, instruction datasets, or raw text directly through the web interface.

How does the OpenAI-compatible API work?

Start the server with the --api flag and it exposes endpoints at /v1/chat/completions and /v1/completions that accept the same JSON format as the OpenAI API. Most client libraries and tools designed for OpenAI work once their base URL points to the local server.

Source and credits

Created by oobabooga. Open source. oobabooga/text-generation-webui — 46,400+ GitHub stars

