Configs · April 14, 2026 · 1 min read

text-generation-webui — A Gradio Web UI for Local LLMs

oobabooga's text-generation-webui is the "AUTOMATIC1111 of LLMs": a feature-rich Gradio interface for chatting with and serving local language models. It supports llama.cpp, Transformers, ExLlamaV2, and dozens of model formats.

Introduction

Text-generation-webui (often called "oobabooga" after its maintainer) is the most popular GUI for running language models locally. With over 46,000 GitHub stars, it gives you tabs for chat, notebook-style completion, model loading, training (LoRA), and an OpenAI-compatible API.

It supports virtually every local model format: GGUF (llama.cpp), GPTQ/AWQ/EXL2 (ExLlamaV2), and HF Transformers (FP16/INT8/4-bit), and its OpenAI-compatible API plugs into tooling like LangChain. Pick a model and the UI defaults to a sensible loader for it.

What It Does

The Web UI provides:

  • Chat tab — system prompts, character cards, multi-turn conversations
  • Default tab — raw completion
  • Notebook tab — long-form writing
  • Parameters tab — sampling controls, instruction templates
  • Model tab — browse/download from HF, hot-load
  • Training tab — LoRA/QLoRA training
  • API mode — mimics the OpenAI API for easy integration
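
Character cards for the Chat tab are plain YAML files dropped into the characters/ folder. A minimal sketch — the card name and all field values here are hypothetical; check one of the bundled example cards to confirm the exact field names your version expects:

```yaml
# characters/Ada.yaml — hypothetical example card
name: Ada
context: |
  Ada is a patient senior engineer who explains systems
  concepts with concrete examples and no filler.
greeting: |
  Hi! Ask me anything about your stack.
```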

Architecture Overview

[Gradio UI]
      |
+-----+-----+
|           |
Chat / Notebook / Default tabs
      |
[Parameter dispatch]
   sampling, templates, character cards
      |
[Backend Loader]
  +--- llama.cpp (GGUF)
  +--- Transformers (HF, FP16/INT8)
  +--- ExLlamaV2 (GPTQ/AWQ/EXL2)
  +--- HQQ, AQLM
      |
[OpenAI-Compatible API]
  /v1/chat/completions
  /v1/completions
  /v1/embeddings
      |
[Extensions]
   sd-api-pictures (auto-img),
   coqui-tts (voice),
   memoir+, openai_emb, ...

Self-Hosting & Configuration

# Common settings (shell)
./start_linux.sh \
  --listen \
  --listen-port 7860 \
  --api \
  --api-port 5000 \
  --model Qwen2.5-7B-Instruct-GGUF \
  --loader llama.cpp \
  --gpu-layers 35 \
  --threads 8

# Use the OpenAI-compatible API from any OpenAI client (Python)
from openai import OpenAI

client = OpenAI(base_url="http://localhost:5000/v1", api_key="local")
resp = client.chat.completions.create(
    model="Qwen2.5-7B-Instruct",
    messages=[{"role": "user", "content": "Explain MoE briefly."}],
)
print(resp.choices[0].message.content)
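
The same endpoint also accepts raw JSON over HTTP, which helps when a tool can't use the openai package. A minimal sketch of the request body for POST /v1/chat/completions — field names follow the OpenAI chat spec; the model name and message are placeholders:

```python
import json

def build_chat_request(model, user_msg, stream=False):
    """Assemble the JSON body for POST /v1/chat/completions."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_msg}],
        "stream": stream,  # True -> server streams SSE chunks
    }

body = build_chat_request("Qwen2.5-7B-Instruct", "Explain MoE briefly.")
print(json.dumps(body, indent=2))
```

POST this to http://localhost:5000/v1/chat/completions with Content-Type: application/json and any non-empty API key.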

Key Features

  • Multi-backend — llama.cpp, Transformers, ExLlamaV2, HQQ, AQLM, more
  • Chat / Notebook / Default modes — UX for any text-gen workflow
  • Character cards — preset personas with system prompts
  • OpenAI-compatible API — drop-in for tools expecting OpenAI
  • Training tab — fine-tune via LoRA/QLoRA in the same UI
  • Extensions ecosystem — voice (TTS/STT), images (SD), memory, RAG
  • Cross-platform — single launcher for Linux/macOS/Windows
  • Active community — frequent updates following the model release cycle

Comparison with Similar Tools

Feature          | text-gen-webui | LM Studio                 | Ollama          | KoboldCpp       | Open WebUI
Open source      | Yes            | No (closed)               | Yes             | Yes             | Yes
Backend choice   | Many           | One (llama.cpp)           | One (llama.cpp) | One (llama.cpp) | Connects to any
Built-in chat UI | Yes            | Yes                       | No (CLI)        | Yes (RP focus)  | Yes (polished)
OpenAI API       | Yes            | Yes                       | Yes             | Yes             | Yes
Fine-tuning      | Yes (LoRA)     | No                        | No              | No              | No
Roleplay focus   | Yes            | Limited                   | No              | Strong          | Limited
Best for         | Power users    | Beginners (macOS/Windows) | CLI / scripting | Roleplay        | Polished chat UI

FAQ

Q: text-gen-webui vs Open WebUI? A: Open WebUI is a polished chat front-end that talks to any backend (Ollama, OpenAI, llama.cpp). text-gen-webui bundles the backend, model loading, and fine-tuning in one app. Power users often run text-gen-webui as the backend with Open WebUI as the chat UI.

Q: Does it support image generation? A: Not natively, but the sd-api-pictures extension lets the chat call out to a Stable Diffusion server.

Q: How do I update? A: git pull then re-run the start script (it updates Python deps automatically). The maintainer ships breaking changes occasionally — read CHANGELOG before major updates.

Q: VRAM requirements? A: Depends on the model and quantization. A 7B Q4_K_M GGUF runs in about 6GB of VRAM (or on CPU). Llama 3.1 70B at Q4 needs ~40GB of VRAM, or can split layers between CPU and GPU with --gpu-layers.
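
The back-of-envelope math behind those numbers: a quantized model's weight footprint is roughly parameters × bits-per-weight ÷ 8, with the KV cache and activations on top. A rough sketch — the bits-per-weight values are approximations (Q4_K_M sits near ~4.8 bpw in practice):

```python
def gguf_weights_gb(params_b, bits_per_weight):
    """Approximate size of the quantized weights alone, in GB.
    KV cache and activations come on top of this."""
    return params_b * bits_per_weight / 8

print(round(gguf_weights_gb(7, 4.8), 1))   # ~4.2 GB -> fits in 6 GB with room for context
print(round(gguf_weights_gb(70, 4.5), 1))  # ~39.4 GB -> the "~40GB" figure above
```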
