Cette page est affichée en anglais. Une traduction française est en cours.

SkillsApr 14, 2026·3 min de lecture

text-generation-webui — A Gradio Web UI for Local LLMs

oobabooga's text-generation-webui is the "AUTOMATIC1111 of LLMs": a feature-rich Gradio interface for chatting with and serving local language models. It supports llama.cpp, Transformers, ExLlamaV2, and dozens of model formats.

AI Open Source · Community

Prêt pour agents

Installation agent prête

Cet actif peut être installé après choix du runtime, vérification du plan et exécution de la commande adaptée.

Native · 98/100Policy : autoriser

Surface agent

Tout agent MCP/CLI

Type

Skill

Installation

Single

Confiance

Confiance : Established

Point d'entrée

step-1.md

Commande d'installation directe

npx -y tokrepo@latest install b0d2eaa8-37db-11f1-9bc6-00163e2b0d79 --target codex

À exécuter après confirmation du plan en dry-run.

TL;DR

Feature-rich Gradio web UI for running local language models with llama.cpp, Transformers, and ExLlamaV2 backends.

§01

What it is

text-generation-webui (commonly called 'oobabooga') is a feature-rich Gradio web interface for chatting with and serving local language models. It supports multiple backends including llama.cpp, Transformers, ExLlamaV2, and dozens of model formats. The one-line installer detects your hardware (CUDA, ROCm, MPS, CPU) and configures the appropriate backend automatically.

The project targets users who want to run LLMs locally with a user-friendly web interface. It provides chat, notebook, and API modes, model management, LoRA loading, and extension support.

§02

How it saves time or tokens

text-generation-webui eliminates the need to write Python scripts for local model inference. The web UI provides a chat interface, parameter tuning, model comparison, and API endpoints without any code. The one-line installer handles Python environments, CUDA dependencies, and backend compilation. For experimentation with different models and parameters, the UI approach is faster than editing scripts.

§03

How to use

Clone and run the installer:

git clone https://github.com/oobabooga/text-generation-webui
cd text-generation-webui
./start_linux.sh   # or start_windows.bat / start_macos.sh

Select your hardware during setup (CUDA, ROCm, MPS, or CPU).

Download a model from the Model tab and start chatting.

§04

Example

Using the API for programmatic access:

import requests

response = requests.post(
    'http://localhost:5000/v1/chat/completions',
    json={
        'model': 'loaded-model',
        'messages': [
            {'role': 'user', 'content': 'Explain quantum computing briefly'}
        ],
        'temperature': 0.7,
        'max_tokens': 200,
    }
)

print(response.json()['choices'][0]['message']['content'])

The API follows the OpenAI chat completions format, making it a drop-in replacement for API-based workflows.

§05

Related on TokRepo

Local LLM with text-generation-webui — Detailed guide for text-generation-webui setup
Local LLM Providers — Compare local LLM running tools including Ollama and LM Studio

§06

Common pitfalls

The installer creates a large Python environment (several GB). Ensure sufficient disk space before installation.
VRAM requirements vary by model and quantization. A 7B model at 4-bit quantization needs roughly 6GB VRAM. Check model requirements before downloading.
Some model formats (GPTQ, AWQ, EXL2) require specific backends. Not all backends are compatible with all formats.
Always check the official documentation for the latest version-specific changes and migration guides before upgrading in production environments.
For team deployments, establish clear guidelines on configuration and usage patterns to ensure consistency across developers.
Model quantization levels (4-bit, 8-bit, 16-bit) trade quality for speed and memory usage. Start with 4-bit quantization for testing and increase precision for production quality.
The web UI exposes an API endpoint by default. In shared environments, configure authentication or restrict access to localhost to prevent unauthorized model usage.

Questions fréquentes

What model formats does text-generation-webui support?+

It supports GGUF (llama.cpp), GPTQ, AWQ, EXL2 (ExLlamaV2), and standard Hugging Face Transformers format. Each format has different performance characteristics and VRAM requirements.

Does it provide an OpenAI-compatible API?+

Yes. The built-in API server follows the OpenAI chat completions format. This means you can use it as a local replacement for OpenAI's API in applications that support custom endpoints.

What hardware is required?+

The UI works on NVIDIA GPUs (CUDA), AMD GPUs (ROCm), Apple Silicon (MPS), and CPU-only setups. GPU acceleration dramatically improves inference speed. A minimum of 8GB VRAM is recommended for 7B parameter models.

Can I load LoRA adapters?+

Yes. The UI supports loading LoRA adapters on top of base models. This lets you use fine-tuned models without merging the adapters, saving disk space and enabling quick switching.

How does text-generation-webui compare to Ollama?+

Ollama provides a simpler CLI-focused experience for running models. text-generation-webui offers a richer web UI with more parameter controls, multiple backends, extension support, and model comparison features. Ollama is easier to set up; text-generation-webui provides more flexibility.

Sources citées (3)

text-generation-webui GitHub— text-generation-webui is a Gradio UI for local LLMs
text-generation-webui Wiki— Multiple backend support: llama.cpp, Transformers, ExLlamaV2
llama.cpp GitHub— llama.cpp for efficient LLM inference

En lien sur TokRepo

text-generation-webui guide Local LLM providers Featured workflows

Fil de discussion

Connectez-vous pour rejoindre la discussion.

Aucun commentaire pour l'instant. Soyez le premier à partager votre avis.

Actifs similaires

Text Generation WebUI — Local LLM Chat Interface

Text Generation WebUI is a Gradio interface for running LLMs locally. 46.4K+ GitHub stars. Multiple backends, vision, training, image gen, OpenAI-compatible API. 100% offline.

Skills

AI Open Source

HuggingFace Chat UI — Open-Source AI Chat Interface

Chat UI is Hugging Face's open-source web interface for conversational AI, powering HuggingChat and supporting any text-generation model via TGI, Ollama, or OpenAI-compatible APIs with features like web search, tool use, and multimodal input.

Skills

AI Open Source

Unsloth — 2x Faster Local LLM Training & Inference

Unsloth is a unified local interface for running and training AI models. 58.7K+ GitHub stars. 2x faster training with 70% less VRAM across 500+ models including Qwen, DeepSeek, Llama, Gemma. Web UI wi

Skills

AI Open Source

CogVideo — Text and Image to Video Generation

An open-source video generation framework from Zhipu AI supporting text-to-video and image-to-video with CogVideoX models. Generates high-quality clips up to 6 seconds.

Skills

Script Depot