text-generation-webui — A Gradio Web UI for Local LLMs
oobabooga's text-generation-webui is the "AUTOMATIC1111 of LLMs": a feature-rich Gradio interface for chatting with and serving local language models. It supports llama.cpp, Transformers, ExLlamaV2, and dozens of model formats.
What it is
text-generation-webui (commonly called 'oobabooga') is a feature-rich Gradio web interface for chatting with and serving local language models. It supports multiple backends including llama.cpp, Transformers, ExLlamaV2, and dozens of model formats. The one-line installer detects your hardware (CUDA, ROCm, MPS, CPU) and configures the appropriate backend automatically.
The project targets users who want to run LLMs locally with a user-friendly web interface. It provides chat, notebook, and API modes, model management, LoRA loading, and extension support.
How it saves time or tokens
text-generation-webui eliminates the need to write Python scripts for local model inference. The web UI provides a chat interface, parameter tuning, model comparison, and API endpoints without any code. The one-line installer handles Python environments, CUDA dependencies, and backend compilation. For experimentation with different models and parameters, the UI approach is faster than editing scripts.
How to use
- Clone and run the installer:
git clone https://github.com/oobabooga/text-generation-webui
cd text-generation-webui
./start_linux.sh # or start_windows.bat / start_macos.sh
- Select your hardware during setup (CUDA, ROCm, MPS, or CPU).
- Download a model from the Model tab and start chatting.
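To call the server from scripts (as in the Example below), launch it with the API enabled. A minimal sketch, assuming the --api and --model flags available in recent releases; run the start script with --help to confirm the exact flag names for your version:
# enable the OpenAI-compatible API (default port 5000) and preload a downloaded model
./start_linux.sh --api --model <model-folder-name>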
Example
Using the API for programmatic access:
import requests

response = requests.post(
    'http://localhost:5000/v1/chat/completions',
    json={
        'model': 'loaded-model',
        'messages': [
            {'role': 'user', 'content': 'Explain quantum computing briefly'}
        ],
        'temperature': 0.7,
        'max_tokens': 200,
    }
)
print(response.json()['choices'][0]['message']['content'])
The API follows the OpenAI chat completions format, making it a drop-in replacement for API-based workflows.
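Because of that compatibility, the official openai Python client can also be pointed at the local server. A minimal sketch, assuming the openai v1+ SDK; the api_key value is a placeholder, since the local endpoint does not require a key unless you configure one:
from openai import OpenAI

# point the official client at the local server instead of api.openai.com
client = OpenAI(base_url='http://localhost:5000/v1', api_key='not-needed')

reply = client.chat.completions.create(
    model='loaded-model',  # text-generation-webui serves whichever model is currently loaded
    messages=[{'role': 'user', 'content': 'Explain quantum computing briefly'}],
    temperature=0.7,
    max_tokens=200,
)
print(reply.choices[0].message.content)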
Related on TokRepo
- Local LLM with text-generation-webui — Detailed guide for text-generation-webui setup
- Local LLM Providers — Compare local LLM running tools including Ollama and LM Studio
Common pitfalls
- The installer creates a large Python environment (several GB). Ensure sufficient disk space before installation.
- VRAM requirements vary by model and quantization. A 7B model at 4-bit quantization needs roughly 6GB of VRAM (see the rough estimate after this list). Check model requirements before downloading.
- Some model formats (GPTQ, AWQ, EXL2) require specific backends. Not all backends are compatible with all formats.
- Always check the official documentation for the latest version-specific changes and migration guides before upgrading in production environments.
- For team deployments, agree on shared model, quantization, and sampling-parameter presets so results stay consistent across developers.
- Model quantization levels (4-bit, 8-bit, 16-bit) trade quality for speed and memory usage. Start with 4-bit quantization for testing and increase precision for production quality.
- The web UI exposes an API endpoint by default. In shared environments, configure authentication or restrict access to localhost to prevent unauthorized model usage.
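A rough back-of-the-envelope check for the VRAM pitfall above: weight memory is roughly (parameters x bits per weight) / 8 bytes, plus overhead for the KV cache, activations, and the framework. A minimal sketch; the 2 GB overhead allowance is an illustrative assumption, not a measured value:
def estimate_vram_gb(params_billion, bits_per_weight, overhead_gb=2.0):
    """Very rough VRAM estimate: weight memory plus a flat allowance
    for KV cache, activations, and framework overhead (assumed ~2 GB)."""
    weight_gb = params_billion * 1e9 * bits_per_weight / 8 / 1e9
    return weight_gb + overhead_gb

# 7B at 4-bit: ~3.5 GB of weights + ~2 GB overhead, roughly 5.5-6 GB total
print(round(estimate_vram_gb(7, 4), 1))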
Frequently Asked Questions
What model formats does it support?
It supports GGUF (llama.cpp), GPTQ, AWQ, EXL2 (ExLlamaV2), and standard Hugging Face Transformers format. Each format has different performance characteristics and VRAM requirements.
Can it replace the OpenAI API in existing applications?
Yes. The built-in API server follows the OpenAI chat completions format. This means you can use it as a local replacement for OpenAI's API in applications that support custom endpoints.
What hardware does it run on?
The UI works on NVIDIA GPUs (CUDA), AMD GPUs (ROCm), Apple Silicon (MPS), and CPU-only setups. GPU acceleration dramatically improves inference speed. A minimum of 8GB VRAM is recommended for 7B parameter models.
Can I use LoRA adapters?
Yes. The UI supports loading LoRA adapters on top of base models. This lets you use fine-tuned models without merging the adapters, saving disk space and enabling quick switching.
How does it compare to Ollama?
Ollama provides a simpler CLI-focused experience for running models. text-generation-webui offers a richer web UI with more parameter controls, multiple backends, extension support, and model comparison features. Ollama is easier to set up; text-generation-webui provides more flexibility.
Citations (3)
- text-generation-webui GitHub — text-generation-webui is a Gradio UI for local LLMs
- text-generation-webui Wiki — Multiple backend support: llama.cpp, Transformers, ExLlamaV2
- llama.cpp GitHub — llama.cpp for efficient LLM inference