Is ExLlamaV2 — Fast Quantized LLM Inference free to use?

Yes. ExLlamaV2 — Fast Quantized LLM Inference is freely available on TokRepo. Check the Source & Thanks section on the asset page for the specific open-source license.

How do I install ExLlamaV2 — Fast Quantized LLM Inference?

Visit the asset page on TokRepo and click "Copy for agent" to get the installation instructions. Most assets can be installed with a single command.

Scripts · Apr 1, 2026 · 1 min read

ExLlamaV2 — Fast Quantized LLM Inference

Name: ExLlamaV2 — Fast Quantized LLM Inference
Author: Script Depot

ExLlamaV2 runs quantized LLMs on consumer GPUs with optimized CUDA kernels. It supports EXL2, GPTQ, and HQQ quantization, PagedAttention, and speculative decoding.

Script Depot · Community

Introduction

ExLlamaV2 is a high-performance inference library for running quantized LLMs on consumer NVIDIA GPUs. It provides optimized CUDA kernels for fast token generation, support for EXL2, GPTQ, and HQQ quantization, PagedAttention, dynamic batching, speculative decoding, and a built-in chat server. It is widely used as a backend in text-generation-webui.

Best for: Users running quantized LLMs on consumer GPUs
Works with: Claude Code, OpenAI Codex, Cursor, Gemini CLI, Windsurf

Key Features

Optimized CUDA kernels
EXL2, GPTQ, HQQ quantization
PagedAttention for memory efficiency
Dynamic batching and speculative decoding
Built-in chat server
text-generation-webui backend
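
To make the benefit of quantization on a consumer GPU concrete, the sketch below estimates how much memory the weights alone occupy at a given average number of bits per weight. It is plain arithmetic (parameters × bits ÷ 8), not part of ExLlamaV2; the function name `weight_memory_gib` and the 7-billion-parameter example are assumptions chosen for illustration, and the estimate ignores the attention cache, activations, and runtime overhead.

```python
def weight_memory_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GiB.

    Hypothetical helper for illustration: it does not account for the
    key/value cache, activations, or any framework overhead.
    """
    total_bytes = n_params * bits_per_weight / 8  # 8 bits per byte
    return total_bytes / 1024**3                  # bytes -> GiB


if __name__ == "__main__":
    # Assumed example: a model with 7 billion parameters.
    for bpw in (16.0, 8.0, 4.0):
        print(f"{bpw} bits/weight: {weight_memory_gib(7e9, bpw):.2f} GiB")
```

Under these assumptions, a 7-billion-parameter model needs roughly 13 GiB for 16-bit weights but only about 3.26 GiB at 4 bits per weight, which is why quantized models fit on GPUs with far less VRAM.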

FAQ

Q: What is ExLlamaV2?
A: A fast inference library for quantized LLMs on consumer NVIDIA GPUs, with optimized CUDA kernels, EXL2/GPTQ/HQQ support, and PagedAttention.

Q: How do I install it?
A: Run pip install exllamav2. An NVIDIA GPU is required.
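
Once the package is installed, a first generation call might look like the sketch below. It follows the pattern used in the upstream examples for the dynamic generator, but class names and arguments can differ between releases, so treat it as an assumption to check against the installed version. The wrapper `generate_once` and its parameters are hypothetical, and `model_dir` is assumed to point at a local directory containing an already quantized model. The dynamic generator may also need a compatible attention backend to be installed.

```python
def generate_once(model_dir: str, prompt: str, max_new_tokens: int = 200) -> str:
    """Load a quantized model from model_dir and complete a single prompt.

    Hypothetical wrapper for illustration; requires an NVIDIA GPU and the
    exllamav2 package. Imports are done lazily so this file can be loaded
    on a machine where the library is not installed.
    """
    from exllamav2 import (
        ExLlamaV2,
        ExLlamaV2Cache,
        ExLlamaV2Config,
        ExLlamaV2Tokenizer,
    )
    from exllamav2.generator import ExLlamaV2DynamicGenerator

    config = ExLlamaV2Config(model_dir)       # read the model's configuration
    model = ExLlamaV2(config)
    cache = ExLlamaV2Cache(model, lazy=True)  # allocate the cache during loading
    model.load_autosplit(cache)               # load weights across available GPUs
    tokenizer = ExLlamaV2Tokenizer(config)

    generator = ExLlamaV2DynamicGenerator(
        model=model, cache=cache, tokenizer=tokenizer
    )
    return generator.generate(
        prompt=prompt, max_new_tokens=max_new_tokens, add_bos=True
    )
```

A call such as `generate_once("/path/to/model", "Once upon a time,")` would return the completed text as a string; the path here is a placeholder, not a real location.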

Source & Thanks

turboderp/exllamav2

Related assets

Unkey — Open-Source API Key Management Platform

Flagsmith — Open-Source Feature Flags and Remote Config

OpenStatus — Open-Source Monitoring and Status Page Platform