Scripts · Apr 1, 2026 · 1 min read

ExLlamaV2 — Fast Quantized LLM Inference

ExLlamaV2 runs quantized LLMs on consumer GPUs with optimized CUDA kernels, supporting EXL2/GPTQ/HQQ quantization, PagedAttention, and speculative decoding.

Introduction

ExLlamaV2 is a high-performance inference library for running quantized LLMs on consumer NVIDIA GPUs. It provides optimized CUDA kernels for fast token generation, support for EXL2, GPTQ, and HQQ quantization, PagedAttention, dynamic batching, speculative decoding, and a built-in chat server. It is widely used as a backend in text-generation-webui.

Best for: Users running quantized LLMs on consumer GPUs
Works with: Claude Code, OpenAI Codex, Cursor, Gemini CLI, Windsurf


Key Features

  • Optimized CUDA kernels
  • EXL2, GPTQ, HQQ quantization
  • PagedAttention for memory efficiency
  • Dynamic batching and speculative decoding
  • Built-in chat server
  • text-generation-webui backend
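
The features above are exposed through the library's Python API. A minimal single-prompt sketch using the dynamic generator might look like the following; the model path is a placeholder, and an NVIDIA GPU plus a locally downloaded EXL2-quantized model are assumed:

```python
# Minimal ExLlamaV2 generation sketch (assumes an NVIDIA GPU and an
# EXL2-quantized model directory; the path below is a placeholder).
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2DynamicGenerator

config = ExLlamaV2Config("/path/to/exl2-model")   # placeholder model directory
model = ExLlamaV2(config)

# Lazy cache + autosplit spreads the model across available GPU memory.
cache = ExLlamaV2Cache(model, lazy=True)
model.load_autosplit(cache)

tokenizer = ExLlamaV2Tokenizer(config)

# The dynamic generator handles batching/paging internally.
generator = ExLlamaV2DynamicGenerator(model=model, cache=cache, tokenizer=tokenizer)
output = generator.generate(prompt="Once upon a time,", max_new_tokens=128)
print(output)
```

The same generator object can accept multiple concurrent jobs, which is where the dynamic batching and paged cache pay off.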

FAQ

Q: What is ExLlamaV2? A: A fast inference library for running quantized LLMs on consumer NVIDIA GPUs, built on optimized CUDA kernels with EXL2/GPTQ/HQQ quantization and PagedAttention.

Q: How do I install it? A: Run pip install exllamav2. An NVIDIA GPU with a working CUDA setup is required.


