Scripts · April 1, 2026 · 1 min read

ExLlamaV2 — Fast Quantized LLM Inference

ExLlamaV2 runs quantized LLMs on consumer GPUs with optimized CUDA kernels, supporting EXL2/GPTQ/HQQ quantization, PagedAttention, and speculative decoding.

TokRepo Picks · Community
## Quick Start

Use it first, then decide whether to dig deeper.

This section should tell both users and agents what to copy first, what to install, and where it goes.

```bash
pip install exllamav2
```

---
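Once installed, loading a model and generating text looks roughly like the sketch below. This is a hedged example based on the library's dynamic generator API; the model directory path is a placeholder, it requires an NVIDIA GPU plus downloaded quantized weights, and exact signatures may vary between versions.

```python
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2DynamicGenerator

# Placeholder path: point this at a local EXL2/GPTQ model directory
config = ExLlamaV2Config("/path/to/quantized-model")

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)  # allocate cache during autosplit load
model.load_autosplit(cache)               # split layers across available GPUs

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2DynamicGenerator(model=model, cache=cache, tokenizer=tokenizer)

output = generator.generate(prompt="Hello, my name is", max_new_tokens=50)
print(output)
```

Treat this as a starting point and check the repository's examples for the API of the version you install.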
## Introduction
ExLlamaV2 is a high-performance inference library for running quantized LLMs on consumer NVIDIA GPUs. It provides optimized CUDA kernels for fast token generation, EXL2/GPTQ/HQQ quantization, PagedAttention, dynamic batching, speculative decoding, and a built-in chat server. It is widely used as a backend in text-generation-webui.

**Best for**: Users running quantized LLMs on consumer GPUs
**Works with**: Claude Code, OpenAI Codex, Cursor, Gemini CLI, Windsurf

---
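PagedAttention, mentioned above, keeps the KV cache in fixed-size physical blocks allocated on demand rather than reserving memory for the maximum sequence length up front. The toy bookkeeping below (not ExLlamaV2's actual implementation, just an illustration of the idea) shows how a per-sequence block table maps growing sequences onto a shared pool of blocks.

```python
class PagedKVCache:
    """Toy PagedAttention bookkeeping: each sequence's KV cache is a list of
    fixed-size physical blocks grabbed from a shared free pool, so memory
    grows in block-sized steps and is returned when a sequence finishes."""

    def __init__(self, block_size=256, num_blocks=64):
        self.block_size = block_size
        self.free = list(range(num_blocks))  # ids of unused physical blocks
        self.tables = {}                     # seq_id -> list of block ids
        self.lengths = {}                    # seq_id -> tokens stored

    def append_token(self, seq_id):
        n = self.lengths.get(seq_id, 0)
        table = self.tables.setdefault(seq_id, [])
        if n % self.block_size == 0:         # current block full (or none yet)
            table.append(self.free.pop())    # allocate one more physical block
        self.lengths[seq_id] = n + 1

    def release(self, seq_id):
        # Return the sequence's blocks to the pool for reuse
        self.free.extend(self.tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)
```

With a block size of 4, storing 5 tokens needs exactly 2 blocks, and releasing the sequence returns both to the pool.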
## Key Features

- Optimized CUDA kernels
- EXL2, GPTQ, HQQ quantization
- PagedAttention for memory efficiency
- Dynamic batching and speculative decoding
- Built-in chat server
- text-generation-webui backend

---

### FAQ

**Q: What is ExLlamaV2?**
A: A fast inference library for quantized LLMs on consumer NVIDIA GPUs, with optimized CUDA kernels, EXL2/GPTQ/HQQ quantization, and PagedAttention.

**Q: How do I install it?**
A: `pip install exllamav2`. Requires an NVIDIA GPU.

---
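Speculative decoding, listed among the features above, works by having a small draft model propose several tokens which the large target model then verifies in one batched pass. The pure-Python sketch below illustrates only the greedy acceptance rule (not ExLlamaV2's internals): keep the longest prefix where draft and target agree, then substitute the target's own token at the first disagreement.

```python
def accept_speculated(draft, target):
    """Greedy speculative-decoding acceptance: `draft` is the token list
    proposed by the small model, `target` the tokens the large model would
    emit at the same positions. Returns the tokens actually committed."""
    accepted = []
    for d, t in zip(draft, target):
        if d == t:
            accepted.append(d)     # draft guessed right: token is free
        else:
            accepted.append(t)     # target's correction; stop speculating
            break
    return accepted
```

When the draft model is right, several tokens are committed for the cost of one target-model forward pass, which is where the speedup comes from.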
🙏

## Sources & Acknowledgements

> [turboderp/exllamav2](https://github.com/turboderp/exllamav2)

