## Key Features

### Universal Hardware Support
| Platform | Backend |
|---|---|
| NVIDIA GPU | CUDA |
| Apple Silicon | Metal |
| AMD GPU | Vulkan/ROCm |
| Browsers | WebGPU |
| iOS | Metal + Core ML |
| Android | Vulkan + OpenCL |
| CPU | x86 / ARM |
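The Browsers row relies on WebGPU being available. A minimal feature check, written against the standard `navigator.gpu` entry point; the helper name `hasWebGPU` and the injectable parameter are our own sketch, not part of any MLC-LLM API:

```typescript
// WebGPU feature check. `navigator.gpu` is the standard WebGPU entry
// point and is undefined in environments without WebGPU support.
// The navigator-like parameter is injectable so the check is testable
// outside a browser.
function hasWebGPU(nav: { gpu?: unknown } = (globalThis as any).navigator ?? {}): boolean {
  return nav.gpu !== undefined;
}
```

In a real page you would call `hasWebGPU()` with no argument and fall back to a server-side endpoint when it returns `false`.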
### WebLLM (Browser)

Run LLMs entirely in the browser with WebGPU — no server needed:
```typescript
import { CreateMLCEngine } from "@mlc-ai/web-llm";

const engine = await CreateMLCEngine("Llama-3-8B-Instruct-q4f16_1-MLC");
const reply = await engine.chat.completions.create({
  messages: [{ role: "user", content: "Hello!" }],
});
```

### Quantization
4-bit and 3-bit quantization for running large models on consumer hardware.
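To see why this matters, here is a back-of-envelope weight-memory estimate. The 8B parameter count and the weights-only accounting are simplifying assumptions; real runtime memory also includes the KV cache, activations, and quantization metadata:

```typescript
// Rough memory needed to hold model weights at a given precision.
// bits * params / 8 gives bytes; divide by 1024^3 for GiB.
function weightMemoryGiB(params: number, bitsPerWeight: number): number {
  return (params * bitsPerWeight) / 8 / 1024 ** 3;
}

const PARAMS_8B = 8e9; // e.g. Llama-3-8B

console.log(weightMemoryGiB(PARAMS_8B, 16).toFixed(1)); // "14.9" (fp16)
console.log(weightMemoryGiB(PARAMS_8B, 4).toFixed(1));  // "3.7"  (4-bit)
console.log(weightMemoryGiB(PARAMS_8B, 3).toFixed(1));  // "2.8"  (3-bit)
```

The drop from ~15 GiB to under 4 GiB is what makes an 8B model feasible on phones and consumer GPUs.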
### OpenAI-Compatible API

A REST server exposes OpenAI-compatible endpoints, so it works as a drop-in replacement for the OpenAI API.
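A sketch of calling such a server from TypeScript. The base URL, port, and model name are assumptions; they depend on how you launched the server locally:

```typescript
// Assumed local server address; adjust to your own configuration.
const BASE_URL = "http://127.0.0.1:8000/v1";

// Build a request body in the OpenAI chat-completions JSON shape.
function buildChatRequest(model: string, prompt: string) {
  return {
    model,
    messages: [{ role: "user" as const, content: prompt }],
    stream: false,
  };
}

// POST to the OpenAI-compatible /chat/completions endpoint and
// return the assistant's reply text.
async function chat(prompt: string): Promise<string> {
  const res = await fetch(`${BASE_URL}/chat/completions`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(
      buildChatRequest("Llama-3-8B-Instruct-q4f16_1-MLC", prompt)
    ),
  });
  const data = await res.json();
  return data.choices[0].message.content;
}
```

Because the request and response shapes match the OpenAI API, existing OpenAI client libraries can also be pointed at the local base URL instead of hand-rolling `fetch` calls.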
### Native Performance
ML compilation optimizes models for each specific hardware target, achieving near-optimal throughput.
## FAQ
**Q: What is MLC-LLM?**
A: A universal LLM deployment engine that compiles models for native performance on any hardware: phones, browsers, GPUs, and CPUs. It has 22K+ GitHub stars and is Apache 2.0 licensed.
**Q: Can I run Llama 3 on my iPhone?**
A: Yes. MLC-LLM compiles quantized Llama 3 builds for iOS with Metal acceleration, and an iOS app is available.