Scripts · Mar 31, 2026 · 2 min read

MLC-LLM — Universal LLM Deployment Engine

Deploy any LLM on any hardware — phones, browsers, GPUs, CPUs. Compiles models for native performance on iOS, Android, WebGPU, CUDA, Metal, and Vulkan. 22K+ stars.

TokRepo Curated · Community
Quick Use

Use it first, then decide how deep to go

The commands below are all you need to install MLC-LLM and start chatting with a model locally.

pip install mlc-llm

# Download and run a model
mlc_llm chat HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC

Or run it in the browser at webllm.mlc.ai.


Intro

MLC-LLM is a universal deployment engine that runs large language models natively on any hardware. Using ML compilation (Apache TVM), it compiles LLMs for optimized inference on iOS, Android, WebGPU (browsers), CUDA, Metal, Vulkan, and CPUs. Run Llama, Mistral, Phi, Gemma, and other models at native speed everywhere — from phones to servers. 22,000+ GitHub stars, Apache 2.0.

Best for: Deploying LLMs on edge devices, mobile apps, browsers, and custom hardware
Works with: Llama 3, Mistral, Phi, Gemma, Qwen, StableLM, and 50+ models


Key Features

Universal Hardware Support

| Platform | Backend |
| --- | --- |
| NVIDIA GPU | CUDA |
| Apple Silicon | Metal |
| AMD GPU | Vulkan / ROCm |
| Browsers | WebGPU |
| iOS | Metal + Core ML |
| Android | Vulkan + OpenCL |
| CPU | x86 / ARM |

WebLLM (Browser)

Run LLMs entirely in the browser with WebGPU — no server needed:

import { CreateMLCEngine } from "@mlc-ai/web-llm";

const engine = await CreateMLCEngine("Llama-3-8B-Instruct-q4f16_1-MLC");
const reply = await engine.chat.completions.create({
  messages: [{ role: "user", content: "Hello!" }],
});
// Standard OpenAI-style response shape
console.log(reply.choices[0].message.content);

Quantization

4-bit and 3-bit quantization for running large models on consumer hardware.
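As a back-of-the-envelope illustration (not MLC-LLM's actual quantization kernels), 4-bit group quantization stores each weight as a 4-bit integer plus one scale per group, cutting an 8B-parameter fp16 model from roughly 16 GB to the 4–5 GB range. A minimal sketch of symmetric 4-bit group quantization:

```python
import numpy as np

def quantize_4bit(w: np.ndarray, group_size: int = 32):
    """Symmetric 4-bit group quantization: one scale per group,
    integer codes in [-8, 7]. Illustrative only."""
    groups = w.reshape(-1, group_size)
    # Map the largest |w| in each group to code 7 (tiny epsilon avoids /0)
    scale = np.abs(groups).max(axis=1, keepdims=True) / 7.0 + 1e-12
    q = np.clip(np.round(groups / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_4bit(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return (q.astype(np.float32) * scale).reshape(-1)

rng = np.random.default_rng(0)
w = rng.standard_normal(1024).astype(np.float32)
q, scale = quantize_4bit(w)
w_hat = dequantize_4bit(q, scale)
# Rounding error is bounded by half a quantization step per group
err = np.abs(w - w_hat).max()
```

At 4 bits per weight plus a 16-bit scale per 32-weight group, storage works out to about 4.5 bits per weight, versus 16 bits for fp16.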

OpenAI-Compatible API

REST server with OpenAI-compatible endpoints — drop-in replacement.
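Because the server mirrors OpenAI's endpoints, any OpenAI-compatible client works against it. A sketch using only the standard library, assuming a server started with `mlc_llm serve` (the host/port shown is an assumption; check your server's startup output):

```python
import json
import urllib.request

# Request body for the OpenAI-compatible /v1/chat/completions endpoint
payload = {
    "model": "HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": False,
}

def chat(url: str = "http://127.0.0.1:8000/v1/chat/completions") -> str:
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # Standard OpenAI response shape: first choice's message content
    return body["choices"][0]["message"]["content"]
```

The official `openai` Python client also works by pointing its `base_url` at the local server.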

Native Performance

ML compilation optimizes models for each specific hardware target, achieving near-optimal throughput.


FAQ

Q: What is MLC-LLM? A: A universal LLM deployment engine that compiles models for native performance on any hardware — phones, browsers, GPUs, and CPUs. 22K+ stars, Apache 2.0.

Q: Can I run Llama 3 on my iPhone? A: Yes, MLC-LLM compiles Llama 3 (quantized) for iOS with Metal acceleration. There's an iOS app available.



Source & Thanks

Created by MLC AI. Licensed under Apache 2.0. mlc-ai/mlc-llm — 22,000+ GitHub stars
