# MLC-LLM — Universal LLM Deployment Engine

> Deploy any LLM on any hardware — phones, browsers, GPUs, CPUs. Compiles models for native performance on iOS, Android, WebGPU, CUDA, Metal, and Vulkan. 22K+ stars.

## Install

```bash
pip install mlc-llm
```

## Quick Use

```bash
# Download and run a model
mlc_llm chat HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC
```

Or run in the browser at [webllm.mlc.ai](https://webllm.mlc.ai).

---

## Intro

MLC-LLM is a universal deployment engine that runs large language models natively on any hardware. Using ML compilation (Apache TVM), it compiles LLMs for optimized inference on iOS, Android, WebGPU (browsers), CUDA, Metal, Vulkan, and CPUs. Run Llama, Mistral, Phi, Gemma, and other models at native speed everywhere — from phones to servers. 22,000+ GitHub stars, Apache 2.0.

**Best for**: Deploying LLMs on edge devices, mobile apps, browsers, and custom hardware

**Works with**: Llama 3, Mistral, Phi, Gemma, Qwen, StableLM, and 50+ models

---

## Key Features

### Universal Hardware Support

| Platform | Backend |
|----------|---------|
| NVIDIA GPU | CUDA |
| Apple Silicon | Metal |
| AMD GPU | Vulkan/ROCm |
| Browsers | WebGPU |
| iOS | Metal + Core ML |
| Android | Vulkan + OpenCL |
| CPU | x86 / ARM |

### WebLLM (Browser)

Run LLMs entirely in the browser with WebGPU — no server needed:

```javascript
import { CreateMLCEngine } from "@mlc-ai/web-llm";

const engine = await CreateMLCEngine("Llama-3-8B-Instruct-q4f16_1-MLC");
const reply = await engine.chat.completions.create({
  messages: [{ role: "user", content: "Hello!" }],
});
```

### Quantization

4-bit and 3-bit quantization for running large models on consumer hardware.

### OpenAI-Compatible API

REST server with OpenAI-compatible endpoints — a drop-in replacement.

### Native Performance

ML compilation optimizes models for each specific hardware target, achieving near-optimal throughput.
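Because the REST server exposes OpenAI-compatible endpoints, it can be called with any standard HTTP client. A minimal sketch in Python, assuming a locally running server started with `mlc_llm serve` on `http://127.0.0.1:8000` (the host, port, and model ID here are illustrative assumptions; check `mlc_llm serve --help` for the actual options):

```python
import json
from urllib import request


def build_chat_request(model: str, user_message: str) -> dict:
    """Build an OpenAI-style chat completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }


def chat(base_url: str, payload: dict) -> dict:
    """POST a payload to an OpenAI-compatible /v1/chat/completions endpoint."""
    req = request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())


if __name__ == "__main__":
    # Assumes a server started beforehand, e.g.: mlc_llm serve <model>
    payload = build_chat_request(
        "HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC", "Hello!"
    )
    print(chat("http://127.0.0.1:8000", payload))
```

Because the wire format matches OpenAI's, existing OpenAI client libraries should also work by pointing their base URL at the local server.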
---

### FAQ

**Q: What is MLC-LLM?**
A: A universal LLM deployment engine that compiles models for native performance on any hardware — phones, browsers, GPUs, and CPUs. 22K+ stars, Apache 2.0.

**Q: Can I run Llama 3 on my iPhone?**
A: Yes, MLC-LLM compiles Llama 3 (quantized) for iOS with Metal acceleration. There's an iOS app available.

---

## Source & Thanks

> Created by [MLC AI](https://github.com/mlc-ai). Licensed under Apache 2.0.
> [mlc-ai/mlc-llm](https://github.com/mlc-ai/mlc-llm) — 22,000+ GitHub stars