# MLC-LLM — Universal LLM Deployment Engine

> Deploy any LLM on any hardware — phones, browsers, GPUs, CPUs. Compiles models for native performance on iOS, Android, WebGPU, CUDA, Metal, and Vulkan. 22K+ stars.

## Install

```bash
pip install mlc-llm
```

## Quick Use

```bash
# Download and run a model
mlc_llm chat HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC
```

Or run in the browser at [webllm.mlc.ai](https://webllm.mlc.ai).

---

## Intro

MLC-LLM is a universal deployment engine that runs large language models natively on any hardware. Using ML compilation (Apache TVM), it compiles LLMs for optimized inference on iOS, Android, WebGPU (browsers), CUDA, Metal, Vulkan, and CPUs. Run Llama, Mistral, Phi, Gemma, and other models at native speed everywhere — from phones to servers. 22,000+ GitHub stars, Apache 2.0.

**Best for**: Deploying LLMs on edge devices, mobile apps, browsers, and custom hardware

**Works with**: Llama 3, Mistral, Phi, Gemma, Qwen, StableLM, and 50+ models

---

## Key Features

### Universal Hardware Support

| Platform | Backend |
|----------|---------|
| NVIDIA GPU | CUDA |
| Apple Silicon | Metal |
| AMD GPU | Vulkan/ROCm |
| Browsers | WebGPU |
| iOS | Metal + Core ML |
| Android | Vulkan + OpenCL |
| CPU | x86 / ARM |

### WebLLM (Browser)

Run LLMs entirely in the browser with WebGPU — no server needed:

```javascript
import { CreateMLCEngine } from "@mlc-ai/web-llm";

const engine = await CreateMLCEngine("Llama-3-8B-Instruct-q4f16_1-MLC");
const reply = await engine.chat.completions.create({
  messages: [{ role: "user", content: "Hello!" }],
});
```

### Quantization

4-bit and 3-bit quantization for running large models on consumer hardware.

### OpenAI-Compatible API

REST server with OpenAI-compatible endpoints — a drop-in replacement.

### Native Performance

ML compilation optimizes models for each specific hardware target, achieving near-optimal throughput.
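Because the REST server exposes OpenAI-compatible endpoints, it can be called with any standard HTTP client. A minimal sketch in Python, assuming a locally running server started with `mlc_llm serve` on `http://127.0.0.1:8000` (the host, port, and model ID here are illustrative assumptions; check `mlc_llm serve --help` for the actual options):

```python
import json
from urllib import request


def build_chat_request(model: str, user_message: str) -> dict:
    """Build an OpenAI-style chat completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }


def chat(base_url: str, payload: dict) -> dict:
    """POST a payload to an OpenAI-compatible /v1/chat/completions endpoint."""
    req = request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())


if __name__ == "__main__":
    # Assumes a server started beforehand, e.g.: mlc_llm serve <model>
    payload = build_chat_request(
        "HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC", "Hello!"
    )
    print(chat("http://127.0.0.1:8000", payload))
```

Because the wire format matches OpenAI's, existing OpenAI client libraries should also work by pointing their base URL at the local server.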
---

### FAQ

**Q: What is MLC-LLM?**
A: A universal LLM deployment engine that compiles models for native performance on any hardware — phones, browsers, GPUs, and CPUs. 22K+ stars, Apache 2.0.

**Q: Can I run Llama 3 on my iPhone?**
A: Yes, MLC-LLM compiles Llama 3 (quantized) for iOS with Metal acceleration. There's an iOS app available.

---

## Source & Thanks

> Created by [MLC AI](https://github.com/mlc-ai). Licensed under Apache 2.0.
> [mlc-ai/mlc-llm](https://github.com/mlc-ai/mlc-llm) — 22,000+ GitHub stars