# Llamafile — Run AI Models as Single Executables

> Package and run LLMs as single portable executables. Llamafile bundles model weights with llama.cpp into one file that runs on any OS without installation.

## Quick Use

```bash
# Download a llamafile (model + runtime in one file)
curl -LO https://huggingface.co/Mozilla/llamafile/resolve/main/llava-v1.5-7b-q4.llamafile
chmod +x llava-v1.5-7b-q4.llamafile
./llava-v1.5-7b-q4.llamafile
# Opens a browser at http://localhost:8080 — ready to chat
```

## What is Llamafile?

Llamafile packages LLMs into single executable files that run on any operating system. Built on llama.cpp and Cosmopolitan Libc, a llamafile is one file containing both the model weights and the inference engine. Download it, make it executable, run it — no Python, no Docker, no dependencies. It works on Windows, macOS, Linux, FreeBSD, and even OpenBSD.

**Answer-Ready**: Llamafile packages LLMs into single portable executables. One file runs on any OS — no Python, no Docker, no dependencies. Built by Mozilla on llama.cpp + Cosmopolitan Libc. Includes a web UI and an OpenAI-compatible API. 22k+ GitHub stars.

**Best for**: Developers who want zero-setup local AI inference.

**Works with**: Any OpenAI-compatible tool, Claude Code (as a local backend).

**Setup time**: Under 1 minute.

## Core Features

### 1. Zero Dependencies

```bash
# That's it. No pip, no conda, no brew.
./mistral-7b.llamafile --server --port 8080
```

### 2. OpenAI-Compatible API

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")
response = client.chat.completions.create(
    model="local",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)
```

### 3. Build Your Own Llamafile

```bash
# Package any GGUF model into a llamafile: start from the bare llamafile
# runtime binary, then embed the weights with the bundled zipalign tool
cp llamafile my-model.llamafile
zipalign -j0 my-model.llamafile my-model.gguf
```

### 4. GPU Acceleration
| Platform | Acceleration |
|----------|--------------|
| NVIDIA | CUDA (auto-detected) |
| Apple Silicon | Metal (auto-detected) |
| AMD | ROCm support |
| CPU | AVX/AVX2/AVX-512 |

## Llamafile vs Alternatives

| Feature | Llamafile | Ollama | Jan | LM Studio |
|---------|-----------|--------|-----|-----------|
| Single file | Yes | No (service) | No (app) | No (app) |
| No dependencies | Yes | Docker/binary | Electron | Electron |
| Cross-OS portable | Yes (same file) | Per-OS binary | Per-OS app | Per-OS app |
| Web UI included | Yes | No | Yes | Yes |
| API | OpenAI-compat | OpenAI-compat | OpenAI-compat | OpenAI-compat |

## FAQ

**Q: How big are llamafiles?**
A: About the same as the model weights — a 7B Q4 model is ~4GB. The runtime adds <10MB of overhead.

**Q: Can I use GPU acceleration?**
A: Yes, CUDA and Metal are auto-detected. Pass `--n-gpu-layers 999` to offload all layers.

**Q: Who maintains it?**
A: Mozilla's Innovation team; the project was built by Justine Tunney (creator of Cosmopolitan Libc).

## Source & Thanks

> Created by [Mozilla](https://github.com/Mozilla-Ocho). Licensed under Apache 2.0.
>
> [Mozilla-Ocho/llamafile](https://github.com/Mozilla-Ocho/llamafile) — 22k+ stars

---

Source: https://tokrepo.com/en/workflows/83ea12ae-8576-474f-b1ec-4ddbe0dd1804
Author: AI Open Source
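As a supplement to the OpenAI-compatible API example above, here is a minimal sketch of pulling the assistant reply out of a chat-completions response body without the `openai` client library. The payload below is fabricated for illustration; the field names follow the OpenAI chat-completions schema, which llamafile's server mirrors.

```python
import json

# Fabricated response body in the OpenAI chat-completions schema that the
# llamafile server mirrors; real servers also return id, model, and
# timing fields alongside these.
sample = json.dumps({
    "choices": [
        {"index": 0, "message": {"role": "assistant", "content": "Hello!"}}
    ],
    "usage": {"prompt_tokens": 5, "completion_tokens": 2, "total_tokens": 7},
})

def extract_reply(raw: str) -> str:
    """Pull the assistant text out of a chat-completions response body."""
    data = json.loads(raw)
    return data["choices"][0]["message"]["content"]

print(extract_reply(sample))  # Hello!
```

Because the schema is the same one every OpenAI-compatible tool speaks, this parsing works unchanged whether the backend is a llamafile, Ollama, or a hosted endpoint.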