Scripts · Mar 31, 2026 · 2 min read

llama.cpp — Run LLMs Locally in Pure C/C++

llama.cpp is a C/C++ LLM inference engine with 100K+ GitHub stars. It runs on CPUs, Apple Silicon, and NVIDIA and AMD GPUs, offers 1.5- to 8-bit quantization with no dependencies, and supports 50+ model architectures. MIT licensed.

Introduction

llama.cpp is a plain C/C++ implementation of LLM inference with zero dependencies, enabling efficient model execution across diverse hardware. With 100,000+ GitHub stars and an MIT license, it is the most popular local LLM inference engine. llama.cpp supports Apple Silicon (Metal), NVIDIA (CUDA), AMD (HIP), Intel, Vulkan, and CPU inference. It provides 1.5- to 8-bit quantization for faster inference with smaller models, supports 50+ model architectures (LLaMA, Mistral, Qwen, Gemma, Phi, and more), and includes an OpenAI-compatible API server.
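The OpenAI-compatible server (started with, for example, `llama-server -m model.gguf`, which listens on port 8080 by default) can be queried with any HTTP client. A minimal Python sketch, assuming such a server is running locally; the host URL and sampling parameters here are illustrative:

```python
import json
import urllib.request

def build_chat_request(prompt: str, host: str = "http://localhost:8080"):
    """Build an OpenAI-style chat completion request for a local llama-server."""
    payload = {
        "model": "local",  # llama-server answers with whatever model it loaded
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
        "max_tokens": 128,
    }
    req = urllib.request.Request(
        f"{host}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    return req, payload

# To actually query a running server:
# req, _ = build_chat_request("Why is the sky blue?")
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Because the endpoint mirrors the OpenAI API, existing OpenAI client libraries also work by pointing their base URL at the local server.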

Best for: Developers running LLMs locally on any hardware without cloud dependencies
Works with: Claude Code, OpenAI Codex, Cursor, Gemini CLI, Windsurf
Hardware: CPU, Apple Silicon, NVIDIA, AMD, Intel, Vulkan, RISC-V


Key Features

  • Zero dependencies: Pure C/C++ with no external libraries required
  • Universal hardware: CPU, Apple Metal, CUDA, HIP, Vulkan, SYCL, RISC-V
  • Multi-bit quantization: 1.5 to 8-bit for speed/quality tradeoff
  • 50+ model architectures: LLaMA, Mistral, Qwen, Gemma, Phi, multimodal models
  • OpenAI-compatible server: Drop-in replacement for local inference
  • CPU+GPU hybrid: Split models across CPU and GPU memory
  • GGUF format: Standard model format used across the ecosystem
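The quantization tradeoff above is easy to put numbers on. A back-of-the-envelope sketch of weight memory for a hypothetical 7B-parameter model; real GGUF files add metadata and per-block quantization scales, so actual sizes differ somewhat:

```python
def quantized_size_gib(n_params: float, bits_per_weight: float) -> float:
    """Rough weight-only memory estimate: ignores the KV cache,
    activations, and per-block quantization overhead."""
    return n_params * bits_per_weight / 8 / 2**30

# A 7B-parameter model at different precisions:
for bits in (16, 8, 4, 2):
    print(f"{bits:>2}-bit: ~{quantized_size_gib(7e9, bits):.1f} GiB")
```

Dropping from 16-bit to 4-bit weights cuts the footprint roughly fourfold, which is what makes larger models fit in consumer RAM or VRAM.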

FAQ

Q: What is llama.cpp? A: llama.cpp is a C/C++ LLM inference engine with 100K+ stars. Zero dependencies, runs on any hardware (CPU, Apple Silicon, NVIDIA, AMD), supports 50+ model architectures with 1.5-8 bit quantization. MIT licensed.

Q: How do I install llama.cpp? A: brew install llama.cpp on macOS/Linux, or build from source with cmake -B build && cmake --build build. Download GGUF models from Hugging Face.
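Downloaded GGUF files start with a 4-byte "GGUF" magic followed by a little-endian uint32 format version, so a quick sanity check is straightforward. A minimal sketch:

```python
import struct

GGUF_MAGIC = b"GGUF"  # first 4 bytes of every GGUF file

def read_gguf_header(path: str):
    """Return (magic_ok, version) from the first 8 bytes of a GGUF file."""
    with open(path, "rb") as f:
        magic = f.read(4)
        version = struct.unpack("<I", f.read(4))[0]  # uint32, little-endian
    return magic == GGUF_MAGIC, version

# ok, version = read_gguf_header("model.gguf")
```

This only validates the header; llama.cpp itself performs the full parse when loading the model.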



Source and acknowledgments

Created by Georgi Gerganov. Licensed under MIT. ggml-org/llama.cpp — 100,000+ GitHub stars
