Scripts · Mar 31, 2026 · 2 min read

llama.cpp — Run LLMs Locally in Pure C/C++

llama.cpp is a C/C++ LLM inference engine with 100K+ GitHub stars. It runs on CPUs, Apple Silicon, and NVIDIA and AMD GPUs, offers 1.5- to 8-bit quantization with no dependencies, and supports 50+ model architectures. MIT licensed.

Introduction

llama.cpp is a plain C/C++ implementation of LLM inference with zero dependencies, enabling efficient model execution across diverse hardware. With 100,000+ GitHub stars and an MIT license, it is the most popular local LLM inference engine. llama.cpp supports Apple Silicon (Metal), NVIDIA (CUDA), AMD (HIP), Intel, Vulkan, and CPU inference. It provides 1.5- to 8-bit quantization for faster inference with smaller models, supports 50+ model architectures (LLaMA, Mistral, Qwen, Gemma, Phi, and more), and includes an OpenAI-compatible API server.
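The OpenAI-compatible server (started with, for example, `llama-server -m model.gguf`, which listens on port 8080 by default) can be queried with any HTTP client. A minimal Python sketch, assuming such a server is running locally; the host URL and sampling parameters here are illustrative:

```python
import json
import urllib.request

def build_chat_request(prompt: str, host: str = "http://localhost:8080"):
    """Build an OpenAI-style chat completion request for a local llama-server."""
    payload = {
        "model": "local",  # llama-server answers with whatever model it loaded
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
        "max_tokens": 128,
    }
    req = urllib.request.Request(
        f"{host}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    return req, payload

# To actually query a running server:
# req, _ = build_chat_request("Why is the sky blue?")
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Because the endpoint mirrors the OpenAI API, existing OpenAI client libraries also work by pointing their base URL at the local server.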

Best for: Developers running LLMs locally on any hardware without cloud dependencies
Works with: Claude Code, OpenAI Codex, Cursor, Gemini CLI, Windsurf
Hardware: CPU, Apple Silicon, NVIDIA, AMD, Intel, Vulkan, RISC-V


Key Features

  • Zero dependencies: Pure C/C++ with no external libraries required
  • Universal hardware: CPU, Apple Metal, CUDA, HIP, Vulkan, SYCL, RISC-V
  • Multi-bit quantization: 1.5 to 8-bit for speed/quality tradeoff
  • 50+ model architectures: LLaMA, Mistral, Qwen, Gemma, Phi, multimodal models
  • OpenAI-compatible server: Drop-in replacement for local inference
  • CPU+GPU hybrid: Split models across CPU and GPU memory
  • GGUF format: Standard model format used across the ecosystem
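The quantization tradeoff above is easy to put numbers on. A back-of-the-envelope sketch of weight memory for a hypothetical 7B-parameter model; real GGUF files add metadata and per-block quantization scales, so actual sizes differ somewhat:

```python
def quantized_size_gib(n_params: float, bits_per_weight: float) -> float:
    """Rough weight-only memory estimate: ignores the KV cache,
    activations, and per-block quantization overhead."""
    return n_params * bits_per_weight / 8 / 2**30

# A 7B-parameter model at different precisions:
for bits in (16, 8, 4, 2):
    print(f"{bits:>2}-bit: ~{quantized_size_gib(7e9, bits):.1f} GiB")
```

Dropping from 16-bit to 4-bit weights cuts the footprint roughly fourfold, which is what makes larger models fit in consumer RAM or VRAM.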

FAQ

Q: What is llama.cpp? A: llama.cpp is a C/C++ LLM inference engine with 100K+ stars. Zero dependencies, runs on any hardware (CPU, Apple Silicon, NVIDIA, AMD), supports 50+ model architectures with 1.5-8 bit quantization. MIT licensed.

Q: How do I install llama.cpp? A: brew install llama.cpp on macOS/Linux, or build from source with cmake -B build && cmake --build build. Download GGUF models from Hugging Face.
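Downloaded GGUF files start with a 4-byte "GGUF" magic followed by a little-endian uint32 format version, so a quick sanity check is straightforward. A minimal sketch:

```python
import struct

GGUF_MAGIC = b"GGUF"  # first 4 bytes of every GGUF file

def read_gguf_header(path: str):
    """Return (magic_ok, version) from the first 8 bytes of a GGUF file."""
    with open(path, "rb") as f:
        magic = f.read(4)
        version = struct.unpack("<I", f.read(4))[0]  # uint32, little-endian
    return magic == GGUF_MAGIC, version

# ok, version = read_gguf_header("model.gguf")
```

This only validates the header; llama.cpp itself performs the full parse when loading the model.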



Source and acknowledgments

Created by Georgi Gerganov. Licensed under MIT. ggml-org/llama.cpp — 100,000+ GitHub stars
