Apr 14, 2026 · 3 min read

Candle — Minimalist Machine Learning Framework for Rust

Candle is a Rust-native ML framework focused on inference performance, small binaries, and serverless deployment. It runs Llama, Whisper, Stable Diffusion, and other PyTorch models in pure Rust — no Python required.

AI Open Source · Community

Introduction

Candle is Hugging Face's answer to "what if PyTorch were Rust-native?" It's a minimalist ML framework written entirely in Rust, designed for production inference: small binaries, low memory use, easy WASM/serverless deployment, and fast startup, all advantages over Python-based stacks.

With over 20,000 GitHub stars, Candle ships reference implementations of Llama, Mistral, Qwen, Whisper, Stable Diffusion, BERT, and dozens of other models. It supports CUDA, Metal, MKL, and CPU backends.

What Candle Does

Candle provides PyTorch-like tensors and nn modules in Rust. The candle-transformers crate has reference implementations of popular architectures. Models load from safetensors files (the same format Hugging Face uses), so you can take any HF model checkpoint and run it from a Rust binary with no Python dependencies.

Architecture Overview

Rust app
      |
candle-core         (Tensor, autograd, devices)
candle-nn           (Linear, LayerNorm, Embedding, ...)
candle-transformers (model architectures: Llama, Qwen, Whisper, ...)
      |
Backend choice:
  CPU (MKL / accelerate / pure Rust)
  CUDA (NVIDIA)
  Metal (Apple Silicon)
  WebGPU (browser)
      |
Model weights: safetensors / GGUF
      |
Deployment:
  Standalone binary (small)
  WASM module (browser)
  Serverless (Lambda, Cloudflare Workers)

Self-Hosting & Configuration

// Run Llama-style inference from Rust
use candle_core::{Device, Tensor};
use candle_transformers::models::llama::{Llama, Config, Cache};
use hf_hub::api::sync::Api;

fn main() -> anyhow::Result<()> {
    let device = Device::cuda_if_available(0)?;
    let api = Api::new()?;
    let repo = api.model("meta-llama/Llama-3.2-1B".into());
    let weights = repo.get("model.safetensors")?;
    // ... load Config + tokenizer, build Llama, generate tokens
    Ok(())
}
# Cargo.toml — choose a backend feature
[dependencies]
candle-core = { version = "0.6", features = ["cuda"] }
candle-nn = { version = "0.6", features = ["cuda"] }
candle-transformers = "0.6"
# Or features = ["metal"] for Apple Silicon
# Or features = ["accelerate"] for macOS Accelerate framework

Key Features

  • Pure Rust — no Python, no PyTorch — easy to embed in any Rust binary
  • Multi-backend — CPU, CUDA, Metal, MKL, accelerate
  • PyTorch-like API — Tensor, nn modules, autograd familiar to PyTorch users
  • Reference models — Llama, Mistral, Qwen, Whisper, SD, BERT, ViT
  • safetensors / GGUF support — load existing HF weights or quantized models
  • WASM / WebGPU — run models in browsers
  • Serverless friendly — small binaries, fast cold start
  • First-party HF integration — pull models via hf-hub crate

Comparison with Similar Tools

Feature         | Candle                   | tch-rs                  | Burn                    | ONNX Runtime             | llama.cpp
Language        | Rust (native)            | Rust (libtorch FFI)     | Rust (native)           | C++ + bindings           | C/C++
Python required | No                       | No                      | No                      | No                       | No
Backends        | CPU/CUDA/Metal           | CUDA/Metal via libtorch | CPU/CUDA/Metal/WebGPU   | Many                     | CPU/CUDA/Metal
Training        | Yes                      | Yes                     | Yes                     | No                       | No
Model breadth   | Many (HF)                | Any PyTorch model       | Growing                 | ONNX zoo                 | Llama-family
Best for        | Rust-native AI inference | PyTorch from Rust       | Pure Rust deep learning | Cross-platform inference | Local LLMs

FAQ

Q: Candle vs llama.cpp? A: llama.cpp is a focused C++ implementation of Llama-family models. Candle is a general Rust ML framework — broader model support, training capability, and Rust ecosystem integration. llama.cpp wins for pure CPU inference of supported models.

Q: Why pick Candle over PyTorch? A: Smaller deployment footprint, Rust-native (no Python runtime), ideal for Lambda/Workers/embedded. PyTorch wins for training and research; Candle wins for production inference in Rust ecosystems.

Q: Does it support training? A: Yes, basic training (autograd, optimizers, common modules). For large-scale distributed training, PyTorch + DeepSpeed is still more mature.

Q: What about model conversion? A: Candle reads safetensors directly. For PyTorch checkpoints, convert them first with the safetensors Python library. GGUF (quantized) is supported for Llama-family models.
