Introduction
Candle is Hugging Face's answer to "what if PyTorch were Rust-native?" It's a minimalist ML framework written entirely in Rust and designed for production inference: small binaries, low memory use, easy WASM and serverless deployment, and fast startup — all advantages over Python-based stacks.
With over 20,000 GitHub stars, Candle ships reference implementations of Llama, Mistral, Qwen, Whisper, Stable Diffusion, BERT, and dozens of other models. It supports CUDA, Metal, MKL, and CPU backends.
What Candle Does
Candle provides PyTorch-like tensors and nn modules in Rust. The candle-transformers crate has reference implementations of popular architectures. Models load from safetensors files (the same format Hugging Face uses), so you can take any HF model checkpoint and run it from a Rust binary with no Python dependencies.
Architecture Overview
```text
Rust app
   |
candle-core          (Tensor, autograd, devices)
candle-nn            (Linear, LayerNorm, Embedding, ...)
candle-transformers  (model architectures: Llama, Qwen, Whisper, ...)
   |
Backend choice:
   CPU    (MKL / Accelerate / pure Rust)
   CUDA   (NVIDIA)
   Metal  (Apple Silicon)
   WebGPU (browser)
   |
Model weights: safetensors / GGUF
   |
Deployment:
   Standalone binary (small)
   WASM module (browser)
   Serverless (Lambda, Cloudflare Workers)
```

Self-Hosting & Configuration
```rust
// Run Llama-style inference from Rust
use candle_core::{Device, Tensor};
use candle_transformers::models::llama::{Cache, Config, Llama};
use hf_hub::api::sync::Api;

fn main() -> anyhow::Result<()> {
    let device = Device::cuda_if_available(0)?;
    let api = Api::new()?;
    let repo = api.model("meta-llama/Llama-3.2-1B".into());
    let weights = repo.get("model.safetensors")?;
    // ... load Config + tokenizer, build Llama, generate tokens
    Ok(())
}
```

```toml
# Cargo.toml — choose a backend feature
[dependencies]
candle-core = { version = "0.6", features = ["cuda"] }
candle-nn = { version = "0.6", features = ["cuda"] }
candle-transformers = "0.6"
# Or features = ["metal"] for Apple Silicon
# Or features = ["accelerate"] for the macOS Accelerate framework
```

Key Features
- Pure Rust — no Python, no PyTorch — easy to embed in any Rust binary
- Multi-backend — CPU, CUDA, Metal, MKL, accelerate
- PyTorch-like API — Tensor, nn modules, autograd familiar to PyTorch users
- Reference models — Llama, Mistral, Qwen, Whisper, SD, BERT, ViT
- safetensors / GGUF support — load existing HF weights or quantized models
- WASM / WebGPU — run models in browsers
- Serverless friendly — small binaries, fast cold start
- First-party HF integration — pull models via the hf-hub crate
Comparison with Similar Tools
| Feature | Candle | tch-rs | Burn | ONNX Runtime | llama.cpp |
|---|---|---|---|---|---|
| Language | Rust (native) | Rust (libtorch FFI) | Rust (native) | C++ + bindings | C/C++ |
| Python required | No | No | No | No | No |
| Backend | CPU/CUDA/Metal | CUDA/Metal via libtorch | CPU/CUDA/Metal/WebGPU | Many | CPU/CUDA/Metal |
| Training | Yes | Yes | Yes | No | No |
| Model breadth | Many (HF) | Any PyTorch model | Growing | ONNX zoo | Llama-family |
| Best For | Rust-native AI inference | PyTorch from Rust | Pure Rust deep learning | Cross-platform inference | Local LLMs |
FAQ
Q: Candle vs llama.cpp? A: llama.cpp is a focused C++ implementation of Llama-family models. Candle is a general Rust ML framework — broader model support, training capability, and Rust ecosystem integration. llama.cpp wins for pure CPU inference of supported models.
Q: Why pick Candle over PyTorch? A: Smaller deployment footprint, Rust-native (no Python runtime), ideal for Lambda/Workers/embedded. PyTorch wins for training and research; Candle wins for production inference in Rust ecosystems.
Q: Does it support training? A: Yes, basic training (autograd, optimizers, common modules). For large-scale distributed training, PyTorch + DeepSpeed is still more mature.
Q: What about model conversion?
A: Candle reads safetensors directly. For PyTorch checkpoints, convert them to safetensors with the safetensors Python library. GGUF (quantized) files are supported for Llama-family models.
Sources
- GitHub: https://github.com/huggingface/candle
- Company: Hugging Face
- License: Apache-2.0