Candle — Minimalist Machine Learning Framework for Rust
Candle is a Rust-native ML framework focused on inference performance, small binaries, and serverless deployment. It runs Llama, Whisper, Stable Diffusion, and other PyTorch models in pure Rust — no Python required.
Agent-ready install
This asset can be installed once you have chosen a runtime, reviewed the plan, and run the matching command.
npx -y tokrepo@latest install b113c394-37db-11f1-9bc6-00163e2b0d79 --target codex
Run this after confirming the plan in dry-run mode.
What it is
Candle is a Rust-native machine learning framework focused on inference performance, small binaries, and serverless deployment. Built by Hugging Face, it runs Llama, Whisper, Stable Diffusion, and other PyTorch models in pure Rust without requiring Python.
Candle is designed for ML engineers and systems developers who need fast inference with minimal dependencies, especially for edge deployment, WebAssembly targets, or serverless functions where Python runtimes add overhead.
How it saves time or tokens
Candle produces small, self-contained binaries that start in milliseconds, whereas Python-based inference servers must first load an interpreter and heavy runtime libraries. With no Python dependency chain, there are no interpreter version conflicts or pip install failures, and builds stay predictable. CUDA and Metal support provides GPU acceleration that aims to be on par with PyTorch for supported models.
How to use
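- Add Candle to your Rust project (anyhow is included because the code below returns anyhow::Result):
    cargo add candle-core candle-nn candle-transformers anyhow
- Run a pre-built example from a clone of the Candle repository, where the examples live:
    cargo run --example llama -- --prompt 'Hello, world'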
- Use the tensor API:
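    use candle_core::{Device, Tensor};

    fn main() -> anyhow::Result<()> {
        let device = Device::Cpu;
        // Two 3x3 matrices sampled from a normal distribution (mean 0, std 1).
        let a = Tensor::randn(0f32, 1., (3, 3), &device)?;
        let b = Tensor::randn(0f32, 1., (3, 3), &device)?;
        // Matrix product; tensor operations return a Result, hence the `?`.
        let c = a.matmul(&b)?;
        println!("{c}");
        Ok(())
    }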
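Because the example above multiplies random matrices, its output changes on every run. As a hand-checkable illustration of what a matrix product on rank-2 tensors computes, here is a minimal sketch in plain Rust with no Candle dependency. The helper name matmul3 is hypothetical, and this is not how Candle implements the operation internally.

```rust
/// Row-by-column product of two 3x3 matrices: c[i][j] = sum over k of a[i][k] * b[k][j].
/// Plain-Rust illustration only; `matmul3` is a made-up helper, not a Candle API.
fn matmul3(a: &[[f32; 3]; 3], b: &[[f32; 3]; 3]) -> [[f32; 3]; 3] {
    let mut c = [[0f32; 3]; 3];
    for i in 0..3 {
        for j in 0..3 {
            for k in 0..3 {
                c[i][j] += a[i][k] * b[k][j];
            }
        }
    }
    c
}

fn main() {
    let a = [[1., 2., 3.], [4., 5., 6.], [7., 8., 9.]];
    let identity = [[1., 0., 0.], [0., 1., 0.], [0., 0., 1.]];
    // Multiplying by the identity matrix returns the original matrix.
    assert_eq!(matmul3(&a, &identity), a);

    // Multiplying by 2 * identity doubles every entry.
    let doubled = [[2., 0., 0.], [0., 2., 0.], [0., 0., 2.]];
    println!("{:?}", matmul3(&a, &doubled));
}
```

Comparing a Candle result against a small reference like this one (for example after converting the tensor to nested vectors) is one way to sanity-check shapes and operand order.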
Example
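// Load Whisper weights for speech-to-text.
// candle-transformers provides the model itself rather than a one-call transcribe
// helper: audio decoding, mel-spectrogram extraction, and the token decoding loop
// are implemented in the whisper example of the Candle repository.
// Also requires: cargo add serde_json
use candle_core::{DType, Device};
use candle_nn::VarBuilder;
use candle_transformers::models::whisper::{model::Whisper, Config};

// config_path and weights_path point to config.json and model.safetensors
// downloaded from a Whisper repository such as openai/whisper-base.
fn load_whisper(config_path: &str, weights_path: &str) -> anyhow::Result<Whisper> {
    let device = Device::Cpu;
    let config: Config = serde_json::from_str(&std::fs::read_to_string(config_path)?)?;
    // Memory-mapping is unsafe because the file must not change while it is mapped.
    let vb = unsafe { VarBuilder::from_mmaped_safetensors(&[weights_path], DType::F32, &device)? };
    Ok(Whisper::load(&vb, config)?)
}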
Related on TokRepo
- AI coding tools — ML and AI development frameworks
- Local LLM tools — running models locally
Common pitfalls
- Expecting full training support: Candle is optimized for inference, not large-scale training
- Not enabling the cuda or metal feature flags for GPU acceleration
- Trying to load PyTorch checkpoints directly without converting to safetensors format first
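For context on the safetensors pitfall: a safetensors file begins with an 8-byte little-endian length, followed by that many bytes of JSON describing each tensor, followed by the raw tensor bytes. The sketch below splits a buffer along those boundaries using only the standard library. It is not the safetensors crate's API, and the function names and the tensor "w" are made up for illustration.

```rust
/// Splits a safetensors byte buffer into its JSON header and raw tensor data.
/// Layout: 8-byte little-endian u64 header length N, N bytes of UTF-8 JSON, then tensor bytes.
/// Hypothetical helper for illustration; real code should use the safetensors crate.
fn split_safetensors(bytes: &[u8]) -> Result<(&str, &[u8]), String> {
    if bytes.len() < 8 {
        return Err("buffer shorter than the 8-byte length prefix".to_string());
    }
    let mut len_bytes = [0u8; 8];
    len_bytes.copy_from_slice(&bytes[..8]);
    let header_len = u64::from_le_bytes(len_bytes);
    let rest = &bytes[8..];
    if header_len > rest.len() as u64 {
        return Err("declared header length exceeds buffer size".to_string());
    }
    // The cast cannot truncate: header_len was just checked against rest.len().
    let (header, data) = rest.split_at(header_len as usize);
    let header = std::str::from_utf8(header).map_err(|e| e.to_string())?;
    Ok((header, data))
}

/// Builds a tiny in-memory file holding one made-up F32 tensor named "w" with shape [2].
fn demo_file() -> Vec<u8> {
    let header = r#"{"w":{"dtype":"F32","shape":[2],"data_offsets":[0,8]}}"#;
    let mut file = (header.len() as u64).to_le_bytes().to_vec();
    file.extend_from_slice(header.as_bytes());
    file.extend_from_slice(&1.0f32.to_le_bytes());
    file.extend_from_slice(&2.0f32.to_le_bytes());
    file
}

fn main() {
    let file = demo_file();
    let (header, data) = split_safetensors(&file).expect("demo file is well formed");
    println!("header: {header}");
    println!("tensor bytes: {}", data.len());
}
```

Because the header is plain JSON and the payload is raw bytes, a loader can read tensor metadata without running any code stored in the file, which is one reason the format is preferred over pickled checkpoints.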
Frequently asked questions
How does Candle differ from PyTorch?
Candle focuses on inference with small binaries and fast startup. PyTorch is a full training and inference framework with a massive ecosystem. Use Candle for production inference in Rust; use PyTorch for research and training.
Does Candle support GPU acceleration?
Yes. Candle supports CUDA (NVIDIA) and Metal (Apple Silicon) through feature flags. Enable the cuda or metal feature on the Candle crates in your Cargo.toml to use GPU acceleration for tensor operations and model inference.
Which models does Candle support?
Candle supports Llama, Mistral, Whisper, Stable Diffusion, BERT, T5, and many other transformer architectures. The candle-transformers crate provides pre-built model implementations.
Can Candle run in the browser with WebAssembly?
Yes. Candle's pure Rust implementation allows compilation to WASM for browser-based inference. This enables running ML models directly in the browser without a server.
Who maintains Candle?
Candle is maintained by Hugging Face as part of their Rust ML ecosystem. It integrates with the Hugging Face Hub for model downloads and uses the safetensors format for model weights.
Similar assets
tinygrad — Minimalist Deep Learning Framework
tinygrad is a minimalist deep learning framework in under 10,000 lines of code. It provides a simple, hackable tensor library with automatic differentiation and multi-backend support spanning CPU, GPU, Apple Metal, and custom accelerators.
ggml — Lightweight Tensor Library for Machine Learning in C
ggml is a pure C tensor library optimized for running machine learning models on CPUs and edge devices, providing the foundational compute layer used by llama.cpp, whisper.cpp, and other popular local AI inference tools.
Apache TVM — Open Machine Learning Compiler Framework
A compiler framework that optimizes and deploys machine learning models across CPUs, GPUs, and specialized accelerators with automated performance tuning.
PostgresML — Machine Learning Inside PostgreSQL
PostgresML brings machine learning directly into PostgreSQL, allowing you to train models, run inference, and manage embeddings using SQL. No separate ML infrastructure needed — your database is your ML engine.