Scripts · Jun 1, 2026 · 3 min read

LLMFit — Find Which LLM Runs on Your Hardware

A Rust CLI that scans your system specs and matches them against hundreds of models and providers to tell you what you can run locally.

Agent-ready installation

This asset can be installed after you choose a runtime, review the plan, and run the matching command.

  • Native · 98/100
  • Policy: allow
  • Agent surface: any MCP/CLI agent
  • Type: Skill
  • Installation: Single
  • Trust: Established
  • Entry: LLMFit Overview

Direct install command:

npx -y tokrepo@latest install a73d28f2-5df6-11f1-9bc6-00163e2b0d79 --target codex

Run this only after confirming the plan with a dry run.

Introduction

LLMFit is a single-command Rust CLI that inspects your CPU, GPU, and RAM, then reports which large language models you can run locally. It covers hundreds of models across the GGUF, SafeTensors, MLX, and Unsloth formats, which takes the guesswork out of local AI deployment.

What LLMFit Does

  • Detects available GPU VRAM, system RAM, and compute capabilities automatically
  • Matches hardware profile against a curated registry of models and providers
  • Recommends quantization levels (Q4, Q5, Q8, FP16) that fit within your memory budget
  • Supports NVIDIA CUDA, Apple Metal/MLX, AMD ROCm, and CPU-only setups
  • Outputs results as a ranked table or JSON for scripting
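
To make the quantization recommendation concrete, here is a rough Python sketch of the underlying arithmetic. It is not LLMFit's actual logic: the nominal bits per weight, the 1.2 overhead factor, and the function names `estimate_gb` and `best_quant` are all assumptions chosen for illustration.

```python
from typing import Optional

# Nominal bits per weight for each level (an approximation; real
# quantized files also store per-block scale data).
BITS_PER_WEIGHT = {"Q4": 4, "Q5": 5, "Q8": 8, "FP16": 16}

# Assumed multiplier for KV cache, activations, and runtime buffers.
OVERHEAD = 1.2


def estimate_gb(params_billion: float, quant: str) -> float:
    """Rough memory footprint in GB for a model at a given quantization."""
    # 1 billion parameters at 8 bits per weight is about 1 GB of weights.
    weight_gb = params_billion * BITS_PER_WEIGHT[quant] / 8
    return weight_gb * OVERHEAD


def best_quant(params_billion: float, budget_gb: float) -> Optional[str]:
    """Return the highest-precision level that fits the budget, or None."""
    for quant in ("FP16", "Q8", "Q5", "Q4"):
        if estimate_gb(params_billion, quant) <= budget_gb:
            return quant
    return None


if __name__ == "__main__":
    for budget in (4, 8, 24):
        print(f"7B model, {budget} GB budget -> {best_quant(7, budget)}")
```

Under these assumptions a 7B model needs roughly 4.2 GB at Q4 and 16.8 GB at FP16, so the recommended level depends heavily on how much memory is available.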

Architecture Overview

LLMFit is a single statically linked Rust binary with zero runtime dependencies. On launch it probes GPU and system info via platform APIs (CUDA, Metal, sysinfo), loads its model registry from an embedded catalog (updated via llmfit update), and runs a constraint solver to match model memory requirements against available resources. Results are streamed to stdout as either a human-readable table or structured JSON.
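
The matching step described above can be pictured with a small Python sketch. This is a simplified stand-in, not LLMFit's solver: the `Hardware` and `ModelEntry` types, the placement labels, and the "largest fitting model first" ranking are assumptions, and the catalog names used with it are placeholders.

```python
from dataclasses import dataclass


@dataclass
class Hardware:
    vram_gb: float  # total GPU memory; 0 for CPU-only machines
    ram_gb: float   # system memory


@dataclass
class ModelEntry:
    name: str           # placeholder name, not a real catalog entry
    required_gb: float  # estimated memory needed to load and run


def match_models(hw: Hardware, catalog: list[ModelEntry]) -> list[tuple[str, str]]:
    """Return (model name, placement) pairs, largest fitting model first."""
    results = []
    for model in sorted(catalog, key=lambda m: m.required_gb, reverse=True):
        if model.required_gb <= hw.vram_gb:
            results.append((model.name, "gpu"))
        elif model.required_gb <= hw.ram_gb:
            results.append((model.name, "cpu"))
        # Models that fit in neither memory pool are dropped.
    return results
```

A real solver would likely weigh more constraints, such as partial GPU offload or context length, but the core idea is the same: compare each model's memory requirement with what the machine has.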

Self-Hosting & Configuration

  • Install via cargo install llmfit or download a prebuilt binary from GitHub Releases
  • No server component or daemon required — purely a local CLI tool
  • Update the model registry: llmfit update
  • Override detected VRAM with --vram 24GB for planning on different hardware
  • Filter results by provider, format, or model family with CLI flags
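
An override such as --vram 24GB implies turning a size string into a number. The Python sketch below shows one way that could work. The accepted units and the helper name `parse_size_gb` are assumptions; the formats LLMFit itself accepts are not documented here.

```python
import re

# Assumed unit multipliers, expressed in GB.
_UNITS = {"MB": 1 / 1024, "GB": 1.0, "TB": 1024.0}


def parse_size_gb(text: str) -> float:
    """Parse strings such as '24GB' or '512 MB' into a number of gigabytes."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*(MB|GB|TB)\s*", text, re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    value, unit = match.groups()
    return float(value) * _UNITS[unit.upper()]
```

Rejecting a bare number like "24" avoids silently guessing the unit, which matters when planning for hardware you do not have on hand.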

Key Features

  • Single binary, zero dependencies — runs on Linux, macOS, and Windows
  • Covers GGUF (llama.cpp), SafeTensors (Hugging Face), MLX (Apple), and Unsloth formats
  • Hardware auto-detection for NVIDIA, AMD, Apple Silicon, and CPU
  • JSON output mode for CI/CD pipeline integration
  • Frequently updated model catalog with community contributions
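
For the CI/CD use case, a pipeline step could read the JSON report and fail when a required model does not fit. The Python sketch below assumes a report shape, since the real schema is not shown here: the field names "models", "name", and "fits" are hypothetical, as are the model names in the sample.

```python
import json


def fitting_models(report_json: str) -> set[str]:
    """Collect the names of models marked as fitting in a JSON report."""
    report = json.loads(report_json)
    # "models", "name", and "fits" are hypothetical field names.
    return {m["name"] for m in report["models"] if m["fits"]}


def gate(report_json: str, required: str) -> int:
    """Return a process exit code: 0 if the required model fits, 1 otherwise."""
    return 0 if required in fitting_models(report_json) else 1


if __name__ == "__main__":
    sample = (
        '{"models": ['
        '{"name": "example-7b-q4", "fits": true}, '
        '{"name": "example-70b-q4", "fits": false}]}'
    )
    raise SystemExit(gate(sample, "example-7b-q4"))
```

In a real pipeline the report would come from LLMFit's JSON output rather than a hard-coded sample, and the field names would need to match its actual schema.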

Comparison with Similar Tools

  • Ollama: a runtime that pulls and serves models; LLMFit only advises what fits and does not serve models
  • GPT4All: a bundled desktop app with a limited model selection; LLMFit covers broader registries
  • LM Studio: a GUI-based model browser; LLMFit is headless and scriptable
  • candle: a Rust inference library; LLMFit is a recommendation tool, not an inference engine

FAQ

Q: Does LLMFit download or run models? A: No. It only scans hardware and recommends compatible models. You still need a runtime like llama.cpp or Ollama to actually run them.

Q: How often is the model catalog updated? A: The embedded catalog ships with each release. Run llmfit update to pull the latest catalog between releases.

Q: Does it support multi-GPU setups? A: Yes. LLMFit detects all available GPUs and can recommend models that fit across combined VRAM.

Q: Is it free and open source? A: Yes. LLMFit is MIT-licensed and fully open source.
