Scripts · May 4, 2026 · 2 min read

LLMFit — Find What Models Run on Your Hardware

A Rust CLI that scans your system specs and matches them against hundreds of LLMs and providers to tell you exactly what you can run locally.

Introduction

LLMFit is an open-source Rust CLI that detects your hardware capabilities and recommends which LLMs you can run locally. It eliminates the guesswork of matching GPU VRAM, RAM, and compute power to specific model sizes and quantization levels.

What LLMFit Does

  • Scans system hardware (GPU VRAM, RAM, CPU cores, disk space)
  • Matches against a registry of hundreds of models across providers
  • Recommends optimal quantization formats (GGUF, MLX, GPTQ) per model
  • Filters by provider compatibility (Ollama, llama.cpp, vLLM, MLX)
  • Outputs structured JSON for scripting and automation
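That last point is worth a concrete example. Below is a minimal sketch of consuming a saved LLMFit JSON report from another Rust program using the serde and serde_json crates. The field names (gpu_vram_gb, recommendations, and so on) are hypothetical placeholders rather than LLMFit's documented schema, so adjust the structs to match the real output.

```rust
// Sketch: consuming a saved LLMFit JSON report in a script.
// NOTE: these field names are hypothetical, not LLMFit's documented schema.
use serde::Deserialize;

#[derive(Debug, Deserialize)]
struct Recommendation {
    model: String,        // e.g. "llama-3.1-8b"
    quantization: String, // e.g. "Q4_K_M"
    est_vram_gb: f64,     // estimated VRAM footprint in GiB
}

#[derive(Debug, Deserialize)]
struct Report {
    gpu_vram_gb: f64,
    ram_gb: f64,
    recommendations: Vec<Recommendation>,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Read a report previously saved from LLMFit's JSON output.
    let raw = std::fs::read_to_string("report.json")?;
    let report: Report = serde_json::from_str(&raw)?;

    println!("VRAM: {:.1} GiB, RAM: {:.1} GiB", report.gpu_vram_gb, report.ram_gb);
    for rec in &report.recommendations {
        println!("{} ({}) ~{:.1} GiB", rec.model, rec.quantization, rec.est_vram_gb);
    }
    Ok(())
}
```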

Architecture Overview

LLMFit is a single Rust binary with no runtime dependencies. It queries system hardware via platform-native APIs (NVML for NVIDIA, Metal for Apple Silicon), then cross-references a bundled model registry that maps each model variant to its memory and compute requirements. The registry is updated via a simple pull mechanism from the upstream repository.
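To make the detection step concrete, here is a rough sketch of NVML-based VRAM discovery using the nvml-wrapper crate, including the multi-GPU aggregation mentioned in the FAQ below. LLMFit's actual implementation may use different crates or bindings; treat this as an outline of the technique rather than the project's code.

```rust
// Sketch: NVML-based VRAM detection for NVIDIA GPUs via the nvml-wrapper crate.
// Illustrates the general approach; LLMFit's internals may differ.
use nvml_wrapper::Nvml;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let nvml = Nvml::init()?;            // load and initialize the NVML library
    let count = nvml.device_count()?;

    let mut total_vram_bytes: u64 = 0;
    for idx in 0..count {
        let device = nvml.device_by_index(idx)?;
        let mem = device.memory_info()?; // total/free/used VRAM in bytes
        println!(
            "GPU {}: {} ({:.1} GiB total, {:.1} GiB free)",
            idx,
            device.name()?,
            mem.total as f64 / 1024f64.powi(3),
            mem.free as f64 / 1024f64.powi(3),
        );
        total_vram_bytes += mem.total;   // aggregate across GPUs for split-model fits
    }
    println!(
        "Aggregate VRAM: {:.1} GiB",
        total_vram_bytes as f64 / 1024f64.powi(3)
    );
    Ok(())
}
```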

Self-Hosting & Configuration

  • Install via cargo or download prebuilt binaries from GitHub Releases
  • No external services or API keys required
  • Configure custom model registries via TOML files (a schema sketch follows this list)
  • Supports offline operation with bundled model database
  • Works on Linux, macOS (Intel and Apple Silicon), and Windows
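For the custom-registry point above, the sketch below shows one way such a TOML file could be structured and parsed with the serde and toml crates. The schema (field names and layout) is an illustrative assumption; check the project's documentation for the actual registry format.

```rust
// Sketch: a hypothetical custom model-registry entry and how it might be parsed.
// The schema below is illustrative only, not LLMFit's documented format.
use serde::Deserialize;

#[derive(Debug, Deserialize)]
struct ModelEntry {
    name: String,               // e.g. "mistral-7b-instruct"
    params_b: f64,              // parameter count in billions
    quantizations: Vec<String>, // e.g. ["Q4_K_M", "Q8_0"]
    min_vram_gb: f64,           // rough minimum VRAM for the smallest quant
}

#[derive(Debug, Deserialize)]
struct Registry {
    model: Vec<ModelEntry>,     // [[model]] array-of-tables in TOML
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let raw = r#"
        [[model]]
        name = "mistral-7b-instruct"
        params_b = 7.3
        quantizations = ["Q4_K_M", "Q8_0"]
        min_vram_gb = 5.0
    "#;
    let registry: Registry = toml::from_str(raw)?;
    println!("{:#?}", registry.model);
    Ok(())
}
```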

Key Features

  • Zero-dependency single binary written in Rust
  • Supports NVIDIA, AMD, Apple Silicon, and CPU-only configurations
  • Recommends specific quantization levels per available VRAM
  • Integrates with Ollama, llama.cpp, MLX, and vLLM ecosystems
  • Extensible model registry with community contributions

Comparison with Similar Tools

  • Ollama — runs models but does not pre-assess hardware compatibility
  • LM Studio — GUI-based model browser without CLI automation
  • GPT4All — bundled models with limited hardware-aware recommendations
  • LocalAI — serving platform, not a hardware assessment tool
  • llama.cpp — inference engine requiring manual model selection

FAQ

Q: Does LLMFit download or run models? A: No. It only scans hardware and recommends models. You use your preferred runtime to actually download and serve them.

Q: How is the model registry kept up to date? A: The registry ships with the binary and can be updated via llmfit update. Community PRs add new models regularly.

Q: Does it support multi-GPU setups? A: Yes. It detects all available GPUs and calculates aggregate VRAM for split-model recommendations.

Q: What quantization formats does it cover? A: GGUF (Q4, Q5, Q8), MLX 4-bit, GPTQ, AWQ, and full-precision variants.
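As a rough guide to how those bit widths translate into memory, the sketch below applies the common rule of thumb of parameters × bits per weight / 8, plus headroom for activations and KV cache. The effective bits-per-weight values and the 20% overhead factor are assumptions for illustration, not LLMFit's exact sizing model.

```rust
// Rough rule-of-thumb memory estimate per quantization level.
// The overhead factor is assumed; real footprints depend on context length,
// KV-cache precision, and the runtime used.
fn estimated_gib(params_billions: f64, bits_per_weight: f64) -> f64 {
    let weight_bytes = params_billions * 1e9 * bits_per_weight / 8.0;
    let overhead = 1.2; // ~20% headroom for activations and KV cache (assumed)
    weight_bytes * overhead / 1024f64.powi(3)
}

fn main() {
    // Approximate effective bits per weight for common GGUF quants and FP16.
    for (label, bits) in [("Q4", 4.5), ("Q5", 5.5), ("Q8", 8.5), ("FP16", 16.0)] {
        println!("7B model @ {:>4}: ~{:.1} GiB", label, estimated_gib(7.0, bits));
    }
}
```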
