Esta página se muestra en inglés. Una traducción al español está en curso.
ScriptsMay 24, 2026·3 min de lectura

LM Studio CLI — Run Local LLMs from the Command Line

The official CLI for LM Studio that lets you download, manage, and serve local language models with an OpenAI-compatible API from your terminal.

Listo para agents

Este activo puede ser leído e instalado directamente por agents

TokRepo expone un comando CLI universal, contrato de instalación, metadata JSON, plan según adaptador y contenido raw para que los agents evalúen compatibilidad, riesgo y próximos pasos.

Stage only · 27/100Stage only
Superficie agent
Cualquier agent MCP/CLI
Tipo
CLI Tool
Instalación
Single
Confianza
Confianza: Established
Entrada
LM Studio CLI
Comando CLI universal
npx tokrepo install 0d74e22f-57ae-11f1-9bc6-00163e2b0d79

Introduction

LM Studio CLI (lms) is the official command-line interface for LM Studio, providing terminal-native access to downloading, managing, and serving local language models. It exposes an OpenAI-compatible server, making it straightforward to integrate local LLMs into development workflows, scripts, and AI applications without leaving the terminal.

What LM Studio CLI Does

  • Downloads and manages GGUF and other quantized model files
  • Starts a local inference server with OpenAI-compatible API
  • Lists available models from the LM Studio model catalog
  • Controls the running server (load, unload, status)
  • Supports hardware acceleration on Apple Silicon, NVIDIA, and AMD GPUs

Architecture Overview

The CLI communicates with the LM Studio runtime daemon running locally. When you start a server, it loads the selected model into GPU or CPU memory using the appropriate backend (MLX on Apple Silicon, llama.cpp on other platforms). The server exposes REST endpoints matching the OpenAI Chat Completions and Embeddings APIs, enabling any OpenAI-compatible client to connect.

Self-Hosting & Configuration

  • Install via npx (Node.js) or download the standalone binary
  • Models download to a configurable local directory
  • Server binds to localhost:1234 by default (configurable)
  • GPU layers and context length set via command flags or config file
  • Runs on macOS, Windows, and Linux

Key Features

  • One-command model download with automatic format detection
  • OpenAI-compatible API allows drop-in replacement of cloud models
  • Automatic GPU detection and memory allocation
  • Supports multiple concurrent models on capable hardware
  • Structured JSON output mode for scripting and automation

Comparison with Similar Tools

  • Ollama — similar local LLM serving; LM Studio CLI integrates with the LM Studio desktop ecosystem
  • llama.cpp server — lower-level; LM Studio CLI adds model management and easier setup
  • LocalAI — broader model type support; LM Studio CLI focuses on chat and embedding models
  • GPT4All CLI — similar concept; LM Studio CLI has broader model catalog access

FAQ

Q: Do I need LM Studio desktop app installed? A: The CLI installs the LM Studio runtime automatically. The desktop GUI is optional.

Q: Which model formats are supported? A: GGUF is the primary format. MLX models are supported on Apple Silicon.

Q: Can I use it in CI/CD pipelines? A: Yes. The CLI supports non-interactive mode and can be scripted for automated testing against local models.

Q: How much VRAM do I need? A: Depends on the model. 3B parameter models need roughly 2-3 GB. 7B models need 4-8 GB. CPU inference works with system RAM.

Sources

Discusión

Inicia sesión para unirte a la discusión.
Aún no hay comentarios. Sé el primero en compartir tus ideas.

Activos relacionados