Scripts2026年5月24日·1 分钟阅读

LM Studio CLI — Run Local LLMs from the Command Line

The official CLI for LM Studio that lets you download, manage, and serve local language models with an OpenAI-compatible API from your terminal.

Agent 就绪

这个资产会安全暂存

这个资产会先安全暂存。复制的指令会要求 Agent 读取暂存文件,并在激活脚本、MCP 配置或全局配置前先确认。

Stage only · 27/100策略:需暂存
Agent 入口
任意 MCP/CLI Agent
类型
CLI Tool
安装
Single
信任
信任等级:Established
入口
LM Studio CLI
安全暂存命令
npx -y tokrepo@latest install 0d74e22f-57ae-11f1-9bc6-00163e2b0d79 --target codex

先暂存文件;激活前需要读取暂存 README 和安装计划。

Introduction

LM Studio CLI (lms) is the official command-line interface for LM Studio, providing terminal-native access to downloading, managing, and serving local language models. It exposes an OpenAI-compatible server, making it straightforward to integrate local LLMs into development workflows, scripts, and AI applications without leaving the terminal.

What LM Studio CLI Does

  • Downloads and manages GGUF and other quantized model files
  • Starts a local inference server with OpenAI-compatible API
  • Lists available models from the LM Studio model catalog
  • Controls the running server (load, unload, status)
  • Supports hardware acceleration on Apple Silicon, NVIDIA, and AMD GPUs

Architecture Overview

The CLI communicates with the LM Studio runtime daemon running locally. When you start a server, it loads the selected model into GPU or CPU memory using the appropriate backend (MLX on Apple Silicon, llama.cpp on other platforms). The server exposes REST endpoints matching the OpenAI Chat Completions and Embeddings APIs, enabling any OpenAI-compatible client to connect.

Self-Hosting & Configuration

  • Install via npx (Node.js) or download the standalone binary
  • Models download to a configurable local directory
  • Server binds to localhost:1234 by default (configurable)
  • GPU layers and context length set via command flags or config file
  • Runs on macOS, Windows, and Linux

Key Features

  • One-command model download with automatic format detection
  • OpenAI-compatible API allows drop-in replacement of cloud models
  • Automatic GPU detection and memory allocation
  • Supports multiple concurrent models on capable hardware
  • Structured JSON output mode for scripting and automation

Comparison with Similar Tools

  • Ollama — similar local LLM serving; LM Studio CLI integrates with the LM Studio desktop ecosystem
  • llama.cpp server — lower-level; LM Studio CLI adds model management and easier setup
  • LocalAI — broader model type support; LM Studio CLI focuses on chat and embedding models
  • GPT4All CLI — similar concept; LM Studio CLI has broader model catalog access

FAQ

Q: Do I need LM Studio desktop app installed? A: The CLI installs the LM Studio runtime automatically. The desktop GUI is optional.

Q: Which model formats are supported? A: GGUF is the primary format. MLX models are supported on Apple Silicon.

Q: Can I use it in CI/CD pipelines? A: Yes. The CLI supports non-interactive mode and can be scripted for automated testing against local models.

Q: How much VRAM do I need? A: Depends on the model. 3B parameter models need roughly 2-3 GB. 7B models need 4-8 GB. CPU inference works with system RAM.

Sources

讨论

登录后参与讨论。
还没有评论,来写第一条吧。

相关资产