ScriptsMay 24, 2026·3 min read

LM Studio CLI — Run Local LLMs from the Command Line

The official CLI for LM Studio that lets you download, manage, and serve local language models with an OpenAI-compatible API from your terminal.

Agent ready

This asset can be read and installed directly by agents

TokRepo exposes a universal CLI command, install contract, metadata JSON, adapter-aware plan, and raw content links so agents can judge fit, risk, and next actions.

Stage only · 27/100Stage only
Agent surface
Any MCP/CLI agent
Kind
CLI Tool
Install
Single
Trust
Trust: Established
Entrypoint
LM Studio CLI
Universal CLI install command
npx tokrepo install 0d74e22f-57ae-11f1-9bc6-00163e2b0d79

Introduction

LM Studio CLI (lms) is the official command-line interface for LM Studio, providing terminal-native access to downloading, managing, and serving local language models. It exposes an OpenAI-compatible server, making it straightforward to integrate local LLMs into development workflows, scripts, and AI applications without leaving the terminal.

What LM Studio CLI Does

  • Downloads and manages GGUF and other quantized model files
  • Starts a local inference server with OpenAI-compatible API
  • Lists available models from the LM Studio model catalog
  • Controls the running server (load, unload, status)
  • Supports hardware acceleration on Apple Silicon, NVIDIA, and AMD GPUs

Architecture Overview

The CLI communicates with the LM Studio runtime daemon running locally. When you start a server, it loads the selected model into GPU or CPU memory using the appropriate backend (MLX on Apple Silicon, llama.cpp on other platforms). The server exposes REST endpoints matching the OpenAI Chat Completions and Embeddings APIs, enabling any OpenAI-compatible client to connect.

Self-Hosting & Configuration

  • Install via npx (Node.js) or download the standalone binary
  • Models download to a configurable local directory
  • Server binds to localhost:1234 by default (configurable)
  • GPU layers and context length set via command flags or config file
  • Runs on macOS, Windows, and Linux

Key Features

  • One-command model download with automatic format detection
  • OpenAI-compatible API allows drop-in replacement of cloud models
  • Automatic GPU detection and memory allocation
  • Supports multiple concurrent models on capable hardware
  • Structured JSON output mode for scripting and automation

Comparison with Similar Tools

  • Ollama — similar local LLM serving; LM Studio CLI integrates with the LM Studio desktop ecosystem
  • llama.cpp server — lower-level; LM Studio CLI adds model management and easier setup
  • LocalAI — broader model type support; LM Studio CLI focuses on chat and embedding models
  • GPT4All CLI — similar concept; LM Studio CLI has broader model catalog access

FAQ

Q: Do I need LM Studio desktop app installed? A: The CLI installs the LM Studio runtime automatically. The desktop GUI is optional.

Q: Which model formats are supported? A: GGUF is the primary format. MLX models are supported on Apple Silicon.

Q: Can I use it in CI/CD pipelines? A: Yes. The CLI supports non-interactive mode and can be scripted for automated testing against local models.

Q: How much VRAM do I need? A: Depends on the model. 3B parameter models need roughly 2-3 GB. 7B models need 4-8 GB. CPU inference works with system RAM.

Sources

Discussion

Sign in to join the discussion.
No comments yet. Be the first to share your thoughts.

Related Assets