LM Studio — Desktop GUI for Local LLMs (Windows, Mac, Linux)
LM Studio is the leading desktop GUI for running LLMs locally — built-in model browser, OpenAI-compatible local server, and polished Windows/Mac/Linux experience. The easiest way in for non-terminal users.
Why LM Studio
LM Studio is what Ollama would be if it had started as a Windows/Mac application instead of a CLI. You download a .dmg / .exe, double-click, search for a model in the built-in Hugging Face browser, pick a quantization that fits your RAM, and click Load. No terminal, no Docker, no config files — and once loaded you can chat in the app or expose an OpenAI-compatible server on localhost.
For users who come from the ChatGPT desktop app rather than the terminal, LM Studio lowers the activation energy to near-zero. It also ships genuinely useful power features: model benchmarking, preset prompts, RAG over local files, MLX acceleration on Apple Silicon, and a CLI (lms) for automation.
Where Ollama still wins: Linux server deployments, Docker, and developer ergonomics for scripting. Where LM Studio wins: GUI model discovery, explicit quantization picker, and non-developer onboarding. Running both on the same Mac is common — LM Studio for browsing and testing, Ollama as the runtime for developer tools.
Quick Start — Desktop Install and Local Server
The Developer tab exposes the local server — by default it mirrors the OpenAI chat completions endpoint at /v1. The lms CLI ships separately (brew install lmstudio-cli on macOS) and is optional for GUI users. RAG over local docs lives under the Chat tab → "My Documents" since v0.3.
# 1. Install the desktop app
# https://lmstudio.ai/download (macOS, Windows, Linux)
# 2. Inside LM Studio:
# - Open the "Discover" tab → search "Llama 3.2"
# - Pick a quantization (Q4_K_M is a good default for ~8GB RAM)
# - Click "Download"
# - Open the "Chat" tab → select the model → chat
# 3. Start the local server (inside LM Studio → "Developer" tab → "Start Server")
# Or from the CLI (requires "lms" installed):
lms server start --port 1234
# 4. Use any OpenAI SDK with base_url http://localhost:1234/v1
python - <<'PY'
from openai import OpenAI
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")
r = client.chat.completions.create(
model="lmstudio-community/Llama-3.2-3B-Instruct-GGUF",
messages=[{"role":"user","content":"One tip for learning Rust?"}],
)
print(r.choices[0].message.content)
PY
# Automation: script model management from the terminal
lms ls # list local models
lms load llama-3.2-3b-instruct # load a specific model
lms unload --all # free VRAM
Key Features
Polished desktop app
Native-feeling Windows/macOS/Linux UI. Right-click menus, keyboard shortcuts, tab-based workflow. The app alone attracts users who don’t touch a terminal.
Hugging Face model browser
Built-in search with filters by model family, quantization, RAM requirement, and license. No separate download script or Modelfile — click, go.
Quantization picker
Explicitly choose Q2/Q3/Q4/Q5/Q6/Q8 quants. Shows exact file size and estimated RAM. Easier to reason about trade-offs than auto-picked quantization elsewhere.
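The size/RAM numbers the picker shows follow directly from bits-per-weight. A rough back-of-envelope sketch (the bits-per-weight figures below are approximate averages for llama.cpp K-quants, not LM Studio's exact accounting, and runtime RAM adds KV-cache and activation overhead on top):

```python
def quant_size_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate on-disk size of a quantized model in GiB."""
    return params_billions * 1e9 * bits_per_weight / 8 / 1024**3

# Approximate bits-per-weight for common GGUF quants
for name, bits in [("Q4_K_M", 4.8), ("Q5_K_M", 5.7), ("Q8_0", 8.5)]:
    print(f"7B @ {name}: ~{quant_size_gb(7, bits):.1f} GB on disk")
```

This is why a Q4_K_M 7B model fits comfortably in 8GB of RAM while the Q8_0 of the same model is tight.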
MLX acceleration on Apple Silicon
Uses MLX natively on M-series Macs for the fastest token generation available to non-specialists. GGUF models fall back to the llama.cpp Metal backend when MLX versions aren’t available.
OpenAI-compatible local server
Same /v1/chat/completions and /v1/embeddings shape as OpenAI. Drop-in for any tool or SDK. Configurable port and CORS.
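Because the server mirrors the OpenAI shape, /v1/embeddings works with the stock SDK too. A minimal sketch, assuming an embedding model is loaded — the model name below is a placeholder for whatever appears in your local model list:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def embed(texts, base_url="http://localhost:1234/v1"):
    """Fetch embeddings from LM Studio's OpenAI-compatible endpoint."""
    from openai import OpenAI
    client = OpenAI(base_url=base_url, api_key="lm-studio")
    # "nomic-embed-text-v1.5" is an example ID — use whatever you have loaded
    resp = client.embeddings.create(model="nomic-embed-text-v1.5", input=texts)
    return [d.embedding for d in resp.data]

# With the server running:
#   a, b = embed(["local llms", "running models on your laptop"])
#   print(cosine(a, b))
```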
RAG over local files
Attach PDFs or folders to a chat; LM Studio indexes them locally and retrieves on demand. Useful for quick "ask my notes" use cases without standalone RAG infra.
Comparison
| Tool | Primary Interface | Non-developer Fit | Server Deployment | Best For |
|---|---|---|---|---|
| LM Studio | Desktop GUI | Excellent | Local-first (not intended as a multi-user server) | Individual users, Windows/Mac |
| Ollama | CLI + API | Needs terminal | First-class (Docker, systemd) | Developers, servers |
| Jan | Desktop GUI | Good | Basic | OSS-purist desktop users |
| GPT4All | Desktop GUI | Very good | Limited | CPU-first desktop users |
Use Cases
01. Non-developer desktop AI
Teammates who want ChatGPT-offline on their laptop without touching a terminal. LM Studio is the fastest path to a working setup.
02. Model evaluation before production
Download 5 candidates, load each, compare quality side-by-side in the Chat tab. Faster than scripting via CLI when you don’t know the right model yet.
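When the GUI comparison gets tedious, the same side-by-side check can be scripted against the local server. A sketch under assumptions: the model IDs are examples, and each must be loaded (in the GUI or via `lms load`) before it answers:

```python
CANDIDATES = [
    "llama-3.2-3b-instruct",
    "qwen2.5-7b-instruct",  # example IDs — substitute what "lms ls" shows
]

def ask(client, model, prompt):
    """One chat completion against LM Studio's OpenAI-compatible server."""
    r = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return r.choices[0].message.content

def side_by_side(answers):
    """Format {model: answer} into a readable comparison block."""
    return "\n\n".join(f"### {m}\n{a}" for m, a in answers.items())

# With the server running:
#   from openai import OpenAI
#   client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")
#   answers = {m: ask(client, m, "Summarize Rust ownership.") for m in CANDIDATES}
#   print(side_by_side(answers))
```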
03. MLX-accelerated inference on Mac
For M3/M4 users, LM Studio’s MLX integration gives the fastest generation without the steep setup curve of raw MLX.
Pricing & License
LM Studio: free for personal use and commercial use per current terms (see lmstudio.ai/legal). Not open source — closed-source binary distribution.
Hardware cost: the app is free. You pay in RAM/VRAM. Realistic baseline: 16GB RAM for 7B models, 32GB+ for 13-34B, 64GB+ for 70B+.
Enterprise: LM Studio offers enterprise licensing with MDM support, offline installation, and SLA. Contact lmstudio.ai for terms.
Related Assets on TokRepo
LLM Wiki Memory Upgrade Prompt
One-click prompt to upgrade your AI agent memory system to Karpathy LLM Wiki pattern. Send to Claude Code / Cursor / Windsurf — auto audits, compiles fragments, resolves contradictions, builds structured wiki.
Dify — Open-Source LLM App Development Platform
Visual platform for building AI applications with workflow orchestration, RAG pipelines, agent capabilities, and model management. Supports 100+ models. 85,000+ GitHub stars.
VoltAgent — TypeScript AI Agent Framework
Open-source TypeScript framework for building AI agents with built-in Memory, RAG, Guardrails, MCP, Voice, and Workflow support. Includes LLM observability console for debugging.
Bifrost CLI — Run Claude Code with Any AI Model
Enterprise AI gateway that lets Claude Code use any LLM provider. Bifrost routes requests to OpenAI, Gemini, Bedrock, Groq, and 20+ providers with automatic failover.
Frequently Asked Questions
Is LM Studio open source?
No. LM Studio is a free, closed-source application. This is the main reason some OSS-purist users prefer Jan or Ollama. For most practical purposes (personal use, internal team use) the distinction matters less than the UX difference.
LM Studio vs Ollama?
LM Studio: GUI-first, closed source, best-in-class desktop UX. Ollama: CLI-first, MIT open source, better for automation and servers. Use LM Studio for individual interactive use; use Ollama when you need Docker, a shared server, or scripting.
Can I expose LM Studio’s server to other machines?
Yes — uncheck "Localhost only" in the Developer tab and pick a bind address. Only do this on a trusted network; there is no built-in auth. For shared use, Ollama with a reverse proxy is a safer pattern.
Does LM Studio support the same models as Ollama?
Largely yes — both use GGUF models from Hugging Face. LM Studio adds MLX support on Apple Silicon which Ollama does not (yet). Ollama has a curated library; LM Studio lets you search all of Hugging Face.
Does LM Studio train or fine-tune?
No — inference only. For fine-tuning look at Axolotl, Unsloth, or MLX-LM on Mac. LM Studio’s scope is "run pre-trained models well".
Is there a CLI?
Yes, the lms CLI ships separately — install via npm, brew, or the LM Studio app itself. Covers model management (ls, load, unload), server control (server start/stop), and chat streaming. Useful for scripting or headless machines.