Jan — Run AI Models Locally on Your Desktop
Open-source desktop app to run LLMs offline. Jan supports Llama, Mistral, and Gemma models with one-click download, OpenAI-compatible API, and full privacy.
What it is
Jan is an open-source desktop application for running large language models locally on your computer. It provides a ChatGPT-like interface where you browse a model hub, download models (Llama, Mistral, Gemma, and others) with one click, and start chatting immediately. Everything runs on your hardware with no data leaving your machine.
Jan targets developers, researchers, and privacy-conscious users who want to experiment with LLMs without sending data to cloud APIs. It runs on macOS, Windows, and Linux, supporting both CPU and GPU inference.
How it saves time or tokens
Using cloud LLM APIs means paying per token and trusting a third party with your data. Jan removes both concerns: once a model is downloaded, inference is free and your data stays on your machine. For experimentation, prototyping, and sensitive-data processing, running locally eliminates API spend entirely. Because the local API is OpenAI-compatible, you can point existing code at localhost:1337 and it works without code changes, as the example below shows.
How to use
- Download Jan from jan.ai for your platform (macOS, Windows, Linux).
- Open Jan, go to the Model Hub, and download a model (e.g., Llama 3.1 8B).
- Start chatting in the built-in UI, fully offline.
- Optionally, use the local API:
curl http://localhost:1337/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "llama-3.1-8b",
    "messages": [{"role": "user", "content": "Explain transformers briefly"}]
  }'
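If the model is loaded, Jan answers with a standard OpenAI-style chat completion object. A trimmed sketch of the response shape (field values are illustrative):
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "model": "llama-3.1-8b",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "Transformers are..."},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 12, "completion_tokens": 98, "total_tokens": 110}
}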
Example
from openai import OpenAI

# Point the standard OpenAI SDK at Jan's local API
client = OpenAI(base_url='http://localhost:1337/v1', api_key='not-needed')

response = client.chat.completions.create(
    model='llama-3.1-8b',
    messages=[{'role': 'user', 'content': 'What is retrieval augmented generation?'}],
)
print(response.choices[0].message.content)
This uses the standard OpenAI Python SDK pointed at your local Jan instance. No API key needed, no data sent externally.
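The same endpoint can also stream tokens as they are generated, which is useful for long answers in interactive tools. A minimal sketch using the SDK's standard streaming interface (assuming Jan's server supports streaming, as llama.cpp-based servers generally do):
from openai import OpenAI

client = OpenAI(base_url='http://localhost:1337/v1', api_key='not-needed')

# stream=True yields chunks as the model generates them
stream = client.chat.completions.create(
    model='llama-3.1-8b',
    messages=[{'role': 'user', 'content': 'Summarize GGUF in two sentences'}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end='', flush=True)
print()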
Related on TokRepo
- Local LLM Tools -- Compare local LLM runners like Jan, Ollama, and LM Studio
- Local LLM: Jan -- Deep dive into Jan's capabilities
Common pitfalls
- Large models (70B+ parameters) require significant RAM and VRAM. Check model requirements before downloading, and start with 7B-8B parameter models on consumer hardware (see the sizing sketch after this list).
- The OpenAI-compatible API listens on localhost by default. If you need network access, configure the bind address carefully and consider authentication.
- Model download sizes are large (4-50+ GB). Ensure sufficient disk space and a stable connection before starting downloads.
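As a rule of thumb, a quantized GGUF model needs roughly parameters × bits-per-weight / 8 bytes for the weights alone, plus KV cache and runtime overhead. A back-of-the-envelope sketch (the ~4.5 bits figure approximates common Q4 quantizations; treat results as rough lower bounds):
# Rough weight-memory estimate for quantized GGUF models.
# Actual usage is higher: add KV cache, context buffers, and runtime overhead.
def approx_weights_gb(params_billion: float, bits_per_weight: float = 4.5) -> float:
    # billions of params * (bytes per weight) gives gigabytes directly
    return params_billion * bits_per_weight / 8

for params in (7, 8, 13, 70):
    print(f'{params}B @ ~Q4: ~{approx_weights_gb(params):.1f} GB of weights')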
Frequently Asked Questions
What hardware do I need?
Jan runs on any modern computer. For CPU-only inference, 8GB of RAM handles 7B models. For GPU acceleration, an NVIDIA GPU with 6GB+ VRAM dramatically improves speed. Apple Silicon Macs use Metal for acceleration.
Is my data private?
Yes. Jan runs entirely on your local machine: no telemetry, no data sent to external servers. Models are downloaded once and run offline. Your conversations never leave your computer.
How does Jan compare to Ollama?
Ollama is CLI-first and optimized for developers; Jan provides a full desktop GUI similar to ChatGPT. Both offer OpenAI-compatible APIs. Choose Jan for a visual experience; choose Ollama for terminal workflows.
Does Jan support GPU acceleration?
Yes. Jan supports NVIDIA CUDA for GPU acceleration: it auto-detects available GPUs and lets you configure how many model layers to offload. AMD ROCm support is also available on Linux.
What model formats does Jan support?
Jan primarily uses GGUF-format models (llama.cpp compatible). The built-in model hub offers pre-configured models, and you can also import custom GGUF models from sources like Hugging Face.
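To fetch a custom GGUF file programmatically, the huggingface_hub package works well; the repo and filename below are illustrative placeholders, not Jan defaults:
# pip install huggingface_hub
from huggingface_hub import hf_hub_download

# Illustrative repo/filename; substitute any GGUF file from Hugging Face
path = hf_hub_download(
    repo_id='bartowski/Meta-Llama-3.1-8B-Instruct-GGUF',
    filename='Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf',
)
print(path)  # then import this file through Jan's model import flow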
Citations (3)
- Jan GitHub -- Open-source desktop app for running LLMs locally
- Jan Documentation -- OpenAI-compatible local API
- llama.cpp GGUF spec -- GGUF model format for local inference