Configs · Apr 2, 2026 · 3 min read

# Ollama — Run LLMs Locally with One Command

Get up and running with Llama, Mistral, Gemma, DeepSeek, and 100+ other models locally. Simple CLI, OpenAI-compatible API. 95K+ stars.

TokRepo Curated · Community
## Quick Use

Use it first, then decide how deep to go

Copy, install, and run these commands first; the rest of the page is optional depth.

```bash
# Install (macOS/Linux)
curl -fsSL https://ollama.com/install.sh | sh

# Run a model
ollama run llama3.2
ollama run deepseek-r1
ollama run gemma2
```

```bash
# Use the OpenAI-compatible API
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3.2", "messages": [{"role": "user", "content": "Hello!"}]}'
```

Also available on Windows (installer) and Docker: `docker run -d -p 11434:11434 ollama/ollama`
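The same chat request can be issued from Python's standard library. A minimal sketch, assuming `ollama serve` is running locally and `llama3.2` has already been pulled (the network call itself is left commented out):

```python
import json
from urllib import request

# Request body for Ollama's OpenAI-compatible chat endpoint.
payload = {
    "model": "llama3.2",
    "messages": [{"role": "user", "content": "Hello!"}],
}

req = request.Request(
    "http://localhost:11434/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# With a local Ollama server running, uncomment to send the request:
# with request.urlopen(req) as resp:
#     reply = json.load(resp)["choices"][0]["message"]["content"]
#     print(reply)
```

Because the endpoint mirrors OpenAI's schema, any OpenAI SDK works the same way once its base URL points at `http://localhost:11434/v1/`.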
## Introduction

Ollama is the **simplest way to run large language models on your own machine**. It packages model weights, tokenizer, and runtime into a single command, making local AI accessible to anyone with a computer.

Core capabilities:

- **One-Command Setup** — `ollama run llama3.2` downloads and starts a model in seconds. No Python environment, no CUDA setup, no config files
- **100+ Models** — Llama 3.2, DeepSeek R1, Gemma 2, Mistral, Phi-3, Qwen, CodeLlama, and many more from the Ollama library
- **OpenAI-Compatible API** — Drop-in replacement for OpenAI's API at `localhost:11434/v1/`. Works with any OpenAI SDK client
- **GPU Acceleration** — Automatic CUDA, ROCm, and Metal (Apple Silicon) detection. Falls back to CPU gracefully
- **Modelfile System** — Create custom models with system prompts, parameters, and adapters using Dockerfile-like syntax
- **Multi-Model Serving** — Run multiple models simultaneously with automatic memory management
- **Import & Convert** — Import models from GGUF, SafeTensors, or PyTorch formats

95,000+ GitHub stars. The de facto standard for local LLM inference, powering Open WebUI, Continue, Aider, and hundreds of other tools.

## FAQ

**Q: How much RAM/VRAM do I need?**
A: 7B models need ~4GB of RAM (quantized), 13B needs ~8GB, and 70B needs ~40GB. Apple Silicon Macs work well because the CPU and GPU share unified memory. For discrete GPUs, any NVIDIA card with 6GB+ VRAM runs 7B models well.

**Q: How does Ollama compare to llama.cpp?**
A: Ollama is built on llama.cpp but adds model management, an API server, and a model library. Think of it as Docker for LLMs — llama.cpp is the engine, Ollama is the complete platform.

**Q: Can I use it with Claude Code or other AI tools?**
A: Yes. Any tool that supports OpenAI-compatible APIs can use Ollama. Set the base URL to `http://localhost:11434/v1/` and use the model name as the model ID.

**Q: Is it free?**
A: 100% free and open source (MIT). No API keys, no usage limits, no data sent anywhere. Everything runs on your hardware.

## Works With

- Open WebUI for a chat interface
- Continue / Aider / Cursor for AI coding
- LangChain / LlamaIndex / any OpenAI-compatible SDK
- Docker / Kubernetes for deployment
- Apple Silicon / NVIDIA / AMD GPUs
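The Modelfile system mentioned above uses Dockerfile-like directives to define a custom model. A minimal sketch, where the base model, parameter value, and system prompt are illustrative choices, not defaults:

```
FROM llama3.2
PARAMETER temperature 0.7
SYSTEM "You are a concise assistant that answers in one short paragraph."
```

Build and run it with `ollama create my-assistant -f Modelfile`, then `ollama run my-assistant`.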
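The RAM figures in the FAQ follow from a rule of thumb: memory ≈ parameter count × bytes per weight, plus runtime overhead. A rough sketch, where the 4-bit quantization and 1.2× overhead factor are illustrative assumptions rather than exact Ollama numbers:

```python
def estimate_ram_gb(params_billions: float, bits_per_weight: int = 4,
                    overhead: float = 1.2) -> float:
    """Rough RAM estimate for a quantized model: weight bytes times overhead."""
    weight_bytes = params_billions * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9

for size in (7, 13, 70):
    print(f"{size}B ~ {estimate_ram_gb(size):.1f} GB")
```

Plugging in 7B, 13B, and 70B yields roughly 4, 8, and 42 GB, in line with the FAQ's figures.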
## 🙏 Source & Thanks

- GitHub: [ollama/ollama](https://github.com/ollama/ollama)
- License: MIT
- Stars: 95,000+
- Maintainer: Ollama team

Thanks to the Ollama team for democratizing local LLM access, making it as easy to run a language model as it is to run a Docker container.

