Configs · April 2, 2026 · 1 min read

Ollama — Run LLMs Locally with One Command

Get up and running with Llama, Mistral, Gemma, DeepSeek and 100+ models locally. Simple CLI, OpenAI-compatible API. 95K+ stars.

TokRepo Picks · Community
## Quick Start

Try it first; decide whether to dig deeper later.

This section should tell both users and agents what to copy first, what to install, and where it ends up.

```bash
# Install (macOS/Linux)
curl -fsSL https://ollama.com/install.sh | sh

# Run a model
ollama run llama3.2
ollama run deepseek-r1
ollama run gemma2
```

```bash
# Use the OpenAI-compatible API
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3.2", "messages": [{"role": "user", "content": "Hello!"}]}'
```

Also available on Windows (installer) and via Docker: `docker run -d -p 11434:11434 ollama/ollama`
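The same endpoint can be called from any HTTP client. A minimal stdlib-only Python sketch, assuming Ollama is serving on the default port 11434 with `llama3.2` pulled (the helper names `build_chat_request` and `chat` are ours, not part of Ollama):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/v1/chat/completions"

def build_chat_request(model: str, content: str) -> bytes:
    """Encode an OpenAI-style chat completion payload."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": content}],
    }
    return json.dumps(payload).encode("utf-8")

def chat(model: str, content: str) -> str:
    """POST to the local Ollama server and return the assistant's reply."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=build_chat_request(model, content),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(chat("llama3.2", "Hello!"))
```

Because the API is OpenAI-compatible, the official OpenAI SDKs work too: point the client's base URL at `http://localhost:11434/v1/` and pass any non-empty string as the API key.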
## Introduction

Ollama is the **simplest way to run large language models on your own machine**. It packages model weights, tokenizer, and runtime into a single command, making local AI accessible to anyone with a computer.

Core capabilities:

- **One-Command Setup** — `ollama run llama3.2` downloads and starts a model in seconds. No Python environment, no CUDA setup, no config files
- **100+ Models** — Llama 3.2, DeepSeek R1, Gemma 2, Mistral, Phi-3, Qwen, CodeLlama, and many more from the Ollama library
- **OpenAI-Compatible API** — Drop-in replacement for OpenAI's API at `localhost:11434/v1/`. Works with any OpenAI SDK client
- **GPU Acceleration** — Automatic CUDA, ROCm, and Metal (Apple Silicon) detection. Falls back to CPU gracefully
- **Modelfile System** — Create custom models with system prompts, parameters, and adapters using Dockerfile-like syntax
- **Multi-Model Serving** — Run multiple models simultaneously with automatic memory management
- **Import & Convert** — Import models from GGUF, SafeTensors, or PyTorch formats

95,000+ GitHub stars. The de facto standard for local LLM inference, powering Open WebUI, Continue, Aider, and hundreds of other tools.

## FAQ

**Q: How much RAM/VRAM do I need?**
A: 7B models need ~4GB RAM (quantized); 13B needs ~8GB; 70B needs ~40GB. Apple Silicon Macs work well because CPU and GPU share memory. For discrete GPUs, any NVIDIA card with 6GB+ VRAM runs 7B models well.

**Q: How does Ollama compare to llama.cpp?**
A: Ollama is built on llama.cpp but adds model management, an API server, and a model library. Think of it as Docker for LLMs: llama.cpp is the engine, Ollama is the complete platform.

**Q: Can I use it with Claude Code or other AI tools?**
A: Yes. Any tool that supports OpenAI-compatible APIs can use Ollama. Set the base URL to `http://localhost:11434/v1/` and use the model name as the model ID.

**Q: Is it free?**
A: 100% free and open source (MIT). No API keys, no usage limits, no data sent anywhere. Everything runs on your hardware.

## Works With

- Open WebUI for a chat interface
- Continue / Aider / Cursor for AI coding
- LangChain / LlamaIndex / any OpenAI-compatible SDK
- Docker / Kubernetes for deployment
- Apple Silicon / NVIDIA / AMD GPUs
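The Modelfile system mentioned above uses a Dockerfile-like syntax to derive a custom model from a base model. A minimal sketch (the base model, parameter values, and system prompt here are illustrative, not prescribed):

```
FROM llama3.2
PARAMETER temperature 0.3
PARAMETER num_ctx 4096
SYSTEM You are a concise technical assistant.
```

Save it as `Modelfile`, then build and run the custom model with `ollama create techbot -f Modelfile` followed by `ollama run techbot` (the name `techbot` is arbitrary).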
## 🙏 Source & Acknowledgments

- GitHub: [ollama/ollama](https://github.com/ollama/ollama)
- License: MIT
- Stars: 95,000+
- Maintainer: Ollama team

Thanks to the Ollama team for democratizing local LLM access, making it as easy to run a language model as it is to run a Docker container.

