Configs · Apr 6, 2026 · 2 min read

Ollama — Run LLMs Locally with One Command

Run Llama 3, Mistral, Gemma, Phi, and 100+ open-source LLMs locally with a single command. OpenAI-compatible API for seamless integration with AI tools. 120,000+ GitHub stars.

AI · Open Source · Community
Quick Use

Use it first, then decide how deep to go

Everything needed to install Ollama and run a first model is below: copy, paste, done.

# Install (macOS)
brew install ollama

# Install (Linux)
curl -fsSL https://ollama.com/install.sh | sh

# Run a model
ollama run llama3.1
# Chat starts immediately — no API keys, no cloud, no cost

Use as OpenAI-compatible API:

curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"llama3.1","messages":[{"role":"user","content":"Hello"}]}'

Intro

Ollama is an open-source tool (120,000+ GitHub stars) that runs Llama 3, Mistral, Gemma, Phi, and 100+ other large language models locally with a single command. No API keys, no cloud costs, no data leaving your machine. It exposes an OpenAI-compatible API at localhost:11434, making it a drop-in local replacement for cloud LLMs in any tool that speaks the OpenAI protocol. Best for developers who want privacy, no network latency, and unlimited free inference. Works with: Claude Code (via LiteLLM), Cursor, Continue, LangChain, and any OpenAI-compatible client. Setup time: under 2 minutes.


Popular Models

Model            Size          Best For
llama3.1         8B / 70B      General purpose, coding
mistral          7B            Fast, multilingual
codestral        22B           Code generation
gemma2           9B / 27B      Compact, efficient
phi3             3.8B / 14B    Small-device deployment
qwen2.5          7B / 72B      Multilingual, math
deepseek-coder   6.7B / 33B    Code completion
llava            7B / 13B      Vision + text

ollama pull llama3.1:70b    # Download 70B model
ollama pull codestral       # Code-specialized model
ollama list                 # See installed models
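The same data `ollama list` prints is also available over HTTP from Ollama's native `/api/tags` endpoint, which is handy for scripting. A minimal stdlib-only sketch (the default host and port are assumed; the JSON parsing is kept separate so it works without a running server):

```python
import json
import urllib.request

def model_names(tags_json: str) -> list[str]:
    """Extract model names from an /api/tags response body."""
    return [m["name"] for m in json.loads(tags_json).get("models", [])]

def list_models(host: str = "http://localhost:11434") -> list[str]:
    """Ask a running Ollama server which models are installed."""
    with urllib.request.urlopen(f"{host}/api/tags") as resp:
        return model_names(resp.read().decode())
```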

OpenAI-Compatible API

Point any OpenAI SDK client to http://localhost:11434/v1:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")  # any non-empty key works; Ollama ignores it
response = client.chat.completions.create(
    model="llama3.1",
    messages=[{"role": "user", "content": "Write a Python function"}]
)
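For token-by-token output you can also hit Ollama's native `/api/chat` endpoint, which streams one JSON object per line until `done` is true. A stdlib-only sketch (default host assumed; the line parser is pure so it can be exercised without a server):

```python
import json
import urllib.request

def parse_stream_line(line: bytes) -> str:
    """Extract the assistant token from one NDJSON stream line."""
    obj = json.loads(line)
    return obj.get("message", {}).get("content", "")

def stream_chat(prompt: str, model: str = "llama3.1") -> str:
    """Stream a chat reply from a running Ollama server, printing tokens as they arrive."""
    req = urllib.request.Request(
        "http://localhost:11434/api/chat",
        data=json.dumps({
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        }).encode(),
        headers={"Content-Type": "application/json"},
    )
    reply = []
    with urllib.request.urlopen(req) as resp:
        for line in resp:  # one JSON object per line until done=true
            token = parse_stream_line(line)
            print(token, end="", flush=True)
            reply.append(token)
    return "".join(reply)
```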

Use with AI Tools

Continue (VS Code):

{"models": [{"title": "Llama", "provider": "ollama", "model": "llama3.1"}]}

LiteLLM proxy:

litellm --model ollama/llama3.1

LangChain:

from langchain_community.llms import Ollama  # newer releases: from langchain_ollama import OllamaLLM
llm = Ollama(model="llama3.1")

Custom Modelfiles

Create custom models with system prompts and parameters:

Save as `Modelfile`:

FROM llama3.1
SYSTEM "You are a senior Python developer. Always write type-hinted, well-tested code."
PARAMETER temperature 0.3
PARAMETER num_ctx 8192

Then build and run it:

ollama create my-coder -f Modelfile
ollama run my-coder
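Since a Modelfile is plain text, it is easy to generate from configuration. A small illustrative sketch (the `render_modelfile` helper is hypothetical, not part of Ollama) that emits the same format shown above:

```python
def render_modelfile(base: str, system: str, params: dict) -> str:
    """Render a Modelfile string from a base model, system prompt, and parameters."""
    lines = [f"FROM {base}", f'SYSTEM "{system}"']
    lines += [f"PARAMETER {k} {v}" for k, v in params.items()]
    return "\n".join(lines) + "\n"

modelfile = render_modelfile(
    "llama3.1",
    "You are a senior Python developer.",
    {"temperature": 0.3, "num_ctx": 8192},
)
print(modelfile)
```

Write the result to a file named `Modelfile`, then build it with `ollama create my-coder -f Modelfile` as above.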

Key Stats

  • 120,000+ GitHub stars
  • 100+ available models
  • OpenAI-compatible API
  • Runs on macOS, Linux, Windows
  • GPU acceleration (NVIDIA, Apple Silicon)

FAQ

Q: What is Ollama? A: Ollama is a tool that runs open-source LLMs locally with one command, providing an OpenAI-compatible API for seamless integration with AI development tools.

Q: Is Ollama free? A: Yes, completely free and open-source under MIT license. No API keys or usage fees.

Q: What hardware do I need? A: 8GB RAM for 7B models, 16GB for 13B, 64GB for 70B. Apple Silicon and NVIDIA GPUs are automatically utilized for acceleration.
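The RAM figures above follow a back-of-the-envelope rule: weight memory is roughly parameter count times bits per weight divided by 8, plus headroom for the KV cache and runtime. A rough heuristic sketch (the ~20% headroom factor is an assumption, not an official Ollama figure; Ollama's default downloads are 4-bit quantized):

```python
def estimate_ram_gb(params_billions: float, quant_bits: int = 4) -> float:
    """Rough RAM estimate: weight bytes plus ~20% headroom for cache/runtime."""
    weight_gb = params_billions * quant_bits / 8
    return round(weight_gb * 1.2, 1)

print(estimate_ram_gb(7))   # 7B at 4-bit → 4.2
print(estimate_ram_gb(70))  # 70B at 4-bit → 42.0
```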



Source & Thanks

Created by Ollama. Licensed under MIT.

ollama — ⭐ 120,000+

Thanks to the Ollama team for making local LLM inference effortless.
