Popular Models
| Model | Size | Best For |
|---|---|---|
| llama3.1 | 8B / 70B | General purpose, coding |
| mistral | 7B | Fast, multilingual |
| codestral | 22B | Code generation |
| gemma2 | 9B / 27B | Compact, efficient |
| phi3 | 3.8B / 14B | Small-device deployment |
| qwen2.5 | 7B / 72B | Multilingual, math |
| deepseek-coder | 6.7B / 33B | Code completion |
| llava | 7B / 13B | Vision + text |
```shell
ollama pull llama3.1:70b   # Download the 70B model
ollama pull codestral      # Code-specialized model
ollama list                # List installed models
```

OpenAI-Compatible API
Point any OpenAI SDK client to http://localhost:11434/v1:
```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
response = client.chat.completions.create(
    model="llama3.1",
    messages=[{"role": "user", "content": "Write a Python function"}],
)
print(response.choices[0].message.content)
```

Use with AI Tools
Continue (VS Code):

```json
{"models": [{"title": "Llama", "provider": "ollama", "model": "llama3.1"}]}
```

LiteLLM proxy:
```shell
litellm --model ollama/llama3.1
```

LangChain:
```python
from langchain_community.llms import Ollama

llm = Ollama(model="llama3.1")
```

Custom Modelfiles
Create custom models with system prompts and parameters:
```
FROM llama3.1
SYSTEM "You are a senior Python developer. Always write type-hinted, well-tested code."
PARAMETER temperature 0.3
PARAMETER num_ctx 8192
```

Build and run the custom model:

```shell
ollama create my-coder -f Modelfile
ollama run my-coder
```

Key Stats
- 120,000+ GitHub stars
- 100+ available models
- OpenAI-compatible API
- Runs on macOS, Linux, Windows
- GPU acceleration (NVIDIA, Apple Silicon)
FAQ
Q: What is Ollama? A: Ollama is a tool that runs open-source LLMs locally with one command, providing an OpenAI-compatible API for seamless integration with AI development tools.
Q: Is Ollama free? A: Yes, completely free and open-source under MIT license. No API keys or usage fees.
Q: What hardware do I need? A: 8GB RAM for 7B models, 16GB for 13B, 64GB for 70B. Apple Silicon and NVIDIA GPUs are automatically utilized for acceleration.
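The RAM guidance above can be sanity-checked with back-of-the-envelope arithmetic: at 4-bit quantization (a common default for local models), the weights alone take roughly half a gigabyte per billion parameters, and the system needs headroom on top for the KV cache and the OS. The function below is an illustrative sketch, not an official Ollama formula:

```python
def approx_weight_gb(params_billion: float, bits_per_weight: int = 4) -> float:
    """Approximate in-memory size of the model weights alone:
    params * (bits / 8) bytes, expressed in GB."""
    return params_billion * bits_per_weight / 8

# Compare against the RAM guidance above (weights must fit with headroom):
for size in (7, 13, 70):
    print(f"{size}B at 4-bit -> ~{approx_weight_gb(size):.1f} GB of weights")
# 7B  -> ~3.5 GB  (fits in an 8GB machine)
# 13B -> ~6.5 GB  (fits in a 16GB machine)
# 70B -> ~35.0 GB (needs a 64GB machine)
```

The same function shows why higher precision is so much heavier: `approx_weight_gb(7, bits_per_weight=16)` gives 14 GB for an unquantized 7B model.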