Scripts · Mar 31, 2026 · 2 min read

Llama Stack — Meta Official LLM App Framework

Official Meta framework for building LLM applications with Llama models. Inference, safety, RAG, agents, evals, and tool use. Standardized APIs. 8.3K+ stars.

TokRepo Picks · Community
Quick Use

Use it first, then decide how deep to go

The commands below cover what to copy, install, and run first — both for you and for an agent.

pip install llama-stack
llama stack build --template ollama --image-type conda
llama stack run ollama

Or use the client:

from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")
response = client.inference.chat_completion(
    model_id="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.completion_message.content.text)
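If you prefer not to install the client library, the same request can be made against the server's REST endpoint with only the standard library. A minimal sketch — the endpoint path and payload shape are assumptions mirrored from the client call above, not confirmed from the API reference:

```python
import json
import urllib.request

BASE_URL = "http://localhost:8321"  # matches the Quick Use example above

def build_payload(model_id, messages):
    """Assemble the JSON body, mirroring the client call above."""
    return {"model_id": model_id, "messages": messages}

def chat_completion(model_id, messages):
    """POST to a running Llama Stack server (endpoint path is an assumption)."""
    req = urllib.request.Request(
        f"{BASE_URL}/v1/inference/chat-completion",
        data=json.dumps(build_payload(model_id, messages)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

Check the server's OpenAPI spec for the exact route before relying on this.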

Intro

Llama Stack is Meta's official framework for building LLM applications with Llama models. It provides standardized APIs for inference, safety (Llama Guard), RAG, agentic workflows, evaluations, tool use, and memory — all designed to work seamlessly with Llama 3, 3.1, and 3.2 models. Deploy locally, in the cloud, or on-device. 8,300+ GitHub stars, MIT licensed.

Best for: developers building production apps with Meta's Llama models
Works with: Llama 3/3.1/3.2, Ollama, Together, Fireworks, AWS Bedrock, NVIDIA NIM


Core APIs

  • Inference: chat completion, text generation, embeddings
  • Safety: content moderation with Llama Guard / Prompt Guard
  • Agents: multi-step agentic workflows with tool use and memory
  • RAG: document ingestion, vector search, contextual retrieval
  • Eval: benchmark and evaluate model quality
  • Memory: persistent memory banks for agent context
  • Tool Use: web search, code execution, Wolfram Alpha, custom tools
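In practice the Safety and Inference APIs are composed: input is screened before it reaches the model, and the model's reply is screened before it reaches the user. A hypothetical sketch of that pattern with stand-in functions — none of these names come from the Llama Stack API:

```python
def moderate(text):
    """Stand-in for a Llama Guard check; flags a toy blocklist."""
    blocked = {"ignore previous instructions"}
    return not any(phrase in text.lower() for phrase in blocked)

def generate(messages):
    """Stand-in for a chat-completion call to the server."""
    return f"echo: {messages[-1]['content']}"

def safe_chat(user_text):
    """Screen the input, run inference, then screen the output."""
    if not moderate(user_text):
        return "Request blocked by safety filter."
    reply = generate([{"role": "user", "content": user_text}])
    if not moderate(reply):
        return "Response blocked by safety filter."
    return reply
```

With the real APIs, `moderate` would call the Safety endpoint and `generate` the Inference endpoint; the control flow stays the same.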

Distribution Providers

Run anywhere with pluggable backends:

  • Local: Ollama, vLLM, TGI
  • Cloud: Together, Fireworks, AWS Bedrock, NVIDIA NIM
  • On-device: Qualcomm, MediaTek, PyTorch ExecuTorch
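Because the API surface is identical across backends, swapping providers is largely a matter of which server the client points at. A hypothetical helper sketch — the registry and URLs below are placeholders, not official defaults (only the port 8321 comes from the Quick Use example):

```python
# Hypothetical provider registry; values are placeholder URLs.
PROVIDERS = {
    "ollama": "http://localhost:8321",
    "vllm": "http://localhost:8000",
    "together": "https://api.together.xyz",
}

def base_url_for(provider):
    """Resolve a provider name to the server URL the client should target."""
    try:
        return PROVIDERS[provider]
    except KeyError:
        raise ValueError(f"unknown provider {provider!r}; known: {sorted(PROVIDERS)}")
```

The returned URL would then be passed as `base_url` when constructing `LlamaStackClient`.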

FAQ

Q: What is Llama Stack?
A: Meta's official framework for building LLM apps with Llama models. Provides standardized APIs for inference, safety, RAG, agents, and evals. 8.3K+ stars, MIT licensed.

Q: Can I use Llama Stack with non-Llama models?
A: Llama Stack is designed for Llama models, but inference providers like Ollama and vLLM can serve other models through the same API.



Source & Thanks

Created by Meta. Licensed under MIT. meta-llama/llama-stack — 8,300+ GitHub stars
