Configs · Apr 7, 2026 · 2 min read

Llama Stack — Meta AI Agent Development Platform

Meta's official framework for building AI agents with Llama models. Includes tool calling, RAG, safety guardrails, memory, and evaluation in a unified API.

AI · Open Source · Community
Quick Use

Use it first, then decide how deep to go

Copy and run the commands below to install Llama Stack, build a local distribution, and send your first request.

pip install llama-stack llama-stack-client
llama stack build --template local --name my-stack
llama stack run my-stack

from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:5000")
response = client.inference.chat_completion(
    model_id="meta-llama/Llama-3.3-70B-Instruct",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.completion_message.content)

What is Llama Stack?

Llama Stack is Meta's official framework for building production AI applications with Llama models. It provides a unified API covering inference, tool calling, RAG, safety guardrails, memory, and evaluation — everything you need to go from prototype to production with Llama.

Answer-Ready: Llama Stack is Meta's official AI agent development platform providing inference, tool calling, RAG, safety guardrails, memory, and evaluation APIs for building production applications with Llama models.

Best for: Teams building AI agents with Llama models who need a complete, standardized stack. Works with: Llama 3.3 70B, Llama 3.1 (8B, 70B, 405B), and other Llama variants. Setup time: Under 10 minutes.

Core Features

1. Unified API

# Inference
response = client.inference.chat_completion(
    model_id="meta-llama/Llama-3.3-70B-Instruct",
    messages=[...],
)

# Tool Calling
response = client.inference.chat_completion(
    model_id="meta-llama/Llama-3.3-70B-Instruct",
    messages=[...],
    tools=[weather_tool, search_tool],
)

# RAG
client.memory_banks.create(
    name="docs",
    config={"type": "vector", "embedding_model": "all-MiniLM-L6-v2"},
)
client.memory_banks.insert(bank_id="docs", documents=[...])
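Large documents usually need to be chunked before they go into a memory bank. A minimal sketch of an overlapping-window chunker (this helper and its parameters are illustrative, not part of the Llama Stack API):

```python
def chunk_text(text: str, size: int = 512, overlap: int = 64) -> list[str]:
    """Split text into overlapping character windows for embedding.

    Overlap keeps sentences that straddle a boundary retrievable
    from at least one chunk.
    """
    if size <= overlap:
        raise ValueError("size must exceed overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks
```

Each chunk can then be passed as one document in the `documents=[...]` list above.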

2. Agentic Workflows

from llama_stack_client.lib.agents.agent import Agent
from llama_stack_client.lib.agents.event_logger import EventLogger

agent = Agent(
    client,
    model="meta-llama/Llama-3.3-70B-Instruct",
    instructions="You are a helpful research assistant.",
    tools=["brave_search", "code_interpreter"],
    enable_session_persistence=True,
)

session = agent.create_session("research")
response = agent.create_turn(
    session_id=session.session_id,
    messages=[{"role": "user", "content": "Research quantum computing trends"}],
)
for event in EventLogger().log(response):
    print(event)

3. Safety & Guardrails

# Built-in Llama Guard for content safety
response = client.safety.run_shield(
    shield_id="llama-guard",
    messages=[{"role": "user", "content": "..."}],
)
if response.violation:
    print(f"Blocked: {response.violation.user_message}")

4. Multiple Providers

Run the same API against different backends:

Provider          Use Case
Local (Ollama)    Development
Together AI       Cloud inference
Fireworks AI      Low-latency production
AWS Bedrock       Enterprise deployment
Meta Reference    Research
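Switching backends is typically a matter of building a distribution from a different template and then running the same client code unchanged. A hedged sketch (the template names `ollama` and `together` are assumptions; check your version's CLI help for the templates it actually ships):

```shell
# Build against a local Ollama backend for development
llama stack build --template ollama --name dev-stack
llama stack run dev-stack

# Build against a hosted provider for production
llama stack build --template together --name prod-stack
llama stack run prod-stack
```

Because the client talks to whichever server is running, none of the Python snippets above change when you swap providers.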

5. Evaluation

# Evaluate model performance
results = client.eval.run_eval(
    model_id="meta-llama/Llama-3.3-70B-Instruct",
    benchmark="mmlu",
)

Architecture

Llama Stack Server (unified API)
  ├── Inference (chat, completion, embeddings)
  ├── Safety (Llama Guard, content shields)
  ├── Memory (vector stores, session persistence)
  ├── Agents (tool use, multi-step reasoning)
  └── Eval (benchmarks, custom evaluations)

FAQ

Q: Does it only work with Llama models? A: Primarily designed for Llama, but the API is model-agnostic. Community providers add support for other models.

Q: How does it compare to LangChain? A: Llama Stack is a unified server with standardized APIs. LangChain is a client-side framework. They can work together.

Q: Is it production ready? A: Yes, Meta uses it internally and supports production deployments through partner providers.


Source & Thanks

Created by Meta. Licensed under MIT.

meta-llama/llama-stack — 12k+ stars
