# Llama Stack — Meta AI Agent Development Platform

> Meta's official framework for building AI agents with Llama models. Includes tool calling, RAG, safety guardrails, memory, and evaluation in a unified API.

## Install

```bash
pip install llama-stack
llama stack build --template local --name my-stack
llama stack run my-stack
```

## Quick Use

```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:5000")

response = client.inference.chat_completion(
    model_id="meta-llama/Llama-3.3-70B-Instruct",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.completion_message.content)
```

## What is Llama Stack?

Llama Stack is Meta's official framework for building production AI applications with Llama models. It provides a unified API covering inference, tool calling, RAG, safety guardrails, memory, and evaluation: everything you need to go from prototype to production with Llama.

**Best for**: Teams building AI agents with Llama models who need a complete, standardized stack.

**Works with**: Llama 3.3 70B, Llama 3.1 (8B, 70B, 405B), and other Llama variants.

**Setup time**: Under 10 minutes.

## Core Features

### 1. Unified API

```python
# Inference
response = client.inference.chat_completion(
    model_id="meta-llama/Llama-3.3-70B-Instruct",
    messages=[...],
)

# Tool calling
response = client.inference.chat_completion(
    model_id="meta-llama/Llama-3.3-70B-Instruct",
    messages=[...],
    tools=[weather_tool, search_tool],
)

# RAG
client.memory_banks.create(
    name="docs",
    config={"type": "vector", "embedding_model": "all-MiniLM-L6-v2"},
)
client.memory_banks.insert(bank_id="docs", documents=[...])
```

### 2. Agentic Workflows

```python
from llama_stack_client.lib.agents.agent import Agent
from llama_stack_client.lib.agents.event_logger import EventLogger

agent = Agent(
    client,
    model="meta-llama/Llama-3.3-70B-Instruct",
    instructions="You are a helpful research assistant.",
    tools=["brave_search", "code_interpreter"],
    enable_session_persistence=True,
)

session = agent.create_session("research")
response = agent.create_turn(
    session_id=session.session_id,
    messages=[{"role": "user", "content": "Research quantum computing trends"}],
)
for event in EventLogger().log(response):
    print(event)
```

### 3. Safety & Guardrails

```python
# Built-in Llama Guard for content safety
response = client.safety.run_shield(
    shield_id="llama-guard",
    messages=[{"role": "user", "content": "..."}],
)
if response.violation:
    print(f"Blocked: {response.violation.user_message}")
```

### 4. Multiple Providers

Run the same API against different backends:

| Provider | Use Case |
|----------|----------|
| Local (Ollama) | Development |
| Together AI | Cloud inference |
| Fireworks AI | Low-latency production |
| AWS Bedrock | Enterprise deployment |
| Meta Reference | Research |

### 5. Evaluation

```python
# Evaluate model performance against a benchmark
results = client.eval.run_eval(
    model_id="meta-llama/Llama-3.3-70B-Instruct",
    benchmark="mmlu",
)
```

## Architecture

```
Llama Stack Server (unified API)
├── Inference (chat, completion, embeddings)
├── Safety (Llama Guard, content shields)
├── Memory (vector stores, session persistence)
├── Agents (tool use, multi-step reasoning)
└── Eval (benchmarks, custom evaluations)
```

## FAQ

**Q: Does it only work with Llama models?**
A: It is primarily designed for Llama, but the API is model-agnostic; community providers add support for other models.

**Q: How does it compare to LangChain?**
A: Llama Stack is a unified server with standardized APIs, while LangChain is a client-side framework. They can work together.
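**Q: What does a tool definition like `weather_tool` look like?**
A: The `weather_tool` passed to `chat_completion` in the Unified API example is a schema object that describes the tool's name and parameters to the model. A minimal sketch is below; the field names (`tool_name`, `parameters`, `param_type`) follow a common llama-stack convention but are an assumption here, so verify them against your installed `llama-stack-client` version:

```python
# Hypothetical schema for the `weather_tool` referenced earlier.
# Field names are assumed, not verified against a specific release.
weather_tool = {
    "tool_name": "get_weather",
    "description": "Get the current weather for a city.",
    "parameters": {
        "city": {
            "param_type": "str",
            "description": "City name, e.g. 'Paris'",
            "required": True,
        },
    },
}
```

When the model decides to call the tool, the response contains the tool name and arguments matching this schema; your application executes the tool and sends the result back as a follow-up message.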
**Q: Is it production-ready?**
A: Yes. Meta uses it internally and supports production deployments through partner providers.

## Source & Thanks

> Created by [Meta](https://github.com/meta-llama). Licensed under MIT.
>
> [meta-llama/llama-stack](https://github.com/meta-llama/llama-stack) — 12k+ stars

---

Source: https://tokrepo.com/en/workflows/095c5744-4916-4212-9dff-a7a855bc7a22
Author: AI Open Source