# Llama Stack — Meta AI Agent Development Platform

> Meta's official framework for building AI agents with Llama models. Includes tool calling, RAG, safety guardrails, memory, and evaluation in a unified API.

## Install

```bash
pip install llama-stack
llama stack build --template local --name my-stack
llama stack run my-stack
```

## Quick Use

```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:5000")

response = client.inference.chat_completion(
    model_id="meta-llama/Llama-3.3-70B-Instruct",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.completion_message.content)
```

## What is Llama Stack?

Llama Stack is Meta's official framework for building production AI applications with Llama models. It provides a unified API covering inference, tool calling, RAG, safety guardrails, memory, and evaluation: everything you need to go from prototype to production with Llama.

**Best for**: Teams building AI agents with Llama models who need a complete, standardized stack.

**Works with**: Llama 3.3 70B, Llama 3.1 (8B, 70B, 405B), and other Llama variants.

**Setup time**: Under 10 minutes.

## Core Features

### 1. Unified API

```python
# Inference
response = client.inference.chat_completion(
    model_id="meta-llama/Llama-3.3-70B-Instruct",
    messages=[...],
)

# Tool calling
response = client.inference.chat_completion(
    model_id="meta-llama/Llama-3.3-70B-Instruct",
    messages=[...],
    tools=[weather_tool, search_tool],
)

# RAG
client.memory_banks.create(
    name="docs",
    config={"type": "vector", "embedding_model": "all-MiniLM-L6-v2"},
)
client.memory_banks.insert(bank_id="docs", documents=[...])
```

### 2. Agentic Workflows

```python
from llama_stack_client.lib.agents.agent import Agent
from llama_stack_client.lib.agents.event_logger import EventLogger

agent = Agent(
    client,
    model="meta-llama/Llama-3.3-70B-Instruct",
    instructions="You are a helpful research assistant.",
    tools=["brave_search", "code_interpreter"],
    enable_session_persistence=True,
)

session = agent.create_session("research")
response = agent.create_turn(
    session_id=session.session_id,
    messages=[{"role": "user", "content": "Research quantum computing trends"}],
)
for event in EventLogger().log(response):
    print(event)
```

### 3. Safety & Guardrails

```python
# Built-in Llama Guard for content safety
response = client.safety.run_shield(
    shield_id="llama-guard",
    messages=[{"role": "user", "content": "..."}],
)
if response.violation:
    print(f"Blocked: {response.violation.user_message}")
```

### 4. Multiple Providers

Run the same API against different backends:

| Provider | Use Case |
|----------|----------|
| Local (Ollama) | Development |
| Together AI | Cloud inference |
| Fireworks AI | Low-latency production |
| AWS Bedrock | Enterprise deployment |
| Meta Reference | Research |

### 5. Evaluation

```python
# Evaluate model performance against a benchmark
results = client.eval.run_eval(
    model_id="meta-llama/Llama-3.3-70B-Instruct",
    benchmark="mmlu",
)
```

## Architecture

```
Llama Stack Server (unified API)
├── Inference (chat, completion, embeddings)
├── Safety (Llama Guard, content shields)
├── Memory (vector stores, session persistence)
├── Agents (tool use, multi-step reasoning)
└── Eval (benchmarks, custom evaluations)
```

## FAQ

**Q: Does it only work with Llama models?**
A: It is primarily designed for Llama, but the API is model-agnostic; community providers add support for other models.

**Q: How does it compare to LangChain?**
A: Llama Stack is a unified server with standardized APIs, while LangChain is a client-side framework. They can work together.
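**Q: What does a tool definition like `weather_tool` look like?**
A: The `weather_tool` passed to `chat_completion` in the Unified API example is a schema object that describes the tool's name and parameters to the model. A minimal sketch is below; the field names (`tool_name`, `parameters`, `param_type`) follow a common llama-stack convention but are an assumption here, so verify them against your installed `llama-stack-client` version:

```python
# Hypothetical schema for the `weather_tool` referenced earlier.
# Field names are assumed, not verified against a specific release.
weather_tool = {
    "tool_name": "get_weather",
    "description": "Get the current weather for a city.",
    "parameters": {
        "city": {
            "param_type": "str",
            "description": "City name, e.g. 'Paris'",
            "required": True,
        },
    },
}
```

When the model decides to call the tool, the response contains the tool name and arguments matching this schema; your application executes the tool and sends the result back as a follow-up message.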
**Q: Is it production-ready?**
A: Yes. Meta uses it internally and supports production deployments through partner providers.

## Source & Thanks

> Created by [Meta](https://github.com/meta-llama). Licensed under MIT.
>
> [meta-llama/llama-stack](https://github.com/meta-llama/llama-stack) — 12k+ stars

---

Source: https://tokrepo.com/en/workflows/095c5744-4916-4212-9dff-a7a855bc7a22
Author: AI Open Source