Llama Stack — Meta Official LLM App Framework
Official Meta framework for building LLM applications with Llama models. Inference, safety, RAG, agents, evals, and tool use. Standardized APIs. 8.3K+ stars.
Instalación lista para agent
Este activo puede instalarse después de elegir el runtime, revisar el plan y ejecutar el comando correspondiente.
npx -y tokrepo@latest install 2670226a-fe9a-4de2-bc53-8d5a25b071f2 --target codexEjecutar después de confirmar el plan con dry-run.
What it is
Llama Stack is the official Meta framework for building applications with Llama models. It provides standardized APIs for inference, safety guardrails, retrieval-augmented generation, agent orchestration, evaluations, and tool use. The framework is designed to work across different deployment environments.
AI engineers who build with Llama models and want a cohesive, officially supported development experience will find Llama Stack preferable to assembling individual components.
How it saves time or tokens
Llama Stack unifies what would otherwise require separate libraries for inference, safety, RAG, and evaluation. The standardized API surface means you write integration code once and swap providers (local, cloud, or custom) without changing application logic.
How to use
- Install Llama Stack via pip.
- Configure a provider for inference (local, Fireworks, Together, or custom).
- Use the client SDK to call inference, safety, and agent APIs.
from llama_stack_client import LlamaStackClient
client = LlamaStackClient(base_url='http://localhost:5000')
response = client.inference.chat_completion(
model_id='Llama3.3-70B-Instruct',
messages=[{'role': 'user', 'content': 'Explain RAG in two sentences.'}],
)
print(response.completion_message.content)
Example
Running safety checks on model output:
# Check output with Llama Guard
safety_response = client.safety.run_shield(
shield_id='llama_guard',
messages=[{'role': 'assistant', 'content': response.completion_message.content}],
)
print(f'Safe: {safety_response.violation is None}')
Related on TokRepo
- Local LLM tools — Compare local inference options for Llama models
- AI tools for agents — Agent frameworks and orchestration
Common pitfalls
- Llama Stack is optimized for Meta's Llama models. Using non-Llama models may require custom provider implementations.
- Local inference with large Llama models requires significant GPU memory. Plan hardware accordingly.
- The framework is evolving rapidly; API stability may vary between releases.
Preguntas frecuentes
Llama Stack supports all official Meta Llama models including Llama 3.3, Llama 3.2, and earlier versions. It provides inference, safety, and tool use APIs tailored to Llama model capabilities.
Yes. Llama Stack supports local inference via Ollama, vLLM, and other local providers. You can run the full stack on your own hardware for complete data privacy.
Llama Guard is Meta's safety model for content moderation. Llama Stack integrates it as a shield that checks model inputs and outputs for harmful content, enabling safety guardrails in production applications.
Yes. Llama Stack includes RAG APIs for document ingestion, embedding, retrieval, and generation. You can use built-in providers or integrate custom vector stores.
LangChain is model-agnostic and supports many LLM providers. Llama Stack is specifically designed for Llama models with deeper integration into Meta's ecosystem (Llama Guard, official model configurations). Use Llama Stack for Llama-first projects; use LangChain for multi-provider flexibility.
Referencias (3)
- Llama Stack GitHub— Llama Stack is Meta's official framework for Llama applications
- Llama Stack Documentation— Standardized APIs for inference, safety, RAG, agents, and evals
- Meta AI Llama Guard— Llama Guard provides safety guardrails for LLM outputs
Relacionados en TokRepo
Fuente y agradecimientos
Created by Meta. Licensed under MIT. meta-llama/llama-stack — 8,300+ GitHub stars
Discusión
Activos relacionados
SolidStart — Full-Stack Meta-Framework for SolidJS
The official meta-framework for SolidJS that adds file-based routing, server functions, SSR, and deployment adapters for building full-stack web applications.
Dioxus — Full-Stack App Framework for Web, Desktop, and Mobile
Dioxus is a full-stack app framework for Rust with a React-like API. Build web (WASM), desktop (native WebView), mobile (iOS/Android), TUI, and server-rendered apps from one codebase. Hooks, components, server functions, and hot reloading.
LLaMA-Factory — Unified LLM Fine-Tuning Framework
LLaMA-Factory offers a web UI and CLI for fine-tuning over 100 large language models using methods like LoRA, QLoRA, and full-parameter training, with built-in evaluation and export.
LlamaIndex — Data Framework for LLM Applications
Connect your data to large language models. The leading framework for RAG, document indexing, knowledge graphs, and structured data extraction.