Skills · Mar 31, 2026 · 2 min read

Llama Stack — Meta Official LLM App Framework

Official Meta framework for building LLM applications with Llama models, with standardized APIs for inference, safety, RAG, agents, evals, and tool use. 8.3K+ GitHub stars.

Agent ready

Ready-to-run agent install

This asset can be installed after the agent chooses its runtime, checks the plan, and runs the matching command.

Native · 98/100 · Policy: allow
Agent surface: Any MCP/CLI agent
Kind: Skill
Install: Single
Trust: Established
Entrypoint: Llama Stack — Meta Official LLM App Framework
Direct install command:
npx -y tokrepo@latest install 2670226a-fe9a-4de2-bc53-8d5a25b071f2 --target codex

Run after dry-run confirms the install plan.

TL;DR
Llama Stack provides standardized APIs for inference, safety, RAG, agents, and evals with Llama models.
§01

What it is

Llama Stack is the official Meta framework for building applications with Llama models. It provides standardized APIs for inference, safety guardrails, retrieval-augmented generation, agent orchestration, evaluations, and tool use. The same APIs are meant to work across deployment environments, whether the provider behind them is local, cloud-hosted, or custom.

AI engineers who build with Llama models and want a cohesive, officially supported development experience will find Llama Stack preferable to assembling individual components.

§02

How it saves time or tokens

Llama Stack unifies what would otherwise require separate libraries for inference, safety, RAG, and evaluation. The standardized API surface means you write integration code once and swap providers (local, cloud, or custom) without changing application logic.
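
The "write once, swap providers" idea can be sketched in plain Python. This is not Llama Stack code: `ChatBackend`, `LocalBackend`, `CloudBackend`, and `summarize` are hypothetical names invented for illustration, and the backends return canned strings instead of calling a model. The point is only that application logic written against one interface does not change when the provider behind it does.

```python
from dataclasses import dataclass
from typing import Protocol


class ChatBackend(Protocol):
    """Hypothetical interface: anything that can answer a prompt."""

    def complete(self, prompt: str) -> str: ...


@dataclass
class LocalBackend:
    """Stand-in for a locally hosted provider (illustrative only)."""

    name: str = "local"

    def complete(self, prompt: str) -> str:
        return f"[{self.name}] {prompt}"


@dataclass
class CloudBackend:
    """Stand-in for a hosted provider (illustrative only)."""

    name: str = "cloud"

    def complete(self, prompt: str) -> str:
        return f"[{self.name}] {prompt}"


def summarize(backend: ChatBackend, text: str) -> str:
    """Application logic: depends only on the interface, not the provider."""
    return backend.complete(f"Summarize: {text}")


# Same call, different provider; summarize() itself is untouched.
local_answer = summarize(LocalBackend(), "RAG")
cloud_answer = summarize(CloudBackend(), "RAG")
```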

§03

How to use

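  1. Install Llama Stack via pip (pip install llama-stack). The client SDK used in the code here is a separate package, llama-stack-client.
  2. Configure a provider for inference (local, Fireworks, Together, or custom).
  3. Use the client SDK to call inference, safety, and agent APIs.

from llama_stack_client import LlamaStackClient

# Point base_url at wherever your Llama Stack server is listening.
client = LlamaStackClient(base_url='http://localhost:5000')

# Send a single-turn chat request to the configured inference provider.
response = client.inference.chat_completion(
    model_id='Llama3.3-70B-Instruct',
    messages=[{'role': 'user', 'content': 'Explain RAG in two sentences.'}],
)
print(response.completion_message.content)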
§04

Example

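Running safety checks on model output (this reuses client and response from the previous section):

# Check output with Llama Guard
safety_response = client.safety.run_shield(
    shield_id='llama_guard',
    messages=[{'role': 'assistant', 'content': response.completion_message.content}],
    params={},  # run_shield expects a params argument; empty means no extra options
)
# violation is None when the shield found nothing to flag
print(f'Safe: {safety_response.violation is None}')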
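
One way to act on the shield result is a small gate that returns the model text only when no violation was reported. The sketch below is a minimal illustration under stated assumptions: `FakeViolation`, `FakeShieldResponse`, and `gate` are hypothetical stand-ins so the example runs without a server, and the only property borrowed from the snippet above is that `violation` is `None` when the content is considered safe.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeViolation:
    """Hypothetical stand-in for a reported violation."""

    reason: str


@dataclass
class FakeShieldResponse:
    """Hypothetical stand-in for a shield response: violation is None when safe."""

    violation: Optional[FakeViolation] = None


def gate(text: str, shield_response, fallback: str = "Response withheld.") -> str:
    """Return text if the shield reported no violation, else the fallback."""
    if shield_response.violation is None:
        return text
    return fallback


safe = gate("RAG retrieves documents first.", FakeShieldResponse())
blocked = gate("unsafe text", FakeShieldResponse(FakeViolation("flagged")))
```

With a live server, the same `gate` function could presumably be passed the `safety_response` object from the snippet above, since it only reads the `violation` attribute.
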
§06

Common pitfalls

  • Llama Stack is optimized for Meta's Llama models. Using non-Llama models may require custom provider implementations.
  • Local inference with large Llama models requires significant GPU memory. Plan hardware accordingly.
  • The framework is evolving rapidly; API stability may vary between releases.
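
To make the GPU memory point concrete, a rough back-of-the-envelope estimate is parameter count times bytes per parameter. The helper below, `weight_memory_gb`, is a hypothetical name, and the figure covers model weights only: KV cache, activations, and runtime overhead come on top, so treat the result as a lower bound rather than a sizing guarantee.

```python
def weight_memory_gb(params_billions: float, bytes_per_param: float) -> float:
    """Approximate memory for weights alone, in GB (1 GB = 1e9 bytes).

    Billions of parameters times bytes per parameter gives gigabytes directly.
    """
    return params_billions * bytes_per_param


# A 70B-parameter model at 16-bit precision (2 bytes per parameter).
fp16_70b = weight_memory_gb(70, 2)

# The same model with 4-bit quantized weights (0.5 bytes per parameter).
int4_70b = weight_memory_gb(70, 0.5)

print(f"70B @ 16-bit: ~{fp16_70b:.0f} GB, 70B @ 4-bit: ~{int4_70b:.0f} GB")
```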

Frequently Asked Questions

What Llama models does Llama Stack support?

Llama Stack supports all official Meta Llama models including Llama 3.3, Llama 3.2, and earlier versions. It provides inference, safety, and tool use APIs tailored to Llama model capabilities.

Can I run Llama Stack locally?

Yes. Llama Stack supports local inference via Ollama, vLLM, and other local providers. You can run the full stack on your own hardware for complete data privacy.

What is Llama Guard?

Llama Guard is Meta's safety model for content moderation. Llama Stack integrates it as a shield that checks model inputs and outputs for harmful content, enabling safety guardrails in production applications.

Does Llama Stack support RAG?

Yes. Llama Stack includes RAG APIs for document ingestion, embedding, retrieval, and generation. You can use built-in providers or integrate custom vector stores.
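
The four stages named above (ingestion, embedding, retrieval, generation) can be illustrated with a toy pipeline in plain Python. This is not the Llama Stack RAG API: `embed`, `cosine`, and `retrieve` are hypothetical helpers, the "embedding" is just a word count, and the final step only builds the prompt that a model would receive.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a real system uses a model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, index: list[tuple[str, Counter]], k: int = 1) -> list[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]


docs = [
    "llama guard checks content for safety",
    "vector stores hold document embeddings",
]
question = "where are embeddings stored"

index = [(d, embed(d)) for d in docs]    # ingestion + embedding
context = retrieve(question, index)      # retrieval
prompt = f"Context: {context[0]}\nQuestion: {question}"  # input to generation
```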

How does Llama Stack compare to LangChain?

LangChain is model-agnostic and supports many LLM providers. Llama Stack is specifically designed for Llama models with deeper integration into Meta's ecosystem (Llama Guard, official model configurations). Use Llama Stack for Llama-first projects; use LangChain for multi-provider flexibility.


Source & Thanks

Created by Meta. Licensed under MIT. meta-llama/llama-stack — 8,300+ GitHub stars
