SkillsMar 31, 2026·2 min read

Llama Stack — Meta Official LLM App Framework

Official Meta framework for building LLM applications with Llama models. Inference, safety, RAG, agents, evals, and tool use. Standardized APIs. 8.3K+ stars.

Script Depot · Community

Agent ready

Ready-to-run agent install

This asset can be installed after the agent chooses its runtime, checks the plan, and runs the matching command.

Native · 98/100Policy: allow

Agent surface

Any MCP/CLI agent

Kind

Skill

Install

Single

Trust

Trust: Established

Entrypoint

Llama Stack — Meta Official LLM App Framework

Direct install command

npx -y tokrepo@latest install 2670226a-fe9a-4de2-bc53-8d5a25b071f2 --target codex

Run after dry-run confirms the install plan.

TL;DR

Llama Stack provides standardized APIs for inference, safety, RAG, agents, and evals with Llama models.

§01

What it is

Llama Stack is the official Meta framework for building applications with Llama models. It provides standardized APIs for inference, safety guardrails, retrieval-augmented generation, agent orchestration, evaluations, and tool use. The framework is designed to work across different deployment environments.

AI engineers who build with Llama models and want a cohesive, officially supported development experience will find Llama Stack preferable to assembling individual components.

§02

How it saves time or tokens

Llama Stack unifies what would otherwise require separate libraries for inference, safety, RAG, and evaluation. The standardized API surface means you write integration code once and swap providers (local, cloud, or custom) without changing application logic.

§03

How to use

Install Llama Stack via pip.
Configure a provider for inference (local, Fireworks, Together, or custom).
Use the client SDK to call inference, safety, and agent APIs.

from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url='http://localhost:5000')

response = client.inference.chat_completion(
    model_id='Llama3.3-70B-Instruct',
    messages=[{'role': 'user', 'content': 'Explain RAG in two sentences.'}],
)
print(response.completion_message.content)

§04

Example

Running safety checks on model output:

# Check output with Llama Guard
safety_response = client.safety.run_shield(
    shield_id='llama_guard',
    messages=[{'role': 'assistant', 'content': response.completion_message.content}],
)
print(f'Safe: {safety_response.violation is None}')

§05

Related on TokRepo

Local LLM tools — Compare local inference options for Llama models
AI tools for agents — Agent frameworks and orchestration

§06

Common pitfalls

Llama Stack is optimized for Meta's Llama models. Using non-Llama models may require custom provider implementations.
Local inference with large Llama models requires significant GPU memory. Plan hardware accordingly.
The framework is evolving rapidly; API stability may vary between releases.

Frequently Asked Questions

What Llama models does Llama Stack support?+

Llama Stack supports all official Meta Llama models including Llama 3.3, Llama 3.2, and earlier versions. It provides inference, safety, and tool use APIs tailored to Llama model capabilities.

Can I run Llama Stack locally?+

Yes. Llama Stack supports local inference via Ollama, vLLM, and other local providers. You can run the full stack on your own hardware for complete data privacy.

What is Llama Guard?+

Llama Guard is Meta's safety model for content moderation. Llama Stack integrates it as a shield that checks model inputs and outputs for harmful content, enabling safety guardrails in production applications.

Does Llama Stack support RAG?+

Yes. Llama Stack includes RAG APIs for document ingestion, embedding, retrieval, and generation. You can use built-in providers or integrate custom vector stores.

How does Llama Stack compare to LangChain?+

LangChain is model-agnostic and supports many LLM providers. Llama Stack is specifically designed for Llama models with deeper integration into Meta's ecosystem (Llama Guard, official model configurations). Use Llama Stack for Llama-first projects; use LangChain for multi-provider flexibility.

Citations (3)

Llama Stack GitHub— Llama Stack is Meta's official framework for Llama applications
Llama Stack Documentation— Standardized APIs for inference, safety, RAG, agents, and evals
Meta AI Llama Guard— Llama Guard provides safety guardrails for LLM outputs

Related on TokRepo

Local LLM tools AI agent tools Featured workflows

🙏

Source & Thanks

Created by Meta. Licensed under MIT. meta-llama/llama-stack — 8,300+ GitHub stars

Discussion

No comments yet. Be the first to share your thoughts.

Related Assets

SolidStart — Full-Stack Meta-Framework for SolidJS

The official meta-framework for SolidJS that adds file-based routing, server functions, SSR, and deployment adapters for building full-stack web applications.

Skills

Script Depot

Dioxus — Full-Stack App Framework for Web, Desktop, and Mobile

Dioxus is a full-stack app framework for Rust with a React-like API. Build web (WASM), desktop (native WebView), mobile (iOS/Android), TUI, and server-rendered apps from one codebase. Hooks, components, server functions, and hot reloading.

Skills

Script Depot

LLaMA-Factory — Unified LLM Fine-Tuning Framework

LLaMA-Factory offers a web UI and CLI for fine-tuning over 100 large language models using methods like LoRA, QLoRA, and full-parameter training, with built-in evaluation and export.

Skills

Script Depot

Llama Models — Official Meta Model Utilities and Definitions

The official Meta repository containing model definitions, tokenizer utilities, and reference implementations for the Llama family of large language models including Llama 2 and Llama 3.

Scripts

Script Depot