# Llama Stack — Meta Official LLM App Framework

> Official Meta framework for building LLM applications with Llama models. Inference, safety, RAG, agents, evals, and tool use. Standardized APIs. 8.3K+ stars.

## Install

```bash
pip install llama-stack
```

## Quick Use

Build and run a distribution (here, the Ollama template):

```bash
llama stack build --template ollama --image-type conda
llama stack run ollama
```

Or use the Python client:

```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

response = client.inference.chat_completion(
    model_id="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.completion_message.content.text)
```

---

## Intro

Llama Stack is Meta's official framework for building LLM applications with Llama models. It provides standardized APIs for inference, safety (Llama Guard), RAG, agentic workflows, evaluations, tool use, and memory — all designed to work seamlessly with Llama 3, 3.1, and 3.2 models. Deploy locally, in the cloud, or on-device. 8,300+ GitHub stars, MIT licensed.
**Best for**: Developers building production apps with Meta's Llama models

**Works with**: Llama 3/3.1/3.2, Ollama, Together, Fireworks, AWS Bedrock, NVIDIA NIM

---

## Core APIs

| API | Description |
|-----|-------------|
| **Inference** | Chat completion, text generation, embeddings |
| **Safety** | Content moderation with Llama Guard / Prompt Guard |
| **Agents** | Multi-step agentic workflows with tool use and memory |
| **RAG** | Document ingestion, vector search, contextual retrieval |
| **Eval** | Benchmark and evaluate model quality |
| **Memory** | Persistent memory banks for agent context |
| **Tool Use** | Web search, code execution, Wolfram Alpha, custom tools |

### Distribution Providers

Run anywhere with pluggable backends:

- **Local**: Ollama, vLLM, TGI
- **Cloud**: Together, Fireworks, AWS Bedrock, NVIDIA NIM
- **On-device**: Qualcomm, MediaTek, PyTorch ExecuTorch

---

## FAQ

**Q: What is Llama Stack?**
A: Meta's official framework for building LLM apps with Llama models. Provides standardized APIs for inference, safety, RAG, agents, and evals. 8.3K+ stars, MIT licensed.

**Q: Can I use Llama Stack with non-Llama models?**
A: Llama Stack is designed for Llama models, but inference providers like Ollama and vLLM can serve other models through the same API.

---

## Source & Thanks

> Created by [Meta](https://github.com/meta-llama). Licensed under MIT.
> [meta-llama/llama-stack](https://github.com/meta-llama/llama-stack) — 8,300+ GitHub stars