# Llama Stack — Meta Official LLM App Framework

> Official Meta framework for building LLM applications with Llama models. Inference, safety, RAG, agents, evals, and tool use. Standardized APIs. 8.3K+ stars.

## Install

```bash
pip install llama-stack
```

## Quick Use

Build and run a distribution (here, the Ollama template):

```bash
llama stack build --template ollama --image-type conda
llama stack run ollama
```

Or use the Python client:

```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

response = client.inference.chat_completion(
    model_id="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.completion_message.content.text)
```

---

## Intro

Llama Stack is Meta's official framework for building LLM applications with Llama models. It provides standardized APIs for inference, safety (Llama Guard), RAG, agentic workflows, evaluations, tool use, and memory — all designed to work seamlessly with Llama 3, 3.1, and 3.2 models. Deploy locally, in the cloud, or on-device. 8,300+ GitHub stars, MIT licensed.
**Best for**: Developers building production apps with Meta's Llama models

**Works with**: Llama 3/3.1/3.2, Ollama, Together, Fireworks, AWS Bedrock, NVIDIA NIM

---

## Core APIs

| API | Description |
|-----|-------------|
| **Inference** | Chat completion, text generation, embeddings |
| **Safety** | Content moderation with Llama Guard / Prompt Guard |
| **Agents** | Multi-step agentic workflows with tool use and memory |
| **RAG** | Document ingestion, vector search, contextual retrieval |
| **Eval** | Benchmark and evaluate model quality |
| **Memory** | Persistent memory banks for agent context |
| **Tool Use** | Web search, code execution, Wolfram Alpha, custom tools |

### Distribution Providers

Run anywhere with pluggable backends:

- **Local**: Ollama, vLLM, TGI
- **Cloud**: Together, Fireworks, AWS Bedrock, NVIDIA NIM
- **On-device**: Qualcomm, MediaTek, PyTorch ExecuTorch

---

## FAQ

**Q: What is Llama Stack?**
A: Meta's official framework for building LLM apps with Llama models. Provides standardized APIs for inference, safety, RAG, agents, and evals. 8.3K+ stars, MIT licensed.

**Q: Can I use Llama Stack with non-Llama models?**
A: Llama Stack is designed for Llama models, but inference providers like Ollama and vLLM can serve other models through the same API.

---

## Source & Thanks

> Created by [Meta](https://github.com/meta-llama). Licensed under MIT.
> [meta-llama/llama-stack](https://github.com/meta-llama/llama-stack) — 8,300+ GitHub stars