# LLMWare: Unified Framework for Enterprise RAG Pipelines

> Build retrieval-augmented generation workflows with small specialized models, parsing, embeddings, and vector search in one framework.

## Quick Use

```bash
pip install llmware
```

```python
from llmware.models import ModelCatalog

text = "Replace this with the text you want summarized."  # sample input
model = ModelCatalog().load_model("slim-summary-tool")  # fetched on first use
response = model.function_call(text, function="summarize")
```

## Introduction

LLMWare is an open-source Python framework for building enterprise RAG (retrieval-augmented generation) pipelines. It provides an integrated stack covering document parsing, embedding, vector storage, and inference, using small, specialized models that can run locally without a GPU.

## What LLMWare Does

- Parses PDFs, Office documents, HTML, and text into structured chunks for retrieval
- Generates embeddings and stores them in supported vector databases (Milvus, FAISS, Pinecone, Postgres/pgvector)
- Ships a catalog of 50+ small specialized GGUF and ONNX models for targeted tasks
- Runs function-calling models locally for summarization, extraction, classification, and Q&A
- Provides a library abstraction that connects parsing, retrieval, and generation into cohesive pipelines

## Architecture Overview

LLMWare organizes work around a Library object that ingests documents, chunks them, and stores metadata in a document store (MongoDB, SQLite, or Postgres). Embeddings are generated and pushed to a vector database for similarity search. At query time, retrieved context is passed to a model from the built-in catalog or an external API. The SLIM model series (small language models under 3B parameters) handles structured extraction tasks efficiently on CPU.

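
The ingest, chunk, embed, retrieve, and generate flow described in the architecture overview can be illustrated with a small standard-library sketch. This is a conceptual toy, not LLMWare's API: `ToyLibrary`, `chunk_text`, `embed`, `cosine`, and `build_prompt` are hypothetical names, the "embedding" is a plain bag-of-words count standing in for a real embedding model, and in-memory lists stand in for the document store and vector database.

```python
import math
import re
from collections import Counter


def chunk_text(text, max_words=40):
    """Split text into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text):
    """Toy embedding (assumption): term counts instead of a learned vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyLibrary:
    """Hypothetical stand-in for the Library concept described above."""

    def __init__(self):
        self.chunks = []   # plays the role of the document store
        self.vectors = []  # plays the role of the vector database

    def add_document(self, doc_id, text, max_words=40):
        # Ingest: chunk the document, record metadata, and embed each chunk.
        for n, chunk in enumerate(chunk_text(text, max_words)):
            self.chunks.append({"doc_id": doc_id, "chunk_no": n, "text": chunk})
            self.vectors.append(embed(chunk))

    def search(self, query, top_k=2):
        # Retrieve: rank stored chunks by similarity to the query vector.
        q = embed(query)
        ranked = sorted(
            zip(self.chunks, self.vectors),
            key=lambda pair: cosine(q, pair[1]),
            reverse=True,
        )
        return [chunk for chunk, _ in ranked[:top_k]]


def build_prompt(question, retrieved):
    """Assemble retrieved context and the question into one model prompt."""
    context = "\n".join(c["text"] for c in retrieved)
    return f"Context:\n{context}\n\nQuestion: {question}"


library = ToyLibrary()
library.add_document("contract", "The agreement renews annually on the first of March.")
library.add_document("invoice", "Total amount due is 4500 dollars payable within thirty days.")

question = "When does the agreement renew?"
hits = library.search(question, top_k=1)
prompt = build_prompt(question, hits)
print(prompt)
```

In a real deployment, the prompt built in the last step would be passed to a catalog model or an external API, and the two in-memory lists would be replaced by the configured document store and vector database.
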
## Self-Hosting & Configuration

- Install with `pip install llmware` on Python 3.9+
- Choose a document store backend: SQLite (default), MongoDB, or PostgreSQL
- Select a vector database: FAISS (local default), Milvus, Qdrant, Pinecone, or pgvector
- Download models on first use from Hugging Face Hub via the ModelCatalog
- Configure API-based models (OpenAI, Anthropic, Google) via environment variables for hybrid deployments

## Key Features

- Small specialized models (SLIM series) that run on CPU without GPU infrastructure
- End-to-end pipeline covering ingestion, parsing, embedding, retrieval, and generation
- Multi-format document parsing, including scanned PDFs with OCR support
- Model catalog with 50+ pre-configured models for different tasks and hardware profiles
- Enterprise-friendly, with support for air-gapped deployments and local-only operation

## Comparison with Similar Tools

- **LangChain**: General-purpose LLM orchestration; LLMWare focuses on RAG with built-in models and parsing
- **LlamaIndex**: Specialized in data indexing and retrieval; LLMWare bundles its own small models
- **Haystack**: Pipeline-based NLP framework; LLMWare emphasizes CPU-friendly small models
- **Unstructured**: Document parsing library; LLMWare integrates parsing with retrieval and inference
- **txtai**: Embeddings and RAG; LLMWare provides a broader enterprise pipeline abstraction

## FAQ

**Q: Do I need a GPU to run LLMWare?**
A: No. The SLIM model series and GGUF models are designed to run on CPU. GPU acceleration is optional.

**Q: What document formats does it support?**
A: PDF, DOCX, PPTX, XLSX, HTML, CSV, TXT, and JSON. Scanned PDFs are handled via integrated OCR.

**Q: Can I use external LLM APIs instead of local models?**
A: Yes. LLMWare supports OpenAI, Anthropic, Google, and other API providers alongside local models.

**Q: How does it compare to using LangChain with a vector store?**
A: LLMWare provides a more opinionated, integrated stack with built-in small models, reducing the need to assemble components separately.

## Sources

- https://github.com/llmware-ai/llmware
- https://llmware-ai.github.io/llmware/

---

Source: https://tokrepo.com/en/workflows/asset-cb2e817c
Author: AI Open Source