# LLMWare — Unified Framework for Enterprise RAG Pipelines

> Build retrieval-augmented generation workflows with small specialized models, parsing, embeddings, and vector search in one framework.

## Quick Use
```bash
pip install llmware
```
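```python
from llmware.models import ModelCatalog

# Load a small SLIM function-calling model (downloaded on first use)
model = ModelCatalog().load_model("slim-summary-tool")

text = "Quarterly revenue rose 12% while operating costs fell 3%, lifting margins."
response = model.function_call(text, function="summarize")
print(response["llm_response"])  # the model's structured output
```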

## Introduction
LLMWare is an open-source Python framework for building enterprise RAG (retrieval-augmented generation) pipelines. It provides an integrated stack covering document parsing, embedding, vector storage, and inference, using small, specialized models that can run locally without a GPU.

## What LLMWare Does
- Parses PDFs, Office documents, HTML, and text into structured chunks for retrieval
- Generates embeddings and stores them in vector databases such as Milvus, FAISS, Qdrant, Pinecone, and Postgres/pgvector
- Ships a catalog of 50+ small specialized GGUF and ONNX models for targeted tasks
- Runs function-calling models locally for summarization, extraction, classification, and Q&A
- Provides a library abstraction that connects parsing, retrieval, and generation into cohesive pipelines
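The chunking step in the first bullet can be illustrated with a minimal, stdlib-only sketch that splits extracted text into overlapping word windows tagged with source metadata. This is a conceptual illustration, not LLMWare's parser: the `Chunk` type, the `chunk_text` function, and the fixed word-window strategy are assumptions chosen for clarity.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """One retrievable block of text plus the metadata needed to trace it back."""
    source: str  # originating file name (hypothetical field)
    index: int   # position of the chunk within its document
    text: str


def chunk_text(text: str, source: str, size: int = 50, overlap: int = 10) -> list[Chunk]:
    """Split text into windows of `size` words, each overlapping the previous by `overlap` words.

    The overlap keeps a sentence that straddles a boundary retrievable from either chunk.
    Hypothetical helper for illustration only; not part of LLMWare.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    # Stop once the remaining words are already covered by the previous window's overlap.
    starts = range(0, max(len(words) - overlap, 1), step)
    return [
        Chunk(source=source, index=i, text=" ".join(words[start:start + size]))
        for i, start in enumerate(starts)
    ]
```

A real parser would more likely split on structural boundaries such as pages, paragraphs, or table cells; the fixed word window just keeps the example short.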

## Architecture Overview
LLMWare organizes work around a Library object that ingests documents, splits them into chunks, and stores them with their metadata in a document store (MongoDB, SQLite, or Postgres). Embeddings are then generated for the chunks and pushed to a vector database for similarity search. At query time, the retrieved context is passed to a model from the built-in catalog or to an external API. The SLIM model series (small language models under 3B parameters) handles structured extraction tasks efficiently on CPU.
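The flow described above (store chunks with metadata, embed them, rank by similarity, then hand the retrieved context to a model) can be sketched with the standard library alone. This is a toy under stated assumptions: `ToyLibrary`, `embed`, `cosine`, and `build_prompt` are hypothetical names, not LLMWare APIs; an in-memory SQLite table stands in for the document store, bag-of-words term counts stand in for a real embedding model, and a plain dict stands in for the vector database.

```python
import math
import re
import sqlite3
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase term counts (a stand-in for a neural encoder)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyLibrary:
    """Hypothetical stand-in, not llmware's Library: text in SQLite, vectors in a dict."""

    def __init__(self) -> None:
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE chunks (id INTEGER PRIMARY KEY, source TEXT, text TEXT)")
        self.vectors: dict[int, Counter] = {}  # chunk id -> embedding

    def add(self, source: str, text: str) -> None:
        """Store the chunk and its metadata, then index its embedding under the row id."""
        cur = self.db.execute("INSERT INTO chunks (source, text) VALUES (?, ?)", (source, text))
        self.vectors[cur.lastrowid] = embed(text)

    def search(self, query: str, top_k: int = 2) -> list[tuple[str, str]]:
        """Rank chunk ids by similarity, then fetch (source, text) from the document store."""
        q = embed(query)
        ranked = sorted(self.vectors, key=lambda cid: cosine(q, self.vectors[cid]), reverse=True)
        return [
            self.db.execute("SELECT source, text FROM chunks WHERE id = ?", (cid,)).fetchone()
            for cid in ranked[:top_k]
        ]


def build_prompt(question: str, hits: list[tuple[str, str]]) -> str:
    """Assemble retrieved chunks (with their sources) and the question into one prompt."""
    context = "\n".join(f"[{source}] {text}" for source, text in hits)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


# Usage: ingest two chunks, retrieve the best match, and build the model prompt.
lib = ToyLibrary()
lib.add("policy.pdf", "Invoices are due within 30 days of receipt.")
lib.add("handbook.docx", "The office is closed on public holidays.")
print(build_prompt("When are invoices due?", lib.search("When are invoices due?", top_k=1)))
```

Replacing `embed` with a neural embedding model and the dict with FAISS, Milvus, or pgvector would change retrieval quality and scale, but the shape of the pipeline (store, embed, rank, fetch, prompt) stays the same.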

## Self-Hosting & Configuration
- Install with `pip install llmware` on Python 3.9+
- Choose a document store backend: SQLite (default), MongoDB, or PostgreSQL
- Select a vector database: FAISS (local default), Milvus, Qdrant, Pinecone, or pgvector
- Models are downloaded from the Hugging Face Hub on first use via the ModelCatalog
- Configure API-based models (OpenAI, Anthropic, Google) via environment variables for hybrid deployments
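The backend choices above amount to a small configuration object. The sketch below is hypothetical and stdlib-only: `PipelineConfig`, `api_key_for`, the backend name spellings, and the environment-variable names (`OPENAI_API_KEY` and so on) are assumptions for illustration, not LLMWare's configuration API, which should be checked in the official docs. It shows two design points from the list: validating backend names up front, and reading API credentials from the environment rather than hard-coding them.

```python
import os
from dataclasses import dataclass
from typing import Optional

# Backend names from the list above; the exact string spellings are assumptions.
DOCUMENT_STORES = {"sqlite", "mongodb", "postgres"}
VECTOR_DBS = {"faiss", "milvus", "qdrant", "pinecone", "pgvector"}

# Hypothetical provider -> environment variable mapping.
API_KEY_VARS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "google": "GOOGLE_API_KEY",
}


@dataclass(frozen=True)
class PipelineConfig:
    """Hypothetical config object; rejects unknown backends at construction time."""
    document_store: str = "sqlite"  # default document store per the list above
    vector_db: str = "faiss"        # local default vector database per the list above

    def __post_init__(self) -> None:
        if self.document_store not in DOCUMENT_STORES:
            raise ValueError(f"unknown document store: {self.document_store!r}")
        if self.vector_db not in VECTOR_DBS:
            raise ValueError(f"unknown vector database: {self.vector_db!r}")


def api_key_for(provider: str) -> Optional[str]:
    """Return the key for an external provider, or None (stay local-only) if unset or unknown."""
    var = API_KEY_VARS.get(provider.lower())
    return os.environ.get(var) if var else None
```

Keeping keys in the environment lets the same code run air-gapped (no key set, local models only) or hybrid (key present) without changes.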

## Key Features
- Small specialized models (SLIM series) that run on CPU without GPU infrastructure
- End-to-end pipeline covering ingestion, parsing, embedding, retrieval, and generation
- Multi-format document parsing including scanned PDFs with OCR support
- Model catalog with 50+ pre-configured models for different tasks and hardware profiles
- Enterprise-friendly with support for air-gapped deployments and local-only operation

## Comparison with Similar Tools
- **LangChain** — General-purpose LLM orchestration; LLMWare focuses on RAG with built-in models and parsing
- **LlamaIndex** — Specialized in data indexing and retrieval; LLMWare bundles its own small models
- **Haystack** — Pipeline-based NLP framework; LLMWare emphasizes CPU-friendly small models
- **Unstructured** — Document parsing library; LLMWare integrates parsing with retrieval and inference
- **txtai** — Embeddings and RAG; LLMWare provides a broader enterprise pipeline abstraction

## FAQ
**Q: Do I need a GPU to run LLMWare?**
A: No. The SLIM model series and GGUF models are designed to run on CPU. GPU acceleration is optional.

**Q: What document formats does it support?**
A: PDF, DOCX, PPTX, XLSX, HTML, CSV, TXT, and JSON. Scanned PDFs are handled via integrated OCR.
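Before ingestion, a pipeline has to decide which parser handles each file. A minimal, hypothetical routing sketch keyed on file extension, using the formats listed in this answer, might look like the following. The `PARSERS` categories and the `route_files` helper are illustrative assumptions, not LLMWare internals, and detecting scanned PDFs that need OCR would require inspecting file contents, which this sketch does not attempt.

```python
from collections import defaultdict
from pathlib import Path

# Extension -> parser family for the formats listed above (family names are assumptions).
PARSERS = {
    ".pdf": "pdf",
    ".docx": "office",
    ".pptx": "office",
    ".xlsx": "office",
    ".html": "html",
    ".csv": "tabular",
    ".txt": "text",
    ".json": "json",
}


def route_files(paths: list[str]) -> dict[str, list[str]]:
    """Group file paths by parser family; unrecognized extensions go to 'unsupported'."""
    groups: dict[str, list[str]] = defaultdict(list)
    for p in paths:
        family = PARSERS.get(Path(p).suffix.lower(), "unsupported")
        groups[family].append(p)
    return dict(groups)
```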

**Q: Can I use external LLM APIs instead of local models?**
A: Yes. LLMWare supports OpenAI, Anthropic, Google, and other API providers alongside local models.

**Q: How does it compare to using LangChain with a vector store?**
A: LLMWare provides a more opinionated, integrated stack with built-in small models, reducing the need to assemble components separately.

## Sources
- https://github.com/llmware-ai/llmware
- https://llmware-ai.github.io/llmware/

---
Source: https://tokrepo.com/en/workflows/asset-cb2e817c
Author: AI Open Source