LLMWare — Unified Framework for Enterprise RAG Pipelines

Introduction

LLMWare is an open-source Python framework for building enterprise RAG (retrieval-augmented generation) pipelines. It provides an integrated stack covering document parsing, embedding, vector storage, and inference, using small, specialized models that can run locally without a GPU.

What LLMWare Does

Parses PDFs, Office documents, HTML, and text into structured chunks for retrieval
Generates embeddings and stores them in a supported vector database (for example Milvus, FAISS, Qdrant, Pinecone, or Postgres/pgvector)
Ships a catalog of 50+ small specialized GGUF and ONNX models for targeted tasks
Runs function-calling models locally for summarization, extraction, classification, and Q&A
Provides a library abstraction that connects parsing, retrieval, and generation into cohesive pipelines
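
To make the idea of "structured chunks" concrete, here is a minimal sketch in plain Python that splits text into overlapping, metadata-tagged chunks. It is not LLMWare's parser: the `Chunk` fields, the `chunk_text` function, and the word-based sizing are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    doc_id: str  # hypothetical identifier of the source document
    index: int   # position of the chunk within the document
    text: str    # chunk body


def chunk_text(doc_id: str, text: str, size: int = 50, overlap: int = 10) -> list[Chunk]:
    """Split text into windows of `size` words; consecutive windows share `overlap` words."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for index, start in enumerate(range(0, len(words), step)):
        window = words[start:start + size]
        chunks.append(Chunk(doc_id, index, " ".join(window)))
        if start + size >= len(words):
            break  # this window already reached the end of the text
    return chunks
```

The overlap keeps a sentence that straddles a boundary visible in both neighboring chunks, which tends to help retrieval at the cost of some duplicated storage.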

Architecture Overview

LLMWare organizes work around a Library object that ingests documents, chunks them, and stores metadata in a document store (MongoDB, SQLite, or Postgres). Embeddings are then generated and pushed to a vector database for similarity search. At query time, the retrieved context is passed to a model from the built-in catalog or to an external API. The SLIM model series (small language models under 3B parameters) handles structured extraction tasks efficiently on CPU.
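
The query-time flow described above (embed, search by similarity, pass context to a model) can be sketched with the standard library alone. This is a toy stand-in, not LLMWare code: `embed` uses word counts instead of an embedding model, and `ToyVectorIndex` and `build_prompt` are hypothetical names standing in for a vector database and a prompt builder.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts. A real pipeline would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyVectorIndex:
    """In-memory stand-in for a vector database."""

    def __init__(self) -> None:
        self._rows: list[tuple[Counter, str]] = []

    def add(self, chunk: str) -> None:
        self._rows.append((embed(chunk), chunk))

    def search(self, query: str, top_k: int = 2) -> list[str]:
        q = embed(query)
        ranked = sorted(self._rows, key=lambda row: cosine(q, row[0]), reverse=True)
        return [chunk for _, chunk in ranked[:top_k]]


def build_prompt(question: str, context: list[str]) -> str:
    """Assemble the text that would be sent to a local or API-hosted model."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Context:\n{joined}\n\nQuestion: {question}"


index = ToyVectorIndex()
for chunk in [
    "The invoice total is 4,200 USD due in March.",
    "The contract term is two years.",
    "Employees receive 25 vacation days.",
]:
    index.add(chunk)

question = "What is the invoice total?"
prompt = build_prompt(question, index.search(question, top_k=1))
```

In a real deployment the index would be one of the supported vector databases and the prompt would go to a catalog model or an external API; the shape of the flow stays the same.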

Self-Hosting & Configuration

Install with pip install llmware on Python 3.9+
Choose a document store backend: SQLite (default), MongoDB, or PostgreSQL
Select a vector database: FAISS (local default), Milvus, Qdrant, Pinecone, or pgvector
Download models on first use from Hugging Face Hub via the ModelCatalog
Configure API-based models (OpenAI, Anthropic, Google) via environment variables for hybrid deployments
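
The backend choices above can be captured in a small settings loader driven by environment variables. This is a sketch under stated assumptions, not LLMWare's configuration API: `PipelineSettings`, `load_settings`, and the `RAG_DOC_STORE` / `RAG_VECTOR_DB` variable names are hypothetical, and the API key variable names follow common provider conventions that LLMWare itself may not use.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Backend names taken from the options listed above.
DOC_STORES = {"sqlite", "mongo", "postgres"}
VECTOR_DBS = {"faiss", "milvus", "qdrant", "pinecone", "pgvector"}

# Assumed variable names; check the LLMWare docs for the names it actually reads.
API_KEY_VARS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "google": "GOOGLE_API_KEY",
}


@dataclass
class PipelineSettings:
    doc_store: str
    vector_db: str
    api_providers: list[str]  # providers with a key present; empty means local-only


def load_settings(env: Optional[Mapping[str, str]] = None) -> PipelineSettings:
    """Read backend choices from the environment, falling back to local defaults."""
    env = os.environ if env is None else env
    doc_store = env.get("RAG_DOC_STORE", "sqlite").lower()
    vector_db = env.get("RAG_VECTOR_DB", "faiss").lower()
    if doc_store not in DOC_STORES:
        raise ValueError(f"unsupported document store: {doc_store}")
    if vector_db not in VECTOR_DBS:
        raise ValueError(f"unsupported vector database: {vector_db}")
    providers = sorted(name for name, var in API_KEY_VARS.items() if env.get(var))
    return PipelineSettings(doc_store, vector_db, providers)
```

With no variables set, `load_settings()` yields a fully local configuration (SQLite plus FAISS, no API providers), which matches the air-gapped case; setting a provider key switches the deployment to hybrid.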

Key Features

Small specialized models (SLIM series) that run on CPU without GPU infrastructure
End-to-end pipeline covering ingestion, parsing, embedding, retrieval, and generation
Multi-format document parsing including scanned PDFs with OCR support
Model catalog with 50+ pre-configured models for different tasks and hardware profiles
Enterprise-friendly with support for air-gapped deployments and local-only operation

Comparison with Similar Tools

LangChain — General-purpose LLM orchestration; LLMWare focuses on RAG with built-in models and parsing
LlamaIndex — Specialized in data indexing and retrieval; LLMWare bundles its own small models
Haystack — Pipeline-based NLP framework; LLMWare emphasizes CPU-friendly small models
Unstructured — Document parsing library; LLMWare integrates parsing with retrieval and inference
txtai — Embeddings and RAG; LLMWare provides a broader enterprise pipeline abstraction

FAQ

Q: Do I need a GPU to run LLMWare? A: No. The SLIM model series and GGUF models are designed to run on CPU. GPU acceleration is optional.

Q: What document formats does it support? A: PDF, DOCX, PPTX, XLSX, HTML, CSV, TXT, and JSON. Scanned PDFs are handled via integrated OCR.

Q: Can I use external LLM APIs instead of local models? A: Yes. LLMWare supports OpenAI, Anthropic, Google, and other API providers alongside local models.

Q: How does it compare to using LangChain with a vector store? A: LLMWare provides a more opinionated, integrated stack with built-in small models, reducing the need to assemble components separately.
