Configs · Jul 2, 2026 · 3 min read

LLMWare — Unified Framework for Enterprise RAG Pipelines

Build retrieval-augmented generation workflows with small specialized models, parsing, embeddings, and vector search in one framework.

Ready-to-run agent install

This asset can be installed after the agent chooses its runtime, checks the plan, and runs the matching command.

Native · 98/100 · Policy: allow
Agent surface: Any MCP/CLI agent
Kind: Skill
Install: Single
Trust: Established
Entrypoint: LLMWare Overview

Direct install command:
npx -y tokrepo@latest install cb2e817c-7657-11f1-9bc6-00163e2b0d79 --target codex

Run after dry-run confirms the install plan.

Introduction

LLMWare is an open-source Python framework for building enterprise RAG (retrieval-augmented generation) pipelines. It provides an integrated stack covering document parsing, embedding, vector storage, and inference, using small, specialized models that can run locally without a GPU.

What LLMWare Does

  • Parses PDFs, Office documents, HTML, and text into structured chunks for retrieval
  • Generates embeddings and stores them in a supported vector database such as Milvus, FAISS, Qdrant, Pinecone, or Postgres/pgvector
  • Ships a catalog of 50+ small specialized GGUF and ONNX models for targeted tasks
  • Runs function-calling models locally for summarization, extraction, classification, and Q&A
  • Provides a library abstraction that connects parsing, retrieval, and generation into cohesive pipelines
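
To make the structured-output idea concrete, here is a minimal sketch of how a caller might validate the response of an extraction or classification model. It assumes the model returns a dictionary-like string; the function name, the sample response, and the key names are hypothetical and are not taken from LLMWare's API.

```python
import ast
import json
from typing import Any


def parse_structured_output(raw: str, expected_keys: set[str]) -> dict[str, Any]:
    """Parse a dictionary-like model response and keep only the expected keys."""
    text = raw.strip()
    try:
        data = json.loads(text)        # try strict JSON first
    except json.JSONDecodeError:
        data = ast.literal_eval(text)  # fall back to a Python literal (single quotes)
    if not isinstance(data, dict):
        raise ValueError("model output is not a dictionary")
    return {key: data[key] for key in expected_keys if key in data}


# Hypothetical response from a local classification model
response = "{'sentiment': ['positive'], 'extra': ['ignored']}"
result = parse_structured_output(response, {"sentiment"})
print(result)
```

Filtering to a known set of keys keeps downstream code from depending on fields the model happened to add.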

Architecture Overview

LLMWare organizes work around a Library object that ingests documents, chunks them, and stores metadata in a document store (MongoDB, SQLite, or PostgreSQL). Embeddings are generated and pushed to a vector database for similarity search. At query time, retrieved context is passed to a model from the built-in catalog or an external API. The SLIM model series (small language models under 3B parameters) handles structured extraction tasks efficiently on CPU.
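
The data flow described above (chunk, embed, retrieve, then prompt) can be sketched with the standard library alone. This is a toy illustration, not LLMWare's API: the word-count "embedding" stands in for a trained embedding model, the in-memory list stands in for a vector database, and every function name here is hypothetical.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, max_words: int = 12) -> list[str]:
    """Split text into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a real system uses a trained model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, index: list[tuple[str, Counter]], top_k: int = 1) -> list[str]:
    """Return the top_k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]


def build_prompt(question: str, context: list[str]) -> str:
    """Assemble the text that would be passed to a model at query time."""
    joined = "\n".join(context)
    return f"Context:\n{joined}\n\nQuestion: {question}"


docs = [
    "Invoices are due within thirty days of receipt.",
    "The warranty covers parts and labor for two years.",
]
# Ingest: chunk each document and store (chunk, vector) pairs in memory
index = [(chunk, embed(chunk)) for doc in docs for chunk in chunk_text(doc)]

query = "How long does the warranty last?"
top = retrieve(query, index)
prompt = build_prompt(query, top)
print(prompt)
```

In a real deployment each stage is replaced by the corresponding component (parser, embedding model, vector database, inference model), but the order of operations stays the same.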

Self-Hosting & Configuration

  • Install with pip install llmware on Python 3.9+
  • Choose a document store backend: SQLite (default), MongoDB, or PostgreSQL
  • Select a vector database: FAISS (local default), Milvus, Qdrant, Pinecone, or pgvector
  • Download models on first use from Hugging Face Hub via the ModelCatalog
  • Configure API-based models (OpenAI, Anthropic, Google) via environment variables for hybrid deployments
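
One way to keep these choices explicit is a small, validated settings object. The sketch below is an assumption-laden illustration rather than LLMWare's configuration API: the class, the function, and the environment variable names are hypothetical, and only the backend names come from the list above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Backend options named in the list above
DOCUMENT_STORES = {"sqlite", "mongodb", "postgres"}
VECTOR_DBS = {"faiss", "milvus", "qdrant", "pinecone", "pgvector"}

# Hypothetical environment variable names; check the variables your setup expects
API_KEY_VARS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "google": "GOOGLE_API_KEY",
}


@dataclass(frozen=True)
class DeploymentConfig:
    """Hypothetical settings holder; defaults mirror the local defaults listed above."""
    document_store: str = "sqlite"
    vector_db: str = "faiss"

    def __post_init__(self) -> None:
        if self.document_store not in DOCUMENT_STORES:
            raise ValueError(f"unsupported document store: {self.document_store}")
        if self.vector_db not in VECTOR_DBS:
            raise ValueError(f"unsupported vector database: {self.vector_db}")


def configured_providers(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """List API providers whose key variable is set and non-empty."""
    env = os.environ if env is None else env
    return sorted(name for name, var in API_KEY_VARS.items() if env.get(var))


config = DeploymentConfig()  # fully local: SQLite plus FAISS
print(config, configured_providers())
```

Validating backend names at construction time surfaces a typo immediately instead of at first query, and an empty provider list is a quick check that a deployment is local-only.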

Key Features

  • Small specialized models (SLIM series) that run on CPU without GPU infrastructure
  • End-to-end pipeline covering ingestion, parsing, embedding, retrieval, and generation
  • Multi-format document parsing including scanned PDFs with OCR support
  • Model catalog with 50+ pre-configured models for different tasks and hardware profiles
  • Enterprise-friendly: supports air-gapped deployments and local-only operation

Comparison with Similar Tools

  • LangChain — General-purpose LLM orchestration; LLMWare focuses on RAG with built-in models and parsing
  • LlamaIndex — Specialized in data indexing and retrieval; LLMWare bundles its own small models
  • Haystack — Pipeline-based NLP framework; LLMWare emphasizes CPU-friendly small models
  • Unstructured — Document parsing library; LLMWare integrates parsing with retrieval and inference
  • txtai — Embeddings and RAG; LLMWare provides a broader enterprise pipeline abstraction

FAQ

Q: Do I need a GPU to run LLMWare?
A: No. The SLIM model series and GGUF models are designed to run on CPU. GPU acceleration is optional.

Q: What document formats does it support?
A: PDF, DOCX, PPTX, XLSX, HTML, CSV, TXT, and JSON. Scanned PDFs are handled via integrated OCR.

Q: Can I use external LLM APIs instead of local models?
A: Yes. LLMWare supports OpenAI, Anthropic, Google, and other API providers alongside local models.

Q: How does it compare to using LangChain with a vector store?
A: LLMWare provides a more opinionated, integrated stack with built-in small models, reducing the need to assemble components separately.
