Configs · July 2, 2026 · 1 min read

LLMWare — Unified Framework for Enterprise RAG Pipelines

Build retrieval-augmented generation workflows with small specialized models, parsing, embeddings, and vector search in one framework.


Introduction

LLMWare is an open-source Python framework for building enterprise RAG (retrieval-augmented generation) pipelines. It provides an integrated stack covering document parsing, embedding, vector storage, and inference using small, specialized models that can run locally without GPU requirements.

What LLMWare Does

  • Parses PDFs, Office documents, HTML, and text into structured chunks for retrieval
  • Generates embeddings and stores them in a supported vector database (FAISS, Milvus, Qdrant, Pinecone, or Postgres/pgvector)
  • Ships a catalog of 50+ small specialized GGUF and ONNX models for targeted tasks
  • Runs function-calling models locally for summarization, extraction, classification, and Q&A
  • Provides a library abstraction that connects parsing, retrieval, and generation into cohesive pipelines
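
To make the chunking step concrete, here is a minimal sketch of splitting text into overlapping word-based chunks for retrieval. It uses only the Python standard library and is not LLMWare's parser. The function name chunk_text and the max_words and overlap parameters are assumptions made for illustration.

```python
def chunk_text(text: str, max_words: int = 50, overlap: int = 10) -> list[str]:
    """Split text into word-based chunks that overlap by `overlap` words.

    Illustrative only: a real parser would also track page numbers,
    tables, and other document metadata.
    """
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("need max_words > 0 and 0 <= overlap < max_words")
    words = text.split()
    step = max_words - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, which tends to help retrieval quality at the cost of some duplicated storage.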

Architecture Overview

LLMWare organizes work around a Library object that ingests documents, chunks them, and stores their metadata in a document store (SQLite, MongoDB, or PostgreSQL). Embeddings are generated from the chunks and pushed to a vector database for similarity search. At query time, the retrieved context is passed to a model from the built-in catalog or to an external API. The SLIM model series (small language models under 3B parameters) handles structured extraction tasks efficiently on CPU.
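
The flow described above (ingest, embed, search, then prompt) can be sketched with a toy in-memory version. This is not LLMWare's API: ToyLibrary, embed, cosine, and build_prompt are hypothetical names, and the word-count "embedding" is a stand-in for a real embedding model and vector database.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyLibrary:
    """Hypothetical in-memory stand-in for a library plus its vector index."""

    def __init__(self) -> None:
        self.index: list[tuple[str, Counter]] = []

    def add_chunks(self, chunks: list[str]) -> None:
        # Embed each chunk once at ingestion time and keep it next to its text.
        self.index.extend((c, embed(c)) for c in chunks)

    def search(self, query: str, top_k: int = 2) -> list[str]:
        # Rank stored chunks by similarity to the query embedding.
        q = embed(query)
        ranked = sorted(self.index, key=lambda item: cosine(q, item[1]), reverse=True)
        return [text for text, _ in ranked[:top_k]]


def build_prompt(question: str, context: list[str]) -> str:
    """Assemble retrieved context and the question into one prompt string."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Context:\n{joined}\n\nQuestion: {question}"


library = ToyLibrary()
library.add_chunks([
    "The invoice total is 4,200 USD, due on March 1.",
    "The contract renews automatically every twelve months.",
    "Support requests are answered within one business day.",
])
question = "When does the contract renew?"
top = library.search(question, top_k=1)
prompt = build_prompt(question, top)  # this string would be sent to a model
```

In a real deployment the final prompt would go to a local model or an external API; the sketch stops at prompt assembly so that it runs without any model download.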

Self-Hosting & Configuration

  • Install with pip install llmware on Python 3.9+
  • Choose a document store backend: SQLite (default), MongoDB, or PostgreSQL
  • Select a vector database: FAISS (local default), Milvus, Qdrant, Pinecone, or pgvector
  • Download models on first use from Hugging Face Hub via the ModelCatalog
  • Configure API-based models (OpenAI, Anthropic, Google) via environment variables for hybrid deployments
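
One way to keep these choices explicit is a small validated settings object. The sketch below is an assumption-laden illustration, not LLMWare's configuration API: DeploymentConfig, config_from_env, the environment variable names (RAG_DOCUMENT_STORE, RAG_VECTOR_DB, RAG_API_KEY), and the lowercase backend labels are all hypothetical.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Backend names taken from the list above. The labels are illustrative and
# are not necessarily the exact strings LLMWare expects.
DOCUMENT_STORES = {"sqlite", "mongodb", "postgres"}
VECTOR_DBS = {"faiss", "milvus", "qdrant", "pinecone", "pgvector"}


@dataclass(frozen=True)
class DeploymentConfig:
    document_store: str = "sqlite"  # default per the list above
    vector_db: str = "faiss"        # local default per the list above
    use_api_model: bool = False     # True for a hybrid deployment

    def __post_init__(self) -> None:
        # Fail fast on typos instead of at first query.
        if self.document_store not in DOCUMENT_STORES:
            raise ValueError(f"unsupported document store: {self.document_store}")
        if self.vector_db not in VECTOR_DBS:
            raise ValueError(f"unsupported vector database: {self.vector_db}")


def config_from_env(env: Optional[Mapping[str, str]] = None) -> DeploymentConfig:
    """Build a config from environment variables (all variable names are hypothetical)."""
    env = os.environ if env is None else env
    return DeploymentConfig(
        document_store=env.get("RAG_DOCUMENT_STORE", "sqlite").lower(),
        vector_db=env.get("RAG_VECTOR_DB", "faiss").lower(),
        # Treat the presence of an API key as opting in to an external model.
        use_api_model=bool(env.get("RAG_API_KEY")),
    )
```

Passing a plain dict instead of os.environ, for example config_from_env({"RAG_VECTOR_DB": "milvus"}), makes the selection logic easy to exercise in tests without touching the real environment.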

Key Features

  • Small specialized models (SLIM series) that run on CPU without GPU infrastructure
  • End-to-end pipeline covering ingestion, parsing, embedding, retrieval, and generation
  • Multi-format document parsing including scanned PDFs with OCR support
  • Model catalog with 50+ pre-configured models for different tasks and hardware profiles
  • Support for air-gapped and local-only deployments in enterprise environments

Comparison with Similar Tools

  • LangChain — General-purpose LLM orchestration; LLMWare focuses on RAG with built-in models and parsing
  • LlamaIndex — Specialized in data indexing and retrieval; LLMWare bundles its own small models
  • Haystack — Pipeline-based NLP framework; LLMWare emphasizes CPU-friendly small models
  • Unstructured — Document parsing library; LLMWare integrates parsing with retrieval and inference
  • txtai — Embeddings and RAG; LLMWare provides a broader enterprise pipeline abstraction

FAQ

Q: Do I need a GPU to run LLMWare? A: No. The SLIM model series and GGUF models are designed to run on CPU. GPU acceleration is optional.

Q: What document formats does it support? A: PDF, DOCX, PPTX, XLSX, HTML, CSV, TXT, and JSON. Scanned PDFs are handled via integrated OCR.
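
As a small illustration of the format list above, the following sketch pre-filters a batch of files by extension before ingestion. It is a standalone helper, not part of LLMWare; the name split_by_support and the extension set are assumptions derived from the formats named in the answer.

```python
from pathlib import Path

# Extensions corresponding to the formats named in the answer above.
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".pptx", ".xlsx", ".html", ".csv", ".txt", ".json"}


def split_by_support(paths):
    """Partition file paths into (supported, unsupported) file names by extension."""
    supported, unsupported = [], []
    for p in map(Path, paths):
        # Compare case-insensitively so "Report.PDF" counts as a PDF.
        target = supported if p.suffix.lower() in SUPPORTED_EXTENSIONS else unsupported
        target.append(p.name)
    return supported, unsupported
```

Checking extensions up front gives a clear list of skipped files rather than a failure partway through a large ingestion run.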

Q: Can I use external LLM APIs instead of local models? A: Yes. LLMWare supports OpenAI, Anthropic, Google, and other API providers alongside local models.

Q: How does it compare to using LangChain with a vector store? A: LLMWare provides a more opinionated, integrated stack with built-in small models, reducing the need to assemble components separately.
