# PixelRAG — Pixel-Native Search and Retrieval for AI Applications

> An open-source system that indexes and retrieves documents from their rendered pixel representations instead of parsed text, enabling scalable search over any format that can be rendered as page images.

## Quick Use
```bash
pip install pixelrag
pixelrag index --input ./documents --collection my-docs
pixelrag search "quarterly revenue breakdown" --collection my-docs
```

## Introduction
PixelRAG takes a fundamentally different approach to document retrieval. Instead of parsing text out of documents, a step that loses layout, figures, and formatting, it indexes rendered page images directly. This enables search and retrieval across PDFs, slides, scanned images, and other visual document formats without fragile, format-specific parsing pipelines.

## What PixelRAG Does
- Indexes documents as visual embeddings from rendered pixel representations
- Retrieves relevant document pages by embedding similarity between the query and each page
- Handles any format that can be rendered to page images, without format-specific parsers
- Preserves layout, table, and figure context that text extraction loses
- Provides retrieved pages as images ready for vision-language model consumption
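Because results come back as page images, handing them to a vision-language model is mostly a matter of packaging image bytes. The sketch below assumes the model accepts images as base64 `data:` URLs (RFC 2397), which many hosted VLM APIs do; `to_data_url` and `build_vlm_content` are hypothetical helpers, not part of PixelRAG, and the dict layout is a placeholder to adapt to your model's request schema.

```python
import base64


def to_data_url(image_bytes: bytes, mime: str = "image/png") -> str:
    """Encode raw image bytes as an RFC 2397 data URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def build_vlm_content(question: str, page_images: list[bytes]) -> list[dict]:
    """Place the retrieved page images first, followed by the user's question.

    Hypothetical message shape: adapt the keys to whatever your
    vision-language model API expects.
    """
    content = [{"type": "image", "url": to_data_url(img)} for img in page_images]
    content.append({"type": "text", "text": question})
    return content
```

Putting the pages before the question is one common prompt layout; the ordering is a design choice, not a requirement.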

## Architecture Overview
PixelRAG renders each document page as an image and passes it through a vision encoder to produce dense embeddings. These embeddings are stored in a vector index for fast similarity search. At query time, the text query is encoded with a text encoder that maps into the same embedding space, and the pages whose embeddings are nearest to the query are retrieved. This bypasses the entire OCR and text extraction pipeline.
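The query path described above reduces to nearest-neighbor search over normalized vectors. A minimal pure-Python sketch of an exact (flat) index, assuming the page and query embeddings have already been produced by the vision and text encoders; `PageIndex` and the page IDs are hypothetical, and a real deployment would use FAISS, Qdrant, or Milvus instead of a Python loop.

```python
import math


def normalize(vec: list[float]) -> list[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


class PageIndex:
    """Hypothetical flat (exact) vector index over page embeddings."""

    def __init__(self) -> None:
        self.page_ids: list[str] = []
        self.vectors: list[list[float]] = []

    def add(self, page_id: str, embedding: list[float]) -> None:
        # The embedding stands in for the vision encoder's output on a
        # rendered page image; here it is passed in directly.
        self.page_ids.append(page_id)
        self.vectors.append(normalize(embedding))

    def search(self, query_embedding: list[float], k: int = 3) -> list[tuple[str, float]]:
        # The query embedding stands in for the text encoder's output, which
        # must live in the same space as the page embeddings.
        q = normalize(query_embedding)
        scored = [
            (pid, sum(a * b for a, b in zip(q, vec)))
            for pid, vec in zip(self.page_ids, self.vectors)
        ]
        scored.sort(key=lambda item: item[1], reverse=True)
        return scored[:k]


if __name__ == "__main__":
    index = PageIndex()
    index.add("report.pdf#p1", [0.9, 0.1, 0.0])
    index.add("slides.pptx#p4", [0.1, 0.8, 0.3])
    print(index.search([1.0, 0.0, 0.0], k=1))
```

With these toy three-dimensional vectors, the query aligned with the first axis ranks `report.pdf#p1` first.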

## Self-Hosting & Configuration
- Install via pip; requires Python 3.9+ and a CUDA-capable GPU
- Configure the vector store backend (built-in FAISS, or external Qdrant/Milvus)
- Set rendering resolution and page splitting options per collection
- Batch indexing supports parallel processing across multiple GPUs
- REST API server mode available for integration with RAG pipelines
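The per-collection settings and multi-GPU batching in the list above could be modeled as follows. This is a sketch only: every field name (`vector_backend`, `render_dpi`, `split_pages`) and helper (`shard_pages`) is hypothetical and does not reflect PixelRAG's real configuration keys. `rendered_size` shows why resolution matters: pixel count grows with the square of the DPI, so doubling it roughly quadruples encoder input size.

```python
from dataclasses import dataclass

# Backends named in the list above; the string identifiers are assumptions.
VALID_BACKENDS = {"faiss", "qdrant", "milvus"}


@dataclass
class CollectionConfig:
    # Hypothetical field names; PixelRAG's real configuration keys may differ.
    name: str
    vector_backend: str = "faiss"
    render_dpi: int = 144
    split_pages: bool = True

    def __post_init__(self) -> None:
        if self.vector_backend not in VALID_BACKENDS:
            raise ValueError(f"unknown backend: {self.vector_backend!r}")
        if self.render_dpi <= 0:
            raise ValueError("render_dpi must be positive")

    def rendered_size(self, width_in: float, height_in: float) -> tuple[int, int]:
        """Pixel dimensions of a page of the given size (in inches) at render_dpi."""
        return round(width_in * self.render_dpi), round(height_in * self.render_dpi)


def shard_pages(page_ids: list[str], num_gpus: int) -> list[list[str]]:
    """Assign pages round-robin so each GPU receives a similar share of the batch."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    shards: list[list[str]] = [[] for _ in range(num_gpus)]
    for i, page_id in enumerate(page_ids):
        shards[i % num_gpus].append(page_id)
    return shards
```

For example, a US Letter page (8.5 × 11 in) at 144 DPI renders to 1224 × 1584 pixels.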

## Key Features
- Pixel-native approach eliminates parsing errors and format-specific toolchains
- Layout-aware retrieval finds information in tables, charts, and figures
- Format-agnostic indexing handles PDFs, PPTX, images, and screenshots identically
- Scalable to millions of pages with approximate nearest neighbor search
- Direct integration with vision-language models for downstream Q&A
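Approximate nearest neighbor search is what lets a vector index scale past a brute-force scan. The sketch below shows one common ANN scheme, an inverted-file (IVF) index of the kind FAISS provides as `IndexIVFFlat`: vectors are clustered with k-means, and a query scans only the `n_probe` closest clusters. It is a generic illustration, not PixelRAG's implementation, and it assumes unit-normalized embeddings so that Euclidean cluster assignment and dot-product ranking agree.

```python
import random


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def sq_dist(a: list[float], b: list[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


class IVFIndex:
    """Inverted-file index: cluster the vectors, then search only nearby clusters."""

    def __init__(self, n_lists: int, seed: int = 0) -> None:
        self.n_lists = n_lists
        self.rng = random.Random(seed)
        self.centroids: list[list[float]] = []
        self.lists: list[list[tuple[str, list[float]]]] = []

    def _nearest(self, v: list[float], n: int) -> list[int]:
        order = sorted(range(len(self.centroids)), key=lambda c: sq_dist(v, self.centroids[c]))
        return order[:n]

    def train(self, vectors: list[list[float]], iterations: int = 10) -> None:
        # Plain k-means (Lloyd's algorithm) to choose the coarse centroids.
        self.centroids = [list(v) for v in self.rng.sample(vectors, self.n_lists)]
        for _ in range(iterations):
            buckets: list[list[list[float]]] = [[] for _ in range(self.n_lists)]
            for v in vectors:
                buckets[self._nearest(v, 1)[0]].append(v)
            for c, members in enumerate(buckets):
                if members:  # keep the previous centroid if a cluster empties out
                    dim = len(members[0])
                    self.centroids[c] = [sum(m[d] for m in members) / len(members) for d in range(dim)]
        self.lists = [[] for _ in range(self.n_lists)]

    def add(self, item_id: str, vector: list[float]) -> None:
        self.lists[self._nearest(vector, 1)[0]].append((item_id, vector))

    def search(self, query: list[float], k: int = 5, n_probe: int = 2) -> list[str]:
        # Scanning fewer clusters is faster but may miss true neighbors.
        candidates: list[tuple[str, list[float]]] = []
        for c in self._nearest(query, n_probe):
            candidates.extend(self.lists[c])
        candidates.sort(key=lambda item: dot(query, item[1]), reverse=True)
        return [item_id for item_id, _ in candidates[:k]]
```

`n_probe` is the main speed/recall knob: with `n_probe` equal to `n_lists` the search becomes exact, and smaller values skip more of the index.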

## Comparison with Similar Tools
- **RAGFlow** — text-based RAG with deep parsing; PixelRAG avoids parsing entirely
- **LlamaIndex** — framework for text-based retrieval pipelines
- **Docling** — document conversion to structured text before indexing
- **ColPali** — similar vision-based retrieval using late interaction scoring
- **Marker** — PDF-to-Markdown conversion focused on text fidelity
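ColPali's late interaction scoring differs from pooling each page into a single vector: a page keeps many patch embeddings, a query keeps one embedding per token, and the score is computed with the ColBERT-style MaxSim rule. A minimal pure-Python sketch of that rule (the function name is illustrative):

```python
def maxsim_score(query_vecs: list[list[float]], page_vecs: list[list[float]]) -> float:
    """Late-interaction (MaxSim) score between a query and one page.

    For each query token vector, take its highest dot product over all of the
    page's patch vectors, then sum those per-token maxima.
    """
    total = 0.0
    for q in query_vecs:
        total += max(sum(a * b for a, b in zip(q, p)) for p in page_vecs)
    return total
```

The trade-off is storage and compute: every page stores many vectors instead of one, in exchange for finer-grained matching between query terms and page regions.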

## FAQ
**Q: How does pixel-based search handle text-heavy documents?**
A: The vision encoder captures text content along with layout context, so text-heavy documents are searched effectively while preserving structure.

**Q: What is the indexing speed?**
A: On a single GPU, PixelRAG indexes approximately 50-100 pages per second depending on resolution settings.
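At the quoted 50 to 100 pages per second, a one-million-page corpus takes roughly 2.8 to 5.6 hours on a single GPU. The helper below makes that arithmetic reusable; `indexing_hours` is a hypothetical name, and the multi-GPU case assumes perfectly linear scaling, which is optimistic because rendering, I/O, and index writes can become the bottleneck.

```python
def indexing_hours(num_pages: int, pages_per_second: float, num_gpus: int = 1) -> float:
    """Estimated wall-clock hours to index num_pages.

    Assumes throughput scales linearly with the number of GPUs.
    """
    if pages_per_second <= 0 or num_gpus < 1:
        raise ValueError("throughput and GPU count must be positive")
    return num_pages / (pages_per_second * num_gpus) / 3600
```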

**Q: Can I combine PixelRAG with text-based retrieval?**
A: Yes. Hybrid retrieval pipelines can merge PixelRAG visual results with traditional text search for higher recall.
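Reciprocal rank fusion (RRF) is one common way to do this merge. It needs only each retriever's ranked order, which sidesteps the fact that vision-encoder cosine similarities and text-search scores such as BM25 live on different scales. A minimal sketch, not a PixelRAG API; `k = 60` is a commonly used default for the smoothing constant.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of page IDs into one ranking.

    Each list adds 1 / (k + rank) for every page it contains (rank starts at 1);
    pages are then sorted by their summed score, highest first.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, page_id in enumerate(ranking, start=1):
            scores[page_id] = scores.get(page_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda pid: scores[pid], reverse=True)
```

For example, fusing a visual ranking `["p3", "p1", "p7"]` with a text ranking `["p1", "p9", "p3"]` puts `p1` first, since it ranks highly in both lists.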

**Q: Does it work with handwritten documents?**
A: The vision encoder can match handwritten content visually, though accuracy depends on the pre-trained model and handwriting legibility.

## Sources
- https://github.com/StarTrail-org/PixelRAG
- https://pixelrag.ai/

---
Source: https://tokrepo.com/en/workflows/asset-180bec39
Author: Script Depot