What is Zerox — Zero-Shot PDF OCR for AI Pipelines?

Extract text from any PDF using vision models as OCR. Zerox converts PDF pages to images then uses GPT-4o or Claude to extract clean markdown without training.

Is Zerox — Zero-Shot PDF OCR for AI Pipelines free to use?

Yes. Zerox — Zero-Shot PDF OCR for AI Pipelines is freely available on TokRepo. Check the Source & Thanks section on the asset page for the specific open-source license.

How do I install Zerox — Zero-Shot PDF OCR for AI Pipelines?

Visit the asset page on TokRepo and click "Copy for agent" to get the installation instructions. Most assets can be installed with a single command.

Zerox — Zero-Shot PDF OCR for AI Pipelines

What is Zerox?

Zerox is a zero-shot PDF OCR tool that uses vision language models instead of traditional OCR engines. It converts each PDF page into an image, sends it to a vision model (GPT-4o, Claude, Gemini), and extracts clean markdown text. No training, no templates, no configuration — it just works on any document layout.

Answer-Ready: Zerox is zero-shot PDF OCR using vision models. Converts PDF pages to images, extracts clean markdown via GPT-4o or Claude. No training or templates needed. Handles complex layouts, tables, and handwriting. 7k+ GitHub stars.

Best for: AI teams processing PDFs for RAG or data extraction. Works with: OpenAI GPT-4o, Anthropic Claude, Google Gemini. Setup time: Under 2 minutes.

Core Features

1. Multiple Model Support

# OpenAI
result = await zerox(file_path="doc.pdf", model="gpt-4o-mini")

# Anthropic Claude
result = await zerox(file_path="doc.pdf", model="claude-sonnet-4-20250514")

# Google Gemini
result = await zerox(file_path="doc.pdf", model="gemini/gemini-2.0-flash")

2. Page Selection

result = await zerox(
    file_path="long_report.pdf",
    model="gpt-4o-mini",
    select_pages=[1, 3, 5, 10],  # Only process specific pages
)

3. Node.js SDK

npm install zerox

const { zerox } = require("zerox");
const result = await zerox({
  filePath: "report.pdf",
  openaiAPIKey: process.env.OPENAI_API_KEY,
});

4. Custom Prompts

result = await zerox(
    file_path="invoice.pdf",
    model="gpt-4o-mini",
    custom_system_prompt="Extract all line items as a markdown table with columns: Item, Qty, Price, Total.",
)

Zerox vs Traditional OCR

Feature	Zerox	Tesseract	AWS Textract
Setup	pip install	System deps	AWS account
Complex layouts	Excellent	Poor	Good
Tables	Markdown tables	Raw text	JSON
Handwriting	Yes	Limited	Yes
Cost	Per API call	Free	Per page
Training needed	None	Sometimes	No

FAQ

Q: How much does it cost? A: Depends on the vision model. GPT-4o-mini is ~$0.01/page, Claude is similar. Self-hosted models are free.

Q: Can it handle scanned documents? A: Yes, that is its primary use case. Vision models can read scanned text, handwriting, and complex layouts.

Q: How does accuracy compare to Tesseract? A: Significantly better on complex layouts, tables, and mixed content. Tesseract may be better for simple, clean text.

Zerox — Zero-Shot PDF OCR for AI Pipelines

What is Zerox?

Core Features

1. Multiple Model Support

2. Page Selection

3. Node.js SDK

4. Custom Prompts

Zerox vs Traditional OCR

FAQ

Fuente y agradecimientos

Discusión

Activos relacionados

Unkey — Open-Source API Key Management Platform

Flagsmith — Open-Source Feature Flags and Remote Config

OpenStatus — Open-Source Monitoring and Status Page Platform