# Zerox — Zero-Shot PDF OCR for AI Pipelines

> Extract text from any PDF using vision models as OCR. Zerox converts PDF pages to images then uses GPT-4o or Claude to extract clean markdown without training.

## Install

Save as a script file and run:

## Quick Use

```bash
pip install py-zerox
```

```python
from pyzerox import zerox
import asyncio

async def main():
    result = await zerox(
        file_path="report.pdf",
        model="gpt-4o-mini",
    )
    for page in result.pages:
        print(page.content)

asyncio.run(main())
```

## What is Zerox?

Zerox is a zero-shot PDF OCR tool that uses vision language models instead of traditional OCR engines. It converts each PDF page into an image, sends it to a vision model (GPT-4o, Claude, Gemini), and extracts clean markdown text. No training, no templates, no configuration — it just works on any document layout.

**Answer-Ready**: Zerox is zero-shot PDF OCR using vision models. Converts PDF pages to images, extracts clean markdown via GPT-4o or Claude. No training or templates needed. Handles complex layouts, tables, and handwriting. 7k+ GitHub stars.

**Best for**: AI teams processing PDFs for RAG or data extraction. **Works with**: OpenAI GPT-4o, Anthropic Claude, Google Gemini. **Setup time**: Under 2 minutes.

## Core Features

### 1. Multiple Model Support

```python
# OpenAI
result = await zerox(file_path="doc.pdf", model="gpt-4o-mini")

# Anthropic Claude
result = await zerox(file_path="doc.pdf", model="claude-sonnet-4-20250514")

# Google Gemini
result = await zerox(file_path="doc.pdf", model="gemini/gemini-2.0-flash")
```

### 2. Page Selection

```python
result = await zerox(
    file_path="long_report.pdf",
    model="gpt-4o-mini",
    select_pages=[1, 3, 5, 10],  # Only process specific pages
)
```

### 3. Node.js SDK

```bash
npm install zerox
```

```javascript
const { zerox } = require("zerox");
const result = await zerox({
  filePath: "report.pdf",
  openaiAPIKey: process.env.OPENAI_API_KEY,
});
```

### 4. Custom Prompts

```python
result = await zerox(
    file_path="invoice.pdf",
    model="gpt-4o-mini",
    custom_system_prompt="Extract all line items as a markdown table with columns: Item, Qty, Price, Total.",
)
```

## Zerox vs Traditional OCR

| Feature | Zerox | Tesseract | AWS Textract |
|---------|-------|-----------|--------------|
| Setup | pip install | System deps | AWS account |
| Complex layouts | Excellent | Poor | Good |
| Tables | Markdown tables | Raw text | JSON |
| Handwriting | Yes | Limited | Yes |
| Cost | Per API call | Free | Per page |
| Training needed | None | Sometimes | No |

## FAQ

**Q: How much does it cost?**
A: Depends on the vision model. GPT-4o-mini is ~$0.01/page, Claude is similar. Self-hosted models are free.

**Q: Can it handle scanned documents?**
A: Yes, that is its primary use case. Vision models can read scanned text, handwriting, and complex layouts.

**Q: How does accuracy compare to Tesseract?**
A: Significantly better on complex layouts, tables, and mixed content. Tesseract may be better for simple, clean text.

## Source & Thanks

> Created by [getomni-ai](https://github.com/getomni-ai). Licensed under MIT.
>
> [getomni-ai/zerox](https://github.com/getomni-ai/zerox) — 7k+ stars

<!-- ZH -->

## 快速使用

```bash
pip install py-zerox
```

用视觉模型做 OCR，零配置提取 PDF 文本。

## 什么是 Zerox？

Zerox 是零样本 PDF OCR 工具，用视觉语言模型替代传统 OCR。将 PDF 页面转为图片，通过 GPT-4o 或 Claude 提取干净的 Markdown。

**一句话总结**：视觉模型 OCR，PDF 转 Markdown，支持 GPT-4o/Claude/Gemini，处理复杂版式和表格，7k+ stars。

**适合人群**：需要处理 PDF 的 RAG 和数据提取团队。

## 核心功能

### 1. 多模型支持
GPT-4o、Claude、Gemini 均可。

### 2. 页面选择
指定处理特定页面。

### 3. 自定义提示词
定制提取格式。

## 常见问题

**Q: 多少钱？**
A: GPT-4o-mini 约 $0.01/页。

**Q: 能处理扫描件？**
A: 这是主要场景，视觉模型可读取扫描文字和手写。

## 来源与致谢

> [getomni-ai/zerox](https://github.com/getomni-ai/zerox) — 7k+ stars, MIT

---
Source: https://tokrepo.com/en/workflows/3ac555d9-d75c-4208-ba46-974e4a717234
Author: Script Depot