Zerox — Zero-Shot PDF OCR for AI Pipelines
Extract text from any PDF using vision models as the OCR engine. Zerox converts each PDF page to an image, then uses GPT-4o or Claude to produce clean markdown with no training required.
What it is
Zerox is a Python library that extracts text from PDFs by converting each page to an image and then using vision-capable LLMs (GPT-4o, Claude, etc.) as the OCR engine. Unlike traditional OCR tools that require trained models for specific fonts and layouts, Zerox leverages the visual understanding of large language models to read any document format without training.
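The page-to-image-to-LLM flow can be sketched with placeholder stages. The helper names below (`render_pages`, `vision_ocr`) are hypothetical stand-ins for illustration, not pyzerox internals:

```python
# Conceptual sketch of the Zerox pipeline: PDF -> page images -> vision LLM -> markdown.
# render_pages and vision_ocr are hypothetical placeholders, not pyzerox internals.

def render_pages(pdf_path: str) -> list[str]:
    # Real pipeline: rasterize each PDF page to an image file.
    return [f"{pdf_path}-page-{i}.png" for i in range(1, 3)]

def vision_ocr(image_path: str) -> str:
    # Real pipeline: send the page image to a vision LLM and get markdown back.
    return f"# Markdown for {image_path}"

def extract(pdf_path: str) -> list[str]:
    # One markdown string per page, in order.
    return [vision_ocr(img) for img in render_pages(pdf_path)]

print(len(extract("report.pdf")))  # 2 pages
```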
Data engineers processing scanned documents, researchers extracting text from academic papers, and developers building document processing pipelines use Zerox when traditional OCR produces poor results on complex layouts, tables, or handwritten content.
How it saves time or tokens
Traditional OCR pipelines require installing Tesseract, training custom models for specific document types, and writing post-processing logic to clean up OCR errors. Zerox replaces the entire pipeline with a single function call. Vision models handle complex layouts, tables, and multi-column documents that trip up conventional OCR. The output is clean markdown rather than raw text, reducing downstream parsing work.
How to use
- Install Zerox:

```shell
pip install py-zerox
```

- Extract text from a PDF:

```python
from pyzerox import zerox
import asyncio

async def main():
    result = await zerox(
        file_path='report.pdf',
        model='gpt-4o-mini',
    )
    for page in result.pages:
        print(page.content)

asyncio.run(main())
```
- The output is clean markdown for each page, ready for further processing or LLM consumption.
Example
```python
from pyzerox import zerox
import asyncio

async def extract_with_claude():
    result = await zerox(
        file_path='financial_report.pdf',
        model='claude-3-5-sonnet-20241022',
        custom_system_prompt='Extract all text preserving table structure as markdown tables.',
    )

    # Each page returns clean markdown
    for i, page in enumerate(result.pages):
        print(f'--- Page {i+1} ---')
        print(page.content)

    # Save all pages to a single file
    with open('extracted.md', 'w') as f:
        for page in result.pages:
            f.write(page.content + '\n\n')

asyncio.run(extract_with_claude())
```
Related on TokRepo
- Document Processing Tools -- explore tools for PDF and document handling
- AI Tools for Research -- discover tools for academic and data research workflows
Common pitfalls
- Vision model API calls cost more than traditional OCR. For large documents (100+ pages), estimate API costs before processing. GPT-4o-mini is cheaper but less accurate than GPT-4o on complex layouts.
- Zerox converts each page to an image before sending to the model. High-resolution settings produce better results but increase API costs and processing time.
- The library is async by default. Wrap calls in asyncio.run() for synchronous usage, or integrate into an existing async application.
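Before running a long document, it helps to estimate API spend from the page count. A minimal sketch, using the $0.01-0.02/page GPT-4o-mini figure above; the GPT-4o rate here is an assumption, so check current provider pricing before relying on either number:

```python
# Rough pre-flight cost estimate. Prices are illustrative assumptions,
# not official rates; verify against current provider pricing.
PER_PAGE_USD = {
    "gpt-4o-mini": 0.015,  # midpoint of the $0.01-0.02/page range above
    "gpt-4o": 0.10,        # assumed, roughly an order of magnitude higher
}

def estimate_cost(pages: int, model: str) -> float:
    # Linear estimate: every page is sent as one image to the vision model.
    return round(pages * PER_PAGE_USD[model], 2)

print(estimate_cost(50, "gpt-4o-mini"))  # 0.75
print(estimate_cost(100, "gpt-4o"))      # 10.0
```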
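The asyncio.run() pattern from the last bullet can be sketched with a stand-in coroutine; the `extract` function below is a placeholder for the real `await zerox(...)` call, so the example runs without API keys:

```python
import asyncio

async def extract(path: str) -> str:
    # Placeholder for: await zerox(file_path=path, model="gpt-4o-mini")
    await asyncio.sleep(0)
    return f"markdown for {path}"

def extract_sync(path: str) -> str:
    # asyncio.run() creates a fresh event loop, runs the coroutine to
    # completion, and closes the loop. Call it only from synchronous code;
    # inside an already-running loop, await the coroutine directly instead.
    return asyncio.run(extract(path))

print(extract_sync("report.pdf"))  # markdown for report.pdf
```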
Frequently Asked Questions

Which models does Zerox support?
Zerox supports any vision-capable LLM, including GPT-4o, GPT-4o-mini, Claude 3.5 Sonnet, and other models that accept image inputs. You specify the model name in the function call, and Zerox handles the image conversion and API interaction.

How does Zerox compare to Tesseract?
Tesseract is a traditional OCR engine that runs locally without API costs but struggles with complex layouts, tables, and handwritten text. Zerox uses vision LLMs that handle these cases much better but requires API calls with associated costs. Zerox produces markdown output, while Tesseract outputs raw text.

Can I customize how text is extracted?
Yes. Zerox accepts a custom_system_prompt parameter that lets you instruct the vision model on how to handle the extraction. For example, you can ask it to preserve table structures as markdown tables or to extract only specific sections of each page.

How much does it cost to process a document?
Cost depends on the model and page count. Each page is sent as an image to the vision model API. GPT-4o-mini costs roughly $0.01-0.02 per page, while GPT-4o costs more. For a 50-page document, expect $0.50-1.00 with GPT-4o-mini.

Does Zerox work on scanned or handwritten documents?
Yes. Because Zerox uses vision models that understand images, it handles scanned documents, photographs of text, and handwritten content. Accuracy depends on the vision model's capabilities and the image quality, but results are generally better than traditional OCR for these difficult cases.
Citations (3)
- Zerox GitHub — Vision model-based PDF OCR without training data
- OpenAI GPT-4o — GPT-4o vision capabilities for document understanding
- Anthropic Claude Vision — Claude vision model for image understanding
Source & Thanks
Created by getomni-ai. Licensed under MIT.
getomni-ai/zerox — 7k+ stars