PDFMathTranslate — Translate PDF Papers Preserving Format
Translate PDF scientific papers while preserving math formulas, charts, and layout. Supports Google, DeepL, OpenAI, Ollama. CLI, GUI, MCP, Docker, Zotero plugin.
What it is
PDFMathTranslate translates PDF scientific papers while preserving mathematical formulas, charts, tables, and page layout. Generic PDF translators often break equations and mangle figures; PDFMathTranslate detects these elements and protects them during translation. It supports Google Translate, DeepL, OpenAI, and Ollama as translation backends.
The tool targets researchers, students, and engineers who read papers in foreign languages. It provides CLI, GUI, Docker, MCP server, and Zotero plugin interfaces, fitting into multiple research workflows.
How it saves time or tokens
PDFMathTranslate preserves the original PDF layout, so formulas and figures do not have to be reconstructed by hand after translation. For researchers who process several papers a day, that removes most of the reformatting work. With the Ollama backend, translation runs locally with no API fees, which makes high-volume academic reading practical.
How to use
- Install via pip:
pip install pdf2zh
- Translate a PDF document:
# Translate the entire document (default direction: English to Chinese)
pdf2zh input.pdf
# Translate specific pages with DeepL
pdf2zh input.pdf -p 1-10 -s deepl
# Use OpenAI for translation
pdf2zh input.pdf -s openai:gpt-4o
# Use local Ollama model
pdf2zh input.pdf -s ollama:llama3
- The output PDF preserves the original formatting with translated text.
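For batch use, the invocations above can be assembled from a script. The following is a minimal Python sketch and not part of pdf2zh: the helper names `build_pdf2zh_command` and `translate_folder` are hypothetical, only the `-p` and `-s` flags shown above are used, and the `pdf2zh` executable is assumed to be on the PATH.

```python
import subprocess
from pathlib import Path
from typing import List, Optional


def build_pdf2zh_command(pdf: str, service: Optional[str] = None,
                         pages: Optional[str] = None) -> List[str]:
    """Build an argument list for the pdf2zh CLI (hypothetical helper)."""
    cmd = ["pdf2zh", pdf]
    if pages:
        cmd += ["-p", pages]      # page range, e.g. "1-10"
    if service:
        cmd += ["-s", service]    # backend, e.g. "deepl" or "ollama:llama3"
    return cmd


def translate_folder(folder: str, service: Optional[str] = None) -> None:
    """Run pdf2zh once per PDF in a folder; raises on the first failure."""
    for pdf in sorted(Path(folder).glob("*.pdf")):
        subprocess.run(build_pdf2zh_command(str(pdf), service=service), check=True)
```

Calling `translate_folder("papers", service="ollama:llama3")` would then translate every PDF in a hypothetical `papers` directory with the local backend. Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.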
Example
Translate a Chinese machine learning paper to English:
# Translate with OpenAI, keeping math intact
pdf2zh chinese_paper.pdf -s openai:gpt-4o -li zh -lo en
# Output: chinese_paper-mono.pdf (translated) and chinese_paper-dual.pdf (bilingual)
# - All equations preserved as-is
# - Figures and tables in original positions
# - Text translated to English
# - Page layout matches original
The tool detects math regions using layout analysis, skips them during translation, and reassembles the final PDF.
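The detect, skip, and reassemble idea can be illustrated on plain text. The sketch below is not PDFMathTranslate's implementation: it assumes math is marked with `$...$` delimiters as a stand-in for real layout analysis, and the placeholder format `{v0}`, `{v1}` and all function names are hypothetical. It only shows the general pattern of masking formulas before translation and restoring them afterwards.

```python
import re
from typing import Callable, Dict, Tuple

# Stand-in for layout analysis: treat $...$ spans as math regions.
MATH_SPAN = re.compile(r"\$[^$]+\$")


def protect_math(text: str) -> Tuple[str, Dict[str, str]]:
    """Replace each math span with a numbered placeholder such as {v0}."""
    saved: Dict[str, str] = {}

    def stash(match: "re.Match[str]") -> str:
        key = "{v%d}" % len(saved)
        saved[key] = match.group(0)
        return key

    return MATH_SPAN.sub(stash, text), saved


def restore_math(text: str, saved: Dict[str, str]) -> str:
    """Put the original formulas back in place of their placeholders."""
    for key, formula in saved.items():
        text = text.replace(key, formula)
    return text


def translate_preserving_math(text: str, translate: Callable[[str], str]) -> str:
    """Translate only the non-math text; formulas pass through untouched."""
    masked, saved = protect_math(text)
    return restore_math(translate(masked), saved)
```

Here `translate` is any function from string to string, for example a wrapper around a translation backend. The approach relies on the backend leaving the placeholders unchanged, which a real pipeline would need to verify.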
Common pitfalls
- Scanned PDFs (image-based) require OCR preprocessing. PDFMathTranslate works with text-based PDFs. For scanned papers, run OCR first.
- Complex multi-column layouts may occasionally misalign after translation if the translated text is significantly longer or shorter than the original.
- API-based translation backends such as DeepL and OpenAI are billed by usage (characters or tokens) rather than per page, so cost grows with paper length. Use Ollama for free local translation when cost is a concern.
- Always check the official documentation for the latest version-specific changes and migration guides before upgrading in production environments.
- For team deployments, establish clear guidelines on configuration and usage patterns to ensure consistency across developers.
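The scanned-PDF pitfall above can be caught before translation with a simple text-layer check. The sketch below is a heuristic under stated assumptions: `looks_scanned` is a hypothetical helper, the per-page strings are assumed to come from whatever PDF text extractor is already in use, and the 50-character threshold is an arbitrary default rather than a value taken from PDFMathTranslate.

```python
from typing import Sequence


def looks_scanned(page_texts: Sequence[str], min_chars_per_page: int = 50) -> bool:
    """Heuristic: True if most pages have almost no extractable text."""
    if not page_texts:
        return True  # nothing extractable at all
    sparse = sum(1 for text in page_texts if len(text.strip()) < min_chars_per_page)
    return sparse > len(page_texts) / 2
```

If the check returns True, run OCR on the file first and translate the OCR output instead.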
Frequently Asked Questions
Which translation backends does PDFMathTranslate support?
PDFMathTranslate supports Google Translate, DeepL, OpenAI (GPT-4o and others), and Ollama for local model translation. You select the backend with the -s flag when running translations.
Does it preserve math formulas?
Yes. PDFMathTranslate detects mathematical formulas and equations in the PDF and preserves them without modification during translation. This is its primary advantage over generic PDF translation tools.
Can I translate only specific pages?
Yes. Use the -p flag to specify page ranges, for example -p 1-10 for the first ten pages. This is useful for translating only the sections you need rather than the entire document.
Is there a graphical interface?
Yes. PDFMathTranslate provides a GUI mode in addition to the CLI. It also offers a Docker deployment option and a Zotero plugin for integration with academic reference management.
Which languages are supported?
The supported language pairs depend on the translation backend you choose. Google Translate supports 100+ languages; DeepL supports a smaller set. OpenAI and Ollama models handle major languages, but quality may vary for less common language pairs.
Source & Thanks
- GitHub: Byaidu/PDFMathTranslate — 32,600+ stars, AGPL-3.0 License
- PyPI: pdf2zh
- Featured at EMNLP 2025