PDFMathTranslate — Translate PDF Papers Preserving Format
Translate PDF scientific papers while preserving math formulas, charts, and layout. Supports Google, DeepL, OpenAI, Ollama. CLI, GUI, MCP, Docker, Zotero plugin.
Install with prior review
This asset requires review. The copied prompt requests a dry run, shows the planned writes, and continues only after confirmation.
npx -y tokrepo@latest install 4c628f43-c803-45c8-ae39-a4caded80419 --target codex
Do a dry run first, confirm the writes, and then run this command.
What it is
PDFMathTranslate is a tool that translates PDF scientific papers while preserving mathematical formulas, charts, tables, and page layout. Unlike generic PDF translators that break LaTeX equations and mangle figures, PDFMathTranslate detects and protects these elements during translation. It supports Google Translate, DeepL, OpenAI, and Ollama as translation backends.
The tool targets researchers, students, and engineers who read papers in foreign languages. It provides CLI, GUI, Docker, MCP server, and Zotero plugin interfaces, fitting into multiple research workflows.
How it saves time or tokens
PDFMathTranslate preserves the original PDF layout, eliminating the need to manually reconstruct formulas and figures after translation. For researchers processing multiple papers per day, this saves hours of reformatting work. The Ollama backend option means translations can run locally without API costs, making it practical for high-volume academic reading.
How to use
- Install via pip:
pip install pdf2zh
- Translate a PDF document:
# Translate entire document
pdf2zh input.pdf
# Translate specific pages with DeepL
pdf2zh input.pdf -p 1-10 -s deepl
# Use OpenAI for translation
pdf2zh input.pdf -s openai:gpt-4o
# Use local Ollama model
pdf2zh input.pdf -s ollama:llama3
- The output PDF preserves the original formatting with translated text.
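For several papers at once, the commands above can be generated in a loop. The following is a minimal sketch, assuming a folder of text-based PDFs. The helper names `build_pdf2zh_command`, `plan_batch`, and `run_batch` are hypothetical and not part of pdf2zh, and only the `-s` and `-p` flags shown above are used.

```python
import shlex
import subprocess
from pathlib import Path


def build_pdf2zh_command(pdf, service=None, pages=None):
    """Build the argument list for one pdf2zh run (hypothetical helper)."""
    cmd = ["pdf2zh", str(pdf)]
    if pages:
        cmd += ["-p", pages]      # e.g. "1-10"
    if service:
        cmd += ["-s", service]    # e.g. "deepl" or "ollama:llama3"
    return cmd


def plan_batch(folder, service=None, pages=None):
    """Return one command per *.pdf file in folder, sorted by file name."""
    return [build_pdf2zh_command(p, service, pages)
            for p in sorted(Path(folder).glob("*.pdf"))]


def run_batch(folder, service=None, pages=None, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    commands = plan_batch(folder, service, pages)
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stops at the first failing file
    return commands
```

Calling `run_batch("papers", service="ollama:llama3")` only prints the planned commands. Passing `dry_run=False` runs them one by one, which keeps a large batch reviewable before any translation cost is incurred.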
Example
Translate a Chinese machine learning paper to English:
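# Translate with OpenAI, keeping math intact (-li sets the source language, -lo the target)
pdf2zh chinese_paper.pdf -s openai:gpt-4o -li zh -lo en
# Output: chinese_paper-mono.pdf (translated) and chinese_paper-dual.pdf (bilingual)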
# - All LaTeX equations preserved as-is
# - Figures and tables in original positions
# - Text translated to English
# - Page layout matches original
The tool detects math regions using layout analysis, skips them during translation, and reassembles the final PDF.
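The detect, skip, and reassemble idea can be illustrated on plain text. This is a simplified sketch, not the tool's implementation: it assumes math is marked with `$...$` and finds it with a regular expression, whereas PDFMathTranslate locates math regions through layout analysis of the PDF. The `{v0}`-style placeholder format and all function names here are illustrative assumptions.

```python
import re

# Assumption for illustration only: math appears as $...$ in the text.
MATH_PATTERN = re.compile(r"\$[^$]+\$")
PLACEHOLDER = re.compile(r"\{v(\d+)\}")


def protect_math(text):
    """Replace each math span with a numbered placeholder and return both."""
    formulas = []

    def stash(match):
        formulas.append(match.group(0))
        return f"{{v{len(formulas) - 1}}}"

    return MATH_PATTERN.sub(stash, text), formulas


def restore_math(text, formulas):
    """Put the stored formulas back; unknown placeholders are left untouched."""
    def unstash(match):
        index = int(match.group(1))
        return formulas[index] if index < len(formulas) else match.group(0)

    return PLACEHOLDER.sub(unstash, text)


def translate_preserving_math(text, translate):
    """Translate only the prose; `translate` is any str -> str callable."""
    masked, formulas = protect_math(text)
    return restore_math(translate(masked), formulas)
```

The translator never sees the formulas, so it cannot alter them. The sketch relies on the translator returning the placeholders unchanged.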
Related on TokRepo
- AI Tools for Research — Research tools for academic paper analysis and processing
- AI Tools for Documents — Document processing tools for PDF, translation, and extraction
Common pitfalls
- Scanned PDFs (image-based) require OCR preprocessing. PDFMathTranslate works with text-based PDFs. For scanned papers, run OCR first.
- Complex multi-column layouts may occasionally misalign after translation if the translated text is significantly longer or shorter than the original.
- Paid API backends such as DeepL and OpenAI bill by usage (characters or tokens), so cost grows with document length. Use Ollama for free local translation when cost is a concern.
- Always check the official documentation for the latest version-specific changes and migration guides before upgrading in production environments.
- For team deployments, establish clear guidelines on configuration and usage patterns to ensure consistency across developers.
Frequently asked questions
Which translation backends does PDFMathTranslate support?
PDFMathTranslate supports Google Translate, DeepL, OpenAI (GPT-4o and others), and Ollama for local model translation. You select the backend with the -s flag when running translations.
Does it preserve mathematical formulas?
Yes. PDFMathTranslate detects mathematical formulas and equations in the PDF and preserves them without modification during translation. This is its primary advantage over generic PDF translation tools.
Can I translate only specific pages?
Yes. Use the -p flag to specify page ranges, for example -p 1-10 for the first ten pages. This is useful for translating only the sections you need rather than the entire document.
Is there a graphical interface?
Yes. PDFMathTranslate provides a GUI mode in addition to the CLI. It also offers a Docker deployment option and a Zotero plugin for integration with academic reference management.
Which languages are supported?
The supported language pairs depend on the translation backend you choose. Google Translate and DeepL support 100+ languages. OpenAI and Ollama models support major languages but may vary in quality for less common language pairs.
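A page range such as the `-p 1-10` value mentioned above can be checked before a run. The sketch below is a hypothetical helper, not part of pdf2zh. It assumes 1-based page numbers and also accepts comma-separated parts such as `1-3,5`, which is an assumption beyond the single-range form shown in this document.

```python
def parse_page_ranges(spec):
    """Expand a spec such as "1-10" or "1-3,5" into sorted 1-based page numbers."""
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        start, sep, end = part.partition("-")
        first = int(start)                    # raises ValueError on non-numeric input
        last = int(end) if sep else first
        if first < 1 or last < first:
            raise ValueError(f"invalid page range: {part!r}")
        pages.update(range(first, last + 1))
    return sorted(pages)
```

Validating the range up front gives a clear error for input such as `3-1` before any translation request is sent.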
Source and acknowledgments
- GitHub: Byaidu/PDFMathTranslate (32,600+ stars, AGPL-3.0 license)
- PyPI: pdf2zh
- Featured at EMNLP 2025
Related assets
STORM — AI Research Report Generator by Stanford
Stanford's LLM-powered system that researches any topic and writes a full Wikipedia-style article with citations. Simulates multi-perspective expert conversations.
Stirling PDF — Self-Hosted PDF Editor & Toolkit
Stirling PDF is the #1 open-source PDF tool on GitHub. Merge, split, convert, compress, OCR, sign, and edit PDFs — all self-hosted with no data leaving your server.
Marker — Convert PDF to Markdown with High Accuracy
Fast, accurate PDF to Markdown + JSON converter. Handles tables, images, equations, code blocks, and multi-column layouts. GPU-accelerated. 33K+ GitHub stars.
Gotenberg — API-Driven Document Conversion and PDF Generation Server
Docker-powered API server for converting HTML, Markdown, Office documents, and URLs into PDFs using Chromium and LibreOffice.