Research · Apr 2, 2026 · 2 min read

PDFMathTranslate — Translate PDF Papers Preserving Format

Translate PDF scientific papers while preserving math formulas, charts, and layout. Supports Google, DeepL, OpenAI, and Ollama as backends, and is available as a CLI, GUI, MCP server, Docker image, and Zotero plugin.

TL;DR
Translate scientific PDFs while preserving math formulas, charts, and layout. Supports multiple translation backends including OpenAI.
§01

What it is

PDFMathTranslate is a tool that translates PDF scientific papers while preserving mathematical formulas, charts, tables, and page layout. Unlike generic PDF translators that break LaTeX equations and mangle figures, PDFMathTranslate detects and protects these elements during translation. It supports Google Translate, DeepL, OpenAI, and Ollama as translation backends.

The tool targets researchers, students, and engineers who read papers in foreign languages. It provides CLI, GUI, Docker, MCP server, and Zotero plugin interfaces, fitting into multiple research workflows.

§02

How it saves time or tokens

PDFMathTranslate preserves the original PDF layout, eliminating the need to manually reconstruct formulas and figures after translation. For researchers processing multiple papers per day, this saves hours of reformatting work. The Ollama backend option means translations can run locally without API costs, making it practical for high-volume academic reading.

§03

How to use

  1. Install via pip:
pip install pdf2zh
  2. Translate a PDF document:
# Translate entire document
pdf2zh input.pdf

# Translate specific pages with DeepL
pdf2zh input.pdf -p 1-10 -s deepl

# Use OpenAI for translation
pdf2zh input.pdf -s openai:gpt-4o

# Use local Ollama model
pdf2zh input.pdf -s ollama:llama3
  3. The output PDF preserves the original formatting with translated text.
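When several papers need the same treatment, the commands above can be generated by a small script. The following is a minimal Python sketch and not part of PDFMathTranslate: `build_pdf2zh_command` and `plan_batch` are hypothetical helper names, and only the `-p` and `-s` flags shown above are used. It prints the commands as a dry run instead of executing them.

```python
import shlex
from pathlib import Path
from typing import List, Optional


def build_pdf2zh_command(pdf: str, service: Optional[str] = None,
                         pages: Optional[str] = None) -> List[str]:
    """Build a pdf2zh argument list using only the -p and -s flags."""
    cmd = ["pdf2zh", pdf]
    if pages:
        cmd += ["-p", pages]      # page range, e.g. "1-10"
    if service:
        cmd += ["-s", service]    # backend, e.g. "deepl" or "ollama:llama3"
    return cmd


def plan_batch(folder: str, service: Optional[str] = None) -> List[str]:
    """Return one shell-quoted command line per PDF in `folder` (hypothetical helper)."""
    pdfs = sorted(Path(folder).glob("*.pdf"))
    return [shlex.join(build_pdf2zh_command(str(p), service)) for p in pdfs]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for line in plan_batch(".", service="ollama:llama3"):
        print(line)
```

To actually run a command, the list returned by `build_pdf2zh_command` can be passed to `subprocess.run(cmd, check=True)`. Keeping the builder separate from execution makes it easy to review what will run before any API calls are made.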
§04

Example

Translate a Chinese machine learning paper to English:

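# Translate with OpenAI, keeping math intact
# -li and -lo set the source and target language in pdf2zh 1.x; confirm with pdf2zh --help
pdf2zh chinese_paper.pdf -s openai:gpt-4o -li zh -lo en

# Output (pdf2zh 1.x naming): chinese_paper-mono.pdf (translated) and chinese_paper-dual.pdf (bilingual)
# - All LaTeX equations preserved as-is
# - Figures and tables in original positions
# - Text translated to English
# - Page layout matches original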

The tool detects math regions using layout analysis, skips them during translation, and reassembles the final PDF.
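The protect, translate, restore idea can be illustrated on plain text. This toy sketch is not PDFMathTranslate's implementation, which works on PDF layout rather than on strings. It assumes math is written as `$...$`, and the helper names and the stand-in `translate` callable are hypothetical.

```python
import re
from typing import Callable, Dict, Tuple

MATH_SPAN = re.compile(r"\$[^$]+\$")  # toy rule: inline math written as $...$


def protect_math(text: str) -> Tuple[str, Dict[str, str]]:
    """Swap each math span for a numbered placeholder and remember the original."""
    saved: Dict[str, str] = {}

    def stash(match: "re.Match[str]") -> str:
        key = f"{{v{len(saved)}}}"   # placeholders look like {v0}, {v1}, ...
        saved[key] = match.group(0)
        return key

    return MATH_SPAN.sub(stash, text), saved


def restore_math(text: str, saved: Dict[str, str]) -> str:
    """Put the original formulas back in place of their placeholders."""
    for key, formula in saved.items():
        text = text.replace(key, formula)
    return text


def translate_preserving_math(text: str, translate: Callable[[str], str]) -> str:
    """Translate only the prose; formulas never reach the translator."""
    masked, saved = protect_math(text)
    return restore_math(translate(masked), saved)


def fake_translate(s: str) -> str:
    # Stand-in for a real backend call: a single hard-coded phrase replacement.
    return s.replace("The energy is", "L'énergie est")


if __name__ == "__main__":
    print(translate_preserving_math("The energy is $E = mc^2$.", fake_translate))
```

A translator that rewrites the placeholders themselves would break the restore step, which is one reason this stays a sketch and not a recipe.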

§06

Common pitfalls

  • PDFMathTranslate works with text-based PDFs. Scanned (image-only) papers need an OCR pass first to add a text layer.
  • Complex multi-column layouts may occasionally misalign after translation if the translated text is significantly longer or shorter than the original.
  • Hosted API backends such as DeepL and OpenAI are billed by usage (characters or tokens), so cost grows with document length. Use Ollama for free local translation when cost is a concern.
  • Always check the official documentation for the latest version-specific changes and migration guides before upgrading in production environments.
  • For team deployments, agree on a shared configuration so everyone uses the same backend, model, and language settings.
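As a rough pre-check for the scanned-PDF pitfall above, a script can look at how much text each page yields before translating. This is a heuristic sketch, not a feature of PDFMathTranslate: the thresholds are arbitrary assumptions, `looks_scanned` and `extract_page_texts` are hypothetical helper names, and the extraction step assumes the third-party pypdf package is installed.

```python
from typing import List


def looks_scanned(page_texts: List[str], min_chars: int = 20,
                  max_empty_ratio: float = 0.5) -> bool:
    """Guess whether a PDF lacks a text layer from its extracted per-page text."""
    if not page_texts:
        return True
    # Count pages that yield almost no extractable text.
    empty = sum(1 for t in page_texts if len(t.strip()) < min_chars)
    return empty / len(page_texts) > max_empty_ratio


def extract_page_texts(path: str) -> List[str]:
    """Read per-page text with pypdf (third-party; install separately)."""
    from pypdf import PdfReader  # imported lazily so the heuristic works without it
    return [(page.extract_text() or "") for page in PdfReader(path).pages]
```

If `looks_scanned(extract_page_texts("paper.pdf"))` returns true, running OCR before translation is probably worthwhile. Papers with many figure-only pages can trigger false positives, so treat the result as a hint.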

Frequently Asked Questions

What translation backends does PDFMathTranslate support?

PDFMathTranslate supports Google Translate, DeepL, OpenAI (GPT-4o and others), and Ollama for local model translation. You select the backend with the -s flag when running translations.

Does it preserve LaTeX equations?

Yes. PDFMathTranslate detects mathematical formulas and equations in the PDF and preserves them without modification during translation. This is its primary advantage over generic PDF translation tools.

Can I translate only specific pages?

Yes. Use the -p flag to specify page ranges, for example -p 1-10 for the first ten pages. This is useful for translating only the sections you need rather than the entire document.

Is there a GUI interface?

Yes. PDFMathTranslate provides a GUI mode in addition to the CLI. It also offers a Docker deployment option and a Zotero plugin for integration with academic reference management.

Does PDFMathTranslate work with any language pair?

The supported language pairs depend on the translation backend you choose. Google Translate covers 100+ languages, while DeepL supports a smaller set. OpenAI and Ollama models handle major languages, but quality varies for less common language pairs.
