Scripts · May 1, 2026 · 3 min read

Pandoc — Universal Document Format Converter

Pandoc is a universal document converter that reads and writes dozens of markup formats. It converts between Markdown, LaTeX, HTML, DOCX, EPUB, PDF, and many more with a single command.

Introduction

Pandoc is a command-line tool written in Haskell that converts documents between a wide range of markup and publishing formats. It handles Markdown, reStructuredText, LaTeX, HTML, DOCX, EPUB, PDF, and many others, making it indispensable for technical writers, academics, and documentation pipelines.

What Pandoc Does

  • Converts between 40+ input and output formats with a single binary
  • Parses extended Markdown with footnotes, tables, citations, and math
  • Generates PDF output through an external engine such as LaTeX, groff (via pdfroff), Typst, or wkhtmltopdf
  • Handles citation processing with built-in CSL support
  • Supports custom templates, filters, and Lua scripting for transformations

Architecture Overview

Pandoc reads source documents into an internal abstract syntax tree (AST) that represents the document's logical structure. Writers then serialize the AST to the target format. Between reading and writing, filters can transform the AST: Lua filters run inside Pandoc itself, while JSON filters can be written in any language, reading the serialized AST on stdin and writing the modified AST to stdout. This design decouples input parsing from output generation, so adding a new format requires only a new reader or writer.
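
As a rough sketch of the JSON filter path described above, the Python script below reads the serialized AST from stdin, uppercases the text of top-level level-1 headers, and writes the result to stdout. The file name upper_h1.py and the function names are hypothetical; the element shapes (Header, Str) follow the JSON that Pandoc emits with -t json.

```python
import json
import sys


def upper_strs(node):
    """Return a copy of node with every Str element uppercased."""
    if isinstance(node, dict):
        if node.get("t") == "Str":
            return {"t": "Str", "c": node["c"].upper()}
        return {key: upper_strs(value) for key, value in node.items()}
    if isinstance(node, list):
        return [upper_strs(item) for item in node]
    # Plain strings and numbers (identifiers, header levels) stay untouched.
    return node


def transform(doc):
    """Uppercase level-1 headers at the top level of the document.

    Headers nested inside other blocks (block quotes, divs) are left
    alone to keep this sketch short.
    """
    out = dict(doc)
    out["blocks"] = [
        upper_strs(block)
        if block.get("t") == "Header" and block["c"][0] == 1
        else block
        for block in doc.get("blocks", [])
    ]
    return out


def main():
    raw = sys.stdin.read()
    if not raw.strip():
        return  # nothing piped in, nothing to do
    json.dump(transform(json.loads(raw)), sys.stdout)


if __name__ == "__main__":
    main()
```

Assuming the script is saved as upper_h1.py, it could be applied with: pandoc input.md --filter ./upper_h1.py -o output.html. For transformations like this one a Lua filter is usually the lighter option, since it needs no external interpreter.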

Installation & Configuration

  • Install via package managers (apt, brew, choco) or download binaries from GitHub
  • Use --defaults YAML files to store commonly used conversion options
  • Set up custom LaTeX templates for consistent PDF styling across a team
  • Integrate into CI pipelines to auto-generate documentation from Markdown
  • Combine with pandoc-crossref for numbered figures, tables, and equations
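
The defaults-file and CI points above could be wired together roughly as follows. This is a minimal sketch, not a recommended setup: the helper names, the file name docs.yaml, and the chosen options are assumptions for illustration, and render_defaults only handles plain string and boolean values.

```python
import subprocess
from pathlib import Path


def render_defaults(options):
    """Render simple scalar options as the YAML body of a defaults file."""
    lines = []
    for key, value in options.items():
        if isinstance(value, bool):
            value = "true" if value else "false"  # YAML booleans are lowercase
        lines.append(f"{key}: {value}")
    return "\n".join(lines) + "\n"


def build_command(source, output, defaults):
    """Return the pandoc argument list for one conversion."""
    return ["pandoc", f"--defaults={defaults}", str(source), "-o", str(output)]


def convert(source, output, defaults="docs.yaml"):
    """Write the shared defaults file, then run pandoc (must be on PATH)."""
    Path(defaults).write_text(
        render_defaults({"from": "markdown", "to": "html", "standalone": True}),
        encoding="utf-8",
    )
    subprocess.run(build_command(source, output, defaults), check=True)
```

A CI job would then call something like convert("README.md", "README.html") for each document. Keeping the options in one defaults file means every team member and every pipeline run uses the same settings.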

Key Features

  • Broad format coverage spanning plain text, office documents, and e-books
  • Citation and bibliography support using BibTeX, BibLaTeX, or CSL JSON
  • Lua filter API for powerful document transformations without external tools
  • Template system for controlling the output structure of every format
  • Self-contained HTML output (--embed-resources with --standalone) that embeds images and CSS in a single file

Comparison with Similar Tools

  • MarkItDown — converts files to Markdown only; Pandoc handles dozens of output formats
  • Docutils — reStructuredText focused; Pandoc supports many more input formats
  • LibreOffice CLI — strong with office formats but limited for markup languages
  • Asciidoctor — AsciiDoc ecosystem tool; Pandoc covers more format pairs
  • Typst — a modern typesetting tool; Pandoc can output to Typst as one of many targets

FAQ

Q: Can Pandoc produce high-quality PDFs? A: Yes. By default it generates PDFs via LaTeX (pdflatex), which must be installed separately, and custom templates give you fine typographic control. You can select another engine, such as Typst or wkhtmltopdf, with --pdf-engine.

Q: Does Pandoc handle Microsoft Word files? A: Yes. It reads and writes DOCX natively, including images and tables. Styles for DOCX output come from a reference document passed with --reference-doc.

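Q: How do I add citations? A: Use --citeproc together with --bibliography pointing at a bibliography file (BibTeX, BibLaTeX, CSL JSON, or YAML), and write cite keys such as [@key] in your Markdown. An optional --csl file selects the citation style.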
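
To make the citation answer concrete, here is a small sketch of the inputs and the resulting command line. The cite key doe2020, the bibliography entry, and the file names are placeholders invented for illustration, not real references.

```python
# Markdown source: [@doe2020] is resolved against the bibliography file.
SAMPLE_MARKDOWN = """\
Pandoc resolves cite keys such as [@doe2020] against the bibliography.

# References
"""

# Placeholder BibTeX entry matching the cite key above.
SAMPLE_BIBTEX = """\
@book{doe2020,
  author = {Doe, Jane},
  title = {A Placeholder Title},
  publisher = {Example Press},
  year = {2020}
}
"""


def citation_command(source, bibliography, output, csl=None):
    """Return the pandoc argument list for a conversion with citations."""
    cmd = ["pandoc", str(source), "--citeproc", f"--bibliography={bibliography}"]
    if csl is not None:
        cmd.append(f"--csl={csl}")  # optional citation style file
    cmd += ["-o", str(output)]
    return cmd
```

With the two samples saved as paper.md and refs.bib, citation_command("paper.md", "refs.bib", "paper.html") yields the arguments for a run that formats the citation and appends the bibliography at the end of the document, under the final References heading.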

Q: Is Pandoc fast enough for large documents? A: Pandoc handles book-length documents and theses well. Each conversion is an independent process, so large batch jobs are straightforward to parallelize across files.
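
One way to parallelize across files is sketched below, assuming pandoc is on PATH and every source converts to HTML. The function names, the worker count, and the injectable run parameter (there only so the logic can be exercised without pandoc) are illustrative choices.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def convert_one(source, out_dir, run=subprocess.run):
    """Convert one file to HTML in out_dir and return the target path."""
    target = Path(out_dir) / (Path(source).stem + ".html")
    run(["pandoc", str(source), "-o", str(target)], check=True)
    return target


def convert_all(sources, out_dir, workers=4, run=subprocess.run):
    """Convert many files concurrently; results keep the input order."""
    # Threads are enough here: the real work happens in the pandoc
    # subprocesses, so the Python side mostly waits.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda s: convert_one(s, out_dir, run), sources))
```

A typical call would be convert_all(sorted(Path("docs").glob("*.md")), "build"), with the output directory created beforehand. Because check=True is set, a failed conversion raises an exception when its result is collected.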
