MCP Configs · Apr 2, 2026 · 2 min read

MarkItDown — Convert Any Document to Markdown

Microsoft's Python tool to convert Office docs, PDFs, images, audio, and HTML to clean Markdown for LLM pipelines. Also available as an MCP server.

Introduction

MarkItDown by Microsoft converts a wide range of document formats into Markdown, the lingua franca of LLMs. Feed it Word docs, PowerPoint decks, Excel spreadsheets, PDFs, images (with OCR/AI description), audio (with speech-to-text), HTML, CSV, JSON, XML, ZIP archives, and more; it returns clean, structured Markdown ready for an AI pipeline.

With 93,000+ GitHub stars, it has become a widely used tool for document-to-LLM preprocessing. The MCP server variant (markitdown-mcp) lets AI coding agents convert documents on the fly during conversations.

Supported Formats

Format               Conversion Method
-------------------  ---------------------------------------------------
Word (.docx)         Structure-preserving with headings, tables, lists
PowerPoint (.pptx)   Slide-by-slide with speaker notes
Excel (.xlsx)        Sheet-by-sheet as Markdown tables
PDF                  Text extraction with layout preservation
Images               OCR + AI description (EXIF metadata included)
Audio (.mp3/.wav)    Speech-to-text transcription
HTML                 Clean text extraction, tables preserved
CSV/JSON/XML         Structured Markdown conversion
ZIP archives         Recursive conversion of all contained files
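
To make "tabular data as Markdown tables" concrete, here is a minimal standard-library sketch of rendering CSV text as a Markdown pipe table. It only illustrates the kind of output to expect; it is not MarkItDown's own implementation, and the function name csv_to_markdown is hypothetical.

```python
import csv
import io


def csv_to_markdown(csv_text: str) -> str:
    """Render CSV text as a Markdown pipe table (first row is the header).

    Illustrative only: csv_to_markdown is a hypothetical helper, not part
    of the markitdown package.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ""
    header, *body = rows
    width = len(header)

    def render(row):
        # Pad or trim each row to the header width and escape literal pipes.
        cells = (row + [""] * width)[:width]
        return "| " + " | ".join(c.replace("|", "\\|") for c in cells) + " |"

    lines = [render(header), "| " + " | ".join(["---"] * width) + " |"]
    lines += [render(row) for row in body]
    return "\n".join(lines)


print(csv_to_markdown("quarter,revenue\nQ1,120\nQ2,135\n"))
```

Short rows are padded with empty cells and long rows are trimmed, so every line of the table has the same number of columns as the header.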

LLM Integration via MCP

The markitdown-mcp server exposes a conversion tool (convert_to_markdown, which takes a URI) that AI agents can call to turn a file or URL into Markdown during a conversation. It works with Claude Code, Cursor, Windsurf, and any other MCP-compatible client.
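
Registering the server with a client usually comes down to a small JSON entry. The sketch below generates one in Python. It assumes the client reads an "mcpServers" map (a layout several MCP clients use) and that installing markitdown-mcp from PyPI puts a markitdown-mcp command on your PATH. The server label "markitdown" is arbitrary, and the file this JSON belongs in depends on the client, so check its documentation.

```python
import json

# Assumed layout: an "mcpServers" map keyed by a server label of your choice.
# "markitdown" is an arbitrary label; "markitdown-mcp" is assumed to be the
# command installed by the PyPI package of the same name.
config = {
    "mcpServers": {
        "markitdown": {
            "command": "markitdown-mcp",
            "args": [],
        }
    }
}

# Print the JSON so it can be pasted into the client's MCP configuration file.
print(json.dumps(config, indent=2))
```

Generating the entry from a dict rather than hand-editing JSON makes it easy to add further servers or arguments without breaking the file's syntax.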

Advanced Usage

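from pathlib import Path

from markitdown import MarkItDown
from openai import OpenAI

# With an LLM for image descriptions
# (OpenAI() reads the API key from the OPENAI_API_KEY environment variable)
openai_client = OpenAI()
md = MarkItDown(llm_client=openai_client, llm_model="gpt-4o")
result = md.convert("photo.jpg")
print(result.text_content)
# e.g. "A bar chart showing quarterly revenue growth..."

# Batch-convert every .docx file in a directory, writing a .md file next to each
for path in Path("docs").glob("*.docx"):
    result = md.convert(str(path))
    path.with_suffix(".md").write_text(result.text_content, encoding="utf-8")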

Source and Acknowledgements

  • GitHub: microsoft/markitdown — 93,000+ stars, MIT License
  • PyPI: markitdown (CLI/library), markitdown-mcp (MCP server)
  • By Microsoft
