MCP Configs · Apr 2, 2026 · 2 min read

MarkItDown — Convert Any Document to Markdown

Microsoft's Python tool for converting Office docs, PDFs, images, audio, and HTML to clean Markdown for LLM pipelines. Also available as an MCP server.

Introduction

MarkItDown by Microsoft converts a wide range of document formats into Markdown, the lingua franca of LLMs. Feed it Word docs, PowerPoint decks, Excel spreadsheets, PDFs, images (with OCR/AI description), audio (with speech-to-text), HTML, CSV, JSON, XML, ZIP archives, and more, and it returns clean, structured Markdown ready for an AI pipeline.

With 93,000+ GitHub stars, it is a widely used tool for document-to-LLM preprocessing. The MCP server variant (markitdown-mcp) lets AI coding agents convert documents on the fly during conversations.

Supported Formats

| Format | Conversion Method |
| --- | --- |
| Word (.docx) | Structure-preserving with headings, tables, lists |
| PowerPoint (.pptx) | Slide-by-slide with speaker notes |
| Excel (.xlsx) | Sheet-by-sheet as Markdown tables |
| PDF | Text extraction with layout preservation |
| Images | OCR + AI description (EXIF metadata included) |
| Audio (.mp3/.wav) | Speech-to-text transcription |
| HTML | Clean text extraction, tables preserved |
| CSV/JSON/XML | Structured Markdown conversion |
| ZIP archives | Recursive conversion of all contained files |
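
To make "Structured Markdown conversion" concrete for the simplest tabular case, here is a minimal standard-library sketch of turning CSV text into a Markdown table. It is not MarkItDown's implementation, and the function name `csv_to_markdown_table` is hypothetical; it only illustrates the kind of output such a conversion produces.

```python
import csv
import io


def csv_to_markdown_table(csv_text: str) -> str:
    """Render CSV text as a Markdown pipe table (first row is the header)."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ""
    header, *body = rows
    width = len(header)

    def render(cells: list[str]) -> str:
        # Pad or truncate to the header width; escape pipes so cells stay in their column.
        padded = (cells + [""] * width)[:width]
        return "| " + " | ".join(c.strip().replace("|", "\\|") for c in padded) + " |"

    lines = [render(header), "| " + " | ".join(["---"] * width) + " |"]
    lines.extend(render(row) for row in body)
    return "\n".join(lines)


if __name__ == "__main__":
    print(csv_to_markdown_table("quarter,revenue\nQ1,100\nQ2,120\n"))
```

Pipe tables are a reasonable target because most LLMs read them reliably and they render in common Markdown viewers.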

LLM Integration via MCP

The markitdown-mcp server exposes a conversion tool (convert_to_markdown, which takes a URI) that AI agents can call to convert a file or URL to Markdown during a conversation. It works with Claude Code, Cursor, Windsurf, and other MCP-compatible clients.
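
As a rough illustration of what an MCP client sends when an agent invokes the conversion tool, the sketch below builds a `tools/call` JSON-RPC request for a local file. The tool name `convert_to_markdown` and its `uri` argument are assumptions here, and `build_convert_request` is a hypothetical helper; confirm the real names against the server's tool listing before relying on them.

```python
import json
from pathlib import Path

# Assumption: tool name and argument name; verify them via the MCP "tools/list" method.
TOOL_NAME = "convert_to_markdown"


def build_convert_request(path: str, request_id: int = 1) -> dict:
    """Build an MCP 'tools/call' JSON-RPC request for a local file."""
    # resolve() makes the path absolute, which as_uri() requires.
    file_uri = Path(path).resolve().as_uri()
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": TOOL_NAME, "arguments": {"uri": file_uri}},
    }


if __name__ == "__main__":
    print(json.dumps(build_convert_request("report.docx"), indent=2))
```

In practice the MCP client library builds and sends this message for you; the sketch only shows the shape of the payload.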

Advanced Usage

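from pathlib import Path

from markitdown import MarkItDown
from openai import OpenAI

# With an LLM for image descriptions (OpenAI() reads OPENAI_API_KEY from the environment)
openai_client = OpenAI()
md = MarkItDown(llm_client=openai_client, llm_model="gpt-4o")
result = md.convert("photo.jpg")
print(result.text_content)  # e.g. "A bar chart showing quarterly revenue growth..."

# Batch convert a directory, writing each .md file next to its source
for docx_path in Path("docs").glob("*.docx"):
    result = md.convert(str(docx_path))
    docx_path.with_suffix(".md").write_text(result.text_content, encoding="utf-8")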
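
For larger batches, one unreadable file should not stop the whole run. The sketch below is one possible way to structure that: a hypothetical `batch_convert` helper that takes the converter as a callable, so it can be exercised without MarkItDown installed. The helper and its parameters are illustrative and not part of the library.

```python
from pathlib import Path
from typing import Callable


def batch_convert(
    src_dir: str,
    convert: Callable[[str], str],
    pattern: str = "*.docx",
) -> tuple[list[Path], dict[Path, str]]:
    """Convert every matching file; return (written .md paths, failures by source path)."""
    written: list[Path] = []
    failed: dict[Path, str] = {}
    for src in sorted(Path(src_dir).glob(pattern)):
        try:
            markdown = convert(str(src))
        except Exception as exc:  # one bad file should not abort the whole batch
            failed[src] = str(exc)
            continue
        out = src.with_suffix(".md")
        out.write_text(markdown, encoding="utf-8")
        written.append(out)
    return written, failed


if __name__ == "__main__":
    # Stand-in converter so the sketch runs without MarkItDown installed.
    written, failed = batch_convert("docs", convert=lambda p: f"# {Path(p).stem}\n")
    print(f"{len(written)} converted, {len(failed)} failed")
```

To use it with the library, create one instance with `md = MarkItDown()` and pass `convert=lambda p: md.convert(p).text_content`.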

Source and Acknowledgments

  • GitHub: microsoft/markitdown — 93,000+ stars, MIT License
  • PyPI: markitdown (CLI/library), markitdown-mcp (MCP server)
  • By Microsoft
