MCP Configs · April 2, 2026 · 1 min read

MarkItDown — Convert Any Document to Markdown

Microsoft's Python tool for converting Office docs, PDFs, images, audio, and HTML to clean Markdown for LLM pipelines. Also available as an MCP server.

Introduction

MarkItDown by Microsoft converts virtually any document format into clean Markdown — the lingua franca of LLMs. Feed it Word docs, PowerPoint decks, Excel spreadsheets, PDFs, images (with OCR/AI description), audio (with speech-to-text), HTML, CSV, JSON, XML, ZIP archives, and more. Out comes clean, structured Markdown ready for any AI pipeline.

With 93,000+ GitHub stars, it has become a widely used tool for document-to-LLM preprocessing. The MCP server variant (markitdown-mcp) lets AI coding agents convert documents on the fly during conversations.

Supported Formats

Format               Conversion Method
Word (.docx)         Structure-preserving with headings, tables, lists
PowerPoint (.pptx)   Slide-by-slide with speaker notes
Excel (.xlsx)        Sheet-by-sheet as Markdown tables
PDF                  Text extraction with layout preservation
Images               OCR + AI description (EXIF metadata included)
Audio (.mp3/.wav)    Speech-to-text transcription
HTML                 Clean text extraction, tables preserved
CSV/JSON/XML         Structured Markdown conversion
ZIP archives         Recursive conversion of all contained files
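
As a rough illustration of how the format list above might be used in a pipeline, the sketch below filters a set of paths down to supported extensions and converts each one. The helper names (SUPPORTED, convertible, convert_all, main) are hypothetical, and the image extensions are an assumption since the table does not list them. The only MarkItDown calls assumed are MarkItDown(), convert(path), and the text_content attribute of the result.

```python
from pathlib import Path

# Extensions taken from the format table; the image extensions
# (.jpg, .jpeg, .png) are an assumption, since the table only says "Images".
SUPPORTED = {
    ".docx", ".pptx", ".xlsx", ".pdf", ".jpg", ".jpeg", ".png",
    ".mp3", ".wav", ".html", ".csv", ".json", ".xml", ".zip",
}


def convertible(paths):
    """Return the paths whose extension appears in SUPPORTED, in input order."""
    return [Path(p) for p in paths if Path(p).suffix.lower() in SUPPORTED]


def convert_all(converter, paths):
    """Map each convertible path to its Markdown text.

    `converter` is any object with a MarkItDown-style convert(path) method
    whose result exposes a text_content attribute.
    """
    return {
        str(p): converter.convert(str(p)).text_content
        for p in convertible(paths)
    }


def main():
    # Imported here so the helpers above work without the package installed.
    from markitdown import MarkItDown

    md = MarkItDown()
    # "docs" is a placeholder directory name.
    for name, text in convert_all(md, Path("docs").iterdir()).items():
        print(f"{name}: {len(text)} characters of Markdown")
```

Calling main() requires the markitdown package and a docs directory. Unsupported files are skipped silently, which may or may not be what a given pipeline wants.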

LLM Integration via MCP

The markitdown-mcp server exposes a single tool, convert_to_markdown, which takes a URI (http:, https:, file:, or data:) and returns the content as Markdown. AI agents can call it to convert files or URLs during a conversation. It works with Claude Code, Cursor, Windsurf, and any other MCP-compatible client.
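
A minimal sketch of registering the server with a client: many MCP clients read a JSON file with a top-level "mcpServers" map, although the file location and exact schema vary by client, so check the client's documentation. The snippet assumes the markitdown-mcp executable is on PATH (for example after pip install markitdown-mcp) and runs over STDIO. The server name "markitdown" is an arbitrary label.

```python
import json

# Hypothetical client config: one server entry launched as a subprocess.
# "markitdown" is just a label; "markitdown-mcp" is assumed to be on PATH.
config = {
    "mcpServers": {
        "markitdown": {
            "command": "markitdown-mcp",
            "args": [],
        }
    }
}

# Print JSON that can be pasted into the client's MCP config file.
config_json = json.dumps(config, indent=2)
print(config_json)
```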

Advanced Usage

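from pathlib import Path

from markitdown import MarkItDown
from openai import OpenAI

# With an LLM for image descriptions
# (OpenAI() reads OPENAI_API_KEY from the environment)
openai_client = OpenAI()
md = MarkItDown(llm_client=openai_client, llm_model="gpt-4o")
result = md.convert("photo.jpg")
# result.text_content might read, e.g.:
# "A bar chart showing quarterly revenue growth..."

# Batch convert a directory, writing each .md file next to its source
for docx_path in Path("docs").glob("*.docx"):
    result = md.convert(str(docx_path))
    docx_path.with_suffix(".md").write_text(result.text_content, encoding="utf-8")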

Sources and Acknowledgments

  • GitHub: microsoft/markitdown — 93,000+ stars, MIT License
  • PyPI: markitdown (CLI/library), markitdown-mcp (MCP server)
  • By Microsoft
