Supported Formats
| Format | Conversion Method |
|---|---|
| Word (.docx) | Structure-preserving with headings, tables, lists |
| PowerPoint (.pptx) | Slide-by-slide with speaker notes |
| Excel (.xlsx) | Sheet-by-sheet as Markdown tables |
| PDF | Text extraction with layout preservation |
| Images | OCR + AI description (EXIF metadata included) |
| Audio (.mp3/.wav) | Speech-to-text transcription |
| HTML | Clean text extraction, tables preserved |
| CSV/JSON/XML | Structured Markdown conversion |
| ZIP archives | Recursive conversion of all contained files |
LLM Integration via MCP
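The markitdown-mcp server exposes a conversion tool, `convert_to_markdown`, that AI agents can call to convert a file or URL to Markdown during a conversation. The tool takes a single URI (`http:`, `https:`, `file:`, or `data:`), and the server works with Claude Code, Cursor, Windsurf, and any other MCP-compatible client.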
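To make the agent-side call concrete, here is a minimal sketch of the JSON-RPC 2.0 `tools/call` message an MCP client would send to the server. It assumes the tool is registered as `convert_to_markdown` with a single `uri` argument; `build_convert_request` is a hypothetical helper written for illustration, not part of markitdown-mcp. In practice the MCP client library builds and sends this message for you.

```python
import json


def build_convert_request(uri: str, request_id: int = 1) -> str:
    """Serialize a JSON-RPC 2.0 `tools/call` request for the conversion tool."""
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            # Assumed tool name and argument; confirm both with a `tools/list` call.
            "name": "convert_to_markdown",
            "arguments": {"uri": uri},
        },
    }
    return json.dumps(request)


# Example: ask the server to convert a remote document.
print(build_convert_request("https://example.com/report.pdf"))
```

The server's reply carries the converted Markdown in the tool result, which the agent can then read as ordinary conversation context.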
Advanced Usage
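from pathlib import Path

from markitdown import MarkItDown
from openai import OpenAI

# With an LLM for image descriptions (OpenAI() reads OPENAI_API_KEY from the environment)
openai_client = OpenAI()
md = MarkItDown(llm_client=openai_client, llm_model="gpt-4o")
result = md.convert("photo.jpg")
print(result.text_content)
# → e.g. "A bar chart showing quarterly revenue growth..."

# Batch convert a directory, writing each .md file next to its source
for docx_path in Path("docs").glob("*.docx"):
    result = md.convert(str(docx_path))
    docx_path.with_suffix(".md").write_text(result.text_content, encoding="utf-8")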
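For larger directories, one unreadable file should not abort the whole run. The sketch below is one possible way to wrap the batch loop so that failures are collected instead of raised. `batch_convert` and its `convert` parameter are hypothetical names introduced here; the converter is passed in as a plain callable so the helper does not depend on any particular MarkItDown configuration.

```python
from pathlib import Path
from typing import Callable, Dict


def batch_convert(
    src_dir: str,
    convert: Callable[[str], str],
    pattern: str = "*.docx",
) -> Dict[str, str]:
    """Convert every file matching `pattern`; return {source path: error message}."""
    failures: Dict[str, str] = {}
    for src in sorted(Path(src_dir).glob(pattern)):
        try:
            markdown = convert(str(src))
        except Exception as exc:  # record the failure and keep going
            failures[str(src)] = str(exc)
            continue
        # Write the Markdown next to the source file, e.g. report.docx -> report.md
        src.with_suffix(".md").write_text(markdown, encoding="utf-8")
    return failures
```

With a `MarkItDown` instance named `md`, this could be called as `batch_convert("docs", lambda p: md.convert(p).text_content)`, and the returned dictionary inspected for files that need attention.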