Configs · April 7, 2026 · 1 min read

MarkItDown — Convert Any File to Markdown for LLMs

Python library by Microsoft that converts PDF, DOCX, PPTX, XLSX, images, audio, and HTML to clean Markdown. Perfect for feeding documents into LLM context windows. 8,000+ stars.

Introduction

MarkItDown is a Microsoft Python library with 8,000+ GitHub stars. It converts PDF, DOCX, PPTX, XLSX, images, audio, and HTML to clean Markdown, and it supports 10+ formats through a single API. It is best suited to building RAG pipelines that ingest multiple document types.


Quick Start

pip install markitdown
markitdown report.pdf > report.md
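The same conversion can be driven from Python, which is the more natural fit for a RAG ingestion step. The sketch below assumes the library's `MarkItDown` class with a `convert()` method whose result exposes `text_content`. The helper name `convert_folder`, its parameters, and the suffix list are hypothetical and chosen only for illustration. Depending on the installed version, PDF and Office support may need optional extras, so check the project README before relying on it.

```python
from pathlib import Path


def convert_folder(src_dir, out_dir, converter=None,
                   suffixes=(".pdf", ".docx", ".pptx", ".xlsx", ".html")):
    """Convert each matching file in src_dir to a Markdown file in out_dir.

    Hypothetical helper: `converter` is any object with a `convert(path)`
    method returning something that has a `text_content` attribute.
    """
    if converter is None:
        # Imported lazily so the helper can also run with a stand-in converter.
        from markitdown import MarkItDown
        converter = MarkItDown()

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    written = []
    for path in sorted(Path(src_dir).iterdir()):
        # Skip anything that is not one of the selected document types.
        if path.suffix.lower() not in suffixes:
            continue
        result = converter.convert(str(path))
        # Keep the original extension in the name to avoid collisions,
        # e.g. report.pdf -> report.pdf.md
        target = out / (path.name + ".md")
        target.write_text(result.text_content, encoding="utf-8")
        written.append(target)
    return written
```

Calling `convert_folder("docs", "markdown")` would then write one `.md` file per supported document, ready to be chunked and indexed downstream.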


Source & Thanks

Created by Microsoft. Licensed under MIT.

markitdown — stars 8,000+
