# Docling: Document Parsing for AI

> IBM's document parsing library. Converts PDFs, DOCX, PPTX, images, and HTML into structured Markdown or JSON. Built for RAG pipelines and LLM ingestion.

## Quick Use

```bash
pip install docling
```

```python
from docling.document_converter import DocumentConverter

# Create a converter with the default pipeline options.
converter = DocumentConverter()

# Parse the file; the argument is a path to a local document.
result = converter.convert("report.pdf")

# Export the parsed document as Markdown.
print(result.document.export_to_markdown())
```

---

## Intro

Docling is IBM's open-source document parsing library, designed for AI pipelines. It converts PDFs (including scanned ones), Word documents, PowerPoint presentations, images, and HTML into clean, structured output: Markdown, JSON, or document objects.

**Best for**: RAG pipeline document ingestion, PDF parsing, enterprise document processing

**Works with**: LangChain, LlamaIndex, any LLM pipeline

---

## Supported Formats

- **PDF**: Text, tables, images, scanned documents (OCR)
- **DOCX**: Microsoft Word documents
- **PPTX**: PowerPoint presentations
- **HTML**: Web pages
- **Images**: PNG, JPG with OCR
- **Markdown**: Passthrough with metadata

## Key Features

- **Table extraction**: Parses tables into structured data
- **Layout analysis**: Recognizes headers, paragraphs, lists, and captions
- **OCR**: Built in for scanned documents
- **Chunking**: Hierarchical chunking that respects document structure
- **LangChain integration**: `DoclingLoader` for direct pipeline use

---

### FAQ

**Q: What is Docling?**

A: IBM's document parsing library. It converts PDFs, DOCX, PPTX, images, and HTML into structured Markdown or JSON, and is built for RAG pipelines and LLM ingestion.

**Q: How do I install Docling?**

A: Run `pip install docling`, as shown in the Quick Use section above.

## Source & Thanks

> Created by [IBM](https://github.com/DS4SD). Licensed under MIT.
> [DS4SD/docling](https://github.com/DS4SD/docling) (15K+ GitHub stars)

---

Source: https://tokrepo.com/en/workflows/443e86c2-3811-496e-8e4d-6eef742ab219

Author: Script Depot
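
To illustrate what "chunking that respects document structure" means for a RAG pipeline, here is a minimal sketch that splits Markdown text (such as the string returned by `export_to_markdown()`) into one chunk per section and keeps the heading path with each chunk. It uses only the Python standard library and is not Docling's own chunker. The names `Chunk` and `chunk_markdown` are hypothetical and chosen for this example.

```python
import re
from dataclasses import dataclass

# Matches ATX-style Markdown headings such as "## Results".
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


@dataclass
class Chunk:
    headings: list  # heading titles from the top-level section down to this one
    text: str       # body text found directly under that heading path


def chunk_markdown(markdown: str) -> list:
    """Split Markdown into per-section chunks, each tagged with its heading path."""
    chunks = []
    path = []        # stack of (level, title) for the currently open headings
    body = []        # lines collected since the last heading
    in_fence = False

    def flush():
        # Emit the collected body lines as one chunk, if any text is present.
        text = "\n".join(body).strip()
        if text:
            chunks.append(Chunk([title for _, title in path], text))
        body.clear()

    for line in markdown.splitlines():
        # Track fenced code so "# comment" lines inside it are not read as headings.
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
        match = None if in_fence else HEADING.match(line)
        if match:
            flush()
            level = len(match.group(1))
            # Close headings at the same or a deeper level before opening this one.
            while path and path[-1][0] >= level:
                path.pop()
            path.append((level, match.group(2).strip()))
        else:
            body.append(line)
    flush()
    return chunks


sample = """# Report

Intro paragraph.

## Results

Table summary.

### Details

Fine print.

## Conclusion

Closing remarks.
"""

chunks = chunk_markdown(sample)
for chunk in chunks:
    print(" > ".join(chunk.headings), "|", chunk.text)
```

Keeping the heading path with each chunk lets a retriever show or embed the section context alongside the text, which is the main reason to chunk by structure instead of by a fixed character count.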