Is Docling — Document Parsing for AI free to use?

Yes. Docling — Document Parsing for AI is freely available on TokRepo. Check the Source & Thanks section on the asset page for the specific open-source license.

How do I install Docling — Document Parsing for AI?

Visit the asset page on TokRepo and click "Copy for agent" to get the installation instructions. Most assets can be installed with a single command.

Scripts · March 29, 2026 · 1 min read

Docling — Document Parsing for AI

Name: Docling — Document Parsing for AI
Author: TokRepo Picks

IBM document parsing library. Converts PDFs, DOCX, PPTX, images, and HTML into structured markdown or JSON. Built for RAG pipelines and LLM ingestion.

TokRepo Picks · Community

Quick Start

Try it first, then decide whether to dig deeper


pip install docling

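from docling.document_converter import DocumentConverter

# Create a converter with the default pipeline options.
converter = DocumentConverter()
# Convert a local file; "report.pdf" is a placeholder path.
result = converter.convert("report.pdf")
# Print the parsed document as Markdown.
print(result.document.export_to_markdown())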
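Besides Markdown, the converted document can be written out as JSON. The following is a minimal sketch, assuming docling is installed and the input file exists; the helper names `output_path_for` and `convert_to_files` are hypothetical and introduced only for illustration.

```python
import json
from pathlib import Path


def output_path_for(source: str, suffix: str) -> Path:
    """Hypothetical helper: swap the file extension, e.g. report.pdf -> report.json."""
    return Path(source).with_suffix(suffix)


def convert_to_files(source: str) -> None:
    """Convert one document and write Markdown and JSON files next to it."""
    # Imported lazily so the helper above works even without docling installed.
    from docling.document_converter import DocumentConverter

    doc = DocumentConverter().convert(source).document

    # Markdown export, as in the quick-start snippet.
    output_path_for(source, ".md").write_text(
        doc.export_to_markdown(), encoding="utf-8"
    )
    # export_to_dict() returns a plain dict that the json module can serialize.
    output_path_for(source, ".json").write_text(
        json.dumps(doc.export_to_dict(), ensure_ascii=False, indent=2),
        encoding="utf-8",
    )
```

Calling `convert_to_files("report.pdf")` would then be expected to leave `report.md` and `report.json` beside the input file.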

Introduction

Docling is IBM's open-source document parsing library, designed for AI pipelines. It converts PDFs (including scanned ones), Word documents, PowerPoint presentations, images, and HTML into clean, structured output: Markdown, JSON, or document objects.

Best for: RAG pipeline document ingestion, PDF parsing, enterprise document processing
Works with: LangChain, LlamaIndex, any LLM pipeline

Supported Formats

PDF — Text, tables, images, scanned documents (OCR)
DOCX — Microsoft Word documents
PPTX — PowerPoint presentations
HTML — Web pages
Images — PNG, JPG with OCR
Markdown — Passthrough with metadata
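When ingesting a mixed folder, it can help to filter files by extension before handing them to the converter. This is a rough sketch: the suffix set is derived from the format list above and is only an illustrative subset, not Docling's authoritative list, and `select_supported` and `convert_folder` are hypothetical helper names.

```python
from pathlib import Path

# Extensions taken from the format list above; an illustrative subset,
# not docling's authoritative list of supported inputs.
SUPPORTED_SUFFIXES = {".pdf", ".docx", ".pptx", ".html", ".png", ".jpg", ".md"}


def select_supported(paths):
    """Keep only paths whose extension (case-insensitive) is in SUPPORTED_SUFFIXES."""
    return [p for p in map(Path, paths) if p.suffix.lower() in SUPPORTED_SUFFIXES]


def convert_folder(folder: str) -> dict:
    """Convert every supported file in `folder`; return {filename: markdown}."""
    # Imported lazily so select_supported() is usable without docling installed.
    from docling.document_converter import DocumentConverter

    converter = DocumentConverter()  # reuse one converter for all files
    outputs = {}
    for path in select_supported(sorted(Path(folder).iterdir())):
        result = converter.convert(path)
        outputs[path.name] = result.document.export_to_markdown()
    return outputs
```

Reusing a single `DocumentConverter` instance across files is a deliberate choice here, so that any setup cost is paid once rather than per document.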

Key Features

Table extraction — Accurate table parsing to structured data
Layout analysis — Understands headers, paragraphs, lists, captions
OCR — Built-in for scanned documents
Chunking — Hierarchical chunking that respects document structure
LangChain integration — DoclingLoader for direct pipeline use
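The chunking feature listed above could feed a RAG index along these lines. This is a sketch under assumptions: it imports `HierarchicalChunker` from `docling_core.transforms.chunker`, which may differ between versions, and `texts_to_records` and `chunk_document` are hypothetical helpers, not part of Docling.

```python
def texts_to_records(texts, source: str):
    """Turn chunk texts into plain dicts, skipping whitespace-only chunks.

    `index` is the position in the original chunk sequence, so it may
    have gaps where empty chunks were dropped.
    """
    return [
        {"source": source, "index": i, "text": text}
        for i, text in enumerate(texts)
        if text.strip()
    ]


def chunk_document(source: str):
    """Convert `source` and split it into structure-aware chunks."""
    # Lazy imports: the import path below is an assumption and may vary by version.
    from docling.document_converter import DocumentConverter
    from docling_core.transforms.chunker import HierarchicalChunker

    doc = DocumentConverter().convert(source).document
    chunks = HierarchicalChunker().chunk(dl_doc=doc)
    return texts_to_records((chunk.text for chunk in chunks), source)
```

The resulting records are plain dictionaries, so they can be embedded or stored by whichever vector store or LLM pipeline is in use.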

Source & Thanks

Created by IBM. Licensed under MIT. DS4SD/docling — 15K+ GitHub stars


Related Assets

Ruff — Ultra-Fast Python Linter & Formatter

UV — Ultra-Fast Python Package Manager

Firecrawl — Web Scraping API for LLMs