MCP Configs · April 2, 2026 · 1 min read
Unstructured — Document ETL for LLM Pipelines
Extract clean data from PDFs, DOCX, HTML, images, and emails for RAG and LLM ingestion. 14K+ GitHub stars.
TokRepo Featured · Community
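Quick Start
Try it first, then decide whether to dig deeper.
This section tells both users and agents what to copy first, what to install, and where it goes.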
```bash
pip install "unstructured[pdf,docx]"
```
```python
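from unstructured.partition.auto import partition

# Detect the file type and split the document into typed elements
elements = partition(filename="report.pdf")
for e in elements:
    # Print each element's class name and the first 100 characters of its text
    print(f"{type(e).__name__}: {str(e)[:100]}")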
```
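Once a document is partitioned, a common next step before RAG ingestion is to bucket the elements by type, for example to keep titles apart from body text. The sketch below is one possible way to do that, not part of the library: `group_by_type` and `group_file` are hypothetical helper names, and the grouping relies only on `type(e).__name__` and `str(e)`, the same two calls used in the snippet above.

```python
from collections import defaultdict


def group_by_type(elements):
    """Group element texts by class name (hypothetical helper, not a library API)."""
    groups = defaultdict(list)
    for e in elements:
        text = str(e).strip()
        if text:  # skip elements whose text is empty or whitespace only
            groups[type(e).__name__].append(text)
    return dict(groups)


def group_file(filename):
    """Partition a file and group its elements by type (hypothetical helper)."""
    # Imported lazily so group_by_type can be used without unstructured installed.
    from unstructured.partition.auto import partition

    return group_by_type(partition(filename=filename))
```

Calling `group_file("report.pdf")` would return a plain dict keyed by element class name, which can then be filtered or passed to a chunking step of your choice.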
---
Source and Acknowledgments
> Created by [Unstructured-IO](https://github.com/Unstructured-IO). Licensed under Apache-2.0.
>
> [unstructured](https://github.com/Unstructured-IO/unstructured) — ⭐ 14,400+
Related Assets
OpenLIT — OpenTelemetry LLM Observability
Monitor LLM costs, latency, and quality with OpenTelemetry-native tracing. GPU monitoring and guardrails built in. 2.3K+ stars.
TokRepo Featured
Agenta — Open-Source LLMOps Platform
Prompt playground, evaluation, and observability in one platform. Compare prompts, run evals, trace production calls. 4K+ stars.
TokRepo Featured
Rerun — Visualize Multimodal AI Data in Real-Time
SDK for logging, storing, and visualizing 3D, images, time series, and text in real-time. Built for robotics and AI. 10K+ stars.
TokRepo Featured