Practical Notes
- Data point: Docker mode can enforce a read boundary via MD_ALLOWED_PATHS.
- Quant: mount only one directory first (e.g. /data) to reduce accidental file exposure.
Pattern: convert → then summarize
When an agent reads arbitrary documents, keep the pipeline explicit:
- Convert to Markdown (normalize),
- store the Markdown (cache),
- summarize / chunk / index.
This avoids repeated parsing and makes outputs auditable.
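As a rough illustration of the convert, store, summarize steps above, here is a minimal Python sketch. The names `convert_cached` and `chunk_markdown` are hypothetical, and the `convert` callable stands in for whatever converter you actually use; nothing here is the tool's own API. The cache is keyed by a hash of the source bytes, which is one possible choice, not the only one.

```python
import hashlib
from pathlib import Path
from typing import Callable


def convert_cached(source: Path, cache_dir: Path,
                   convert: Callable[[Path], str]) -> Path:
    """Convert `source` to Markdown once and reuse the stored copy afterwards.

    `convert` is a placeholder for the real converter (an assumption here).
    """
    # Key the cache on file content so an edited document is re-converted.
    digest = hashlib.sha256(source.read_bytes()).hexdigest()
    cached = cache_dir / f"{digest}.md"
    if not cached.exists():
        cache_dir.mkdir(parents=True, exist_ok=True)
        cached.write_text(convert(source), encoding="utf-8")
    return cached


def chunk_markdown(text: str, max_chars: int = 2000) -> list[str]:
    """Pack blank-line-separated paragraphs into chunks of about max_chars."""
    chunks: list[str] = []
    current = ""
    for para in text.split("\n\n"):
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks
```

Because the stored Markdown file is what gets summarized or indexed, you can inspect exactly what the agent saw, and a second run over the same document skips the conversion step entirely.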
Security note
If your agent has local file access, always restrict paths (allowlist) and run conversion in a container when possible.
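If you also want a check inside your own agent code, an allowlist test can be sketched as below. This is an illustration under assumptions, not how MD_ALLOWED_PATHS is implemented: `is_allowed` is a hypothetical helper, and it takes the allowed directories as a plain list because the variable's exact format is not stated here.

```python
from pathlib import Path


def is_allowed(candidate: str, allowed_dirs: list[str]) -> bool:
    """Return True only if `candidate` resolves to a path inside an allowed directory."""
    # resolve() collapses ".." segments and follows symlinks, so a path like
    # /data/../etc/passwd is judged by where it really points.
    target = Path(candidate).resolve()
    for directory in allowed_dirs:
        root = Path(directory).resolve()
        if target == root or root in target.parents:
            return True
    return False
```

Comparing resolved path components rather than string prefixes matters: a prefix test would wrongly accept /database when only /data is allowed.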
FAQ
Q: Does it support PDFs?
A: Yes. The repo lists pdf-to-markdown and related tools.
Q: How do I restrict what it can read?
A: Set MD_ALLOWED_PATHS to an allowlist of directories.
Q: Should I run it in Docker?
A: If you’re exposing local files, Docker + read-only mounts is a safer default.