Main
Make the defense boundary explicit: treat tool results as untrusted input and gate them before they enter model context.
Start conservative: block high-risk results, then allowlist or override specific per-tool fields once you understand where the false positives come from.
Log evidence: store riskLevel, tier2Score, and matched detections so you can tune safely over time.
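The three recommendations above can be sketched as one gate function. This is a minimal illustration, not the library's API: only riskLevel and tier2Score are names taken from the text, while ScanResult, Scanner, gateField, the "tool.field" allowlist key format, and demoScan are hypothetical names introduced here. The stand-in scanner exists only so the example runs on its own.

```typescript
// Hypothetical shapes. Only `riskLevel` and `tier2Score` come from the text;
// every other name in this block is illustrative.
type RiskLevel = "low" | "medium" | "high";

interface ScanResult {
  riskLevel: RiskLevel;
  tier2Score: number;   // classifier score, assumed here to lie in [0, 1]
  detections: string[]; // names of matched patterns
}

type Scanner = (text: string) => ScanResult;

interface Evidence extends ScanResult {
  tool: string;
  field: string;
  overridden: boolean;
}

interface GateDecision {
  allowed: boolean;
  content: string | null; // null when the value is blocked
  evidence: Evidence;
}

// Gate a single field of a tool result before it enters model context.
// `allowlist` holds "tool.field" keys that may pass despite a high-risk result.
function gateField(
  tool: string,
  field: string,
  value: string,
  scan: Scanner,
  allowlist: ReadonlySet<string>,
  log: (entry: Evidence) => void,
): GateDecision {
  const result = scan(value);
  const isHigh = result.riskLevel === "high";
  const overridden = isHigh && allowlist.has(`${tool}.${field}`);
  const allowed = !isHigh || overridden;
  const evidence: Evidence = { ...result, tool, field, overridden };
  log(evidence); // always log, including allowed results, so thresholds can be tuned later
  return { allowed, content: allowed ? value : null, evidence };
}

// Stand-in scanner for demonstration only; a real deployment would call the library.
const demoScan: Scanner = (text) => {
  const hit = /ignore (all )?previous instructions/i.test(text);
  return {
    riskLevel: hit ? "high" : "low",
    tier2Score: hit ? 0.97 : 0.02,
    detections: hit ? ["instruction_override"] : [],
  };
};
```

Calling gateField("gmail", "body", text, demoScan, new Set<string>(), console.log) blocks a body containing "ignore previous instructions" and returns content: null. Adding "gmail.body" to the allowlist lets the same value through while still recording overridden: true in the log.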
Source-backed notes
- README states the ONNX model (~22MB) is bundled — no extra downloads required.
- README describes a two-tier pipeline (pattern detection + ML classifier) and mentions ~10ms/sample after warmup.
- README positions it for MCP/CLI/tool-call agents to sanitize tool results (emails, documents, PRs) before LLM use.
FAQ
- Does this replace secure prompting? No. It's an extra guardrail; still keep strong system prompts and tool permissioning.
- Will it slow down my agent? README cites ~10ms/sample after warmup; measure on your workload and cache where possible.
- Where should I apply it? At the boundary: right after receiving tool output and before adding it to model context.
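For the caching suggestion in the FAQ, one possible approach is to memoize scan results by the exact text scanned, so repeated tool outputs (for example, the same document fetched twice) skip the classifier. The sketch below assumes the scanner is deterministic for a given input; withCache and its maxEntries parameter are hypothetical names, not part of the library.

```typescript
// Wrap any scan function with a bounded in-memory cache keyed by the input text.
// Assumes the wrapped function returns the same result for the same text.
function withCache<R>(
  scan: (text: string) => R,
  maxEntries = 1000,
): { scan: (text: string) => R; stats: { hits: number; misses: number } } {
  const cache = new Map<string, R>();
  const stats = { hits: 0, misses: 0 };

  const cachedScan = (text: string): R => {
    if (cache.has(text)) {
      stats.hits++;
      return cache.get(text) as R;
    }
    stats.misses++;
    const result = scan(text);
    if (cache.size >= maxEntries) {
      // Evict the oldest entry; Map iterates in insertion order.
      const oldest = cache.keys().next().value;
      if (oldest !== undefined) cache.delete(oldest);
    }
    cache.set(text, result);
    return result;
  };

  return { scan: cachedScan, stats };
}
```

The stats counters give a rough hit rate to check whether caching is worthwhile for a given workload. Keying on the full text keeps the sketch dependency-free; for very large tool outputs, keying on a content hash would reduce memory use.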