# Presidio — Detect and Anonymize PII

> Detect and anonymize PII in text with Microsoft Presidio, then feed sanitized inputs to LLMs to reduce leakage risk. Works via pip or Docker deployments.

## Install

Save the content below to `.claude/skills/` or append it to your `CLAUDE.md`.

## Quick Use

1. Install the analyzer, the anonymizer, and the spaCy English model the analyzer loads by default:

   ```bash
   pip install presidio_analyzer presidio_anonymizer
   python -m spacy download en_core_web_lg
   ```

2. Check that both packages import:

   ```bash
   python -c "import presidio_analyzer, presidio_anonymizer; print('presidio ok')"
   ```

3. Verify:
   - Run the analyzer on a sample string that contains an email address or phone number, pass its results to the anonymizer, and confirm the detected spans are replaced or redacted in the output.

---

## Intro

- **Best for:** LLM apps handling customer data that need PII de-identification before prompts, logs, or embeddings
- **Works with:** Python, text pipelines, pre-processing for prompts/logging/indexing; optional Docker services
- **Setup time:** 18 minutes

### Quantitative Notes

- Setup time: ~18 minutes (pip install, plus downloading one NLP model if needed)
- GitHub stars and forks (verified): see Source & Thanks
- Common pattern: sanitize inputs, sanitize outputs, and sanitize logs (3 enforcement points)

---

## Practical Notes

For production, treat PII sanitization as a policy: define what counts as PII for your domain, add allowlists for non-sensitive identifiers, and write regression tests with realistic examples. Use Presidio as a pre-processor before prompts and embeddings, and consider sanitizing outputs as well, since secrets or personal data that users paste in can be echoed back in model responses.

**Safety note:** PII detection is probabilistic; combine rules, tests, and human review for high-stakes data flows.
### FAQ

**Q: Why use it with LLMs?**
A: It reduces the chance of leaking personal data to model providers, logs, or downstream tools.

**Q: Is it only for text?**
A: Text is the main focus, but the repository also includes an image redactor (`presidio-image-redactor`); see the official docs for the full list of supported modalities and deployment options.

**Q: Where should I integrate it?**
A: In your request middleware, and again on transcripts before they are stored or embedded.

---

## Source & Thanks

> GitHub: https://github.com/microsoft/presidio
> Owner avatar: https://avatars.githubusercontent.com/u/6154722?v=4
> License (SPDX): MIT
> GitHub stars (verified via `api.github.com/repos/microsoft/presidio`): 8,019
> GitHub forks (verified via `api.github.com/repos/microsoft/presidio`): 1,041

---

Source: https://tokrepo.com/en/workflows/presidio-detect-and-anonymize-pii
Author: Script Depot