Skills · May 11, 2026 · 2 min read

Presidio — Detect and Anonymize PII

Detect and anonymize PII in text with Microsoft Presidio, then feed sanitized inputs to LLMs to reduce leakage risk. Works via pip or Docker deployments.

Agent-ready

This asset can be read and installed directly by agents

TokRepo exposes a universal CLI command, an install contract, JSON metadata, an adapter-specific plan, and the raw content to help agents assess fit, risk, and next steps.

Native · 98/100 · Policy: allow
Agent surface: Any MCP/CLI agent
Type: Skill
Installation: Single
Trust: Established
Entry point: Asset
Universal CLI command:
npx tokrepo install d4d3e9a3-9494-4b05-bf05-74368b2ff338
Introduction

  • Best for: LLM apps handling customer data that need PII de-identification before prompts, logs, or embeddings
  • Works with: Python, text pipelines, pre-processing for prompts/logging/indexing; optional Docker services
  • Setup time: about 18 minutes
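
A minimal sketch of the detect-then-anonymize flow, assuming the `presidio-analyzer` and `presidio-anonymizer` packages and a spaCy English model are installed. The `sanitize` helper and its parameters are illustrative names, not part of Presidio. The engines are accepted as arguments so they can be built once and reused across requests.

```python
def sanitize(text, analyzer=None, anonymizer=None, language="en"):
    """Return `text` with detected PII replaced by the anonymizer's output."""
    if analyzer is None or anonymizer is None:
        # Imported lazily so this module still loads where Presidio is absent.
        from presidio_analyzer import AnalyzerEngine
        from presidio_anonymizer import AnonymizerEngine

        analyzer = analyzer or AnalyzerEngine()
        anonymizer = anonymizer or AnonymizerEngine()

    # Step 1: detect PII spans (each result carries entity type, offsets, score).
    results = analyzer.analyze(text=text, language=language)
    # Step 2: rewrite the text using the detected spans.
    return anonymizer.anonymize(text=text, analyzer_results=results).text


# Example call (requires Presidio and an NLP model to be installed):
#   sanitize("My name is Jane Doe and my phone is 212-555-0100")
```

With default settings the anonymizer should replace each detected span with a placeholder naming its entity type; the exact output depends on the installed recognizers and model.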

Quantitative Notes

  • Setup time ~18 minutes (pip install + download one NLP model if needed)
  • GitHub stars + forks (verified): see Source & Thanks
  • Common pattern: sanitize inputs + sanitize outputs + sanitize logs (3 enforcement points)
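
The three enforcement points in the last bullet could be wired up roughly as follows. This is a sketch under assumptions: `sanitize` is any text-to-text callable (for example a Presidio-backed helper), `call_llm` stands in for your model client, and `guarded_completion` and `SanitizingFilter` are hypothetical names.

```python
import logging
from typing import Callable


def guarded_completion(
    user_text: str,
    sanitize: Callable[[str], str],
    call_llm: Callable[[str], str],
) -> str:
    """Sanitize the prompt before the model sees it and the reply after."""
    clean_prompt = sanitize(user_text)   # enforcement point 1: inputs
    raw_reply = call_llm(clean_prompt)
    return sanitize(raw_reply)           # enforcement point 2: outputs


class SanitizingFilter(logging.Filter):
    """Enforcement point 3: scrub every log record before it is emitted."""

    def __init__(self, sanitize: Callable[[str], str]):
        super().__init__()
        self._sanitize = sanitize

    def filter(self, record: logging.LogRecord) -> bool:
        # Render the message with its args first, then sanitize the result.
        record.msg = self._sanitize(record.getMessage())
        record.args = None
        return True
```

Attaching the filter to a handler (`handler.addFilter(SanitizingFilter(sanitize))`) covers log lines written by code that never called `sanitize` itself.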

Practical Notes

For production, treat PII sanitization as a policy: define what counts as PII in your domain, add allowlists for non-sensitive identifiers, and write regression tests with realistic examples. Run Presidio as a pre-processor before prompts and embeddings, and consider sanitizing model outputs as well, since a model can echo secrets that users paste in.
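
One way to express the allowlist and regression-test parts of such a policy is sketched below. It assumes a Presidio version whose `AnalyzerEngine.analyze` accepts an `allow_list` argument; `ALLOW_LIST`, `REGRESSION_CASES`, `find_pii`, and `check_cases` are hypothetical names, and the sample strings are made up.

```python
# Hypothetical identifiers your policy treats as non-sensitive.
ALLOW_LIST = ["support@example.com"]

# (input text, substrings that must be removed, substrings that must survive)
REGRESSION_CASES = [
    ("Reach Jane Doe at jane.doe@example.org", ["jane.doe@example.org"], []),
    ("Write to support@example.com for help", [], ["support@example.com"]),
]


def find_pii(text, analyzer, allow_list=ALLOW_LIST, language="en"):
    """Run detection while skipping values the policy explicitly allows."""
    return analyzer.analyze(text=text, language=language, allow_list=allow_list)


def check_cases(sanitize, cases=REGRESSION_CASES):
    """Return a list of human-readable failures; empty means all cases pass."""
    failures = []
    for text, must_remove, must_keep in cases:
        out = sanitize(text)
        failures += [f"leaked {s!r} in {out!r}" for s in must_remove if s in out]
        failures += [f"over-redacted {s!r} in {out!r}" for s in must_keep if s not in out]
    return failures
```

Running `check_cases` in CI against the real sanitizer catches both leaks and over-redaction when recognizers, models, or the allowlist change.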

Safety note: PII detection is probabilistic, so combine rules, tests, and human review for high-stakes data flows.
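
Because each detection carries a confidence score, one option is to split results into a confident bucket and a bucket queued for human review. The sketch below assumes result objects with a numeric `score` attribute, as Presidio's analyzer results have; `triage` and both threshold values are hypothetical and would need tuning on your own data.

```python
def triage(results, auto_threshold=0.85, review_threshold=0.4):
    """Split detections into (confident, needs_review); drop the rest."""
    confident, needs_review = [], []
    for result in results:
        if result.score >= auto_threshold:
            confident.append(result)
        elif result.score >= review_threshold:
            needs_review.append(result)
        # Scores below review_threshold are treated as noise here.
    return confident, needs_review
```

A conservative flow might redact both buckets and send only the second to a reviewer, so that uncertainty delays confirmation but never exposes data.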

FAQ

Q: Why use it with LLMs? A: It reduces the chance of leaking personal data to model providers, logs, or downstream tools.

Q: Is it only for text? A: Text is the core use case, but Presidio also ships an image redaction module; check the docs for the currently supported modalities and deployments.

Q: Where should I integrate it? A: Integrate in your request middleware and also sanitize transcripts before storage or embeddings.


Source & Thanks

  • GitHub: https://github.com/microsoft/presidio
  • Owner avatar: https://avatars.githubusercontent.com/u/6154722?v=4
  • License (SPDX): MIT
  • GitHub stars (verified via api.github.com/repos/microsoft/presidio): 8,019
  • GitHub forks (verified via api.github.com/repos/microsoft/presidio): 1,041
