Skills · May 11, 2026 · 2 min read

Presidio — Detect and Anonymize PII

Detect and anonymize PII in text with Microsoft Presidio, then feed sanitized inputs to LLMs to reduce leakage risk. Works via pip or Docker deployments.

Agent-ready

This asset can be read and installed directly by agents.

TokRepo exposes a universal CLI command, an install contract, JSON metadata, an adapter-specific plan, and raw content so that agents can evaluate compatibility, risk, and next steps.

  • Native · 98/100
  • Policy: allow
  • Agent surface: Any MCP/CLI agent
  • Type: Skill
  • Installation: Single
  • Trust: Established
  • Entry: Asset

Universal CLI command:
npx tokrepo install d4d3e9a3-9494-4b05-bf05-74368b2ff338

Introduction

  • Best for: LLM apps handling customer data that need PII de-identification before prompts, logs, or embeddings
  • Works with: Python, text pipelines, pre-processing for prompts/logging/indexing; optional Docker services
  • Setup time: about 18 minutes
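
The detect-then-anonymize flow can be sketched in a few lines of Python. This is a minimal sketch, assuming the `presidio-analyzer` and `presidio-anonymizer` packages are installed together with a spaCy English model; the helper names `build_engines` and `sanitize` are hypothetical, and the import is done lazily only so the module still loads where Presidio is absent.

```python
def build_engines():
    """Create the two Presidio engines (hypothetical helper).

    Assumes: pip install presidio-analyzer presidio-anonymizer,
    plus a spaCy English model for the analyzer's NLP engine.
    """
    from presidio_analyzer import AnalyzerEngine
    from presidio_anonymizer import AnonymizerEngine

    return AnalyzerEngine(), AnonymizerEngine()


def sanitize(text, analyzer, anonymizer, language="en"):
    """Detect PII spans, then replace them before the text reaches an LLM."""
    # Step 1: detection returns scored spans (entity type, start, end, score).
    findings = analyzer.analyze(text=text, language=language)
    # Step 2: anonymization rewrites those spans; the result exposes `.text`.
    return anonymizer.anonymize(text=text, analyzer_results=findings).text
```

With Presidio installed, `analyzer, anonymizer = build_engines()` followed by `sanitize(prompt, analyzer, anonymizer)` should return the prompt with detected entities replaced by placeholders. The exact placeholder format depends on the anonymizer's default operator and may differ between versions.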

Quantitative Notes

  • Setup time ~18 minutes (pip install + download one NLP model if needed)
  • GitHub stars + forks (verified): see Source & Thanks
  • Common pattern: sanitize inputs + sanitize outputs + sanitize logs (3 enforcement points)
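
The three enforcement points can be wired together roughly as follows. This is an illustrative sketch, not Presidio code: `demo_sanitize` is a regex stand-in for a real Presidio-backed sanitizer, and `SanitizingFilter` and `guarded_call` are hypothetical names.

```python
import logging
import re

# Stand-in detector for illustration only (emails); in practice pass a
# Presidio-backed callable instead.
_EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def demo_sanitize(text: str) -> str:
    return _EMAIL.sub("<EMAIL>", text)


class SanitizingFilter(logging.Filter):
    """Enforcement point 3: redact log messages before handlers see them."""

    def __init__(self, sanitize):
        super().__init__()
        self._sanitize = sanitize

    def filter(self, record: logging.LogRecord) -> bool:
        # Format the message first, then redact the final string.
        record.msg = self._sanitize(record.getMessage())
        record.args = ()  # already formatted, so drop the raw arguments
        return True


def guarded_call(prompt, call_llm, sanitize, logger):
    clean_prompt = sanitize(prompt)                  # point 1: inputs
    clean_reply = sanitize(call_llm(clean_prompt))   # point 2: outputs
    logger.info("prompt=%s reply=%s", clean_prompt, clean_reply)
    return clean_reply
```

Attaching the filter with `logger.addFilter(SanitizingFilter(demo_sanitize))` means messages logged through that logger are redacted even when a caller forgets to sanitize first.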

Practical Notes

For production, treat PII sanitization as a policy: define what counts as PII for your domain, add allowlists for non-sensitive identifiers, and write regression tests with realistic examples. Use Presidio as a pre-processor before prompts and embeddings, and consider sanitizing outputs as well, since a model can echo back secrets that users paste.
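
One way to make such a policy concrete is to keep it as data and check it with regression cases. The sketch below is illustrative: `PiiPolicy`, `analyze_kwargs`, and `run_regression` are hypothetical names. The `entities`, `allow_list`, and `score_threshold` keys are meant to mirror keyword arguments of Presidio's `AnalyzerEngine.analyze`, but treat that as an assumption and check it against the version you install.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PiiPolicy:
    """Domain-specific definition of what counts as PII (hypothetical)."""

    entities: tuple = ("EMAIL_ADDRESS", "PHONE_NUMBER", "PERSON")
    allow_list: tuple = ()        # non-sensitive identifiers to leave alone
    score_threshold: float = 0.5  # ignore detections below this confidence

    def analyze_kwargs(self) -> dict:
        # Intended to be splatted into analyzer.analyze(text=..., language=...,
        # **kwargs); parameter names are an assumption to verify.
        return {
            "entities": list(self.entities),
            "allow_list": list(self.allow_list),
            "score_threshold": self.score_threshold,
        }


def run_regression(sanitize, cases):
    """Return failure messages; an empty list means every case passed.

    Each case is (text, must_not_survive, must_survive).
    """
    failures = []
    for text, secrets, keepers in cases:
        out = sanitize(text)
        failures += [f"leaked {s!r} in {text!r}" for s in secrets if s in out]
        failures += [f"over-redacted {k!r} in {text!r}" for k in keepers if k not in out]
    return failures
```

Checking both directions matters: a case fails if a secret survives, and also if an allowlisted identifier such as a ticket number gets redacted.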

Safety note: PII detection is probabilistic, so combine rules, tests, and human review for high-stakes data flows.
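
Because each detection carries a confidence score, one possible way to combine automation with human review is to triage findings by score. This is a sketch under stated assumptions: `Finding` is a minimal stand-in for a detection result, the thresholds are arbitrary illustrative values, and spans are assumed not to overlap.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    """Minimal stand-in for a detection result (type, span, score)."""

    entity_type: str
    start: int
    end: int
    score: float


def triage(findings, auto_threshold=0.85, review_threshold=0.4):
    """Split findings into auto-redact and human-review buckets by score."""
    auto, review = [], []
    for f in findings:
        if f.score >= auto_threshold:
            auto.append(f)
        elif f.score >= review_threshold:
            review.append(f)
        # Anything below review_threshold is treated as noise in this sketch.
    return auto, review


def redact(text, findings):
    """Replace spans right to left so earlier offsets stay valid."""
    for f in sorted(findings, key=lambda f: f.start, reverse=True):
        text = text[:f.start] + f"<{f.entity_type}>" + text[f.end:]
    return text
```

High-confidence findings are redacted immediately, while the middle band is queued for a reviewer instead of being silently kept or dropped.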

FAQ

Q: Why use it with LLMs? A: It reduces the chance of leaking personal data to model providers, logs, or downstream tools.

Q: Is it only for text? A: No. This listing focuses on text anonymization, but the Presidio project also documents modules for images and structured data; follow the docs for supported modalities and deployments.

Q: Where should I integrate it? A: Integrate it in your request middleware, and also sanitize transcripts before storing or embedding them.
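
For the transcript case, a minimal sketch might look like the following. It assumes each message is a dict with a `content` key, and `sanitize`, `store`, and `embed` are caller-supplied callables; all function names here are hypothetical.

```python
def sanitize_transcript(messages, sanitize):
    """Return a sanitized copy of a chat transcript; the input is untouched."""
    return [{**m, "content": sanitize(m["content"])} for m in messages]


def store_and_embed(messages, sanitize, store, embed):
    """Sanitize once, then hand the same clean text to storage and embeddings."""
    clean = sanitize_transcript(messages, sanitize)
    store(clean)                                 # e.g. a database write
    return [embed(m["content"]) for m in clean]  # vectors never see raw PII
```

Sanitizing once and reusing the result keeps the stored transcript and the embedded text consistent with each other.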


Source & Thanks

  • GitHub: https://github.com/microsoft/presidio
  • Owner avatar: https://avatars.githubusercontent.com/u/6154722?v=4
  • License (SPDX): MIT
  • GitHub stars (verified via api.github.com/repos/microsoft/presidio): 8,019
  • GitHub forks (verified via api.github.com/repos/microsoft/presidio): 1,041
