Workflows · May 14, 2026 · 3 min read

CKA-Agent — Trojan Knowledge Attack Agent (Research)

Research code for studying trojan-knowledge attacks in agent systems, with reproducible scripts and configs; verified 203★, pushed 2026-05-13.

Agent-ready

This asset can be read and installed directly by agents.

TokRepo exposes a universal CLI command, an install contract, JSON metadata, an adapter-specific plan, and raw content so that agents can assess compatibility, risk, and next steps.

Needs Confirmation · 62/100 · Policy: confirm

  • Agent surface: any MCP/CLI agent
  • Type: Workflow
  • Install: uv | pip
  • Trust: Established
  • Entry point: uv pip install ...
  • Universal CLI command: npx tokrepo install 5f0a5eff-2382-5179-be0c-622add1557e4

Introduction

Best for: Researchers and security-minded agent builders evaluating knowledge poisoning risks

Works with: Python tooling; README includes uv pip installs and experiment dependencies

Setup time: 20-45 minutes

Key facts (verified)

  • GitHub: 203 stars · 45 forks · pushed 2026-05-13.
  • License: AGPL-3.0 · owner avatar + repo URL verified via GitHub API.
  • README-backed entrypoint: uv pip install ....

Main

  • Treat it as a security lab: run experiments in an isolated environment and record the exact dependency set used.

  • Use it to build test cases: trojan knowledge scenarios can become unit/regression tests for your retrieval + tool pipeline.

  • Map the attack surface: separate poisoning in static docs vs retrieval corpora vs tool outputs so mitigations are targeted.

  • Export results as artifacts: logs, prompts, and configs are as important as code when reproducing agent-security claims.
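
The points above about recording the dependency set, separating attack surfaces, and exporting artifacts can be combined into a small run manifest. The following is a minimal Python sketch under stated assumptions: `build_manifest`, the field names, and the three surface labels are hypothetical and are not part of the CKA-Agent repository.

```python
import hashlib
import json
import platform
import sys
from importlib import metadata


def installed_packages() -> dict[str, str]:
    """Return {distribution name: version} for the active environment."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip broken installs that lack a Name field
            packages[name] = dist.version
    return dict(sorted(packages.items(), key=lambda kv: kv[0].lower()))


def build_manifest(config: dict, surface: str) -> dict:
    """Bundle what is needed to rerun one experiment (hypothetical schema)."""
    # Hypothetical labels mirroring the three surfaces named in the list above.
    allowed = {"static_docs", "retrieval_corpus", "tool_output"}
    if surface not in allowed:
        raise ValueError(f"unknown surface: {surface!r}")
    # Hash a canonical JSON dump so identical configs get identical digests.
    config_bytes = json.dumps(config, sort_keys=True).encode("utf-8")
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "surface": surface,
        "config": config,
        "config_sha256": hashlib.sha256(config_bytes).hexdigest(),
        "packages": installed_packages(),
    }
```

Writing one such manifest next to the logs and prompts of each run (for example with `json.dump`) keeps the dependency set and configuration together with the results they produced.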

README (excerpt)

[ICML 2026] CKA-Agent: Bypassing LLM Guardrails via Harmless Prompt Weaving and Adaptive Tree Search

🛡️ Defense towards CKA

TurnGate is a response-aware defense mechanism designed to detect and mitigate hidden malicious intent in multi-turn dialogue systems. It defends against state-of-the-art multi-turn attacks such as CKA-Agent, achieving strong defense performance while avoiding over-refusal.

🔥 Latest Results on Frontier Models (Dec 2025)

CKA-Agent demonstrates consistently high attack success rates against the latest frontier models, including GPT-5.2, Gemini-3.0-Pro, and Claude-Haiku-4.5. The results are summarized below:

Source-backed notes

  • README shows uv pip install commands for installing experiment dependencies.
  • Repo is AGPL-3.0 licensed (verified via GitHub API).
  • The repository positions itself as a reproducible implementation for agent-security research (per README wording).

FAQ

  • Is this meant for production use? It’s primarily research code; use it to evaluate and harden your own systems.
  • How do I install dependencies? Follow the README uv pip install ... instructions and keep versions pinned for reproducibility.
  • What license applies? AGPL-3.0 (verified via GitHub license metadata).
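
Whether versions are actually pinned can be checked mechanically. The following is a minimal sketch, assuming the dependencies are available as requirements-style text; the repository's own dependency files may be organized differently, and `unpinned_requirements` is a hypothetical helper, not something the README provides.

```python
import re

# Matches "name==version" and "name[extra]==version"; any other specifier counts as loose.
PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]+\])?\s*==\s*[^\s;]+")


def unpinned_requirements(text: str) -> list[str]:
    """Return requirement lines that are not pinned to an exact version."""
    loose = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments (simplification)
        if not line or line.startswith("-"):  # skip blanks and option lines
            continue
        if not PINNED.match(line):
            loose.append(line)
    return loose
```

An empty result means every listed requirement uses an exact `==` pin; any returned lines are candidates to pin before recording results.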
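
Results table from the README excerpt (cut off in the source after the first two Gemini-3.0-Pro values):

Model              HarmBench                      StrongREJECT
                   FS ↑   PS ↑   V ↓    R ↓       FS ↑   PS ↑   V ↓    R ↓
🟢 GPT-5.2         0.889  0.079  0.024  0.008     0.932  0.056  0.006  0.006
🟣 Gemini-3.0-Pro  0.881  0.087  (remaining values not included in the excerpt)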

Source and acknowledgments

Created by Graph-COM. Licensed under AGPL-3.0.

Graph-COM/CKA-Agent — ⭐ 203

Thanks to the upstream maintainers and contributors for publishing this work under an open license.
