Workflows · May 14, 2026 · 3 min read

CKA-Agent — Trojan Knowledge Attack Agent (Research)

Research code for studying trojan-knowledge attacks in agent systems, with reproducible scripts and configs. 203 GitHub stars (verified); last pushed 2026-05-13.

Agent-ready

This asset can be read and installed directly by agents.

TokRepo exposes a universal CLI command, an install contract, JSON metadata, an adapter-specific plan, and the raw content to help agents assess fit, risk, and next actions.

Needs Confirmation · 62/100 · Policy: confirm
Agent surface: Any MCP/CLI agent
Type: Workflow
Installation: uv | pip
Trust: Established
Entry point: uv pip install ...
Universal CLI command: npx tokrepo install 5f0a5eff-2382-5179-be0c-622add1557e4

Introduction

Best for: Researchers and security-minded agent builders evaluating knowledge poisoning risks

Works with: Python tooling; README includes uv pip installs and experiment dependencies

Setup time: 20-45 minutes

Key facts (verified)

  • GitHub: 203 stars · 45 forks · pushed 2026-05-13.
  • License: AGPL-3.0 · owner avatar + repo URL verified via GitHub API.
  • README-backed entrypoint: uv pip install ....

Main

  • Treat it as a security lab: run experiments in an isolated environment and record the exact dependency set used.

  • Use it to build test cases: trojan knowledge scenarios can become unit/regression tests for your retrieval + tool pipeline.

  • Map the attack surface: separate poisoning in static docs vs retrieval corpora vs tool outputs so mitigations are targeted.

  • Export results as artifacts: logs, prompts, and configs are as important as code when reproducing agent-security claims.
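
The second and third points above can be sketched as a small regression harness. This is a minimal illustration, not code from the CKA-Agent repository: `Surface`, `Scenario`, `run_scenarios`, `failures_by_surface`, and `toy_pipeline` are hypothetical names, and the canary strings stand in for whatever planted content your own scenarios use. The idea is to tag each scenario with the surface it poisons, so a failure points at a specific mitigation target.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Surface(Enum):
    """Where the poisoned content enters the pipeline (hypothetical taxonomy)."""
    STATIC_DOC = "static_doc"
    RETRIEVAL_CORPUS = "retrieval_corpus"
    TOOL_OUTPUT = "tool_output"


@dataclass(frozen=True)
class Scenario:
    name: str
    surface: Surface
    injected_text: str     # content planted on that surface
    forbidden_marker: str  # string that must not appear in the final answer


def run_scenarios(pipeline, scenarios):
    """Return the names of scenarios whose forbidden marker leaked into the answer."""
    failures = []
    for s in scenarios:
        answer = pipeline(s.surface, s.injected_text)
        if s.forbidden_marker in answer:
            failures.append(s.name)
    return failures


def failures_by_surface(scenarios, failures):
    """Count failed scenarios per surface so mitigations can be targeted."""
    failed = set(failures)
    return Counter(s.surface.value for s in scenarios if s.name in failed)


def toy_pipeline(surface, injected_text):
    """Stand-in for a real pipeline: echoes tool output verbatim, ignores the rest."""
    if surface is Surface.TOOL_OUTPUT:
        return f"Result: {injected_text}"
    return "Result: summary of trusted content"


SCENARIOS = [
    Scenario("doc-canary", Surface.STATIC_DOC, "CANARY-DOC-1", "CANARY-DOC-1"),
    Scenario("corpus-canary", Surface.RETRIEVAL_CORPUS, "CANARY-RAG-1", "CANARY-RAG-1"),
    Scenario("tool-canary", Surface.TOOL_OUTPUT, "CANARY-TOOL-1", "CANARY-TOOL-1"),
]

if __name__ == "__main__":
    failed = run_scenarios(toy_pipeline, SCENARIOS)
    print(failed)
    print(dict(failures_by_surface(SCENARIOS, failed)))
```

In practice you would replace `toy_pipeline` with a thin wrapper around your own retrieval and tool stack and run the scenarios from your usual test runner.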

README (excerpt)

[ICML 2026] CKA-Agent: Bypassing LLM Guardrails via Harmless Prompt Weaving and Adaptive Tree Search

🛡️ Defense towards CKA

TurnGate is a response-aware defense mechanism designed to detect and mitigate hidden malicious intent in multi-turn dialogue systems. It defends against state-of-the-art multi-turn attacks such as CKA-Agent, achieving strong defense performance while avoiding over-refusal.

🔥 Latest Results on Frontier Models (Dec 2025)

CKA-Agent demonstrates consistently high attack success rates against the latest frontier models, including GPT-5.2, Gemini-3.0-Pro, and Claude-Haiku-4.5. The results are summarized below:

Source-backed notes

  • README shows uv pip install commands for installing experiment dependencies.
  • Repo is AGPL-3.0 licensed (verified via GitHub API).
  • The repository positions itself as a reproducible implementation for agent-security research (per README wording).

FAQ

  • Is this meant for production use? It’s primarily research code; use it to evaluate and harden your own systems.
  • How do I install dependencies? Follow the README uv pip install ... instructions and keep versions pinned for reproducibility.
  • What license applies? AGPL-3.0 (verified via GitHub license metadata).
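
One way to act on the pinning advice is to snapshot the environment an experiment actually ran in and store it next to the results. The sketch below uses only the Python standard library; `dependency_snapshot`, `build_manifest`, and `write_manifest` are hypothetical helper names, and the manifest layout is an assumption rather than anything the CKA-Agent README prescribes.

```python
import hashlib
import json
import sys
from importlib import metadata


def dependency_snapshot():
    """Return sorted 'name==version' strings for every installed distribution."""
    pins = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        version = dist.metadata.get("Version")
        if name and version:  # skip broken installs with incomplete metadata
            pins.add(f"{name}=={version}")
    return sorted(pins, key=str.lower)


def build_manifest(config):
    """Bundle interpreter version, config, a config hash, and pinned dependencies."""
    config_bytes = json.dumps(config, sort_keys=True).encode("utf-8")
    return {
        "python": sys.version.split()[0],
        "config": config,
        "config_sha256": hashlib.sha256(config_bytes).hexdigest(),
        "dependencies": dependency_snapshot(),
    }


def write_manifest(path, config):
    """Write the manifest as JSON so it can be archived with logs and prompts."""
    manifest = build_manifest(config)
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(manifest, fh, indent=2, sort_keys=True)
    return manifest


if __name__ == "__main__":
    # The config keys here are placeholders for your own experiment settings.
    manifest = write_manifest("run_manifest.json", {"seed": 0, "notes": "example run"})
    print(manifest["config_sha256"])
    print(len(manifest["dependencies"]), "pinned packages recorded")
```

Hashing the config with sorted keys means two runs with the same settings produce the same hash regardless of key order, which makes it easier to match results to the exact configuration that produced them.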
Latest results (README excerpt, truncated in the source)

Model              | HarmBench                  | StrongREJECT
                   | FS ↑   PS ↑   V ↓    R ↓   | FS ↑   PS ↑   V ↓    R ↓
🟢 GPT-5.2         | 0.889  0.079  0.024  0.008 | 0.932  0.056  0.006  0.006
🟣 Gemini-3.0-Pro  | 0.881  0.087  ...

Source and acknowledgments

Created by Graph-COM. Licensed under AGPL-3.0.

Graph-COM/CKA-Agent — ⭐ 203

Thanks to the upstream maintainers and contributors for publishing this work under an open license.
