# CKA-Agent — Trojan Knowledge Attack Agent (Research) > Research code for studying trojan-knowledge attacks in agent systems, with reproducible scripts and configs; verified 203★, pushed 2026-05-13. ## Install Copy the content below into your project: ## Quick Use ```bash python -m venv .venv && source .venv/bin/activate # README uses uv for deps; follow the repo for exact pins: uv pip install accelerate fastchat nltk pandas google-genai httpx[socks] anthropic ``` ## Intro Research code for studying trojan-knowledge attacks in agent systems, with reproducible scripts and configs; verified 203★, pushed 2026-05-13. **Best for:** Researchers and security-minded agent builders evaluating knowledge poisoning risks **Works with:** Python tooling; README includes `uv pip` installs and experiment dependencies **Setup time:** 20-45 minutes ### Key facts (verified) - GitHub: 203 stars · 45 forks · pushed 2026-05-13. - License: AGPL-3.0 · owner avatar + repo URL verified via GitHub API. - README-backed entrypoint: `uv pip install ...`. ## Main - Treat it as a security lab: run experiments in an isolated environment and record the exact dependency set used. - Use it to build test cases: trojan knowledge scenarios can become unit/regression tests for your retrieval + tool pipeline. - Map the attack surface: separate poisoning in static docs vs retrieval corpora vs tool outputs so mitigations are targeted. - Export results as artifacts: logs, prompts, and configs are as important as code when reproducing agent-security claims. ### README (excerpt) **[ICML 2026] CKA-Agent: Bypassing LLM Guardrails via Harmless Prompt Weaving and Adaptive Tree Search** arXiv Website GitHub code Cite Python ## 🛡️ Defense towards CKA [TurnGate](https://github.com/Graph-COM/TurnGate) is a response-aware defense mechanism designed to detect and mitigate hidden malicious intent in multi-turn dialogue systems. Defending state-of-the-art multi-turn malicious attacks like CKA-Agent, achieving great defense performance while avoiding overrefusal. ## 🔥 Latest Results on Frontier Models (Dec 2025) CKA-Agent demonstrates consistent high attack success rates against the latest frontier models, including **GPT-5.2**, **Gemini-3.0-Pro**, and **Claude-Haiku-4.5**. The results are summarized below: ### Source-backed notes - README shows `uv pip install` commands for installing experiment dependencies. - Repo is AGPL-3.0 licensed (verified via GitHub API). - The repository positions itself as a reproducible implementation for agent-security research (per README wording). ### FAQ - **Is this meant for production use?**: It’s primarily research code; use it to evaluate and harden your own systems. - **How do I install dependencies?**: Follow the README `uv pip install ...` instructions and keep versions pinned for reproducibility. - **What license applies?**: AGPL-3.0 (verified via GitHub license metadata). ## Source & Thanks > Created by [Graph-COM](https://github.com/Graph-COM). Licensed under AGPL-3.0. > > [Graph-COM/CKA-Agent](https://github.com/Graph-COM/CKA-Agent) — ⭐ 203 Thanks to the upstream maintainers and contributors for publishing this work under an open license. --- ## Quick Use ```bash python -m venv .venv && source .venv/bin/activate # README uses uv for deps; follow the repo for exact pins: uv pip install accelerate fastchat nltk pandas google-genai httpx[socks] anthropic ``` ## Intro CKA-Agent 提供针对 agent 系统的“木马知识”攻击研究代码与复现实验脚本/配置,适合做安全评估、对抗实验与基准复现并总结防护要点;已验证 203★,更新于 2026-05-13。 **Best for:** 想评估知识投毒/木马知识风险的研究者与安全敏感型 agent 开发者 **Works with:** Python 工具链;README 包含 `uv pip` 安装与实验依赖说明 **Setup time:** 20-45 minutes ### Key facts (verified) - GitHub:203 stars · 45 forks;最近更新 2026-05-13。 - 许可证:AGPL-3.0;作者头像与仓库链接均已通过 GitHub API 复核。 - README 中可对照的入口:`uv pip install ...`。 ## Main - 把它当安全实验室:在隔离环境里跑实验,并记录精确依赖版本,保证可复现。 - 用它生成测试用例:把木马知识场景沉淀成检索/工具链的回归测试。 - 拆分攻击面:区分静态文档、检索语料、工具输出三类污染来源,针对性加固。 - 把结果导出成可审计工件:日志、prompts、configs 与代码同等重要。 ### README (excerpt) **[ICML 2026] CKA-Agent: Bypassing LLM Guardrails via Harmless Prompt Weaving and Adaptive Tree Search** arXiv Website GitHub code Cite Python ## 🛡️ Defense towards CKA [TurnGate](https://github.com/Graph-COM/TurnGate) is a response-aware defense mechanism designed to detect and mitigate hidden malicious intent in multi-turn dialogue systems. Defending state-of-the-art multi-turn malicious attacks like CKA-Agent, achieving great defense performance while avoiding overrefusal. ## 🔥 Latest Results on Frontier Models (Dec 2025) CKA-Agent demonstrates consistent high attack success rates against the latest frontier models, including **GPT-5.2**, **Gemini-3.0-Pro**, and **Claude-Haiku-4.5**. The results are summarized below:
Model HarmBench StrongREJECT
FS ↑ PS ↑ V ↓ R ↓ FS ↑ PS ↑ V ↓ R ↓
🟢 GPT-5.2 0.889 0.079 0.024 0.008 0.932 0.056 0.006 0.006
🟣 Gemini-3.0-Pro 0.881 0.087
### Source-backed notes - README 给出 `uv pip install` 命令用于安装实验依赖。 - 仓库为 AGPL-3.0 许可证(已通过 GitHub API 复核)。 - 仓库 README 将其定位为可复现实验实现,用于 agent 安全研究。 ### FAQ - **适合直接上生产吗?**:主要面向研究;更适合用来评估并加固你自己的系统。 - **依赖怎么装?**:按 README 的 `uv pip install ...` 说明安装,并建议锁定版本以保证复现。 - **许可证是什么?**:AGPL-3.0(已通过 GitHub 许可证元数据复核)。 ## Source & Thanks > Created by [Graph-COM](https://github.com/Graph-COM). Licensed under AGPL-3.0. > > [Graph-COM/CKA-Agent](https://github.com/Graph-COM/CKA-Agent) — ⭐ 203 --- Source: https://tokrepo.com/en/workflows/cka-agent-trojan-knowledge-attack-agent-research Author: Agent Toolkit
Model HarmBench StrongREJECT
FS ↑ PS ↑ V ↓ R ↓ FS ↑ PS ↑ V ↓ R ↓
🟢 GPT-5.2 0.889 0.079 0.024 0.008 0.932 0.056 0.006 0.006
🟣 Gemini-3.0-Pro 0.881 0.087