Main
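- Use it as an index: jump from the list to the primary papers and repositories, then assemble your own benchmark collection.
- Distill the evaluation dimensions: turn recurring metrics into a checklist (context, tools, memory, safety).
- Keep a local note: for each harness, record setup time, scope of tool support, and common failure modes.
- Cite primary sources directly: when writing documentation, link to the original repository or paper rather than to a secondhand summary.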
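The checklist and local note described above could be kept as structured records rather than free text. The following is a minimal sketch only: the class name `HarnessNote`, every field name, and the example values are hypothetical and do not come from the survey or the repository.

```python
from dataclasses import dataclass, field, asdict
import json

# The four checklist dimensions named in the list above.
CHECKLIST_DIMENSIONS = ("context", "tools", "memory", "safety")


@dataclass
class HarnessNote:
    """One local note per harness. All field names are illustrative."""
    name: str
    source_url: str                  # link to the primary repo or paper
    setup_minutes: float             # setup time you measured yourself
    supported_tools: list[str] = field(default_factory=list)
    failure_modes: list[str] = field(default_factory=list)
    # Checklist dimension -> whether you have evaluated it yet.
    checklist: dict[str, bool] = field(
        default_factory=lambda: {d: False for d in CHECKLIST_DIMENSIONS}
    )

    def unchecked(self) -> list[str]:
        """Return the checklist dimensions not yet evaluated."""
        return [d for d, done in self.checklist.items() if not done]


note = HarnessNote(
    name="example-harness",                    # placeholder, not a real project
    source_url="https://example.com/harness",  # placeholder URL
    setup_minutes=12.5,
    supported_tools=["shell", "browser"],
    failure_modes=["context overflow on long tasks"],
)
note.checklist["tools"] = True

# Serialize the note so it can live next to your other local notes.
print(json.dumps(asdict(note), indent=2))
```

Keeping the checklist as a dictionary makes it easy to see which dimensions are still unevaluated for a given harness before comparing it with others.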
README (excerpt)
⭐ This repo is actively maintained. If you find it useful, please star the repo to stay updated and help others find it.
The agent execution harness — not the model — is the primary determinant of agent reliability at scale.
This survey formalizes the harness as a first-class architectural object H = (E, T, C, S, L, V), surveys 110+ papers, blogs and reports across 23 systems, and maps 9 open technical challenges.
📄 Read the Paper
🌐 Preprints Version (v3)
✉️ Corrections & suggestions: gloriamenng@gmail.com (Qianyu Meng); wangyanan@mail.dlut.edu.cn (Yanan Wang); chenliyi@xiaohongshu.com (Liyi Chen)
If you find this survey useful, please cite:
@article{meng2026agentharness,
  title  = {Agent Harness for Large Language Model Agents: A Survey},
  author = {Meng, Qianyu and Wang, Yanan and Chen, Liyi and Wu, Wei and
            Li, Yihang and Jiang, Wenyuan and Wang, Qimeng and
            Lu, Chengqiang and Gao, Yan and Wu, Yi and Hu, Yao},
  year   = {2026},
  doi    = {10.20944/preprints202604.0428.v3},
  url    = {https://www.preprints.org/manuscript/202604.0428/v3},
}
### Source-backed notes
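- The repository is licensed under CC-BY-4.0 (verified via the GitHub API).
- The repository link and last-updated time were also verified via the GitHub API.
- The README is mainly a structured list and survey-style index, consisting mostly of links and categories.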
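The checks in the notes above can be reproduced against the GitHub REST API, whose repository endpoint (`GET /repos/{owner}/{repo}`) returns a `license` object with an `spdx_id` plus a `pushed_at` timestamp. The sketch below is illustrative: the page does not name the repository's owner or slug, so `owner` and `repo` are left as parameters, and the `sample` payload is made up so that the code runs offline.

```python
import json
import urllib.request


def fetch_repo_metadata(owner: str, repo: str) -> dict:
    """Fetch repository metadata from the GitHub REST API (unauthenticated)."""
    req = urllib.request.Request(
        f"https://api.github.com/repos/{owner}/{repo}",
        headers={"Accept": "application/vnd.github+json"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def summarize(payload: dict) -> dict:
    """Pull out the license identifier and the last-push timestamp."""
    license_info = payload.get("license") or {}  # null when no license is detected
    return {
        "url": payload.get("html_url"),
        "spdx_id": license_info.get("spdx_id"),
        "pushed_at": payload.get("pushed_at"),
    }


# Made-up sample shaped like the API response, so no network call is needed.
sample = {
    "html_url": "https://github.com/example-owner/example-repo",
    "license": {"key": "cc-by-4.0", "spdx_id": "CC-BY-4.0"},
    "pushed_at": "2026-01-01T00:00:00Z",
}
summary = summarize(sample)
print(summary)
```

To run the check against a live repository, pass the result of `fetch_repo_metadata(owner, repo)` to `summarize` instead of the sample. Unauthenticated requests are rate limited, so cache the response if you check many repositories.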
### FAQ
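- **Is this implementation code?** No. It is mainly a survey and curated list for quickly locating harness tools and papers.
- **Can the content be reused?** Yes. The license is CC-BY-4.0, so attribute the source as the license requires when you reuse it.
- **How do I put it into practice?** Pick 3–5 harnesses, run them on the same task set, record the results, and use that as your benchmark.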
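The last answer above (same task set, several harnesses, recorded results) could be sketched as a small runner. This is an assumption-heavy illustration: it models a harness as a plain Python callable that takes a task prompt and returns success, which is not an interface defined by the survey, and the two stand-in harnesses and both tasks are invented for the demo.

```python
import time
from typing import Callable

# Hypothetical interface: a harness is any callable mapping a task to pass/fail.
Harness = Callable[[str], bool]


def run_benchmark(harnesses: dict[str, Harness], tasks: list[str]) -> list[dict]:
    """Run every task on every harness and collect one row per pair."""
    rows = []
    for name, run in harnesses.items():
        for task in tasks:
            start = time.perf_counter()
            try:
                passed = run(task)
                error = None
            except Exception as exc:  # record the failure mode instead of aborting
                passed = False
                error = repr(exc)
            rows.append({
                "harness": name,
                "task": task,
                "passed": passed,
                "seconds": time.perf_counter() - start,
                "error": error,
            })
    return rows


def pass_rates(rows: list[dict]) -> dict[str, float]:
    """Fraction of tasks passed, per harness."""
    totals: dict[str, tuple[int, int]] = {}
    for row in rows:
        passed, count = totals.get(row["harness"], (0, 0))
        totals[row["harness"]] = (passed + int(row["passed"]), count + 1)
    return {name: p / c for name, (p, c) in totals.items()}


# Stand-in harnesses for the demo; replace with wrappers around real ones.
def always_ok(task: str) -> bool:
    return True


def fails_on_long(task: str) -> bool:
    if len(task) > 20:
        raise RuntimeError("context overflow")
    return True


tasks = ["fix a typo", "refactor the payment module end to end"]
rows = run_benchmark({"always_ok": always_ok, "fails_on_long": fails_on_long}, tasks)
rates = pass_rates(rows)
print(rates)
```

Catching exceptions per task keeps one crashing harness from hiding the results of the others, and the stored `error` string doubles as a record of the failure mode.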