[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"workflow-giskard-checks-evals-and-safety-tests-for-llm-agents-08f46b1e":3,"seo:featured-workflow:08f46b1e-5a82-59dc-916a-cb2f0ae17a63:es":39,"workflow-related-giskard-checks-evals-and-safety-tests-for-llm-agents-08f46b1e-08f46b1e-5a82-59dc-916a-cb2f0ae17a63":82},{"id":4,"uuid":5,"slug":6,"title":7,"description":8,"author_id":9,"author_name":10,"author_avatar":11,"token_estimate":12,"time_saved":12,"model_used":13,"fork_count":12,"vote_count":12,"view_count":14,"parent_id":12,"parent_uuid":13,"lang_type":15,"steps":16,"files":23,"tags":24,"has_voted":30,"visibility":19,"share_token":13,"is_featured":12,"content_hash":31,"asset_kind":28,"target_tools":32,"install_mode":36,"entrypoint":37,"risk_profile":38,"dependencies":40,"verification":45,"agent_metadata":48,"agent_fit":59,"trust":70,"provenance":78,"created_at":80,"updated_at":81},3276,"08f46b1e-5a82-59dc-916a-cb2f0ae17a63","giskard-checks-evals-and-safety-tests-for-llm-agents","Giskard Checks — Evals and Safety Tests for LLM Agents","Giskard Checks gives Python teams a modular eval layer for agent regressions, groundedness, and policy conformance with scenario-based tests.","8a910fec-3180-11f1-9bc6-00163e2b0d79","Agent Toolkit","https:\u002F\u002Ftokrepo.com\u002Fapple-touch-icon.png",0,"",12,"en",[17],{"id":18,"step_order":19,"title":20,"description":13,"prompt_template":21,"variables":13,"depends_on":22,"expected_output":13},3839,1,"Asset","## Quick Use\n\n1. Install the current v3 package:\n   ```bash\n   pip install giskard-checks\n   ```\n2. Write one scenario with `Scenario` + `Groundedness`, then run it in Python.\n3. Verify:\n   - Confirm the async scenario produces a report for one prompt\u002Fanswer pair before you scale to suites.\n\n## Intro\n\nGiskard Checks gives Python teams a modular eval layer for agent regressions, groundedness, and policy conformance with scenario-based tests.\n\n- **Best for:** Python teams that need reproducible evals for agent regressions and grounding checks\n- **Works with:** Python 3.12+, OpenAI-compatible clients, async test runs, scenario-based evaluation suites\n- **Setup time:** 10-25 minutes\n\n## Practical Notes\n\n- Quant: the current README requires Python 3.12+ and splits the project into modular packages such as `giskard-checks`.\n- Quant: built-in checks explicitly include Groundedness, Conformity, regex matching, semantic similarity, and LLM-as-judge patterns.\n\n## Why it matters\n\nGiskard is strongest when you want something stricter than eyeballing agent demos but lighter than building a full in-house eval framework.\n\n- The scenario API is aimed at non-deterministic systems, which is the right abstraction for LLM agents rather than brittle exact-match asserts.\n- The maintainers distinguish the new modular v3 line from legacy v2 scan\u002FRAG tooling, reducing version ambiguity.\n- Because checks are Python-native, teams can wire them into CI without standing up a separate control plane first.\n\n## Rollout pattern\n\n- Start with one regression scenario and one groundedness scenario around a user-facing workflow.\n- Add pass\u002Ffail gates only after you understand variance across repeated runs and model versions.\n- Keep old v2-only capabilities separate if you still rely on Scan or RAGET; the README is explicit that those are legacy paths.\n\n## Watchouts\n\nDo not assume every historical Giskard feature still exists in the same package line; v3 is a rewrite and the README explicitly separates planned versus available modules.\n\n### FAQ\n\n**Q: Is this the old all-in-one Giskard package?**\nA: No. The README frames v3 as a modular rewrite and points to v2 only for legacy Scan and RAGET use cases.\n\n**Q: Why is it useful for agents?**\nA: It gives scenario-based checks for outputs that can vary while still needing quality gates.\n\n**Q: What should I test first?**\nA: Groundedness and one regression path tied to a real business workflow, not synthetic toy prompts.\n\n## Source & Thanks\n\n> Source: https:\u002F\u002Fgithub.com\u002FGiskard-AI\u002Fgiskard-oss\n> License: Apache-2.0\n> GitHub stars: 5,344 · forks: 453\n\n---\n\n\u003C!-- ZH -->\n\n## 快速使用\n\n1. 安装当前 v3 包：\n   ```bash\n   pip install giskard-checks\n   ```\n2. 用 `Scenario` + `Groundedness` 写一个最小测试场景并运行。\n3. 验证：\n   - 先确认单个 prompt\u002Fanswer 能生成报告，再扩展到更大的测试套件。\n\n## 简介\n\nGiskard Checks 为 Python 团队提供模块化评测层，可针对回归、事实依据与策略合规性建立场景测试，适合把存在波动的 Agent 输出纳入可复验、可持续执行且可接入 CI 的工程质量流程。\n\n- **适合谁：** 需要为 Agent 回归与事实依据检查建立可复验评测流程的 Python 团队\n- **可搭配：** Python 3.12+、兼容 OpenAI 的客户端、异步运行与场景式评测套件\n- **准备时间：** 10-25 分钟\n\n## 实战建议\n\n- 量化信息：当前 README 要求 Python 3.12+，并把项目拆成 `giskard-checks` 等模块化包。\n- 量化信息：内建检查项明确包含 Groundedness、Conformity、正则匹配、语义相似度与 LLM-as-judge。 \n\n## 为什么值得收录\n\n如果你已经觉得“人工看 Demo”不够可靠，但又不想立刻自建整套评测平台，Giskard Checks 是一个很务实的中间层。\n\n- 它的 Scenario 抽象面向非确定性输出，比死板的 exact-match 更适合 LLM Agent。\n- README 清楚区分了新的 v3 模块化路线与旧版 v2 的 Scan\u002FRAGET，避免版本理解混乱。\n- 因为它是 Python 原生库，团队可以先接入 CI，再决定要不要做更重的评测平台。\n\n## 落地路径\n\n- 先围绕一个真实用户工作流建立 1 个回归场景和 1 个 groundedness 场景。\n- 在你理解模型波动之前，不要急着把所有结果都设成硬性失败门槛。\n- 如果你仍依赖 v2 的 Scan 或 RAGET，务必单独管理，因为 README 已明确那是旧路线。 \n\n## 注意事项\n\n不要把历史上所有 Giskard 能力都默认等同于当前包；v3 是重写版，可用模块和规划模块已经被明确拆开。\n\n### FAQ\n\n**这是旧版那个大一统 Giskard 吗？**\n答：不是。README 把 v3 定义为模块化重写版，v2 只保留给旧的 Scan \u002F RAGET 路线。\n\n**为什么适合 Agent？**\n答：它允许你为存在波动的输出建立场景化质量门槛，而不是只看一次生成结果。\n\n**第一步该测什么？**\n答：先测 groundedness 和一个真实业务路径的回归，不要从玩具例子开始。\n\n## 来源与感谢\n\n> Source: https:\u002F\u002Fgithub.com\u002FGiskard-AI\u002Fgiskard-oss\n> License: Apache-2.0\n> GitHub stars: 5,344 · forks: 453\n","0",[],[25],{"id":26,"name":27,"slug":28,"icon":29},11,"Scripts","script","📜",false,"b929dabe71d83aa81e7803962cb62cbb1a86cee42bba6c1704fbf290aeee4c64",[33,34,35],"claude_code","codex","gemini_cli","single","giskard-checks",{"executes_code":30,"modifies_global_config":30,"requires_secrets":39,"uses_absolute_paths":30,"network_access":30},null,{"npm":41,"pip":42,"brew":43,"system":44},[],[37],[],[],{"commands":46,"expected_files":47},[],[20],{"asset_kind":28,"target_tools":49,"install_mode":36,"entrypoint":37,"risk_profile":50,"dependencies":51,"content_hash":31,"verification":56},[33,34,35],{"executes_code":30,"modifies_global_config":30,"requires_secrets":39,"uses_absolute_paths":30,"network_access":30},{"npm":52,"pip":53,"brew":54,"system":55},[],[37],[],[],{"commands":57,"expected_files":58},[],[20],{"target":34,"score":60,"status":61,"policy":61,"why":62,"asset_kind":28,"install_mode":36},29,"stage_only",[63,64,65,66,67,68,69],"target_tools includes codex","asset_kind script","install_mode single","markdown-only","policy stage_only","asset_kind script is not activated directly for Codex","trust established",{"author_trust_level":71,"verified_publisher":30,"asset_signed_hash":31,"signature_status":72,"install_count":12,"report_count":12,"dangerous_capability_badges":73,"review_status":74,"signals":75},"established","hash_only",[28],"unreviewed",[76,77],"author has published assets","content hash available",{"owner_uuid":9,"owner_name":10,"source_url":79,"content_hash":31,"visibility":19,"created_at":80,"updated_at":81},"https:\u002F\u002Ftokrepo.com\u002Fen\u002Fworkflows\u002Fgiskard-checks-evals-and-safety-tests-for-llm-agents","2026-05-12 22:02:43","2026-05-14 09:30:37",[83,133,176,234],{"id":84,"uuid":85,"slug":86,"title":87,"description":88,"author_id":9,"author_name":10,"author_avatar":11,"token_estimate":12,"time_saved":12,"model_used":13,"fork_count":12,"vote_count":12,"view_count":89,"parent_id":12,"parent_uuid":13,"lang_type":15,"steps":90,"files":39,"tags":91,"has_voted":30,"visibility":19,"share_token":13,"is_featured":12,"content_hash":93,"asset_kind":28,"target_tools":94,"install_mode":36,"entrypoint":95,"risk_profile":96,"dependencies":97,"verification":102,"agent_metadata":105,"agent_fit":116,"trust":118,"provenance":121,"created_at":123,"updated_at":124,"__relatedScore":125,"__relatedReasons":126,"__sharedTags":131},3153,"73cd67c3-9db6-48ed-8a31-c082f618168e","agent-evaluation-test-virtual-agents-in-ci","Agent Evaluation — Test Virtual Agents in CI","Agent Evaluation is a Python framework that runs repeatable, scored tests for virtual agents, so teams can catch regressions automatically in CI.",14,[],[92],{"id":26,"name":27,"slug":28,"icon":29},"e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",[33,34,35],"README.md",{"executes_code":30,"modifies_global_config":30,"requires_secrets":39,"uses_absolute_paths":30,"network_access":30},{"npm":98,"pip":99,"brew":100,"system":101},[],[],[],[],{"commands":103,"expected_files":104},[],[],{"asset_kind":28,"target_tools":106,"install_mode":36,"entrypoint":95,"risk_profile":107,"dependencies":108,"content_hash":93,"verification":113},[33,34,35],{"executes_code":30,"modifies_global_config":30,"requires_secrets":39,"uses_absolute_paths":30,"network_access":30},{"npm":109,"pip":110,"brew":111,"system":112},[],[],[],[],{"commands":114,"expected_files":115},[],[],{"target":34,"score":60,"status":61,"policy":61,"why":117,"asset_kind":28,"install_mode":36},[63,64,65,66,67,68,69],{"author_trust_level":71,"verified_publisher":30,"asset_signed_hash":93,"signature_status":72,"install_count":12,"report_count":12,"dangerous_capability_badges":119,"review_status":74,"signals":120},[28],[76,77],{"owner_uuid":9,"owner_name":10,"source_url":122,"content_hash":93,"visibility":19,"created_at":123,"updated_at":124},"https:\u002F\u002Ftokrepo.com\u002Fen\u002Fworkflows\u002Fagent-evaluation-test-virtual-agents-in-ci","2026-05-12 07:08:04","2026-05-14 08:17:15",102.76413688858352,[127,128,129,130],"topic-match","same-kind","same-target","same-author",[28,132],"scripts",{"id":134,"uuid":135,"slug":136,"title":137,"description":138,"author_id":9,"author_name":10,"author_avatar":11,"token_estimate":12,"time_saved":12,"model_used":13,"fork_count":12,"vote_count":12,"view_count":139,"parent_id":12,"parent_uuid":13,"lang_type":15,"steps":140,"files":39,"tags":141,"has_voted":30,"visibility":19,"share_token":13,"is_featured":12,"content_hash":93,"asset_kind":28,"target_tools":143,"install_mode":36,"entrypoint":95,"risk_profile":144,"dependencies":145,"verification":150,"agent_metadata":153,"agent_fit":164,"trust":166,"provenance":169,"created_at":171,"updated_at":172,"__relatedScore":173,"__relatedReasons":174,"__sharedTags":175},3104,"c866ac5d-23f3-4e59-9351-a402817c90ce","trulens-evaluate-and-track-llm-apps","TruLens — Evaluate and Track LLM Apps","Instrument LLM apps and run systematic evals for RAG quality and regressions to find failure modes fast. Combine tracing and scorecards in one workflow.",6,[],[142],{"id":26,"name":27,"slug":28,"icon":29},[33,34,35],{"executes_code":30,"modifies_global_config":30,"requires_secrets":39,"uses_absolute_paths":30,"network_access":30},{"npm":146,"pip":147,"brew":148,"system":149},[],[],[],[],{"commands":151,"expected_files":152},[],[],{"asset_kind":28,"target_tools":154,"install_mode":36,"entrypoint":95,"risk_profile":155,"dependencies":156,"content_hash":93,"verification":161},[33,34,35],{"executes_code":30,"modifies_global_config":30,"requires_secrets":39,"uses_absolute_paths":30,"network_access":30},{"npm":157,"pip":158,"brew":159,"system":160},[],[],[],[],{"commands":162,"expected_files":163},[],[],{"target":34,"score":60,"status":61,"policy":61,"why":165,"asset_kind":28,"install_mode":36},[63,64,65,66,67,68,69],{"author_trust_level":71,"verified_publisher":30,"asset_signed_hash":93,"signature_status":72,"install_count":12,"report_count":12,"dangerous_capability_badges":167,"review_status":74,"signals":168},[28],[76,77],{"owner_uuid":9,"owner_name":10,"source_url":170,"content_hash":93,"visibility":19,"created_at":171,"updated_at":172},"https:\u002F\u002Ftokrepo.com\u002Fen\u002Fworkflows\u002Ftrulens-evaluate-and-track-llm-apps","2026-05-12 03:00:17","2026-05-14 10:52:16",93.26764706002139,[127,128,129,130],[28,132],{"id":177,"uuid":178,"slug":179,"title":180,"description":181,"author_id":182,"author_name":183,"author_avatar":11,"token_estimate":184,"time_saved":12,"model_used":185,"fork_count":12,"vote_count":12,"view_count":186,"parent_id":12,"parent_uuid":13,"lang_type":15,"steps":187,"files":39,"tags":188,"has_voted":30,"visibility":19,"share_token":13,"is_featured":12,"content_hash":190,"asset_kind":191,"target_tools":192,"install_mode":36,"entrypoint":180,"risk_profile":193,"dependencies":195,"verification":200,"agent_metadata":203,"agent_fit":215,"trust":223,"provenance":227,"created_at":229,"updated_at":230,"__relatedScore":231,"__relatedReasons":232,"__sharedTags":233},228,"2670226a-fe9a-4de2-bc53-8d5a25b071f2","llama-stack-meta-official-llm-app-framework-2670226a","Llama Stack — Meta Official LLM App Framework","Official Meta framework for building LLM applications with Llama models. Inference, safety, RAG, agents, evals, and tool use. Standardized APIs. 8.3K+ stars.","8a910e34-3180-11f1-9bc6-00163e2b0d79","Script Depot",500,"Claude Code",99,[],[189],{"id":26,"name":27,"slug":28,"icon":29},"de1a8a0f85a7a6317e9c221ccb8efc8e3a5a871c20af5348647312850351eaf2","skill",[33,34,35],{"executes_code":30,"modifies_global_config":30,"requires_secrets":194,"uses_absolute_paths":30,"network_access":30},[],{"npm":196,"pip":197,"brew":198,"system":199},[],[],[],[],{"commands":201,"expected_files":202},[],[180],{"asset_kind":191,"target_tools":204,"install_mode":36,"entrypoint":180,"risk_profile":205,"dependencies":207,"content_hash":190,"verification":212},[33,34,35],{"executes_code":30,"modifies_global_config":30,"requires_secrets":206,"uses_absolute_paths":30,"network_access":30},[],{"npm":208,"pip":209,"brew":210,"system":211},[],[],[],[],{"commands":213,"expected_files":214},[],[180],{"target":34,"score":216,"status":217,"policy":218,"why":219,"asset_kind":191,"install_mode":36},98,"native","allow",[63,220,65,66,221,222,69],"asset_kind skill","policy allow","safe markdown-only Codex install",{"author_trust_level":71,"verified_publisher":30,"asset_signed_hash":190,"signature_status":72,"install_count":12,"report_count":12,"dangerous_capability_badges":224,"review_status":74,"signals":225},[],[76,77,226],"no dangerous capability badges",{"owner_uuid":182,"owner_name":183,"source_url":228,"content_hash":190,"visibility":19,"created_at":229,"updated_at":230},"https:\u002F\u002Ftokrepo.com\u002Fen\u002Fworkflows\u002Fllama-stack-meta-official-llm-app-framework-2670226a","2026-03-31 11:14:53","2026-05-13 19:24:43",90,[127,129],[28,132],{"id":235,"uuid":236,"slug":237,"title":238,"description":239,"author_id":9,"author_name":10,"author_avatar":11,"token_estimate":12,"time_saved":12,"model_used":13,"fork_count":12,"vote_count":12,"view_count":26,"parent_id":12,"parent_uuid":13,"lang_type":15,"steps":240,"files":39,"tags":241,"has_voted":30,"visibility":19,"share_token":13,"is_featured":12,"content_hash":93,"asset_kind":28,"target_tools":243,"install_mode":36,"entrypoint":95,"risk_profile":244,"dependencies":245,"verification":250,"agent_metadata":253,"agent_fit":264,"trust":266,"provenance":269,"created_at":271,"updated_at":272,"__relatedScore":273,"__relatedReasons":274,"__sharedTags":275},3216,"2414f9d2-b727-454b-9613-f45278226743","agents-cli-agent-build-eval-deploy-skills-for-coders","agents-cli — Agent Build\u002FEval\u002FDeploy Skills for Coders","agents-cli installs a CLI + skills so your coding assistant can scaffold, evaluate, and deploy production agents on Google Cloud with repeatable commands.",[],[242],{"id":26,"name":27,"slug":28,"icon":29},[33,34,35],{"executes_code":30,"modifies_global_config":30,"requires_secrets":39,"uses_absolute_paths":30,"network_access":30},{"npm":246,"pip":247,"brew":248,"system":249},[],[],[],[],{"commands":251,"expected_files":252},[],[],{"asset_kind":28,"target_tools":254,"install_mode":36,"entrypoint":95,"risk_profile":255,"dependencies":256,"content_hash":93,"verification":261},[33,34,35],{"executes_code":30,"modifies_global_config":30,"requires_secrets":39,"uses_absolute_paths":30,"network_access":30},{"npm":257,"pip":258,"brew":259,"system":260},[],[],[],[],{"commands":262,"expected_files":263},[],[],{"target":34,"score":60,"status":61,"policy":61,"why":265,"asset_kind":28,"install_mode":36},[63,64,65,66,67,68,69],{"author_trust_level":71,"verified_publisher":30,"asset_signed_hash":93,"signature_status":72,"install_count":12,"report_count":12,"dangerous_capability_badges":267,"review_status":74,"signals":268},[28],[76,77],{"owner_uuid":9,"owner_name":10,"source_url":270,"content_hash":93,"visibility":19,"created_at":271,"updated_at":272},"https:\u002F\u002Ftokrepo.com\u002Fen\u002Fworkflows\u002Fagents-cli-agent-build-eval-deploy-skills-for-coders","2026-05-12 13:29:47","2026-05-14 10:50:39",87.61877186907144,[127,128,129,130],[28,132]]