# Giskard Checks — Evals and Safety Tests for LLM Agents

> Giskard Checks gives Python teams a modular eval layer for agent regressions, groundedness, and policy conformance with scenario-based tests.

## Install

```bash
pip install giskard-checks
```

## Quick Use

1. Install the current v3 package as shown above.
2. Write one scenario with `Scenario` + `Groundedness`, then run it in Python.
3. Verify: confirm the async scenario produces a report for one prompt/answer pair before you scale to suites.

## Intro

- **Best for:** Python teams that need reproducible evals for agent regressions and grounding checks
- **Works with:** Python 3.12+, OpenAI-compatible clients, async test runs, scenario-based evaluation suites
- **Setup time:** 10-25 minutes

## Practical Notes

- Quant: the current README requires Python 3.12+ and splits the project into modular packages such as `giskard-checks`.
- Quant: built-in checks explicitly include Groundedness, Conformity, regex matching, semantic similarity, and LLM-as-judge patterns.

## Why it matters

Giskard is strongest when you want something stricter than eyeballing agent demos but lighter than building a full in-house eval framework.

- The scenario API is aimed at non-deterministic systems, which is the right abstraction for LLM agents rather than brittle exact-match asserts.
- The maintainers distinguish the new modular v3 line from the legacy v2 scan/RAG tooling, reducing version ambiguity.
- Because checks are Python-native, teams can wire them into CI without standing up a separate control plane first.

## Rollout pattern

- Start with one regression scenario and one groundedness scenario around a user-facing workflow.
- Add pass/fail gates only after you understand variance across repeated runs and model versions.
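The rollout pattern above can be made concrete without any library at all. The sketch below is plain Python and is NOT the giskard-checks API: the `EvalScenario`, `contains`, `grounded_in`, and `run` names are invented for illustration. It pairs one regression check with one naive groundedness check and runs the agent repeatedly, so you see pass rates (and their variance) before wiring in hard gates.

```python
# Illustrative plain-Python sketch of the rollout pattern -- NOT the
# giskard-checks API. One regression check plus one toy groundedness check,
# run repeatedly to observe pass-rate variance before adding hard gates.
from dataclasses import dataclass
from typing import Callable

Check = tuple[str, Callable[[str], bool]]

@dataclass
class EvalScenario:
    name: str
    prompt: str
    checks: list[Check]

def contains(expected: str) -> Callable[[str], bool]:
    """Regression check: the answer must mention an expected phrase."""
    return lambda answer: expected.lower() in answer.lower()

def grounded_in(context: str, threshold: float = 0.5) -> Callable[[str], bool]:
    """Toy groundedness: enough answer words must come from the context."""
    vocab = set(context.lower().split())
    def check(answer: str) -> bool:
        words = [w.strip(".,") for w in answer.lower().split()]
        return sum(w in vocab for w in words) / max(len(words), 1) >= threshold
    return check

def run(scenario: EvalScenario, agent: Callable[[str], str],
        repeats: int = 5) -> dict[str, float]:
    """Call the agent `repeats` times and report a pass rate per check."""
    passes = {name: 0 for name, _ in scenario.checks}
    for _ in range(repeats):
        answer = agent(scenario.prompt)
        for name, fn in scenario.checks:
            passes[name] += fn(answer)
    return {name: n / repeats for name, n in passes.items()}

# Stubbed deterministic "agent" standing in for a real workflow.
scenario = EvalScenario(
    name="refund-policy",
    prompt="What is the refund window?",
    checks=[
        ("regression", contains("30 days")),
        ("groundedness",
         grounded_in("Refunds are accepted within 30 days of purchase.")),
    ],
)
report = run(scenario, lambda p: "Refunds are accepted within 30 days.", repeats=3)
print(report)  # pass rate per check, here {'regression': 1.0, 'groundedness': 1.0}
```

With a real non-deterministic agent the pass rates drop below 1.0; watching how far they drop across repeated runs is exactly the variance signal to collect before turning any check into a blocking gate.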
If you still rely on Scan or RAGET, keep those v2-only capabilities separate; the README is explicit that they are legacy paths.

## Watchouts

Do not assume every historical Giskard feature still exists in the same package line; v3 is a rewrite, and the README explicitly separates planned versus available modules.

### FAQ

**Q: Is this the old all-in-one Giskard package?**
A: No. The README frames v3 as a modular rewrite and points to v2 only for legacy Scan and RAGET use cases.

**Q: Why is it useful for agents?**
A: It gives scenario-based checks for outputs that can vary while still needing quality gates.

**Q: What should I test first?**
A: Groundedness and one regression path tied to a real business workflow, not synthetic toy prompts.
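The FAQ's "test one real regression path first" advice fits naturally into CI. The framework-free sketch below uses a hypothetical `agent` stub (swap in your real OpenAI-compatible client); the functions drop straight into pytest or any test runner once the stub is replaced.

```python
# Minimal CI gate sketch, framework-free. The agent stub below is a
# hypothetical placeholder; replace it with your production agent call.

def agent(prompt: str) -> str:
    # Stand-in for the real agent (e.g., an OpenAI-compatible client call).
    return "Refunds are accepted within 30 days of purchase."

def check_regression(prompt: str, required_terms: list[str]) -> None:
    """Fail CI if any expected term disappears from the agent's answer."""
    answer = agent(prompt).lower()
    assert all(term in answer for term in required_terms), (prompt, required_terms)

# Two regression paths tied to one real business workflow:
check_regression("What is the refund window?", ["30 days"])
check_regression("How do refunds work?", ["refunds"])
```

Start with soft reporting (log failures instead of asserting) until pass rates are stable across repeated runs, then promote the asserts into required CI checks.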
## Source & Thanks

> Source: https://github.com/Giskard-AI/giskard-oss
> License: Apache-2.0
> GitHub stars: 5,344 · forks: 453

---

Source: https://tokrepo.com/en/workflows/giskard-checks-evals-and-safety-tests-for-llm-agents
Author: Agent Toolkit