# Giskard Checks — Evals and Safety Tests for LLM Agents

> Giskard Checks gives Python teams a modular eval layer for agent regressions, groundedness, and policy conformance with scenario-based tests.

## Install

```bash
pip install giskard-checks
```

## Quick Use

1. Install the current v3 package as shown above.
2. Write one scenario with `Scenario` + `Groundedness`, then run it in Python.
3. Verify: confirm the async scenario produces a report for one prompt/answer pair before you scale to suites.

## Intro

- **Best for:** Python teams that need reproducible evals for agent regressions and grounding checks
- **Works with:** Python 3.12+, OpenAI-compatible clients, async test runs, scenario-based evaluation suites
- **Setup time:** 10-25 minutes

## Practical Notes

- Quant: the current README requires Python 3.12+ and splits the project into modular packages such as `giskard-checks`.
- Quant: built-in checks explicitly include Groundedness, Conformity, regex matching, semantic similarity, and LLM-as-judge patterns.

## Why it matters

Giskard is strongest when you want something stricter than eyeballing agent demos but lighter than building a full in-house eval framework.

- The scenario API is aimed at non-deterministic systems, which is the right abstraction for LLM agents rather than brittle exact-match asserts.
- The maintainers distinguish the new modular v3 line from the legacy v2 scan/RAG tooling, reducing version ambiguity.
- Because checks are Python-native, teams can wire them into CI without standing up a separate control plane first.

## Rollout pattern

- Start with one regression scenario and one groundedness scenario around a user-facing workflow.
- Add pass/fail gates only after you understand variance across repeated runs and model versions.
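The rollout pattern above can be made concrete without any library at all. The sketch below is plain Python and is NOT the giskard-checks API: the `EvalScenario`, `contains`, `grounded_in`, and `run` names are invented for illustration. It pairs one regression check with one naive groundedness check and runs the agent repeatedly, so you see pass rates (and their variance) before wiring in hard gates.

```python
# Illustrative plain-Python sketch of the rollout pattern -- NOT the
# giskard-checks API. One regression check plus one toy groundedness check,
# run repeatedly to observe pass-rate variance before adding hard gates.
from dataclasses import dataclass
from typing import Callable

Check = tuple[str, Callable[[str], bool]]

@dataclass
class EvalScenario:
    name: str
    prompt: str
    checks: list[Check]

def contains(expected: str) -> Callable[[str], bool]:
    """Regression check: the answer must mention an expected phrase."""
    return lambda answer: expected.lower() in answer.lower()

def grounded_in(context: str, threshold: float = 0.5) -> Callable[[str], bool]:
    """Toy groundedness: enough answer words must come from the context."""
    vocab = set(context.lower().split())
    def check(answer: str) -> bool:
        words = [w.strip(".,") for w in answer.lower().split()]
        return sum(w in vocab for w in words) / max(len(words), 1) >= threshold
    return check

def run(scenario: EvalScenario, agent: Callable[[str], str],
        repeats: int = 5) -> dict[str, float]:
    """Call the agent `repeats` times and report a pass rate per check."""
    passes = {name: 0 for name, _ in scenario.checks}
    for _ in range(repeats):
        answer = agent(scenario.prompt)
        for name, fn in scenario.checks:
            passes[name] += fn(answer)
    return {name: n / repeats for name, n in passes.items()}

# Stubbed deterministic "agent" standing in for a real workflow.
scenario = EvalScenario(
    name="refund-policy",
    prompt="What is the refund window?",
    checks=[
        ("regression", contains("30 days")),
        ("groundedness",
         grounded_in("Refunds are accepted within 30 days of purchase.")),
    ],
)
report = run(scenario, lambda p: "Refunds are accepted within 30 days.", repeats=3)
print(report)  # pass rate per check, here {'regression': 1.0, 'groundedness': 1.0}
```

With a real non-deterministic agent the pass rates drop below 1.0; watching how far they drop across repeated runs is exactly the variance signal to collect before turning any check into a blocking gate.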
If you still rely on Scan or RAGET, keep those v2-only capabilities separate; the README is explicit that they are legacy paths.

## Watchouts

Do not assume every historical Giskard feature still exists in the same package line; v3 is a rewrite, and the README explicitly separates planned versus available modules.

### FAQ

**Q: Is this the old all-in-one Giskard package?**
A: No. The README frames v3 as a modular rewrite and points to v2 only for legacy Scan and RAGET use cases.

**Q: Why is it useful for agents?**
A: It gives scenario-based checks for outputs that can vary while still needing quality gates.

**Q: What should I test first?**
A: Groundedness and one regression path tied to a real business workflow, not synthetic toy prompts.
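The FAQ's "test one real regression path first" advice fits naturally into CI. The framework-free sketch below uses a hypothetical `agent` stub (swap in your real OpenAI-compatible client); the functions drop straight into pytest or any test runner once the stub is replaced.

```python
# Minimal CI gate sketch, framework-free. The agent stub below is a
# hypothetical placeholder; replace it with your production agent call.

def agent(prompt: str) -> str:
    # Stand-in for the real agent (e.g., an OpenAI-compatible client call).
    return "Refunds are accepted within 30 days of purchase."

def check_regression(prompt: str, required_terms: list[str]) -> None:
    """Fail CI if any expected term disappears from the agent's answer."""
    answer = agent(prompt).lower()
    assert all(term in answer for term in required_terms), (prompt, required_terms)

# Two regression paths tied to one real business workflow:
check_regression("What is the refund window?", ["30 days"])
check_regression("How do refunds work?", ["refunds"])
```

Start with soft reporting (log failures instead of asserting) until pass rates are stable across repeated runs, then promote the asserts into required CI checks.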
## Source & Thanks

> Source: https://github.com/Giskard-AI/giskard-oss
> License: Apache-2.0
> GitHub stars: 5,344 · forks: 453

---

Source: https://tokrepo.com/en/workflows/giskard-checks-evals-and-safety-tests-for-llm-agents
Author: Agent Toolkit