Quality Assurance

Best AI Tools for Testing (2026)

AI-powered test generation, code coverage analysis, and QA automation. Write better tests faster with agent skills and testing frameworks.

30 tools

Awesome Claude Skills — 50+ Verified Agent Skills

Curated collection of 50+ verified Claude skills across 11 categories: document processing, testing, debugging, security, media creation, data analysis, and meta skills. Community-driven, MIT license.

Prompt Lab 34 Prompts

Gemini CLI Extension: Angular — Web App Development

Gemini CLI extension for Angular. Component generation, routing, services, reactive forms, and testing patterns.

Skill Factory 29 Skills

MCP Inspector — Debug MCP Servers Visually

Official MCP Inspector for testing and debugging MCP servers. 9.3K+ stars. Web UI, tool/resource/prompt inspection, request testing.

MCP Hub 22 MCP Configs

Claude Code Agent: Prompt Engineer — Design & Test Prompts

Claude Code agent for designing, optimizing, and testing LLM prompts. Improves accuracy, reduces token usage, and benchmarks results.

Skill Factory 21 Skills

Claude Code Hooks — Automate Pre/Post Task Actions

Complete guide to Claude Code hooks for automating actions before and after tool calls. Set up linting, testing, notifications, and custom validation with shell commands.

Skill Factory 20 Configs

DeepEval — LLM Testing Framework with 30+ Metrics

DeepEval is a pytest-like testing framework for LLM apps with 30+ metrics. 14.4K+ GitHub stars. RAG, agent, multimodal evaluation. Runs locally. MIT.

Script Depot 20 Scripts

Claude Official Skill: webapp-testing

Toolkit for interacting with and testing local web applications using Playwright. Supports verifying frontend functionality, debugging UI behavior, capturing browser screenshots...

Skill Factory 17 Skills

FastMCP — Build MCP Servers in Python, Fast

The fast, Pythonic way to build MCP servers and clients. Clean decorator API, automatic type validation, built-in testing, and OpenAPI integration. 24K+ GitHub stars.

MCP Hub 15 MCP Configs

Cursor Rules: React + TypeScript — Component & Hooks Patterns

Cursor rules for React with TypeScript. Enforces functional components, hooks patterns, proper typing, and testing conventions.

AI Open Source 11 Configs

Ell — Prompt Engineering as Code in Python

Treat prompts as versioned Python functions with automatic tracking, visualization, and A/B testing. Like Git for your AI prompts with a beautiful studio UI.

Script Depot 9 Prompts

Prompt Injection Defense — Security Guide for LLM Apps

Comprehensive security guide for defending LLM applications against prompt injection, jailbreaks, data exfiltration, and indirect attacks. Includes defense patterns, code examples, and testing strategies.

Prompt Lab 9 Prompts

Claude Code Hooks — Custom Automation Recipes

Collection of ready-to-use Claude Code hook recipes for automating code formatting, testing, notifications, and security checks. Copy-paste into settings.json. Community-maintained.

Skill Factory 6 Skills

Build Your Own MCP Server — Step-by-Step Guide

Complete guide to building a custom MCP server from scratch. Covers the protocol, TypeScript and Python SDKs, tool definition, resource management, testing, and deployment patterns.

Prompt Lab 5 Prompts

Promptfoo — LLM Eval & Red-Team Testing Framework

Open-source framework for evaluating and red-teaming LLM applications. Test prompts across models, detect jailbreaks, measure quality, and catch regressions. 5,000+ GitHub stars.

Agent Toolkit 5 Scripts

LangSmith — Prompt Debugging and LLM Observability

Debug, test, and monitor LLM applications in production. LangSmith provides trace visualization, prompt playground, dataset evaluation, and regression testing for AI.

Prompt Lab 3 Prompts

Neon — Serverless Postgres with Database Branching

Serverless PostgreSQL with instant database branching, autoscaling, and a generous free tier. Branch your database like git branches — test schema changes without touching production. 16,000+ stars.

MCP Hub 38 MCP Configs

Bun — All-in-One JavaScript Runtime

Fast JavaScript runtime, bundler, test runner, and package manager in one tool. Drop-in Node.js replacement. 88K+ GitHub stars.

Script Depot 36 Scripts

Great Expectations — Data Validation for AI Pipelines

Test your data like you test code. Validate data quality in AI/ML pipelines with expressive assertions, auto-profiling, and data docs. Apache-2.0, 11,400+ stars.

Script Depot 33 Scripts

Systematic Debugging — 4-Phase Root Cause Protocol

Claude Code skill that enforces a scientific 4-phase debugging methodology: investigate, analyze patterns, test hypotheses, then fix. The skill claims a 95% first-time fix rate versus 40% with ad-hoc approaches.

Skill Factory 30 Skills

Bun — All-in-One JavaScript Runtime & Toolkit

Ultra-fast JavaScript runtime, bundler, test runner, and package manager in one tool. 4x faster than Node.js, drop-in compatible. Written in Zig with JavaScriptCore engine. 78,000+ stars.

Script Depot 25 Scripts

Nuxt + Go-Zero Quality Audit Skill — 30 Checks from 250 Real Bugs

Production-tested quality check skill for Nuxt SSR + Go-Zero + MySQL projects. 30 automated checks across 7 dimensions (security, race conditions, transactions, frontend SSR, dependencies, API contracts, ops), distilled from 10 rounds of Codex audits that found ~250 real issues in a live SaaS product.

henuwangkai 24 Code

Hoppscotch — Open-Source API Development Platform

Test APIs with a beautiful UI. REST, GraphQL, WebSocket, SSE, and gRPC. Self-hostable Postman alternative. 78K+ GitHub stars.

AI Open Source 23 Workflows

FastAPI — Build AI Backend APIs in Minutes

Modern Python web framework for building AI backend APIs. FastAPI provides automatic OpenAPI docs, async support, Pydantic validation, and the fastest Python web performance.

Script Depot 20 Scripts

Dagger — Programmable CI/CD Engine

Run CI/CD pipelines as code — locally, in CI, or in the cloud. Replace YAML with real programming languages. Cacheable, portable, testable. 15.6K+ stars.

Script Depot 20 Scripts

Ragas — Evaluate RAG & LLM Applications

Ragas evaluates LLM applications with objective metrics, test data generation, and data-driven insights. 13.2K+ GitHub stars. RAG evaluation, auto test generation. Apache 2.0.

Script Depot 18 Scripts

Haystack MCP — Connect AI Pipelines to MCP Clients

Expose Haystack RAG pipelines as MCP servers. Let Claude Code and other AI tools query your document search, QA, and retrieval pipelines through the MCP protocol.

Skill Factory 18 MCP Configs

OpenRouter — Unified LLM API with Smart Routing

Single API endpoint for 200+ LLM models with automatic fallbacks, price comparison, and usage tracking. Route to the cheapest or fastest model that fits your needs. 3,000+ stars.

AI Open Source 16 Configs

Lefthook — Fast Git Hooks Manager in Go

Blazing-fast Git hooks manager written in Go. Run linters, formatters, and tests on git commit/push in parallel. Zero-dependency single binary. Replaces Husky + lint-staged. 5,000+ stars.

Script Depot 15 Scripts

Cursor Rules: Python — Clean Code with AI

Cursor rules for Python development. Enforces PEP 8 style, type hints, docstrings, pytest patterns, and modern Python 3.12+ idioms.

AI Open Source 14 Configs

Evidently — ML & LLM Monitoring with 100+ Metrics

Evaluate, test, and monitor AI systems with 100+ built-in metrics for data drift, model quality, and LLM output. 7.3K+ stars.

AI Open Source 13 Workflows

AI-Powered Testing


AI testing tools in 2026 don't just generate tests; they understand your codebase well enough to write meaningful ones.

Unit Test Generation: AI agents analyze your functions, identify edge cases, and generate test suites with proper mocking, assertions, and cleanup. The generated tests cover happy paths, error scenarios, and boundary conditions.
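As a rough illustration of what such a generated suite might look like, here is a minimal sketch using Python's standard `unittest` module. The function `clamp_percentage` and the test names are hypothetical, invented for this example; they are not output from any tool listed above.

```python
import unittest


def clamp_percentage(value: float) -> float:
    """Hypothetical function under test: clamp a number into [0, 100]."""
    if value != value:  # NaN is the only float that is not equal to itself
        raise ValueError("value must be a number, got NaN")
    return max(0.0, min(100.0, float(value)))


class ClampPercentageTests(unittest.TestCase):
    def test_happy_path_returns_value_unchanged(self):
        self.assertEqual(clamp_percentage(42.5), 42.5)

    def test_boundaries_are_inclusive(self):
        # Boundary conditions: both ends of the range are valid values.
        self.assertEqual(clamp_percentage(0), 0.0)
        self.assertEqual(clamp_percentage(100), 100.0)

    def test_out_of_range_values_are_clamped(self):
        self.assertEqual(clamp_percentage(-1), 0.0)
        self.assertEqual(clamp_percentage(100.01), 100.0)

    def test_nan_raises_value_error(self):
        # Error scenario: invalid input should fail loudly.
        with self.assertRaises(ValueError):
            clamp_percentage(float("nan"))


def run_suite() -> unittest.TestResult:
    """Run the suite programmatically and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ClampPercentageTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling `run_suite()` executes all four tests. Each test targets one of the categories named above (happy path, boundary, error), which is the kind of structure worth checking for when reviewing generated tests.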

Integration & E2E Testing: AI tools generate Playwright, Cypress, or Puppeteer tests from user flow descriptions. They account for component interactions, API contracts, and state management, producing tests that catch real bugs, not just visual regressions.

Test Maintenance: AI agents detect flaky tests, suggest fixes for broken selectors, and update test assertions when intended behavior changes.

Coverage Analysis: Beyond line coverage, AI tools identify untested business logic, missing edge cases, and areas where tests exist but don't validate meaningful behavior. They prioritize the new tests likely to have the highest impact on reliability.
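One way to see why line coverage alone can mislead is a small mutation-style check: swap the original logic for deliberately wrong variants and see whether the tests notice. The sketch below is an assumption-laden toy, not how any listed tool works; `make_is_adult`, `weak_suite`, `strong_suite`, and `MUTANTS` are all hypothetical names.

```python
import operator
from typing import Callable, Dict, List

BinaryOp = Callable[[int, int], bool]


def make_is_adult(compare: BinaryOp) -> Callable[[int], bool]:
    """Hypothetical business rule: a person is an adult at 18 or older."""
    return lambda age: compare(age, 18)


def weak_suite(is_adult: Callable[[int], bool]) -> bool:
    """Executes the rule (full line coverage) but never probes the boundary."""
    return is_adult(30) is True and is_adult(5) is False


def strong_suite(is_adult: Callable[[int], bool]) -> bool:
    """Adds the boundary cases that pin down the exact comparison."""
    return weak_suite(is_adult) and is_adult(18) is True and is_adult(17) is False


# Mutants: deliberately wrong replacements for the original `>=` comparison.
MUTANTS: Dict[str, BinaryOp] = {
    "> instead of >=": operator.gt,
    "<= instead of >=": operator.le,
    "== instead of >=": operator.eq,
}


def surviving_mutants(suite: Callable[[Callable[[int], bool]], bool]) -> List[str]:
    """Return the names of mutants that the suite fails to detect."""
    if not suite(make_is_adult(operator.ge)):
        raise RuntimeError("suite must pass on the original implementation")
    return [name for name, op in MUTANTS.items() if suite(make_is_adult(op))]
```

Here `surviving_mutants(weak_suite)` reports the `>` mutant as undetected, while `surviving_mutants(strong_suite)` returns an empty list. Both suites execute every line of the rule, so the difference is invisible to a line-coverage report.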

The best test suite is one that writes itself — and knows which tests matter most.

Frequently Asked Questions

Can AI write good unit tests?

Yes, with caveats. AI generates excellent structural tests — correct setup, teardown, mocking, and assertions. It handles edge cases, error paths, and boundary conditions well. Where it falls short: tests that require deep domain knowledge or understanding of complex business rules. Best approach: use AI for the 80% of tests that are structural, write the 20% requiring domain expertise yourself.

How does AI help with test maintenance?

AI test maintenance tools detect flaky tests (tests that pass/fail inconsistently), identify the root cause (timing issues, shared state, external dependencies), and suggest fixes. They also update test selectors when UI changes, regenerate snapshots, and flag tests that no longer cover the code they're supposed to test.

What AI testing agent skills are available?

TokRepo hosts agent skills for automated test generation (unit, integration, E2E), coverage gap analysis, test refactoring, and performance testing. Install them in Claude Code with one command, and your AI assistant can generate tests for any file you're working on, following your project's testing conventions.
