Introduction
Kuberhealthy runs synthetic monitoring checks inside Kubernetes clusters to validate that the cluster and its components actually work from a workload perspective. Rather than just checking component status, Kuberhealthy launches real pods that exercise DNS resolution, create deployments, test network connectivity, and validate storage provisioning, surfacing failures before they affect users.
What Kuberhealthy Does
- Runs health check pods on a schedule to validate cluster subsystems
- Tests DNS resolution, deployment creation, pod scheduling, and network connectivity
- Exposes check results via a Prometheus-compatible metrics endpoint
- Provides a built-in status page showing check results across the cluster
- Supports custom check images for application-specific health validation
Architecture Overview
Kuberhealthy runs as a Deployment that watches KuberhealthyCheck (khcheck) custom resources. Each check defines a container image, a run interval, and a timeout. On each interval, Kuberhealthy creates a short-lived checker pod that runs the check container. The check container performs its validation and reports success or failure back to the Kuberhealthy API. Results are stored as KuberhealthyState (khstate) custom resources and exposed as Prometheus metrics.
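To make the check definition described above more concrete, here is a minimal sketch that builds a KuberhealthyCheck manifest as a Python dict and prints it as JSON, which kubectl accepts alongside YAML. The API group `comcast.github.io/v1` and the field names `runInterval`, `timeout`, and `podSpec` are assumptions based on the Kuberhealthy v2 CRD and may differ in your installed version. The check name, namespace, and image are hypothetical.

```python
import json


def build_khcheck(name, namespace, image, run_interval="5m", timeout="10m"):
    """Return a KuberhealthyCheck manifest as a plain dict.

    Field names are assumed from the Kuberhealthy v2 CRD; verify them
    against the CRD installed in your cluster before applying.
    """
    return {
        "apiVersion": "comcast.github.io/v1",  # assumed API group/version
        "kind": "KuberhealthyCheck",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "runInterval": run_interval,  # how often the check is run
            "timeout": timeout,           # how long a single run may take
            "podSpec": {
                "containers": [
                    {"name": "main", "image": image},
                ]
            },
        },
    }


# Hypothetical check name, namespace, and image, for illustration only.
manifest = build_khcheck(
    "example-check",
    "kuberhealthy",
    "registry.example.com/example-check:v1",
)
print(json.dumps(manifest, indent=2))
```

The printed JSON could be saved to a file and applied with `kubectl apply -f`, assuming the CRD is installed and the field names match.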
Self-Hosting & Configuration
- Deploy via Helm chart with default health checks included
- Enable built-in checks for DNS, deployment, daemonset, and pod status
- Create custom KuberhealthyCheck resources with your own check container images
- Configure check intervals, timeouts, and alert thresholds per check
- Scrape the /metrics endpoint with Prometheus for alerting integration
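As a rough illustration of what a scraper sees on the /metrics endpoint, the sketch below parses Prometheus text-format output and lists checks whose value is 0. The metric name `kuberhealthy_check`, its `check` label, and the convention that 1 means passing are assumptions; the sample text is hypothetical and not captured from a real cluster.

```python
import re

# Hypothetical sample of /metrics output; metric and label names are assumptions.
SAMPLE = """\
# HELP kuberhealthy_check Shows the status of a Kuberhealthy check
# TYPE kuberhealthy_check gauge
kuberhealthy_check{check="kuberhealthy/dns-status-internal",namespace="kuberhealthy"} 1
kuberhealthy_check{check="kuberhealthy/deployment",namespace="kuberhealthy"} 0
kuberhealthy_cluster_state 0
"""

# One sample line: name, optional {labels}, value, optional timestamp.
LINE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>.*)\})?'
    r'\s+(?P<value>\S+)'
    r'(?:\s+\d+)?$'
)
# One label pair inside the braces, allowing escaped characters in the value.
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def failing_checks(text, metric="kuberhealthy_check"):
    """Return the `check` label of every sample of `metric` whose value is 0."""
    failing = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE_RE.match(line)
        if not match or match.group("name") != metric:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        if float(match.group("value")) == 0:
            failing.append(labels.get("check", "<unknown>"))
    return failing


print(failing_checks(SAMPLE))  # prints ['kuberhealthy/deployment']
```

In practice Prometheus does this parsing itself. The sketch only shows the shape of the data an alerting rule would evaluate.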
Key Features
- Real pod-based checks validate cluster behavior from a workload perspective
- Pre-built checks cover DNS, deployment lifecycle, pod restart, and network
- Custom check framework lets you write checks in any language as a container
- Prometheus metrics endpoint integrates with existing monitoring stacks
- Namespace-scoped and cluster-scoped checks for multi-tenant monitoring
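The custom check framework mentioned above comes down to a container that runs some validation and then reports the outcome to Kuberhealthy over HTTP. The sketch below shows one way that could look using only the Python standard library. The environment variables `KH_REPORTING_URL` and `KH_RUN_UUID`, the `kh-run-uuid` header, and the `{"OK": ..., "Errors": [...]}` body are assumptions based on the Kuberhealthy v2 external check convention; confirm them against the documentation for your version. `run_check` is a hypothetical placeholder for real validation logic.

```python
import json
import os
import urllib.request


def build_report(errors):
    """Build the report body: OK is true only when there are no errors."""
    errors = list(errors)
    return {"OK": not errors, "Errors": errors}


def send_report(report, url, run_uuid, timeout=10):
    """POST the report as JSON and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=json.dumps(report).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "kh-run-uuid": run_uuid,  # assumed header name
        },
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


def run_check():
    """Hypothetical validation: return a list of error strings (empty on success)."""
    errors = []
    if not os.path.isdir("/tmp"):
        errors.append("/tmp is missing")
    return errors


# Only report when running inside a checker pod, where the assumed
# environment variables would be injected by Kuberhealthy.
if "KH_REPORTING_URL" in os.environ:
    status = send_report(
        build_report(run_check()),
        os.environ["KH_REPORTING_URL"],
        os.environ.get("KH_RUN_UUID", ""),
    )
    print("report sent, HTTP status", status)
```

Keeping `build_report` separate from `send_report` makes the report logic testable without a cluster. The same pattern carries over to Go, Bash, or any language that can send an HTTP request.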
Comparison with Similar Tools
- Prometheus kube-state-metrics — reports Kubernetes object states but does not perform active validation
- Goldpinger — specifically tests pod-to-pod network connectivity, narrower scope
- Gatus — external endpoint monitoring, not cluster-internal synthetic checks
- Healthchecks — cron job monitoring service, not Kubernetes-native synthetic testing
FAQ
Q: What built-in checks does Kuberhealthy include? A: Built-in checks include DNS resolution, deployment creation and deletion, daemonset scheduling, pod restart detection, and HTTP endpoint availability.
Q: Can I write custom health checks? A: Yes. A custom check is any container image that calls the Kuberhealthy API to report success or failure. You can write checks in Go, Python, Bash, or any language.
Q: How does Kuberhealthy integrate with alerting? A: Kuberhealthy exposes check results as Prometheus metrics. Define Prometheus alerting rules on these metrics, and let Alertmanager route the resulting alerts when checks fail.
Q: Does running health checks consume significant cluster resources? A: Checks run as short-lived pods with configurable resource limits. The default checks request few resources and run infrequently, so cluster overhead is usually small. It grows with the number of checks and how often they run.