Introduction
The Kubernetes scheduler makes a placement decision only once, when a pod is created and waiting for a node. Over time, node conditions change (nodes are added, tainted, or relabeled, and workloads grow or shrink), so running pods can end up on suboptimal nodes. Descheduler evicts those pods so the scheduler can place them again, which improves resource utilization and applies updated affinity or topology constraints.
What Descheduler Does
- Evicts pods from overutilized nodes to balance CPU and memory across the cluster
- Removes pods that violate inter-pod anti-affinity or node affinity rules added after scheduling
- Detects duplicate pods (replicas of the same controller) on one node and evicts the extras so the scheduler can spread them out
- Evicts pods that do not tolerate a taint on the node they are running on
- Runs as a CronJob, Deployment, or one-shot Job inside the cluster
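The first bullet can be made concrete with a small sketch of how a LowNodeUtilization-style check might classify nodes. It assumes two sets of percentages: a node below `thresholds` for every resource counts as underutilized, and a node above `target_thresholds` for any resource counts as overutilized. The `NodeUsage` type, the `classify` helper, and the sample numbers are illustrative only and are not Descheduler code.

```python
from dataclasses import dataclass

RESOURCES = ("cpu", "memory", "pods")


@dataclass
class NodeUsage:
    name: str
    # Requested resources as a percentage of the node's allocatable capacity.
    percent: dict


def classify(node, thresholds, target_thresholds):
    """Return 'under', 'over', or 'ok' for a single node (illustrative)."""
    # Underutilized: below the low threshold for ALL resources.
    if all(node.percent[r] < thresholds[r] for r in RESOURCES):
        return "under"
    # Overutilized: above the target threshold for ANY resource.
    if any(node.percent[r] > target_thresholds[r] for r in RESOURCES):
        return "over"
    return "ok"


nodes = [
    NodeUsage("node-a", {"cpu": 10, "memory": 15, "pods": 5}),
    NodeUsage("node-b", {"cpu": 80, "memory": 40, "pods": 30}),
    NodeUsage("node-c", {"cpu": 35, "memory": 45, "pods": 25}),
]
thresholds = {"cpu": 20, "memory": 20, "pods": 20}
target_thresholds = {"cpu": 50, "memory": 50, "pods": 50}

labels = {n.name: classify(n, thresholds, target_thresholds) for n in nodes}
print(labels)  # {'node-a': 'under', 'node-b': 'over', 'node-c': 'ok'}
```

In this sketch, pods would be evicted from `node-b` in the expectation that the scheduler places their replacements on `node-a`; `node-c` is left alone.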
Architecture Overview
Descheduler is a single Go binary that connects to the Kubernetes API server, reads node and pod state, and runs a configurable set of strategy plugins. Each plugin evaluates pods against one criterion (utilization, affinity, topology spread) and nominates candidates for eviction. Evictions go through the Kubernetes eviction API, so PodDisruptionBudgets are honored, and pods at or above a configured priority are skipped, which limits service disruption.
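The flow described above (plugins nominate candidates, then a shared eviction step filters them) can be sketched roughly as follows. This is a simplified model, not the actual Go implementation: `Pod`, `remove_duplicates`, `run_cycle`, and the `budget` mapping (a stand-in for PodDisruptionBudget accounting) are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Pod:
    name: str
    node: str
    owner: str        # e.g. the owning ReplicaSet
    priority: int = 0


# In this sketch a "plugin" is any function that returns eviction candidates.
Plugin = Callable[[List[Pod]], List[Pod]]


def remove_duplicates(pods: List[Pod]) -> List[Pod]:
    """Nominate every pod after the first with the same owner on the same node."""
    seen, candidates = set(), []
    for pod in pods:
        key = (pod.node, pod.owner)
        if key in seen:
            candidates.append(pod)
        seen.add(key)
    return candidates


def run_cycle(pods: List[Pod], plugins: List[Plugin],
              priority_threshold: int, budget: Dict[str, int]) -> List[str]:
    """Run all plugins once and return the names of pods that would be evicted.

    budget maps owner -> evictions still allowed; owners without an entry
    are treated as unrestricted (no PodDisruptionBudget).
    """
    evicted: List[str] = []
    remaining = dict(budget)
    for plugin in plugins:
        for pod in plugin(pods):
            if pod.name in evicted:
                continue  # already handled by an earlier plugin
            if pod.priority >= priority_threshold:
                continue  # protected by priority
            left = remaining.get(pod.owner)
            if left is not None:
                if left <= 0:
                    continue  # budget exhausted for this workload
                remaining[pod.owner] = left - 1
            evicted.append(pod.name)
    return evicted


CRITICAL = 2_000_000_000  # value of the system-cluster-critical priority class
pods = [
    Pod("web-1", "node-a", "rs-web"),
    Pod("web-2", "node-a", "rs-web"),
    Pod("web-3", "node-a", "rs-web"),
    Pod("dns-1", "node-a", "rs-dns", priority=CRITICAL),
    Pod("dns-2", "node-a", "rs-dns", priority=CRITICAL),
]
evicted = run_cycle(pods, [remove_duplicates], CRITICAL, {"rs-web": 1})
print(evicted)  # ['web-2']
```

Here `web-3` is nominated but skipped because the budget for `rs-web` is already used up, and `dns-2` is skipped because of its priority.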
Self-Hosting & Configuration
- Deploy via Helm chart, kustomize overlay, or raw Job manifest
- Policy is defined in a DeschedulerPolicy file, typically stored in a ConfigMap and mounted into the descheduler pod
- Strategies are individually enabled with per-strategy parameters (thresholds, namespaces, label selectors)
- Can run in dry-run mode (the --dry-run flag) to preview evictions without acting
- CronJob schedule controls how frequently rebalancing occurs
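As a rough illustration of what a policy with one enabled strategy might look like, the sketch below builds a policy document in Python and prints it as JSON (which is also valid YAML, so it could be stored in a ConfigMap). The field names follow the `descheduler/v1alpha2` policy layout as understood here and should be checked against the documentation for the release you deploy. The `build_policy` helper, the profile name, and the threshold values are assumptions made for this example, and the ordering check is only a sanity check of this sketch.

```python
import json


def build_policy(thresholds, target_thresholds):
    """Build a policy enabling LowNodeUtilization (field names are assumed)."""
    # Sanity check for this sketch: the low mark must sit below the target mark.
    for resource, low in thresholds.items():
        if low >= target_thresholds[resource]:
            raise ValueError(f"{resource}: threshold must be below targetThreshold")
    return {
        "apiVersion": "descheduler/v1alpha2",
        "kind": "DeschedulerPolicy",
        "profiles": [
            {
                "name": "default",  # hypothetical profile name
                "pluginConfig": [
                    {
                        "name": "LowNodeUtilization",
                        "args": {
                            "thresholds": thresholds,
                            "targetThresholds": target_thresholds,
                        },
                    }
                ],
                "plugins": {"balance": {"enabled": ["LowNodeUtilization"]}},
            }
        ],
    }


policy = build_policy(
    {"cpu": 20, "memory": 20, "pods": 20},
    {"cpu": 50, "memory": 50, "pods": 50},
)
print(json.dumps(policy, indent=2))
```

Generating the policy from code is optional; the same structure can be written by hand. The point is that each strategy is switched on by name and carries its own arguments.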
Key Features
- Plugin architecture with 10+ built-in strategies (LowNodeUtilization, RemoveDuplicates, RemovePodsViolatingTopologySpreadConstraint, etc.)
- Respects PodDisruptionBudgets to maintain application availability during evictions
- Namespace and label-selector scoping to limit blast radius
- Dry-run mode for safe evaluation before enabling evictions
- Maintained under the kubernetes-sigs organization and sponsored by SIG Scheduling, with active community maintenance
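The namespace and label-selector scoping listed above amounts to a filter applied before any pod is considered for eviction. A minimal sketch, assuming pods are plain dictionaries and that an exclusion list takes precedence over an inclusion list (both the `in_scope` helper and that precedence rule are assumptions of this example, not documented Descheduler behavior):

```python
def in_scope(pod, include=None, exclude=None, selector=None):
    """Return True if the pod may be considered for eviction (illustrative)."""
    namespace = pod["namespace"]
    if exclude and namespace in exclude:
        return False
    if include and namespace not in include:
        return False
    labels = pod.get("labels", {})
    # Every key/value pair in the selector must match the pod's labels.
    return all(labels.get(k) == v for k, v in (selector or {}).items())


pods = [
    {"name": "a", "namespace": "shop", "labels": {"app": "web", "tier": "frontend"}},
    {"name": "b", "namespace": "kube-system", "labels": {"app": "web"}},
    {"name": "c", "namespace": "shop", "labels": {"app": "db"}},
]
names = [
    p["name"]
    for p in pods
    if in_scope(p, exclude={"kube-system"}, selector={"app": "web"})
]
print(names)  # ['a']
```

Starting with a narrow scope like this, together with dry-run mode, keeps the set of pods that can be affected small while a policy is being tuned.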
Comparison with Similar Tools
- Kubernetes Cluster Autoscaler — Scales nodes up/down but does not rebalance existing pods
- KEDA — Event-driven scaling of workloads; complements but does not replace rebalancing
- Goldilocks — Recommends resource requests/limits but does not evict or move pods
- kube-scheduler-simulator — Simulates scheduling decisions; useful alongside descheduler for testing
- Karpenter — Provisions right-sized nodes; can reduce the need for rebalancing but serves a different purpose
FAQ
Q: Will descheduler cause downtime for my services? A: It evicts pods through the eviction API, which respects PodDisruptionBudgets. If your Deployments have suitable PDBs, at least the configured minimum number of pods keeps running; workloads without a PDB have no such guarantee.
Q: Can I exclude certain pods or namespaces? A: Yes. Each strategy supports namespace inclusion/exclusion lists and label selectors. System-critical pods (priority >= system-cluster-critical) are not evicted by default.
Q: How does it handle stateful workloads? A: By default it skips pods with local storage. You can override this, but evicting StatefulSet pods requires careful PDB configuration.
Q: Does descheduler work with custom schedulers? A: Yes. Descheduler only evicts pods; the replacement pods are placed by whichever scheduler is configured for them, including custom schedulers.
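The answer about downtime comes down to simple PodDisruptionBudget arithmetic: with a `minAvailable` budget, the number of voluntary evictions allowed at any moment is the number of healthy pods minus the minimum, never less than zero. A small worked sketch (the `allowed_disruptions` helper is a hypothetical name used here for illustration):

```python
def allowed_disruptions(healthy: int, min_available: int) -> int:
    """Evictions a minAvailable-style budget permits right now."""
    return max(0, healthy - min_available)


# Three healthy replicas with minAvailable=2: one pod may be evicted.
print(allowed_disruptions(3, 2))  # 1
# After that eviction only two are healthy, so further evictions are
# refused until the replacement pod becomes ready.
print(allowed_disruptions(2, 2))  # 0
# A degraded workload already below its minimum allows no evictions.
print(allowed_disruptions(1, 2))  # 0
```

A single-replica workload with `minAvailable: 1` therefore never permits an eviction, which is worth keeping in mind when a strategy appears to skip certain pods.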