Introduction
Karpenter is an AWS-originated, now CNCF-hosted, open-source Kubernetes node autoscaler that watches pending pods and launches a right-sized EC2 instance in roughly 30 seconds, several times faster than the 2-5 minutes typical of Cluster Autoscaler with ASGs. It also continuously right-sizes and consolidates nodes, which can cut compute spend by 30-60% for many workloads.
What Karpenter Does
- Watches Pending pods and computes the cheapest instance type that fits
- Launches the node directly through the EC2 Fleet API (CreateFleet), with no ASG round trip
- Lets kube-scheduler place the pending pods on the new node as soon as the kubelet registers
- Continuously consolidates underutilized nodes by moving pods and terminating idle hosts
- Supports Spot, On-Demand, Graviton, GPU, and Bottlerocket AMIs
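
The "cheapest instance type that fits" step in the list above can be illustrated with a minimal Python sketch. It is not Karpenter's actual scheduler, which also accounts for daemonset overhead, allocatable versus raw capacity, topology constraints, and packing across several nodes. The `InstanceType` catalog, its names, and its prices are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InstanceType:
    name: str
    cpu: float          # vCPUs
    memory_gib: float
    hourly_usd: float   # illustrative price, not a real quote


@dataclass(frozen=True)
class Pod:
    name: str
    cpu: float          # requested vCPUs
    memory_gib: float   # requested memory


def cheapest_fit(pods: list[Pod], catalog: list[InstanceType]) -> Optional[InstanceType]:
    """Return the lowest-priced type whose capacity covers the summed pod requests."""
    need_cpu = sum(p.cpu for p in pods)
    need_mem = sum(p.memory_gib for p in pods)
    candidates = [
        it for it in catalog
        if it.cpu >= need_cpu and it.memory_gib >= need_mem
    ]
    # default=None covers the case where no single type is large enough.
    return min(candidates, key=lambda it: it.hourly_usd, default=None)


# Hypothetical catalog: names, sizes, and prices are made up.
CATALOG = [
    InstanceType("small-2x4", cpu=2, memory_gib=4, hourly_usd=0.04),
    InstanceType("medium-4x8", cpu=4, memory_gib=8, hourly_usd=0.08),
    InstanceType("mem-4x32", cpu=4, memory_gib=32, hourly_usd=0.13),
    InstanceType("large-8x16", cpu=8, memory_gib=16, hourly_usd=0.16),
]

pending = [Pod("api", cpu=1.5, memory_gib=2), Pod("worker", cpu=2, memory_gib=5)]
choice = cheapest_fit(pending, CATALOG)
print(choice.name if choice else "no single instance type fits")  # prints "medium-4x8"
```

The batch needs 3.5 vCPUs and 7 GiB in total, so the 2-vCPU type is excluded and the cheapest of the remaining three is chosen.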
Architecture Overview
Karpenter runs as a controller Deployment, typically in kube-system. The controller watches pod events and reconciles the NodePool and NodeClass CRDs; older releases also shipped a webhook to validate them, while current releases rely on validation rules built into the CRDs. It calls EC2 APIs via IRSA (IAM Roles for Service Accounts), so no long-lived credentials are needed. The core is cloud-agnostic: providers targeting Azure and GCP exist, and an experimental Cluster API provider is also being developed.
Self-Hosting & Configuration
- Requires EKS or a Kubernetes cluster with IRSA / Pod Identity
- IAM role needs EC2 RunInstances, TerminateInstances, DescribeSubnets, etc.
- NodePool defines workload requirements; EC2NodeClass defines AMI + userdata
- Tune with limits.cpu, disruption.budgets, disruption.consolidationPolicy
- Export metrics to Prometheus; dashboards ship on grafana.com
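
To show where those tuning knobs sit, here is a minimal sketch that builds a NodePool manifest as a Python dict and prints it as JSON, which kubectl accepts alongside YAML. It assumes the `karpenter.sh/v1` API; field names should be checked against the CRD version installed in the cluster. The resource name `default`, the CPU limit, and the budget percentage are placeholders, not recommendations.

```python
import json

# Hypothetical values: names and limits below are placeholders for illustration.
node_pool = {
    "apiVersion": "karpenter.sh/v1",  # assumption: v1 API is installed
    "kind": "NodePool",
    "metadata": {"name": "default"},
    "spec": {
        "template": {
            "spec": {
                # Points at the EC2NodeClass that carries AMI and userdata settings.
                "nodeClassRef": {
                    "group": "karpenter.k8s.aws",
                    "kind": "EC2NodeClass",
                    "name": "default",
                },
                # Workload requirements: which capacity types and CPU
                # architectures Karpenter may choose from.
                "requirements": [
                    {
                        "key": "karpenter.sh/capacity-type",
                        "operator": "In",
                        "values": ["spot", "on-demand"],
                    },
                    {
                        "key": "kubernetes.io/arch",
                        "operator": "In",
                        "values": ["amd64", "arm64"],
                    },
                ],
            }
        },
        # Upper bound on the total vCPUs this pool may provision.
        "limits": {"cpu": "200"},
        "disruption": {
            "consolidationPolicy": "WhenEmptyOrUnderutilized",
            "consolidateAfter": "1m",
            # Disrupt at most 10% of the pool's nodes at a time.
            "budgets": [{"nodes": "10%"}],
        },
    },
}

manifest = json.dumps(node_pool, indent=2)
print(manifest)
```

If saved as a script (the filename `nodepool.py` is hypothetical), the output could be piped straight into the cluster with `python nodepool.py | kubectl apply -f -`.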
Key Features
- ~30 second scale-up latency (vs 2-5 min for Cluster Autoscaler)
- Picks the optimal instance type per batch of pending pods
- Consolidates under-utilized nodes automatically
- Native Spot interruption handling with graceful draining
- No Node Group / ASG sprawl: one NodePool covers dozens of instance shapes
Comparison with Similar Tools
- Cluster Autoscaler — the classic; scales ASG node groups, slower, simpler
- KEDA — scales workloads, not nodes; often paired with Karpenter
- Azure AKS Karpenter Provider — Karpenter for Azure (community, graduating)
- GKE Autopilot — managed equivalent; hides nodes entirely, GCP-only
- Fargate — serverless pods, no node concept; simpler but pricier for steady load
FAQ
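Q: Does Karpenter replace Cluster Autoscaler?
A: Yes on EKS. Uninstall Cluster Autoscaler and let Karpenter manage all elastic capacity; keep a small static node group for system pods, including Karpenter itself.
Q: How does it handle Spot interruptions?
A: Karpenter receives EventBridge Spot interruption and rebalance notifications through an SQS queue, and drains the affected node when an interruption warning arrives, before the instance is reclaimed.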
Q: Can it run on non-AWS clusters?
A: Core is cloud-agnostic; community AKSNodeClass works on Azure AKS; GCP provider is in progress.
Q: How does consolidation decide which nodes to kill?
A: It simulates rescheduling a node's pods onto the remaining nodes, or onto a cheaper replacement, and acts only when the result lowers total cost.
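
The simulate-then-act idea can be sketched in a few lines of Python. This is a deliberately simplified model, not Karpenter's algorithm: it considers CPU requests only, handles deletion but not replacement with a cheaper node, and ignores disruption budgets and pod disruption constraints. The `Node` type, node names, and prices are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    name: str
    cpu: float           # allocatable vCPUs
    hourly_usd: float    # illustrative price
    pods: dict[str, float] = field(default_factory=dict)  # pod name -> CPU request

    @property
    def free_cpu(self) -> float:
        return self.cpu - sum(self.pods.values())


def can_remove(candidate: Node, others: list[Node]) -> bool:
    """Simulate moving every pod on `candidate` onto the free CPU of `others`."""
    free = {n.name: n.free_cpu for n in others}
    # Place the largest pods first (first-fit decreasing).
    for request in sorted(candidate.pods.values(), reverse=True):
        target = next((name for name, cpu in free.items() if cpu >= request), None)
        if target is None:
            return False  # at least one pod has nowhere to go
        free[target] -= request
    return True


def pick_node_to_remove(nodes: list[Node]) -> Optional[str]:
    """Return the most expensive node whose pods all fit elsewhere, if any."""
    for candidate in sorted(nodes, key=lambda n: n.hourly_usd, reverse=True):
        others = [n for n in nodes if n is not candidate]
        if can_remove(candidate, others):
            return candidate.name
    return None


cluster = [
    Node("node-a", cpu=4, hourly_usd=0.08, pods={"api": 1.0, "cache": 1.0}),
    Node("node-b", cpu=4, hourly_usd=0.08, pods={"worker": 3.0}),
    Node("node-c", cpu=8, hourly_usd=0.16, pods={"batch": 1.5}),
]
print(pick_node_to_remove(cluster))  # prints "node-c"
```

Here node-c is the priciest node and its single 1.5-vCPU pod fits in node-a's 2 free vCPUs, so removing it saves the most while leaving every pod schedulable.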