Configs · April 15, 2026 · 1 min read

Karpenter — Just-in-Time Kubernetes Node Autoscaler for AWS

AWS-origin cluster autoscaler that launches the right EC2 instance shape and size for pending pods in ~30 seconds.

Introduction

Karpenter is an open-source Kubernetes node autoscaler that originated at AWS and is now maintained under Kubernetes SIG Autoscaling, a CNCF-hosted project. It watches pending pods and launches a right-sized EC2 instance in roughly 30 seconds, several times faster than the 2-5 minutes typical of Cluster Autoscaler with ASGs. It also continuously right-sizes and consolidates nodes, which can cut compute spend by a reported 30-60% for many workloads.

What Karpenter Does

  • Watches Pending pods and computes the cheapest instance type that fits
  • Launches the node directly through the EC2 fleet API (CreateFleet), with no ASG round trip
  • Lets kube-scheduler bind the pending pods to the new node as soon as its kubelet registers
  • Continuously consolidates underutilized nodes by moving pods and terminating idle hosts
  • Supports Spot, On-Demand, Graviton, GPU, and Bottlerocket AMIs
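
As a rough illustration of the first bullet, the sketch below picks the cheapest instance type whose capacity covers the summed requests of a batch of pending pods. The catalog, the prices, and the names `InstanceType`, `PodRequest`, and `cheapest_fit` are hypothetical. Karpenter's real scheduler also weighs DaemonSet overhead, topology constraints, and many more dimensions, so treat this as a simplified model rather than its actual algorithm.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InstanceType:
    name: str
    vcpu: float          # allocatable vCPUs (illustrative)
    memory_gib: float    # allocatable memory in GiB (illustrative)
    hourly_usd: float    # placeholder price, not a real AWS quote


@dataclass(frozen=True)
class PodRequest:
    cpu: float
    memory_gib: float


def cheapest_fit(pods: list[PodRequest],
                 catalog: list[InstanceType]) -> Optional[InstanceType]:
    """Return the cheapest instance type that can hold every pod in the batch."""
    need_cpu = sum(p.cpu for p in pods)
    need_mem = sum(p.memory_gib for p in pods)
    candidates = [
        it for it in catalog
        if it.vcpu >= need_cpu and it.memory_gib >= need_mem
    ]
    # default=None avoids a ValueError when nothing in the catalog is big enough.
    return min(candidates, key=lambda it: it.hourly_usd, default=None)


# Hypothetical catalog: sizes and prices are placeholders.
CATALOG = [
    InstanceType("small-2x8", 2, 8, 0.10),
    InstanceType("medium-4x16", 4, 16, 0.19),
    InstanceType("large-8x32", 8, 32, 0.38),
    InstanceType("compute-8x16", 8, 16, 0.34),
]

pending = [PodRequest(1.5, 2), PodRequest(2.0, 4), PodRequest(1.0, 3)]
choice = cheapest_fit(pending, CATALOG)
print(choice.name if choice else "no single instance fits")
```

With this placeholder catalog the batch needs 4.5 vCPU and 9 GiB, so the two 8-vCPU entries qualify and the cheaper one, compute-8x16, is selected.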

Architecture Overview

Karpenter runs as a controller Deployment, typically in kube-system, that watches pod events and reconciles the NodePool and NodeClass CRDs. Older releases also shipped a separate webhook to validate those resources. It calls EC2 APIs via IRSA (IAM Roles for Service Accounts), so no long-lived credentials are needed. The core is cloud-neutral: community providers for Azure and GCP exist, and an experimental Cluster API provider is being developed under kubernetes-sigs.

Self-Hosting & Configuration

  • Requires EKS or a Kubernetes cluster with IRSA / Pod Identity
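  • IAM role needs ec2:CreateFleet, ec2:RunInstances, ec2:TerminateInstances, ec2:DescribeSubnets, etc.
  • NodePool defines scheduling requirements and limits; EC2NodeClass defines AWS specifics such as AMI, subnets, security groups, and user data
  • Tune the NodePool with spec.limits.cpu, spec.disruption.budgets, and spec.disruption.consolidationPolicy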
  • Export metrics to Prometheus; dashboards ship on grafana.com
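
To make the tuning knobs above concrete, here is a minimal sketch that builds a NodePool manifest as a Python dict and prints it as JSON. It assumes the karpenter.sh/v1 API; older releases such as v1beta1 use different field names and values, so check the CRDs installed in your cluster. The pool name, the CPU limit, and the referenced EC2NodeClass called "default" are illustrative choices, not recommendations.

```python
import json

# Assumption: karpenter.sh/v1 field names. Verify against your installed CRDs.
node_pool = {
    "apiVersion": "karpenter.sh/v1",
    "kind": "NodePool",
    "metadata": {"name": "default"},
    "spec": {
        "template": {
            "spec": {
                # Points at an EC2NodeClass that must already exist (hypothetical name).
                "nodeClassRef": {
                    "group": "karpenter.k8s.aws",
                    "kind": "EC2NodeClass",
                    "name": "default",
                },
                # Allow both Spot and On-Demand, on x86 and Graviton.
                "requirements": [
                    {
                        "key": "karpenter.sh/capacity-type",
                        "operator": "In",
                        "values": ["spot", "on-demand"],
                    },
                    {
                        "key": "kubernetes.io/arch",
                        "operator": "In",
                        "values": ["amd64", "arm64"],
                    },
                ],
            }
        },
        # Cap the total vCPUs this pool may provision.
        "limits": {"cpu": "1000"},
        "disruption": {
            "consolidationPolicy": "WhenEmptyOrUnderutilized",
            "consolidateAfter": "1m",
            # Disrupt at most 10% of the pool's nodes at a time.
            "budgets": [{"nodes": "10%"}],
        },
    },
}

manifest = json.dumps(node_pool, indent=2)
print(manifest)
```

Because kubectl accepts JSON as well as YAML, the printed output can be saved to a file and applied with kubectl apply -f once the matching EC2NodeClass is in place.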

Key Features

  • ~30 second scale-up latency (vs 2-5 min for Cluster Autoscaler)
  • Picks the optimal instance type per batch of pending pods
  • Consolidates under-utilized nodes automatically
  • Native Spot interruption handling with graceful draining
  • No Node Group / ASG sprawl — one NodePool covers dozens of instance shapes

Comparison with Similar Tools

  • Cluster Autoscaler: the classic option; scales ASG-backed node groups, slower but simpler
  • KEDA: scales workloads, not nodes; often paired with Karpenter
  • Azure AKS Karpenter Provider: Karpenter for Azure (open-source provider, still maturing)
  • GKE Autopilot: managed equivalent; hides nodes entirely, GCP-only
  • Fargate: serverless pods with no node concept; simpler but pricier for steady load

FAQ

Q: Does Karpenter replace Cluster Autoscaler? A: On EKS, yes. Uninstall Cluster Autoscaler and let Karpenter manage all elastic capacity; keep a small static node group for system pods, including Karpenter itself.

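Q: How does it handle Spot interruptions? A: EventBridge rules forward EC2 interruption and rebalance events to an SQS queue that Karpenter polls. On a Spot interruption warning it cordons and drains the node ahead of the two-minute reclaim. Rebalance recommendations arrive on the same queue, but the upstream documentation notes that Karpenter does not drain nodes in response to them.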
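
For readers who want to see what such a notification looks like, the sketch below classifies an EventBridge event body of the kind Karpenter's reference setup routes through an SQS queue. The `detail-type` strings and the `detail` fields follow the event format AWS documents for EC2. The function name `plan_action` and the "drain" / "observe" labels are hypothetical, and treating rebalance recommendations as observe-only is an assumption about Karpenter's behavior, not its actual code.

```python
import json
from typing import Optional

INTERRUPTION = "EC2 Spot Instance Interruption Warning"
REBALANCE = "EC2 Instance Rebalance Recommendation"


def plan_action(message_body: str) -> Optional[tuple[str, str]]:
    """Map one event body to an (action, instance_id) pair, or None if irrelevant."""
    event = json.loads(message_body)
    if event.get("source") != "aws.ec2":
        return None
    instance_id = event.get("detail", {}).get("instance-id")
    if instance_id is None:
        return None
    detail_type = event.get("detail-type")
    if detail_type == INTERRUPTION:
        # Roughly two minutes remain: cordon and drain now.
        return ("drain", instance_id)
    if detail_type == REBALANCE:
        # Assumption: record the signal but do not drain on it.
        return ("observe", instance_id)
    return None


# Sample body shaped like a Spot interruption warning (IDs are placeholders).
sample = json.dumps({
    "version": "0",
    "detail-type": INTERRUPTION,
    "source": "aws.ec2",
    "region": "us-east-2",
    "detail": {
        "instance-id": "i-1234567890abcdef0",
        "instance-action": "terminate",
    },
})

print(plan_action(sample))
```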

Q: Can it run on non-AWS clusters? A: The core is cloud-agnostic. The community Azure provider, with its AKSNodeClass, works on AKS, and a GCP provider is in progress.

Q: How does consolidation decide which nodes to remove? A: It simulates rescheduling a node's pods onto fewer or cheaper nodes and acts only when the simulation shows that the pods still fit and the total cost goes down.
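
The simulation idea can be sketched in a few lines. The example below checks, for each node, whether its pods would fit into the spare CPU of the remaining nodes (a first-fit-decreasing pass), then picks the removable node that saves the most per hour. It models CPU only, and the names `Node`, `can_evacuate`, and `best_consolidation`, along with all capacities and prices, are hypothetical. The real controller also considers memory, replacing a node with a cheaper one, disruption budgets, and PodDisruptionBudgets.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    name: str
    cpu_capacity: float
    hourly_usd: float                       # placeholder price
    pod_cpu: list[float] = field(default_factory=list)

    @property
    def free_cpu(self) -> float:
        return self.cpu_capacity - sum(self.pod_cpu)


def can_evacuate(target: Node, others: list[Node]) -> bool:
    """First-fit-decreasing: do the target's pods fit in the others' spare CPU?"""
    free = [n.free_cpu for n in others]
    for cpu in sorted(target.pod_cpu, reverse=True):
        for i, spare in enumerate(free):
            if cpu <= spare:
                free[i] -= cpu
                break
        else:
            return False  # this pod found no node with enough room
    return True


def best_consolidation(nodes: list[Node]) -> Optional[Node]:
    """Return the removable node with the largest hourly saving, or None."""
    removable = [
        n for n in nodes
        if can_evacuate(n, [o for o in nodes if o is not n])
    ]
    return max(removable, key=lambda n: n.hourly_usd, default=None)


cluster = [
    Node("node-a", 8, 0.38, [3.0, 2.0]),
    Node("node-b", 4, 0.19, [1.0, 0.5]),
    Node("node-c", 8, 0.38, [6.0]),
]
victim = best_consolidation(cluster)
print(victim.name if victim else "nothing to consolidate")
```

In this toy cluster only node-b can be emptied, because its two small pods fit into node-a's 3 spare vCPUs, while the pods on node-a and node-c have nowhere to go.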
