Apr 15, 2026 · 3 min read

Kubernetes Cluster Autoscaler — Node-Level Autoscaling for K8s

Official Kubernetes project that adds and removes nodes from cloud or Cluster API node groups to match scheduling pressure across AWS, Azure, GCP, and on-prem.

Introduction

Cluster Autoscaler (CA) is the original Kubernetes node autoscaler, maintained in kubernetes/autoscaler. When pods can't be scheduled because there aren't enough nodes, it adds nodes from configured node groups; when nodes sit underutilized, it drains and deletes them. CA supports every major cloud (AWS, Azure, GCP, Alibaba, OCI, DigitalOcean) plus on-prem integrations such as Cluster API and OpenStack Magnum.

What Cluster Autoscaler Does

  • Scales node groups up when Pending pods can't fit on current capacity.
  • Scales node groups down when utilization drops below thresholds for a grace period.
  • Respects Pod Disruption Budgets and safe-to-evict annotations during scale-down.
  • Balances similar node groups (e.g., across AZs) when multi-zone spread is desired.
  • Integrates with spot/preemptible instance groups via expander strategies such as priority and price.

Architecture Overview

CA runs as a single-replica Deployment in kube-system and watches for unschedulable pods via the Kubernetes API. On each loop it simulates placing the pending pods against existing nodes, then against each node group's "template" node. If scheduling succeeds on group N with delta K, CA asks the cloud provider to grow that group by K. For scale-down, it sorts candidates by utilization, drains each (respecting PDBs), and asks the cloud provider to terminate the instance. When several groups could satisfy a scale-up, an expander picks one: random, most-pods, least-waste, priority, or price.
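The scale-up half of that loop can be sketched in a few lines. Everything below (`NodeGroup`, `fits`, `plan_scale_up`, the CPU-only waste metric) is illustrative pseudocode for the idea, not the real kubernetes/autoscaler code, which simulates full scheduler predicates rather than simple CPU/memory fits:

```python
# Hypothetical sketch of Cluster Autoscaler's scale-up decision.
from dataclasses import dataclass

@dataclass
class NodeGroup:
    name: str
    cpu: float       # allocatable CPU on one template node
    mem: float       # allocatable memory (GiB) on one template node
    max_delta: int   # headroom before the group hits its max size

def fits(pod, group):
    """Would the pod schedule on an empty template node of this group?"""
    return pod["cpu"] <= group.cpu and pod["mem"] <= group.mem

def plan_scale_up(pending_pods, groups):
    """For each group, count how many new template nodes the pending pods
    would need (first-fit packing), then pick the group with the least
    leftover capacity -- a rough stand-in for the least-waste expander."""
    options = []
    for g in groups:
        nodes = []  # (free_cpu, free_mem) per simulated new node
        ok = True
        for pod in pending_pods:
            if not fits(pod, g):
                ok = False
                break
            for i, (fc, fm) in enumerate(nodes):
                if pod["cpu"] <= fc and pod["mem"] <= fm:
                    nodes[i] = (fc - pod["cpu"], fm - pod["mem"])
                    break
            else:
                nodes.append((g.cpu - pod["cpu"], g.mem - pod["mem"]))
        if ok and len(nodes) <= g.max_delta:
            waste = sum(fc for fc, _ in nodes)  # leftover CPU as waste proxy
            options.append((waste, g.name, len(nodes)))
    if not options:
        return None  # no group can host the pods; they stay Pending
    _, name, delta = min(options)
    return name, delta
```

If no candidate group can host every pending pod (or the required delta exceeds the group's max size), the real CA likewise leaves the pods Pending and records an event explaining why.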

Self-Hosting & Configuration

  • Set IAM/service-account permissions per cloud (EC2 Auto Scaling on AWS, Managed Instance Groups on GCP).
  • Tag node groups for auto-discovery (on AWS, ASG tags k8s.io/cluster-autoscaler/enabled and k8s.io/cluster-autoscaler/<cluster-name>).
  • Tune --scale-down-unneeded-time, --scale-down-utilization-threshold, and --max-node-provision-time.
  • Annotate critical Pods with cluster-autoscaler.kubernetes.io/safe-to-evict: "false" to block eviction during scale-down.
  • For spot-heavy clusters, use priority expander with fallback to on-demand groups.
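Putting the flags and the priority expander together, a minimal configuration might look like this (cluster name, group-name patterns, and thresholds are illustrative; adapt them to your environment):

```yaml
# Cluster Autoscaler container args (AWS auto-discovery example)
command:
  - ./cluster-autoscaler
  - --cloud-provider=aws
  - --expander=priority
  - --scale-down-unneeded-time=10m
  - --scale-down-utilization-threshold=0.5
  - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/my-cluster
---
# Priority expander config: prefer spot groups, fall back to on-demand.
# Higher numbers win; values are regexes matched against group names.
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-autoscaler-priority-expander
  namespace: kube-system
data:
  priorities: |-
    10:
      - .*-spot-.*
    1:
      - .*-on-demand-.*
```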

Key Features

  • Works across all major clouds with a largely common, flag-driven configuration surface.
  • Respects PDBs and graceful termination so applications survive scale-down.
  • Supports "overprovisioning" via low-priority pause pods for snappy scale-out.
  • Cluster API integration makes CA work on bare-metal and hybrid stacks.
  • Safe expander model prevents runaway node creation when multiple groups match.
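The overprovisioning trick mentioned above is usually implemented with a negative PriorityClass and a Deployment of pause pods; the placeholders reserve spare capacity, get evicted first when real workloads arrive, and then reschedule, prompting CA to add a node ahead of demand. A minimal sketch (replica count and resource sizes are illustrative):

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: overprovisioning
value: -10
globalDefault: false
description: "Placeholder pause pods evicted first under pressure"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: overprovisioning
  namespace: kube-system
spec:
  replicas: 2
  selector:
    matchLabels: {app: overprovisioning}
  template:
    metadata:
      labels: {app: overprovisioning}
    spec:
      priorityClassName: overprovisioning
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9
          resources:
            requests: {cpu: "1", memory: 1Gi}
```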

Comparison with Similar Tools

  • Karpenter — AWS-originated, works from pod specs directly, picks instance types dynamically.
  • KEDA — scales pods via external metrics; complementary, not a replacement.
  • CA on Cluster API — same CA binary, but targets CAPI MachineDeployments for hybrid clusters.
  • Karmada — multi-cluster federation with its own autoscaling; CA is single-cluster.
  • Custom scripts — before CA, teams wrote cron jobs against ASGs — strictly worse.

FAQ

Q: Does CA pick instance types? A: No. It scales predefined node groups. For dynamic instance selection, use Karpenter, or define multiple node groups with different instance types.

Q: Can I combine CA with HPA and KEDA? A: Yes — HPA/KEDA scale pods; CA adds nodes when pods can't schedule. That is the recommended combo.

Q: What about GPU nodes? A: CA handles GPU node groups natively if the template node advertises the GPU resource.

Q: How does it interact with Pod Disruption Budgets? A: CA aborts scale-down on a node when draining would violate any PDB.
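As an illustration of the PDB interaction, consider a budget like the following (names are illustrative):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: web
```

If `app: web` is running exactly 2 ready replicas and one of them sits on a scale-down candidate, evicting it would drop availability below `minAvailable: 2`, so CA skips that node and looks for another candidate.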
