Introduction
Cluster API (CAPI) is a Kubernetes sub-project under SIG Cluster Lifecycle that brings declarative, Kubernetes-style APIs to cluster creation, configuration, and management. Instead of using provider-specific CLI tools or Terraform scripts to manage Kubernetes clusters, CAPI lets you define clusters as Kubernetes custom resources that are reconciled by controllers running in a management cluster.
What Cluster API Does
- Provisions Kubernetes clusters on AWS, Azure, GCP, vSphere, OpenStack, bare metal, and more
- Manages the full cluster lifecycle: creation, scaling, upgrades, and deletion
- Uses Kubernetes-native CRDs (Cluster, Machine, MachineDeployment, MachineSet) for declarative management
- Supports pluggable infrastructure and bootstrap providers via a well-defined contract
- Enables GitOps workflows by storing cluster definitions as YAML manifests in Git
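As a sketch of what such a declarative definition looks like, here is a minimal MachineDeployment manifest for a set of worker nodes. All names (`my-cluster`, `my-cluster-md-0`) and the AWS provider kinds are illustrative assumptions, not taken from this document:

```yaml
# Hypothetical worker pool for a cluster named "my-cluster".
# The controllers reconcile this like a Deployment reconciles Pods:
# they create/replace Machines until the desired state is met.
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: my-cluster-md-0
spec:
  clusterName: my-cluster
  replicas: 3                      # scale workers declaratively
  template:
    spec:
      clusterName: my-cluster
      version: v1.29.0             # bumping this triggers a rolling upgrade
      bootstrap:
        configRef:                 # bootstrap provider (kubeadm here)
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: KubeadmConfigTemplate
          name: my-cluster-md-0
      infrastructureRef:           # infrastructure provider (AWS here)
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: AWSMachineTemplate
        name: my-cluster-md-0
```

Because this is an ordinary Kubernetes resource, it can live in Git and be applied by any GitOps tool.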
Architecture Overview
CAPI runs in a management cluster as a set of controllers that watch custom resources. The core controllers handle Cluster and Machine abstractions. Infrastructure providers (e.g., CAPA for AWS, CAPZ for Azure) implement the actual VM and network provisioning. Bootstrap providers (e.g., kubeadm) handle node initialization. Control plane providers manage the Kubernetes control plane lifecycle. When you apply a Cluster CR, the controllers coordinate across these providers to provision infrastructure, bootstrap nodes, and join them into a working cluster.
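The provider coordination described above is wired together through references on the Cluster resource itself. The following sketch (names and the AWS/kubeadm provider choices are assumptions for illustration) shows how a Cluster CR points at a control plane provider object and an infrastructure provider object:

```yaml
# Hypothetical top-level Cluster resource. The core controller follows
# controlPlaneRef and infrastructureRef to objects owned by the
# respective providers, which do the actual provisioning.
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: my-cluster
spec:
  clusterNetwork:
    pods:
      cidrBlocks: ["192.168.0.0/16"]
  controlPlaneRef:                 # control plane provider
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    kind: KubeadmControlPlane
    name: my-cluster-control-plane
  infrastructureRef:               # infrastructure provider (CAPA here)
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: AWSCluster
    name: my-cluster
```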
Self-Hosting & Configuration
- Start with an existing Kubernetes cluster (kind, minikube, or existing cluster) as the management cluster
- Use `clusterctl init` to install core CAPI and infrastructure provider components
- Configure infrastructure credentials via environment variables or Kubernetes secrets
- Define cluster topology using ClusterClass templates for standardized configurations
- Use `clusterctl move` to migrate CAPI resources between management clusters
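The steps above can be sketched as a shell session. The AWS provider, cluster name, Kubernetes version, and kubeconfig path are illustrative assumptions; any supported provider follows the same pattern:

```shell
# Initialize an existing cluster as the management cluster,
# installing core CAPI plus an infrastructure provider (AWS assumed).
export AWS_B64ENCODED_CREDENTIALS=$(clusterawsadm bootstrap credentials encode-as-profile)
clusterctl init --infrastructure aws

# Generate a workload cluster manifest from a template and apply it.
clusterctl generate cluster my-cluster \
  --kubernetes-version v1.29.0 > my-cluster.yaml
kubectl apply -f my-cluster.yaml

# Later, migrate CAPI resources to a new management cluster.
clusterctl move --to-kubeconfig=new-mgmt.kubeconfig
```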
Key Features
- Provider ecosystem: 20+ infrastructure providers covering major clouds, virtualization, and bare metal
- ClusterClass: reusable cluster topology templates for standardized fleet management
- MachineHealthCheck: automatic remediation of unhealthy nodes via replacement
- Rolling Kubernetes version upgrades for control plane and workers via machine replacement
- Cluster topology evolution: declarative changes to cluster shape without manual intervention
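As one example from the list above, automatic remediation is configured with a MachineHealthCheck resource. This is a minimal sketch; the cluster and deployment names are assumptions:

```yaml
# Hypothetical health check: Machines whose Node stays not-Ready for
# 5 minutes are deleted and replaced by their owning MachineDeployment.
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineHealthCheck
metadata:
  name: my-cluster-worker-mhc
spec:
  clusterName: my-cluster
  maxUnhealthy: 40%                # stop remediating if too many fail at once
  selector:
    matchLabels:
      cluster.x-k8s.io/deployment-name: my-cluster-md-0
  unhealthyConditions:
    - type: Ready
      status: Unknown
      timeout: 300s
    - type: Ready
      status: "False"
      timeout: 300s
```

The `maxUnhealthy` guard prevents a cluster-wide outage (e.g., a network partition) from triggering mass machine replacement.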
Comparison with Similar Tools
- Terraform + kubeadm — one-shot provisioning with no continuous reconciliation; CAPI controllers continuously reconcile and self-heal
- kOps — cluster lifecycle tool for AWS/GCE; CAPI is provider-agnostic with a pluggable architecture
- Rancher — multi-cluster management platform with UI; CAPI is a lower-level API-first framework
- Crossplane — general-purpose cloud control plane; CAPI is specialized for Kubernetes cluster lifecycle
FAQ
Q: What is a management cluster? A: The management cluster is a Kubernetes cluster where CAPI controllers run. It manages the lifecycle of workload clusters, which are the clusters your applications run on.
Q: Can I manage clusters across multiple clouds? A: Yes. A single management cluster can run multiple infrastructure providers and manage workload clusters on AWS, Azure, GCP, and other platforms simultaneously.
Q: What happens if the management cluster goes down? A: Workload clusters continue running independently. You lose the ability to make lifecycle changes until the management cluster is restored. CAPI resources can be moved to a new management cluster.
Q: Is Cluster API production-ready? A: Yes. CAPI reached v1 (GA) status and is used in production by multiple organizations for managing Kubernetes fleet operations.