Configs · Apr 16, 2026 · 3 min read

kOps — Production-Grade Kubernetes Cluster Management

Create, upgrade, and manage production Kubernetes clusters on AWS, GCE, and other clouds with kOps, the official Kubernetes operations tool.

TL;DR
kOps automates the full lifecycle of production Kubernetes clusters: creation, upgrades, rolling updates, and teardown.
§01

What it is

kOps (Kubernetes Operations) is the official tool for provisioning and managing production-grade Kubernetes clusters on cloud infrastructure. It automates the full lifecycle -- creation, upgrades, rolling updates, and teardown -- while following best practices for high availability and security.

It is built for platform engineers and DevOps teams who need to run Kubernetes on AWS, GCE, DigitalOcean, or other cloud providers without relying on managed services like EKS or GKE.

§02

How it saves time or tokens

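kOps replaces dozens of manual steps (VPC setup, IAM roles, etcd configuration, node provisioning) with a single declarative spec. Rolling updates replace instances one at a time, so workloads that run multiple replicas can stay available during an upgrade. Running kops update cluster without --yes previews pending changes, and kops validate cluster checks that nodes and system components are healthy, which shortens the debugging cycle.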
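One way to work with the declarative spec is to export it, edit it under version control, and load it back. The following is a minimal sketch, assuming the cluster name k8s.example.com used elsewhere on this page and a KOPS_STATE_STORE that is already set; the function names are illustrative, not part of kOps.

```shell
set -eu

# Assumption: cluster name borrowed from the examples on this page.
CLUSTER="k8s.example.com"

export_spec() {
  # Dump the stored cluster spec as YAML into the file given as $1.
  kops get cluster --name="$CLUSTER" -o yaml > "$1"
}

apply_spec() {
  # Load the edited spec from $1 back into the state store.
  kops replace -f "$1"
  # Preview the resulting cloud changes; nothing is applied without --yes.
  kops update cluster --name="$CLUSTER"
}
```

Calling export_spec cluster.yaml, editing the file, and then calling apply_spec cluster.yaml gives a reviewable diff of the spec before any infrastructure changes.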

§03

How to use

  1. Install kOps: brew install kops
  2. Point kOps at a state store (an existing S3 bucket): export KOPS_STATE_STORE=s3://my-kops-state
  3. Create a cluster: kops create cluster --name=k8s.example.com --zones=us-east-1a
  4. Apply the cluster: kops update cluster --name=k8s.example.com --yes
  5. Validate: kops validate cluster
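The export in step 2 only tells kOps where the state lives; the bucket itself has to exist first. Below is a minimal sketch of creating it with the AWS CLI, assuming the bucket name my-kops-state from the steps above, the us-east-1 region, and AWS credentials that are already configured. The function name is illustrative.

```shell
set -eu

# Assumptions: bucket name and region are illustrative; adjust to your account.
BUCKET="my-kops-state"
REGION="us-east-1"

create_state_store() {
  # us-east-1 needs no LocationConstraint; other regions also require
  # --create-bucket-configuration LocationConstraint=<region>.
  aws s3api create-bucket --bucket "$BUCKET" --region "$REGION"

  # Enable versioning so earlier revisions of the cluster spec can be recovered.
  aws s3api put-bucket-versioning --bucket "$BUCKET" \
    --versioning-configuration Status=Enabled

  # Print the line to add to your shell profile; kOps reads this variable.
  echo "export KOPS_STATE_STORE=s3://$BUCKET"
}
```

Run create_state_store once per account, then continue with step 3.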
§04

Example

# Install kOps
brew install kops

# Point kOps at the state store (the S3 bucket must already exist)
export KOPS_STATE_STORE=s3://my-kops-state

# Create a high-availability cluster
# (older kOps releases use --master-count and --master-size instead)
kops create cluster \
  --name=k8s.example.com \
  --zones=us-east-1a,us-east-1b,us-east-1c \
  --control-plane-count=3 \
  --control-plane-size=t3.medium \
  --node-count=5 \
  --node-size=t3.large \
  --networking=calico

# Preview changes
kops update cluster --name=k8s.example.com

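# Apply; --admin writes admin credentials to kubeconfig so validate can authenticate
kops update cluster --name=k8s.example.com --yes --admin

# Validate cluster health, waiting up to 10 minutes for nodes to join
kops validate cluster --name=k8s.example.com --wait 10m

# Alternative to applying directly: generate Terraform files for review
kops update cluster --name=k8s.example.com --target=terraform --out=./tf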
§06

Common pitfalls

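  • The S3 state store should have versioning enabled; without it, an accidental deletion or bad edit can destroy your cluster spec with no way to recover it
  • Decide on DNS before creating the cluster: kOps needs either a real domain with a hosted zone or gossip-based DNS, which you select by giving the cluster a name ending in .k8s.local
  • Upgrading Kubernetes takes three steps in sequence: kops upgrade cluster to bump the version in the spec, kops update cluster to apply it, and kops rolling-update cluster to replace the running instances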
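The upgrade sequence from the last pitfall can be sketched as a single function. kops upgrade cluster first bumps the Kubernetes version in the spec; the update and rolling-update steps then apply it. This assumes the cluster name k8s.example.com from the example and a configured KOPS_STATE_STORE; the function name and the 10-minute wait are illustrative choices.

```shell
set -eu

# Assumption: cluster name borrowed from the example section.
CLUSTER="k8s.example.com"

upgrade_cluster() {
  # 1. Bump the Kubernetes version in the cluster spec to the recommended release.
  kops upgrade cluster --name="$CLUSTER" --yes
  # 2. Push the updated spec to the cloud resources.
  kops update cluster --name="$CLUSTER" --yes
  # 3. Replace running instances so they pick up the new configuration.
  kops rolling-update cluster --name="$CLUSTER" --yes
  # 4. Wait up to 10 minutes for the cluster to report healthy.
  kops validate cluster --name="$CLUSTER" --wait 10m
}
```

Because of set -e, a failure in any step stops the sequence before the next one runs. Dropping --yes from the first three commands turns each into a preview.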

Frequently Asked Questions

Which cloud providers does kOps support?

kOps officially supports AWS (the most mature) and GCE, with DigitalOcean, Hetzner, and OpenStack in beta. AWS has the most complete feature set, including private topology, bastion hosts, and Terraform output generation.

How does kOps compare to EKS or GKE?

kOps gives you full control over the Kubernetes control plane, which managed services abstract away. Use kOps when you need custom configurations, specific Kubernetes versions, or want to avoid vendor lock-in. Use managed services when you want less operational overhead.

Can kOps generate Terraform code?

Yes. Run kops update cluster --target=terraform --out=./tf to generate Terraform HCL files. This enables GitOps workflows where infrastructure changes go through pull request reviews before applying.
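A pull-request workflow built on that output might look like the sketch below. It assumes the cluster name k8s.example.com and the ./tf output directory from the command above, plus a Terraform release that supports the -chdir option; the function name is illustrative.

```shell
set -eu

# Assumptions: cluster name and output directory match the command above.
CLUSTER="k8s.example.com"
TF_DIR="./tf"

generate_and_plan() {
  # Write Terraform files instead of changing cloud resources directly.
  kops update cluster --name="$CLUSTER" --target=terraform --out="$TF_DIR"
  # Download providers, then show what Terraform would change.
  terraform -chdir="$TF_DIR" init
  terraform -chdir="$TF_DIR" plan
}
```

Commit the generated files, attach the plan output to the pull request, and run terraform -chdir=./tf apply only after review.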

How do rolling updates work in kOps?

kOps drains nodes one at a time, replaces them with updated instances, and waits for the new nodes to become ready before proceeding to the next. Workloads that run multiple replicas can stay available throughout, for both control plane and worker node upgrades.
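To see what a rolling update would touch before committing to it, or to limit it to one instance group, something like the following can help. The cluster name comes from the example on this page; the function names and the instance group name in the usage note are hypothetical.

```shell
set -eu

# Assumption: cluster name borrowed from the example section.
CLUSTER="k8s.example.com"

preview_rolling_update() {
  # Without --yes, kOps only reports which instance groups need replacing.
  kops rolling-update cluster --name="$CLUSTER"
}

roll_instance_group() {
  # Restrict the rolling update to the instance group named in $1.
  kops rolling-update cluster --name="$CLUSTER" --instance-group="$1" --yes
}
```

For example, roll_instance_group nodes-us-east-1a replaces only the workers in that group, which makes it possible to check workload behavior before rolling the rest of the cluster.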

Does kOps handle networking?

kOps supports multiple CNI plugins including Calico, Cilium, Flannel, and others. You choose the networking provider at cluster creation time. Private topology with bastion hosts is also supported on AWS.
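As an illustration, a private-topology cluster with a bastion host and Cilium could be requested as follows. This is a sketch under assumptions: the cluster name and zones are reused from the example section, Cilium is one arbitrary choice among the supported CNI plugins, and the function name is illustrative.

```shell
set -eu

# Assumption: cluster name borrowed from the example section.
CLUSTER="k8s.example.com"

create_private_cluster() {
  # --topology=private places instances in private subnets,
  # --bastion adds an SSH jump host, and --networking selects the CNI plugin.
  kops create cluster \
    --name="$CLUSTER" \
    --zones=us-east-1a,us-east-1b,us-east-1c \
    --topology=private \
    --bastion \
    --networking=cilium
}
```

As with the main example, this only writes the spec to the state store; kops update cluster with --yes is still needed to create the resources.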
