Scripts · Apr 16, 2026 · 3 min read

Kubeflow — Machine Learning Toolkit for Kubernetes

An open-source platform for deploying, orchestrating, and managing ML workflows on Kubernetes. Kubeflow brings portable and scalable machine learning pipelines, notebook servers, model training, and serving to any Kubernetes cluster.

TL;DR
Kubeflow deploys ML pipelines, notebook servers, distributed training, hyperparameter tuning, and model serving on any Kubernetes cluster.
§01

What it is

Kubeflow is an open-source platform for deploying, orchestrating, and managing machine learning workflows on Kubernetes. It provides Jupyter notebook servers, ML pipeline orchestration (Kubeflow Pipelines), distributed model training, hyperparameter tuning (Katib), and model serving (KServe). All components run as Kubernetes-native resources.

Kubeflow targets ML engineers and platform teams who run Kubernetes and want a standardized way to manage the ML lifecycle. It makes ML workflows portable across any Kubernetes cluster, whether on-premises, in the cloud, or hybrid.

§02

How it saves time or tokens

Kubeflow eliminates the need to build custom infrastructure for each ML workflow stage. Pipelines define reproducible multi-step workflows as code. Katib automates hyperparameter search across multiple trials. KServe handles model deployment with auto-scaling and A/B testing. Everything runs on Kubernetes, so you leverage existing cluster management skills and infrastructure.

§03

How to use

  1. Install Kubeflow on an existing Kubernetes cluster: kubectl apply -k 'github.com/kubeflow/manifests/example?ref=v1.9'.
  2. Access the dashboard by port-forwarding: kubectl port-forward svc/istio-ingressgateway -n istio-system 8080:80.
  3. Create notebooks, build pipelines, and submit training jobs through the web UI or SDK.
§04

Example

# Install Kubeflow on existing K8s cluster
kubectl apply -k 'github.com/kubeflow/manifests/example?ref=v1.9'

# Access the dashboard
kubectl port-forward svc/istio-ingressgateway -n istio-system 8080:80
# Open http://localhost:8080
# Define a Kubeflow Pipeline
from kfp import dsl

@dsl.component
def preprocess(data_path: str) -> str:
    # preprocessing logic (placeholder)
    processed_path = data_path + '.processed'
    return processed_path

@dsl.component
def train(data_path: str) -> str:
    # training logic (placeholder)
    model_path = data_path + '.model'
    return model_path

@dsl.pipeline(name='ML Pipeline')
def ml_pipeline(data_path: str):
    preprocess_task = preprocess(data_path=data_path)
    train(data_path=preprocess_task.output)
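To execute the pipeline above, it is typically compiled to a portable package and submitted through the KFP client. A minimal sketch, assuming the kfp v2 SDK is installed and the dashboard port-forward from the steps above is active; the host and run arguments are illustrative:

```python
from kfp import Client, compiler

# Compile the pipeline definition to a portable YAML package.
compiler.Compiler().compile(ml_pipeline, package_path='ml_pipeline.yaml')

# Submit a run through the KFP API (reachable via the port-forward).
client = Client(host='http://localhost:8080')
run = client.create_run_from_pipeline_package(
    'ml_pipeline.yaml',
    arguments={'data_path': '/data/raw'},
)
```

The compiled YAML is what makes the workflow portable: the same package can be uploaded to any Kubeflow Pipelines installation.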
§05

Common pitfalls

  • Kubeflow requires a functioning Kubernetes cluster with sufficient resources. The full installation consumes significant CPU and memory. Consider starting with a minimal profile.
  • Istio is a dependency for the full Kubeflow installation. If your cluster already runs a different service mesh, there may be conflicts.
  • Kubeflow Pipelines v2 uses a different SDK and pipeline format than v1. Check which version your installation supports before writing pipelines.

Frequently Asked Questions

Does Kubeflow require a specific Kubernetes provider?

No. Kubeflow runs on any Kubernetes cluster including GKE, EKS, AKS, and on-premises clusters. The installation uses standard Kubernetes resources. Some cloud providers offer pre-configured Kubeflow distributions.

What is Kubeflow Pipelines?

Kubeflow Pipelines is a component for defining and running multi-step ML workflows as directed acyclic graphs (DAGs). Each step runs in a container, and the pipeline handles data passing between steps, caching, and retry logic.
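As a stdlib-only illustration of that execution model (not the actual KFP engine), a run can be pictured as walking the step graph in dependency order and passing each step's output to its dependents; the step functions here are toy placeholders:

```python
# Toy pipeline steps: each consumes the output of its dependency.
def preprocess(raw: str) -> str:
    return raw.strip().lower()

def train(features: str) -> str:
    return f"model({features})"

def evaluate(model: str) -> float:
    return float(len(model))

# DAG as (step name, function, dependencies), in topological order:
# preprocess -> train -> evaluate
PIPELINE = [
    ("preprocess", preprocess, ["raw"]),
    ("train", train, ["preprocess"]),
    ("evaluate", evaluate, ["train"]),
]

def run_pipeline(raw: str) -> dict:
    outputs = {"raw": raw}
    for name, fn, deps in PIPELINE:
        # Each step reads its inputs from upstream outputs.
        outputs[name] = fn(*(outputs[d] for d in deps))
    return outputs

results = run_pipeline("  Iris Data  ")
print(results["evaluate"])  # → 16.0
```

In real KFP each step is a container, outputs move through artifact storage rather than memory, and cached step results can be reused across runs.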

How does Katib work for hyperparameter tuning?

Katib is Kubeflow's hyperparameter tuning system. You define a search space (learning rate, batch size, etc.), an objective metric, and a search algorithm (random, Bayesian, grid). Katib launches parallel trials and tracks the best configuration.
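The trial loop Katib automates can be sketched in plain Python (random search shown; Katib also supports Bayesian and grid). The objective below is a toy stand-in for a real validation metric, and the search-space values are illustrative:

```python
import random

random.seed(0)

# Search space: each entry is a sampler for one hyperparameter.
search_space = {
    "learning_rate": lambda: 10 ** random.uniform(-4, -1),
    "batch_size": lambda: random.choice([16, 32, 64, 128]),
}

def objective(params):
    # Toy metric that peaks near learning_rate=0.01, batch_size=32.
    lr_term = -abs(params["learning_rate"] - 0.01)
    bs_term = -abs(params["batch_size"] - 32) / 100
    return lr_term + bs_term

# Run trials and track the best configuration, as Katib does
# across parallel Kubernetes jobs.
best_params, best_score = None, float("-inf")
for trial in range(20):
    params = {name: sample() for name, sample in search_space.items()}
    score = objective(params)
    if score > best_score:
        best_params, best_score = params, score

print(best_params)
```

In Katib proper, each trial is a training job on the cluster and the objective metric is scraped from the job's output.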

What is KServe?

KServe (formerly KFServing) is Kubeflow's model serving component. It deploys trained models as scalable inference endpoints with auto-scaling, canary rollouts, and support for TensorFlow, PyTorch, XGBoost, and custom serving runtimes.
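A deployed InferenceService accepts requests in KServe's v1 "predict" protocol, where inputs travel in an "instances" list. A sketch of building (not sending) such a request; the host, model name, and feature rows are hypothetical:

```python
import json
import urllib.request

# KServe v1 protocol: POST /v1/models/<name>:predict with an
# "instances" list in the JSON body.
payload = {"instances": [[5.1, 3.5, 1.4, 0.2], [6.2, 2.9, 4.3, 1.3]]}
request = urllib.request.Request(
    "http://localhost:8080/v1/models/sklearn-iris:predict",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
print(request.get_method())  # → POST
```

Sending the request (urllib.request.urlopen(request)) returns a JSON body with a matching "predictions" list.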

How many cluster resources does Kubeflow need?

A minimal Kubeflow installation needs at least 4 CPUs and 8GB RAM. The full installation with all components requires more. Start with a minimal profile and enable components as needed.
