Kubeflow — Machine Learning Toolkit for Kubernetes
An open-source platform for deploying, orchestrating, and managing ML workflows on Kubernetes. Kubeflow brings portable and scalable machine learning pipelines, notebook servers, model training, and serving to any Kubernetes cluster.
Ready-to-run agent install
To install this asset, the agent chooses its runtime, checks the install plan, and then runs the matching command.
npx -y tokrepo@latest install 984a00dd-398f-11f1-9bc6-00163e2b0d79 --target codex
Run after dry-run confirms the install plan.
What it is
Kubeflow is an open-source platform for deploying, orchestrating, and managing machine learning workflows on Kubernetes. It provides Jupyter notebook servers, ML pipeline orchestration (Kubeflow Pipelines), distributed model training, hyperparameter tuning (Katib), and model serving (KServe). All components run as Kubernetes-native resources.
Kubeflow targets ML engineers and platform teams who run Kubernetes and want a standardized way to manage the ML lifecycle. It makes ML workflows portable across Kubernetes clusters, whether on-premises, in the cloud, or hybrid.
How it saves time or tokens
Kubeflow eliminates the need to build custom infrastructure for each ML workflow stage. Pipelines define reproducible multi-step workflows as code. Katib automates hyperparameter search across multiple trials. KServe handles model deployment with auto-scaling and A/B testing. Everything runs on Kubernetes, so you leverage existing cluster management skills and infrastructure.
How to use
- Install Kubeflow on an existing Kubernetes cluster:
  kubectl apply -k 'github.com/kubeflow/manifests/example?ref=v1.9'
- Access the dashboard by port-forwarding:
  kubectl port-forward svc/istio-ingressgateway -n istio-system 8080:80
- Create notebooks, build pipelines, and submit training jobs through the web UI or SDK.
Example
# Install Kubeflow on an existing Kubernetes cluster (shell)
kubectl apply -k 'github.com/kubeflow/manifests/example?ref=v1.9'

# Access the dashboard (shell)
kubectl port-forward svc/istio-ingressgateway -n istio-system 8080:80
# Open http://localhost:8080

# Define a Kubeflow Pipeline (Python, KFP v2 SDK)
from kfp import dsl


@dsl.component
def preprocess(data_path: str) -> str:
    # Preprocessing logic goes here; the returned path is a placeholder.
    processed_path = data_path + '/processed'
    return processed_path


@dsl.component
def train(data_path: str) -> str:
    # Training logic goes here; the returned path is a placeholder.
    model_path = data_path + '/model'
    return model_path


@dsl.pipeline(name='ML Pipeline')
def ml_pipeline(data_path: str):
    # The output of preprocess becomes the input of train.
    preprocess_task = preprocess(data_path=data_path)
    train(data_path=preprocess_task.output)
Common pitfalls
- Kubeflow requires a functioning Kubernetes cluster with sufficient resources. The full installation consumes significant CPU and memory. Consider starting with a minimal profile.
- Istio is a dependency for the full Kubeflow installation. If your cluster already runs a different service mesh, there may be conflicts.
- Kubeflow Pipelines v2 uses a different SDK and pipeline format than v1. Check which version your installation supports before writing pipelines.
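One way to act on the last pitfall is to check which major version of the kfp SDK is installed before writing pipeline code. The sketch below uses only the Python standard library. The helper names `major_version` and `kfp_sdk_major` are hypothetical, and the check covers only the local SDK, not the Pipelines backend running in the cluster.

```python
from importlib import metadata
from typing import Optional


def major_version(version_string: str) -> int:
    """Return the leading integer of a version string such as '2.7.0'."""
    return int(version_string.split(".")[0])


def kfp_sdk_major() -> Optional[int]:
    """Return the installed kfp SDK major version, or None if kfp is absent."""
    try:
        return major_version(metadata.version("kfp"))
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    major = kfp_sdk_major()
    if major is None:
        print("kfp is not installed")
    elif major >= 2:
        print("kfp v2 SDK detected: use the v2 pipeline format")
    else:
        print("kfp v1 SDK detected: use the v1 pipeline format")
```

The backend version still has to be confirmed separately, for example from the Kubeflow dashboard or the manifests used for installation.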
Frequently Asked Questions
Does Kubeflow require a specific cloud provider?
No. Kubeflow runs on any Kubernetes cluster, including GKE, EKS, AKS, and on-premises clusters. The installation uses standard Kubernetes resources. Some cloud providers offer pre-configured Kubeflow distributions.
What is Kubeflow Pipelines?
Kubeflow Pipelines is a component for defining and running multi-step ML workflows as directed acyclic graphs (DAGs). Each step runs in a container, and the pipeline handles data passing between steps, caching, and retry logic.
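To make the DAG idea concrete, the following sketch orders a hypothetical four-step workflow so that every step runs after its dependencies. It is a conceptual illustration using Python's standard `graphlib` module, not how Kubeflow Pipelines schedules steps internally; the step names and the `execution_order` helper are made up for the example.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each step maps to the set of steps it depends on.
steps = {
    "preprocess": set(),
    "train": {"preprocess"},
    "evaluate": {"train"},
    "deploy": {"train", "evaluate"},
}


def execution_order(dag):
    """Return the steps in an order that respects every dependency."""
    return list(TopologicalSorter(dag).static_order())


order = execution_order(steps)
print(order)  # ['preprocess', 'train', 'evaluate', 'deploy']
```

Because this graph has only one valid ordering, the result is deterministic. A graph with a cycle would raise `graphlib.CycleError`, which is why pipelines must be acyclic.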
What is Katib?
Katib is Kubeflow's hyperparameter tuning system. You define a search space (learning rate, batch size, etc.), an objective metric, and a search algorithm (random, Bayesian, grid). Katib launches parallel trials and tracks the best configuration.
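The three inputs described above (search space, objective metric, search algorithm) can be sketched in plain Python. This is a minimal random-search illustration, not the Katib API: the `random_search` and `toy_objective` functions and the search space values are assumptions for the example, and the trials run sequentially here rather than in parallel on the cluster.

```python
import random

# Hypothetical search space: a (low, high) tuple is a continuous range,
# a list is a set of discrete choices.
SPACE = {
    "learning_rate": (1e-4, 1e-1),
    "batch_size": [16, 32, 64],
}


def toy_objective(params):
    """Stand-in for a validation loss; lower is better."""
    return (params["learning_rate"] - 0.01) ** 2 + abs(params["batch_size"] - 32) / 1000


def random_search(objective, space, trials, seed=0):
    """Sample `trials` configurations at random and keep the best one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("inf")
    for _ in range(trials):
        params = {}
        for name, domain in space.items():
            if isinstance(domain, tuple):
                params[name] = rng.uniform(*domain)
            else:
                params[name] = rng.choice(domain)
        score = objective(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


if __name__ == "__main__":
    params, score = random_search(toy_objective, SPACE, trials=20, seed=1)
    print(params, score)
```

In a real setup the objective would be a metric reported by a training job, and the search algorithm could be swapped for a Bayesian or grid strategy.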
What is KServe?
KServe (formerly KFServing) is Kubeflow's model serving component. It deploys trained models as scalable inference endpoints with auto-scaling, canary rollouts, and support for TensorFlow, PyTorch, XGBoost, and custom serving runtimes.
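A deployment is described by an InferenceService resource. The sketch below builds one as a Python dictionary and prints it as JSON, which `kubectl apply -f` also accepts. The field names follow the KServe `v1beta1` API as commonly documented and should be checked against the installed version; the resource name, namespace, and storage URI are placeholders, and `inference_service` is a hypothetical helper.

```python
import json


def inference_service(name, namespace, model_format, storage_uri, canary_percent=None):
    """Build an InferenceService manifest (assumed KServe v1beta1 layout)."""
    predictor = {
        "model": {
            "modelFormat": {"name": model_format},
            "storageUri": storage_uri,
        }
    }
    if canary_percent is not None:
        # Share of traffic routed to the newest revision during a canary rollout.
        predictor["canaryTrafficPercent"] = canary_percent
    return {
        "apiVersion": "serving.kserve.io/v1beta1",
        "kind": "InferenceService",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {"predictor": predictor},
    }


if __name__ == "__main__":
    manifest = inference_service(
        name="demo-model",                        # placeholder name
        namespace="my-namespace",                 # placeholder namespace
        model_format="xgboost",
        storage_uri="gs://example-bucket/model",  # placeholder location
        canary_percent=10,
    )
    print(json.dumps(manifest, indent=2))
```

Saving the output to a file and applying it with kubectl would create or update the endpoint, assuming KServe is installed in the cluster.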
What are the minimum resource requirements?
A minimal Kubeflow installation needs at least 4 CPUs and 8 GB RAM. The full installation with all components requires more. Start with a minimal profile and enable components as needed.
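A quick way to compare a node against these minimums is to parse the quantity strings Kubernetes reports for allocatable CPU and memory. The sketch below is illustrative: the helper names are hypothetical, only the common quantity suffixes are handled, and "8 GB" is treated as 8 GiB, which is an assumption.

```python
# Multipliers for common Kubernetes memory suffixes (binary first, then decimal).
_MEMORY_UNITS = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}


def cpu_to_cores(quantity: str) -> float:
    """Convert a CPU quantity such as '4' or '3920m' to cores."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000
    return float(quantity)


def memory_to_bytes(quantity: str) -> int:
    """Convert a memory quantity such as '8Gi' or '16374536Ki' to bytes."""
    for suffix, factor in _MEMORY_UNITS.items():
        if quantity.endswith(suffix):
            return int(float(quantity[: -len(suffix)]) * factor)
    return int(quantity)  # plain byte count


def meets_minimum(cpu: str, memory: str, min_cores: float = 4, min_bytes: int = 8 * 2**30) -> bool:
    """Check allocatable CPU and memory against the stated minimums."""
    return cpu_to_cores(cpu) >= min_cores and memory_to_bytes(memory) >= min_bytes


if __name__ == "__main__":
    print(meets_minimum("4", "8Gi"))
    print(meets_minimum("3920m", "16374536Ki"))
```

The input strings would typically come from the `status.allocatable` field of each node, for example via `kubectl get nodes -o json`.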