What is Seldon Core — ML Model Serving on Kubernetes?

Seldon Core is an MLOps framework for deploying, monitoring, and managing machine learning models at scale on Kubernetes.

Seldon Core — ML Model Serving on Kubernetes

Introduction

Seldon Core is an open-source platform for deploying machine learning models on Kubernetes. It converts trained models into production REST/gRPC microservices with built-in monitoring, A/B testing, and canary deployments, bridging the gap between data science and production operations.

What Seldon Core Does

Deploys ML models as Kubernetes-native microservices via custom resources
Provides pre-built inference servers for scikit-learn, XGBoost, TensorFlow, PyTorch, and Triton
Provides inference graphs for multi-model pipelines with routers, combiners, and transformers
Enables A/B testing and canary rollouts with traffic splitting
Integrates with Prometheus and Grafana for request metrics and model monitoring
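
The traffic-splitting item above can be sketched as two predictors whose `traffic` weights sum to 100. This is a minimal sketch, not Seldon's own tooling: the helper `canary_predictors`, the predictor and graph names, and the `gs://example-bucket/...` locations are hypothetical, and `SKLEARN_SERVER` is assumed as the pre-built server.

```python
from typing import Dict, List


def canary_predictors(stable_uri: str, canary_uri: str, canary_percent: int) -> List[Dict]:
    """Build two predictors that split traffic between a stable and a canary model."""
    if not 0 < canary_percent < 100:
        raise ValueError("canary_percent must be between 1 and 99")

    def predictor(name: str, uri: str, traffic: int) -> Dict:
        return {
            "name": name,
            "traffic": traffic,  # percentage of requests sent to this predictor
            "graph": {
                "name": "classifier",  # hypothetical graph node name
                "implementation": "SKLEARN_SERVER",  # assumed pre-built server
                "modelUri": uri,
            },
        }

    return [
        predictor("main", stable_uri, 100 - canary_percent),
        predictor("canary", canary_uri, canary_percent),
    ]


predictors = canary_predictors(
    "gs://example-bucket/models/v1",  # hypothetical model locations
    "gs://example-bucket/models/v2",
    canary_percent=10,
)
total = sum(p["traffic"] for p in predictors)
print(total)
```

The resulting list would sit under `spec.predictors` of a SeldonDeployment. Raising the canary percentage step by step gives a gradual rollout.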

Architecture Overview

Seldon Core runs as a Kubernetes operator that watches SeldonDeployment custom resources. When a deployment is created, the operator generates the required pods and services, plus the ingress routing for Istio or Ambassador (for example, Istio virtual services). Each inference server wraps the model in a standardized REST/gRPC interface. An orchestrator sidecar routes each request through the steps of a multi-step inference graph, handling request transformation and response aggregation.
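
To make the orchestrator's role concrete, here is a toy, in-process walk over an inference graph. It is a conceptual sketch only and does not reproduce Seldon's implementation: the real orchestrator calls each node over REST or gRPC, while this version calls plain Python functions. The node names, the `handlers` mapping, and the `run_graph` helper are all hypothetical.

```python
from typing import Any, Callable, Dict


def run_graph(node: Dict, payload: Any, handlers: Dict[str, Callable]) -> Any:
    """Recursively evaluate a graph node and its children."""
    fn = handlers[node["name"]]
    children = node.get("children", [])
    kind = node["type"]

    if kind in ("TRANSFORMER", "MODEL"):
        # Apply this step, then pass its output on to the child, if any.
        result = fn(payload)
        return run_graph(children[0], result, handlers) if children else result
    if kind == "ROUTER":
        # The router returns the index of the child that should handle the request.
        return run_graph(children[fn(payload)], payload, handlers)
    if kind == "COMBINER":
        # Send the same payload to every child, then aggregate the responses.
        return fn([run_graph(child, payload, handlers) for child in children])
    raise ValueError(f"unknown node type: {kind}")


graph = {
    "name": "scale", "type": "TRANSFORMER",
    "children": [{
        "name": "average", "type": "COMBINER",
        "children": [
            {"name": "model_a", "type": "MODEL"},
            {"name": "model_b", "type": "MODEL"},
        ],
    }],
}
handlers = {
    "scale": lambda xs: [x / 10 for x in xs],
    "model_a": lambda xs: sum(xs),
    "model_b": lambda xs: max(xs),
    "average": lambda outs: sum(outs) / len(outs),
}
result = run_graph(graph, [10, 20, 30], handlers)
print(result)  # → 4.5
```

The input is scaled to `[1.0, 2.0, 3.0]`, the two models return 6.0 and 3.0, and the combiner averages them.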

Self-Hosting & Configuration

Install via Helm into any Kubernetes cluster (1.18+)
Requires an ingress controller (Istio or Ambassador) for external access
Define models as SeldonDeployment YAML manifests with a modelUri that points to S3, GCS, or a PVC
Configure autoscaling with Kubernetes HPA or KEDA
Enable request logging by routing prediction payloads to Elasticsearch
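
As an illustration of the manifest step above, the following sketch builds a single-predictor SeldonDeployment as a Python dictionary and prints it as JSON. The deployment name, predictor and graph names, and the `s3://example-bucket/...` location are hypothetical, and the `seldon_deployment` helper is not part of Seldon. Credentials for private buckets are not shown.

```python
import json
from typing import Dict


def seldon_deployment(name: str, model_uri: str,
                      implementation: str = "SKLEARN_SERVER",
                      replicas: int = 1) -> Dict:
    """Return a minimal SeldonDeployment manifest with one predictor."""
    return {
        "apiVersion": "machinelearning.seldon.io/v1",
        "kind": "SeldonDeployment",
        "metadata": {"name": name},
        "spec": {
            "predictors": [{
                "name": "default",  # hypothetical predictor name
                "replicas": replicas,
                "graph": {
                    "name": "classifier",  # hypothetical graph node name
                    "implementation": implementation,  # pre-built server to use
                    "modelUri": model_uri,  # where the model artifact is stored
                },
            }],
        },
    }


manifest = seldon_deployment("iris-model", "s3://example-bucket/models/iris")
print(json.dumps(manifest, indent=2))
```

Because JSON is also valid YAML, the printed output can be saved to a file and applied with `kubectl apply -f`, assuming the Seldon Core operator is already installed in the cluster.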

Key Features

Language-agnostic model wrapping with pre-built and custom inference servers
Inference graph DSL for chaining models, transformers, and routers
Drift and outlier detection via Alibi Detect integration
Explainability endpoints using Alibi Explain for model transparency
Support for the V2 inference protocol, the same standard used by KServe and Triton
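
For the V2 inference protocol mentioned above, a request body is a JSON document listing named input tensors with a shape, a datatype, and the data. The sketch below builds such a body for a batch of feature rows. The helper `v2_infer_request`, the input name `input-0`, and the sample values are assumptions for illustration. In the protocol, the body is sent as a POST to a path of the form `/v2/models/<model_name>/infer`; the host and any ingress prefix depend on the cluster setup and are not shown.

```python
import json
from typing import Dict, List


def v2_infer_request(rows: List[List[float]], input_name: str = "input-0") -> Dict:
    """Build a V2 inference request body for a rectangular batch of float rows."""
    if not rows or any(len(row) != len(rows[0]) for row in rows):
        raise ValueError("rows must be a non-empty rectangular batch")
    # Flatten row by row; the shape field tells the server how to rebuild the batch.
    flat = [float(value) for row in rows for value in row]
    return {
        "inputs": [{
            "name": input_name,  # assumed tensor name
            "shape": [len(rows), len(rows[0])],
            "datatype": "FP32",
            "data": flat,
        }]
    }


body = v2_infer_request([[5.1, 3.5, 1.4, 0.2], [6.2, 2.9, 4.3, 1.3]])
print(json.dumps(body))
```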

Comparison with Similar Tools

KServe — lighter-weight serverless inference; Seldon Core offers richer inference graph composition
BentoML — packaging-focused with BentoCloud; Seldon Core is Kubernetes-native from the start
Triton Inference Server — NVIDIA runtime engine; Seldon Core orchestrates Triton as one backend
Ray Serve — Python-first and tied to the Ray ecosystem; Seldon Core uses a Kubernetes-native deployment model

FAQ

Q: Which model frameworks does Seldon Core support? A: scikit-learn, XGBoost, TensorFlow, PyTorch, ONNX, Triton, and custom Python/Java/Go servers.

Q: Can I run Seldon Core without Istio? A: Yes. Ambassador or nginx ingress controllers work as alternatives to Istio.

Q: How does Seldon Core handle model versioning? A: Deploy multiple model versions as separate predictors with traffic-split percentages for gradual rollout.

Q: Is there a managed cloud offering? A: Seldon's commercial product, Seldon Deploy, is an enterprise platform that adds a management UI, audit trails, and enhanced monitoring.
