Configs · May 11, 2026 · 3 min read

Seldon Core — ML Model Serving on Kubernetes

An MLOps framework for deploying, monitoring, and managing machine learning models at scale on Kubernetes.

Introduction

Seldon Core is an open-source platform for deploying machine learning models on Kubernetes. It converts trained models into production REST/gRPC microservices with built-in monitoring, A/B testing, and canary deployments, bridging the gap between data science and production operations.

What Seldon Core Does

  • Deploys ML models as Kubernetes-native microservices via custom resources (see the minimal manifest sketch after this list)
  • Supports pre-built servers for scikit-learn, XGBoost, TensorFlow, PyTorch, and Triton
  • Provides inference graphs for multi-model pipelines with routers, combiners, and transformers
  • Enables A/B testing and canary rollouts with traffic splitting
  • Integrates with Prometheus and Grafana for request metrics and model monitoring
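
The custom-resource approach in the first bullet comes down to a short manifest. The sketch below assumes a scikit-learn model already exported to object storage; the deployment name, namespace, and modelUri are placeholders, not defaults.

    apiVersion: machinelearning.seldon.io/v1
    kind: SeldonDeployment
    metadata:
      name: iris-model              # hypothetical deployment name
      namespace: seldon             # hypothetical namespace
    spec:
      name: iris
      predictors:
      - name: default
        replicas: 1
        graph:
          name: classifier
          implementation: SKLEARN_SERVER                 # pre-built scikit-learn server
          modelUri: gs://my-bucket/models/sklearn/iris   # placeholder model location

Applying a manifest like this with kubectl is all the operator needs to create the pods, services, and ingress routes described in the next section.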

Architecture Overview

Seldon Core runs as a Kubernetes operator that watches SeldonDeployment custom resources. When a deployment is created, the operator generates the required pods, services, and Istio/Ambassador virtual services. Each inference server wraps the model in a standardized REST/gRPC interface. An orchestrator sidecar routes requests through multi-step inference graphs, handling request transformation and response aggregation.
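
As a rough illustration of that graph model, the predictor below chains a custom transformer in front of a scikit-learn model; the orchestrator sidecar calls the transformer first and passes its output to the classifier. The container image and modelUri are hypothetical.

    spec:
      name: pipeline
      predictors:
      - name: default
        componentSpecs:
        - spec:
            containers:
            - name: feature-transformer
              image: registry.example.com/feature-transformer:0.1   # hypothetical custom image
        graph:
          name: feature-transformer
          type: TRANSFORMER            # runs before its children
          children:
          - name: classifier
            type: MODEL
            implementation: SKLEARN_SERVER
            modelUri: s3://my-bucket/models/iris   # placeholder model location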

Self-Hosting & Configuration

  • Install via Helm into any Kubernetes cluster (1.18+)
  • Requires an ingress controller (Istio or Ambassador) for external access
  • Define models as SeldonDeployment YAML manifests with modelUri pointing to S3, GCS, or a PVC (a combined sketch follows this list)
  • Configure autoscaling with Kubernetes HPA or KEDA
  • Enable request logging by routing prediction payloads to Elasticsearch
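
A sketch combining two of the points above: a modelUri pointing at S3 with credentials pulled from a Kubernetes Secret, plus CPU-based autoscaling through the predictor's HPA spec. The bucket path, Secret name, and scaling thresholds are assumptions, and the metric block follows the older autoscaling/v2beta1 style, which may need adjusting for newer cluster versions.

    spec:
      predictors:
      - name: default
        graph:
          name: classifier
          implementation: XGBOOST_SERVER
          modelUri: s3://my-models/churn/xgboost    # placeholder bucket path
          envSecretRefName: s3-credentials          # hypothetical Secret holding S3 access keys
        componentSpecs:
        - spec:
            containers:
            - name: classifier
              resources:
                requests:
                  cpu: "1"
          hpaSpec:
            minReplicas: 1
            maxReplicas: 5
            metrics:
            - type: Resource
              resource:
                name: cpu
                targetAverageUtilization: 70        # assumed CPU threshold for scale-out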

Key Features

  • Language-agnostic model wrapping with pre-built and custom inference servers
  • Inference graph DSL for chaining models, transformers, and routers
  • Drift and outlier detection via Alibi Detect integration
  • Explainability endpoints using Alibi Explain for model transparency (sketched after this list)
  • V2 inference protocol compatible with KServe and Triton standards
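
The explainability endpoints, for example, are declared alongside the predictor rather than as a separate service. In the hypothetical sketch below, an Anchors explainer from Alibi Explain is attached to a tabular classifier; both modelUri values are placeholders for artifacts you would train and upload yourself.

    spec:
      predictors:
      - name: default
        graph:
          name: classifier
          implementation: SKLEARN_SERVER
          modelUri: gs://my-bucket/income/model        # placeholder model artifact
        explainer:
          type: AnchorTabular                          # Alibi Explain anchors explainer
          modelUri: gs://my-bucket/income/explainer    # placeholder fitted explainer artifact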

Comparison with Similar Tools

  • KServe — lighter-weight serverless inference; Seldon Core offers richer inference graph composition
  • BentoML — packaging-focused with BentoCloud; Seldon Core is Kubernetes-native from the start
  • Triton Inference Server — NVIDIA runtime engine; Seldon Core orchestrates Triton as one backend
  • Ray Serve — Python-first with Ray ecosystem; Seldon Core uses Kubernetes-native deployment model

FAQ

Q: Which model frameworks does Seldon Core support? A: scikit-learn, XGBoost, TensorFlow, PyTorch, ONNX, Triton, and custom Python/Java/Go servers.

Q: Can I run Seldon Core without Istio? A: Yes. Ambassador or nginx ingress controllers work as alternatives to Istio.

Q: How does Seldon Core handle model versioning? A: Deploy multiple model versions as separate predictors with traffic-split percentages for gradual rollout.
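
As a sketch of that rollout pattern, two predictors in one SeldonDeployment can carry traffic percentages; the split and URIs below are illustrative only.

    spec:
      name: churn
      predictors:
      - name: stable
        traffic: 75                                # majority of requests
        graph:
          name: classifier
          implementation: SKLEARN_SERVER
          modelUri: gs://my-bucket/churn/v1        # placeholder current model
      - name: canary
        traffic: 25                                # gradual rollout share
        graph:
          name: classifier
          implementation: SKLEARN_SERVER
          modelUri: gs://my-bucket/churn/v2        # placeholder candidate model

The operator reconfigures the Istio or Ambassador routes to split traffic accordingly, so promoting the canary is just a matter of adjusting the percentages and reapplying the manifest.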

Q: Is there a managed cloud offering? A: Yes. Seldon Deploy provides an enterprise platform with a management UI, audit trails, and enhanced monitoring.
