Skills · May 11, 2026 · 3 min read

Seldon Core — ML Model Serving on Kubernetes

An MLOps framework for deploying, monitoring, and managing machine learning models at scale on Kubernetes.

Agent ready

This asset can be read and installed directly by agents

TokRepo exposes a universal CLI command, install contract, metadata JSON, adapter-aware plan, and raw content links so agents can judge fit, risk, and next actions.

Native · 98/100
Policy: allow
Agent surface: Any MCP/CLI agent
Kind: Skill
Install: Single
Trust: Established
Entrypoint: Seldon Core Overview
Universal CLI install command
npx tokrepo install f0a44fcc-4cd0-11f1-9bc6-00163e2b0d79

Introduction

Seldon Core is an open-source platform for deploying machine learning models on Kubernetes. It converts trained models into production REST/gRPC microservices with built-in monitoring, A/B testing, and canary deployments, bridging the gap between data science and production operations.
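To make the "REST microservice" idea concrete, the sketch below builds a request for Seldon Core's v1 prediction protocol, which accepts a JSON body with the input rows under data.ndarray. The host, namespace, and deployment name are hypothetical placeholders, and the URL pattern assumes the deployment is reached through an Istio or Ambassador ingress. No request is sent; the function only assembles the URL and body.

```python
import json


def build_prediction_request(host, namespace, deployment, rows):
    """Return (url, body) for a Seldon protocol REST prediction call."""
    # URL pattern assumed for a deployment exposed through the ingress.
    url = f"http://{host}/seldon/{namespace}/{deployment}/api/v1.0/predictions"
    # The Seldon protocol carries tabular input as a nested list under data.ndarray.
    body = json.dumps({"data": {"ndarray": rows}})
    return url, body


# Hypothetical host, namespace, and deployment name, for illustration only.
url, body = build_prediction_request(
    "localhost:8003", "seldon", "iris-model", [[5.1, 3.5, 1.4, 0.2]]
)
print(url)
print(body)
```

The resulting URL and body can be passed to any HTTP client as a POST with a Content-Type of application/json.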

What Seldon Core Does

  • Deploys ML models as Kubernetes-native microservices via custom resources
  • Ships pre-built inference servers for scikit-learn, XGBoost, TensorFlow, PyTorch, and Triton
  • Provides inference graphs for multi-model pipelines with routers, combiners, and transformers
  • Enables A/B testing and canary rollouts with traffic splitting
  • Integrates with Prometheus and Grafana for request metrics and model monitoring

Architecture Overview

Seldon Core runs as a Kubernetes operator that watches SeldonDeployment custom resources. When a deployment is created, the operator generates the required pods and services, along with the routing configuration for Istio or Ambassador. Each inference server wraps the model in a standardized REST/gRPC interface. An orchestrator sidecar routes requests through multi-step inference graphs, handling request transformation and response aggregation.
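The routing behavior described above can be pictured with a toy model. The following is a conceptual sketch only, not Seldon Core's orchestrator code: it assumes a graph of plain Python dictionaries whose node types mirror the transformer, router, combiner, and model roles, with hypothetical lambda functions standing in for real inference servers.

```python
from statistics import mean


def run_graph(node, payload):
    """Recursively evaluate a toy inference graph (illustrative only)."""
    kind = node["type"]
    children = node.get("children", [])
    if kind in ("MODEL", "TRANSFORMER"):
        # Apply this node, then hand its output to the single child, if any.
        out = node["fn"](payload)
        return run_graph(children[0], out) if children else out
    if kind == "ROUTER":
        # The router returns the index of the child that should handle the request.
        chosen = node["fn"](payload)
        return run_graph(children[chosen], payload)
    if kind == "COMBINER":
        # Every child sees the same payload; the combiner aggregates their outputs.
        return node["fn"]([run_graph(child, payload) for child in children])
    raise ValueError(f"unknown node type: {kind}")


# Hypothetical graph: halve the input, then average the outputs of two models.
graph = {
    "type": "TRANSFORMER",
    "fn": lambda x: x / 2,
    "children": [{
        "type": "COMBINER",
        "fn": mean,
        "children": [
            {"type": "MODEL", "fn": lambda x: x + 1},
            {"type": "MODEL", "fn": lambda x: x * 3},
        ],
    }],
}

result = run_graph(graph, 4.0)
print(result)  # 4.5
```

In a real deployment each node is a separate container and the calls are REST or gRPC requests rather than in-process function calls.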

Self-Hosting & Configuration

  • Install via Helm into any Kubernetes cluster (1.18+)
  • Requires an ingress controller (Istio or Ambassador) for external access
  • Define models as SeldonDeployment YAML manifests with modelUri pointing to S3, GCS, or PVC
  • Configure autoscaling with Kubernetes HPA or KEDA
  • Enable request logging by routing prediction payloads to Elasticsearch
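As a minimal sketch of the manifest mentioned in the list above, the snippet below builds a SeldonDeployment resource as a Python dictionary and prints it as JSON. It assumes the pre-built scikit-learn server and the machinelearning.seldon.io/v1 API; the deployment name and the S3 path are hypothetical, and the credentials a private bucket would need are omitted.

```python
import json


def sklearn_deployment(name, model_uri, replicas=1):
    """Build a minimal SeldonDeployment manifest for the pre-built sklearn server."""
    return {
        "apiVersion": "machinelearning.seldon.io/v1",
        "kind": "SeldonDeployment",
        "metadata": {"name": name},
        "spec": {
            "predictors": [
                {
                    "name": "default",
                    "replicas": replicas,
                    "graph": {
                        "name": "classifier",
                        "implementation": "SKLEARN_SERVER",
                        # Location of the serialized model artifact.
                        "modelUri": model_uri,
                    },
                }
            ]
        },
    }


# Hypothetical name and bucket path, for illustration only.
manifest = sklearn_deployment("iris-model", "s3://example-bucket/models/iris")
print(json.dumps(manifest, indent=2))
```

Kubernetes accepts JSON manifests as well as YAML, so the printed output could be saved to a file and applied with kubectl apply -f.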

Key Features

  • Language-agnostic model wrapping with pre-built and custom inference servers
  • Inference graph DSL for chaining models, transformers, and routers
  • Drift and outlier detection via Alibi Detect integration
  • Explainability endpoints using Alibi Explain for model transparency
  • V2 inference protocol compatible with KServe and Triton standards
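For the V2 inference protocol in particular, a request is a JSON document that lists named input tensors with a shape, a datatype, and the data. The sketch below assembles such a request for a two-dimensional float input; the model name and the input tensor name are hypothetical and depend on how the model was exported.

```python
import json


def v2_infer_request(model_name, rows, input_name="input-0"):
    """Return (path, body) for a V2 inference protocol REST call."""
    # Tensor data is sent flattened in row-major order alongside its shape.
    flat = [float(value) for row in rows for value in row]
    body = {
        "inputs": [
            {
                "name": input_name,  # hypothetical tensor name
                "shape": [len(rows), len(rows[0])],
                "datatype": "FP32",
                "data": flat,
            }
        ]
    }
    return f"/v2/models/{model_name}/infer", body


path, body = v2_infer_request("iris", [[5.1, 3.5, 1.4, 0.2], [6.2, 2.9, 4.3, 1.3]])
print(path)
print(json.dumps(body))
```

Because the same request shape is understood by KServe and Triton, a client written this way is not tied to a single serving stack.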

Comparison with Similar Tools

  • KServe: lighter-weight serverless inference; Seldon Core offers richer inference graph composition
  • BentoML: packaging-focused with BentoCloud; Seldon Core is Kubernetes-native from the start
  • Triton Inference Server: NVIDIA runtime engine; Seldon Core orchestrates Triton as one backend
  • Ray Serve: Python-first with the Ray ecosystem; Seldon Core uses a Kubernetes-native deployment model

FAQ

Q: Which model frameworks does Seldon Core support? A: scikit-learn, XGBoost, TensorFlow, PyTorch, ONNX, Triton, and custom Python/Java/Go servers.

Q: Can I run Seldon Core without Istio? A: Yes. Ambassador or nginx ingress controllers work as alternatives to Istio.

Q: How does Seldon Core handle model versioning? A: Deploy multiple model versions as separate predictors with traffic-split percentages for gradual rollout.

Q: Is there a managed cloud offering? A: Seldon Deploy is the commercial enterprise platform built on Seldon Core. It adds a management UI, audit trails, and enhanced monitoring.
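As a sketch of the gradual-rollout pattern from the versioning answer above, the helper below builds the predictors list of a SeldonDeployment with a traffic percentage on each version. The predictor names, the scikit-learn server choice, and the model URIs are hypothetical; only the predictors portion of the spec is shown.

```python
def canary_predictors(stable_uri, canary_uri, canary_percent):
    """Build two predictors whose traffic percentages sum to 100."""
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")

    def predictor(name, uri, traffic):
        return {
            "name": name,
            "traffic": traffic,  # percentage of requests routed to this predictor
            "graph": {
                "name": "classifier",
                "implementation": "SKLEARN_SERVER",
                "modelUri": uri,
            },
        }

    return [
        predictor("main", stable_uri, 100 - canary_percent),
        predictor("canary", canary_uri, canary_percent),
    ]


# Hypothetical model locations, for illustration only.
predictors = canary_predictors(
    "gs://example-bucket/iris/v1", "gs://example-bucket/iris/v2", 10
)
print([(p["name"], p["traffic"]) for p in predictors])
```

Raising canary_percent in steps and re-applying the manifest shifts more traffic to the new version; setting it to 100 completes the rollout.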
