Configs · May 3, 2026 · 3 min read

PostgresML — Machine Learning Inside PostgreSQL

PostgresML brings machine learning directly into PostgreSQL, allowing you to train models, run inference, and manage embeddings using SQL. No separate ML infrastructure needed — your database is your ML engine.

Introduction

PostgresML is a PostgreSQL extension that integrates machine learning workflows directly into the database. Instead of moving data to external ML services, you train models, generate embeddings, and run predictions where your data already lives — eliminating data movement overhead and simplifying your architecture.

What PostgresML Does

  • Trains classification, regression, and clustering models inside PostgreSQL
  • Generates text embeddings using transformer models with a single SQL call
  • Runs inference on live data without extracting it from the database
  • Provides vector search capabilities for similarity queries
  • Supports XGBoost, LightGBM, scikit-learn, and Hugging Face models
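
As a minimal sketch of the train-and-predict workflow listed above, the statements below follow the pattern of the PostgresML quick start and use its bundled `digits` sample dataset. The project name is arbitrary, and argument names may differ between releases, so treat this as illustrative rather than definitive.

```sql
-- Load the bundled sample dataset into the table pgml.digits.
SELECT * FROM pgml.load_dataset('digits');

-- Train an XGBoost classifier on it; 'target' is the label column.
SELECT * FROM pgml.train(
    project_name  => 'Handwritten Digit Image Classifier',
    task          => 'classification',
    relation_name => 'pgml.digits',
    y_column_name => 'target',
    algorithm     => 'xgboost'
);

-- Run inference with the project's deployed model, without exporting rows.
SELECT target,
       pgml.predict('Handwritten Digit Image Classifier', image) AS prediction
FROM pgml.digits
LIMIT 10;
```

Because training reads from a relation name and prediction is a plain function call, both steps can sit in ordinary migrations or scheduled SQL jobs.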

Architecture Overview

PostgresML runs as a PostgreSQL extension written in Rust. It loads ML runtimes (Python, XGBoost, Torch) in-process, giving models direct access to shared memory buffers. Trained models are serialized and stored in PostgreSQL tables, versioned automatically. Because predictions are ordinary SQL function calls, a WHERE clause filters rows before inference runs, which keeps batch inference on filtered datasets efficient.
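
The two ideas in this section, models stored as rows and inference on filtered data, can be sketched as follows. The catalog table and column names (`pgml.projects`, `pgml.models`, `metrics`, `created_at`) are assumptions about the extension's schema and may vary by version. The `customers` table and the `Churn Model` project are hypothetical.

```sql
-- List trained models per project, newest first.
-- Assumes catalog tables pgml.projects and pgml.models with these columns.
SELECT p.name AS project,
       m.id   AS model_id,
       m.algorithm,
       m.metrics,
       m.created_at
FROM pgml.models AS m
JOIN pgml.projects AS p ON p.id = m.project_id
ORDER BY m.created_at DESC;

-- Batch inference on a filtered subset. PostgreSQL applies the WHERE clause
-- first, so pgml.predict is evaluated only for rows that pass it.
-- 'Churn Model' and the customers table are hypothetical; the feature order
-- must match the column order used at training time.
SELECT c.id,
       pgml.predict(
           'Churn Model',
           ARRAY[c.tenure_months, c.monthly_spend]::real[]
       ) AS churn_score
FROM customers AS c
WHERE c.region = 'EU'
  AND c.signed_up_at >= now() - interval '90 days';
```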

Self-Hosting & Configuration

  • Deploy via Docker image or install the extension on an existing PostgreSQL instance
  • Requires PostgreSQL 14+ with pgml listed in shared_preload_libraries
  • GPU acceleration available by mounting NVIDIA devices into the container
  • Configure model storage and caching via pgml schema settings
  • Use the dashboard (optional web UI) for experiment tracking and model comparison
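
For an existing PostgreSQL instance, the setup steps above might look like the sketch below. It assumes the extension binaries are already installed on the server, that you connect as a superuser, and that no other preload libraries need to be preserved.

```sql
-- Preload the library. This takes effect only after a server restart.
-- Caution: this overwrites the current value, so list any other
-- libraries you already preload alongside pgml.
ALTER SYSTEM SET shared_preload_libraries = 'pgml';

-- After restarting PostgreSQL, enable the extension in the target database.
CREATE EXTENSION IF NOT EXISTS pgml;

-- Confirm the extension loaded by asking for its version.
SELECT pgml.version();
```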

Key Features

  • Zero data movement: train and predict where data lives
  • SQL-native interface lowers the barrier for database teams
  • Automatic model versioning and A/B deployment via SQL
  • Built-in text embedding generation and vector search
  • Horizontal read scaling through PostgreSQL replicas
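
Model versioning and deployment through SQL can be sketched with `pgml.deploy`. The project name below is an example, and the strategy names (`most_recent`, `rollback`) reflect the documented options as best understood here; check them against your installed version.

```sql
-- Promote the most recently trained XGBoost model for this project.
SELECT * FROM pgml.deploy(
    project_name => 'Handwritten Digit Image Classifier',
    strategy     => 'most_recent',
    algorithm    => 'xgboost'
);

-- Revert to the previously deployed model if the new one misbehaves.
SELECT * FROM pgml.deploy(
    project_name => 'Handwritten Digit Image Classifier',
    strategy     => 'rollback'
);
```

Callers keep using `pgml.predict` with the project name, so switching the deployed model requires no change to application queries.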

Comparison with Similar Tools

  • MindsDB — Separate server proxying to databases; PostgresML is a native extension
  • MADlib — Older in-database ML for PostgreSQL; PostgresML supports modern transformers
  • BigQuery ML — Cloud-only; PostgresML is self-hosted and open source
  • MLflow — External experiment tracking; PostgresML keeps everything in one place
  • pgvector — Vector search only; PostgresML adds training, inference, and embeddings
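
In practice, pgvector and PostgresML can be combined rather than chosen between: pgvector stores and compares the vectors, while `pgml.embed` produces them. The sketch below assumes both extensions are installed. The `docs` table is hypothetical, and `intfloat/e5-small-v2` is one example of a model that outputs 384-dimensional embeddings.

```sql
CREATE EXTENSION IF NOT EXISTS vector;

-- Hypothetical table holding text and its embedding side by side.
CREATE TABLE docs (
    id        bigserial PRIMARY KEY,
    body      text NOT NULL,
    embedding vector(384)
);

INSERT INTO docs (body) VALUES
    ('PostgreSQL is a relational database.'),
    ('XGBoost is a gradient boosting library.'),
    ('Transformers produce text embeddings.');

-- Generate embeddings in place; the model is downloaded on first use.
UPDATE docs
SET embedding = pgml.embed('intfloat/e5-small-v2', body)::vector;

-- Rank rows by cosine distance (<=>) to the embedded query text.
SELECT id, body
FROM docs
ORDER BY embedding <=> pgml.embed('intfloat/e5-small-v2', 'Which tool stores tables?')::vector
LIMIT 2;
```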

FAQ

Q: Do I need a GPU? A: GPUs accelerate transformer models significantly but are not required. Classical ML algorithms (XGBoost, etc.) run fine on CPU.

Q: Can I use Hugging Face models? A: Yes, PostgresML can download and run Hugging Face transformer models for embeddings and text generation.
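
A hedged sketch of text generation with a Hugging Face model through `pgml.transform`: the model name `gpt2` and the `max_new_tokens` argument are examples only, and the first call downloads the model, so it may take a while.

```sql
-- Generate a continuation for one prompt.
-- task selects the pipeline and model; args are passed to the generation call.
SELECT pgml.transform(
    task   => '{"task": "text-generation", "model": "gpt2"}'::jsonb,
    inputs => ARRAY['PostgreSQL is'],
    args   => '{"max_new_tokens": 32}'::jsonb
) AS generated;
```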

Q: Does it affect database performance? A: Training is resource-intensive and competes with regular queries for CPU and memory, so schedule large training runs away from peak load. Inference is lightweight and can be parallelized across connections.

Q: Is it production-ready? A: Yes, PostgresML is used in production for real-time personalization and search ranking workloads.
