Introduction
PostgresML is a PostgreSQL extension that integrates machine learning workflows directly into the database. Instead of moving data to external ML services, you train models, generate embeddings, and run predictions where your data already lives — eliminating data movement overhead and simplifying your architecture.
What PostgresML Does
- Trains classification, regression, and clustering models inside PostgreSQL
- Generates text embeddings using transformer models with a single SQL call
- Runs inference on live data without extracting it from the database
- Provides vector search capabilities for similarity queries
- Supports XGBoost, LightGBM, scikit-learn, and Hugging Face models
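The capabilities above can be sketched with a few SQL calls. This is a minimal example, not a full tutorial: the table and column names (customers, churned, etc.) are hypothetical, while the pgml.train, pgml.predict, and pgml.embed function names follow the PostgresML SQL API.

```sql
-- Train a classifier directly on an existing table (hypothetical table/columns).
SELECT pgml.train(
    project_name  => 'churn_predictor',
    task          => 'classification',
    relation_name => 'customers',
    y_column_name => 'churned',
    algorithm     => 'xgboost'
);

-- Run inference on live rows without extracting them from the database.
SELECT id,
       pgml.predict('churn_predictor', ARRAY[age, tenure, monthly_spend]) AS churn_score
FROM customers
WHERE signup_date > now() - interval '30 days';

-- Generate a text embedding with a transformer model in a single call.
SELECT pgml.embed('intfloat/e5-small-v2', 'machine learning in the database');
```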
Architecture Overview
PostgresML runs as a PostgreSQL extension written in Rust. It loads ML runtimes (Python, XGBoost, Torch) in-process, giving models direct access to shared memory buffers. Trained models are serialized and stored in PostgreSQL tables, versioned automatically. The query planner can push predicates down into ML operations, enabling efficient batch inference on filtered datasets.
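Because trained models are stored in ordinary PostgreSQL tables, they can be inspected with plain SQL. A sketch, assuming the catalog tables PostgresML keeps under the pgml schema (pgml.projects, pgml.models; exact column names may differ by version):

```sql
-- List trained model versions with their evaluation metrics.
SELECT m.id, p.name AS project, m.algorithm, m.metrics
FROM pgml.models m
JOIN pgml.projects p ON p.id = m.project_id
ORDER BY m.id DESC;
```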
Self-Hosting & Configuration
- Deploy via Docker image or install the extension on an existing PostgreSQL instance
- Requires PostgreSQL 14+ with shared_preload_libraries configured
- GPU acceleration available by mounting NVIDIA devices into the container
- Configure model storage and caching via settings in the pgml schema
- Use the dashboard (optional web UI) for experiment tracking and model comparison
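For an existing instance, enabling the extension is a two-step sketch (assumes the PostgresML packages are already installed on the host; library and function names per the PostgresML install docs):

```sql
-- 1. In postgresql.conf (requires a server restart):
--      shared_preload_libraries = 'pgml'

-- 2. In the target database:
CREATE EXTENSION pgml;

-- Verify the extension is active:
SELECT pgml.version();
```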
Key Features
- Zero data movement: train and predict where data lives
- SQL-native interface lowers the barrier for database teams
- Automatic model versioning and A/B deployment via SQL
- Built-in text embedding generation and vector search
- Horizontal read scaling through PostgreSQL replicas
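Versioning and deployment are likewise driven through SQL. A sketch of the deployment call, with strategy names taken from the PostgresML docs ('most_recent', 'best_score', 'rollback'):

```sql
-- Each pgml.train() call creates a new model version for the project.
-- Promote the best-scoring version to serve pgml.predict() traffic:
SELECT pgml.deploy('churn_predictor', strategy => 'best_score');

-- Roll back to the previously deployed version if the new model underperforms:
SELECT pgml.deploy('churn_predictor', strategy => 'rollback');
```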
Comparison with Similar Tools
- MindsDB — Separate server proxying to databases; PostgresML is a native extension
- MADlib — Older in-database ML for PostgreSQL; PostgresML supports modern transformers
- BigQuery ML — Cloud-only; PostgresML is self-hosted and open source
- MLflow — External experiment tracking; PostgresML keeps everything in one place
- pgvector — Vector search only; PostgresML adds training, inference, and embeddings
FAQ
Q: Do I need a GPU? A: GPUs accelerate transformer models significantly but are not required. Classical ML algorithms (XGBoost, etc.) run fine on CPU.
Q: Can I use Hugging Face models? A: Yes, PostgresML can download and run Hugging Face transformer models for embeddings and text generation.
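For example, pulling a Hugging Face model into a query can look like this (model names are illustrative; the pgml.transform task/inputs signature follows the PostgresML docs):

```sql
-- Text generation through a Hugging Face pipeline:
SELECT pgml.transform(
    task   => '{"task": "text-generation", "model": "gpt2"}'::JSONB,
    inputs => ARRAY['PostgresML lets you']
);
```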
Q: Does it affect database performance? A: Training is resource-intensive but runs as a background task. Inference is lightweight and can be parallelized across connections.
Q: Is it production-ready? A: Yes, PostgresML is used in production for real-time personalization and search ranking workloads.