Skills · May 3, 2026 · 1 min read

PostgresML — Machine Learning Inside PostgreSQL

PostgresML brings machine learning directly into PostgreSQL, allowing you to train models, run inference, and manage embeddings using SQL. No separate ML infrastructure needed — your database is your ML engine.

Universal CLI install command
npx tokrepo install 8f274465-46ca-11f1-9bc6-00163e2b0d79

Introduction

PostgresML is a PostgreSQL extension that integrates machine learning workflows directly into the database. Instead of moving data to external ML services, you train models, generate embeddings, and run predictions where your data already lives — eliminating data movement overhead and simplifying your architecture.

What PostgresML Does

  • Trains classification, regression, and clustering models inside PostgreSQL
  • Generates text embeddings using transformer models with a single SQL call
  • Runs inference on live data without extracting it from the database
  • Provides vector search capabilities for similarity queries
  • Supports XGBoost, LightGBM, scikit-learn, and Hugging Face models
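As a minimal sketch of the train-then-predict workflow listed above, the following uses the `pgml.train` and `pgml.predict` functions with the bundled digits sample dataset. The project name 'Handwritten Digits' is an arbitrary label chosen for illustration, and the example assumes the extension is already installed.

```sql
-- Load the sample digits dataset shipped with PostgresML into pgml.digits.
SELECT * FROM pgml.load_dataset('digits');

-- Train an XGBoost classifier on the table; 'target' is the label column.
-- 'Handwritten Digits' is an illustrative project name, not a required value.
SELECT * FROM pgml.train(
    project_name  => 'Handwritten Digits',
    task          => 'classification',
    relation_name => 'pgml.digits',
    y_column_name => 'target',
    algorithm     => 'xgboost'
);

-- Run inference on rows in place, comparing the label with the prediction.
SELECT target,
       pgml.predict('Handwritten Digits', image) AS prediction
FROM pgml.digits
LIMIT 10;
```

Training and inference are both plain `SELECT` statements, so they can be issued from any PostgreSQL client or driver.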

Architecture Overview

PostgresML runs as a PostgreSQL extension written in Rust. It loads ML runtimes (Python, XGBoost, Torch) in-process, giving models direct access to shared memory buffers. Trained models are serialized and stored in PostgreSQL tables, versioned automatically. Because prediction is exposed as an ordinary SQL function, WHERE clauses filter rows before inference runs, so batch inference only touches the rows a query actually selects.
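The storage and filtering behavior described above can be sketched as follows. This assumes a project named 'Handwritten Digits' was already trained on the `pgml.digits` sample table (an illustrative name), and the column names selected from `pgml.models` are an assumption that may differ between versions.

```sql
-- Trained models live in ordinary tables inside the pgml schema.
-- Column names here are assumed; check them with \d pgml.models first.
SELECT id, algorithm, status, created_at
FROM pgml.models
ORDER BY created_at DESC;

-- Batch inference on a filtered subset: the WHERE clause is applied first,
-- so pgml.predict is only evaluated for the rows that pass the filter.
SELECT target,
       pgml.predict('Handwritten Digits', image) AS prediction
FROM pgml.digits
WHERE target = 7
LIMIT 5;
```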

Self-Hosting & Configuration

  • Deploy via Docker image or install the extension on an existing PostgreSQL instance
  • Requires PostgreSQL 14+ with pgml listed in shared_preload_libraries
  • GPU acceleration available by mounting NVIDIA devices into the container
  • Configure model storage and caching via pgml schema settings
  • Use the dashboard (optional web UI) for experiment tracking and model comparison
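For an existing PostgreSQL instance, enabling the extension might look like the sketch below. It assumes the PostgresML packages are already installed on the server and that no other libraries need to be preloaded.

```sql
-- Preload the extension library. This takes effect only after a server restart.
-- Note: this overwrites the current value; if other libraries are already
-- listed in shared_preload_libraries, include them in the same string.
ALTER SYSTEM SET shared_preload_libraries = 'pgml';

-- After restarting PostgreSQL, enable the extension in the target database.
CREATE EXTENSION IF NOT EXISTS pgml;

-- Confirm the installation by printing the extension version.
SELECT pgml.version();
```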

Key Features

  • Zero data movement: train and predict where data lives
  • SQL-native interface lowers the barrier for database teams
  • Automatic model versioning and A/B deployment via SQL
  • Built-in text embedding generation and vector search
  • Horizontal read scaling through PostgreSQL replicas
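Model versioning and deployment through SQL can be sketched with `pgml.deploy`. The project name 'Handwritten Digits' is illustrative and assumes at least two models have been trained under it.

```sql
-- Promote the model with the best validation score for this project.
SELECT * FROM pgml.deploy(
    project_name => 'Handwritten Digits',
    strategy     => 'best_score'
);

-- Alternatively, promote the most recently trained XGBoost model.
SELECT * FROM pgml.deploy(
    project_name => 'Handwritten Digits',
    strategy     => 'most_recent',
    algorithm    => 'xgboost'
);

-- Revert to the previously deployed model if the new one misbehaves.
SELECT * FROM pgml.deploy(
    project_name => 'Handwritten Digits',
    strategy     => 'rollback'
);
```

Calls to `pgml.predict` always use whichever model is currently deployed for the project, so switching versions does not require changing application queries.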

Comparison with Similar Tools

  • MindsDB — Separate server proxying to databases; PostgresML is a native extension
  • MADlib — Older in-database ML for PostgreSQL; PostgresML supports modern transformers
  • BigQuery ML — Cloud-only; PostgresML is self-hosted and open source
  • MLflow — External experiment tracking; PostgresML keeps everything in one place
  • pgvector — Vector search only; PostgresML adds training, inference, and embeddings
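To make the pgvector comparison concrete, here is a rough sketch that generates embeddings with `pgml.embed` and stores them in a pgvector column for similarity search. The `docs` table and its contents are hypothetical, and the example assumes the pgvector extension is available and that the `intfloat/e5-small-v2` model (384 dimensions) can be downloaded.

```sql
-- pgvector provides the vector type and distance operators.
CREATE EXTENSION IF NOT EXISTS vector;

-- Hypothetical table for illustration.
CREATE TABLE docs (
    id        bigserial PRIMARY KEY,
    body      text NOT NULL,
    embedding vector(384)
);

INSERT INTO docs (body) VALUES
    ('PostgreSQL is a relational database'),
    ('XGBoost is a gradient boosting library'),
    ('Transformers produce text embeddings');

-- Generate embeddings inside the database and store them next to the text.
UPDATE docs
SET embedding = pgml.embed('intfloat/e5-small-v2', body)::vector(384);

-- Rank documents by cosine distance (<=>) to the embedded query text.
SELECT id, body
FROM docs
ORDER BY embedding <=> pgml.embed('intfloat/e5-small-v2', 'machine learning models')::vector(384)
LIMIT 3;
```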

FAQ

Q: Do I need a GPU? A: GPUs accelerate transformer models significantly but are not required. Classical ML algorithms (XGBoost, etc.) run fine on CPU.

Q: Can I use Hugging Face models? A: Yes, PostgresML can download and run Hugging Face transformer models for embeddings and text generation.
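As an illustration of running a Hugging Face model, the sketch below calls `pgml.transform` for text generation. The choice of `gpt2` and the generation arguments are assumptions made for the example; the model is downloaded on first use, so the server needs network access.

```sql
-- Generate text with a Hugging Face model from SQL.
-- 'gpt2' and max_length are illustrative choices, not recommendations.
SELECT pgml.transform(
    task   => '{"task": "text-generation", "model": "gpt2"}'::JSONB,
    inputs => ARRAY['PostgresML lets you'],
    args   => '{"max_length": 40}'::JSONB
) AS completion;
```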

Q: Does it affect database performance? A: Training is resource-intensive and competes with regular queries for CPU and memory, so it is best scheduled during off-peak hours. Inference is lightweight and can be parallelized across connections.

Q: Is it production-ready? A: Yes, PostgresML is used in production for real-time personalization and search ranking workloads.
