# PostgresML: Machine Learning Inside PostgreSQL

> PostgresML brings machine learning directly into PostgreSQL, letting you train models, run inference, and manage embeddings with SQL. No separate ML infrastructure is needed: your database is your ML engine.

## Install

Save the content below to `.claude/skills/` or append it to your `CLAUDE.md`.

## Quick Use

```bash
# Start PostgresML with Docker (container port 5432 is published on host port 5433)
docker run -d --name postgresml -p 5433:5432 ghcr.io/postgresml/postgresml:latest

# Train a model from the shell; my_table and target_column are placeholders for your own data
psql -h localhost -p 5433 -U postgresml \
  -c "SELECT * FROM pgml.train('my_model', 'classification', 'my_table', 'target_column');"
```

## Introduction

PostgresML is a PostgreSQL extension that integrates machine learning workflows directly into the database. Instead of moving data to external ML services, you train models, generate embeddings, and run predictions where your data already lives, which eliminates data-movement overhead and simplifies your architecture.

## What PostgresML Does

- Trains classification, regression, and clustering models inside PostgreSQL
- Generates text embeddings with transformer models in a single SQL call
- Runs inference on live data without extracting it from the database
- Provides vector similarity search over stored embeddings
- Supports XGBoost, LightGBM, scikit-learn, and Hugging Face models

## Architecture Overview

PostgresML runs as a PostgreSQL extension written in Rust. It loads ML runtimes (Python, XGBoost, Torch) into the PostgreSQL backend process, so models run next to the data instead of behind a network hop. Trained models are serialized and stored in PostgreSQL tables, and each training run is versioned automatically. Because `pgml.predict()` is an ordinary SQL function, batch inference composes with `WHERE` and `JOIN`: predictions are computed only for the rows that pass the query's filters.
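The training, versioning, and filtered batch-inference flow described in the Architecture Overview can be sketched end to end in SQL. This is a minimal sketch under stated assumptions, not an official recipe: the `customer_churn` table, its columns, its synthetic data, and the `churn_demo` project name are hypothetical, and the final query assumes the `pgml.models` and `pgml.projects` catalog tables expose `project_id`, `name`, `algorithm`, and `metrics` columns in your PostgresML version.

```sql
-- Hypothetical training table: two numeric features and a 0/1 churn label.
CREATE TABLE customer_churn (
    tenure_months REAL,
    monthly_spend REAL,
    churned       INTEGER
);

-- Synthetic, deterministic rows: the first 6 months of each 48-month cycle are labeled as churned.
INSERT INTO customer_churn (tenure_months, monthly_spend, churned)
SELECT (i % 48)::REAL,
       (20 + (i * 7) % 80)::REAL,
       CASE WHEN i % 48 < 6 THEN 1 ELSE 0 END
FROM generate_series(1, 200) AS i;

-- Each pgml.train call adds a new model version to the 'churn_demo' project.
-- Features are all columns except y_column_name, in table column order.
SELECT * FROM pgml.train(
    project_name  => 'churn_demo',
    task          => 'classification',
    relation_name => 'customer_churn',
    y_column_name => 'churned',
    algorithm     => 'xgboost'
);

SELECT * FROM pgml.train(
    project_name  => 'churn_demo',
    task          => 'classification',
    relation_name => 'customer_churn',
    y_column_name => 'churned',
    algorithm     => 'linear'
);

-- Deploy whichever version scored best on the held-out test split.
SELECT * FROM pgml.deploy('churn_demo', strategy => 'best_score');

-- Batch inference over filtered rows only; the ARRAY order must match the training column order.
SELECT tenure_months,
       monthly_spend,
       pgml.predict('churn_demo', ARRAY[tenure_months, monthly_spend]) AS predicted_churn
FROM customer_churn
WHERE tenure_months < 12
LIMIT 10;

-- Inspect the stored model versions (column names are an assumption; adjust to your version).
SELECT m.id, m.algorithm, m.metrics
FROM pgml.models AS m
JOIN pgml.projects AS p ON p.id = m.project_id
WHERE p.name = 'churn_demo'
ORDER BY m.id;
```

Training twice with different algorithms leaves two model versions in the same project; `pgml.deploy` with the `best_score` strategy selects the one with the better test metric, and `pgml.predict` uses whichever version is currently deployed.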
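The embedding and vector-search capabilities listed under What PostgresML Does can be combined in a few statements. This is a minimal sketch, assuming the `pgvector` extension is installed alongside PostgresML and that the server can download the Hugging Face model `intfloat/e5-small`, which produces 384-dimensional embeddings; the `docs` table and its sample sentences are hypothetical.

```sql
-- Assumes pgvector is available in this database.
CREATE EXTENSION IF NOT EXISTS vector;

-- Hypothetical document table; 384 matches the output size of intfloat/e5-small.
CREATE TABLE docs (
    id        BIGSERIAL PRIMARY KEY,
    body      TEXT NOT NULL,
    embedding vector(384)
);

INSERT INTO docs (body)
VALUES ('PostgresML trains models with SQL.'),
       ('pgvector stores embeddings in Postgres.'),
       ('The weather is sunny today.');

-- Generate embeddings in place; e5 models expect 'passage: ' and 'query: ' prefixes.
UPDATE docs
SET embedding = pgml.embed('intfloat/e5-small', 'passage: ' || body)::vector(384);

-- Embed the query once, then rank documents by cosine distance (pgvector's <=> operator).
WITH q AS MATERIALIZED (
    SELECT pgml.embed('intfloat/e5-small', 'query: how do I train a model in SQL?')::vector(384) AS v
)
SELECT d.body,
       d.embedding <=> q.v AS cosine_distance
FROM docs AS d, q
ORDER BY cosine_distance
LIMIT 2;
```

Materializing the query embedding in a CTE keeps the model call to one per search instead of one per candidate row; for larger tables, a pgvector index on `embedding` would avoid a full scan.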
## Self-Hosting & Configuration

- Deploy via the Docker image, or install the extension on an existing PostgreSQL instance
- Requires PostgreSQL 14+ with `pgml` listed in `shared_preload_libraries`
- GPU acceleration is available by mounting NVIDIA devices into the container
- Configure model storage and caching via `pgml` schema settings
- Use the dashboard (an optional web UI) for experiment tracking and model comparison

## Key Features

- Zero data movement: train and predict where the data lives
- A SQL-native interface lowers the barrier for database teams
- Automatic model versioning and A/B deployment via SQL
- Built-in text embedding generation and vector search
- Horizontal read scaling through PostgreSQL replicas

## Comparison with Similar Tools

- **MindsDB**: a separate server that proxies queries to databases; PostgresML is a native extension
- **MADlib**: older in-database ML for PostgreSQL; PostgresML also supports modern transformer models
- **BigQuery ML**: cloud-only; PostgresML is self-hosted and open source
- **MLflow**: external experiment tracking; PostgresML keeps models, data, and tracking in one place
- **pgvector**: vector storage and search only; PostgresML adds training, inference, and embedding generation

## FAQ

**Q: Do I need a GPU?**
A: No. GPUs significantly accelerate transformer models but are not required. Classical ML algorithms such as XGBoost run fine on CPU.

**Q: Can I use Hugging Face models?**
A: Yes. PostgresML can download and run Hugging Face transformer models for embeddings and text generation.

**Q: Does it affect database performance?**
A: It can. `pgml.train()` runs in the session that calls it and is resource-intensive, so it competes with other queries on the same server; schedule it accordingly. Inference with classical models is comparatively light, and because each connection runs its own predictions, throughput scales across connections. Transformer inference is heavier and benefits from a GPU.

**Q: Is it production-ready?**
A: Yes. PostgresML is used in production for real-time personalization and search-ranking workloads.

## Sources

- https://github.com/postgresml/postgresml
- https://postgresml.org/docs

---

Source: https://tokrepo.com/en/workflows/postgresml-machine-learning-inside-postgresql-8f274465

Author: AI Open Source