Scripts · May 24, 2026 · 3 min read

Superduper — End-to-End AI Application Framework on Your Database

An open-source Python framework for building AI applications directly on existing databases, integrating vector search, LLM inference, and RAG without moving data.

Agent ready

Ready-to-run agent install

This asset can be installed after the agent chooses its runtime, checks the plan, and runs the matching command.

Native · 98/100 · Policy: allow
Agent surface: Any MCP/CLI agent
Kind: Skill
Install: Single
Trust: Established
Entrypoint: Superduper
Direct install command
npx -y tokrepo@latest install ea8d6896-57ad-11f1-9bc6-00163e2b0d79 --target codex

Run after dry-run confirms the install plan.

Introduction

Superduper is an open-source framework that brings AI capabilities directly to your existing database. Instead of extracting data into separate ML pipelines, Superduper lets you apply models, embeddings, and LLM-powered features as database-native operations, keeping your data in place while adding intelligence on top.

What Superduper Does

  • Applies ML models and LLMs directly to database records
  • Creates vector indexes for semantic search without a separate vector DB
  • Builds RAG pipelines that query your existing data stores
  • Triggers model inference automatically when new data arrives
  • Supports MongoDB, PostgreSQL, MySQL, SQLite, and S3 as backends
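
To make the "vector search without a separate vector DB" idea concrete, here is a minimal sketch that stores embeddings next to the records in an ordinary SQLite table and ranks rows by cosine similarity. It uses only the Python standard library and does not call Superduper's API. The `toy_embed` function, the `docs` table, and the `search` helper are hypothetical names invented for illustration, and the brute-force scan stands in for whatever index a real deployment would use.

```python
import json
import math
import sqlite3


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def toy_embed(text):
    """Hypothetical stand-in for an embedding model: counts of three keywords."""
    vocab = ("database", "vector", "model")
    words = text.lower().split()
    return [float(words.count(term)) for term in vocab]


# The embedding lives in the same table as the record it describes.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT, embedding TEXT)")
for body in ("vector search in a database", "train a model", "a model for vector data"):
    conn.execute(
        "INSERT INTO docs (body, embedding) VALUES (?, ?)",
        (body, json.dumps(toy_embed(body))),
    )


def search(query, k=2):
    """Return the k row bodies most similar to the query (brute-force scan)."""
    q = toy_embed(query)
    rows = conn.execute("SELECT body, embedding FROM docs").fetchall()
    scored = [(cosine(q, json.loads(emb)), body) for body, emb in rows]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [body for _, body in scored[:k]]


print(search("database vector"))
```

The same retrieval step is what a RAG pipeline would run before handing the matching rows to an LLM as context.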

Architecture Overview

Superduper wraps your database connection with an AI-aware layer. Models register as listeners on collections or tables, executing automatically on inserts and updates. Vector indexes are maintained alongside regular data using the database's native storage. A scheduler coordinates batch and real-time inference, while a compute backend (local, Ray, or Dask) handles parallel execution.
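
The listener idea described above can be sketched in a few lines of plain Python. This is an illustration of the pattern only, not Superduper's implementation: the `Table` class, its `listen`, `insert`, and `update` methods, and the `sentiment` function are all hypothetical, and the in-memory list stands in for a real collection or table.

```python
class Table:
    """Hypothetical in-memory table that runs registered models on every write."""

    def __init__(self, name):
        self.name = name
        self.rows = []
        self._listeners = []  # (output_key, model) pairs

    def listen(self, output_key, model):
        """Register a model whose output is stored under output_key."""
        self._listeners.append((output_key, model))

    def _apply_models(self, record):
        for key, model in self._listeners:
            record[key] = model(record)
        return record

    def insert(self, row):
        """Store a copy of the row with model outputs attached."""
        record = self._apply_models(dict(row))
        self.rows.append(record)
        return record

    def update(self, index, changes):
        """Apply changes to an existing row and recompute model outputs."""
        self.rows[index].update(changes)
        return self._apply_models(self.rows[index])


def sentiment(record):
    """Toy stand-in for a real classifier."""
    return "positive" if "great" in record["text"].lower() else "neutral"


reviews = Table("reviews")
reviews.listen("sentiment", sentiment)
stored = reviews.insert({"text": "Great framework"})
updated = reviews.update(0, {"text": "It works"})
print(stored, updated)
```

In this sketch the model runs synchronously inside the write. The architecture described above instead hands that work to a scheduler and a compute backend.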

Self-Hosting & Configuration

  • Install via pip with optional extras for your database backend
  • Connect by passing your existing database URI to the superduper() function
  • Define models as Python classes or import them from Hugging Face
  • Configure compute backends for distributed processing in YAML
  • Run all components together with Docker Compose
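
As a rough sketch of the connection step, the snippet below checks that a database URI points at one of the backends listed earlier and then hands it to `superduper()`. The scheme table and both helper names (`backend_for`, `connect`) are hypothetical. The import path `from superduper import superduper` is an assumption that may differ between releases, so check the documentation for the version you install.

```python
from urllib.parse import urlparse

# Hypothetical scheme-to-backend table, based only on the backends named in this article.
BACKENDS = {
    "mongodb": "MongoDB",
    "postgresql": "PostgreSQL",
    "mysql": "MySQL",
    "sqlite": "SQLite",
    "s3": "S3",
}


def backend_for(uri):
    """Return the backend name for a URI, ignoring any '+driver' suffix in the scheme."""
    scheme = urlparse(uri).scheme.split("+")[0]
    if scheme not in BACKENDS:
        raise ValueError(f"unsupported database URI scheme: {scheme!r}")
    return BACKENDS[scheme]


def connect(uri):
    """Validate the URI, then pass it to superduper()."""
    backend_for(uri)
    # Assumption: this import path may differ between Superduper releases.
    from superduper import superduper
    return superduper(uri)


print(backend_for("mongodb://localhost:27017/app"))
```

The import sits inside `connect` so the validation helper can be used and tested without the framework installed.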

Key Features

  • Database-native vector search eliminates the need for a separate vector store
  • Change-data-capture triggers keep AI outputs fresh as data changes
  • Multi-model pipelines chain embeddings, classifiers, and LLMs
  • Version control for models and outputs enables reproducibility
  • Works with both SQL and document databases without code changes
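
One way to picture a multi-model pipeline with versioned outputs is a list of steps, each tagged with a name and version, where every intermediate result is stored under a key that records which version produced it. The sketch below is plain Python, not Superduper's API. `Step`, `run_pipeline`, and the three toy functions are hypothetical, and the final step only builds a prompt string where a real pipeline would call an LLM.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Step:
    """One stage of a hypothetical pipeline, identified by name and version."""
    name: str
    version: int
    fn: Callable[[Any], Any]


def run_pipeline(steps, value):
    """Feed each step's output into the next and keep every output under a versioned key."""
    outputs = {}
    for step in steps:
        value = step.fn(value)
        outputs[f"{step.name}:v{step.version}"] = value
    return outputs


steps = [
    # Toy embedding: one number per word (its length).
    Step("embed", 1, lambda text: [float(len(w)) for w in text.split()]),
    # Toy classifier over the embedding.
    Step("classify", 2, lambda vec: "long" if sum(vec) > 10 else "short"),
    # Stand-in for an LLM call: only builds the prompt.
    Step("prompt", 1, lambda label: f"Summarize this {label} document."),
]

result = run_pipeline(steps, "superduper keeps data in place")
print(result)
```

Because each key carries the step version, output from `classify:v2` can sit alongside older `classify:v1` results and be traced back to the model that produced it.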

Comparison with Similar Tools

  • LangChain — orchestration framework; Superduper is database-first, not chain-first
  • Pinecone/Weaviate — standalone vector DBs; Superduper adds vectors to your existing DB
  • MindsDB — SQL-based AI queries; Superduper offers richer Python model integration
  • Feature stores (Feast) — batch feature serving; Superduper does real-time model application

FAQ

Q: Do I need to migrate my data to use Superduper?
A: No. Superduper connects to your existing database and operates on the data in place.

Q: Which embedding models are supported?
A: Models from Hugging Face, OpenAI, and Cohere, as well as custom PyTorch or TensorFlow models.

Q: Can I use it for production workloads?
A: Yes. Superduper supports distributed compute via Ray and is designed for production data volumes.

Q: How does vector search performance compare to dedicated vector databases?
A: For most use cases, performance is comparable. Dedicated vector DBs may be faster at very large scale (100M+ vectors).
