Scripts · May 24, 2026 · 3 min read

Superduper — End-to-End AI Application Framework on Your Database

An open-source Python framework for building AI applications directly on existing databases, integrating vector search, LLM inference, and RAG without moving data.

Agent ready

This asset can be read and installed directly by agents

TokRepo exposes a universal CLI command, install contract, metadata JSON, adapter-aware plan, and raw content links so agents can judge fit, risk, and next actions.

Native · 98/100 · Policy: allow
Agent surface: Any MCP/CLI agent
Kind: Skill
Install: Single
Trust: Established
Entrypoint: Superduper
Universal CLI install command
npx tokrepo install ea8d6896-57ad-11f1-9bc6-00163e2b0d79

Introduction

Superduper is an open-source framework that brings AI capabilities directly to your existing database. Instead of extracting data into separate ML pipelines, Superduper lets you apply models, embeddings, and LLM-powered features as database-native operations, keeping your data in place while adding intelligence on top.

What Superduper Does

  • Applies ML models and LLMs directly to database records
  • Creates vector indexes for semantic search without a separate vector DB
  • Builds RAG pipelines that query your existing data stores
  • Triggers model inference automatically when new data arrives
  • Supports MongoDB, PostgreSQL, MySQL, SQLite, and S3 as backends
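
The idea of keeping vectors next to the records they describe, and reusing them for RAG retrieval, can be sketched with the standard library alone. This is not Superduper's API: the `embed` function is a toy keyword counter standing in for a real embedding model, the `docs` table and the `search` and `build_prompt` helpers are hypothetical names, and the lookup is a brute-force scan rather than a real vector index.

```python
import json
import math
import sqlite3


def embed(text):
    """Toy stand-in for an embedding model: counts of a few keywords."""
    vocab = ["database", "vector", "model", "search"]
    words = text.lower().split()
    return [float(words.count(term)) for term in vocab]


def cosine(a, b):
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# The embedding lives in the same table as the record it describes.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT, embedding TEXT)")
for body in [
    "vector search inside the database",
    "train a model on tabular data",
    "backup and restore the database",
]:
    conn.execute(
        "INSERT INTO docs (body, embedding) VALUES (?, ?)",
        (body, json.dumps(embed(body))),
    )


def search(query, k=2):
    """Return the k stored bodies most similar to the query (full scan)."""
    q = embed(query)
    rows = conn.execute("SELECT body, embedding FROM docs").fetchall()
    scored = [(cosine(q, json.loads(vec)), body) for body, vec in rows]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [body for _, body in scored[:k]]


def build_prompt(question):
    """Assemble a RAG-style prompt from the retrieved rows."""
    context = "\n".join(search(question))
    return f"Context:\n{context}\n\nQuestion: {question}"
```

Calling `build_prompt("vector search")` yields a prompt whose context section is drawn from the same table that holds the application data, which is the property the bullets above describe.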

Architecture Overview

Superduper wraps your database connection with an AI-aware layer. Models register as listeners on collections or tables, executing automatically on inserts and updates. Vector indexes are maintained alongside regular data using the database's native storage. A scheduler coordinates batch and real-time inference, while a compute backend (local, Ray, or Dask) handles parallel execution.
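
The listener idea described above can be illustrated with a small in-memory sketch. Everything here is an assumption made for illustration: `Collection`, `add_listener`, and the `outputs_word_count` key are hypothetical names, not Superduper's classes, and `word_count` stands in for a real model. Registering a listener backfills existing rows (the batch case), and later inserts run the model immediately (the real-time case).

```python
class Collection:
    """In-memory stand-in for a database collection or table."""

    def __init__(self):
        self.rows = []
        self.listeners = []

    def add_listener(self, field, model, output_key):
        """Register a model on a field and backfill rows already stored."""
        self.listeners.append((field, model, output_key))
        for row in self.rows:
            row[output_key] = model(row[field])

    def insert(self, row):
        """Store a copy of the row after running every registered model on it."""
        row = dict(row)
        for field, model, output_key in self.listeners:
            row[output_key] = model(row[field])
        self.rows.append(row)
        return row


def word_count(text):
    """Toy 'model': number of whitespace-separated tokens."""
    return len(text.split())


reviews = Collection()
reviews.insert({"text": "great product"})                       # stored before any listener exists
reviews.add_listener("text", word_count, "outputs_word_count")  # backfills the first row
reviews.insert({"text": "arrived late and damaged"})            # model runs on insert
```

In a real deployment the model call would be handed to the scheduler and compute backend rather than run inline, but the contract is the same: the model output is written next to the record that produced it.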

Self-Hosting & Configuration

  • Install via pip with optional extras for your database backend
  • Connect by passing your existing database URI to the superduper() function
  • Models are defined as Python classes or imported from Hugging Face
  • Configure compute backends for distributed processing in YAML
  • Supports Docker Compose for running all components together
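
Because the connection step takes a plain database URI, it can help to check the URI before handing it over. The helper below is a hypothetical convenience, not part of Superduper: `describe_uri` and `KNOWN_BACKENDS` are illustrative names, and the scheme strings are assumptions based on the backends listed earlier. The final commented line marks where the `superduper()` call mentioned above would go; its import path is not shown in this page, so it is left out.

```python
from urllib.parse import urlparse

# Assumed mapping from URI scheme to the backends named in this page.
KNOWN_BACKENDS = {
    "mongodb": "MongoDB",
    "postgresql": "PostgreSQL",
    "mysql": "MySQL",
    "sqlite": "SQLite",
}


def describe_uri(uri):
    """Return (backend name, database name) or raise on an unknown scheme."""
    parsed = urlparse(uri)
    backend = KNOWN_BACKENDS.get(parsed.scheme)
    if backend is None:
        raise ValueError(f"unrecognized database scheme: {parsed.scheme!r}")
    database = parsed.path.lstrip("/")
    return backend, database


uri = "mongodb://localhost:27017/documents"
backend, database = describe_uri(uri)
# db = superduper(uri)  # the connection call described above
```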

Key Features

  • Database-native vector search eliminates the need for a separate vector store
  • Change-data-capture triggers keep AI outputs fresh as data changes
  • Multi-model pipelines chain embeddings, classifiers, and LLMs
  • Version control for models and outputs enables reproducibility
  • Exposes the same Python API for SQL and document databases, so application code does not change with the backend
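
Chaining models while keeping outputs traceable to a model version could look roughly like the sketch below. It is a plain-Python illustration under stated assumptions: `Stage`, `run_pipeline`, and the `name:vN` key format are hypothetical, and `toy_embed` and `toy_classify` stand in for real models.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Stage:
    """One step in a pipeline, identified by model name and version."""
    name: str
    version: int
    fn: Callable


def run_pipeline(record, stages):
    """Feed each stage's output into the next, keyed by 'name:vN'."""
    outputs = {}
    value = record
    for stage in stages:
        value = stage.fn(value)
        outputs[f"{stage.name}:v{stage.version}"] = value
    return outputs


def toy_embed(text):
    """Toy 'embedding': [character count, word count]."""
    return [len(text), text.count(" ") + 1]


def toy_classify(vector):
    """Toy 'classifier': label by word count."""
    return "long" if vector[1] > 3 else "short"


stages = [Stage("embed", 1, toy_embed), Stage("classify", 2, toy_classify)]
result = run_pipeline("keeps data in place", stages)
```

Keying each output by model name and version means a later model upgrade writes to a new key instead of overwriting earlier results, which is one simple way to keep outputs reproducible.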

Comparison with Similar Tools

  • LangChain — orchestration framework; Superduper is database-first, not chain-first
  • Pinecone/Weaviate — standalone vector DBs; Superduper adds vectors to your existing DB
  • MindsDB — SQL-based AI queries; Superduper offers richer Python model integration
  • Feature stores (Feast) — batch feature serving; Superduper does real-time model application

FAQ

Q: Do I need to migrate my data to use Superduper? A: No. Superduper connects to your existing database and operates in-place.

Q: Which embedding models are supported? A: Any model from Hugging Face, OpenAI, or Cohere, as well as custom PyTorch or TensorFlow models.

Q: Can I use it for production workloads? A: Yes. Superduper supports distributed compute via Ray and is designed for production data volumes.

Q: How does vector search performance compare to dedicated vector databases? A: For most use cases, performance is comparable. Dedicated vector DBs may be faster at very large scale (100M+ vectors).
