# Superduper — End-to-End AI Application Framework on Your Database

> An open-source Python framework for building AI applications directly on existing databases, integrating vector search, LLM inference, and RAG without moving data.

## Install

Install the package with pip, then save the Python snippet under Quick Use as a script file and run it.

## Quick Use
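```python
# Install first, from a shell: pip install superduper
from superduper import superduper, VectorIndex

# Connect to an existing database by URI.
db = superduper("mongodb://localhost:27017/mydb")

# `model` is a placeholder for a listener you have already defined.
db.apply(VectorIndex(indexing_listener=model))
```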

## Introduction
Superduper is an open-source framework that brings AI capabilities directly to your existing database. Instead of extracting data into separate ML pipelines, Superduper lets you apply models, embeddings, and LLM-powered features as database-native operations, keeping your data in place while adding intelligence on top.

## What Superduper Does
- Applies ML models and LLMs directly to database records
- Creates vector indexes for semantic search without a separate vector DB
- Builds RAG pipelines that query your existing data stores
- Triggers model inference automatically when new data arrives
- Supports MongoDB, PostgreSQL, MySQL, SQLite, and S3 as backends
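
To make the RAG item concrete, here is a minimal sketch of the retrieve-then-prompt pattern in plain Python. It does not use Superduper's API: `retrieve`, `build_prompt`, and the sample rows are hypothetical, and simple word overlap stands in for a real embedding-based lookup.

```python
from typing import List


def retrieve(question: str, rows: List[str], k: int = 1) -> List[str]:
    """Rank rows by how many lowercase words they share with the question."""
    q_terms = set(question.lower().split())
    ranked = sorted(
        rows,
        key=lambda row: len(q_terms & set(row.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question: str, context: List[str]) -> str:
    """Assemble the text that would be sent to an LLM (no model call is made here)."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"


# Hypothetical rows standing in for records already stored in a database.
rows = ["Orders ship within two days.", "Refunds take five business days."]
question = "How long do refunds take?"
prompt = build_prompt(question, retrieve(question, rows))
```

In a real deployment the retrieval step would presumably query a vector index over the existing data store rather than scan rows in memory.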

## Architecture Overview
Superduper wraps your database connection with an AI-aware layer. Models register as listeners on collections or tables, executing automatically on inserts and updates. Vector indexes are maintained alongside regular data using the database's native storage. A scheduler coordinates batch and real-time inference, while a compute backend (local, Ray, or Dask) handles parallel execution.
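
The listener idea can be illustrated with a toy in-memory collection. This is a sketch of the pattern only, under the assumption that a listener is a function run on each inserted record; `ToyCollection`, `add_listener`, and `word_count` are hypothetical names, not Superduper classes or methods.

```python
from typing import Callable, List


class ToyCollection:
    """In-memory stand-in for a database collection (hypothetical)."""

    def __init__(self) -> None:
        self.records: List[dict] = []
        self.listeners: List[Callable[[dict], None]] = []

    def add_listener(self, name: str, model: Callable[[dict], object]) -> None:
        """Register a model whose output is stored next to each new record."""

        def run(record: dict) -> None:
            record.setdefault("_outputs", {})[name] = model(record)

        self.listeners.append(run)

    def insert(self, record: dict) -> None:
        """Store the record, then run every registered listener on it."""
        self.records.append(record)
        for listener in self.listeners:
            listener(record)


def word_count(record: dict) -> int:
    """Trivial 'model': count whitespace-separated words in the text field."""
    return len(record["text"].split())


docs = ToyCollection()
docs.add_listener("word_count", word_count)
docs.insert({"text": "keep data in place"})
```

The toy version runs listeners synchronously inside `insert`; the architecture described above instead hands that work to a scheduler and a compute backend.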

## Self-Hosting & Configuration
- Install via pip with optional extras for your database backend
- Connect by passing your existing database URI to the `superduper()` function
- Models are defined as Python classes or imported from Hugging Face
- Configure compute backends for distributed processing in YAML
- Supports Docker Compose for running all components together
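
Because the connection is driven by a URI, the backend family can be read from the URI scheme. The helper below is a hypothetical illustration built on the standard library, useful for deciding which optional extras to install; it is not how Superduper itself dispatches on URIs.

```python
from urllib.parse import urlparse


def backend_from_uri(uri: str) -> str:
    """Return the backend family named by the URI scheme.

    Driver suffixes such as "+srv" or "+psycopg2" are dropped.
    """
    scheme = urlparse(uri).scheme
    return scheme.split("+", 1)[0]


mongo_backend = backend_from_uri("mongodb://localhost:27017/mydb")
srv_backend = backend_from_uri("mongodb+srv://cluster.example.com/mydb")
sqlite_backend = backend_from_uri("sqlite:///tmp/example.db")
```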

## Key Features
- Database-native vector search eliminates the need for a separate vector store
- Change-data-capture triggers keep AI outputs fresh as data changes
- Multi-model pipelines chain embeddings, classifiers, and LLMs
- Version control for models and outputs enables reproducibility
- Works with both SQL and document databases without code changes
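
As a rough picture of what vector search computes, the sketch below ranks stored vectors by cosine similarity to a query vector. It is a brute-force illustration in plain Python with made-up two-dimensional embeddings; `cosine`, `search`, and the `index` dictionary are hypothetical and say nothing about how Superduper stores or scans vectors.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(
    index: Dict[str, List[float]], query: List[float], k: int = 2
) -> List[Tuple[str, float]]:
    """Return the k record ids most similar to the query, best match first."""
    scored = [(key, cosine(vec, query)) for key, vec in index.items()]
    return sorted(scored, key=lambda kv: kv[1], reverse=True)[:k]


# Made-up embeddings keyed by record id.
index = {"doc-a": [1.0, 0.0], "doc-b": [0.0, 1.0], "doc-c": [0.7, 0.7]}
top = search(index, [1.0, 0.1], k=2)
```

A linear scan like this grows with the number of records, which is one reason dedicated indexes matter at very large scale.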

## Comparison with Similar Tools
- **LangChain** — orchestration framework; Superduper is database-first, not chain-first
- **Pinecone/Weaviate** — standalone vector DBs; Superduper adds vectors to your existing DB
- **MindsDB** — SQL-based AI queries; Superduper offers richer Python model integration
- **Feature stores (Feast)** — batch feature serving; Superduper does real-time model application

## FAQ
**Q: Do I need to migrate my data to use Superduper?**
A: No. Superduper connects to your existing database and operates in-place.

**Q: Which embedding models are supported?**
A: Models from Hugging Face, OpenAI, and Cohere, as well as custom PyTorch or TensorFlow models.

**Q: Can I use it for production workloads?**
A: Yes. Superduper supports distributed compute via Ray and is designed for production data volumes.

**Q: How does vector search performance compare to dedicated vector databases?**
A: For most use cases, performance is comparable. Dedicated vector DBs may be faster at very large scale (100M+ vectors).

## Sources
- https://github.com/superduper-io/superduper
- https://docs.superduper.io/

---
Source: https://tokrepo.com/en/workflows/asset-ea8d6896
Author: Script Depot