Configs · May 26, 2026 · 3 min read

SQLBot — AI-Powered Text-to-SQL with RAG

An open-source conversational data analysis system that converts natural language questions into SQL queries using large language models and retrieval-augmented generation.

Agent-ready

Agent install ready

This asset can be installed after choosing the runtime, reviewing the plan, and running the matching command.

Native · 98/100 · Policy: allow
Agent surface: Any MCP/CLI agent
Type: Skill
Installation: Single
Trust: Established
Entry point: SQLBot Overview
Direct install command:
npx -y tokrepo@latest install 865f4127-5940-11f1-9bc6-00163e2b0d79 --target codex

Run this after confirming the plan in dry-run mode.

Introduction

SQLBot is an open-source text-to-SQL system developed by the DataEase team. It lets non-technical users ask data questions in plain language and get answers backed by SQL queries run against their own databases. Using RAG (retrieval-augmented generation), it retrieves relevant schema context before generating each query, and it targets MySQL, PostgreSQL, ClickHouse, and other databases.

What SQLBot Does

  • Converts natural language questions into executable SQL queries automatically
  • Uses RAG to learn your database schema, table relationships, and business terminology
  • Displays query results as tables, charts, and natural language summaries
  • Supports multiple database engines including MySQL, PostgreSQL, and ClickHouse
  • Provides a web-based chat interface for interactive data exploration

Architecture Overview

SQLBot consists of a frontend chat interface, a query generation backend, and a RAG pipeline. When a user asks a question, the system first retrieves relevant schema information and example queries from a vector store. This context is combined with the user's question and sent to an LLM (DeepSeek, OpenAI, and local models are supported) to generate SQL. The generated query is validated and executed against the target database, and the results are formatted for display. A feedback loop lets users correct queries to improve future accuracy.
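
The flow described above (retrieve context, build a prompt, generate SQL, validate, execute) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not SQLBot's actual code: the function names, the `CONTEXT_DOCS` list, and the `fake_llm` stub are all hypothetical, keyword overlap stands in for vector similarity, and SQLite stands in for the target database.

```python
import re
import sqlite3

# Hypothetical stand-in for the vector store: schema snippets and example queries.
# A real deployment would rank by embedding similarity; keyword overlap is used here.
CONTEXT_DOCS = [
    "Table orders(id, customer_id, amount, created_at)",
    "Table customers(id, name, region)",
    "Example: total revenue -> SELECT SUM(amount) FROM orders",
]


def tokenize(text):
    """Lowercase word set used as a crude similarity signal."""
    return set(re.findall(r"[a-z_]+", text.lower()))


def retrieve(question, docs, k=2):
    """Return the k documents sharing the most words with the question."""
    q = tokenize(question)
    ranked = sorted(docs, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:k]


def build_prompt(question, context):
    """Combine the retrieved context with the user question."""
    return "Schema and examples:\n" + "\n".join(context) + f"\n\nQuestion: {question}\nSQL:"


def fake_llm(prompt):
    """Stub for the LLM call (assumption); always returns a fixed query."""
    return "SELECT SUM(amount) FROM orders"


def validate(sql):
    """Accept a single SELECT statement and reject anything else."""
    stripped = sql.strip().rstrip(";")
    if ";" in stripped or not stripped.lower().startswith("select"):
        raise ValueError("only single SELECT statements are allowed")
    return stripped


def answer(question, conn):
    """Run the whole pipeline and return the SQL plus its result rows."""
    context = retrieve(question, CONTEXT_DOCS)
    sql = validate(fake_llm(build_prompt(question, context)))
    return sql, conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer_id INTEGER, amount REAL, created_at TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [(1, 1, 10.0, "2024-01-01"), (2, 2, 32.5, "2024-01-02")],
)
sql, rows = answer("What is the total revenue from orders?", conn)
print(sql, rows)
```

The validation step is deliberately strict in this sketch: anything other than one SELECT statement is rejected before it reaches the database.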

Self-Hosting & Configuration

  • Deploy with Docker Compose — includes the web UI, backend service, and vector store
  • Configure database connections through the web interface or environment variables
  • Set your preferred LLM provider (OpenAI, DeepSeek, or local Ollama models) in .env
  • Import schema metadata automatically by pointing SQLBot at your database
  • Add business glossary terms and example queries to improve generation accuracy
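
As an illustration of the schema import and glossary steps above, the sketch below reads table and column metadata from a database and turns it, together with glossary terms, into text documents that a RAG index could store. It is a sketch under assumptions rather than SQLBot's implementation: `read_schema`, `to_documents`, and the glossary entry are hypothetical, and SQLite is used only because it ships with Python.

```python
import sqlite3


def read_schema(conn):
    """Return {table: [(column, declared_type), ...]} for every user table."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )
    ]
    schema = {}
    for table in tables:
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
        cols = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        schema[table] = [(c[1], c[2]) for c in cols]
    return schema


def to_documents(schema, glossary):
    """Render schema entries and glossary terms as plain-text documents."""
    docs = [
        f"Table {table}: " + ", ".join(f"{name} {ctype}" for name, ctype in cols)
        for table, cols in schema.items()
    ]
    docs += [f"Term '{term}': {meaning}" for term, meaning in glossary.items()]
    return docs


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")

# Hypothetical business glossary entry mapping a term to a SQL expression.
glossary = {"revenue": "SUM(orders.amount)"}
docs = to_documents(read_schema(conn), glossary)
for doc in docs:
    print(doc)
```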

Key Features

  • Schema-aware RAG pipeline that gives the LLM schema context a generic chatbot lacks, with the aim of producing more accurate SQL
  • Multi-database support with automatic dialect adaptation
  • Visual result rendering with auto-generated charts and data summaries
  • Query history and saved analyses for repeatable reporting
  • Supports Chinese and English for both questions and generated SQL

Comparison with Similar Tools

  • Vanna — Python library for text-to-SQL; SQLBot provides a ready-to-use web application
  • Chat2DB — AI database client; SQLBot focuses specifically on conversational analytics with RAG
  • WrenAI — semantic layer approach to text-to-SQL; SQLBot uses RAG with direct schema retrieval
  • DataGPT — commercial conversational analytics; SQLBot is fully open-source and self-hosted
  • PandasAI — Python-first data querying; SQLBot works directly with SQL databases without dataframes

FAQ

Q: Which LLMs work best with SQLBot? A: DeepSeek and GPT-4-class models produce the best results for complex queries. Smaller local models run through Ollama work well for simpler schemas.

Q: Do I need to train the system on my data? A: No training is needed. SQLBot automatically reads your schema metadata. Adding example queries and business terms to the RAG index improves accuracy but is optional.

Q: Can SQLBot handle joins across multiple tables? A: Yes. The RAG pipeline retrieves relationship information between tables, enabling the LLM to generate multi-table joins when the question requires it.
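
One way to picture the relationship information mentioned in this answer is as a list of join conditions derived from foreign keys. The sketch below is an assumption about how such hints could be produced, not a description of SQLBot's internals: `relationship_hints` is a hypothetical helper, and SQLite's `PRAGMA foreign_key_list` stands in for whatever metadata source the target database offers.

```python
import sqlite3


def relationship_hints(conn, table):
    """Return join conditions implied by the foreign keys declared on a table."""
    hints = []
    # PRAGMA foreign_key_list rows are (id, seq, table, from, to, on_update, on_delete, match).
    for row in conn.execute(f'PRAGMA foreign_key_list("{table}")'):
        ref_table, from_col, to_col = row[2], row[3], row[4]
        hints.append(f"{table}.{from_col} = {ref_table}.{to_col}")
    return hints


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, region TEXT)")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, "
    "customer_id INTEGER REFERENCES customers(id), "
    "amount REAL)"
)
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", [(1, "Ada", "EU"), (2, "Lin", "APAC")])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [(1, 1, 10.0), (2, 2, 5.0), (3, 1, 2.5)])

hints = relationship_hints(conn, "orders")
print(hints)

# A multi-table query that an LLM could write once it has seen the hint above.
join_sql = (
    "SELECT c.region, SUM(o.amount) FROM orders o "
    "JOIN customers c ON o.customer_id = c.id "
    "GROUP BY c.region ORDER BY c.region"
)
result = conn.execute(join_sql).fetchall()
print(result)
```

Adding such hints to the retrieved context means the model does not have to guess which columns link two tables.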

Q: Is my data sent to external APIs? A: Only schema metadata and the generated SQL context are sent to the LLM provider. Actual data rows stay within your infrastructure. Use a local model via Ollama for fully air-gapped operation.
