# sqlite-utils — Python + CLI for ETL Into SQLite

> Simon Willison's Python library + CLI for getting messy CSV/JSON/YAML into SQLite. Auto-schema inference, upserts, and one-line FTS indexing.

## Quick Use

1. `pip install sqlite-utils`
2. `sqlite-utils insert data.db tablename file.csv --csv` (or a `.json` file, without the `--csv` flag)
3. `sqlite-utils enable-fts data.db tablename col1 col2 --create-triggers` for full-text search

---

## Intro

sqlite-utils is the Python library + CLI that Simon Willison pairs with Datasette: get messy CSV / JSON / YAML / JSONL into SQLite with auto-schema inference, then enrich, transform, upsert, and index for full-text search with one-line commands.

Best for: data journalism imports, log ingestion, agent memory stores, any "I have data, I want it queryable" task.
Works with: any modern Python + SQLite 3.31+.
Setup time: 2 minutes.

---

### Install

```bash
pip install sqlite-utils
```

### CSV → SQLite (one command)

```bash
# Auto-detect column types from the data, including dates and integers
sqlite-utils insert data.db articles articles.csv --csv

# JSON array of objects, piped from stdin
curl https://api.example.com/articles | sqlite-utils insert data.db articles -

# JSONL stream
sqlite-utils insert data.db logs logs.jsonl --nl
```

### Add indexes + full-text search

```bash
# B-tree index on a column
sqlite-utils create-index data.db articles category

# FTS5 full-text search index across columns
sqlite-utils enable-fts data.db articles title body --create-triggers

# Now this is fast:
sqlite-utils search data.db articles "machine learning"
```

### Python API (richer than the CLI)

```python
import sqlite_utils

db = sqlite_utils.Database("data.db")

# Upsert keyed on the primary key: insert new rows, update existing ones
db["articles"].upsert_all([
    {"id": 1, "title": "Hello", "body": "A short intro post", "category": "intro"},
    {"id": 2, "title": "World", "body": "More introductory text", "category": "intro"},
], pk="id")

# Add a computed column
db["articles"].add_column("word_count", int)
for row in db["articles"].rows:
    db["articles"].update(row["id"], {"word_count": len(row["body"].split())})

# Transform the schema in place
db["articles"].transform(
    drop={"category"},
    rename={"body": "content"},
    column_order=["id", "title", "content"],
)
```

### Pair with Datasette

```bash
sqlite-utils insert data.db log access.csv --csv
datasette serve data.db
# instant web UI + JSON API for the imported data
```

---

### FAQ

**Q: Why not just use pandas + `to_sql`?**
A: pandas is fine if you already use it. sqlite-utils is built for shell-first workflows and unfamiliar data: auto-schema, upserts, FTS, and JSON line streams are first-class CLI commands, not flags. Most journalism / log-wrangling tasks finish faster from the shell.

**Q: Can it handle multi-million-row imports?**
A: Yes. SQLite handles billions of rows, and sqlite-utils streams JSONL/CSV without loading the full file into memory. For very large imports, use `--batch-size 1000` to control transaction size.

**Q: Does it work with sqlite3 in WAL mode?**
A: Yes. sqlite-utils respects existing pragmas. For Datasette-served databases, WAL mode is recommended so readers don't block writers during import.

---

## Source & Thanks

> Built by [Simon Willison](https://github.com/simonw). Licensed under Apache-2.0.
>
> [simonw/sqlite-utils](https://github.com/simonw/sqlite-utils) — ⭐ 1,700+

---
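### Appendix: what `enable-fts` does under the hood

The `enable-fts ... --create-triggers` and `search` commands above can be demystified with a standard-library sketch: roughly the FTS5 virtual table, sync triggers, and `MATCH` query that sqlite-utils sets up for you. This is an illustrative approximation, not the exact DDL sqlite-utils generates, and it assumes your Python's bundled SQLite was compiled with FTS5 (most builds are):

```python
import sqlite3

# In-memory stand-in for data.db, with the same "articles" shape as above
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, body TEXT);

-- External-content FTS5 index over title and body, comparable to:
--   sqlite-utils enable-fts data.db articles title body
CREATE VIRTUAL TABLE articles_fts USING fts5(
    title, body, content='articles', content_rowid='id'
);

-- Sync triggers, comparable to --create-triggers (insert/delete shown;
-- sqlite-utils also creates an update trigger)
CREATE TRIGGER articles_ai AFTER INSERT ON articles BEGIN
    INSERT INTO articles_fts (rowid, title, body)
    VALUES (new.id, new.title, new.body);
END;
CREATE TRIGGER articles_ad AFTER DELETE ON articles BEGIN
    INSERT INTO articles_fts (articles_fts, rowid, title, body)
    VALUES ('delete', old.id, old.title, old.body);
END;
""")

conn.executemany(
    "INSERT INTO articles (title, body) VALUES (?, ?)",
    [
        ("Intro to FTS", "full-text search inside SQLite"),
        ("Other news", "nothing relevant here"),
    ],
)

# Roughly the query behind `sqlite-utils search data.db articles "search"`
hits = conn.execute(
    """
    SELECT articles.title FROM articles
    JOIN articles_fts ON articles.id = articles_fts.rowid
    WHERE articles_fts MATCH ?
    ORDER BY articles_fts.rank
    """,
    ("search",),
).fetchall()
print(hits)  # [('Intro to FTS',)]
```

In practice sqlite-utils wraps all of this behind `enable-fts` and `search`, so you rarely write the DDL yourself; the sketch is only to show there is no magic beyond SQLite's own FTS5 machinery.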
Source: https://tokrepo.com/en/workflows/sqlite-utils-python-cli-for-etl-into-sqlite
Author: Simon Willison