Introduction
Tantivy is a full-text search engine library written in Rust and inspired by Apache Lucene. It provides fast indexing and querying of text, numeric, date, and other structured fields, and it can be embedded directly into an application without running a separate search server.
What Tantivy Does
- Indexes and searches text documents with BM25 scoring and term-level queries
- Supports boolean, phrase, range, regex, and fuzzy search queries
- Handles numeric, date, faceted, and IP address field types
- Provides concurrent indexing with near-real-time search visibility
- Offers Python bindings via the tantivy-py package for cross-language use
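To make the BM25 scoring mentioned above concrete, here is a minimal, standard-library-only sketch of the per-term BM25 formula. It is not Tantivy's implementation: the function name, the parameter names, and the constants `K1 = 1.2` and `B = 0.75` are assumptions chosen because they are common defaults for this formula.

```rust
/// BM25 contribution of one query term to one document's score.
///
/// All names are hypothetical and chosen for this sketch:
/// - `term_freq`: occurrences of the term in the document field
/// - `doc_len` / `avg_doc_len`: field length in tokens and its index-wide average
/// - `doc_count`: total number of documents in the index
/// - `docs_with_term`: number of documents that contain the term
fn bm25_term_score(
    term_freq: f64,
    doc_len: f64,
    avg_doc_len: f64,
    doc_count: f64,
    docs_with_term: f64,
) -> f64 {
    // Assumed defaults, not read from any Tantivy configuration.
    const K1: f64 = 1.2;
    const B: f64 = 0.75;

    // Inverse document frequency: rare terms weigh more than common ones.
    let idf = (1.0 + (doc_count - docs_with_term + 0.5) / (docs_with_term + 0.5)).ln();

    // Length normalisation: longer-than-average documents get less credit
    // for the same number of occurrences.
    let norm = K1 * (1.0 - B + B * doc_len / avg_doc_len);

    // The term-frequency part saturates: it approaches (K1 + 1) as term_freq grows.
    idf * term_freq * (K1 + 1.0) / (term_freq + norm)
}
```

A document's score for a multi-term query is the sum of this value over the query terms it contains. Under these assumptions, a term that appears in few documents, or in a shorter-than-average document, scores higher than the same term in a common or long one.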
Architecture Overview
Tantivy stores data in segments, each containing an inverted index, column store, and positional data. Writes go to an in-memory segment that is periodically committed to disk. A merge policy compacts segments in the background. Searches fan out across segments and merge results. The architecture avoids global locks, allowing concurrent reads and writes. The storage format uses memory-mapped files for efficient I/O, and the codec compresses posting lists with bitpacking and SIMD-accelerated decoding.
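The segment lifecycle described above can be sketched as a toy model using only the standard library. This is an illustration, not Tantivy's code or API: `Segment`, `ToyIndex`, and every method name are hypothetical, document ids are global rather than per segment, and the "merge policy" simply compacts everything into a single segment.

```rust
use std::collections::{BTreeSet, HashMap};

/// One segment: a tiny inverted index mapping term -> sorted set of doc ids.
#[derive(Default)]
struct Segment {
    postings: HashMap<String, BTreeSet<u32>>,
}

impl Segment {
    fn add(&mut self, doc_id: u32, text: &str) {
        for term in text.split_whitespace() {
            self.postings
                .entry(term.to_lowercase())
                .or_default()
                .insert(doc_id);
        }
    }

    /// Look up a (lowercase) term in this segment only.
    fn search(&self, term: &str) -> Vec<u32> {
        self.postings
            .get(term)
            .map(|ids| ids.iter().copied().collect())
            .unwrap_or_default()
    }
}

/// Toy index: committed segments are searchable, the in-memory buffer is not.
#[derive(Default)]
struct ToyIndex {
    committed: Vec<Segment>,
    buffer: Segment,
    next_doc_id: u32,
}

impl ToyIndex {
    /// Writes land in the in-memory buffer and stay invisible until commit.
    fn add_document(&mut self, text: &str) -> u32 {
        let id = self.next_doc_id;
        self.next_doc_id += 1;
        self.buffer.add(id, text);
        id
    }

    /// Publish the buffer as a new searchable segment.
    fn commit(&mut self) {
        let segment = std::mem::take(&mut self.buffer);
        if !segment.postings.is_empty() {
            self.committed.push(segment);
        }
    }

    /// Stand-in for a merge policy: compact all committed segments into one.
    fn merge_all(&mut self) {
        let mut merged = Segment::default();
        for segment in self.committed.drain(..) {
            for (term, ids) in segment.postings {
                merged.postings.entry(term).or_default().extend(ids);
            }
        }
        self.committed.push(merged);
    }

    /// Fan the query out to every committed segment and merge the hits.
    fn search(&self, term: &str) -> Vec<u32> {
        let mut hits: Vec<u32> = self
            .committed
            .iter()
            .flat_map(|segment| segment.search(term))
            .collect();
        hits.sort_unstable();
        hits
    }
}
```

In this model, a document added with `add_document` is not returned by `search` until `commit` runs, and `merge_all` reduces the number of segments without changing search results. A real engine would also persist segments to disk and run merges in the background, which this sketch leaves out.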
Self-Hosting & Configuration
- Add tantivy as a Cargo dependency for embedded use in Rust applications
- Use tantivy-py for Python integration via pip install
- Configure schema with typed fields (TEXT, U64, F64, DATE, FACET, BYTES, IP)
- Set indexing parameters like heap size, merge policy, and commit frequency
- Deploy as part of your application binary with no external service dependencies
Key Features
- Written in Rust, so indexing and search run without garbage collection pauses
- Single-node performance comparable to or exceeding Lucene for many workloads
- Supports configurable tokenizers including language-specific stemmers
- Provides snippet generation and search result highlighting
- Powers Quickwit, the distributed search engine, as its core indexing library
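As a rough illustration of what a tokenizer and a highlighter do, the sketch below splits text into lowercase alphanumeric tokens and wraps matching words in `<b>` tags. It uses only the standard library. The function names, the `max_len` parameter, and the `<b>` markup are assumptions made for this example rather than Tantivy's API, and no stemming is performed.

```rust
/// Split text on non-alphanumeric characters, lowercase each token,
/// and drop tokens longer than `max_len` bytes (hypothetical parameter).
fn tokenize(text: &str, max_len: usize) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|token| !token.is_empty() && token.len() <= max_len)
        .map(|token| token.to_lowercase())
        .collect()
}

/// Wrap every word of `text` whose lowercase form is in `query_terms`
/// in `<b>` tags, leaving punctuation and spacing untouched.
fn highlight(text: &str, query_terms: &[&str]) -> String {
    let mut out = String::new();
    let mut word = String::new();
    // A trailing space acts as a sentinel so the last word is flushed.
    for c in text.chars().chain(std::iter::once(' ')) {
        if c.is_alphanumeric() {
            word.push(c);
            continue;
        }
        if !word.is_empty() {
            if query_terms.contains(&word.to_lowercase().as_str()) {
                out.push_str("<b>");
                out.push_str(&word);
                out.push_str("</b>");
            } else {
                out.push_str(&word);
            }
            word.clear();
        }
        out.push(c);
    }
    out.pop(); // remove the sentinel space
    out
}
```

A language-specific pipeline would add further stages, such as a stemmer, after the lowercasing step, so that the query and the indexed text are normalised the same way.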
Comparison with Similar Tools
- Apache Lucene — the Java equivalent, mature and widely used but requires JVM
- Bleve — full-text search library for Go, similar embedded approach
- MeiliSearch — search server with REST API, not an embeddable library
- Elasticsearch — distributed search platform, much heavier for simple use cases
- Sonic — lightweight search backend but fewer query features and field types
FAQ
Q: Is Tantivy a search server like Elasticsearch? A: No. Tantivy is a library you embed in your application. For a distributed search server built on Tantivy, see Quickwit.
Q: Can I use Tantivy from Python? A: Yes. The tantivy-py package provides Python bindings for indexing and searching.
Q: How does Tantivy handle concurrent writes? A: Tantivy uses a single IndexWriter per index, backed by a configurable number of indexing threads. Multiple application threads can add documents concurrently, and each commit makes the documents added so far searchable.
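The pattern in the answer above (many threads adding, one commit publishing) can be modelled with the standard library alone. This is a toy sketch, not Tantivy's IndexWriter: `ToyWriter`, its methods, and `index_concurrently` are hypothetical names, and a plain `Vec<String>` stands in for the index.

```rust
use std::sync::{Mutex, RwLock};
use std::thread;

/// Toy writer: any thread may add documents, but only `commit` publishes them.
#[derive(Default)]
struct ToyWriter {
    pending: Mutex<Vec<String>>,
    searchable: RwLock<Vec<String>>,
}

impl ToyWriter {
    /// Takes `&self`, so one writer can be shared by several threads.
    fn add_document(&self, doc: String) {
        self.pending.lock().unwrap().push(doc);
    }

    /// Move everything added so far into the searchable set.
    fn commit(&self) {
        let mut pending = self.pending.lock().unwrap();
        self.searchable.write().unwrap().append(&mut pending);
    }

    fn num_searchable(&self) -> usize {
        self.searchable.read().unwrap().len()
    }
}

/// Add `docs_per_thread` documents from each of `threads` threads and
/// return the searchable document counts before and after the commit.
fn index_concurrently(threads: usize, docs_per_thread: usize) -> (usize, usize) {
    let writer = ToyWriter::default();
    // Scoped threads may borrow `writer`; all are joined when the scope ends.
    thread::scope(|scope| {
        for t in 0..threads {
            let writer = &writer;
            scope.spawn(move || {
                for d in 0..docs_per_thread {
                    writer.add_document(format!("doc {d} from thread {t}"));
                }
            });
        }
    });
    let before = writer.num_searchable();
    writer.commit();
    (before, writer.num_searchable())
}
```

With four threads adding 25 documents each, nothing is searchable before the commit and all 100 documents are searchable after it.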
Q: Does Tantivy support distributed search? A: Tantivy itself is single-node. Quickwit builds distributed search on top of Tantivy for cluster deployments.