Scripts · Apr 30, 2026 · 3 min read

Tantivy — Full-Text Search Engine Library for Rust

Tantivy is a high-performance full-text search engine library written in Rust, inspired by Apache Lucene, providing indexing and search capabilities that can be embedded into any application.

Introduction

Tantivy is a full-text search engine library written in Rust, designed as a Lucene equivalent for the Rust ecosystem. It provides fast indexing and querying of text, numeric, and geo-spatial data, and it is embedded directly into an application, so no separate search server has to run.

What Tantivy Does

  • Indexes and searches text documents with BM25 scoring and term-level queries
  • Supports boolean, phrase, range, regex, and fuzzy search queries
  • Handles numeric, date, faceted, and IP address field types
  • Provides concurrent indexing with near-real-time search visibility
  • Offers Python bindings via the tantivy-py package for cross-language use
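
To make the BM25 scoring mentioned above concrete, here is a minimal sketch in plain Rust (standard library only). It is not Tantivy's code or API: `TinyIndex`, its whitespace tokenizer, and the constants `K1 = 1.2` and `B = 0.75` are assumptions chosen for illustration, and the formula is the textbook BM25 variant.

```rust
use std::collections::HashMap;

// Textbook BM25 parameters (assumed here, not read from Tantivy).
const K1: f64 = 1.2;
const B: f64 = 0.75;

/// Hypothetical toy index: term -> list of (doc id, term frequency).
pub struct TinyIndex {
    postings: HashMap<String, Vec<(usize, u32)>>,
    doc_lens: Vec<u32>,
}

impl TinyIndex {
    /// Tokenize on whitespace, lowercase, and record term frequencies.
    pub fn build(docs: &[&str]) -> Self {
        let mut postings: HashMap<String, Vec<(usize, u32)>> = HashMap::new();
        let mut doc_lens = Vec::with_capacity(docs.len());
        for (doc_id, text) in docs.iter().enumerate() {
            let mut tf: HashMap<String, u32> = HashMap::new();
            let mut len = 0u32;
            for tok in text.split_whitespace() {
                *tf.entry(tok.to_lowercase()).or_insert(0) += 1;
                len += 1;
            }
            doc_lens.push(len);
            for (term, freq) in tf {
                postings.entry(term).or_default().push((doc_id, freq));
            }
        }
        TinyIndex { postings, doc_lens }
    }

    /// Score every document that matches at least one query term, best first.
    pub fn search(&self, query: &str) -> Vec<(usize, f64)> {
        let n = self.doc_lens.len() as f64;
        if self.doc_lens.is_empty() {
            return Vec::new();
        }
        let avg_len = self.doc_lens.iter().sum::<u32>() as f64 / n;
        let mut scores: HashMap<usize, f64> = HashMap::new();
        for tok in query.split_whitespace() {
            if let Some(list) = self.postings.get(&tok.to_lowercase()) {
                // Rare terms (low document frequency) get a higher weight.
                let df = list.len() as f64;
                let idf = (1.0 + (n - df + 0.5) / (df + 0.5)).ln();
                for &(doc_id, tf) in list {
                    let tf = tf as f64;
                    // Length normalization: long documents are penalized.
                    let norm = K1 * (1.0 - B + B * self.doc_lens[doc_id] as f64 / avg_len);
                    *scores.entry(doc_id).or_insert(0.0) += idf * tf * (K1 + 1.0) / (tf + norm);
                }
            }
        }
        let mut ranked: Vec<(usize, f64)> = scores.into_iter().collect();
        // Highest score first; ties broken by ascending doc id.
        ranked.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap().then(a.0.cmp(&b.0)));
        ranked
    }
}
```

Calling `TinyIndex::build(&docs).search("quick fox")` returns `(doc id, score)` pairs sorted by descending score. Because of the length normalization term, a short document that matches both query terms can outrank a longer one that repeats a term.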

Architecture Overview

Tantivy stores an index as a set of segments, each containing an inverted index, a column store, and positional data. New documents are buffered in memory and written out as new segments, which become durable and searchable on commit. A merge policy compacts segments in the background. A search fans out across all segments and merges the per-segment results. Because committed segments are immutable, the design needs no global lock, so reads and writes can proceed concurrently. The storage format uses memory-mapped files for efficient I/O, and the codec compresses posting lists with bitpacking and SIMD-accelerated decoding.
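
The segment model can be sketched in a few lines of plain Rust. This is an illustration under simplifying assumptions, not Tantivy's implementation: `Segment` and `Index` are hypothetical types, document ids are global rather than segment-local, and there is no on-disk format, compression, or scoring.

```rust
use std::collections::BTreeMap;

/// Hypothetical immutable segment: term -> ascending list of doc ids.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct Segment {
    postings: BTreeMap<String, Vec<u32>>,
}

impl Segment {
    /// Build a segment from (doc id, text) pairs.
    pub fn from_docs(docs: &[(u32, &str)]) -> Self {
        let mut postings: BTreeMap<String, Vec<u32>> = BTreeMap::new();
        for &(doc_id, text) in docs {
            for tok in text.split_whitespace() {
                postings.entry(tok.to_lowercase()).or_default().push(doc_id);
            }
        }
        for list in postings.values_mut() {
            list.sort_unstable();
            list.dedup();
        }
        Segment { postings }
    }

    /// Look up one (already lowercased) term inside this segment only.
    pub fn search(&self, term: &str) -> &[u32] {
        self.postings.get(term).map(|v| v.as_slice()).unwrap_or(&[])
    }
}

/// Hypothetical index: just a list of segments.
pub struct Index {
    segments: Vec<Segment>,
}

impl Index {
    pub fn new(segments: Vec<Segment>) -> Self {
        Index { segments }
    }

    pub fn segment_count(&self) -> usize {
        self.segments.len()
    }

    /// Fan a term query out to every segment, then merge the hits.
    pub fn search(&self, term: &str) -> Vec<u32> {
        let term = term.to_lowercase();
        let mut hits: Vec<u32> = self
            .segments
            .iter()
            .flat_map(|s| s.search(&term).iter().copied())
            .collect();
        hits.sort_unstable();
        hits.dedup();
        hits
    }

    /// Compact all segments into one, as a merge policy might decide to do.
    pub fn merge_all(&mut self) {
        let mut merged = Segment::default();
        for seg in self.segments.drain(..) {
            for (term, ids) in seg.postings {
                merged.postings.entry(term).or_default().extend(ids);
            }
        }
        for list in merged.postings.values_mut() {
            list.sort_unstable();
            list.dedup();
        }
        self.segments.push(merged);
    }
}
```

The point of the sketch is that a merge changes how many segments a query has to visit, not which documents it finds: `search` returns the same ids before and after `merge_all`.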

Self-Hosting & Configuration

  • Add tantivy as a Cargo dependency for embedded use in Rust applications
  • Use tantivy-py for Python integration, installed with pip install tantivy
  • Configure schema with typed fields (TEXT, U64, F64, DATE, FACET, BYTES, IP)
  • Set indexing parameters like heap size, merge policy, and commit frequency
  • Deploy as part of your application binary with no external service dependencies

Key Features

  • Written in Rust, which has no garbage collector, so indexing and search run without GC pauses
  • Single-node performance comparable to or exceeding Lucene for many workloads
  • Supports configurable tokenizers including language-specific stemmers
  • Provides snippet generation and search result highlighting
  • Powers Quickwit, the distributed search engine, as its core indexing library
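
As a rough idea of what search result highlighting involves, the sketch below wraps matching words in `<b>` tags. It is a standalone illustration, not Tantivy's snippet generator: `highlight` is a hypothetical helper, it matches whole words case-insensitively without stemming, and it does not escape HTML in the input text.

```rust
use std::collections::HashSet;

/// Wrap every word of `text` that equals a query term (ignoring case)
/// in <b>…</b>, leaving punctuation and spacing untouched.
pub fn highlight(text: &str, query: &str) -> String {
    let terms: HashSet<String> = query
        .split_whitespace()
        .map(|t| t.to_lowercase())
        .collect();
    let mut out = String::with_capacity(text.len());
    let mut word = String::new();
    for ch in text.chars() {
        if ch.is_alphanumeric() {
            // Still inside a word: keep accumulating.
            word.push(ch);
        } else {
            // Word boundary: emit the pending word, then the separator.
            flush_word(&mut out, &mut word, &terms);
            out.push(ch);
        }
    }
    flush_word(&mut out, &mut word, &terms);
    out
}

/// Append `word` to `out`, highlighted if it is a query term, then clear it.
fn flush_word(out: &mut String, word: &mut String, terms: &HashSet<String>) {
    if word.is_empty() {
        return;
    }
    if terms.contains(&word.to_lowercase()) {
        out.push_str("<b>");
        out.push_str(word);
        out.push_str("</b>");
    } else {
        out.push_str(word);
    }
    word.clear();
}
```

A real snippet generator also picks the best-matching fragment of a long document and runs the text through the same tokenizer as the index, so that stemmed forms are highlighted too.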

Comparison with Similar Tools

  • Apache Lucene — the Java equivalent, mature and widely used but requires a JVM
  • Bleve — full-text search library for Go, similar embedded approach
  • Meilisearch — search server with REST API, not an embeddable library
  • Elasticsearch — distributed search platform, much heavier for simple use cases
  • Sonic — lightweight search backend but fewer query features and field types

FAQ

Q: Is Tantivy a search server like Elasticsearch? A: No. Tantivy is a library you embed in your application. For a distributed search server built on Tantivy, see Quickwit.

Q: Can I use Tantivy from Python? A: Yes. The tantivy-py package provides Python bindings for indexing and searching.

Q: How does Tantivy handle concurrent writes? A: Only one IndexWriter can be open per index at a time, and it runs a configurable number of indexing threads. Multiple application threads can add documents through that writer concurrently, and a commit makes them searchable.
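
The behavior described in the previous answer (many threads adding documents, with a commit as the visibility boundary) can be modeled with the standard library alone. The sketch below is a toy model, not Tantivy's `IndexWriter`: `ToyWriter`, `snapshot`, and `index_concurrently` are hypothetical names, and real segments, durability, and merging are left out.

```rust
use std::sync::{Arc, Mutex, RwLock};
use std::thread;

/// Toy writer: documents are buffered until `commit` publishes them.
pub struct ToyWriter {
    pending: Mutex<Vec<String>>,
    committed: RwLock<Arc<Vec<String>>>,
}

impl ToyWriter {
    pub fn new() -> Self {
        ToyWriter {
            pending: Mutex::new(Vec::new()),
            committed: RwLock::new(Arc::new(Vec::new())),
        }
    }

    /// Takes `&self`, so many threads can call it on one shared writer.
    pub fn add_document(&self, doc: String) {
        self.pending.lock().unwrap().push(doc);
    }

    /// Publish everything buffered so far as a new immutable snapshot.
    pub fn commit(&self) {
        let mut pending = self.pending.lock().unwrap();
        let mut current = self.committed.write().unwrap();
        let mut next: Vec<String> = (**current).clone();
        next.append(&mut pending);
        *current = Arc::new(next);
    }

    /// Readers get a snapshot that later commits never mutate.
    pub fn snapshot(&self) -> Arc<Vec<String>> {
        self.committed.read().unwrap().clone()
    }
}

/// Spawn `threads` workers that each add `per_thread` documents.
pub fn index_concurrently(writer: &Arc<ToyWriter>, threads: usize, per_thread: usize) {
    let handles: Vec<_> = (0..threads)
        .map(|t| {
            let w = Arc::clone(writer);
            thread::spawn(move || {
                for i in 0..per_thread {
                    w.add_document(format!("doc-{}-{}", t, i));
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
}
```

In this model, a reader that calls `snapshot()` before `commit()` sees none of the newly added documents, and a reader that calls it afterwards sees all of them. A snapshot taken earlier stays unchanged, which is the property that lets searches run while indexing continues.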

Q: Does Tantivy support distributed search? A: Tantivy itself is single-node. Quickwit builds distributed search on top of Tantivy for cluster deployments.
