Scripts · Apr 30, 2026 · 3 min read

Tantivy — Full-Text Search Engine Library for Rust

Tantivy is a high-performance full-text search engine library written in Rust, inspired by Apache Lucene, providing indexing and search capabilities that can be embedded into any application.

Introduction

Tantivy is designed as the Rust ecosystem's counterpart to Lucene. It provides fast indexing and querying of text, numeric, date, and IP address data, and can be embedded directly into an application without running a separate search server.

What Tantivy Does

  • Indexes and searches text documents with BM25 scoring and term-level queries
  • Supports boolean, phrase, range, regex, and fuzzy search queries
  • Handles numeric, date, faceted, and IP address field types
  • Provides concurrent indexing with near-real-time search visibility
  • Offers Python bindings via the tantivy-py package for cross-language use
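
The BM25 scoring mentioned in the list above can be sketched in a few lines. This is a minimal standalone illustration, not Tantivy's internal code: the function names `idf` and `bm25_term_score` are hypothetical, and the constants K1 = 1.2 and B = 0.75 are the conventional defaults, assumed here rather than taken from this page.

```rust
/// Conventional BM25 defaults (an assumption of this sketch).
const K1: f64 = 1.2;
const B: f64 = 0.75;

/// Inverse document frequency in the Lucene-style form
/// ln(1 + (N - n + 0.5) / (n + 0.5)).
fn idf(doc_count: u64, docs_with_term: u64) -> f64 {
    let n = doc_count as f64;
    let df = docs_with_term as f64;
    (1.0 + (n - df + 0.5) / (df + 0.5)).ln()
}

/// BM25 contribution of a single term to a single document's score.
/// `tf` is the term frequency in the document; longer-than-average
/// documents are penalized through the length normalization term.
fn bm25_term_score(tf: u32, doc_len: u32, avg_doc_len: f64, idf_weight: f64) -> f64 {
    let tf = tf as f64;
    let norm = K1 * (1.0 - B + B * doc_len as f64 / avg_doc_len);
    idf_weight * tf * (K1 + 1.0) / (tf + norm)
}

fn main() {
    let idf_rare = idf(1_000, 10);
    let idf_common = idf(1_000, 900);
    let short_doc = bm25_term_score(2, 50, 100.0, idf_rare);
    let long_doc = bm25_term_score(2, 400, 100.0, idf_rare);
    println!("idf rare = {idf_rare:.3}, idf common = {idf_common:.3}");
    println!("short doc = {short_doc:.3}, long doc = {long_doc:.3}");
}
```

Rare terms receive a larger weight than common ones, and the same term frequency scores higher in a short document than in a long one. A full query score is the sum of these per-term contributions.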

Architecture Overview

Tantivy stores data in segments, each containing an inverted index, a column store, and positional data. Writes go to an in-memory segment that is flushed to disk, and its documents become searchable once the writer commits. A merge policy compacts small segments in the background. Searches fan out across segments and merge the per-segment results. Segments are immutable once written, so reads and writes can proceed concurrently without global locks. The storage format uses memory-mapped files for efficient I/O, and the codec compresses posting lists with bitpacking and SIMD-accelerated decoding.
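
To make the segment lifecycle concrete, here is a toy model using only the Rust standard library. It is a sketch under simplifying assumptions, not Tantivy's API: `Segment` and `ToyIndex` are hypothetical names, tokenization is a plain whitespace split, and the "merge policy" simply collapses every segment into one.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// A searchable unit: term -> sorted set of document ids.
#[derive(Default)]
struct Segment {
    postings: BTreeMap<String, BTreeSet<u32>>,
}

impl Segment {
    fn add(&mut self, doc_id: u32, text: &str) {
        for token in text.split_whitespace() {
            self.postings.entry(token.to_lowercase()).or_default().insert(doc_id);
        }
    }

    fn is_empty(&self) -> bool {
        self.postings.is_empty()
    }
}

/// Hypothetical toy index: committed segments plus one uncommitted buffer.
#[derive(Default)]
struct ToyIndex {
    committed: Vec<Segment>,
    pending: Segment,
    next_doc_id: u32,
}

impl ToyIndex {
    /// Buffer a document; it is not visible to `search` until `commit`.
    fn add_document(&mut self, text: &str) -> u32 {
        let id = self.next_doc_id;
        self.next_doc_id += 1;
        self.pending.add(id, text);
        id
    }

    /// Publish the pending buffer as a new segment.
    fn commit(&mut self) {
        if !self.pending.is_empty() {
            let segment = std::mem::take(&mut self.pending);
            self.committed.push(segment);
        }
    }

    /// Fan out over committed segments and union the matching doc ids.
    fn search(&self, term: &str) -> Vec<u32> {
        let term = term.to_lowercase();
        let mut hits = BTreeSet::new();
        for segment in &self.committed {
            if let Some(ids) = segment.postings.get(&term) {
                hits.extend(ids.iter().copied());
            }
        }
        hits.into_iter().collect()
    }

    /// Compact all committed segments into one (a deliberately naive policy).
    fn merge(&mut self) {
        let mut merged = Segment::default();
        for segment in self.committed.drain(..) {
            for (term, ids) in segment.postings {
                merged.postings.entry(term).or_default().extend(ids);
            }
        }
        if !merged.is_empty() {
            self.committed.push(merged);
        }
    }
}

fn main() {
    let mut index = ToyIndex::default();
    index.add_document("fast search in Rust");
    println!("hits before commit: {:?}", index.search("rust"));
    index.commit();
    index.add_document("Rust segments merge");
    index.commit();
    println!("segments before merge: {}", index.committed.len());
    index.merge();
    println!("segments after merge: {}", index.committed.len());
    println!("hits after merge: {:?}", index.search("rust"));
}
```

Merging changes how many segments a search has to visit, but not which documents it finds. That is why compaction can run in the background without affecting results.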

Self-Hosting & Configuration

  • Add tantivy as a Cargo dependency for embedded use in Rust applications
  • Use tantivy-py for Python integration via pip install
  • Configure schema with typed fields (TEXT, U64, F64, DATE, FACET, BYTES, IP)
  • Set indexing parameters like heap size, merge policy, and commit frequency
  • Deploy as part of your application binary with no external service dependencies

Key Features

  • Written in Rust, so there are no garbage collection pauses during indexing or search
  • Single-node performance comparable to or exceeding Lucene for many workloads
  • Supports configurable tokenizers including language-specific stemmers
  • Provides snippet generation and search result highlighting
  • Powers Quickwit, the distributed search engine, as its core indexing library

Comparison with Similar Tools

  • Apache Lucene — the Java equivalent, mature and widely used but requires a JVM
  • Bleve — full-text search library for Go, similar embedded approach
  • Meilisearch — search server with REST API, not an embeddable library
  • Elasticsearch — distributed search platform, much heavier for simple use cases
  • Sonic — lightweight search backend but fewer query features and field types

FAQ

Q: Is Tantivy a search server like Elasticsearch? A: No. Tantivy is a library you embed in your application. For a distributed search server built on Tantivy, see Quickwit.

Q: Can I use Tantivy from Python? A: Yes. The tantivy-py package provides Python bindings for indexing and searching.

Q: How does Tantivy handle concurrent writes? A: Each index has a single IndexWriter, which runs a configurable number of indexing threads. Multiple threads can add documents concurrently, and a commit makes them searchable.
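
The single-writer pattern in this answer can be illustrated with standard-library threads and a channel. This is an analogy under stated assumptions, not Tantivy's implementation: `SingleWriter` and `index_concurrently` are hypothetical names, and documents are plain strings.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for a single index writer: it owns the uncommitted
/// batch and the documents that searches are allowed to see.
struct SingleWriter {
    pending: Vec<String>,
    searchable: Vec<String>,
}

impl SingleWriter {
    /// Move every pending document into the searchable set.
    fn commit(&mut self) {
        self.searchable.append(&mut self.pending);
    }
}

/// Spawn `producers` threads that each send `per_thread` documents
/// to one writer through a channel.
fn index_concurrently(producers: usize, per_thread: usize) -> SingleWriter {
    let (tx, rx) = mpsc::channel::<String>();
    let handles: Vec<_> = (0..producers)
        .map(|p| {
            let tx = tx.clone();
            thread::spawn(move || {
                for i in 0..per_thread {
                    tx.send(format!("doc {p}-{i}")).expect("writer hung up");
                }
            })
        })
        .collect();
    // Drop the original sender so the receive loop ends once all producers finish.
    drop(tx);

    let mut writer = SingleWriter { pending: Vec::new(), searchable: Vec::new() };
    for doc in rx {
        writer.pending.push(doc);
    }
    for handle in handles {
        handle.join().expect("producer panicked");
    }
    writer
}

fn main() {
    let mut writer = index_concurrently(4, 25);
    println!("searchable before commit: {}", writer.searchable.len());
    writer.commit();
    println!("searchable after commit: {}", writer.searchable.len());
}
```

Many threads can feed one writer without coordinating with each other, and nothing they add is visible to searches until the writer commits.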

Q: Does Tantivy support distributed search? A: Tantivy itself is single-node. Quickwit builds distributed search on top of Tantivy for cluster deployments.
