# Tantivy — Full-Text Search Engine Library for Rust

> Tantivy is a high-performance full-text search engine library written in Rust, inspired by Apache Lucene, providing indexing and search capabilities that can be embedded into any application.

## Quick Use
```bash
# Add tantivy to your Rust project
cargo add tantivy

# Or try the CLI search tool
cargo install tantivy-cli

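# Create an index (the directory must exist; `new` prompts for the schema),
# index newline-delimited JSON documents from stdin, then search
mkdir -p ./my_index
tantivy new -i ./my_index
tantivy index -i ./my_index < documents.json
tantivy search -i ./my_index -q "search query"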
```

## Introduction
Tantivy is a full-text search engine library written in Rust, designed as a counterpart to Apache Lucene for the Rust ecosystem. It provides fast indexing and querying of text, numeric, date, and IP address data, and can be embedded directly into an application without running a separate search server.

## What Tantivy Does
- Indexes and searches text documents with BM25 scoring and term-level queries
- Supports boolean, phrase, range, regex, and fuzzy search queries
- Handles numeric, date, faceted, and IP address field types
- Provides concurrent indexing with near-real-time search visibility
- Offers Python bindings via the tantivy-py package for cross-language use
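
To make the BM25 scoring mentioned in the list above concrete, here is a minimal standalone sketch of the per-term formula. It does not call Tantivy: the function names are hypothetical, and the `K1 = 1.2` and `B = 0.75` constants are assumed here because they are the values commonly used for BM25. Tantivy's internal implementation may differ in detail.

```rust
/// Inverse document frequency with the "+1" smoothing used by Lucene-style BM25.
fn idf(total_docs: u64, docs_with_term: u64) -> f64 {
    let n = total_docs as f64;
    let df = docs_with_term as f64;
    (1.0 + (n - df + 0.5) / (df + 0.5)).ln()
}

/// BM25 contribution of a single term to a single document's score.
fn bm25_term_score(
    term_freq: u32,
    doc_len: u32,
    avg_doc_len: f64,
    total_docs: u64,
    docs_with_term: u64,
) -> f64 {
    const K1: f64 = 1.2; // assumed term-frequency saturation parameter
    const B: f64 = 0.75; // assumed length-normalization parameter
    let tf = term_freq as f64;
    // Longer-than-average documents get a larger normalizer, hence a lower score.
    let norm = K1 * (1.0 - B + B * doc_len as f64 / avg_doc_len);
    idf(total_docs, docs_with_term) * tf * (K1 + 1.0) / (tf + norm)
}

fn main() {
    // Hypothetical corpus: 1,000 documents, the term occurs in 10 of them.
    let short_doc = bm25_term_score(3, 50, 100.0, 1_000, 10);
    let long_doc = bm25_term_score(3, 400, 100.0, 1_000, 10);
    println!("short doc: {short_doc:.3}, long doc: {long_doc:.3}");
}
```

With the same term frequency, the shorter document scores higher, and rare terms (low document frequency) contribute more than common ones.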

## Architecture Overview
Tantivy stores data in segments, each containing an inverted index with term positions, a columnar store for fast fields, and a document store. Indexing threads build new segments in memory and write them to disk; documents become searchable once the writer commits. A merge policy compacts small segments in the background. Searches fan out across segments and merge the per-segment results. Committed segments are immutable, so readers work on a consistent snapshot and do not block the writer. The storage format uses memory-mapped files for efficient I/O, and the codec compresses posting lists with bitpacking and SIMD-accelerated decoding.
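
The segment lifecycle described above can be sketched with a toy model that uses only the standard library. `ToyIndex`, `Segment`, and their methods are hypothetical names chosen for illustration and are not Tantivy's API; the real engine also stores positions and columnar data, and keeps compressed postings on disk.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// An immutable segment: term -> sorted list of document ids.
struct Segment {
    postings: BTreeMap<String, Vec<u32>>,
}

impl Segment {
    fn from_docs(docs: &[(u32, String)]) -> Segment {
        let mut sets: BTreeMap<String, BTreeSet<u32>> = BTreeMap::new();
        for (doc_id, text) in docs {
            for token in text.split_whitespace() {
                sets.entry(token.to_lowercase()).or_default().insert(*doc_id);
            }
        }
        let postings = sets
            .into_iter()
            .map(|(term, ids)| (term, ids.into_iter().collect::<Vec<u32>>()))
            .collect();
        Segment { postings }
    }
}

#[derive(Default)]
struct ToyIndex {
    next_doc_id: u32,
    pending: Vec<(u32, String)>, // in-memory buffer, invisible to searches
    segments: Vec<Segment>,      // committed, searchable segments
}

impl ToyIndex {
    fn add_document(&mut self, text: &str) {
        self.pending.push((self.next_doc_id, text.to_string()));
        self.next_doc_id += 1;
    }

    /// Turn the pending buffer into a new searchable segment.
    fn commit(&mut self) {
        if !self.pending.is_empty() {
            self.segments.push(Segment::from_docs(&self.pending));
            self.pending.clear();
        }
    }

    /// Compact all segments into one, as a merge policy might decide to do.
    fn merge_all(&mut self) {
        if self.segments.len() < 2 {
            return;
        }
        let mut merged: BTreeMap<String, Vec<u32>> = BTreeMap::new();
        for segment in self.segments.drain(..) {
            for (term, ids) in segment.postings {
                merged.entry(term).or_default().extend(ids);
            }
        }
        for ids in merged.values_mut() {
            ids.sort_unstable();
        }
        self.segments.push(Segment { postings: merged });
    }

    /// Fan the query out to every segment and merge the per-segment hits.
    fn search(&self, term: &str) -> Vec<u32> {
        let term = term.to_lowercase();
        let mut hits: Vec<u32> = self
            .segments
            .iter()
            .filter_map(|segment| segment.postings.get(&term))
            .flatten()
            .copied()
            .collect();
        hits.sort_unstable();
        hits
    }
}

fn main() {
    let mut index = ToyIndex::default();
    index.add_document("rust search engine");
    index.commit();
    index.add_document("embedded search library");
    // Only the committed document is visible at this point.
    println!("{:?}", index.search("search"));
    index.commit();
    index.merge_all();
    println!("{:?}", index.search("search"));
}
```

Merging changes how many segments a search has to visit, not which documents it returns.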

## Self-Hosting & Configuration
- Add `tantivy` as a Cargo dependency for embedded use in Rust applications
- Use the `tantivy-py` bindings for Python integration (installed with `pip install tantivy`)
- Define a schema with typed fields (text, u64, i64, f64, bool, date, facet, bytes, JSON, IP address)
- Set indexing parameters such as the writer's memory budget, merge policy, and commit frequency
- Deploy as part of your application binary with no external service dependencies

## Key Features
- Written in Rust, which has no garbage collector, so indexing and search are free of GC pauses
- Single-node performance comparable to or exceeding Lucene for many workloads
- Supports configurable tokenizers including language-specific stemmers
- Provides snippet generation and search result highlighting
- Powers Quickwit, the distributed search engine, as its core indexing library
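
As a rough illustration of what a tokenizer and a highlighter do, the following standalone sketch splits text into lowercase tokens with byte offsets and wraps query matches in `<b>` tags. All names are hypothetical and none of this is Tantivy's API; stemming and HTML escaping are omitted for brevity.

```rust
/// A token together with the byte range it occupies in the original text.
struct Token {
    text: String,
    start: usize,
    end: usize,
}

/// Split on non-alphanumeric characters and lowercase each token.
fn tokenize(input: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    let mut start: Option<usize> = None;
    for (i, ch) in input.char_indices() {
        if ch.is_alphanumeric() {
            if start.is_none() {
                start = Some(i);
            }
        } else if let Some(s) = start.take() {
            tokens.push(Token { text: input[s..i].to_lowercase(), start: s, end: i });
        }
    }
    // Flush a token that runs to the end of the input.
    if let Some(s) = start {
        tokens.push(Token { text: input[s..].to_lowercase(), start: s, end: input.len() });
    }
    tokens
}

/// Wrap every token of `text` that matches a query term in <b>...</b>.
fn highlight(text: &str, query: &str) -> String {
    let terms: Vec<String> = tokenize(query).into_iter().map(|t| t.text).collect();
    let mut out = String::new();
    let mut cursor = 0;
    for token in tokenize(text) {
        if terms.contains(&token.text) {
            out.push_str(&text[cursor..token.start]); // untouched text before the match
            out.push_str("<b>");
            out.push_str(&text[token.start..token.end]); // original casing is preserved
            out.push_str("</b>");
            cursor = token.end;
        }
    }
    out.push_str(&text[cursor..]);
    out
}

fn main() {
    println!("{}", highlight("Tantivy is a Search engine library", "search library"));
}
```

Keeping byte offsets on each token is what lets the highlighter match on the normalized form while emitting the original text.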

## Comparison with Similar Tools
- **Apache Lucene** — the Java equivalent, mature and widely used but requires a JVM
- **Bleve** — full-text search library for Go, similar embedded approach
- **Meilisearch** — search server with REST API, not an embeddable library
- **Elasticsearch** — distributed search platform, much heavier for simple use cases
- **Sonic** — lightweight search backend but fewer query features and field types

## FAQ
**Q: Is Tantivy a search server like Elasticsearch?**
A: No. Tantivy is a library you embed in your application. For a distributed search server built on Tantivy, see Quickwit.

**Q: Can I use Tantivy from Python?**
A: Yes. The tantivy-py package provides Python bindings for indexing and searching.

**Q: How does Tantivy handle concurrent writes?**
A: Tantivy allows one `IndexWriter` per index at a time, and that writer runs a configurable number of indexing threads. Multiple application threads can add documents concurrently through it, and a commit makes the new documents searchable.
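
The add-then-commit pattern from this answer can be modeled with a small standard-library sketch. `ToyWriter` is a hypothetical stand-in, not Tantivy's `IndexWriter`. The split between a shared `&self` method for adding and an exclusive `&mut self` method for committing is assumed here to resemble the real API; check docs.rs for the exact signatures.

```rust
use std::sync::Mutex;
use std::thread;

/// Toy writer: many threads may add documents, but only a commit publishes them.
#[derive(Default)]
struct ToyWriter {
    pending: Mutex<Vec<String>>,
    committed: Vec<String>,
}

impl ToyWriter {
    /// Takes `&self`, so several threads can share one writer.
    fn add_document(&self, doc: String) {
        self.pending.lock().unwrap().push(doc);
    }

    /// Takes `&mut self`, so it cannot run while other threads are still adding.
    /// Returns the number of documents that are searchable afterwards.
    fn commit(&mut self) -> usize {
        let mut batch = std::mem::take(self.pending.get_mut().unwrap());
        self.committed.append(&mut batch);
        self.committed.len()
    }

    /// Documents visible to searches: only those published by a commit.
    fn searchable(&self) -> &[String] {
        &self.committed
    }
}

fn main() {
    let mut writer = ToyWriter::default();
    // Scoped threads may borrow the writer; all of them join before the scope ends.
    thread::scope(|scope| {
        for worker in 0..4 {
            let writer = &writer;
            scope.spawn(move || {
                for n in 0..100 {
                    writer.add_document(format!("doc {worker}-{n}"));
                }
            });
        }
    });
    println!("visible before commit: {}", writer.searchable().len());
    println!("visible after commit: {}", writer.commit());
}
```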

**Q: Does Tantivy support distributed search?**
A: Tantivy itself is single-node. Quickwit builds distributed search on top of Tantivy for cluster deployments.

## Sources
- https://github.com/quickwit-oss/tantivy
- https://docs.rs/tantivy

---
Source: https://tokrepo.com/en/workflows/fd82a53d-4491-11f1-9bc6-00163e2b0d79
Author: Script Depot