# tiktoken — Fast BPE Tokenizer for OpenAI Models

> A high-performance byte pair encoding (BPE) tokenizer used by OpenAI GPT models, written in Rust with Python bindings, for counting and splitting tokens.

## Install

```bash
pip install tiktoken
```

## Quick Use

```bash
python -c "import tiktoken; enc = tiktoken.encoding_for_model('gpt-4o'); print(len(enc.encode('Hello world!')))"
```

## Introduction

tiktoken is a fast BPE (Byte Pair Encoding) tokenizer maintained by OpenAI. It lets developers count tokens, split text, and debug prompts before sending them to GPT-family models, preventing unexpected truncation and helping control costs.

## What tiktoken Does

- Encodes and decodes text using the exact tokenization schemes of GPT-3.5, GPT-4, and GPT-4o
- Counts tokens accurately so you can stay within context-window limits
- Provides multiple encoding presets (`cl100k_base`, `o200k_base`, `p50k_base`)
- Returns byte-level token IDs for low-level prompt inspection
- Offers a thread-safe Rust core with Python bindings for high throughput

## Architecture Overview

tiktoken is implemented in Rust for speed and exposes a thin Python wrapper via PyO3. The core performs regex-based pre-tokenization followed by BPE merging against a precomputed rank table. Encoding tables are lazily downloaded from a remote blob store on first use and cached locally.

## Self-Hosting & Configuration

- Install from PyPI: `pip install tiktoken`
- No server component is required; everything runs in-process
- Encoding files are fetched once and cached locally (by default in a `data-gym-cache` directory under the system temp dir)
- Set the `TIKTOKEN_CACHE_DIR` environment variable to override the cache path
- Use `tiktoken.get_encoding("cl100k_base")` to load a specific vocabulary

## Key Features

- Sub-millisecond encoding of typical prompts thanks to the Rust core
- Supports all current OpenAI model tokenization schemes
- Deterministic output that matches the OpenAI API's token counts exactly
- Lightweight, with minimal dependencies
- Works offline after the initial cache warm-up

## Comparison with Similar Tools

- **Hugging Face tokenizers** — more general, but does not guarantee OpenAI-compatible token counts
- **SentencePiece** — supports BPE and Unigram models, but needs manual vocabulary loading for GPT models
- **transformers AutoTokenizer** — convenient for Hugging Face models, but pulls in a heavier dependency tree
- **GPT-2 Encoder (Python)** — the reference implementation, much slower than tiktoken

## FAQ

**Q: Which encoding should I use for GPT-4o?**
A: Use `o200k_base`, which tiktoken selects automatically when you call `encoding_for_model("gpt-4o")`.

**Q: Can I use tiktoken without an internet connection?**
A: Yes, once the encoding file is cached locally. Pre-warm the cache by running any encode call while online.

**Q: Does tiktoken work with non-OpenAI models?**
A: It only ships OpenAI vocabularies. For other models, use Hugging Face tokenizers or SentencePiece.

**Q: Is tiktoken thread-safe?**
A: Yes. The Rust core is safe to call from multiple Python threads concurrently. Usage sketches for the API calls mentioned above follow the Sources list below.

## Sources

- https://github.com/openai/tiktoken
- https://cookbook.openai.com/examples/how_to_count_tokens_with_tiktoken
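A minimal sketch of the encode/decode workflow described above. The model name and sample string are illustrative; the calls themselves (`encoding_for_model`, `encode`, `decode`, `decode_single_token_bytes`) are part of tiktoken's public API.

```python
import tiktoken

# Load the vocabulary used by GPT-4o; tiktoken resolves the model
# name to the o200k_base encoding automatically.
enc = tiktoken.encoding_for_model("gpt-4o")

text = "Hello world!"  # illustrative sample input
token_ids = enc.encode(text)       # byte-level BPE token IDs
print(len(token_ids), "tokens")    # the count you would be billed for

# Round-trip: decoding the IDs reproduces the original string.
assert enc.decode(token_ids) == text

# Inspect each token's raw bytes for low-level prompt debugging.
for tid in token_ids:
    print(tid, enc.decode_single_token_bytes(tid))
```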
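And a sketch of the cache-directory override described under Self-Hosting & Configuration. `TIKTOKEN_CACHE_DIR` and `get_encoding` come from tiktoken itself; the cache path is a hypothetical example.

```python
import os

# TIKTOKEN_CACHE_DIR must be set before the first encoding is
# loaded, because the download-and-cache step happens lazily.
os.environ["TIKTOKEN_CACHE_DIR"] = "/opt/models/tiktoken-cache"  # hypothetical path

import tiktoken

# The first call while online fetches the encoding file into the
# cache directory; subsequent calls then work fully offline.
enc = tiktoken.get_encoding("cl100k_base")
print(len(enc.encode("warm the cache")), "tokens")
```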