Introduction
fastText is a library from Meta AI Research (formerly Facebook AI Research) for efficient text classification and word representation learning. It extends the Word2Vec approach with subword information, enabling it to generate embeddings for out-of-vocabulary words and to train classifiers on large datasets in minutes rather than days.
What fastText Does
- Learns word vectors using subword (character n-gram) information for robust embeddings
- Trains supervised text classifiers that scale to billions of examples
- Provides pre-trained word vectors for 157 languages
- Supports both CBOW and Skip-gram training objectives
- Offers quantization to compress models by 10x with minimal accuracy loss
Architecture Overview
fastText represents each word as a bag of character n-grams plus the word itself. During training, it learns embeddings for these subword units and composes word vectors by summing them. For classification, it uses a shallow neural network with a linear classifier on top of averaged word embeddings, achieving accuracy competitive with deep models at a fraction of the compute cost. The hierarchical softmax option further speeds up training on datasets with many labels.
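The subword composition above can be sketched in a few lines of Python. This is an illustration only, with trigrams and random placeholder embeddings; the real library uses n-grams of lengths 3 to 6 and trained vectors.

```python
import random

def char_ngrams(word, n=3):
    """Character n-grams with boundary markers, plus the full word token."""
    marked = f"<{word}>"
    grams = [marked[i:i + n] for i in range(len(marked) - n + 1)]
    return grams + [marked]

# "where" decomposes into <wh, whe, her, ere, re> plus the token <where>
assert char_ngrams("where") == ["<wh", "whe", "her", "ere", "re>", "<where>"]

# A word vector is the sum of its subword vectors. The embedding table
# here holds random placeholders, not trained values.
table = {}

def word_vector(word, dim=4):
    total = [0.0] * dim
    for gram in char_ngrams(word):
        vec = table.setdefault(gram, [random.uniform(-1, 1) for _ in range(dim)])
        total = [t + v for t, v in zip(total, vec)]
    return total

print(word_vector("where"))  # a 4-dimensional vector
```

Because any string decomposes into n-grams, `word_vector` returns a vector even for words never seen in training, which is the mechanism behind fastText's out-of-vocabulary handling.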
Self-Hosting & Configuration
- Install via pip, conda, or compile from source for C++ CLI tools
- Pre-trained vectors available for download from the fastText website
- Training parameters (learning rate, epochs, n-grams) are set via CLI flags
- Use quantize to reduce model size for deployment on resource-constrained systems
- The Python API wraps the C++ core for easy integration into data pipelines
Key Features
- Subword embeddings handle misspellings, morphology, and rare words gracefully
- Training speed: learns word vectors from a billion-word corpus in under ten minutes on a standard multicore CPU, and classifies half a million sentences in under a minute
- Pre-trained vectors for 157 languages trained on Common Crawl and Wikipedia
- Automatic hyperparameter tuning via the autotune feature
- Model compression through product quantization for mobile and edge deployment
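Part of fastText's bounded memory footprint comes from hashing n-grams into a fixed-size table rather than storing a vocabulary of them. A minimal sketch of that trick, assuming the FNV-1a-style hash fastText's dictionary uses and its default bucket count of 2,000,000:

```python
def fnv1a(s: str) -> int:
    """32-bit FNV-1a hash of a string."""
    h = 2166136261
    for byte in s.encode("utf-8"):
        h = ((h ^ byte) * 16777619) & 0xFFFFFFFF
    return h

def ngram_bucket(gram: str, buckets: int = 2_000_000) -> int:
    """Map an n-gram to a row of the fixed-size subword embedding table."""
    return fnv1a(gram) % buckets

# Any n-gram, even from an unseen word, lands in a valid bucket.
idx = ngram_bucket("<wh")
assert 0 <= idx < 2_000_000
```

Collisions are possible but rare enough in practice not to hurt accuracy, and the payoff is that the subword table's size is fixed regardless of corpus vocabulary.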
Comparison with Similar Tools
- Word2Vec — pioneered word embeddings but lacks subword information; fastText handles OOV words naturally
- GloVe — global co-occurrence matrix approach; fastText is faster to train and supports subword units
- spaCy — full NLP pipeline with built-in vectors; fastText focuses purely on embeddings and classification
- Sentence Transformers — produces contextual sentence embeddings via Transformers; fastText is simpler and faster
- scikit-learn text classifiers — flexible but slower on large datasets; fastText is optimized for scale
FAQ
Q: Can fastText handle languages with rich morphology? A: Yes. Subword n-grams capture morphological patterns, making it effective for agglutinative languages like Finnish, Turkish, and Korean.
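To see why, compare the n-grams shared by two related word forms. A small illustration using Turkish (ev "house": evler "houses", evlerde "in the houses"), with n-gram lengths 3 to 6 as in the default configuration:

```python
def char_ngrams(word, nmin=3, nmax=6):
    """All character n-grams of lengths nmin..nmax, with boundary markers."""
    marked = f"<{word}>"
    return {marked[i:i + n]
            for n in range(nmin, nmax + 1)
            for i in range(len(marked) - n + 1)}

# The inflected form shares its stem n-grams with the base form, so their
# vectors share summands even if one form is rare or unseen in training.
shared = char_ngrams("evler") & char_ngrams("evlerde")
assert {"<ev", "evl", "ler"} <= shared
```

Because the vectors of morphological variants are built from overlapping subword sets, they end up close in embedding space without the model ever seeing every inflection.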
Q: How does fastText compare to Transformer-based embeddings? A: Transformer models produce contextual embeddings and generally achieve higher accuracy on benchmarks, but fastText is orders of magnitude faster and works well when compute or latency budgets are tight.
Q: What format does the training data need? A: For supervised classification, each line should start with one or more labels prefixed with __label__ (e.g. __label__positive), followed by the example text. For unsupervised training, plain text with one sentence per line.
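A sketch of preparing such a training file (the __label__ prefix is the default and can be changed with the -label flag; the labels and example texts below are made up):

```python
# Build a supervised training file: one example per line, each label
# prefixed with __label__ (multiple labels per line are allowed).
samples = [
    ("positive", "great movie , loved every minute"),
    ("negative", "dull plot and wooden acting"),
]
lines = [f"__label__{label} {text}" for label, text in samples]

with open("train.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(lines) + "\n")

print(lines[0])  # __label__positive great movie , loved every minute
```

Lowercasing and separating punctuation from words (as in the examples above) is a common preprocessing step, since fastText tokenizes on whitespace.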
Q: Is fastText suitable for production use? A: Yes. The C++ core is fast and memory-efficient. Quantized models can run on mobile devices, and the library has been deployed at scale inside Meta.