FastText library for efficient learning of word representations and sentence classification
```bash
npx @tessl/cli install tessl/pypi-fasttext@0.9.0
```

FastText is a library for efficient learning of word representations and sentence classification, developed by Facebook Research. The Python bindings provide comprehensive access to FastText's C++ core, enabling unsupervised word representation learning, supervised text classification, and subword information processing.
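Subword information means each word is also represented through its character n-grams. The sketch below is a minimal pure-Python illustration, assuming the enumeration we understand the C++ core to use: the word is wrapped in the `<` and `>` markers (`BOW` and `EOW`) and every substring of length `minn` to `maxn` is collected. `char_ngrams` is a hypothetical helper. The library additionally keeps a vector for the whole word and hashes each n-gram into one of `bucket` slots instead of storing the strings. As far as we know, `train_unsupervised` defaults to `minn=3, maxn=6`, while `train_supervised` defaults to `minn=0, maxn=0` (no subwords).

```python
BOW, EOW = "<", ">"  # word-boundary markers, same values as fastText's BOW / EOW


def char_ngrams(word, minn=3, maxn=6):
    """Hypothetical helper: character n-grams of `word`, fastText-style (sketch)."""
    wrapped = BOW + word + EOW
    ngrams = []
    for start in range(len(wrapped)):
        for n in range(1, maxn + 1):
            end = start + n
            if end > len(wrapped):
                break
            # Skip n-grams shorter than minn, and the lone "<" / ">" markers.
            if n < minn or (n == 1 and (start == 0 or end == len(wrapped))):
                continue
            ngrams.append(wrapped[start:end])
    return ngrams


print(char_ngrams("where", minn=3, maxn=3))
# ['<wh', 'whe', 'her', 'ere', 're>']
```

With the unsupervised defaults (`minn=3, maxn=6`) the same word yields 14 n-grams; the full `<where>` is not among them because it is 7 characters long.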
```bash
pip install fasttext
```

```python
import fasttext
```

Main functions and model class:
```python
from fasttext import train_supervised, train_unsupervised, load_model, tokenize
```

```python
import fasttext

# Train an unsupervised (skipgram) model on a plain-text file
model = fasttext.train_unsupervised('data.txt', model='skipgram')

# Get the vector of a word
word_vector = model.get_word_vector('king')

# Find similar words: a list of (similarity, word) pairs
neighbors = model.get_nearest_neighbors('king')
print(neighbors)
```

```python
import fasttext

# Train a supervised classifier (labels in train.txt carry the __label__ prefix)
model = fasttext.train_supervised('train.txt')

# Predict labels for one line of text: returns (labels, probabilities)
predictions = model.predict('This is a sample text')
print(predictions)

# Evaluate on test data: returns (number of samples, precision@1, recall@1)
results = model.test('test.txt')
print(f"P@1: {results[1]}, R@1: {results[2]}")
```

```python
import fasttext

# Load a previously saved model
model = fasttext.load_model('model.bin')

# Get a single vector for a whole line of text
sentence_vector = model.get_sentence_vector('Hello world')
```

FastText combines several key innovations:
The Python bindings expose the complete C++ API through pybind11, providing both high-level training functions and low-level model manipulation capabilities.
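The `get_sentence_vector` call in the loading example returns one fixed-size vector per line of text. As we read the C++ source, for an unsupervised model each whitespace-separated word vector is scaled to unit length and the non-zero ones are averaged, while a supervised model takes a plain average of its input vectors instead. The pure-Python sketch below covers only the unsupervised case; `sentence_vector` and the `toy_vectors` table are hypothetical stand-ins for the model's own lookups (a real model with subwords enabled builds vectors for unseen words from their n-grams rather than returning zeros).

```python
import math


def sentence_vector(text, vectors, dim):
    """Sketch: average of unit-length word vectors (unsupervised case)."""
    total = [0.0] * dim
    count = 0
    for word in text.split():
        # In this sketch, words missing from the table get a zero vector and are skipped.
        vec = vectors.get(word, [0.0] * dim)
        norm = math.sqrt(sum(x * x for x in vec))
        if norm > 0:
            total = [t + x / norm for t, x in zip(total, vec)]
            count += 1
    if count > 0:
        total = [t / count for t in total]
    return total


toy_vectors = {"hello": [3.0, 4.0], "world": [0.0, 2.0]}  # hypothetical 2-d vectors
print(sentence_vector("hello world", toy_vectors, dim=2))  # [0.3, 0.9]
```

Normalizing first means a word with a very large vector cannot dominate the average; each word contributes a direction of equal weight.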
Core training functions for both supervised classification and unsupervised word embeddings with extensive hyperparameter control.
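`train_supervised` expects a plain-text file with one example per line, where each label is a token starting with the label prefix (`__label__` by default, changeable through the `label` keyword argument) and the rest of the line is the text. The sketch below writes such a file from Python data; `to_fasttext_line`, the example rows, and the temporary `train.txt` path are illustrative assumptions, and further preprocessing (lowercasing, punctuation handling) is left out.

```python
import os
import tempfile

LABEL_PREFIX = "__label__"  # fastText's default label prefix


def to_fasttext_line(labels, text):
    """Hypothetical helper: format one example as '__label__a __label__b text'."""
    # A label must be a single token, so internal whitespace becomes "_".
    label_tokens = [LABEL_PREFIX + "_".join(label.split()) for label in labels]
    # The text must stay on one line: collapse newlines and repeated spaces.
    clean_text = " ".join(text.split())
    return " ".join(label_tokens + [clean_text])


examples = [
    (["positive"], "I loved this film"),
    (["negative", "long review"], "Too long.\nAnd boring."),
]

path = os.path.join(tempfile.mkdtemp(), "train.txt")
with open(path, "w", encoding="utf-8") as f:
    for labels, text in examples:
        f.write(to_fasttext_line(labels, text) + "\n")

# The resulting file could then be passed to fasttext.train_supervised(path).
```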
```python
def train_supervised(input, **kwargs):
    """
    Train a supervised classification model.

    Args:
        input (str): Path to training file (one example per line, labels prefixed with __label__)
        **kwargs: Training parameters (lr, dim, epoch, etc.)

    Returns:
        FastText model object
    """

def train_unsupervised(input, **kwargs):
    """
    Train an unsupervised word embedding model.

    Args:
        input (str): Path to training file (plain text)
        **kwargs: Training parameters (model='skipgram' or 'cbow', lr, dim, etc.)

    Returns:
        FastText model object
    """

def load_model(path):
    """
    Load a pre-trained FastText model.

    Args:
        path (str): Path to model file

    Returns:
        FastText model object
    """
```

Access and manipulate word vectors, find similar words, and perform vector arithmetic operations.
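```python
# Methods of the model object returned by train_supervised, train_unsupervised,
# or load_model; call them as model.get_word_vector(...), and so on.
def get_word_vector(word):
    """Get the vector representation of a word."""

def get_sentence_vector(text):
    """Get the vector representation of a single line of text."""

def get_nearest_neighbors(word, k=10):
    """Return the k nearest neighbors of a word as (similarity, word) pairs."""

def get_analogies(wordA, wordB, wordC, k=10):
    """Return the k words closest to wordA - wordB + wordC (e.g. berlin - germany + france)."""
```

Predict labels for text, evaluate model performance, and access detailed classification metrics.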
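```python
# Model methods: model.predict(...), model.test(...)
def predict(text, k=1, threshold=0.0):
    """
    Predict labels for input text.

    Args:
        text (str): Input text to classify (a single line, no newlines)
        k (int): Number of top predictions to return
        threshold (float): Minimum probability for a label to be returned

    Returns:
        Tuple of (labels, probabilities)
    """

def test(path, k=1, threshold=0.0):
    """
    Evaluate the model on a labelled test file.

    Args:
        path (str): Path to test file, in the same format as the training file
        k (int): Number of top predictions considered per example
        threshold (float): Minimum probability for a prediction to count

    Returns:
        Tuple of (sample_count, precision@k, recall@k)
    """
```

Helper functions for text processing, model manipulation, and downloading pre-trained models.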
```python
def tokenize(text):
    """Split text into a list of tokens."""

def quantize(**kwargs):  # model method: model.quantize(...)
    """Quantize a supervised model to reduce memory usage."""

# Utility module functions
import fasttext.util

# Download pre-trained Common Crawl vectors for a language code such as 'en' (saved as cc.en.300.bin)
fasttext.util.download_model('en', if_exists='strict')
model = fasttext.load_model('cc.en.300.bin')
# Reduce the vectors of the loaded model to 100 dimensions, in place
fasttext.util.reduce_model(model, 100)
```

```python
# Model type enums (from C++ bindings via fasttext_pybind)
import fasttext

model_name = fasttext.model_name  # Enum with values: cbow, skipgram, supervised
loss_name = fasttext.loss_name  # Enum with values: hs, ns, softmax, ova

# Special tokens used in text processing
EOS = "</s>"  # End-of-sentence token: marks sentence boundaries
BOW = "<"  # Beginning-of-word marker: used in subword processing
EOW = ">"  # End-of-word marker: used in subword processing

# Deprecated functions: calling them raises an exception with migration guidance
cbow = fasttext.cbow  # use train_unsupervised(input, model='cbow') instead
skipgram = fasttext.skipgram  # use train_unsupervised(input, model='skipgram') instead
supervised = fasttext.supervised  # use train_supervised(input) instead
```

Several FastText methods (for example `predict`, `get_nearest_neighbors`, and `get_analogies`) accept an `on_unicode_error` parameter for handling invalid UTF-8 in the strings they return:
- `'strict'` (default): raise an exception on Unicode errors
- `'ignore'`: skip invalid Unicode characters
- `'replace'`: replace invalid Unicode with a placeholder
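These option names are Python's standard codec error handlers, and as far as we can tell the bindings apply the chosen one when decoding UTF-8 strings returned by the C++ core. The snippet shows the three behaviors with plain `bytes.decode` on a deliberately truncated UTF-8 byte string. It does not call fastText, and the `decode` helper is hypothetical.

```python
# "café" cut in the middle of its two-byte "é", giving invalid UTF-8: b'caf\xc3'
raw = "café".encode("utf-8")[:-1]


def decode(data, on_unicode_error="strict"):
    """Hypothetical helper: decode UTF-8 with one of the three error policies."""
    return data.decode("utf-8", errors=on_unicode_error)


print(decode(raw, "ignore"))   # caf
print(decode(raw, "replace"))  # 'caf' followed by U+FFFD, the replacement character

strict_error = None
try:
    decode(raw)  # 'strict' is the default and raises
except UnicodeDecodeError as err:
    strict_error = err
print(type(strict_error).__name__)  # UnicodeDecodeError
```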