Tessl Tile for pypi/fasttext@0.9.0

tessl/pypi-fasttext

FastText library for efficient learning of word representations and sentence classification

Workspace: tessl
Visibility: Public
Created: 3 months ago
Last updated: 3 months ago
Describes: pkg:pypi/fasttext@0.9.x

To install, run

npx @tessl/cli install tessl/pypi-fasttext@0.9.0

# FastText

FastText is a library for efficient learning of word representations and sentence classification developed by Facebook Research. The Python bindings provide comprehensive access to FastText's C++ core, enabling unsupervised word representation learning, supervised text classification, and subword information processing.

## Package Information

- **Package Name**: fasttext
- **Language**: Python (with C++ core)
- **Installation**: `pip install fasttext`

## Core Imports

```python
import fasttext
```

The main functions can also be imported directly:

```python
from fasttext import train_supervised, train_unsupervised, load_model, tokenize
```

## Basic Usage

### Training a Word Embedding Model

```python
import fasttext

# Train an unsupervised model on a text file
model = fasttext.train_unsupervised('data.txt', model='skipgram')

# Get word vector
word_vector = model.get_word_vector('king')

# Find similar words
neighbors = model.get_nearest_neighbors('king')
print(neighbors)
```

### Training a Text Classification Model

```python
import fasttext

# Train supervised classifier
model = fasttext.train_supervised('train.txt')

# Predict labels for text
predictions = model.predict('This is a sample text')
print(predictions)

# Evaluate on test data
results = model.test('test.txt')
print(f"P@1: {results[1]}, R@1: {results[2]}")
```
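`train_supervised` reads a plain-text file with one example per line, where each label carries a prefix (`__label__` by default) and is followed by the text. The following is a minimal sketch of producing such lines. The helper name `to_fasttext_line` and the sample data are hypothetical, chosen only for illustration.

```python
def to_fasttext_line(labels, text, prefix="__label__"):
    """Format one example as '<prefix>label ... text' on a single line."""
    # A newline inside the text would split the example in two, so collapse whitespace.
    clean_text = " ".join(text.split())
    tagged = " ".join(f"{prefix}{label}" for label in labels)
    return f"{tagged} {clean_text}"


# Hypothetical sample data: (labels, text) pairs.
examples = [
    (["positive"], "Great battery life and a sharp screen"),
    (["negative", "shipping"], "Arrived late and\nthe box was damaged"),
]

lines = [to_fasttext_line(labels, text) for labels, text in examples]
print("\n".join(lines))

# To train on the result, write the lines to a file such as 'train.txt':
# with open("train.txt", "w", encoding="utf-8") as f:
#     f.write("\n".join(lines) + "\n")
```

An example may carry several labels, as the second one does here.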

### Loading Pre-trained Models

```python
import fasttext

# Load a pre-trained model
model = fasttext.load_model('model.bin')

# Get sentence vector
sentence_vector = model.get_sentence_vector('Hello world')
```

## Architecture

FastText combines several key innovations:

- **Subword Information**: Handles out-of-vocabulary words by learning representations for character n-grams
- **Hierarchical Softmax**: Efficient training for large vocabularies
- **Word Embedding Models**: CBOW (continuous bag-of-words) and skip-gram architectures for unsupervised learning
- **Fast Text Classification**: Linear classifiers with efficient training and inference

The Python bindings expose the C++ core through pybind11, providing both high-level training functions and low-level model manipulation capabilities.
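To make the subword idea concrete, the sketch below enumerates the character n-grams of a word after wrapping it in the `<` and `>` boundary markers. It is a plain-Python illustration, not the library's implementation: the function name `char_ngrams` is hypothetical, and the real library additionally hashes n-grams into buckets, which is omitted here.

```python
def char_ngrams(word, minn=3, maxn=6, bow="<", eow=">"):
    """Return the character n-grams of a word wrapped in boundary markers."""
    wrapped = f"{bow}{word}{eow}"
    ngrams = []
    for n in range(minn, maxn + 1):
        # Slide a window of width n across the wrapped word.
        for start in range(len(wrapped) - n + 1):
            ngrams.append(wrapped[start:start + n])
    return ngrams


print(char_ngrams("where", minn=3, maxn=3))
# ['<wh', 'whe', 'her', 'ere', 're>']
```

Because an unseen word still shares n-grams with known words, a vector can be composed for it from those pieces. With a trained model, `model.get_subwords(word)` reports the subwords the library actually uses.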

## Capabilities

### Model Training

Core training functions for both supervised classification and unsupervised word embeddings with extensive hyperparameter control.

```python { .api }
def train_supervised(input, **kwargs):
    """
    Train a supervised classification model.

    Args:
        input (str): Path to training file
        **kwargs: Training parameters (lr, dim, epoch, etc.)

    Returns:
        FastText model object
    """

def train_unsupervised(input, **kwargs):
    """
    Train an unsupervised word embedding model.

    Args:
        input (str): Path to training file
        **kwargs: Training parameters (model, lr, dim, etc.)

    Returns:
        FastText model object
    """

def load_model(path):
    """
    Load a pre-trained FastText model.

    Args:
        path (str): Path to model file

    Returns:
        FastText model object
    """
```

[Model Training](./training.md)
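One way to keep hyperparameters manageable is to hold them in a dictionary and pass them as keyword arguments. The sketch below assumes a labeled training file exists. The values are illustrative rather than recommended, and the wrapper name `train_classifier` is hypothetical.

```python
# Illustrative hyperparameter choices; tune them for your own data.
params = {
    "lr": 0.5,          # learning rate
    "dim": 100,         # size of the word vectors
    "epoch": 25,        # passes over the training data
    "wordNgrams": 2,    # use word bigrams as extra features
    "loss": "softmax",  # one of the loss names: hs, ns, softmax, ova
}


def train_classifier(train_path, **overrides):
    """Train a classifier with the settings above, allowing per-call overrides."""
    import fasttext  # imported lazily so the settings can be inspected without the package

    return fasttext.train_supervised(input=train_path, **{**params, **overrides})


# Requires fasttext and a training file, so the calls are left commented out:
# model = train_classifier("train.txt", epoch=50)
# model.save_model("model.bin")
```

Keeping the settings in one place makes it easier to log them next to evaluation results when comparing runs.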

### Word Vector Operations

Access and manipulate word vectors, find similar words, and perform vector arithmetic operations.

```python { .api }
def get_word_vector(word):
    """Get vector representation of a word."""

def get_sentence_vector(text):
    """Get vector representation of a sentence."""

def get_nearest_neighbors(word, k=10):
    """Find k nearest neighbors of a word."""

def get_analogies(wordA, wordB, wordC, k=10):
    """Find analogies of the form A:B::C:?"""
```

[Word Vectors](./word-vectors.md)
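Neighbor and analogy queries rank words by how similar their vectors are. Cosine similarity is a common measure for comparing word vectors, and the sketch below computes it in plain Python. The three-dimensional vectors are made up for illustration; real vectors would come from `get_word_vector`.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention for zero vectors, which have no direction
    return dot / (norm_u * norm_v)


# Made-up vectors standing in for model.get_word_vector(...) results.
king = [0.8, 0.6, 0.0]
queen = [0.6, 0.8, 0.0]
apple = [0.0, 0.0, 1.0]

print(round(cosine_similarity(king, queen), 2))  # 0.96
print(round(cosine_similarity(king, apple), 2))  # 0.0
```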

### Text Classification

Predict labels for text, evaluate model performance, and access detailed classification metrics.

```python { .api }
def predict(text, k=1, threshold=0.0):
    """
    Predict labels for input text.

    Args:
        text (str): Input text to classify
        k (int): Number of top predictions to return
        threshold (float): Minimum prediction confidence

    Returns:
        Tuple of (labels, probabilities)
    """

def test(path, k=1, threshold=0.0):
    """
    Evaluate model on test data.

    Returns:
        Tuple of (sample_count, precision, recall)
    """
```

[Classification](./classification.md)
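The return shapes above are easy to post-process. The sketch below pairs predicted labels with their probabilities, strips the label prefix (assumed here to be the default `__label__`), and derives an F1 score from the precision and recall that `test` reports. The helper names and sample values are hypothetical.

```python
def strip_labels(prediction, prefix="__label__"):
    """Turn a (labels, probabilities) pair into a list of (label, probability)."""
    labels, probabilities = prediction
    return [
        (label[len(prefix):] if label.startswith(prefix) else label, float(prob))
        for label, prob in zip(labels, probabilities)
    ]


def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Made-up values shaped like the documented return values.
prediction = (("__label__positive", "__label__neutral"), (0.91, 0.07))
test_result = (1000, 0.85, 0.80)  # (sample_count, precision, recall)

print(strip_labels(prediction))  # [('positive', 0.91), ('neutral', 0.07)]
_, precision, recall = test_result
print(round(f1_score(precision, recall), 3))  # 0.824
```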

### Utility Functions

Helper functions for text processing, model manipulation, and downloading pre-trained models.

```python { .api }
def tokenize(text):
    """Tokenize text into list of tokens."""

def quantize(**kwargs):
    """Quantize model to reduce memory usage."""

# Utility module functions
import fasttext.util
fasttext.util.download_model(lang_id, if_exists='strict')
fasttext.util.reduce_model(model, target_dim)
```

[Utilities](./utilities.md)

## Constants and Enums

```python { .api }
# Model type enums (from C++ bindings via fasttext_pybind)
import fasttext
model_name = fasttext.model_name  # Enum with values: cbow, skipgram, supervised
loss_name = fasttext.loss_name    # Enum with values: hs, ns, softmax, ova

# Special tokens used in text processing
EOS = "</s>"       # End of sentence token - marks sentence boundaries
BOW = "<"          # Beginning of word token - used in subword processing
EOW = ">"          # End of word token - used in subword processing

# Deprecated functions (raise exceptions with migration guidance)
cbow = fasttext.cbow              # Raises exception, use train_unsupervised(model='cbow')
skipgram = fasttext.skipgram      # Raises exception, use train_unsupervised(model='skipgram')
supervised = fasttext.supervised  # Raises exception, use train_supervised()
```

## Error Handling

FastText functions accept an `on_unicode_error` parameter that controls how Unicode errors are handled:
- `'strict'` (default): Raise an exception on Unicode errors
- `'ignore'`: Skip invalid Unicode characters
- `'replace'`: Replace invalid Unicode with a placeholder
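The three option names match Python's standard codec error handlers. Assuming `on_unicode_error` follows the same semantics (an assumption, not verified against FastText here), the difference between the modes can be seen with `bytes.decode` alone:

```python
# Bytes for "café" followed by a stray 0xFF, which is never valid in UTF-8.
raw = b"caf\xc3\xa9 \xff"

print(repr(raw.decode("utf-8", errors="ignore")))   # invalid byte dropped
print(repr(raw.decode("utf-8", errors="replace")))  # invalid byte becomes U+FFFD

try:
    raw.decode("utf-8", errors="strict")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True  # 'strict' raises instead of returning text
print(strict_failed)
```

`'ignore'` and `'replace'` both lose information silently, so `'strict'` is the safer choice while validating a new data source.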