tessl/pypi-fasttext

FastText library for efficient learning of word representations and sentence classification

- **Workspace**: tessl
- **Visibility**: Public
- **Describes**: pypipkg:pypi/fasttext@0.9.x

To install, run

npx @tessl/cli install tessl/pypi-fasttext@0.9.0

# FastText

FastText is a library for efficient learning of word representations and sentence classification developed by Facebook Research. The Python bindings provide comprehensive access to FastText's C++ core, enabling unsupervised word representation learning, supervised text classification, and subword information processing.

## Package Information

- **Package Name**: fasttext
- **Language**: Python (with C++ core)
- **Installation**: `pip install fasttext`

## Core Imports

```python
import fasttext
```

Main functions and model class:

```python
from fasttext import train_supervised, train_unsupervised, load_model, tokenize
```

## Basic Usage

### Training a Word Embedding Model

```python
import fasttext

# Train an unsupervised model on text file
model = fasttext.train_unsupervised('data.txt', model='skipgram')

# Get word vector
word_vector = model.get_word_vector('king')

# Find similar words
neighbors = model.get_nearest_neighbors('king')
print(neighbors)
```

### Training a Text Classification Model

```python
import fasttext

# Train supervised classifier
model = fasttext.train_supervised('train.txt')

# Predict labels for text
predictions = model.predict('This is a sample text')
print(predictions)

# Evaluate on test data
results = model.test('test.txt')
print(f"P@1: {results[1]}, R@1: {results[2]}")
```
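`train_supervised` reads a plain-text file with one example per line, where each label carries a prefix (`__label__` by default). The sketch below writes such a file using only the standard library. The helper name `to_fasttext_line`, the sample rows, and the output location are illustrative assumptions, not part of the fasttext package.

```python
import tempfile
from pathlib import Path

LABEL_PREFIX = "__label__"  # fastText's default label prefix


def to_fasttext_line(labels, text):
    """Format one example as '<prefix>label ... text' on a single line."""
    # A newline would split the example in two, so collapse all whitespace.
    clean_text = " ".join(text.split())
    tagged = " ".join(f"{LABEL_PREFIX}{label}" for label in labels)
    return f"{tagged} {clean_text}"


# Hypothetical sample data, for illustration only
samples = [
    (["positive"], "Great movie, would watch again"),
    (["negative", "short"], "Too long\nand boring"),
]

lines = [to_fasttext_line(labels, text) for labels, text in samples]

# Written to the system temp directory here; any writable path works.
train_path = Path(tempfile.gettempdir()) / "train.txt"
train_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
```

The resulting path can then be passed to `fasttext.train_supervised(str(train_path))`.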

### Loading Pre-trained Models

```python
import fasttext

# Load a pre-trained model
model = fasttext.load_model('model.bin')

# Get sentence vector
sentence_vector = model.get_sentence_vector('Hello world')
```

## Architecture

FastText combines several key innovations:

- **Subword Information**: Handles out-of-vocabulary words by learning representations for character n-grams
- **Hierarchical Softmax**: Efficient training for large vocabularies
- **Bag-of-Words Models**: CBOW and Skip-gram architectures for unsupervised learning
- **Fast Text Classification**: Linear classifiers with efficient training and inference

The Python bindings expose the complete C++ API through pybind11, providing both high-level training functions and low-level model manipulation capabilities.
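To make the subword idea concrete, here is a minimal sketch of how character n-grams can be extracted from a word wrapped in the `<` and `>` boundary markers. It is a simplification written for illustration: the function name `char_ngrams` is hypothetical, and the real library additionally hashes n-grams into buckets and handles multi-byte characters at the byte level, which this sketch ignores.

```python
def char_ngrams(word, minn=3, maxn=6):
    """Return all character n-grams of length minn..maxn for a word.

    The word is wrapped in '<' and '>' so that prefixes and suffixes
    are distinguishable from n-grams in the middle of the word.
    """
    wrapped = f"<{word}>"
    grams = []
    for n in range(minn, maxn + 1):
        for start in range(len(wrapped) - n + 1):
            grams.append(wrapped[start:start + n])
    return grams


trigrams = char_ngrams("where", minn=3, maxn=3)
print(trigrams)  # ['<wh', 'whe', 'her', 'ere', 're>']
```

Because an unseen word still shares many of these n-grams with known words, a vector can be assembled for it from the n-gram representations alone.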

## Capabilities

### Model Training

Core training functions for both supervised classification and unsupervised word embeddings with extensive hyperparameter control.

```python { .api }
def train_supervised(input, **kwargs):
    """
    Train a supervised classification model.

    Args:
        input (str): Path to training file
        **kwargs: Training parameters (lr, dim, epoch, etc.)

    Returns:
        FastText model object
    """

def train_unsupervised(input, **kwargs):
    """
    Train an unsupervised word embedding model.

    Args:
        input (str): Path to training file
        **kwargs: Training parameters (model, lr, dim, etc.)

    Returns:
        FastText model object
    """

def load_model(path):
    """
    Load a pre-trained FastText model.

    Args:
        path (str): Path to model file

    Returns:
        FastText model object
    """
```

[Model Training](./training.md)
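As a sketch of how the keyword arguments might be organized in practice, the example below keeps a dictionary of training options and applies per-run overrides before calling `train_supervised`. The option values are illustrative rather than tuned, and the helper names `build_params` and `train_and_save` are hypothetical. The `fasttext` import is placed inside the function so the parameter helpers can be used without the package installed.

```python
# Common training options; the values are illustrative, not recommendations.
DEFAULT_PARAMS = {
    "lr": 0.5,          # learning rate
    "epoch": 25,        # passes over the training data
    "dim": 100,         # embedding dimension
    "wordNgrams": 2,    # also use word bigrams as features
    "loss": "softmax",  # one of: hs, ns, softmax, ova
}


def build_params(**overrides):
    """Return a copy of DEFAULT_PARAMS with caller overrides applied."""
    params = dict(DEFAULT_PARAMS)
    params.update(overrides)
    return params


def train_and_save(train_path, model_path, **overrides):
    """Train a supervised model and write it to model_path."""
    import fasttext  # lazy import: only needed when training actually runs

    model = fasttext.train_supervised(input=train_path, **build_params(**overrides))
    model.save_model(model_path)
    return model
```

A call such as `train_and_save('train.txt', 'model.bin', epoch=5)` would train with the defaults above except for the epoch count.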

### Word Vector Operations

Access and manipulate word vectors, find similar words, and perform vector arithmetic operations.

```python { .api }
def get_word_vector(word):
    """Get vector representation of a word."""

def get_sentence_vector(text):
    """Get vector representation of a sentence."""

def get_nearest_neighbors(word, k=10):
    """Find k nearest neighbors of a word."""

def get_analogies(wordA, wordB, wordC, k=10):
    """Find analogies of the form A:B::C:?"""
```

[Word Vectors](./word-vectors.md)
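Vectors returned by `get_word_vector` can be compared directly. The sketch below computes cosine similarity in plain Python so it works on any sequence of floats, including the arrays the model returns. The function name `cosine_similarity` and the sample vectors are assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction; treat as unrelated
    return dot / (norm_a * norm_b)


# Made-up 2-d vectors; real word vectors typically have 100+ dimensions.
same_direction = cosine_similarity([1.0, 0.0], [2.0, 0.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 3.0])
```

With a trained model, the same helper could be applied as `cosine_similarity(model.get_word_vector('king'), model.get_word_vector('queen'))`.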

### Text Classification

Predict labels for text, evaluate model performance, and access detailed classification metrics.

```python { .api }
def predict(text, k=1, threshold=0.0):
    """
    Predict labels for input text.

    Args:
        text (str): Input text to classify
        k (int): Number of top predictions to return
        threshold (float): Minimum prediction confidence

    Returns:
        Tuple of (labels, probabilities)
    """

def test(path, k=1, threshold=0.0):
    """
    Evaluate model on test data.

    Returns:
        Tuple of (sample_count, precision, recall)
    """
```

[Classification](./classification.md)
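The return values of `predict` and `test` usually need a little post-processing. The helpers below are a sketch under stated assumptions: labels come back with the default `__label__` prefix, input text should be a single line before being passed to `predict`, and F1 is derived from the precision and recall that `test` reports. All three function names are hypothetical.

```python
LABEL_PREFIX = "__label__"  # assumes the default label prefix was used


def clean_text(text):
    """Collapse newlines and repeated whitespace into single spaces."""
    return " ".join(text.split())


def format_predictions(labels, probabilities, prefix=LABEL_PREFIX):
    """Pair each label (prefix stripped) with its probability as a float."""
    return [
        (label[len(prefix):] if label.startswith(prefix) else label, float(prob))
        for label, prob in zip(labels, probabilities)
    ]


def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

A typical use would be `format_predictions(*model.predict(clean_text(raw_text), k=3))`, followed by `f1_score(results[1], results[2])` on the tuple returned by `model.test(...)`.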

### Utility Functions

Helper functions for text processing, model manipulation, and downloading pre-trained models.

```python { .api }
def tokenize(text):
    """Tokenize text into list of tokens."""

def quantize(**kwargs):
    """Quantize model to reduce memory usage."""

# Utility module functions
import fasttext.util
fasttext.util.download_model(lang_id, if_exists='strict')
fasttext.util.reduce_model(model, target_dim)
```

[Utilities](./utilities.md)
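As a rough sketch of a quantization workflow, the example below loads a trained classifier, quantizes it, and saves the result next to the original with an `.ftz` suffix. The helper names `quantized_path` and `quantize_and_save` are hypothetical, the `.ftz` suffix is a naming convention rather than a requirement, and the `cutoff` value is illustrative. The `fasttext` import is deferred so the path helper can be used on its own.

```python
from pathlib import Path


def quantized_path(model_path):
    """Map e.g. 'model.bin' to 'model.ftz' (a common naming convention)."""
    return str(Path(model_path).with_suffix(".ftz"))


def quantize_and_save(model_path, train_path, cutoff=100000):
    """Quantize a trained supervised model and save the smaller copy."""
    import fasttext  # lazy import: only needed when quantization runs

    model = fasttext.load_model(model_path)
    # retrain=True fine-tunes on train_path after pruning to `cutoff` features.
    model.quantize(input=train_path, qnorm=True, retrain=True, cutoff=cutoff)
    out_path = quantized_path(model_path)
    model.save_model(out_path)
    return out_path
```

Quantization trades some accuracy for a much smaller file, so it is worth re-running `model.test(...)` on the quantized model before deploying it.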

## Constants and Enums

```python { .api }
# Model type enums (from C++ bindings via fasttext_pybind)
import fasttext
model_name = fasttext.model_name  # Enum with values: cbow, skipgram, supervised
loss_name = fasttext.loss_name  # Enum with values: hs, ns, softmax, ova

# Special tokens used in text processing
EOS = "</s>"  # End of sentence token - marks sentence boundaries
BOW = "<"  # Beginning of word token - used in subword processing
EOW = ">"  # End of word token - used in subword processing

# Deprecated functions (raise exceptions with migration guidance)
cbow = fasttext.cbow  # Raises exception, use train_unsupervised(model='cbow')
skipgram = fasttext.skipgram  # Raises exception, use train_unsupervised(model='skipgram')
supervised = fasttext.supervised  # Raises exception, use train_supervised()
```

## Error Handling

FastText functions accept an `on_unicode_error` parameter that controls how Unicode errors are handled:

- `'strict'` (default): Raise an exception on Unicode errors
- `'ignore'`: Skip invalid Unicode characters
- `'replace'`: Replace invalid Unicode with a placeholder
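These three values share their names with Python's standard codec error handlers. Assuming the bindings follow the same semantics, the effect of each mode can be previewed with `bytes.decode` alone, as in this sketch. The `decode` wrapper and the sample bytes are illustrative and do not call fasttext.

```python
raw = b"caf\xff latte"  # \xff is never a valid byte in UTF-8


def decode(data, on_unicode_error="strict"):
    """Decode UTF-8 bytes using the named error-handling mode."""
    return data.decode("utf-8", errors=on_unicode_error)


ignored = decode(raw, "ignore")    # invalid byte is dropped
replaced = decode(raw, "replace")  # invalid byte becomes U+FFFD

try:
    decode(raw, "strict")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True           # strict mode raises on the bad byte

print(repr(ignored))   # 'caf latte'
print(repr(replaced))  # 'caf\ufffd latte'
```

`'strict'` is the safest choice while debugging a data pipeline, since it surfaces corrupt input instead of silently altering it.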