# Text Prediction

The Text Prediction system provides core prediction capabilities for single tokens, token sequences, and multiple completion alternatives. It uses n-gram analysis combined with high-dimensional embeddings to generate contextually appropriate text completions.

## Capabilities

### Token Prediction

Predicts the next single token based on input context using n-gram lookup and similarity analysis.
```javascript { .api }
/**
 * Predict the next single token based on input context
 * @param {string} token - Input token or phrase to predict from
 * @returns {TokenPredictionResult} Prediction with token and ranked alternatives
 */
getTokenPrediction(token);

/**
 * Token prediction result structure
 */
interface TokenPredictionResult {
  token: string;             // Best predicted next token
  rankedTokenList: string[]; // Alternative tokens ranked by likelihood
  error?: {                  // Error information if prediction fails
    message: string;
  };
}
```
**Usage Examples:**

```javascript
// Simple token prediction
const prediction1 = model.getTokenPrediction('hello');
// Returns: { token: 'world', rankedTokenList: ['world', 'there', 'everyone', ...] }

// Phrase-based prediction
const prediction2 = model.getTokenPrediction('the weather is');
// Returns: { token: 'beautiful', rankedTokenList: ['beautiful', 'nice', 'sunny', ...] }

// Error handling
const prediction3 = model.getTokenPrediction('xyzunknowntoken');
// May return: { error: { message: 'Failed to look up n-gram.' }, token: '', rankedTokenList: [] }
```
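To make the lookup concrete, here is a minimal, self-contained sketch of frequency-based bigram prediction. The tiny corpus, the `buildBigrams` helper, and the greedy frequency ranking are illustrative assumptions, not the library's actual internals (which also incorporate embedding similarity):

```javascript
// Build a bigram table: token -> Map of next-token counts.
function buildBigrams(corpus) {
  const counts = new Map();
  for (const sentence of corpus) {
    const tokens = sentence.split(/\s+/);
    for (let i = 0; i < tokens.length - 1; i++) {
      const key = tokens[i];
      const next = tokens[i + 1];
      if (!counts.has(key)) counts.set(key, new Map());
      const nextCounts = counts.get(key);
      nextCounts.set(next, (nextCounts.get(next) || 0) + 1);
    }
  }
  return counts;
}

// Predict the next token from the last token of the input phrase,
// mirroring the TokenPredictionResult shape documented above.
function getTokenPrediction(bigrams, input) {
  const tokens = input.trim().split(/\s+/);
  const key = tokens[tokens.length - 1];
  const nextCounts = bigrams.get(key);
  if (!nextCounts) {
    return { error: { message: 'Failed to look up n-gram.' }, token: '', rankedTokenList: [] };
  }
  // Rank candidate next tokens by observed frequency, highest first.
  const ranked = [...nextCounts.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([token]) => token);
  return { token: ranked[0], rankedTokenList: ranked };
}
```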
### Token Sequence Prediction

Predicts a sequence of multiple tokens by iteratively applying token prediction to build longer completions.
```javascript { .api }
/**
 * Predict a sequence of tokens extending the input
 * @param {string} input - Input text to extend
 * @param {number} [sequenceLength=2] - Number of tokens to predict in sequence
 * @returns {SequencePredictionResult} Sequence completion with metadata
 */
getTokenSequencePrediction(input, sequenceLength);

/**
 * Sequence prediction result structure
 */
interface SequencePredictionResult {
  completion: string;        // Complete predicted sequence
  sequenceLength: number;    // Number of tokens in sequence
  token: string;             // First predicted token
  rankedTokenList: string[]; // Alternative first tokens ranked by likelihood
}
```
**Usage Examples:**

```javascript
// Short sequence prediction
const sequence1 = model.getTokenSequencePrediction('JavaScript is', 3);
// Returns: {
//   completion: 'a programming language',
//   sequenceLength: 3,
//   token: 'a',
//   rankedTokenList: ['a', 'an', 'the', ...]
// }

// Single-token sequence (equivalent to getTokenPrediction, but with a richer result format)
const sequence2 = model.getTokenSequencePrediction('hello', 1);
// Returns: {
//   completion: 'world',
//   sequenceLength: 1,
//   token: 'world',
//   rankedTokenList: ['world', 'there', ...]
// }

// Longer sequence
const sequence3 = model.getTokenSequencePrediction('The quick brown', 5);
// Returns: {
//   completion: 'fox jumps over the lazy',
//   sequenceLength: 5,
//   token: 'fox',
//   rankedTokenList: ['fox', 'dog', 'cat', ...]
// }
```
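The iterative mechanism described above can be sketched as follows: feed the growing text back into single-token prediction until the requested length is reached. `predictNext` is a stand-in for `getTokenPrediction`, backed here by a hard-coded bigram table for illustration:

```javascript
// Illustrative lookup table: last token -> ranked next-token candidates.
const table = new Map([
  ['quick', ['brown']],
  ['brown', ['fox']],
  ['fox', ['jumps']],
  ['jumps', ['over']],
]);

// Stand-in for single-token prediction (see getTokenPrediction above).
function predictNext(input) {
  const tokens = input.trim().split(/\s+/);
  const ranked = table.get(tokens[tokens.length - 1]) || [];
  return { token: ranked[0] || '', rankedTokenList: ranked };
}

function getTokenSequencePrediction(input, sequenceLength = 2) {
  let text = input;
  const predicted = [];
  let first = null;
  for (let i = 0; i < sequenceLength; i++) {
    const result = predictNext(text);
    if (!result.token) break;         // stop at a natural end of sequence
    if (first === null) first = result;
    predicted.push(result.token);
    text += ' ' + result.token;       // feed the extended text back in
  }
  return {
    completion: predicted.join(' '),
    sequenceLength: predicted.length,
    token: first ? first.token : '',
    rankedTokenList: first ? first.rankedTokenList : [],
  };
}
```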
### Multiple Completions

Generates multiple alternative completions for an input, taking a top-k approach: several ranked candidates are returned rather than a single best guess.
```javascript { .api }
/**
 * Generate multiple completion alternatives with ranking
 * @param {string} input - Input text to complete
 * @returns {CompletionsResult} Multiple completions with ranking information
 */
getCompletions(input);

/**
 * Multiple completions result structure
 */
interface CompletionsResult {
  completion: string;        // Primary/best completion
  token: string;             // First token of primary completion
  rankedTokenList: string[]; // Alternative first tokens ranked by likelihood
  completions: string[];     // Array of alternative full completions
}
```
**Usage Examples:**

```javascript
// Get multiple completion options
const completions = model.getCompletions('The sun');
// Returns: {
//   completion: 'is shining brightly today',
//   token: 'is',
//   rankedTokenList: ['is', 'was', 'will', 'has', ...],
//   completions: [
//     'is shining brightly today',
//     'was setting behind the mountains',
//     'will rise tomorrow morning',
//     'has been hidden by clouds',
//     // ... more alternatives
//   ]
// }

// Use for autocomplete suggestions
const suggestions = model.getCompletions('I need to');
console.log('Completion options:');
suggestions.completions.forEach((completion, index) => {
  console.log(`${index + 1}. I need to ${completion}`);
});
```
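One plausible way to realize a top-k approach is to take the k best-ranked first tokens and extend each one into a full completion. The bigram table and the `greedyExtend` helper below are illustrative assumptions, not the actual internals of `getCompletions`:

```javascript
// Illustrative lookup table: token -> ranked next-token candidates.
const nextTokens = new Map([
  ['sun', ['is', 'was']],
  ['is', ['shining']],
  ['was', ['setting']],
]);

// Greedily extend a starting token by following the best candidate.
function greedyExtend(token, steps) {
  const out = [token];
  for (let i = 0; i < steps; i++) {
    const candidates = nextTokens.get(out[out.length - 1]);
    if (!candidates) break;
    out.push(candidates[0]);
  }
  return out.join(' ');
}

function getCompletions(input, k = 2, steps = 1) {
  const last = input.trim().split(/\s+/).pop();
  const ranked = nextTokens.get(last) || [];
  // One full completion per top-k first token.
  const completions = ranked.slice(0, k).map((t) => greedyExtend(t, steps));
  return {
    completion: completions[0] || '',
    token: ranked[0] || '',
    rankedTokenList: ranked,
    completions,
  };
}
```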
### Prediction Configuration

The prediction system respects several environment variables for customization:
```javascript { .api }
/**
 * Environment configuration affecting prediction behavior
 */
interface PredictionConfig {
  RANKING_BATCH_SIZE: number;  // Number of alternatives in rankedTokenList (default: 50)
  MAX_RESPONSE_LENGTH: number; // Maximum sequence length for predictions (default: 240)
  VARIANCE: number;            // Prediction randomization level (default: 0)
}
```
**Configuration Examples:**

```bash
# Increase number of alternatives returned
export RANKING_BATCH_SIZE=100

# Allow longer sequence predictions
export MAX_RESPONSE_LENGTH=500

# Add some randomization to predictions (experimental)
export VARIANCE=1
```
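A sketch of how such environment variables might be read with fallbacks to the documented defaults. `loadPredictionConfig` is a hypothetical helper, not part of the actual API:

```javascript
// Read prediction settings from the environment, falling back to the
// defaults documented above when a variable is unset or non-numeric.
function loadPredictionConfig(env = process.env) {
  const num = (value, fallback) => {
    const parsed = Number(value);
    return Number.isFinite(parsed) ? parsed : fallback;
  };
  return {
    RANKING_BATCH_SIZE: num(env.RANKING_BATCH_SIZE, 50),
    MAX_RESPONSE_LENGTH: num(env.MAX_RESPONSE_LENGTH, 240),
    VARIANCE: num(env.VARIANCE, 0),
  };
}
```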
## Internal Prediction Methods

These methods are available on the transformer instance but are typically used internally:
```javascript { .api }
/**
 * Look up n-gram by token sequence
 * @param {string} input - Space-separated token sequence
 * @returns {Object} N-gram lookup result
 */
ngramSearch(input);

/**
 * Look up embedding vector for token pair
 * @param {string} prevToken - Previous token context
 * @param {string} token - Current token
 * @returns {number[]} Embedding vector or null vector
 */
embeddingSearch(prevToken, token);

/**
 * Calculate vector similarity between tokens
 * @param {string} prevToken - Previous token context
 * @param {string} token - Reference token
 * @returns {Object} Similar token with ranking data
 */
getSimilarToken(prevToken, token);
```
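A similarity method like `getSimilarToken` presumably compares embedding vectors; cosine similarity is the standard measure for this. The helper below is an illustrative assumption about how such a comparison could work, including the null-vector case mentioned for `embeddingSearch`:

```javascript
// Cosine similarity between two embedding vectors of equal length.
// Returns a value in [-1, 1]; a null (all-zero) vector yields 0.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // null vector: no similarity
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```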
## Error Handling

The prediction system handles several error conditions gracefully:

- **Missing N-grams**: When input tokens don't exist in the training data, returns empty predictions with an error message
- **Unknown Tokens**: Skips unrecognized tokens during processing
- **End of Sequence**: Gracefully handles completion at natural stopping points
- **Invalid Input**: Returns empty results for null or undefined inputs
**Error Response Format:**

```javascript
{
  error: { message: "Failed to look up n-gram." },
  token: "",
  rankedTokenList: []
}
```
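Because failures are reported through the `error` field rather than thrown, callers should check for it before using a result. The `safePredict` wrapper below is a hypothetical usage pattern, not part of the API:

```javascript
// Wrap getTokenPrediction: return the predicted token on success,
// or a caller-supplied fallback when the result carries an error.
function safePredict(model, input, fallbackToken = '') {
  const result = model.getTokenPrediction(input);
  if (result.error) {
    console.warn(`Prediction failed: ${result.error.message}`);
    return fallbackToken;
  }
  return result.token;
}
```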