# Next Token Prediction

Next Token Prediction is a JavaScript library for creating and training language models with next-token prediction capabilities. It provides a transformer-based architecture with support for custom training data, and offers autocomplete, text completion, and AI-powered text generation in pure JavaScript, with no external API dependencies.

## Package Information

- **Package Name**: next-token-prediction
- **Package Type**: npm
- **Language**: JavaScript (Node.js)
- **Installation**: `npm install next-token-prediction`

## Core Imports

```javascript
const { Language } = require('next-token-prediction');
```

## Basic Usage

```javascript
const { Language } = require('next-token-prediction');

// `await` is not allowed at the top level of a CommonJS module,
// so the calls are wrapped in an async function.
async function main() {
  // Simple bootstrap approach with built-in training data
  const model = await Language({
    bootstrap: true
  });

  // Predict next token
  const nextWord = model.getTokenPrediction('hello');

  // Complete a phrase
  const completion = model.complete('The weather is');

  // Get multiple completion alternatives
  const completions = model.getCompletions('JavaScript is');
}

main().catch(console.error);
```

## Architecture

Next Token Prediction is built around several key components:

- **Language Model**: High-level factory function that provides training and prediction capabilities
- **Transformer Engine**: Core tokenization, n-gram analysis, and prediction engine
- **Vector System**: High-dimensional embedding vectors for semantic token relationships
- **Training Pipeline**: Comprehensive training system with multiple metrics and embedding generation
- **Dataset Management**: Built-in datasets and support for custom training documents

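To make the n-gram part of the component list more concrete, here is a minimal sketch of frequency-based next-token ranking over bigrams. It is an illustration only and not the library's engine: `buildBigramCounts` and `rankNextTokens` are hypothetical helpers, the tokenizer is a plain whitespace split, and no embeddings are involved.

```javascript
// Illustrative only: a tiny bigram model, not the library's engine.

// Count how often each token is followed by each other token.
function buildBigramCounts(text) {
  const tokens = text.toLowerCase().split(/\s+/).filter(Boolean);
  const counts = new Map();
  for (let i = 0; i < tokens.length - 1; i++) {
    const current = tokens[i];
    const next = tokens[i + 1];
    if (!counts.has(current)) counts.set(current, new Map());
    const followers = counts.get(current);
    followers.set(next, (followers.get(next) || 0) + 1);
  }
  return counts;
}

// Return the followers of `token`, most frequent first.
function rankNextTokens(counts, token) {
  const followers = counts.get(token.toLowerCase());
  if (!followers) return [];
  return [...followers.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([next]) => next);
}

const counts = buildBigramCounts('the cat sat on the mat and the cat ran');
console.log(rankNextTokens(counts, 'the')); // most frequent follower first
```

The first element of the ranked list plays the role of a single-token prediction, and the rest of the list corresponds loosely to the ranked alternatives described under Types.
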
## Capabilities

### Language Model Creation

Factory function for creating language model instances with various initialization options including bootstrap training, custom datasets, or file-based training.

```javascript { .api }
/**
 * Create a language model instance
 * @param {Object} options - Configuration options
 * @param {string} [options.name] - Dataset name
 * @param {Object} [options.dataset] - Pre-existing dataset with name and files
 * @param {string[]} [options.files] - Training document filenames (without .txt extension)
 * @param {boolean} [options.bootstrap=false] - Use built-in default training data
 * @returns {Promise<LanguageModel>} Language model API
 */
async function Language(options = {});
```

[Language Model](./language-model.md)

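The three initialization paths can be kept apart with a small helper that builds the `options` object before calling `Language`. The sketch below rests on assumptions: `resolveLanguageOptions` is a hypothetical helper, and the precedence it applies (explicit dataset, then files, then bootstrap) is a choice made for this example rather than documented library behavior.

```javascript
// Hypothetical helper: choose one initialization path for Language().
// The precedence (dataset > files > bootstrap) is an assumption of this sketch.
function resolveLanguageOptions({ name, dataset, files } = {}) {
  if (dataset && Array.isArray(dataset.files) && dataset.files.length > 0) {
    return { dataset };
  }
  if (Array.isArray(files) && files.length > 0) {
    return { name, files };
  }
  // Nothing custom supplied: fall back to the built-in training data.
  return { bootstrap: true };
}

console.log(resolveLanguageOptions());
console.log(resolveLanguageOptions({ name: 'notes', files: ['chapter-1'] }));
```

Inside an async function, the result could be passed straight to the factory, for example `const model = await Language(resolveLanguageOptions({ name: 'notes', files: ['chapter-1'] }));`.
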
### Text Prediction

Core prediction capabilities for single tokens, token sequences, and multiple completion alternatives with ranking and confidence scoring.

```javascript { .api }
/**
 * Predict the next single token
 * @param {string} token - Input token or phrase
 * @returns {Object} Prediction result with token and alternatives
 */
getTokenPrediction(token);

/**
 * Predict a sequence of tokens
 * @param {string} input - Input text
 * @param {number} [sequenceLength=2] - Number of tokens to predict
 * @returns {Object} Sequence prediction with completion and metadata
 */
getTokenSequencePrediction(input, sequenceLength);

/**
 * Get multiple completion alternatives
 * @param {string} input - Input text
 * @returns {Object} Multiple completions with ranking information
 */
getCompletions(input);
```

[Text Prediction](./text-prediction.md)

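Because a prediction result may carry an `error` field (see `TokenPredictionResult` under Types), callers may want to check for it before reading `token`. The following sketch wraps `getTokenPrediction` in a hypothetical `suggestTokens` helper. The `stubModel` object and its error message are invented for the example and only mimic the documented result shape, so the code runs without a trained model.

```javascript
// Hypothetical helper: return up to `limit` candidate tokens, or [] on error.
function suggestTokens(model, input, limit = 3) {
  const result = model.getTokenPrediction(input);
  if (!result || result.error) return [];
  // Top prediction first, then the ranked alternatives, without duplicates.
  const ordered = [result.token, ...(result.rankedTokenList || [])].filter(Boolean);
  return [...new Set(ordered)].slice(0, limit);
}

// Stand-in for a trained model; it only mimics the documented result shape.
const stubModel = {
  getTokenPrediction(token) {
    if (token === 'hello') {
      return { token: 'world', rankedTokenList: ['world', 'there', 'again', 'friend'] };
    }
    return { token: '', rankedTokenList: [], error: { message: 'No prediction available.' } };
  }
};

console.log(suggestTokens(stubModel, 'hello'));   // at most three candidates
console.log(suggestTokens(stubModel, 'unknown')); // empty array on error
```

With a real model, the same helper could back an autocomplete dropdown: an empty array simply means nothing is shown.
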
### Training System

Advanced training capabilities for creating custom models from text documents with comprehensive embedding generation and n-gram analysis.

```javascript { .api }
/**
 * Train model on dataset
 * @param {Object} dataset - Training dataset
 * @param {string} dataset.name - Dataset identifier
 * @param {string[]} dataset.files - Document filenames (without .txt extension)
 * @returns {Promise<void>} Resolves when training has finished
 */
train(dataset);

/**
 * Create model context from pre-computed embeddings
 * @param {Object} embeddings - Token embeddings object
 */
createContext(embeddings);
```

[Training System](./training-system.md)

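Since `dataset.files` expects names without the `.txt` extension, a small normalization step can help when starting from file paths. This is a sketch under assumptions: `toDataset` is a hypothetical helper, the example paths are invented, and how the library locates the documents on disk is not covered here.

```javascript
const path = require('path');

// Hypothetical helper: build a Dataset object from a name and a list of paths.
function toDataset(name, filePaths) {
  const files = filePaths
    // Ignore anything that is not a .txt document.
    .filter((filePath) => path.extname(filePath).toLowerCase() === '.txt')
    // Keep only the base name and drop the extension.
    .map((filePath) => path.basename(filePath, path.extname(filePath)));
  // Remove duplicate document names while preserving order.
  return { name, files: [...new Set(files)] };
}

const dataset = toDataset('my-notes', ['docs/intro.txt', 'docs/usage.TXT', 'README.md']);
console.log(dataset);
```

Inside an async function, the resulting object could then be passed to training with `await model.train(dataset)`.
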
### Vector Operations

Internal vector system for embedding representations and similarity calculations. The Vector class is used internally by the library for high-dimensional token embeddings but is not directly exported from the main package.

[Vector Operations](./vector-operations.md)

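As the `Vector` class is not exported, the snippet below is not its API. It is a generic sketch of cosine similarity, one common way to compare embedding vectors, written over plain arrays. Whether the library uses this exact measure is not stated here, and `cosineSimilarity` is a name chosen for the example.

```javascript
// Generic cosine similarity over plain number arrays (not the library's Vector class).
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new RangeError('Vectors must have the same length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; treat its similarity as 0.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

console.log(cosineSimilarity([1, 0, 0], [1, 0, 0])); // 1
console.log(cosineSimilarity([1, 0, 0], [0, 1, 0])); // 0
```

Values near 1 indicate vectors pointing in a similar direction, values near 0 indicate unrelated directions.
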
## Types

### Core Types

```javascript { .api }
/**
 * Language model instance with prediction and training capabilities
 */
interface LanguageModel {
  // Prediction methods
  getTokenPrediction(token: string): TokenPredictionResult;
  getTokenSequencePrediction(input: string, sequenceLength?: number): SequencePredictionResult;
  getCompletions(input: string): CompletionsResult;
  complete(query: string): string;

  // Training methods
  train(dataset: Dataset): Promise<void>;
  createContext(embeddings: EmbeddingsObject): void;
  ingest(text: string): void;

  // Factory methods
  fromTrainingData(trainingData: TrainingData): TransformerAPI;
  fromFiles(files: string[]): Promise<TransformerAPI>;
}

/**
 * Training dataset configuration
 */
interface Dataset {
  name: string;
  files: string[]; // Document filenames without .txt extension
}

/**
 * Pre-computed training data with text and embeddings
 */
interface TrainingData {
  text: string;
  embeddings: EmbeddingsObject;
}

/**
 * Token prediction result with alternatives
 */
interface TokenPredictionResult {
  token: string;
  rankedTokenList: string[];
  error?: { message: string };
}

/**
 * Sequence prediction result with completion details
 */
interface SequencePredictionResult {
  completion: string;
  sequenceLength: number;
  token: string;
  rankedTokenList: string[];
}

/**
 * Multiple completions result with ranking
 */
interface CompletionsResult {
  completion: string;
  token: string;
  rankedTokenList: string[];
  completions: string[];
}

/**
 * Nested embeddings structure
 */
interface EmbeddingsObject {
  [token: string]: {
    [nextToken: string]: number[]; // Vector of DIMENSIONS length
  };
}

/**
 * Transformer API with core prediction and training methods
 */
interface TransformerAPI {
  // Core prediction methods
  getTokenPrediction(token: string): TokenPredictionResult;
  getTokenSequencePrediction(input: string, sequenceLength?: number): SequencePredictionResult;
  getCompletions(input: string): CompletionsResult;

  // Training and context methods
  train(dataset: Dataset): Promise<void>;
  createContext(embeddings: EmbeddingsObject): void;
  ingest(text: string): void;
}
```
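
To illustrate the nested `EmbeddingsObject` shape, the sketch below builds a tiny literal and lists the next tokens recorded for a given token. The vectors are made-up three-element placeholders (real vectors would have `DIMENSIONS` entries), and `listNextTokens` is a hypothetical helper, not a library function.

```javascript
// Made-up embeddings: token -> nextToken -> vector (placeholder values).
const embeddings = {
  hello: {
    world: [0.9, 0.1, 0.0],
    there: [0.4, 0.3, 0.2]
  },
  world: {
    peace: [0.7, 0.2, 0.1]
  }
};

// Hypothetical helper: list the next tokens recorded for `token`.
function listNextTokens(embeddingsObject, token) {
  // Only look at own properties so inherited keys such as "constructor" are ignored.
  if (!Object.prototype.hasOwnProperty.call(embeddingsObject, token)) return [];
  return Object.keys(embeddingsObject[token]);
}

console.log(listNextTokens(embeddings, 'hello'));   // [ 'world', 'there' ]
console.log(listNextTokens(embeddings, 'missing')); // []
```

An object of this shape is what `createContext(embeddings)` is documented to accept.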