# Language Model

The Language Model is the primary interface for creating and managing next-token prediction models. It provides a factory function that handles initialization and training, and returns a comprehensive API for text prediction tasks.

## Capabilities

### Language Factory Function

Creates a language model instance with various initialization options, including bootstrap training, custom datasets, or file-based training.

```javascript { .api }
/**
 * Create a language model instance
 * @param {Object} options - Configuration options
 * @param {string} [options.name] - Dataset name for identification
 * @param {Object} [options.dataset] - Pre-existing dataset with name and files
 * @param {string[]} [options.files] - Training document filenames (without .txt extension)
 * @param {boolean} [options.bootstrap=false] - Use built-in default training data
 * @returns {Promise<LanguageModel>} Language model API with prediction and training methods
 */
async function Language(options = {});
```

**Usage Examples:**

```javascript
const { Language } = require('next-token-prediction');

// Bootstrap with default training data
const defaultModel = await Language({
  bootstrap: true
});

// Use a pre-existing dataset
const Dataset = require('./training/datasets/OpenSourceBooks');

const bookModel = await Language({
  dataset: Dataset
});

// Train on custom files
const customModel = await Language({
  name: 'my-dataset',
  files: ['document1', 'document2', 'document3']
});
```

### Language Model Instance

The created language model instance provides both high-level convenience methods and full access to the underlying transformer capabilities.

```javascript { .api }
/**
 * Language model instance with prediction and training capabilities
 */
interface LanguageModel {
  // High-level prediction methods
  complete(query: string): string;

  // Full transformer API access
  getTokenPrediction(token: string): TokenPredictionResult;
  getTokenSequencePrediction(input: string, sequenceLength?: number): SequencePredictionResult;
  getCompletions(input: string): CompletionsResult;

  // Training and model management
  train(dataset: Dataset): Promise<void>;
  createContext(embeddings: EmbeddingsObject): void;
  ingest(text: string): void;
  fromTrainingData(trainingData: TrainingData): TransformerAPI;
  fromFiles(files: string[]): Promise<TransformerAPI>;
}
```
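
The roles of `ingest` and `getTokenPrediction` are easiest to see against a toy model. The sketch below is not the library's implementation, and the result field names (`token`, `rankedTokenList`) are illustrative rather than the package's actual `TokenPredictionResult` shape; it is a minimal bigram frequency table showing the general idea: ingestion accumulates token statistics, and prediction ranks the observed followers of a token.

```javascript
// Toy bigram model: a conceptual sketch of ingest() and
// getTokenPrediction(). NOT the library's code; field names
// in the result object are hypothetical.

function createToyModel() {
  // counts.get(token) -> Map of nextToken -> frequency
  const counts = new Map();

  return {
    // Accumulate bigram frequencies from raw text
    ingest(text) {
      const tokens = text.toLowerCase().match(/[a-z']+/g) || [];
      for (let i = 0; i < tokens.length - 1; i++) {
        const next = counts.get(tokens[i]) || new Map();
        next.set(tokens[i + 1], (next.get(tokens[i + 1]) || 0) + 1);
        counts.set(tokens[i], next);
      }
    },

    // Return the highest-frequency next token plus the full ranking
    getTokenPrediction(token) {
      const next = counts.get(token.toLowerCase());
      if (!next) return { token: null, rankedTokenList: [] };
      const ranked = [...next.entries()]
        .sort((a, b) => b[1] - a[1])
        .map(([t]) => t);
      return { token: ranked[0], rankedTokenList: ranked };
    }
  };
}

const toy = createToyModel();
toy.ingest('the cat sat on the mat and the cat slept');
toy.getTokenPrediction('the'); // "cat" follows "the" most often above
```

Here `getTokenSequencePrediction` would simply repeat this lookup for several steps, and `getCompletions` would return multiple ranked continuations instead of one.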

### Complete Method

High-level convenience method that returns the single best completion for a given input query.

```javascript { .api }
/**
 * Get the highest-ranked completion for input text
 * @param {string} query - Input text to complete
 * @returns {string} Best completion prediction
 */
complete(query);
```

**Usage Examples:**

```javascript
// Simple completion
const result1 = model.complete('The weather today is');
// Returns: "beautiful" (or another highest-ranked prediction)

// Phrase completion
const result2 = model.complete('JavaScript is a programming');
// Returns: "language" (or a similar contextual completion)
```
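
Conceptually, a completion like those above can be produced by chaining single next-token predictions until no continuation is known. The greedy loop below is a sketch of that idea only, not the package's actual decoding logic; the hand-built `bigrams` table stands in for a trained model's statistics.

```javascript
// Greedy decoding sketch: extend the input one most-likely token
// at a time. Illustrates the idea behind complete(); not the
// library's implementation.

function greedyComplete(bigrams, query, maxTokens = 5) {
  const words = query.toLowerCase().split(/\s+/).filter(Boolean);
  const output = [];
  let current = words[words.length - 1];
  for (let i = 0; i < maxTokens; i++) {
    const next = bigrams[current]; // most likely follower, if any
    if (!next) break;              // no known continuation: stop
    output.push(next);
    current = next;
  }
  return output.join(' ');
}

// Tiny hand-built table mapping a token to its most likely follower
const bigrams = {
  programming: 'language',
  language: 'for',
  for: 'the'
};

greedyComplete(bigrams, 'JavaScript is a programming');
// → "language for the"
```

The real `complete` returns only the single best continuation; `getTokenSequencePrediction` is the closer analogue when you want a fixed number of chained tokens.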

### Factory Methods

Methods for creating transformer instances from different data sources.

```javascript { .api }
/**
 * Create a transformer from pre-computed training data
 * @param {TrainingData} trainingData - Object with text and embeddings
 * @returns {TransformerAPI} Transformer instance ready for predictions
 */
fromTrainingData(trainingData);

/**
 * Create a transformer from text files, running the full training process
 * @param {string[]} files - Document filenames (without .txt extension)
 * @returns {Promise<TransformerAPI>} Trained transformer instance
 */
fromFiles(files);

**Usage Examples:**

```javascript
// Using pre-computed embeddings
const trainingData = {
  text: "Combined document text...",
  embeddings: { /* pre-computed embeddings */ }
};

const transformer1 = model.fromTrainingData(trainingData);

// Training from files
const transformer2 = await model.fromFiles([
  'shakespeare-hamlet',
  'shakespeare-macbeth',
  'shakespeare-othello'
]);
```
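
The practical reason for having two factory methods is reuse: training from files does the full (slow) computation, while pre-computed data loads immediately. The sketch below shows that train-once, serialize, reload pattern using a plain frequency table as a stand-in; the package's actual embeddings format is not reproduced here.

```javascript
// Train-once / reload pattern: compute statistics once, persist
// them, and rebuild later without retraining. The stats object is
// a stand-in for the package's embeddings format.

function buildStats(text) {
  const stats = {};
  const tokens = text.toLowerCase().match(/[a-z']+/g) || [];
  for (let i = 0; i < tokens.length - 1; i++) {
    stats[tokens[i]] = stats[tokens[i]] || {};
    stats[tokens[i]][tokens[i + 1]] =
      (stats[tokens[i]][tokens[i + 1]] || 0) + 1;
  }
  return stats;
}

// Slow path (analogous to fromFiles): compute from raw text
const trained = buildStats('the cat sat on the mat');

// Persist, e.g. to disk or a cache
const saved = JSON.stringify({
  text: 'the cat sat on the mat',
  stats: trained
});

// Fast path (analogous to fromTrainingData): reload, skip training
const reloaded = JSON.parse(saved).stats;
```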

## Types

### Language Model Configuration

```javascript { .api }
/**
 * Configuration options for the Language factory function
 */
interface LanguageOptions {
  name?: string;       // Dataset identifier
  dataset?: Dataset;   // Pre-existing dataset configuration
  files?: string[];    // Training document filenames
  bootstrap?: boolean; // Use default training data
}

/**
 * Training dataset configuration
 */
interface Dataset {
  name: string;    // Dataset identifier
  files: string[]; // Document filenames without .txt extension
}

/**
 * Pre-computed training data structure
 */
interface TrainingData {
  text: string;                 // Combined training text
  embeddings: EmbeddingsObject; // Token embedding vectors
}
```