Tessl Tile for npm/next-token-prediction@1.1.0

# Next Token Prediction

Next Token Prediction is a JavaScript library for creating and training language models with next-token prediction capabilities. It provides a transformer-based architecture with support for custom training data, and offers autocomplete, text completion, and AI-powered text generation in pure JavaScript with no external API dependencies.

## Package Information

- **Package Name**: next-token-prediction
- **Package Type**: npm
- **Language**: JavaScript (Node.js)
- **Installation**: `npm install next-token-prediction`

## Core Imports

```javascript
const { Language } = require('next-token-prediction');
```

## Basic Usage

```javascript
const { Language } = require('next-token-prediction');

// `await` is not allowed at the top level of a CommonJS module,
// so the calls are wrapped in an async function.
async function main() {
  // Simple bootstrap approach with built-in training data
  const model = await Language({
    bootstrap: true
  });

  // Predict next token
  const nextWord = model.getTokenPrediction('hello');

  // Complete a phrase
  const completion = model.complete('The weather is');

  // Get multiple completion alternatives
  const completions = model.getCompletions('JavaScript is');
}

main().catch(console.error);
```

## Architecture

Next Token Prediction is built around several key components:

- **Language Model**: High-level factory function that provides training and prediction capabilities
- **Transformer Engine**: Core tokenization, n-gram analysis, and prediction engine
- **Vector System**: High-dimensional embedding vectors for semantic token relationships
- **Training Pipeline**: Comprehensive training system with multiple metrics and embedding generation
- **Dataset Management**: Built-in datasets and support for custom training documents

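To make the n-gram idea behind the components above concrete, here is a minimal, self-contained sketch of a bigram predictor. It is an illustration only and is not taken from the library's source: `trainBigrams` and `predictNext` are hypothetical names, and the whitespace tokenizer is an assumption. The return shape loosely mirrors the `token` and `rankedTokenList` fields documented later.

```javascript
// Toy bigram model (illustration only, not the library's internals):
// count which token follows which, then rank followers by frequency.
function trainBigrams(text) {
  // Assumed tokenizer: lowercase, split on whitespace.
  const tokens = text.toLowerCase().split(/\s+/).filter(Boolean);
  const counts = new Map();
  for (let i = 0; i < tokens.length - 1; i++) {
    const current = tokens[i];
    const next = tokens[i + 1];
    if (!counts.has(current)) counts.set(current, new Map());
    const followers = counts.get(current);
    followers.set(next, (followers.get(next) || 0) + 1);
  }
  return counts;
}

function predictNext(counts, token) {
  const followers = counts.get(token.toLowerCase());
  if (!followers) return { token: '', rankedTokenList: [] };
  // Most frequent follower first.
  const rankedTokenList = [...followers.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([next]) => next);
  return { token: rankedTokenList[0], rankedTokenList };
}

const counts = trainBigrams('the cat sat on the mat and the cat slept');
console.log(predictNext(counts, 'the'));
// { token: 'cat', rankedTokenList: [ 'cat', 'mat' ] }
```

A real model would use longer contexts and embeddings rather than raw counts; the sketch only shows the ranking step.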
## Capabilities

### Language Model Creation

Factory function for creating language model instances with various initialization options including bootstrap training, custom datasets, or file-based training.

```javascript { .api }
/**
 * Create a language model instance
 * @param {Object} options - Configuration options
 * @param {string} [options.name] - Dataset name
 * @param {Object} [options.dataset] - Pre-existing dataset with name and files
 * @param {string[]} [options.files] - Training document filenames (without .txt extension)
 * @param {boolean} [options.bootstrap=false] - Use built-in default training data
 * @returns {Promise<LanguageModel>} Language model API
 */
async function Language(options = {});
```

[Language Model](./language-model.md)

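The three initialization modes can be sketched as plain option objects. This is a hedged illustration built only from the documented parameter names; `toDatasetFiles` is a hypothetical helper, the dataset and file names are invented, and passing `name` together with `files` is an assumption rather than documented behavior.

```javascript
// Hypothetical helper: strips a trailing ".txt" so names follow the documented
// "filenames without .txt extension" convention.
function toDatasetFiles(filenames) {
  return filenames.map((name) => name.replace(/\.txt$/i, ''));
}

// 1. Bootstrap: use the built-in default training data.
const bootstrapOptions = { bootstrap: true };

// 2. File-based: a dataset name plus document filenames (assumed combination).
const filesOptions = {
  name: 'my-notes', // invented example name
  files: toDatasetFiles(['intro.txt', 'chapter-1'])
};

// 3. Pre-existing dataset object with name and files.
const datasetOptions = {
  dataset: { name: 'my-notes', files: filesOptions.files }
};

console.log(filesOptions.files); // [ 'intro', 'chapter-1' ]
```

Any of these objects would then be passed to `Language(...)` inside an async function.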
### Text Prediction

Core prediction capabilities for single tokens, token sequences, and multiple completion alternatives with ranking and confidence scoring.

```javascript { .api }
/**
 * Predict the next single token
 * @param {string} token - Input token or phrase
 * @returns {Object} Prediction result with token and alternatives
 */
getTokenPrediction(token);

/**
 * Predict a sequence of tokens
 * @param {string} input - Input text
 * @param {number} [sequenceLength=2] - Number of tokens to predict
 * @returns {Object} Sequence prediction with completion and metadata
 */
getTokenSequencePrediction(input, sequenceLength);

/**
 * Get multiple completion alternatives
 * @param {string} input - Input text
 * @returns {Object} Multiple completions with ranking information
 */
getCompletions(input);
```

[Text Prediction](./text-prediction.md)

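Because a prediction result may carry an optional `error` field, callers will probably want to check it before reading `token` or `rankedTokenList`. The sketch below shows one way to do that. `topSuggestions` is a hypothetical helper, and the two result objects are hand-written stubs standing in for `getTokenPrediction` output, including an invented error message.

```javascript
// Hypothetical helper: returns up to `limit` suggestions from an object shaped
// like the documented token prediction result, or [] when it carries an error.
function topSuggestions(result, limit = 3) {
  if (!result || result.error) return [];
  const ranked = Array.isArray(result.rankedTokenList) ? result.rankedTokenList : [];
  // Put the primary token first and avoid listing it twice.
  const ordered = result.token
    ? [result.token, ...ranked.filter((t) => t !== result.token)]
    : ranked;
  return ordered.slice(0, limit);
}

// Stub results (assumed shapes, not real library output).
const ok = { token: 'world', rankedTokenList: ['world', 'there', 'again', 'kitty'] };
const failed = { token: '', rankedTokenList: [], error: { message: 'stub error' } };

console.log(topSuggestions(ok));     // [ 'world', 'there', 'again' ]
console.log(topSuggestions(failed)); // []
```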
### Training System

Advanced training capabilities for creating custom models from text documents with comprehensive embedding generation and n-gram analysis.

```javascript { .api }
/**
 * Train model on dataset
 * @param {Object} dataset - Training dataset
 * @param {string} dataset.name - Dataset identifier
 * @param {string[]} dataset.files - Document filenames (without .txt extension)
 * @returns {Promise<void>} Completes when training finished
 */
train(dataset);

/**
 * Create model context from pre-computed embeddings
 * @param {Object} embeddings - Token embeddings object
 */
createContext(embeddings);
```

[Training System](./training-system.md)

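Before handing pre-computed embeddings to `createContext`, it may help to check that they follow the nested token, next token, vector shape described under Types. The following is a minimal sketch: `isValidEmbeddings` is a hypothetical helper, and the vector length is passed in as a parameter because the library's actual dimension count is not stated here.

```javascript
// Hypothetical validator for the nested shape:
// { [token]: { [nextToken]: number[] } }, with every vector `dimensions` long.
function isValidEmbeddings(embeddings, dimensions) {
  if (embeddings === null || typeof embeddings !== 'object') return false;
  return Object.values(embeddings).every((nextTokens) =>
    nextTokens !== null &&
    typeof nextTokens === 'object' &&
    Object.values(nextTokens).every((vector) =>
      Array.isArray(vector) &&
      vector.length === dimensions &&
      vector.every((x) => Number.isFinite(x))
    )
  );
}

// Tiny hand-written example with 3-dimensional vectors (invented values).
const embeddings = {
  hello: { world: [0.9, 0.1, 0.0], there: [0.4, 0.2, 0.1] }
};

console.log(isValidEmbeddings(embeddings, 3));                    // true
console.log(isValidEmbeddings({ hello: { world: [1, 2] } }, 3));  // false
```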
### Vector Operations

Internal vector system for embedding representations and similarity calculations. The Vector class is used internally by the library for high-dimensional token embeddings but is not directly exported from the main package.

[Vector Operations](./vector-operations.md)

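Since the Vector class is not exported, similarity between two embedding vectors has to be computed by the caller if it is needed. The sketch below uses cosine similarity on plain arrays. Which similarity measure the library uses internally is not stated in this overview, so cosine similarity is an assumption, and `cosineSimilarity` is a hypothetical standalone function.

```javascript
// Cosine similarity between two equal-length numeric arrays.
// Assumption: this is a common choice for embeddings, not necessarily
// what the library's internal Vector class does.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new RangeError('Vectors must have the same length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Treat a zero vector as having no similarity to anything.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

console.log(cosineSimilarity([1, 0], [1, 0])); // 1
console.log(cosineSimilarity([1, 0], [0, 1])); // 0
```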
## Types

### Core Types

```javascript { .api }
/**
 * Language model instance with prediction and training capabilities
 */
interface LanguageModel {
  // Prediction methods
  getTokenPrediction(token: string): TokenPredictionResult;
  getTokenSequencePrediction(input: string, sequenceLength?: number): SequencePredictionResult;
  getCompletions(input: string): CompletionsResult;
  complete(query: string): string;

  // Training methods
  train(dataset: Dataset): Promise<void>;
  createContext(embeddings: EmbeddingsObject): void;
  ingest(text: string): void;

  // Factory methods
  fromTrainingData(trainingData: TrainingData): TransformerAPI;
  fromFiles(files: string[]): Promise<TransformerAPI>;
}

/**
 * Training dataset configuration
 */
interface Dataset {
  name: string;
  files: string[]; // Document filenames without .txt extension
}

/**
 * Pre-computed training data with text and embeddings
 */
interface TrainingData {
  text: string;
  embeddings: EmbeddingsObject;
}

/**
 * Token prediction result with alternatives
 */
interface TokenPredictionResult {
  token: string;
  rankedTokenList: string[];
  error?: { message: string };
}

/**
 * Sequence prediction result with completion details
 */
interface SequencePredictionResult {
  completion: string;
  sequenceLength: number;
  token: string;
  rankedTokenList: string[];
}

/**
 * Multiple completions result with ranking
 */
interface CompletionsResult {
  completion: string;
  token: string;
  rankedTokenList: string[];
  completions: string[];
}

/**
 * Nested embeddings structure
 */
interface EmbeddingsObject {
  [token: string]: {
    [nextToken: string]: number[]; // Vector of DIMENSIONS length
  };
}

/**
 * Transformer API with core prediction and training methods
 */
interface TransformerAPI {
  // Core prediction methods
  getTokenPrediction(token: string): TokenPredictionResult;
  getTokenSequencePrediction(input: string, sequenceLength?: number): SequencePredictionResult;
  getCompletions(input: string): CompletionsResult;

  // Training and context methods
  train(dataset: Dataset): Promise<void>;
  createContext(embeddings: EmbeddingsObject): void;
  ingest(text: string): void;
}
```
