# Next Token Prediction

Next Token Prediction is a JavaScript library for creating and training language models with next-token prediction capabilities. It provides a transformer-based architecture with support for custom training data, and offers autocomplete, text completion, and AI-powered text generation in pure JavaScript, with no external API dependencies.

## Package Information

- **Package Name**: next-token-prediction
- **Package Type**: npm
- **Language**: JavaScript (Node.js)
- **Installation**: `npm install next-token-prediction`

## Core Imports

```javascript
const { Language } = require('next-token-prediction');
```

## Basic Usage

```javascript
const { Language } = require('next-token-prediction');

// `await` is not allowed at the top level of a CommonJS module,
// so the calls are wrapped in an async function.
async function main() {
  // Simple bootstrap approach with built-in training data
  const model = await Language({
    bootstrap: true
  });

  // Predict the next token; the result also carries ranked alternatives
  const { token: nextWord } = model.getTokenPrediction('hello');

  // Complete a phrase
  const completion = model.complete('The weather is');

  // Get multiple completion alternatives
  const { completions } = model.getCompletions('JavaScript is');

  console.log({ nextWord, completion, completions });
}

main().catch(console.error);
```

## Architecture

Next Token Prediction is built around several key components:

- **Language Model**: High-level factory function that provides training and prediction capabilities
- **Transformer Engine**: Core tokenization, n-gram analysis, and prediction engine
- **Vector System**: High-dimensional embedding vectors for semantic token relationships
- **Training Pipeline**: Comprehensive training system with multiple metrics and embedding generation
- **Dataset Management**: Built-in datasets and support for custom training documents
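
To make the n-gram idea behind the prediction engine more concrete, the sketch below counts which token follows which in a small text and ranks the followers by frequency. It is a toy bigram model written for illustration only, not the library's implementation; `trainBigrams` and `predictNext` are hypothetical names, and the return shape merely mirrors the `token` / `rankedTokenList` fields described later in this document.

```javascript
// Toy bigram model (illustration only, not the library's engine).
// Counts how often each token is followed by each other token.
function trainBigrams(text) {
  const tokens = text.toLowerCase().split(/\s+/).filter(Boolean);
  const counts = new Map();

  for (let i = 0; i < tokens.length - 1; i++) {
    const current = tokens[i];
    const next = tokens[i + 1];

    if (!counts.has(current)) counts.set(current, new Map());
    const followers = counts.get(current);
    followers.set(next, (followers.get(next) || 0) + 1);
  }

  return counts;
}

// Ranks the followers of `token` by how often they were seen.
function predictNext(counts, token) {
  const followers = counts.get(token.toLowerCase());
  if (!followers) return { token: '', rankedTokenList: [] };

  const rankedTokenList = [...followers.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([candidate]) => candidate);

  return { token: rankedTokenList[0], rankedTokenList };
}

const counts = trainBigrams('the cat sat on the mat the cat ran');
console.log(predictNext(counts, 'the'));
// { token: 'cat', rankedTokenList: [ 'cat', 'mat' ] }
```

A frequency table like this only captures adjacent-token statistics; the embedding vectors listed above are what the library uses to represent richer token relationships.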

## Capabilities

### Language Model Creation

Factory function for creating language model instances with various initialization options including bootstrap training, custom datasets, or file-based training.

```javascript { .api }
/**
 * Create a language model instance
 * @param {Object} options - Configuration options
 * @param {string} [options.name] - Dataset name
 * @param {Object} [options.dataset] - Pre-existing dataset with name and files
 * @param {string[]} [options.files] - Training document filenames (without .txt extension)
 * @param {boolean} [options.bootstrap=false] - Use built-in default training data
 * @returns {Promise<LanguageModel>} Language model API
 */
async function Language(options = {});
```

[Language Model](./language-model.md)
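
As a rough illustration of file-based initialization, the sketch below passes `name` and `files` to the factory. The dataset name and filenames are made up, and `createCustomModel` is a hypothetical helper. The factory is injected as a parameter, with a stand-in stub, so the snippet runs without the package installed; in a real project the `Language` export would be passed instead.

```javascript
// `factory` is expected to be the package's `Language` export. It is passed in
// so this sketch stays runnable without the package installed.
async function createCustomModel(factory) {
  return factory({
    name: 'support-articles',     // hypothetical dataset name
    files: ['faq', 'user-guide']  // hypothetical documents, listed without ".txt"
  });
}

// Stand-in factory used only to show the call shape; it echoes its options.
const stubFactory = async (options) => ({ options });

createCustomModel(stubFactory).then((model) => {
  console.log(model.options.files);
});
```

With the package installed, the call would look like `createCustomModel(require('next-token-prediction').Language)`, assuming the factory accepts these options as documented above.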

### Text Prediction

Core prediction capabilities for single tokens, token sequences, and multiple completion alternatives with ranking and confidence scoring.

```javascript { .api }
/**
 * Predict the next single token
 * @param {string} token - Input token or phrase
 * @returns {Object} Prediction result with token and alternatives
 */
getTokenPrediction(token);

/**
 * Predict a sequence of tokens
 * @param {string} input - Input text
 * @param {number} [sequenceLength=2] - Number of tokens to predict
 * @returns {Object} Sequence prediction with completion and metadata
 */
getTokenSequencePrediction(input, sequenceLength);

/**
 * Get multiple completion alternatives
 * @param {string} input - Input text
 * @returns {Object} Multiple completions with ranking information
 */
getCompletions(input);
```

[Text Prediction](./text-prediction.md)
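
Because prediction results are described as objects with a `token`, a `rankedTokenList`, and an optional `error`, callers may want to read them defensively. The sketch below shows one way to do that. `topAlternatives` is a hypothetical helper, and the sample objects (including the error message) are hand-written stand-ins rather than real library output.

```javascript
// Hypothetical helper: returns up to `limit` ranked alternatives from an object
// shaped like a token prediction result, or an empty array when `error` is set.
function topAlternatives(result, limit = 3) {
  if (!result || result.error) return [];
  return (result.rankedTokenList || []).slice(0, limit);
}

// Hand-written samples standing in for real getTokenPrediction() output.
const sampleResult = {
  token: 'world',
  rankedTokenList: ['world', 'there', 'again', 'everyone']
};
const failedResult = {
  token: '',
  rankedTokenList: [],
  error: { message: 'made-up error message' }
};

console.log(topAlternatives(sampleResult)); // [ 'world', 'there', 'again' ]
console.log(topAlternatives(failedResult)); // []
```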

### Training System

Advanced training capabilities for creating custom models from text documents with comprehensive embedding generation and n-gram analysis.

```javascript { .api }
/**
 * Train model on dataset
 * @param {Object} dataset - Training dataset
 * @param {string} dataset.name - Dataset identifier
 * @param {string[]} dataset.files - Document filenames (without .txt extension)
 * @returns {Promise<void>} Resolves when training has finished
 */
train(dataset);

/**
 * Create model context from pre-computed embeddings
 * @param {Object} embeddings - Token embeddings object
 */
createContext(embeddings);
```

[Training System](./training-system.md)
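
Since `dataset.files` is documented as filenames without the `.txt` extension, a small normalization step can help avoid mistakes when building a dataset object. The sketch below is one possible approach. `buildDataset` is a hypothetical helper that is not part of the package, and the dataset name and filenames are invented for the example.

```javascript
// Hypothetical helper (not part of the package): builds an object in the
// { name, files } shape and strips a trailing ".txt" from each filename.
function buildDataset(name, filenames) {
  if (typeof name !== 'string' || name.length === 0) {
    throw new TypeError('Dataset name must be a non-empty string');
  }

  const files = filenames.map((filename) => filename.replace(/\.txt$/i, ''));
  return { name, files };
}

const dataset = buildDataset('product-docs', ['intro.txt', 'faq', 'Changelog.TXT']);
console.log(dataset);
// { name: 'product-docs', files: [ 'intro', 'faq', 'Changelog' ] }
```

The resulting object could then be passed to `train(dataset)` on a model instance.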

### Vector Operations

Internal vector system for embedding representations and similarity calculations. The Vector class is used internally by the library for high-dimensional token embeddings but is not directly exported from the main package.

[Vector Operations](./vector-operations.md)
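
The exact similarity measure used by the internal Vector class is not described here. As an illustration of what a similarity calculation over embedding vectors can look like, the sketch below implements cosine similarity, a common choice for this purpose. This is an assumption made for the example, and `cosineSimilarity` is a hypothetical standalone function, not a library export.

```javascript
// Cosine similarity between two equal-length numeric vectors.
// Illustrative only; the library's internal measure may differ.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same length');
  }

  let dot = 0;
  let normA = 0;
  let normB = 0;

  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }

  // Treat a zero vector as having no similarity to anything.
  if (normA === 0 || normB === 0) return 0;

  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

console.log(cosineSimilarity([1, 0], [1, 0])); // 1
console.log(cosineSimilarity([1, 0], [0, 1])); // 0
```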

## Types

### Core Types

```javascript { .api }
/**
 * Language model instance with prediction and training capabilities
 */
interface LanguageModel {
  // Prediction methods
  getTokenPrediction(token: string): TokenPredictionResult;
  getTokenSequencePrediction(input: string, sequenceLength?: number): SequencePredictionResult;
  getCompletions(input: string): CompletionsResult;
  complete(query: string): string;

  // Training methods
  train(dataset: Dataset): Promise<void>;
  createContext(embeddings: EmbeddingsObject): void;
  ingest(text: string): void;

  // Factory methods
  fromTrainingData(trainingData: TrainingData): TransformerAPI;
  fromFiles(files: string[]): Promise<TransformerAPI>;
}

/**
 * Training dataset configuration
 */
interface Dataset {
  name: string;
  files: string[]; // Document filenames without .txt extension
}

/**
 * Pre-computed training data with text and embeddings
 */
interface TrainingData {
  text: string;
  embeddings: EmbeddingsObject;
}

/**
 * Token prediction result with alternatives
 */
interface TokenPredictionResult {
  token: string;
  rankedTokenList: string[];
  error?: { message: string };
}

/**
 * Sequence prediction result with completion details
 */
interface SequencePredictionResult {
  completion: string;
  sequenceLength: number;
  token: string;
  rankedTokenList: string[];
}

/**
 * Multiple completions result with ranking
 */
interface CompletionsResult {
  completion: string;
  token: string;
  rankedTokenList: string[];
  completions: string[];
}

/**
 * Nested embeddings structure
 */
interface EmbeddingsObject {
  [token: string]: {
    [nextToken: string]: number[]; // Vector of DIMENSIONS length
  };
}

/**
 * Transformer API with core prediction and training methods
 */
interface TransformerAPI {
  // Core prediction methods
  getTokenPrediction(token: string): TokenPredictionResult;
  getTokenSequencePrediction(input: string, sequenceLength?: number): SequencePredictionResult;
  getCompletions(input: string): CompletionsResult;

  // Training and context methods
  train(dataset: Dataset): Promise<void>;
  createContext(embeddings: EmbeddingsObject): void;
  ingest(text: string): void;
}
```
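
The nested `EmbeddingsObject` shape (token, then next token, then a numeric vector) can be checked at runtime before handing data to `createContext`. The sketch below shows one possible validator. `isEmbeddingsObject` is a hypothetical helper, and the `DIMENSIONS` value of 4 is an assumption chosen to keep the sample small, since the library's actual vector length is not stated in this document.

```javascript
// Assumed vector length for this sketch only; the library's actual
// DIMENSIONS value is not stated in this document.
const DIMENSIONS = 4;

// Hypothetical validator for the token -> nextToken -> number[] structure.
function isEmbeddingsObject(value, dimensions = DIMENSIONS) {
  const isPlainObject = (candidate) =>
    candidate !== null && typeof candidate === 'object' && !Array.isArray(candidate);

  if (!isPlainObject(value)) return false;

  return Object.values(value).every((nextTokens) =>
    isPlainObject(nextTokens) &&
    Object.values(nextTokens).every((vector) =>
      Array.isArray(vector) &&
      vector.length === dimensions &&
      vector.every((n) => typeof n === 'number' && Number.isFinite(n))
    )
  );
}

// Made-up sample data in the documented shape.
const embeddings = {
  hello: {
    world: [0.1, 0.2, 0.3, 0.4],
    there: [0, 1, 0, 0.5]
  }
};

console.log(isEmbeddingsObject(embeddings));                     // true
console.log(isEmbeddingsObject({ hello: { world: [1, 2] } }));   // false
```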