# Language Model

The Language Model is the primary interface for creating and managing next-token prediction models. It provides a factory function that handles initialization and training, then returns a comprehensive API for text-prediction tasks.

## Capabilities

### Language Factory Function

Creates a language model instance with various initialization options, including bootstrap training, custom datasets, and file-based training.

```javascript { .api }
/**
 * Create a language model instance
 * @param {Object} options - Configuration options
 * @param {string} [options.name] - Dataset name for identification
 * @param {Object} [options.dataset] - Pre-existing dataset with name and files
 * @param {string[]} [options.files] - Training document filenames (without .txt extension)
 * @param {boolean} [options.bootstrap=false] - Use built-in default training data
 * @returns {Promise<LanguageModel>} Language model API with prediction and training methods
 */
async function Language(options = {});
```

**Usage Examples:**

```javascript
const { Language } = require('next-token-prediction');

// Bootstrap with default training data
const defaultModel = await Language({
  bootstrap: true
});

// Use a pre-existing dataset
const Dataset = require('./training/datasets/OpenSourceBooks');
const bookModel = await Language({
  dataset: Dataset
});

// Train on custom files
const customModel = await Language({
  name: 'my-dataset',
  files: ['document1', 'document2', 'document3']
});
```

### Language Model Instance

The created language model instance provides both high-level convenience methods and full access to the underlying transformer capabilities.

```javascript { .api }
/**
 * Language model instance with prediction and training capabilities
 */
interface LanguageModel {
  // High-level prediction methods
  complete(query: string): string;

  // Full transformer API access
  getTokenPrediction(token: string): TokenPredictionResult;
  getTokenSequencePrediction(input: string, sequenceLength?: number): SequencePredictionResult;
  getCompletions(input: string): CompletionsResult;

  // Training and model management
  train(dataset: Dataset): Promise<void>;
  createContext(embeddings: EmbeddingsObject): void;
  ingest(text: string): void;
  fromTrainingData(trainingData: TrainingData): TransformerAPI;
  fromFiles(files: string[]): Promise<TransformerAPI>;
}
```
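
The low-level prediction methods are easiest to understand with a simplified mental model. The sketch below is illustrative only and is not the library's implementation: a toy bigram frequency table that, like a `getTokenPrediction`-style lookup, maps a token to its most likely successor.

```javascript
// Illustrative only: a toy bigram table, NOT the library's implementation.
// Build next-token frequencies from a training text, then look up the
// most frequent successor for a given token.
function buildBigrams(text) {
  const tokens = text.toLowerCase().split(/\W+/).filter(Boolean);
  const table = {};
  for (let i = 0; i < tokens.length - 1; i++) {
    const cur = tokens[i];
    const next = tokens[i + 1];
    table[cur] = table[cur] || {};
    table[cur][next] = (table[cur][next] || 0) + 1;
  }
  return table;
}

function predictNext(table, token) {
  const successors = table[token.toLowerCase()] || {};
  let best = null;
  for (const [candidate, count] of Object.entries(successors)) {
    if (!best || count > best.count) best = { token: candidate, count };
  }
  return best; // e.g. { token: 'language', count: 2 }, or null if unseen
}

const table = buildBigrams(
  'JavaScript is a programming language. Python is a programming language.'
);
console.log(predictNext(table, 'programming').token); // → "language"
```

A real model ranks candidates over far more context than a single preceding token, but the shape of the result (a best candidate plus ranking information) is the same idea.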

### Complete Method

High-level convenience method that returns the single best completion for a given input query.

```javascript { .api }
/**
 * Get the highest-ranked completion for input text
 * @param {string} query - Input text to complete
 * @returns {string} Best completion prediction
 */
complete(query);
```

**Usage Examples:**

```javascript
// Simple completion
const result1 = model.complete('The weather today is');
// Returns: "beautiful" (or other highest-ranked prediction)

// Phrase completion
const result2 = model.complete('JavaScript is a programming');
// Returns: "language" (or similar contextual completion)
```
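
Conceptually, `complete` selects the top-ranked candidate from a set of scored completions. A minimal sketch of that selection step (illustrative only, with made-up scores standing in for the model's rankings):

```javascript
// Illustrative ranking step: pick the highest-scoring candidate.
// The scores below are invented for demonstration, not real model output.
function pickBest(candidates) {
  return Object.entries(candidates)
    .sort(([, a], [, b]) => b - a)[0][0];
}

const candidates = { beautiful: 0.42, cloudy: 0.31, cold: 0.27 };
console.log(pickBest(candidates)); // → "beautiful"
```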

### Factory Methods

Methods for creating transformer instances from different data sources.

```javascript { .api }
/**
 * Create transformer from pre-computed training data
 * @param {TrainingData} trainingData - Object with text and embeddings
 * @returns {TransformerAPI} Transformer instance ready for predictions
 */
fromTrainingData(trainingData);

/**
 * Create transformer from text files with full training process
 * @param {string[]} files - Document filenames (without .txt extension)
 * @returns {Promise<TransformerAPI>} Trained transformer instance
 */
fromFiles(files);
```

**Usage Examples:**

```javascript
// Using pre-computed embeddings
const trainingData = {
  text: "Combined document text...",
  embeddings: { /* pre-computed embeddings */ }
};
const transformer1 = model.fromTrainingData(trainingData);

// Training from files
const transformer2 = await model.fromFiles([
  'shakespeare-hamlet',
  'shakespeare-macbeth',
  'shakespeare-othello'
]);
```

## Types

### Language Model Configuration

```javascript { .api }
/**
 * Configuration options for Language factory function
 */
interface LanguageOptions {
  name?: string;       // Dataset identifier
  dataset?: Dataset;   // Pre-existing dataset configuration
  files?: string[];    // Training document filenames
  bootstrap?: boolean; // Use default training data
}

/**
 * Training dataset configuration
 */
interface Dataset {
  name: string;    // Dataset identifier
  files: string[]; // Document filenames without .txt extension
}

/**
 * Pre-computed training data structure
 */
interface TrainingData {
  text: string;                 // Combined training text
  embeddings: EmbeddingsObject; // Token embedding vectors
}
```
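
For reference, here is a minimal sketch of plain objects matching these shapes. All names and values are invented for demonstration; the `@type` annotations simply point back at the interfaces above.

```javascript
// Illustrative objects matching the interfaces above.

/** @type {Dataset} */
const dataset = {
  name: 'shakespeare',
  files: ['hamlet', 'macbeth'] // filenames without the .txt extension
};

/** @type {LanguageOptions} */
const options = {
  name: dataset.name,
  dataset,
  bootstrap: false
};

/** @type {TrainingData} */
const trainingData = {
  text: 'Combined training text...',
  embeddings: {} // pre-computed token embedding vectors would go here
};

console.log(options.dataset.files.length); // → 2
```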