JavaScript library for creating language models with next-token prediction capabilities including autocomplete, text completion, and AI-powered text generation.
The Language Model is the primary interface for creating and managing next-token prediction models. It provides a factory function that handles initialization and training, then returns an API for text prediction tasks.
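To make the idea of next-token prediction concrete, here is a minimal, self-contained sketch of a toy bigram model. It is only a conceptual illustration and is not how the library works internally; the names `tokenize`, `trainBigrams`, and `predictNextToken` are hypothetical and do not belong to the package.

```javascript
// Toy bigram model: a conceptual illustration only, not the library's implementation.

// Split text into lowercase word tokens.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9']+/g) || [];
}

// Count how often each token is followed by each other token.
function trainBigrams(text) {
  const counts = new Map();
  const tokens = tokenize(text);
  for (let i = 0; i < tokens.length - 1; i++) {
    const current = tokens[i];
    const next = tokens[i + 1];
    if (!counts.has(current)) counts.set(current, new Map());
    const followers = counts.get(current);
    followers.set(next, (followers.get(next) || 0) + 1);
  }
  return counts;
}

// Return the most frequent follower of the last token in `input`,
// or null when that token was never seen during training.
function predictNextToken(counts, input) {
  const tokens = tokenize(input);
  const last = tokens[tokens.length - 1];
  const followers = counts.get(last);
  if (!followers) return null;
  let best = null;
  let bestCount = 0;
  for (const [token, count] of followers) {
    if (count > bestCount) {
      best = token;
      bestCount = count;
    }
  }
  return best;
}

const bigramCounts = trainBigrams('the cat sat on the mat. the cat ran.');
console.log(predictNextToken(bigramCounts, 'I saw the')); // "cat"
```

In this toy corpus "cat" follows "the" twice and "mat" once, so "cat" is ranked highest. A real model would use more context and a richer ranking than raw pair counts.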
Creates a language model instance with various initialization options including bootstrap training, custom datasets, or file-based training.
/**
* Create a language model instance
* @param {Object} options - Configuration options
* @param {string} [options.name] - Dataset name for identification
* @param {Object} [options.dataset] - Pre-existing dataset with name and files
* @param {string[]} [options.files] - Training document filenames (without .txt extension)
* @param {boolean} [options.bootstrap=false] - Use built-in default training data
* @returns {Promise<LanguageModel>} Language model API with prediction and training methods
*/
async function Language(options = {});

Usage Examples:
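const { Language } = require('next-token-prediction');
// Note: the `await` calls below must run inside an async function,
// because CommonJS modules do not support top-level await.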
// Bootstrap with default training data
const defaultModel = await Language({
bootstrap: true
});
// Use pre-existing dataset
const Dataset = require('./training/datasets/OpenSourceBooks');
const bookModel = await Language({
dataset: Dataset
});
// Train on custom files
const customModel = await Language({
name: 'my-dataset',
files: ['document1', 'document2', 'document3']
});

The created language model instance provides both high-level convenience methods and full access to the underlying transformer capabilities.
/**
* Language model instance with prediction and training capabilities
*/
interface LanguageModel {
// High-level prediction methods
complete(query: string): string;
// Full transformer API access
getTokenPrediction(token: string): TokenPredictionResult;
getTokenSequencePrediction(input: string, sequenceLength?: number): SequencePredictionResult;
getCompletions(input: string): CompletionsResult;
// Training and model management
train(dataset: Dataset): Promise<void>;
createContext(embeddings: EmbeddingsObject): void;
ingest(text: string): void;
fromTrainingData(trainingData: TrainingData): TransformerAPI;
fromFiles(files: string[]): Promise<TransformerAPI>;
}

High-level convenience method that returns the single best completion for a given input query.
/**
* Get the highest-ranked completion for input text
* @param {string} query - Input text to complete
* @returns {string} Best completion prediction
*/
complete(query);

Usage Examples:
// Simple completion
const result1 = model.complete('The weather today is');
// Returns: "beautiful" (or other highest-ranked prediction)
// Phrase completion
const result2 = model.complete('JavaScript is a programming');
// Returns: "language" (or similar contextual completion)

Methods for creating transformer instances from different data sources.
/**
* Create transformer from pre-computed training data
* @param {TrainingData} trainingData - Object with text and embeddings
* @returns {TransformerAPI} Transformer instance ready for predictions
*/
fromTrainingData(trainingData);
/**
* Create transformer from text files with full training process
* @param {string[]} files - Document filenames (without .txt extension)
* @returns {Promise<TransformerAPI>} Trained transformer instance
*/
fromFiles(files);

Usage Examples:
// Using pre-computed embeddings
const trainingData = {
text: "Combined document text...",
embeddings: { /* pre-computed embeddings */ }
};
const transformer1 = model.fromTrainingData(trainingData);
// Training from files
const transformer2 = await model.fromFiles([
'shakespeare-hamlet',
'shakespeare-macbeth',
'shakespeare-othello'
]);

/**
* Configuration options for Language factory function
*/
interface LanguageOptions {
name?: string; // Dataset identifier
dataset?: Dataset; // Pre-existing dataset configuration
files?: string[]; // Training document filenames
bootstrap?: boolean; // Use default training data
}
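Because all four options are optional, it can help to check an options object before handing it to the factory. The sketch below tests only the field types documented in this section; `validateLanguageOptions` is a hypothetical helper, not part of the package, and it makes no assumption about which option takes precedence when several are supplied.

```javascript
// Hypothetical helper: checks an options object against the documented
// LanguageOptions shape and returns a list of problems (empty when valid).
function validateLanguageOptions(options = {}) {
  const errors = [];
  const isStringArray = (value) =>
    Array.isArray(value) && value.every((item) => typeof item === 'string');

  if (options.name !== undefined && typeof options.name !== 'string') {
    errors.push('name must be a string');
  }
  if (options.bootstrap !== undefined && typeof options.bootstrap !== 'boolean') {
    errors.push('bootstrap must be a boolean');
  }
  if (options.files !== undefined && !isStringArray(options.files)) {
    errors.push('files must be an array of strings');
  }
  if (options.dataset !== undefined) {
    const { dataset } = options;
    const ok =
      dataset !== null &&
      typeof dataset === 'object' &&
      typeof dataset.name === 'string' &&
      isStringArray(dataset.files);
    if (!ok) errors.push('dataset must have a string name and a string[] files');
  }
  return errors;
}

console.log(validateLanguageOptions({ bootstrap: true })); // []
console.log(validateLanguageOptions({ files: 'document1' })); // [ 'files must be an array of strings' ]
```

Returning a list of messages rather than throwing lets the caller decide whether a malformed option is fatal.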
/**
* Training dataset configuration
*/
interface Dataset {
name: string; // Dataset identifier
files: string[]; // Document filenames without .txt extension
}
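Since `files` entries are documented as filenames without the `.txt` extension, a caller who needs actual file paths has to add the extension back. The sketch below shows one way to do that; `datasetFilePaths` is a hypothetical helper, and the `training/documents` directory is an assumption made for illustration, not a location confirmed by this document.

```javascript
const path = require('path');

// Hypothetical helper: maps Dataset file names (documented as lacking the
// .txt extension) to file paths under a caller-supplied directory.
function datasetFilePaths(dataset, documentsDir) {
  return dataset.files.map((file) => path.join(documentsDir, `${file}.txt`));
}

// Example dataset; the directory name here is an assumption for illustration.
const exampleDataset = { name: 'my-dataset', files: ['document1', 'document2'] };
console.log(datasetFilePaths(exampleDataset, 'training/documents'));
```

Passing the directory as an argument keeps the helper independent of wherever the training documents actually live.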
/**
* Pre-computed training data structure
*/
interface TrainingData {
text: string; // Combined training text
embeddings: EmbeddingsObject; // Token embedding vectors
}

Install with Tessl CLI

npx tessl i tessl/npm-next-token-prediction