JavaScript library for creating language models with next-token prediction capabilities including autocomplete, text completion, and AI-powered text generation.
npx @tessl/cli install tessl/npm-next-token-prediction@1.1.0

Next Token Prediction is a JavaScript library for creating and training language models with next-token prediction capabilities. It provides a transformer-based architecture with support for custom training data, offering autocomplete, text completion, and AI-powered text generation in pure JavaScript without external API dependencies.
npm install next-token-prediction

const { Language } = require('next-token-prediction');
// Simple bootstrap approach with built-in training data
// (run inside an async function, or an ES module with top-level await)
const model = await Language({
  bootstrap: true
});
// Predict next token
const nextWord = model.getTokenPrediction('hello');
// Complete a phrase
const completion = model.complete('The weather is');
// Get multiple completion alternatives
const completions = model.getCompletions('JavaScript is');

Next Token Prediction is built around several key components:
Factory function for creating language model instances with various initialization options including bootstrap training, custom datasets, or file-based training.
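A minimal sketch of the three initialization paths, assuming an async context; the dataset and file names are placeholders, and exactly how name, files, and dataset combine is inferred from the option descriptions below:

// Built-in default training data
const bootstrapped = await Language({ bootstrap: true });

// Train from plain-text documents (placeholder names, given
// without the .txt extension)
const fromFiles = await Language({
  name: 'my-dataset',
  files: ['document-one', 'document-two']
});

// Reuse a pre-existing dataset object
const fromDataset = await Language({
  dataset: {
    name: 'my-dataset',
    files: ['document-one', 'document-two']
  }
});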
/**
* Create a language model instance
* @param {Object} options - Configuration options
* @param {string} [options.name] - Dataset name
* @param {Object} [options.dataset] - Pre-existing dataset with name and files
* @param {string[]} [options.files] - Training document filenames (without .txt extension)
* @param {boolean} [options.bootstrap=false] - Use built-in default training data
* @returns {Promise<LanguageModel>} Language model API
*/
async function Language(options = {});

Core prediction capabilities for single tokens, token sequences, and multiple completion alternatives with ranking and confidence scoring.
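A short sketch of the three prediction calls, using the model instance from the basic usage above; the accessed fields follow the result interfaces further below:

// Single next token plus ranked alternatives
const prediction = model.getTokenPrediction('hello');
console.log(prediction.token, prediction.rankedTokenList);

// Multi-token sequence; the second argument defaults to 2
const sequence = model.getTokenSequencePrediction('The weather is', 4);
console.log(sequence.completion);

// Several ranked completion alternatives
const { completions } = model.getCompletions('JavaScript is');
console.log(completions);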
/**
* Predict the next single token
* @param {string} token - Input token or phrase
* @returns {Object} Prediction result with token and alternatives
*/
getTokenPrediction(token);
/**
* Predict a sequence of tokens
* @param {string} input - Input text
* @param {number} [sequenceLength=2] - Number of tokens to predict
* @returns {Object} Sequence prediction with completion and metadata
*/
getTokenSequencePrediction(input, sequenceLength);
/**
* Get multiple completion alternatives
* @param {string} input - Input text
* @returns {Object} Multiple completions with ranking information
*/
getCompletions(input);

Advanced training capabilities for creating custom models from text documents with comprehensive embedding generation and n-gram analysis.
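A hedged sketch of re-training an existing model instance; the dataset name and file names are placeholders referring to .txt documents without their extension, and savedEmbeddings stands in for a pre-computed EmbeddingsObject as defined below:

// Train (or re-train) on a custom dataset
await model.train({
  name: 'shakespeare',
  files: ['sonnets', 'hamlet']
});

// Or restore a context from pre-computed embeddings
// (savedEmbeddings is a placeholder for an EmbeddingsObject)
model.createContext(savedEmbeddings);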
/**
* Train model on dataset
* @param {Object} dataset - Training dataset
* @param {string} dataset.name - Dataset identifier
* @param {string[]} dataset.files - Document filenames (without .txt extension)
* @returns {Promise<void>} Completes when training finished
*/
train(dataset);
/**
* Create model context from pre-computed embeddings
* @param {Object} embeddings - Token embeddings object
*/
createContext(embeddings);

Internal vector system for embedding representations and similarity calculations. The Vector class is used internally by the library for high-dimensional token embeddings but is not directly exported from the main package.
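Because Vector is not exported, the following is only a conceptual illustration of the kind of similarity calculation such embeddings support; it is not part of the library's API:

// Conceptual only: cosine similarity between two embedding vectors.
// Not part of the next-token-prediction public API.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}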
/**
* Language model instance with prediction and training capabilities
*/
interface LanguageModel {
// Prediction methods
getTokenPrediction(token: string): TokenPredictionResult;
getTokenSequencePrediction(input: string, sequenceLength?: number): SequencePredictionResult;
getCompletions(input: string): CompletionsResult;
complete(query: string): string;
// Training methods
train(dataset: Dataset): Promise<void>;
createContext(embeddings: EmbeddingsObject): void;
ingest(text: string): void;
// Factory methods
fromTrainingData(trainingData: TrainingData): TransformerAPI;
fromFiles(files: string[]): Promise<TransformerAPI>;
}
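A hedged usage sketch of the remaining LanguageModel methods; the text, filenames, and trainingData value are placeholders, and their behavior is inferred only from the signatures above:

// Feed raw text directly into the model
model.ingest('The quick brown fox jumps over the lazy dog.');

// Build a transformer API from pre-computed training data
// (trainingData is a placeholder matching the TrainingData interface)
const transformer = model.fromTrainingData(trainingData);

// Or build one from document files (without the .txt extension)
const fromFilesApi = await model.fromFiles(['sonnets', 'hamlet']);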
/**
* Training dataset configuration
*/
interface Dataset {
name: string;
files: string[]; // Document filenames without .txt extension
}
/**
* Pre-computed training data with text and embeddings
*/
interface TrainingData {
text: string;
embeddings: EmbeddingsObject;
}
/**
* Token prediction result with alternatives
*/
interface TokenPredictionResult {
token: string;
rankedTokenList: string[];
error?: { message: string };
}
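Since the result may carry an error, a defensive check along these lines can be useful; this is a sketch based only on the fields above:

const result = model.getTokenPrediction('hello');

if (result.error) {
  // A failed prediction; the message field describes why
  console.warn(result.error.message);
} else {
  console.log('Best next token:', result.token);
  console.log('Ranked alternatives:', result.rankedTokenList);
}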
/**
* Sequence prediction result with completion details
*/
interface SequencePredictionResult {
completion: string;
sequenceLength: number;
token: string;
rankedTokenList: string[];
}
/**
* Multiple completions result with ranking
*/
interface CompletionsResult {
completion: string;
token: string;
rankedTokenList: string[];
completions: string[];
}
/**
* Nested embeddings structure
*/
interface EmbeddingsObject {
[token: string]: {
[nextToken: string]: number[]; // Vector of DIMENSIONS length
};
}
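An illustrative literal matching this shape; the tokens and values are made up, and real vectors have the library's DIMENSIONS length:

// Illustrative only: maps a token to its possible next tokens,
// each with a placeholder embedding vector.
const embeddings = {
  weather: {
    is: [0.12, 0.84, 0.03 /* ... DIMENSIONS entries */],
    was: [0.07, 0.66, 0.21 /* ... */]
  },
  is: {
    nice: [0.31, 0.02, 0.77 /* ... */]
  }
};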
/**
* Transformer API with core prediction and training methods
*/
interface TransformerAPI {
// Core prediction methods
getTokenPrediction(token: string): TokenPredictionResult;
getTokenSequencePrediction(input: string, sequenceLength?: number): SequencePredictionResult;
getCompletions(input: string): CompletionsResult;
// Training and context methods
train(dataset: Dataset): Promise<void>;
createContext(embeddings: EmbeddingsObject): void;
ingest(text: string): void;
}