tessl/npm-next-token-prediction

JavaScript library for creating language models with next-token prediction capabilities including autocomplete, text completion, and AI-powered text generation.

Workspace: tessl
Visibility: Public
Describes: npmpkg:npm/next-token-prediction@1.1.x

To install, run

npx @tessl/cli install tessl/npm-next-token-prediction@1.1.0

Next Token Prediction

Next Token Prediction is a JavaScript library for creating and training language models that predict the next token. It provides a transformer-based architecture with support for custom training data, and offers autocomplete, text completion, and AI-powered text generation in pure JavaScript, without depending on external APIs.

Package Information

  • Package Name: next-token-prediction
  • Package Type: npm
  • Language: JavaScript (Node.js)
  • Installation: npm install next-token-prediction

Core Imports

const { Language } = require('next-token-prediction');

Basic Usage

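const { Language } = require('next-token-prediction');

// `await` is not allowed at the top level of a CommonJS module,
// so the calls are wrapped in an async function.
async function main() {
  // Simple bootstrap approach with built-in training data
  const model = await Language({
    bootstrap: true
  });

  // Predict the next token (returns an object, see TokenPredictionResult)
  const { token, rankedTokenList } = model.getTokenPrediction('hello');

  // Complete a phrase (returns a string)
  const completion = model.complete('The weather is');

  // Get multiple completion alternatives (see CompletionsResult)
  const { completions } = model.getCompletions('JavaScript is');

  console.log({ token, rankedTokenList, completion, completions });
}

main().catch(console.error);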

Architecture

Next Token Prediction is built around several key components:

  • Language Model: High-level factory function that provides training and prediction capabilities
  • Transformer Engine: Core tokenization, n-gram analysis, and prediction engine
  • Vector System: High-dimensional embedding vectors for semantic token relationships
  • Training Pipeline: Comprehensive training system with multiple metrics and embedding generation
  • Dataset Management: Built-in datasets and support for custom training documents
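
To make the n-gram part of the components above more concrete, here is a minimal sketch of frequency-based next-token ranking. It is an illustration only: `buildBigramCounts` and `rankNextTokens` are hypothetical helpers written for this example, not functions exported by next-token-prediction, and the library's own engine is described as also using embedding vectors.

```javascript
// Hypothetical illustration: count which token follows which (bigrams).
function buildBigramCounts(text) {
  const tokens = text.toLowerCase().split(/\s+/).filter(Boolean);
  const counts = new Map();
  for (let i = 0; i < tokens.length - 1; i++) {
    const current = tokens[i];
    const next = tokens[i + 1];
    if (!counts.has(current)) counts.set(current, new Map());
    const followers = counts.get(current);
    followers.set(next, (followers.get(next) || 0) + 1);
  }
  return counts;
}

// Rank candidate next tokens for `token`, most frequent first.
function rankNextTokens(counts, token) {
  const followers = counts.get(token.toLowerCase());
  if (!followers) return [];
  return [...followers.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([next]) => next);
}

const counts = buildBigramCounts('the cat sat on the mat and the cat slept');
const ranked = rankNextTokens(counts, 'the');
console.log(ranked); // → [ 'cat', 'mat' ]
```

The first entry of such a ranked list plays the role of the predicted token, and the rest are the alternatives.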

Capabilities

Language Model Creation

`Language` is an async factory function that creates language model instances. A model can be initialized from the built-in bootstrap training data, from a pre-existing dataset, or from a list of training files.

/**
 * Create a language model instance
 * @param {Object} options - Configuration options
 * @param {string} [options.name] - Dataset name
 * @param {Object} [options.dataset] - Pre-existing dataset with name and files
 * @param {string[]} [options.files] - Training document filenames (without .txt extension)
 * @param {boolean} [options.bootstrap=false] - Use built-in default training data
 * @returns {Promise<LanguageModel>} Language model API
 */
async function Language(options = {});
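
The three initialization styles described above can be sketched as plain options objects. `buildLanguageOptions` is a hypothetical helper, and the dataset name and file names are placeholders; only the option fields (`name`, `dataset`, `files`, `bootstrap`) come from the signature above.

```javascript
// Hypothetical helper: build an options object for Language() using only
// the documented fields (name, dataset, files, bootstrap).
function buildLanguageOptions(mode, { name, files } = {}) {
  switch (mode) {
    case 'bootstrap':
      // Use the built-in default training data.
      return { bootstrap: true };
    case 'files':
      // Train from documents; filenames are given without the .txt extension.
      return { name, files };
    case 'dataset':
      // Pass a pre-existing dataset that carries its own name and files.
      return { dataset: { name, files } };
    default:
      throw new Error(`Unknown mode: ${mode}`);
  }
}

const bootstrapOptions = buildLanguageOptions('bootstrap');
const fileOptions = buildLanguageOptions('files', {
  name: 'my-dataset', // placeholder name
  files: ['chapter-one', 'chapter-two'] // placeholder documents
});
console.log(bootstrapOptions, fileOptions);
```

Either object could then be passed to the factory inside an async function, for example `const model = await Language(fileOptions);`.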

Text Prediction

Core prediction methods for single tokens, token sequences, and multiple completion alternatives. Each result includes a ranked list of candidate tokens.

/**
 * Predict the next single token
 * @param {string} token - Input token or phrase
 * @returns {TokenPredictionResult} Predicted token plus ranked alternatives
 */
getTokenPrediction(token);

/**
 * Predict a sequence of tokens
 * @param {string} input - Input text
 * @param {number} [sequenceLength=2] - Number of tokens to predict
 * @returns {SequencePredictionResult} Sequence prediction with completion and metadata
 */
getTokenSequencePrediction(input, sequenceLength);

/**
 * Get multiple completion alternatives
 * @param {string} input - Input text
 * @returns {CompletionsResult} Multiple completions with ranking information
 */
getCompletions(input);
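
Because the prediction methods return objects rather than bare strings, calling code usually needs a small amount of unpacking. The sketch below assumes the result shape listed under Core Types (`token`, `rankedTokenList`, optional `error`). `topAlternatives` is a hypothetical helper, and the sample objects, including the error message, are hand-written stand-ins rather than real library output.

```javascript
// Hypothetical helper: read the top alternatives from a prediction result.
// Assumes the documented shape { token, rankedTokenList, error? }.
function topAlternatives(result, limit = 3) {
  if (!result || result.error) {
    // Treat a missing result or a reported error as "no suggestions".
    return [];
  }
  return (result.rankedTokenList || []).slice(0, limit);
}

// Hand-written samples standing in for model.getTokenPrediction() output.
const sampleResult = {
  token: 'sunny',
  rankedTokenList: ['sunny', 'cold', 'nice', 'bad']
};
const failedResult = {
  token: '',
  rankedTokenList: [],
  error: { message: 'sample error message' }
};

console.log(topAlternatives(sampleResult, 2)); // → [ 'sunny', 'cold' ]
console.log(topAlternatives(failedResult)); // → []
```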

Training System

Advanced training capabilities for creating custom models from text documents with comprehensive embedding generation and n-gram analysis.

/**
 * Train model on dataset
 * @param {Object} dataset - Training dataset
 * @param {string} dataset.name - Dataset identifier
 * @param {string[]} dataset.files - Document filenames (without .txt extension)
 * @returns {Promise<void>} Resolves when training has finished
 */
train(dataset);

/**
 * Create model context from pre-computed embeddings
 * @param {Object} embeddings - Token embeddings object
 */
createContext(embeddings);
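
Since `dataset.files` expects document names without the `.txt` extension, a small normalizing step can help when the names come from a directory listing. This is a sketch under that assumption: `makeDataset` is a hypothetical helper, and the dataset and file names are placeholders.

```javascript
// Hypothetical helper: build a { name, files } dataset object, removing a
// trailing .txt so the names match the documented format.
function makeDataset(name, filenames) {
  const files = filenames.map((file) => file.replace(/\.txt$/i, ''));
  return { name, files };
}

const dataset = makeDataset('recipes', ['soups.txt', 'desserts.TXT', 'salads']);
console.log(dataset);
```

The resulting object could then be passed to training inside an async function, for example `await model.train(dataset);`.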

Vector Operations

Internal vector system for embedding representations and similarity calculations. The Vector class is used internally by the library for high-dimensional token embeddings but is not directly exported from the main package.
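
The text above does not say which similarity measure the library uses internally. As a general illustration of comparing two embedding vectors, here is cosine similarity, a common choice; `cosineSimilarity` is a hypothetical standalone function and not part of the package's API.

```javascript
// Hypothetical illustration: cosine similarity between two equal-length
// numeric vectors. Returns a value in [-1, 1], or 0 for a zero vector.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

console.log(cosineSimilarity([1, 0], [1, 0])); // → 1
console.log(cosineSimilarity([1, 0], [0, 1])); // → 0
```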

Types

Core Types

/**
 * Language model instance with prediction and training capabilities
 */
interface LanguageModel {
  // Prediction methods
  getTokenPrediction(token: string): TokenPredictionResult;
  getTokenSequencePrediction(input: string, sequenceLength?: number): SequencePredictionResult;
  getCompletions(input: string): CompletionsResult;
  complete(query: string): string;

  // Training methods
  train(dataset: Dataset): Promise<void>;
  createContext(embeddings: EmbeddingsObject): void;
  ingest(text: string): void;

  // Factory methods
  fromTrainingData(trainingData: TrainingData): TransformerAPI;
  fromFiles(files: string[]): Promise<TransformerAPI>;
}

/**
 * Training dataset configuration
 */
interface Dataset {
  name: string;
  files: string[]; // Document filenames without .txt extension
}

/**
 * Pre-computed training data with text and embeddings
 */
interface TrainingData {
  text: string;
  embeddings: EmbeddingsObject;
}

/**
 * Token prediction result with alternatives
 */
interface TokenPredictionResult {
  token: string;
  rankedTokenList: string[];
  error?: { message: string };
}

/**
 * Sequence prediction result with completion details
 */
interface SequencePredictionResult {
  completion: string;
  sequenceLength: number;
  token: string;
  rankedTokenList: string[];
}

/**
 * Multiple completions result with ranking
 */
interface CompletionsResult {
  completion: string;
  token: string;
  rankedTokenList: string[];
  completions: string[];
}

/**
 * Nested embeddings structure
 */
interface EmbeddingsObject {
  [token: string]: {
    [nextToken: string]: number[]; // Vector of DIMENSIONS length
  };
}

/**
 * Transformer API with core prediction and training methods
 */
interface TransformerAPI {
  // Core prediction methods
  getTokenPrediction(token: string): TokenPredictionResult;
  getTokenSequencePrediction(input: string, sequenceLength?: number): SequencePredictionResult;
  getCompletions(input: string): CompletionsResult;

  // Training and context methods
  train(dataset: Dataset): Promise<void>;
  createContext(embeddings: EmbeddingsObject): void;
  ingest(text: string): void;
}
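
The nested `EmbeddingsObject` shape above maps a token to its possible next tokens, each with a vector. The following sketch shows one way to read from such an object. `getEmbedding` is a hypothetical helper, and the sample vectors are shortened to three numbers for readability, whereas real vectors would have DIMENSIONS entries.

```javascript
// Hypothetical helper (not part of the library): look up the vector stored
// for a (token, nextToken) pair in an object shaped like EmbeddingsObject.
function getEmbedding(embeddings, token, nextToken) {
  const hasOwn = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);
  if (!hasOwn(embeddings, token)) return null;
  const followers = embeddings[token];
  return hasOwn(followers, nextToken) ? followers[nextToken] : null;
}

// Tiny hand-written sample; the numbers are made up for illustration.
const sampleEmbeddings = {
  hello: {
    world: [0.9, 0.1, 0.0],
    there: [0.4, 0.5, 0.1]
  }
};

console.log(getEmbedding(sampleEmbeddings, 'hello', 'world')); // → [ 0.9, 0.1, 0 ]
console.log(getEmbedding(sampleEmbeddings, 'hello', 'friend')); // → null
```

An object of this shape is what `createContext(embeddings)` is documented to accept.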