tessl/npm-next-token-prediction

JavaScript library for creating language models with next-token prediction capabilities including autocomplete, text completion, and AI-powered text generation.

Next Token Prediction

Next Token Prediction is a JavaScript library for creating and training language models that predict the next token. It provides a transformer-based architecture with support for custom training data, and offers autocomplete, text completion, and AI-powered text generation in pure JavaScript without external API dependencies.

Package Information

  • Package Name: next-token-prediction
  • Package Type: npm
  • Language: JavaScript (Node.js)
  • Installation: npm install next-token-prediction

Core Imports

const { Language } = require('next-token-prediction');

Basic Usage

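const { Language } = require('next-token-prediction');

// CommonJS modules have no top-level await, so run inside an async function
async function main() {
  // Simple bootstrap approach with built-in training data
  const model = await Language({
    bootstrap: true
  });

  // Predict next token (the result object carries the token and ranked alternatives)
  const { token: nextWord } = model.getTokenPrediction('hello');

  // Complete a phrase
  const completion = model.complete('The weather is');

  // Get multiple completion alternatives
  const { completions } = model.getCompletions('JavaScript is');

  console.log({ nextWord, completion, completions });
}

main().catch(console.error);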

Architecture

Next Token Prediction is built around several key components:

  • Language Model: High-level factory function that provides training and prediction capabilities
  • Transformer Engine: Core tokenization, n-gram analysis, and prediction engine
  • Vector System: High-dimensional embedding vectors for semantic token relationships
  • Training Pipeline: Comprehensive training system with multiple metrics and embedding generation
  • Dataset Management: Built-in datasets and support for custom training documents
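
To make the "n-gram analysis" component above more concrete, here is a toy bigram predictor. It is only an illustration of the general idea (count which token follows which, then rank followers by frequency); it is not the library's implementation, and the function names are made up for this sketch.

```javascript
// Toy illustration only: NOT the library's internals.
// Counts which token follows which, then ranks followers by frequency.
function buildBigramCounts(text) {
  const tokens = text.toLowerCase().split(/\s+/).filter(Boolean);
  const counts = new Map();
  for (let i = 0; i < tokens.length - 1; i++) {
    const current = tokens[i];
    const next = tokens[i + 1];
    if (!counts.has(current)) counts.set(current, new Map());
    const followers = counts.get(current);
    followers.set(next, (followers.get(next) || 0) + 1);
  }
  return counts;
}

// Returns the tokens seen after `token`, most frequent first.
function rankNextTokens(counts, token) {
  const followers = counts.get(token.toLowerCase());
  if (!followers) return [];
  return [...followers.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([next]) => next);
}

const counts = buildBigramCounts('the cat sat on the mat and the cat slept');
console.log(rankNextTokens(counts, 'the')); // [ 'cat', 'mat' ]
```

A real model would typically combine longer contexts and embeddings with these counts, but the ranked-list output has the same flavor as the rankedTokenList field described under Types.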

Capabilities

Language Model Creation

Factory function for creating language model instances with various initialization options including bootstrap training, custom datasets, or file-based training.

/**
 * Create a language model instance
 * @param {Object} options - Configuration options
 * @param {string} [options.name] - Dataset name
 * @param {Object} [options.dataset] - Pre-existing dataset with name and files
 * @param {string[]} [options.files] - Training document filenames (without .txt extension)
 * @param {boolean} [options.bootstrap=false] - Use built-in default training data
 * @returns {Promise<LanguageModel>} Language model API
 */
async function Language(options = {});
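
The option shapes described above can be written out as plain objects. The sketch below does not call the package; it only shows the three documented styles and a hypothetical helper, initializationModes, that reports which style an options object uses. The dataset and file names are invented, pairing name with files is an assumption, and the library itself may accept combinations that this helper does not model.

```javascript
// The three option shapes documented for Language(options).
// Dataset and file names here are made up for illustration.
const bootstrapOptions = { bootstrap: true };
const filesOptions = { name: 'notes', files: ['chapter-1', 'chapter-2'] };
const datasetOptions = {
  dataset: { name: 'notes', files: ['chapter-1', 'chapter-2'] }
};

// Hypothetical helper (not part of the package): lists which documented
// initialization styles an options object uses, so a typo such as `file`
// instead of `files` shows up as an empty list before any training starts.
function initializationModes(options = {}) {
  const modes = [];
  if (options.bootstrap === true) modes.push('bootstrap');
  if (options.dataset && Array.isArray(options.dataset.files)) modes.push('dataset');
  if (Array.isArray(options.files)) modes.push('files');
  return modes;
}

console.log(initializationModes(bootstrapOptions)); // [ 'bootstrap' ]
console.log(initializationModes(filesOptions));     // [ 'files' ]
console.log(initializationModes(datasetOptions));   // [ 'dataset' ]
console.log(initializationModes({ file: ['x'] }));  // []
```

Any of these objects would then be passed to Language(options) inside an async function.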

Text Prediction

Core prediction capabilities for single tokens, token sequences, and multiple completion alternatives. Results carry the top prediction together with a ranked list of alternative tokens (rankedTokenList).

/**
 * Predict the next single token
 * @param {string} token - Input token or phrase
 * @returns {Object} Prediction result with token and alternatives
 */
getTokenPrediction(token);

/**
 * Predict a sequence of tokens
 * @param {string} input - Input text
 * @param {number} [sequenceLength=2] - Number of tokens to predict
 * @returns {Object} Sequence prediction with completion and metadata
 */
getTokenSequencePrediction(input, sequenceLength);

/**
 * Get multiple completion alternatives
 * @param {string} input - Input text
 * @returns {Object} Multiple completions with ranking information
 */
getCompletions(input);
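
Because getTokenPrediction returns an object rather than a bare string, calling code usually needs a small unwrapping step. The sketch below assumes the TokenPredictionResult shape listed under Types (token, rankedTokenList, optional error); pickToken is a hypothetical helper, and the sample result objects are hand-written rather than real model output.

```javascript
// Hypothetical helper: extract a usable token from a
// TokenPredictionResult-shaped object, falling back when the
// prediction failed or came back empty.
function pickToken(result, fallback = '') {
  if (!result || result.error) return fallback;
  if (result.token) return result.token;
  const [firstAlternative] = result.rankedTokenList || [];
  return firstAlternative || fallback;
}

// Hand-written sample results (not real model output).
const ok = { token: 'world', rankedTokenList: ['world', 'there'] };
const emptyTop = { token: '', rankedTokenList: ['there'] };
const failed = { token: '', rankedTokenList: [], error: { message: 'no match' } };

console.log(pickToken(ok));          // world
console.log(pickToken(emptyTop));    // there
console.log(pickToken(failed, '?')); // ?
```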

Training System

Advanced training capabilities for creating custom models from text documents with comprehensive embedding generation and n-gram analysis.

/**
 * Train model on dataset
 * @param {Object} dataset - Training dataset
 * @param {string} dataset.name - Dataset identifier
 * @param {string[]} dataset.files - Document filenames (without .txt extension)
 * @returns {Promise<void>} Resolves when training has finished
 */
train(dataset);

/**
 * Create model context from pre-computed embeddings
 * @param {Object} embeddings - Token embeddings object
 */
createContext(embeddings);
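
Since createContext takes a pre-computed embeddings object, it can help to check the nested shape before handing it over. The sketch below follows the EmbeddingsObject structure listed under Types (token, then next token, then a numeric vector). SKETCH_DIMENSIONS and findEmbeddingProblems are assumptions made for this example; the library defines its own vector length.

```javascript
// Assumed vector length for this sketch only; the library defines its own.
const SKETCH_DIMENSIONS = 4;

// Hand-written example data: token -> nextToken -> vector.
const embeddings = {
  hello: {
    world: [0.9, 0.1, 0.0, 0.3],
    there: [0.4, 0.2, 0.1, 0.0]
  },
  world: {
    peace: [0.7, 0.0, 0.5, 0.2]
  }
};

// Hypothetical helper: returns a list of problems found in the nested structure.
function findEmbeddingProblems(candidate, dimensions) {
  const problems = [];
  for (const [token, nextTokens] of Object.entries(candidate)) {
    if (typeof nextTokens !== 'object' || nextTokens === null) {
      problems.push(`${token}: expected an object of next tokens`);
      continue;
    }
    for (const [nextToken, vector] of Object.entries(nextTokens)) {
      const ok =
        Array.isArray(vector) &&
        vector.length === dimensions &&
        vector.every((n) => Number.isFinite(n));
      if (!ok) {
        problems.push(`${token} -> ${nextToken}: expected ${dimensions} finite numbers`);
      }
    }
  }
  return problems;
}

console.log(findEmbeddingProblems(embeddings, SKETCH_DIMENSIONS)); // []
```

An empty list means the object matches the expected nesting; it could then be passed to createContext(embeddings) on a model instance.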

Vector Operations

Internal vector system for embedding representations and similarity calculations. The Vector class is used internally by the library for high-dimensional token embeddings but is not directly exported from the main package.
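
As an illustration of what a "similarity calculation" over embedding vectors can look like, here is a cosine similarity function. Whether the internal Vector class uses cosine similarity or another measure is not stated here, so treat this as a generic sketch rather than the library's method; cosineSimilarity is a name chosen for this example.

```javascript
// Generic cosine similarity between two equal-length numeric vectors.
// Illustrative only; the library's internal Vector class may differ.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; report no similarity instead of NaN.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

console.log(cosineSimilarity([1, 0], [1, 0])); // 1
console.log(cosineSimilarity([1, 0], [0, 1])); // 0
```

Values near 1 indicate vectors pointing in nearly the same direction, which is the usual proxy for related tokens in an embedding space.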

Types

Core Types

/**
 * Language model instance with prediction and training capabilities
 */
interface LanguageModel {
  // Prediction methods
  getTokenPrediction(token: string): TokenPredictionResult;
  getTokenSequencePrediction(input: string, sequenceLength?: number): SequencePredictionResult;
  getCompletions(input: string): CompletionsResult;
  complete(query: string): string;

  // Training methods
  train(dataset: Dataset): Promise<void>;
  createContext(embeddings: EmbeddingsObject): void;
  ingest(text: string): void;

  // Factory methods
  fromTrainingData(trainingData: TrainingData): TransformerAPI;
  fromFiles(files: string[]): Promise<TransformerAPI>;
}

/**
 * Training dataset configuration
 */
interface Dataset {
  name: string;
  files: string[]; // Document filenames without .txt extension
}

/**
 * Pre-computed training data with text and embeddings
 */
interface TrainingData {
  text: string;
  embeddings: EmbeddingsObject;
}

/**
 * Token prediction result with alternatives
 */
interface TokenPredictionResult {
  token: string;
  rankedTokenList: string[];
  error?: { message: string };
}

/**
 * Sequence prediction result with completion details
 */
interface SequencePredictionResult {
  completion: string;
  sequenceLength: number;
  token: string;
  rankedTokenList: string[];
}

/**
 * Multiple completions result with ranking
 */
interface CompletionsResult {
  completion: string;
  token: string;
  rankedTokenList: string[];
  completions: string[];
}

/**
 * Nested embeddings structure
 */
interface EmbeddingsObject {
  [token: string]: {
    [nextToken: string]: number[]; // Vector of DIMENSIONS length
  };
}

/**
 * Transformer API with core prediction and training methods
 */
interface TransformerAPI {
  // Core prediction methods
  getTokenPrediction(token: string): TokenPredictionResult;
  getTokenSequencePrediction(input: string, sequenceLength?: number): SequencePredictionResult;
  getCompletions(input: string): CompletionsResult;

  // Training and context methods
  train(dataset: Dataset): Promise<void>;
  createContext(embeddings: EmbeddingsObject): void;
  ingest(text: string): void;
}
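
The result types above can be consumed with plain object handling. As one possible use, the sketch below turns a CompletionsResult-shaped object into a short, de-duplicated suggestion list for an autocomplete UI. toSuggestions is a hypothetical helper and the sample object is hand-written; only the field names come from the type definitions.

```javascript
// Hypothetical helper: build a de-duplicated suggestion list from a
// CompletionsResult-shaped object, primary completion first.
function toSuggestions(result, limit = 3) {
  if (!result) return [];
  const candidates = [result.completion, ...(result.completions || [])];
  const seen = new Set();
  const suggestions = [];
  for (const candidate of candidates) {
    if (suggestions.length >= limit) break;
    const text = typeof candidate === 'string' ? candidate.trim() : '';
    if (text && !seen.has(text)) {
      seen.add(text);
      suggestions.push(text);
    }
  }
  return suggestions;
}

// Hand-written sample (not real model output).
const sample = {
  completion: 'a language',
  token: 'a',
  rankedTokenList: ['a', 'the'],
  completions: ['a language', 'the best', ' a language ', 'fun']
};

console.log(toSuggestions(sample, 3)); // [ 'a language', 'the best', 'fun' ]
```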

Describes: npmpkg:npm/next-token-prediction@1.1.x