
tessl/npm-next-token-prediction

JavaScript library for creating language models with next-token prediction capabilities including autocomplete, text completion, and AI-powered text generation.


Language Model

The Language Model is the primary interface for creating and managing next-token prediction models. It provides a factory function that handles initialization and training, then returns an API for text prediction tasks.

Capabilities

Language Factory Function

Creates a language model instance. The model can be initialized from the built-in bootstrap training data, from a pre-existing dataset, or from a list of training files.

/**
 * Create a language model instance
 * @param {Object} options - Configuration options
 * @param {string} [options.name] - Dataset name for identification
 * @param {Object} [options.dataset] - Pre-existing dataset with name and files
 * @param {string[]} [options.files] - Training document filenames (without .txt extension)
 * @param {boolean} [options.bootstrap=false] - Use built-in default training data
 * @returns {Promise<LanguageModel>} Language model API with prediction and training methods
 */
async function Language(options = {});

Usage Examples:

const { Language } = require('next-token-prediction');

// In a CommonJS module, `await` is only valid inside an async function
async function main() {
  // Bootstrap with default training data
  const defaultModel = await Language({
    bootstrap: true
  });

  // Use pre-existing dataset
  const Dataset = require('./training/datasets/OpenSourceBooks');
  const bookModel = await Language({
    dataset: Dataset
  });

  // Train on custom files
  const customModel = await Language({
    name: 'my-dataset',
    files: ['document1', 'document2', 'document3']
  });
}

main().catch(console.error);
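
The examples above each pass exactly one training source. The documentation does not say what happens when several sources (or none) are supplied, so a caller may want to check the options first. The following is a minimal sketch of such a check; `describeTrainingSource` is a hypothetical caller-side helper, not part of `next-token-prediction`, and rejecting empty or ambiguous options is an assumed policy rather than documented library behavior.

```javascript
// Hypothetical caller-side guard; NOT part of next-token-prediction.
// Reports which single training source an options object selects.
function describeTrainingSource(options = {}) {
  const sources = [];
  if (options.bootstrap === true) sources.push('bootstrap');
  if (options.dataset !== undefined) sources.push('dataset');
  if (Array.isArray(options.files) && options.files.length > 0) {
    sources.push('files');
  }

  // Assumed policy: exactly one source must be present.
  if (sources.length === 0) {
    throw new Error('No training source: set bootstrap, dataset, or files.');
  }
  if (sources.length > 1) {
    throw new Error(`Ambiguous training source: ${sources.join(', ')}.`);
  }
  return sources[0];
}

console.log(describeTrainingSource({ bootstrap: true }));
// prints "bootstrap"
console.log(describeTrainingSource({ name: 'my-dataset', files: ['document1'] }));
// prints "files"
```

Running the guard before calling `Language(options)` surfaces configuration mistakes early, independently of how the library itself resolves them.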

Language Model Instance

The returned language model instance exposes a high-level convenience method, the underlying transformer prediction API, and methods for training and model management.

/**
 * Language model instance with prediction and training capabilities
 */
interface LanguageModel {
  // High-level prediction methods
  complete(query: string): string;

  // Full transformer API access
  getTokenPrediction(token: string): TokenPredictionResult;
  getTokenSequencePrediction(input: string, sequenceLength?: number): SequencePredictionResult;
  getCompletions(input: string): CompletionsResult;

  // Training and model management
  train(dataset: Dataset): Promise<void>;
  createContext(embeddings: EmbeddingsObject): void;
  ingest(text: string): void;
  fromTrainingData(trainingData: TrainingData): TransformerAPI;
  fromFiles(files: string[]): Promise<TransformerAPI>;
}

Complete Method

High-level convenience method that returns the single best completion for a given input query.

/**
 * Get the highest-ranked completion for input text
 * @param {string} query - Input text to complete
 * @returns {string} Best completion prediction
 */
complete(query);

Usage Examples:

// Assumes `model` is an instance returned by `await Language(...)`
// Simple completion
const result1 = model.complete('The weather today is');
// Returns: "beautiful" (or other highest-ranked prediction)

// Phrase completion
const result2 = model.complete('JavaScript is a programming');
// Returns: "language" (or similar contextual completion)
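
To make the idea of a "highest-ranked" next token concrete, here is a toy bigram model written from scratch. It illustrates next-token prediction in general and is not the algorithm used by `next-token-prediction`; the function names `buildBigramCounts` and `predictNextToken` are invented for this sketch.

```javascript
// Toy bigram model: a general illustration of next-token prediction,
// NOT the implementation used by next-token-prediction.

// Count how often each token is followed by each other token.
function buildBigramCounts(text) {
  const tokens = text.toLowerCase().split(/\s+/).filter(Boolean);
  const counts = new Map();
  for (let i = 0; i < tokens.length - 1; i++) {
    const current = tokens[i];
    const next = tokens[i + 1];
    if (!counts.has(current)) counts.set(current, new Map());
    const followers = counts.get(current);
    followers.set(next, (followers.get(next) || 0) + 1);
  }
  return counts;
}

// Return the most frequent follower of the last token in the query,
// or an empty string when that token was never seen in training.
function predictNextToken(counts, query) {
  const tokens = query.toLowerCase().split(/\s+/).filter(Boolean);
  const followers = counts.get(tokens[tokens.length - 1]);
  if (!followers) return '';

  let best = '';
  let bestCount = 0;
  for (const [token, count] of followers) {
    // Strict comparison: on a tie, the follower seen first wins.
    if (count > bestCount) {
      best = token;
      bestCount = count;
    }
  }
  return best;
}

const counts = buildBigramCounts(
  'the weather today is beautiful and the weather tomorrow is beautiful too'
);
console.log(predictNextToken(counts, 'The weather today is'));
// prints "beautiful"
```

The sketch conditions only on the final token of the query, so its output depends entirely on which word pairs appear in the training text.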

Factory Methods

Methods for creating transformer instances from different data sources.

/**
 * Create transformer from pre-computed training data
 * @param {TrainingData} trainingData - Object with text and embeddings
 * @returns {TransformerAPI} Transformer instance ready for predictions
 */
fromTrainingData(trainingData);

/**
 * Create transformer from text files with full training process
 * @param {string[]} files - Document filenames (without .txt extension)
 * @returns {Promise<TransformerAPI>} Trained transformer instance
 */
fromFiles(files);

Usage Examples:

// Using pre-computed embeddings
const trainingData = {
  text: "Combined document text...",
  embeddings: { /* pre-computed embeddings */ }
};
const transformer1 = model.fromTrainingData(trainingData);

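// Training from files (call from inside an async function;
// `await` is not valid at the top level of a CommonJS module)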
const transformer2 = await model.fromFiles([
  'shakespeare-hamlet',
  'shakespeare-macbeth',
  'shakespeare-othello'
]);
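
Because `fromFiles` expects document names without the `.txt` extension, a caller that starts from real filenames may want to normalize them first. The helper below is a small sketch of that step; `normalizeDocumentNames` is a hypothetical name, not a function exported by the library, and the deduplication and error handling are choices made for this example.

```javascript
// Hypothetical helper; NOT part of next-token-prediction.
// Strips a trailing ".txt", trims whitespace, and removes duplicates
// while preserving the original order.
function normalizeDocumentNames(files) {
  if (!Array.isArray(files)) {
    throw new TypeError('files must be an array of document names');
  }
  const seen = new Set();
  for (const file of files) {
    if (typeof file !== 'string') {
      throw new TypeError('every document name must be a string');
    }
    const name = file.trim().replace(/\.txt$/i, '');
    if (name === '') {
      throw new Error('document names must not be empty');
    }
    seen.add(name);
  }
  return [...seen];
}

console.log(
  normalizeDocumentNames([
    'shakespeare-hamlet.txt',
    'shakespeare-macbeth',
    'shakespeare-hamlet'
  ])
);
// prints [ 'shakespeare-hamlet', 'shakespeare-macbeth' ]
```

The normalized array can then be passed as the `files` argument.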

Types

Language Model Configuration

/**
 * Configuration options for Language factory function
 */
interface LanguageOptions {
  name?: string;           // Dataset identifier
  dataset?: Dataset;       // Pre-existing dataset configuration
  files?: string[];        // Training document filenames
  bootstrap?: boolean;     // Use default training data
}

/**
 * Training dataset configuration
 */
interface Dataset {
  name: string;           // Dataset identifier
  files: string[];        // Document filenames without .txt extension
}

/**
 * Pre-computed training data structure
 */
interface TrainingData {
  text: string;           // Combined training text
  embeddings: EmbeddingsObject; // Token embedding vectors
}
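
The interfaces above describe shapes only, so plain JavaScript callers get no compile-time checking. As a rough sketch, runtime checks for the `Dataset` and `TrainingData` shapes could look like the following. `isDataset` and `isTrainingData` are hypothetical helpers, and because the structure of `EmbeddingsObject` is not specified here, the check only assumes it is a plain (non-array) object.

```javascript
// Hypothetical shape checks; NOT part of next-token-prediction.

// Dataset: { name: string, files: string[] }
function isDataset(value) {
  return (
    value !== null &&
    typeof value === 'object' &&
    typeof value.name === 'string' &&
    Array.isArray(value.files) &&
    value.files.every((file) => typeof file === 'string')
  );
}

// TrainingData: { text: string, embeddings: object }
// Assumption: embeddings is a non-null, non-array object; its inner
// structure is not documented in this section and is not inspected.
function isTrainingData(value) {
  return (
    value !== null &&
    typeof value === 'object' &&
    typeof value.text === 'string' &&
    value.embeddings !== null &&
    typeof value.embeddings === 'object' &&
    !Array.isArray(value.embeddings)
  );
}

console.log(isDataset({ name: 'my-dataset', files: ['document1'] }));
// prints true
console.log(isTrainingData({ text: 'Combined document text...' }));
// prints false (embeddings is missing)
```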

Install with Tessl CLI

npx tessl i tessl/npm-next-token-prediction
