
tessl/npm-tensorflow-models--universal-sentence-encoder

Universal Sentence Encoder for generating text embeddings using TensorFlow.js


docs/standard-embeddings.md

Standard Text Embeddings

Core Universal Sentence Encoder functionality for generating 512-dimensional embeddings from text using the Transformer architecture. Well suited to semantic similarity, text classification, clustering, and other general natural language processing tasks.

Capabilities

Load Model

Creates and loads the main Universal Sentence Encoder model with optional custom configuration.

/**
 * Load the Universal Sentence Encoder model
 * @param config - Optional configuration for custom model and vocabulary URLs
 * @returns Promise that resolves to UniversalSentenceEncoder instance
 */
function load(config?: LoadConfig): Promise<UniversalSentenceEncoder>;

interface LoadConfig {
  /** Custom URL for the model files (defaults to TFHub model) */
  modelUrl?: string;
  /** Custom URL for the vocabulary file (defaults to built-in vocab) */
  vocabUrl?: string;
}

Usage Examples:

import * as use from '@tensorflow-models/universal-sentence-encoder';

// Load with default configuration
const model = await use.load();

// Load with custom model URL
const customModel = await use.load({
  modelUrl: 'https://example.com/my-custom-model',
  vocabUrl: 'https://example.com/my-vocab.json'
});

Universal Sentence Encoder Class

Main class for generating text embeddings with 512-dimensional output vectors.

class UniversalSentenceEncoder {
  /**
   * Load the TensorFlow.js GraphModel
   * @param modelUrl - Optional custom model URL
   * @returns Promise that resolves to the loaded GraphModel
   */
  loadModel(modelUrl?: string): Promise<tfconv.GraphModel>;
  
  /**
   * Initialize the model and tokenizer
   * @param config - Configuration object with optional URLs
   */
  load(config?: LoadConfig): Promise<void>;
  
  /**
   * Generate embeddings for input text(s)
   * Returns a 2D Tensor of shape [input.length, 512] containing embeddings
   * @param inputs - String or array of strings to embed
   * @returns Promise that resolves to 2D tensor with 512-dimensional embeddings
   */
  embed(inputs: string[] | string): Promise<tf.Tensor2D>;
}

Usage Examples:

import * as use from '@tensorflow-models/universal-sentence-encoder';
import * as tf from '@tensorflow/tfjs-core';

// Basic embedding
const model = await use.load();
const embeddings = await model.embed('Hello world');
// Shape: [1, 512]

// Batch embedding
const sentences = [
  'I like my phone.',
  'Your cellphone looks great.',
  'How old are you?',
  'What is your age?'
];
const batchEmbeddings = await model.embed(sentences);
// Shape: [4, 512]

// Calculate similarity
const similarity = tf.matMul(
  batchEmbeddings.slice([0, 0], [1, 512]), // First sentence
  batchEmbeddings.slice([1, 0], [1, 512]), // Second sentence
  false, 
  true
);
console.log(await similarity.data()); // Similarity score
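The matMul above computes a raw dot product. Universal Sentence Encoder embeddings are reported to be approximately unit-normalized, so the dot product approximates cosine similarity; computing cosine similarity explicitly on plain arrays avoids relying on that normalization. A minimal sketch (`cosineSimilarity` is a local helper defined here, not part of the package API):

```typescript
// Dot product of two equal-length vectors.
function dot(a: number[], b: number[]): number {
  return a.reduce((sum, ai, i) => sum + ai * b[i], 0);
}

// Cosine similarity: dot product normalized by both vector magnitudes.
function cosineSimilarity(a: number[], b: number[]): number {
  return dot(a, b) / (Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b)));
}

// Usage with the model (assumes `model` and `sentences` from the examples above):
// const vectors = await (await model.embed(sentences)).array(); // number[][]
// const score = cosineSimilarity(vectors[0], vectors[1]);
```

Note that the tensors returned by embed() hold backend memory; call dispose() on them once their values have been extracted.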

Model Configuration

The standard USE model loads from TensorFlow Hub by default but supports custom configurations.

Default URLs:

  • Model: https://tfhub.dev/tensorflow/tfjs-model/universal-sentence-encoder-lite/1/default/1
  • Vocabulary: https://storage.googleapis.com/tfjs-models/savedmodel/universal_sentence_encoder/vocab.json

Custom Configuration Example:

// Using custom model and vocabulary
const model = await use.load({
  modelUrl: 'https://my-server.com/custom-use-model',
  vocabUrl: 'https://my-server.com/custom-vocab.json'
});
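When self-hosting, a custom modelUrl is handed to tfjs-converter's loadGraphModel, so it should point at the model.json of a converted GraphModel, with the weight shard files served alongside it; vocabUrl should point at the vocabulary JSON. A hypothetical layout (all hostnames and paths below are illustrative, not from the package):

```typescript
// Hypothetical self-hosted layout:
//   https://my-server.com/use/model.json        <- GraphModel topology
//   https://my-server.com/use/group1-shard1of1  <- weight shards next to model.json
//   https://my-server.com/use/vocab.json        <- vocabulary file
const config = {
  modelUrl: 'https://my-server.com/use/model.json',
  vocabUrl: 'https://my-server.com/use/vocab.json',
};
// const model = await use.load(config);
```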

Types

import * as tf from '@tensorflow/tfjs-core';
import * as tfconv from '@tensorflow/tfjs-converter';

interface LoadConfig {
  modelUrl?: string;
  vocabUrl?: string;
}

// Internal interface for model inputs
interface ModelInputs extends tf.NamedTensorMap {
  indices: tf.Tensor;
  values: tf.Tensor;
}
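The ModelInputs shape hints at how batches reach the graph: ragged per-sentence token-id lists are flattened, sparse-tensor style, into paired [sentence, position] indices and a matching list of token values. A hedged sketch of that flattening (the helper below is illustrative, not the package's internal implementation):

```typescript
// Flatten a ragged batch of token ids into parallel indices/values arrays,
// the layout suggested by the ModelInputs interface.
function toModelInputArrays(tokenBatches: number[][]) {
  const indices: [number, number][] = [];
  const values: number[] = [];
  tokenBatches.forEach((tokens, row) => {
    tokens.forEach((tokenId, col) => {
      indices.push([row, col]); // [sentence index, position in sentence]
      values.push(tokenId);
    });
  });
  return { indices, values };
}
// These arrays would then back the int32 `indices` and `values` tensors
// fed to the graph model.
```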

Install with Tessl CLI

npx tessl i tessl/npm-tensorflow-models--universal-sentence-encoder
