tessl/npm-tensorflow-models--universal-sentence-encoder

Universal Sentence Encoder for generating text embeddings using TensorFlow.js

Universal Sentence Encoder

The Universal Sentence Encoder provides TensorFlow.js implementations for converting text into high-dimensional embeddings. It includes both the standard USE model that generates 512-dimensional embeddings for general text similarity and clustering tasks, and the USE QnA model that creates 100-dimensional embeddings specifically optimized for question-answering applications.

Package Information

  • Package Name: @tensorflow-models/universal-sentence-encoder
  • Package Type: npm
  • Language: TypeScript
  • Installation: npm install @tensorflow/tfjs @tensorflow-models/universal-sentence-encoder

Core Imports

import * as use from '@tensorflow-models/universal-sentence-encoder';

For CommonJS:

const use = require('@tensorflow-models/universal-sentence-encoder');

Basic Usage

import * as use from '@tensorflow-models/universal-sentence-encoder';

// Load the model
const model = await use.load();

// Embed sentences
const sentences = [
  'Hello.',
  'How are you?'
];

const embeddings = await model.embed(sentences);
// embeddings is a 2D tensor with shape [2, 512]
embeddings.print();

Architecture

Universal Sentence Encoder is built around several key components:

  • Main USE Model: Generates 512-dimensional embeddings using the Transformer architecture
  • USE QnA Model: Specialized 100-dimensional embeddings for question-answering tasks
  • Tokenizer: SentencePiece tokenization with an 8k word-piece vocabulary, backed by a Trie data structure
  • Model Loading: Supports custom model and vocabulary URLs for flexibility
  • TensorFlow.js Integration: Built on tfjs-converter and tfjs-core for browser and Node.js compatibility

Capabilities

Standard Text Embeddings

Core Universal Sentence Encoder functionality for generating 512-dimensional embeddings from text. Ideal for semantic similarity, clustering, and general NLP tasks.

function load(config?: LoadConfig): Promise<UniversalSentenceEncoder>;

interface LoadConfig {
  modelUrl?: string;
  vocabUrl?: string;
}

class UniversalSentenceEncoder {
  embed(inputs: string[] | string): Promise<tf.Tensor2D>;
}
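
A minimal sketch of how the rows returned by embed might be compared for semantic similarity. Cosine similarity is assumed here as the metric. The helper names cosineSimilarity and similarityMatrix, and the toy 3-dimensional vectors standing in for real 512-dimensional rows, are illustrative and not part of the package.

```typescript
// Cosine similarity between two equal-length vectors.
// Returns 0 when either vector is all zeros, to avoid dividing by zero.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Pairwise similarity matrix for a batch of vectors (one row per sentence).
function similarityMatrix(vectors: number[][]): number[][] {
  return vectors.map((u) => vectors.map((v) => cosineSimilarity(u, v)));
}

// Toy stand-ins (assumption): real rows would have 512 entries each.
const toyVectors: number[][] = [
  [1, 0, 0],
  [1, 1, 0],
  [0, 0, 1],
];
const toyMatrix = similarityMatrix(toyVectors);
console.log(toyMatrix);
```

With a loaded model, the rows could be obtained with `await embeddings.array()` and passed to similarityMatrix unchanged.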


Question-Answering Embeddings

Specialized Universal Sentence Encoder for question-answering applications, generating 100-dimensional embeddings optimized for matching questions with answers.

function loadQnA(): Promise<UniversalSentenceEncoderQnA>;

class UniversalSentenceEncoderQnA {
  embed(input: ModelInput): ModelOutput;
}

interface ModelInput {
  queries: string[];
  responses: string[];
  contexts?: string[];
}

interface ModelOutput {
  queryEmbedding: tf.Tensor;
  responseEmbedding: tf.Tensor;
}
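
A hedged sketch of how query and response embeddings might be matched once they have been converted to plain arrays. It assumes that a dot product between a query row and a response row serves as the relevance score. The names dotProduct, scoreResponses, and bestResponseIndex, and the toy 2-dimensional vectors, are hypothetical and do not come from the package.

```typescript
// Dot product of two equal-length vectors.
function dotProduct(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    sum += a[i] * b[i];
  }
  return sum;
}

// scores[i][j] is the dot product of query i with response j.
function scoreResponses(queries: number[][], responses: number[][]): number[][] {
  return queries.map((q) => responses.map((r) => dotProduct(q, r)));
}

// Index of the highest score in a row (first one wins on ties).
function bestResponseIndex(row: number[]): number {
  let best = 0;
  for (let j = 1; j < row.length; j++) {
    if (row[j] > row[best]) best = j;
  }
  return best;
}

// Toy stand-ins (assumption): real rows would have 100 entries each.
const queryVectors: number[][] = [
  [1, 0],
  [0, 1],
];
const responseVectors: number[][] = [
  [0.9, 0.1],
  [0.2, 0.8],
  [0.5, 0.5],
];

const qnaScores = scoreResponses(queryVectors, responseVectors);
const bestPerQuery = qnaScores.map(bestResponseIndex);
console.log(qnaScores, bestPerQuery);
```

Assuming both tensors in ModelOutput are rank 2, their rows could be read with `arraySync()` and fed to scoreResponses.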


Text Tokenization

Independent tokenizer functionality that uses the SentencePiece algorithm to convert text into token sequences. It can be used separately from the embedding models.

function loadTokenizer(pathToVocabulary?: string): Promise<Tokenizer>;
function loadVocabulary(pathToVocabulary: string): Promise<Vocabulary>;
function stringToChars(input: string): string[];

class Tokenizer {
  constructor(vocabulary: Vocabulary, reservedSymbolsCount?: number);
  encode(input: string): number[];
}

class Trie {
  constructor();
  insert(word: string, score: number, index: number): void;
  commonPrefixSearch(symbols: string[]): Array<[string[], number, number]>;
}
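
To illustrate what a common-prefix search over a vocabulary does, here is a simplified sketch that mirrors the insert and commonPrefixSearch signatures above. It is not the package's implementation: SketchTrie, SketchTrieNode, PrefixMatch, and the four-entry vocabulary are hypothetical, and the ordering of results (shortest match first) is an assumption of this sketch.

```typescript
// [matched symbols, score, vocabulary index]
type PrefixMatch = [string[], number, number];

class SketchTrieNode {
  children = new Map<string, SketchTrieNode>();
  // Set only on nodes where a complete vocabulary piece ends.
  entry: { score: number; index: number } | null = null;
}

class SketchTrie {
  private root = new SketchTrieNode();

  insert(word: string, score: number, index: number): void {
    let node = this.root;
    // Array.from splits by Unicode code point, so surrogate pairs stay intact.
    for (const symbol of Array.from(word)) {
      let next = node.children.get(symbol);
      if (next === undefined) {
        next = new SketchTrieNode();
        node.children.set(symbol, next);
      }
      node = next;
    }
    node.entry = { score, index };
  }

  // Returns every inserted piece that is a prefix of `symbols`, shortest first.
  commonPrefixSearch(symbols: string[]): PrefixMatch[] {
    const matches: PrefixMatch[] = [];
    let node = this.root;
    for (let i = 0; i < symbols.length; i++) {
      const next = node.children.get(symbols[i]);
      if (next === undefined) break;
      node = next;
      if (node.entry !== null) {
        matches.push([symbols.slice(0, i + 1), node.entry.score, node.entry.index]);
      }
    }
    return matches;
  }
}

// Hypothetical vocabulary in the Array<[string, number]> shape: [piece, score].
const sketchVocabulary: Array<[string, number]> = [
  ['a', -1.0],
  ['ab', -2.0],
  ['abc', -3.0],
  ['b', -1.5],
];

const sketchTrie = new SketchTrie();
sketchVocabulary.forEach(([piece, score], index) => sketchTrie.insert(piece, score, index));

// 'a' and 'ab' are prefixes of 'abd'; 'abc' and 'b' are not.
const prefixMatches = sketchTrie.commonPrefixSearch(Array.from('abd'));
console.log(prefixMatches);
```

A tokenizer can use matches like these, together with their scores, to choose among competing segmentations of the input.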


Types

// TensorFlow.js tensors
import * as tf from '@tensorflow/tfjs-core';

// Core types
type Vocabulary = Array<[string, number]>;

// Version information
const version: string;

Describes: npmpkg:npm/@tensorflow-models/universal-sentence-encoder@1.3.x