Tessl Tile for npm/@tensorflow-models/universal-sentence-encoder@1.3.0

tessl/npm-tensorflow-models--universal-sentence-encoder

Universal Sentence Encoder for generating text embeddings using TensorFlow.js

Workspace: tessl
Visibility: Public
Created: 3 months ago
Last updated: 3 months ago
Describes: pkg:npm/@tensorflow-models/universal-sentence-encoder@1.3.x

To install, run

npx @tessl/cli install tessl/npm-tensorflow-models--universal-sentence-encoder@1.3.0

# Universal Sentence Encoder

The Universal Sentence Encoder provides TensorFlow.js implementations for converting text into high-dimensional embeddings. It includes both the standard USE model that generates 512-dimensional embeddings for general text similarity and clustering tasks, and the USE QnA model that creates 100-dimensional embeddings specifically optimized for question-answering applications.

## Package Information

- **Package Name**: @tensorflow-models/universal-sentence-encoder
- **Package Type**: npm
- **Language**: TypeScript
- **Installation**: `npm install @tensorflow/tfjs @tensorflow-models/universal-sentence-encoder`

## Core Imports

```typescript
import * as use from '@tensorflow-models/universal-sentence-encoder';
```

For CommonJS:

```javascript
const use = require('@tensorflow-models/universal-sentence-encoder');
```

## Basic Usage

```typescript
// Side-effect import: registers the TensorFlow.js backends the model runs on
import '@tensorflow/tfjs';
import * as use from '@tensorflow-models/universal-sentence-encoder';

// Load the model
const model = await use.load();

// Embed sentences
const sentences = [
  'Hello.',
  'How are you?'
];

const embeddings = await model.embed(sentences);
// embeddings is a 2D tensor with shape [2, 512]
embeddings.print();
```

## Architecture

Universal Sentence Encoder is built around several key components:

- **Main USE Model**: Generates 512-dimensional embeddings using the Transformer architecture
- **USE QnA Model**: Generates specialized 100-dimensional embeddings for question-answering tasks
- **Tokenizer**: SentencePiece tokenization over an 8k word-piece vocabulary, backed by a Trie data structure
- **Model Loading**: Supports custom model and vocabulary URLs for flexibility
- **TensorFlow.js Integration**: Built on tfjs-converter and tfjs-core for browser and Node.js compatibility

## Capabilities

### Standard Text Embeddings

Core Universal Sentence Encoder functionality for generating 512-dimensional embeddings from text. Ideal for semantic similarity, clustering, and general NLP tasks.

```typescript { .api }
function load(config?: LoadConfig): Promise<UniversalSentenceEncoder>;

interface LoadConfig {
  modelUrl?: string;
  vocabUrl?: string;
}

class UniversalSentenceEncoder {
  embed(inputs: string[] | string): Promise<tf.Tensor2D>;
}
```
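Semantic similarity between two embeddings is commonly measured with cosine similarity. The sketch below is a minimal, dependency-free illustration and not part of the package API: `cosineSimilarity` and `toyEmbeddings` are hypothetical names, and the 3-dimensional vectors stand in for real 512-dimensional rows, which could be obtained as plain arrays with `await embeddings.array()`.

```typescript
// Cosine similarity between two embedding vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Length mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    return 0; // Convention chosen in this sketch for zero vectors.
  }
  return dot / Math.sqrt(normA * normB);
}

// Toy vectors (assumption): placeholders for rows of a [n, 512] embedding tensor.
const toyEmbeddings: number[][] = [
  [1, 0, 0],
  [0.9, 0.1, 0],
  [0, 0, 1],
];

// The first two vectors point in nearly the same direction; the third is orthogonal to the first.
console.log(cosineSimilarity(toyEmbeddings[0], toyEmbeddings[1]));
console.log(cosineSimilarity(toyEmbeddings[0], toyEmbeddings[2]));
```

Values close to 1 indicate near-identical direction, and values near 0 indicate unrelated vectors under this measure.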

[Standard Embeddings](./standard-embeddings.md)

### Question-Answering Embeddings

Specialized Universal Sentence Encoder for question-answering applications, generating 100-dimensional embeddings optimized for matching questions with answers.

```typescript { .api }
function loadQnA(): Promise<UniversalSentenceEncoderQnA>;

class UniversalSentenceEncoderQnA {
  embed(input: ModelInput): ModelOutput;
}

interface ModelInput {
  queries: string[];
  responses: string[];
  contexts?: string[];
}

interface ModelOutput {
  queryEmbedding: tf.Tensor;
  responseEmbedding: tf.Tensor;
}
```
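One plausible way to match a question with candidate answers is to score each response embedding by its dot product with the query embedding and rank the results. This is an assumption about typical usage rather than something the API above prescribes; `dotProduct`, `rankResponses`, and the toy 3-dimensional vectors are hypothetical stand-ins for the 100-dimensional `queryEmbedding` and `responseEmbedding` values.

```typescript
// Dot product of two equal-length vectors.
function dotProduct(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Length mismatch: ${a.length} vs ${b.length}`);
  }
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    sum += a[i] * b[i];
  }
  return sum;
}

interface RankedResponse {
  index: number; // Position of the response in the input list.
  score: number; // Dot product with the query vector.
}

// Scores every response against one query and sorts by descending score.
function rankResponses(query: number[], responses: number[][]): RankedResponse[] {
  return responses
    .map((response, index) => ({ index, score: dotProduct(query, response) }))
    .sort((x, y) => y.score - x.score);
}

// Toy data (assumption): one query vector and two candidate response vectors.
const toyQuery = [0.2, 0.9, 0.1];
const toyResponses = [
  [0.1, 0.8, 0.0],
  [0.9, 0.1, 0.3],
];

const ranked = rankResponses(toyQuery, toyResponses);
console.log(ranked);
```

With real tensors, the same scores could be computed in one step with a matrix multiplication; the plain-array version is used here only to keep the sketch self-contained.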

[Question-Answering](./question-answering.md)

### Text Tokenization

Independent tokenizer functionality that uses the SentencePiece algorithm to convert text into token sequences. It can be used separately from the embedding models.

```typescript { .api }
function loadTokenizer(pathToVocabulary?: string): Promise<Tokenizer>;
function loadVocabulary(pathToVocabulary: string): Promise<Vocabulary>;
function stringToChars(input: string): string[];

class Tokenizer {
  constructor(vocabulary: Vocabulary, reservedSymbolsCount?: number);
  encode(input: string): number[];
}

class Trie {
  constructor();
  insert(word: string, score: number, index: number): void;
  commonPrefixSearch(symbols: string[]): Array<[string[], number, number]>;
}
```
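To make the `Trie` signatures above more concrete, here is a simplified sketch of a trie with the same method shapes. It is an illustration written for this overview, not the package's implementation: `SketchTrie`, `TrieNode`, and `toyVocabulary` are hypothetical names, and treating each entry's array position as its token index is an assumption.

```typescript
type Vocabulary = Array<[string, number]>;
// [matched symbols, score, vocabulary index]
type Match = [string[], number, number];

class TrieNode {
  children = new Map<string, TrieNode>();
  // Set only on nodes that terminate a vocabulary entry.
  entry: { score: number; index: number } | null = null;
}

class SketchTrie {
  private root = new TrieNode();

  // Adds one vocabulary piece, one symbol per trie level.
  insert(word: string, score: number, index: number): void {
    let node = this.root;
    for (const symbol of Array.from(word)) {
      let next = node.children.get(symbol);
      if (next === undefined) {
        next = new TrieNode();
        node.children.set(symbol, next);
      }
      node = next;
    }
    node.entry = { score, index };
  }

  // Returns every inserted piece that is a prefix of `symbols`, shortest first.
  commonPrefixSearch(symbols: string[]): Match[] {
    const matches: Match[] = [];
    let node = this.root;
    for (let i = 0; i < symbols.length; i++) {
      const next = node.children.get(symbols[i]);
      if (next === undefined) {
        break;
      }
      node = next;
      if (node.entry !== null) {
        matches.push([symbols.slice(0, i + 1), node.entry.score, node.entry.index]);
      }
    }
    return matches;
  }
}

// Toy vocabulary (assumption): [piece, score] pairs with made-up scores.
const toyVocabulary: Vocabulary = [['a', -1.0], ['ab', -2.0], ['abc', -3.5], ['b', -1.5]];
const trie = new SketchTrie();
toyVocabulary.forEach(([piece, score], index) => trie.insert(piece, score, index));

// 'a' and 'ab' are prefixes of 'abd'; 'abc' and 'b' are not.
console.log(trie.commonPrefixSearch(Array.from('abd')));
```

A tokenizer can use such prefix matches to enumerate the candidate pieces that start at each position of the input before choosing a segmentation by score.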

[Tokenization](./tokenization.md)

## Types

```typescript { .api }
// TensorFlow.js tensors
import * as tf from '@tensorflow/tfjs-core';

// Core types
type Vocabulary = Array<[string, number]>;

// Version information
const version: string;
```