# Universal Sentence Encoder

The Universal Sentence Encoder package provides TensorFlow.js implementations of models that convert text into high-dimensional embeddings. It includes the standard USE model, which generates 512-dimensional embeddings for general text similarity and clustering tasks, and the USE QnA model, which generates 100-dimensional embeddings optimized for question-answering applications.

## Package Information

- **Package Name**: @tensorflow-models/universal-sentence-encoder
- **Package Type**: npm
- **Language**: TypeScript
- **Installation**: `npm install @tensorflow/tfjs @tensorflow-models/universal-sentence-encoder`

## Core Imports

```typescript
import * as use from '@tensorflow-models/universal-sentence-encoder';
```

For CommonJS:

```javascript
const use = require('@tensorflow-models/universal-sentence-encoder');
```

## Basic Usage

```typescript
// Importing @tensorflow/tfjs registers a backend for the model to run on
import '@tensorflow/tfjs';
import * as use from '@tensorflow-models/universal-sentence-encoder';

// Load the model
const model = await use.load();

// Embed sentences
const sentences = [
  'Hello.',
  'How are you?'
];

const embeddings = await model.embed(sentences);
// embeddings is a 2D tensor with shape [2, 512]
embeddings.print();
```

## Architecture

The Universal Sentence Encoder package is built around several key components:

- **Main USE Model**: Generates 512-dimensional embeddings using the Transformer architecture
- **USE QnA Model**: Generates specialized 100-dimensional embeddings for question-answering tasks
- **Tokenizer**: SentencePiece tokenization with an 8k word piece vocabulary, backed by a Trie data structure
- **Model Loading**: Supports custom model and vocabulary URLs for flexibility
- **TensorFlow.js Integration**: Built on tfjs-converter and tfjs-core for browser and Node.js compatibility
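As a rough illustration of the **Model Loading** point, the sketch below builds a `LoadConfig`-shaped object for self-hosted assets. The base URL and the file names `model.json` and `vocab.json` are hypothetical placeholders, and `buildLoadConfig` is an illustrative helper, not part of the package. The interface is redeclared locally so the snippet stands alone.

```typescript
// Local mirror of the LoadConfig shape, so the sketch runs standalone.
interface LoadConfig {
  modelUrl?: string;
  vocabUrl?: string;
}

// Hypothetical helper: derive both URLs from a single hosting location.
// The file names are assumptions about how the assets were uploaded.
function buildLoadConfig(baseUrl: string): LoadConfig {
  const base = baseUrl.replace(/\/+$/, ''); // drop any trailing slashes
  return {
    modelUrl: `${base}/model.json`,
    vocabUrl: `${base}/vocab.json`,
  };
}

// Placeholder host; substitute wherever the model files actually live.
const config = buildLoadConfig('https://example.com/use/');
console.log(config);
```

The resulting object would then be passed to the loader as `use.load(config)`.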

## Capabilities

### Standard Text Embeddings

Core Universal Sentence Encoder functionality for generating 512-dimensional embeddings from text. Ideal for semantic similarity, clustering, and general NLP tasks.

```typescript { .api }
function load(config?: LoadConfig): Promise<UniversalSentenceEncoder>;

interface LoadConfig {
  modelUrl?: string;
  vocabUrl?: string;
}

class UniversalSentenceEncoder {
  embed(inputs: string[] | string): Promise<tf.Tensor2D>;
}
```

[Standard Embeddings](./standard-embeddings.md)
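Semantic similarity between embedded sentences is often measured with cosine similarity. The following is a minimal sketch of that comparison on plain number arrays. `cosineSimilarity` and `similarityMatrix` are hypothetical helpers, not part of the package, and the three-element vectors are stand-ins for real 512-dimensional rows.

```typescript
// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; this sketch defines its similarity as 0.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Pairwise similarity matrix for a batch of embeddings (one row per sentence).
function similarityMatrix(rows: number[][]): number[][] {
  return rows.map((a) => rows.map((b) => cosineSimilarity(a, b)));
}

// Stand-in vectors; real rows would have 512 entries each.
const rows = [
  [1, 0, 0],
  [0, 1, 0],
  [1, 1, 0],
];
console.log(similarityMatrix(rows));
```

With the real model, `rows` could be obtained by calling `array()` on the tensor returned from `model.embed(sentences)`.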

### Question-Answering Embeddings

Specialized Universal Sentence Encoder for question-answering applications, generating 100-dimensional embeddings optimized for matching questions with answers.

```typescript { .api }
function loadQnA(): Promise<UniversalSentenceEncoderQnA>;

class UniversalSentenceEncoderQnA {
  embed(input: ModelInput): ModelOutput;
}

interface ModelInput {
  queries: string[];
  responses: string[];
  contexts?: string[];
}

interface ModelOutput {
  queryEmbedding: tf.Tensor;
  responseEmbedding: tf.Tensor;
}
```

[Question-Answering](./question-answering.md)
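One plausible way to match questions with answers is to score each response embedding against a query embedding and rank the responses by score. The sketch below assumes the embeddings have already been converted to plain arrays (for example with `arraySync()` on each tensor). `dotProduct` and `rankResponses` are hypothetical helper names, and using a dot product as the score is an assumption, not something this document specifies.

```typescript
// Dot product of two equal-length vectors.
function dotProduct(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

// For one query embedding, return response indices sorted best-first.
function rankResponses(query: number[], responses: number[][]): number[] {
  return responses
    .map((response, index) => ({ score: dotProduct(query, response), index }))
    .sort((x, y) => y.score - x.score)
    .map((entry) => entry.index);
}

// Stand-in vectors; real QnA embeddings would have 100 entries each.
const queryEmbedding = [0.9, 0.1, 0.0];
const responseEmbeddings = [
  [0.0, 1.0, 0.0],
  [1.0, 0.0, 0.0],
  [0.5, 0.5, 0.0],
];
console.log(rankResponses(queryEmbedding, responseEmbeddings));
```

The first index in the returned array identifies the response that scores highest for the query.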

### Text Tokenization

Independent tokenizer functionality that uses the SentencePiece algorithm to convert text into token sequences. It can be used separately from the embedding models.

```typescript { .api }
function loadTokenizer(pathToVocabulary?: string): Promise<Tokenizer>;
function loadVocabulary(pathToVocabulary: string): Promise<Vocabulary>;
function stringToChars(input: string): string[];

class Tokenizer {
  constructor(vocabulary: Vocabulary, reservedSymbolsCount?: number);
  encode(input: string): number[];
}

class Trie {
  constructor();
  insert(word: string, score: number, index: number): void;
  commonPrefixSearch(symbols: string[]): Array<[string[], number, number]>;
}
```

[Tokenization](./tokenization.md)
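To make the role of a trie in tokenization more concrete, here is a small standalone sketch of a common-prefix search with the same method shapes as the `Trie` signatures above. It is an illustration only, not the package's implementation: `SketchTrie`, `SketchTrieNode`, and `Match` are hypothetical names, and the words and scores are invented for the example.

```typescript
// One match: [symbols of the matched word, score, vocabulary index].
type Match = [string[], number, number];

class SketchTrieNode {
  children = new Map<string, SketchTrieNode>();
  match: Match | null = null; // set only where an inserted word ends
}

class SketchTrie {
  private root = new SketchTrieNode();

  insert(word: string, score: number, index: number): void {
    const symbols = Array.from(word); // split by Unicode code point
    let node = this.root;
    for (const symbol of symbols) {
      let next = node.children.get(symbol);
      if (!next) {
        next = new SketchTrieNode();
        node.children.set(symbol, next);
      }
      node = next;
    }
    node.match = [symbols, score, index];
  }

  // Return every inserted word that is a prefix of `symbols`, shortest first.
  commonPrefixSearch(symbols: string[]): Match[] {
    const matches: Match[] = [];
    let node: SketchTrieNode | undefined = this.root;
    for (const symbol of symbols) {
      node = node.children.get(symbol);
      if (!node) break;
      if (node.match) matches.push(node.match);
    }
    return matches;
  }
}

// Invented vocabulary entries: (word, score, index).
const trie = new SketchTrie();
trie.insert('he', -3.5, 0);
trie.insert('hello', -7.2, 1);
trie.insert('help', -6.1, 2);
console.log(trie.commonPrefixSearch(Array.from('hello')));
```

A tokenizer can use this kind of lookup to list the candidate pieces that start at a given position in the input before choosing among them by score.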

## Types

```typescript { .api }
// TensorFlow.js tensors
import * as tf from '@tensorflow/tfjs-core';

// Core types
type Vocabulary = Array<[string, number]>;

// Version information
const version: string;
```