# Core Transformers

The `SentenceTransformer` class is the main interface for loading, using, and customizing bi-encoder models that map sentences and text to dense vector embeddings.

## SentenceTransformer Class

### Constructor

```python
SentenceTransformer(
    model_name_or_path: str | None = None,
    modules: Iterable[nn.Module] | None = None,
    device: str | None = None,
    prompts: dict[str, str] | None = None,
    default_prompt_name: str | None = None,
    similarity_fn_name: str | SimilarityFunction | None = None,
    cache_folder: str | None = None,
    trust_remote_code: bool = False,
    revision: str | None = None,
    local_files_only: bool = False,
    token: bool | str | None = None,
    use_auth_token: bool | str | None = None,
    truncate_dim: int | None = None,
    model_kwargs: dict[str, Any] | None = None,
    tokenizer_kwargs: dict[str, Any] | None = None,
    config_kwargs: dict[str, Any] | None = None,
    model_card_data: SentenceTransformerModelCardData | None = None,
    backend: Literal["torch", "onnx", "openvino"] = "torch"
)
```
`{ .api }`

Initialize a SentenceTransformer model.

**Parameters**:
- `model_name_or_path`: Model identifier from HuggingFace Hub or local path
- `modules`: Iterable of PyTorch modules, called in order, to create a custom model architecture
- `device`: Device to run the model on ('cpu', 'cuda', 'mps', 'npu', etc.)
- `prompts`: Dictionary mapping prompt names to prompt texts for different tasks
- `default_prompt_name`: Name of the prompt in `prompts` to use when none is given at encode time
- `similarity_fn_name`: Similarity function for embeddings comparison
- `cache_folder`: Custom cache directory for models
- `trust_remote_code`: Allow custom model code from the remote repository to be executed locally; enable only for repositories you trust
- `revision`: Specific model revision/branch to load
- `local_files_only`: Only use locally cached files
- `token`: HuggingFace authentication token
- `use_auth_token`: Deprecated argument, use `token` instead
- `truncate_dim`: Truncate embeddings to this dimension
- `model_kwargs`: Additional keyword arguments passed to the underlying HuggingFace model
- `tokenizer_kwargs`: Additional keyword arguments passed to the underlying HuggingFace tokenizer
- `config_kwargs`: Additional keyword arguments passed to the underlying HuggingFace config
- `model_card_data`: Model card data object for generating model cards
- `backend`: Backend to use for inference ("torch", "onnx", "openvino")

### Core Encoding Methods

```python
def encode(
    sentences: str | list[str] | np.ndarray,
    prompt_name: str | None = None,
    prompt: str | None = None,
    batch_size: int = 32,
    show_progress_bar: bool | None = None,
    output_value: Literal["sentence_embedding", "token_embeddings"] | None = "sentence_embedding",
    precision: Literal["float32", "int8", "uint8", "binary", "ubinary"] = "float32",
    convert_to_numpy: bool = True,
    convert_to_tensor: bool = False,
    device: str | list[str | torch.device] | None = None,
    normalize_embeddings: bool = False,
    truncate_dim: int | None = None,
    pool: dict[Literal["input", "output", "processes"], Any] | None = None,
    chunk_size: int | None = None,
    **kwargs
) -> list[Tensor] | np.ndarray | Tensor | dict[str, Tensor] | list[dict[str, Tensor]]
```
`{ .api }`

Encode sentences into embeddings.

**Parameters**:
- `sentences`: Input text(s) to encode
- `prompt_name`: Name of a prompt in the model's `prompts` dictionary to use for encoding
- `prompt`: The prompt text to use for encoding; if set, `prompt_name` is ignored
- `batch_size`: Batch size for processing
- `show_progress_bar`: Display progress bar during encoding
- `output_value`: Type of embeddings to return ('sentence_embedding', 'token_embeddings', or None for all)
- `precision`: Precision to use for embeddings ("float32", "int8", "uint8", "binary", "ubinary")
- `convert_to_numpy`: Return numpy arrays instead of tensors
- `convert_to_tensor`: Return a single PyTorch tensor; overrides `convert_to_numpy`
- `device`: Device(s) for computation (single device or list for multi-process)
- `normalize_embeddings`: L2 normalize the embeddings
- `truncate_dim`: Dimension to truncate sentence embeddings to
- `pool`: Multi-process pool for encoding
- `chunk_size`: Size of chunks for multi-process encoding
- `**kwargs`: Additional keyword arguments

**Returns**: Embeddings as a numpy array by default, a tensor if `convert_to_tensor=True`, or a list of tensors if both conversion flags are `False`
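
To make `truncate_dim` and `normalize_embeddings` concrete, here is a minimal pure-Python sketch of the two post-processing steps applied to a single vector. The helpers `truncate` and `l2_normalize` are hypothetical names used only for illustration; the library performs equivalent operations on tensors, and the sketch assumes truncation happens before normalization.

```python
import math


def truncate(vector, dim):
    """Keep only the first `dim` components (hypothetical helper)."""
    return vector[:dim]


def l2_normalize(vector):
    """Scale a vector to unit Euclidean length (hypothetical helper)."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        # A zero vector cannot be normalized; return a copy unchanged.
        return list(vector)
    return [x / norm for x in vector]


# A made-up 4-dimensional "embedding" for illustration.
embedding = [3.0, 4.0, 12.0, 84.0]

short = truncate(embedding, 2)   # keeps the first two components
unit = l2_normalize(short)       # rescaled so its length is 1

print(short)
print(unit)
```

After normalization the vector has length 1, which is why the dot product of two normalized embeddings equals their cosine similarity.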

```python
def encode_query(
    sentences: str | list[str] | np.ndarray,
    prompt_name: str | None = None,
    prompt: str | None = None,
    batch_size: int = 32,
    show_progress_bar: bool | None = None,
    output_value: Literal["sentence_embedding", "token_embeddings"] | None = "sentence_embedding",
    precision: Literal["float32", "int8", "uint8", "binary", "ubinary"] = "float32",
    convert_to_numpy: bool = True,
    convert_to_tensor: bool = False,
    device: str | list[str | torch.device] | None = None,
    normalize_embeddings: bool = False,
    truncate_dim: int | None = None,
    pool: dict[Literal["input", "output", "processes"], Any] | None = None,
    chunk_size: int | None = None,
    **kwargs
) -> list[Tensor] | np.ndarray | Tensor | dict[str, Tensor] | list[dict[str, Tensor]]
```
`{ .api }`

Encode queries for retrieval tasks with query-specific prompt.

```python
def encode_document(
    sentences: str | list[str] | np.ndarray,
    prompt_name: str | None = None,
    prompt: str | None = None,
    batch_size: int = 32,
    show_progress_bar: bool | None = None,
    output_value: Literal["sentence_embedding", "token_embeddings"] | None = "sentence_embedding",
    precision: Literal["float32", "int8", "uint8", "binary", "ubinary"] = "float32",
    convert_to_numpy: bool = True,
    convert_to_tensor: bool = False,
    device: str | list[str | torch.device] | None = None,
    normalize_embeddings: bool = False,
    truncate_dim: int | None = None,
    pool: dict[Literal["input", "output", "processes"], Any] | None = None,
    chunk_size: int | None = None,
    **kwargs
) -> list[Tensor] | np.ndarray | Tensor | dict[str, Tensor] | list[dict[str, Tensor]]
```
`{ .api }`

Encode documents for retrieval tasks with document-specific prompt.
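
The prompt-related arguments can be pictured as a small lookup followed by string concatenation: an explicit `prompt` is used as given, otherwise `prompt_name` (or, failing that, `default_prompt_name`) is looked up in the `prompts` dictionary, and the resulting text is prepended to every input. The sketch below mirrors that idea in plain Python. `resolve_prompt` and `apply_prompt` are hypothetical helpers, the prompt strings are made up, and the precedence order shown is an assumption that may differ between library versions.

```python
def resolve_prompt(prompts, default_prompt_name=None, prompt_name=None, prompt=None):
    """Pick the prompt text to use (hypothetical helper, assumed precedence)."""
    if prompt is not None:
        # An explicit prompt string wins over any named prompt.
        return prompt
    name = prompt_name if prompt_name is not None else default_prompt_name
    if name is None:
        return ""
    if name not in prompts:
        raise ValueError(f"Prompt name {name!r} not found in {sorted(prompts)}")
    return prompts[name]


def apply_prompt(sentences, prompt_text):
    """Prepend the prompt text to every input sentence."""
    return [prompt_text + sentence for sentence in sentences]


# Made-up prompt texts for illustration only.
prompts = {"query": "query: ", "document": "passage: "}

queries = apply_prompt(
    ["What is machine learning?"],
    resolve_prompt(prompts, prompt_name="query"),
)
documents = apply_prompt(
    ["Machine learning is a subset of artificial intelligence"],
    resolve_prompt(prompts, prompt_name="document"),
)

print(queries)
print(documents)
```

In this picture, `encode_query` and `encode_document` are conveniences that select a query-side or document-side prompt for you when the model defines one.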

### Similarity Methods

```python
def similarity(
    embeddings1: Tensor | npt.NDArray[np.float32],
    embeddings2: Tensor | npt.NDArray[np.float32]
) -> Tensor
```
`{ .api }`

Compute similarity between two sets of embeddings using the model's similarity function.

```python
def similarity_pairwise(
    embeddings1: Tensor | npt.NDArray[np.float32],
    embeddings2: Tensor | npt.NDArray[np.float32]
) -> Tensor
```
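`{ .api }`

Compute the similarity between corresponding pairs of embeddings, that is, `embeddings1[i]` against `embeddings2[i]`. Both inputs must contain the same number of embeddings, and the result holds one score per pair rather than a full similarity matrix.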
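
The difference between the two methods is easiest to see from the output shapes. The following pure-Python sketch uses a plain dot product as the score; `all_pairs` and `aligned_pairs` are hypothetical helpers that stand in for `similarity` and `similarity_pairwise` respectively, and are not the library's implementation.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def all_pairs(set1, set2, score=dot):
    """Score every vector in set1 against every vector in set2.

    Returns a len(set1) x len(set2) matrix (hypothetical helper).
    """
    return [[score(a, b) for b in set2] for a in set1]


def aligned_pairs(set1, set2, score=dot):
    """Score set1[i] against set2[i] only.

    Returns a flat list with one score per pair (hypothetical helper).
    """
    if len(set1) != len(set2):
        raise ValueError("Both inputs must contain the same number of vectors")
    return [score(a, b) for a, b in zip(set1, set2)]


left = [[1.0, 0.0], [0.0, 1.0]]
right = [[1.0, 0.0], [1.0, 1.0]]

matrix = all_pairs(left, right)     # 2 x 2 matrix of scores
pairs = aligned_pairs(left, right)  # 2 scores, one per aligned pair

print(matrix)
print(pairs)
```

The aligned scores are exactly the diagonal of the full matrix, so the pairwise variant is the cheaper choice when only matching rows need to be compared.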

### Model Inspection Methods

```python
def get_sentence_embedding_dimension() -> int | None
```
`{ .api }`

Get the dimension of sentence embeddings.

```python
def get_max_seq_length() -> int | None
```
`{ .api }`

Get the maximum sequence length the model can handle.

```python
def tokenize(
    texts: list[str] | list[dict] | list[tuple[str, str]],
    **kwargs
) -> dict[str, Tensor]
```
`{ .api }`

Tokenize input texts using the model's tokenizer.

### Model Persistence

```python
def save(
    path: str,
    model_name: str | None = None,
    create_model_card: bool = True,
    train_datasets: list[str] | None = None,
    safe_serialization: bool = True
) -> None
```
`{ .api }`

Save the model to a local directory.

```python
def save_pretrained(
    save_directory: str,
    **kwargs
) -> None
```
`{ .api }`

Save model using HuggingFace format.

```python
def save_to_hub(
    repo_id: str,
    organization: str | None = None,
    token: str | None = None,
    private: bool | None = None,
    safe_serialization: bool = True,
    commit_message: str = "Add new SentenceTransformer model.",
    local_model_path: str | None = None,
    exist_ok: bool = False,
    replace_model_card: bool = False,
    train_datasets: list[str] | None = None
) -> str
```
`{ .api }`

Save and push model to HuggingFace Hub. This method is deprecated; prefer `push_to_hub` for new code.

```python
def push_to_hub(
    repo_id: str,
    token: str | None = None,
    private: bool | None = None,
    safe_serialization: bool = True,
    commit_message: str | None = None,
    local_model_path: str | None = None,
    exist_ok: bool = False,
    replace_model_card: bool = False,
    train_datasets: list[str] | None = None,
    revision: str | None = None,
    create_pr: bool = False
) -> str
```
`{ .api }`

Push existing model to HuggingFace Hub.

### Evaluation and Processing

```python
def evaluate(
    evaluator: SentenceEvaluator,
    output_path: str | None = None
) -> float | dict[str, float]
```
`{ .api }`

Evaluate the model using a provided evaluator.

```python
def forward(
    input: dict[str, torch.Tensor],
    **kwargs
) -> dict[str, torch.Tensor]
```
`{ .api }`

Forward pass through the model.

### Multi-Processing Support

```python
def start_multi_process_pool(
    target_devices: list[str] | None = None
) -> dict[Literal["input", "output", "processes"], Any]
```
`{ .api }`

Start a multi-process pool for parallel encoding.

```python
@staticmethod
def stop_multi_process_pool(pool: dict[Literal["input", "output", "processes"], Any]) -> None
```
`{ .api }`

Stop a multi-process pool.

```python
def encode_multi_process(
    sentences: list[str],
    pool: dict[Literal["input", "output", "processes"], Any],
    prompt_name: str | None = None,
    prompt: str | None = None,
    batch_size: int = 32,
    chunk_size: int | None = None,
    show_progress_bar: bool | None = None,
    precision: Literal["float32", "int8", "uint8", "binary", "ubinary"] = "float32",
    normalize_embeddings: bool = False,
    truncate_dim: int | None = None
) -> np.ndarray
```
`{ .api }`

Encode sentences using multi-processing for improved performance.
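
Conceptually, multi-process encoding splits the input into chunks of `chunk_size` sentences, hands the chunks to the worker processes, and joins the results back in the original order. The sketch below shows only the splitting and re-joining steps in plain Python. `split_into_chunks` is a hypothetical helper, not a library function, and how the library chooses a default when `chunk_size` is `None` is not modelled here.

```python
def split_into_chunks(sentences, chunk_size):
    """Split a list into consecutive chunks of at most `chunk_size` items."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [
        sentences[start:start + chunk_size]
        for start in range(0, len(sentences), chunk_size)
    ]


sentences = [f"sentence {i}" for i in range(10)]
chunks = split_into_chunks(sentences, 4)

# Each chunk would be sent to a worker; the last chunk may be smaller.
chunk_sizes = [len(chunk) for chunk in chunks]

# Joining the chunks in order recovers the original input order.
restored = [sentence for chunk in chunks for sentence in chunk]

print(chunk_sizes)
```

Smaller chunks balance work more evenly across devices, while larger chunks reduce the overhead of passing data between processes.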

### Properties

```python
@property
def device() -> torch.device
```
`{ .api }`

Current device of the model.

```python
@property
def tokenizer() -> PreTrainedTokenizer
```
`{ .api }`

Access to the model's tokenizer.

```python
@property
def max_seq_length() -> int
```
`{ .api }`

Maximum sequence length supported by the model.

```python
@property
def similarity_fn_name() -> Literal["cosine", "dot", "euclidean", "manhattan"]
```
`{ .api }`

Name of the similarity function used by the model.

```python
@property
def transformers_model() -> PreTrainedModel | None
```
`{ .api }`

Access to the underlying transformer model.

## Usage Examples

### Basic Encoding

```python
from sentence_transformers import SentenceTransformer

# Load pre-trained model
model = SentenceTransformer('all-MiniLM-L6-v2')

# Encode single sentence
embedding = model.encode("Hello world")
print(f"Embedding shape: {embedding.shape}")

# Encode multiple sentences
sentences = [
    "The cat sits on the mat",
    "A feline rests on a rug",
    "Dogs are great pets"
]
embeddings = model.encode(sentences)
print(f"Embeddings shape: {embeddings.shape}")
```

### Similarity Computation

```python
# Compute similarity between two sentences
sentence1 = "The weather is nice today"
sentence2 = "Today has beautiful weather"

emb1 = model.encode(sentence1)
emb2 = model.encode(sentence2)

similarity = model.similarity(emb1, emb2)
print(f"Similarity: {similarity.item():.4f}")

# Similarity matrix over a set of sentences
embeddings = model.encode([
    "Python is a programming language",
    "Java is used for software development",
    "I love pizza",
    "Pasta is delicious"
])

# similarity() scores every embedding against every other one (4 x 4 matrix)
similarities = model.similarity(embeddings, embeddings)
print(f"Similarity matrix shape: {similarities.shape}")

# similarity_pairwise() scores row i against row i only (4 scores)
pairwise = model.similarity_pairwise(embeddings, embeddings)
print(f"Pairwise scores shape: {pairwise.shape}")
```

### Asymmetric Retrieval

```python
# For retrieval tasks with different prompts
queries = ["What is machine learning?", "How do neural networks work?"]
documents = [
    "Machine learning is a subset of artificial intelligence",
    "Neural networks are computational models inspired by biological neurons",
    "Pizza recipes vary by region and preference"
]

# Encode with task-specific methods
query_embeddings = model.encode_query(queries)
doc_embeddings = model.encode_document(documents)

# Compute retrieval similarities (one row per query, one column per document)
similarities = model.similarity(query_embeddings, doc_embeddings)
```

### Custom Model Creation

```python
from torch import nn

from sentence_transformers import SentenceTransformer
from sentence_transformers.models import Transformer, Pooling, Dense

# Create custom model architecture
transformer = Transformer('distilbert-base-uncased', max_seq_length=256)
pooling = Pooling(transformer.get_word_embedding_dimension(), pooling_mode='mean')
# Dense expects a callable activation module rather than a string name
dense = Dense(pooling.get_sentence_embedding_dimension(), 256, activation_function=nn.Tanh())

# Combine modules
model = SentenceTransformer(modules=[transformer, pooling, dense])

# Use the custom model
embeddings = model.encode(["Custom model example"])
```

### Performance Optimization

```python
# Multi-process encoding for large datasets
# Note: when running as a script, put this code under `if __name__ == "__main__":`
# so that worker processes can import the main module safely
sentences = ["sentence " + str(i) for i in range(10000)]

# Start multi-process pool
pool = model.start_multi_process_pool(['cuda:0', 'cuda:1'])

# Encode using multiple GPUs
embeddings = model.encode_multi_process(sentences, pool, batch_size=64)

# Clean up
model.stop_multi_process_pool(pool)

# Normalized embeddings for cosine similarity
embeddings = model.encode(sentences, normalize_embeddings=True)
```

### Model Persistence

```python
# Save model locally
model.save('./my-sentence-transformer')

# Upload to HuggingFace Hub (push_to_hub supersedes the deprecated save_to_hub)
model.push_to_hub('my-username/my-sentence-transformer')

# Load saved model
loaded_model = SentenceTransformer('./my-sentence-transformer')
```

## SimilarityFunction Enum

```python
from sentence_transformers import SimilarityFunction

class SimilarityFunction(Enum):
    COSINE = "cosine"
    DOT_PRODUCT = "dot"
    DOT = "dot"  # Alias for DOT_PRODUCT
    EUCLIDEAN = "euclidean"
    MANHATTAN = "manhattan"
```
`{ .api }`

Enumeration of available similarity functions for comparing embeddings.
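
As a rough guide to what each option measures, the sketch below computes the four scores for a single pair of vectors in plain Python. The function names are hypothetical and this is not the library's implementation. It assumes, as is the usual convention for turning a distance into a similarity, that the Euclidean and Manhattan options are reported as negative distances so that a larger value always means "more similar".

```python
import math


def dot_score(a, b):
    """Dot product: sensitive to both direction and magnitude."""
    return sum(x * y for x, y in zip(a, b))


def cosine_score(a, b):
    """Cosine similarity: dot product of the length-normalized vectors."""
    norm_a = math.sqrt(dot_score(a, a))
    norm_b = math.sqrt(dot_score(b, b))
    return dot_score(a, b) / (norm_a * norm_b)


def euclidean_score(a, b):
    """Negative Euclidean (L2) distance, so larger means more similar."""
    return -math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan_score(a, b):
    """Negative Manhattan (L1) distance, so larger means more similar."""
    return -sum(abs(x - y) for x, y in zip(a, b))


a = [3.0, 4.0]
b = [4.0, 3.0]

scores = {
    "cosine": cosine_score(a, b),
    "dot": dot_score(a, b),
    "euclidean": euclidean_score(a, b),
    "manhattan": manhattan_score(a, b),
}

for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

Cosine similarity ignores vector length, while the dot product does not, which is why the two only agree on normalized embeddings.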

### Usage with SentenceTransformer

```python
from sentence_transformers import SentenceTransformer, SimilarityFunction

# Set similarity function during initialization
model = SentenceTransformer(
    'all-MiniLM-L6-v2',
    similarity_fn_name=SimilarityFunction.COSINE
)

# Or use string names
model = SentenceTransformer(
    'all-MiniLM-L6-v2',
    similarity_fn_name='euclidean'
)
```
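
## Best Practices

1. **Batch Processing**: Tune `batch_size` (default 32) to your hardware; larger batches usually improve throughput until memory becomes the limit
2. **Device Management**: Pass `device` explicitly (for example `'cuda'` or `'cpu'`) for consistent behavior across machines
3. **Normalization**: Encode with `normalize_embeddings=True` when comparing with cosine similarity, so that the dot product of two embeddings equals their cosine similarity
4. **Model Selection**: Choose models appropriate for your task and domain
5. **Caching**: Reuse downloaded model files across runs; set `cache_folder` to control where they are stored
6. **Multi-Processing**: Use multi-process encoding (`start_multi_process_pool` with `encode_multi_process`) for large datasets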