# Utilities

The sentence-transformers package provides utility functions for model optimization, quantization, export to different formats, similarity computation, and training enhancements.

## Model Quantization

### quantize_embeddings

```python
def quantize_embeddings(
    embeddings: Tensor | np.ndarray,
    precision: Literal["float32", "int8", "uint8", "binary", "ubinary"],
    ranges: np.ndarray | None = None,
    calibration_embeddings: np.ndarray | None = None
) -> np.ndarray
```
`{ .api }`

Quantize embeddings to a lower precision to reduce memory usage and speed up downstream operations such as similarity search.

**Parameters**:
- `embeddings`: Unquantized (e.g. float) embeddings to quantize to a given precision
- `precision`: The precision to convert to ("float32", "int8", "uint8", "binary", "ubinary")
- `ranges`: Ranges for quantization of embeddings. Used for int8 quantization, where the ranges refer to the minimum and maximum values for each dimension. 2D array with shape (2, embedding_dim)
- `calibration_embeddings`: Embeddings used for calibration during quantization. Used for int8 quantization to compute ranges

**Returns**: Quantized embeddings with the specified precision

**Usage Examples**:

```python
import numpy as np
from sentence_transformers import quantize_embeddings, SentenceTransformer

# Generate sample embeddings
model = SentenceTransformer('all-MiniLM-L6-v2')
sentences = ["Hello world", "How are you?", "Machine learning is great"]
embeddings = model.encode(sentences)

# Float32 quantization (no change, returns same embeddings)
quantized_embs = quantize_embeddings(embeddings, precision="float32")
print(f"Original size: {embeddings.nbytes} bytes")
print(f"Quantized size: {quantized_embs.nbytes} bytes")

# Int8 quantization with calibration
calibration_data = model.encode(["Sample sentence " + str(i) for i in range(100)])
quantized_int8 = quantize_embeddings(
    embeddings,
    precision="int8",
    calibration_embeddings=calibration_data
)

# Binary quantization (extreme compression)
binary_embs = quantize_embeddings(embeddings, precision="binary")
```
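To make the `int8` and binary precisions more concrete, the following NumPy-only sketch shows the general idea behind each: a per-dimension linear mapping onto the int8 range using `ranges`, and a sign-based reduction that packs eight dimensions into one byte. The helper names `int8_quantize` and `ubinary_quantize` are hypothetical and the code is an illustration of the technique, not the library's implementation.

```python
import numpy as np

def int8_quantize(embeddings: np.ndarray, ranges: np.ndarray) -> np.ndarray:
    """Map each dimension linearly from [min, max] onto the int8 range [-128, 127].

    Hypothetical helper. Assumes max > min for every dimension.
    """
    mins, maxs = ranges[0], ranges[1]
    steps = (maxs - mins) / 255.0
    scaled = (embeddings - mins) / steps - 128.0
    return np.clip(np.round(scaled), -128, 127).astype(np.int8)

def ubinary_quantize(embeddings: np.ndarray) -> np.ndarray:
    """Keep only the sign of each value and pack 8 dimensions into one byte."""
    bits = embeddings > 0
    return np.packbits(bits, axis=-1)

# Synthetic embeddings stand in for model output
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(4, 16)).astype(np.float32)

# Shape (2, embedding_dim): row 0 holds minima, row 1 holds maxima
ranges = np.vstack([embeddings.min(axis=0), embeddings.max(axis=0)])

int8_embs = int8_quantize(embeddings, ranges)
ubinary_embs = ubinary_quantize(embeddings)

print(int8_embs.shape, int8_embs.nbytes)        # 4x smaller than float32
print(ubinary_embs.shape, ubinary_embs.nbytes)  # 32x smaller than float32
```

In this sketch the ranges come from the embeddings themselves; with a separate calibration set, values outside the calibrated range are clipped to the int8 limits.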

## Model Export

### export_optimized_onnx_model

```python
def export_optimized_onnx_model(
    model: SentenceTransformer,
    onnx_model_path: str,
    opset_version: int = 14,
    optimization_level: str = "O2"
) -> None
```
`{ .api }`

Export SentenceTransformer model to optimized ONNX format for deployment.

**Parameters**:
- `model`: SentenceTransformer model to export
- `onnx_model_path`: Output path for ONNX model
- `opset_version`: ONNX opset version to use
- `optimization_level`: Optimization level ("O1", "O2", "O3")

### export_dynamic_quantized_onnx_model

```python
def export_dynamic_quantized_onnx_model(
    model: SentenceTransformer,
    onnx_model_path: str,
    quantization_mode: str = "IntegerOps"
) -> None
```
`{ .api }`

Export model to dynamically quantized ONNX format.

**Parameters**:
- `model`: SentenceTransformer model to export
- `onnx_model_path`: Output path for quantized ONNX model
- `quantization_mode`: Quantization mode ("IntegerOps", "QLinearOps")

### export_static_quantized_openvino_model

```python
def export_static_quantized_openvino_model(
    model: SentenceTransformer,
    openvino_model_path: str,
    calibration_dataset: list[str] | None = None
) -> None
```
`{ .api }`

Export model to statically quantized OpenVINO format for Intel hardware optimization.

**Parameters**:
- `model`: SentenceTransformer model to export
- `openvino_model_path`: Output path for OpenVINO model
- `calibration_dataset`: Dataset for static quantization calibration

**Usage Examples**:

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.backend import (
    export_optimized_onnx_model,
    export_dynamic_quantized_onnx_model,
    export_static_quantized_openvino_model
)

# Load model
model = SentenceTransformer('all-MiniLM-L6-v2')

# Export to optimized ONNX
export_optimized_onnx_model(
    model=model,
    onnx_model_path="./optimized_model.onnx",
    opset_version=14,
    optimization_level="O2"
)

# Export to quantized ONNX for even faster inference
export_dynamic_quantized_onnx_model(
    model=model,
    onnx_model_path="./quantized_model.onnx",
    quantization_mode="IntegerOps"
)

# Export to OpenVINO for Intel hardware
calibration_texts = ["Sample text " + str(i) for i in range(100)]
export_static_quantized_openvino_model(
    model=model,
    openvino_model_path="./openvino_model",
    calibration_dataset=calibration_texts
)

# Use exported ONNX model with ONNX Runtime
import onnxruntime as ort
import numpy as np

# Load ONNX model
ort_session = ort.InferenceSession("./optimized_model.onnx")

# Tokenize input
inputs = model.tokenizer("Hello world", return_tensors="np", padding=True, truncation=True)

# Run inference
onnx_outputs = ort_session.run(None, {
    "input_ids": inputs["input_ids"].astype(np.int64),
    "attention_mask": inputs["attention_mask"].astype(np.int64)
})

# Depending on how the model was exported, this output may be token-level
# and still require pooling to obtain one vector per sentence
print(f"ONNX embedding shape: {onnx_outputs[0].shape}")
```
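If the first ONNX output is a token-level tensor of shape `(batch, seq_len, hidden_dim)`, which is an assumption about the exported graph rather than something the export functions above guarantee, a pooling step is still needed to obtain sentence embeddings. The sketch below shows masked mean pooling followed by L2 normalization in NumPy; `mean_pool` and `l2_normalize` are hypothetical helper names, and the small arrays stand in for `onnx_outputs[0]` and the tokenizer's attention mask.

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors per sequence, ignoring padding positions."""
    mask = attention_mask[..., None].astype(token_embeddings.dtype)  # (batch, seq, 1)
    summed = (token_embeddings * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)  # avoid division by zero
    return summed / counts

def l2_normalize(vectors: np.ndarray) -> np.ndarray:
    """Scale each row to unit length."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors / np.clip(norms, 1e-12, None)

# Stand-in for onnx_outputs[0]: one sequence, three tokens, last one is padding
token_embeddings = np.array(
    [[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]], dtype=np.float32
)
attention_mask = np.array([[1, 1, 0]], dtype=np.int64)

pooled = mean_pool(token_embeddings, attention_mask)
sentence_embedding = l2_normalize(pooled)
print(pooled)                    # padding token does not contribute
print(sentence_embedding.shape)
```

Whether mean pooling is the right choice depends on the pooling mode the original model was trained with.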

## Training Utilities

### mine_hard_negatives

```python
def mine_hard_negatives(
    model: SentenceTransformer,
    sentences: list[str],
    labels: list[int],
    batch_size: int = 32,
    top_k: int = 10,
    margin: float = 0.2
) -> list[dict[str, Any]]
```
`{ .api }`

Mine hard negative examples for improved contrastive training.

**Parameters**:
- `model`: SentenceTransformer model for encoding
- `sentences`: List of sentences to mine from
- `labels`: Corresponding labels for sentences
- `batch_size`: Batch size for encoding
- `top_k`: Number of hard negatives to return per positive
- `margin`: Margin for hard negative selection

**Returns**: List of dictionaries with anchor, positive, and hard negative examples

**Usage Examples**:

```python
from sentence_transformers import mine_hard_negatives

# Assumes `model` is an already loaded SentenceTransformer

# Prepare labeled data
sentences = [
    "Python is a programming language",
    "Java is used for software development",
    "Machine learning uses algorithms",
    "Deep learning is a subset of ML",
    "Cars are vehicles",
    "Trucks are large vehicles"
]

labels = [0, 0, 1, 1, 2, 2]  # Programming, ML, Vehicles

# Mine hard negatives
hard_negatives = mine_hard_negatives(
    model=model,
    sentences=sentences,
    labels=labels,
    top_k=2,
    margin=0.3
)

print("Hard negative examples:")
for example in hard_negatives[:3]:  # Show first 3
    print(f"Anchor: {example['anchor']}")
    print(f"Positive: {example['positive']}")
    print(f"Hard Negative: {example['negative']}")
    print(f"Similarity: {example['similarity']:.4f}")
    print()

# Use hard negatives in training
from sentence_transformers import SentenceTransformerTrainer
from sentence_transformers.losses import TripletLoss
from datasets import Dataset

# Convert to training format
train_examples = [
    {
        "anchor": ex["anchor"],
        "positive": ex["positive"],
        "negative": ex["negative"]
    }
    for ex in hard_negatives
]

train_dataset = Dataset.from_list(train_examples)
triplet_loss = TripletLoss(model)

# Train on the mined triplets
trainer = SentenceTransformerTrainer(
    model=model,
    train_dataset=train_dataset,
    loss=triplet_loss
)
trainer.train()
```
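The selection idea behind hard-negative mining can be sketched without the library: for each anchor, rank the items that carry a different label by cosine similarity and keep the closest ones. The helper `select_hard_negatives` below is hypothetical, works on precomputed embeddings, and does not implement the `margin` parameter described above.

```python
import numpy as np

def select_hard_negatives(embeddings: np.ndarray, labels, top_k: int = 1) -> list[list[int]]:
    """For each item, return indices of the most similar items with a different label."""
    labels = np.asarray(labels)
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ normed.T  # pairwise cosine similarity
    result = []
    for i in range(len(labels)):
        candidate_idx = np.flatnonzero(labels != labels[i])
        # Sort candidates from most to least similar and keep the top_k
        order = np.argsort(-sims[i, candidate_idx])[:top_k]
        result.append(candidate_idx[order].tolist())
    return result

# Toy 2D "embeddings" so the geometry is easy to follow
embeddings = np.array([
    [1.0, 0.0],  # label 0
    [0.9, 0.1],  # label 0
    [0.8, 0.6],  # label 1, fairly close to the label 0 items
    [0.0, 1.0],  # label 1, far from the label 0 items
])
labels = [0, 0, 1, 1]

hard = select_hard_negatives(embeddings, labels, top_k=1)
print(hard)  # [[2], [2], [1], [1]]
```

Item 2 is chosen as the negative for both label 0 anchors because it is the closest item from another class, which is what makes it "hard".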

## Similarity Functions

The `SimilarityFunction` enum provides standardized similarity computation methods:

```python
from sentence_transformers import SimilarityFunction

class SimilarityFunction(Enum):
    COSINE = "cosine"
    DOT_PRODUCT = "dot"
    DOT = "dot"  # Alias for DOT_PRODUCT
    EUCLIDEAN = "euclidean"
    MANHATTAN = "manhattan"
```
`{ .api }`

**Usage Examples**:

```python
from sentence_transformers import SentenceTransformer, SimilarityFunction

# Use with SentenceTransformer
model = SentenceTransformer('all-MiniLM-L6-v2', similarity_fn_name=SimilarityFunction.COSINE)

# Manual similarity computation
import torch
import torch.nn.functional as F

def compute_similarity(embeddings1, embeddings2, similarity_fn):
    """Compute row-wise similarity between two equally shaped sets of embeddings."""
    if similarity_fn == SimilarityFunction.COSINE:
        return F.cosine_similarity(embeddings1, embeddings2, dim=-1)
    elif similarity_fn == SimilarityFunction.DOT_PRODUCT:
        return torch.sum(embeddings1 * embeddings2, dim=-1)
    elif similarity_fn == SimilarityFunction.EUCLIDEAN:
        # Negative distance, so that larger values mean "more similar"
        return -torch.linalg.vector_norm(embeddings1 - embeddings2, ord=2, dim=-1)
    elif similarity_fn == SimilarityFunction.MANHATTAN:
        return -torch.linalg.vector_norm(embeddings1 - embeddings2, ord=1, dim=-1)
    raise ValueError(f"Unsupported similarity function: {similarity_fn}")

# Example usage
emb1 = model.encode(["First sentence"])
emb2 = model.encode(["Second sentence"])

# Iterating over an Enum skips aliases, so DOT is not visited separately
for sim_fn in SimilarityFunction:
    sim_score = compute_similarity(
        torch.tensor(emb1),
        torch.tensor(emb2),
        sim_fn
    )
    print(f"{sim_fn.value}: {sim_score.item():.4f}")
```

## Batch Samplers

### DefaultBatchSampler

```python
class DefaultBatchSampler:
    def __init__(
        self,
        dataset: Dataset,
        batch_size: int,
        drop_last: bool = False,
        generator: torch.Generator | None = None
    )
```
`{ .api }`

Standard batch sampler for single dataset training.

### MultiDatasetDefaultBatchSampler

```python
class MultiDatasetDefaultBatchSampler:
    def __init__(
        self,
        datasets: dict[str, Dataset],
        batch_sizes: dict[str, int] | int,
        sampling_strategy: str = "proportional",
        generator: torch.Generator | None = None
    )
```
`{ .api }`

Batch sampler for multi-dataset training with different sampling strategies.

**Parameters**:
- `datasets`: Dictionary of dataset names to Dataset objects
- `batch_sizes`: Batch size per dataset or single batch size
- `sampling_strategy`: "proportional" or "round_robin"
- `generator`: Random generator for reproducibility

**Usage Examples**:

```python
from sentence_transformers import DefaultBatchSampler, MultiDatasetDefaultBatchSampler
from datasets import Dataset

# Single dataset sampler
dataset = Dataset.from_list([{"text": f"Example {i}"} for i in range(1000)])
sampler = DefaultBatchSampler(
    dataset=dataset,
    batch_size=32,
    drop_last=True
)

# Multi-dataset sampler
dataset1 = Dataset.from_list([{"text": f"Dataset1 {i}"} for i in range(500)])
dataset2 = Dataset.from_list([{"text": f"Dataset2 {i}"} for i in range(300)])

multi_sampler = MultiDatasetDefaultBatchSampler(
    datasets={"ds1": dataset1, "ds2": dataset2},
    batch_sizes={"ds1": 32, "ds2": 16},
    sampling_strategy="proportional"
)

# Use in training
# Assumes `model` (a SentenceTransformer) and `args` (training arguments) are already defined
from sentence_transformers import SentenceTransformerTrainer

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset={"ds1": dataset1, "ds2": dataset2},
    # Sampler is automatically configured based on datasets
)
```

## Model Components

The `sentence_transformers.models` module provides modular components for building custom architectures:

### Core Components

```python
from sentence_transformers.models import (
    Transformer,  # BERT, RoBERTa, etc.
    Pooling,      # Mean, max, CLS pooling
    Dense,        # Linear transformation
    Normalize     # L2 normalization
)
```

**Usage Examples**:

```python
from torch import nn
from sentence_transformers import SentenceTransformer
from sentence_transformers.models import Transformer, Pooling, Dense, Normalize

# Build custom model architecture
transformer = Transformer('distilbert-base-uncased', max_seq_length=256)
pooling = Pooling(
    word_embedding_dimension=transformer.get_word_embedding_dimension(),
    pooling_mode='mean'
)
dense = Dense(
    in_features=pooling.get_sentence_embedding_dimension(),
    out_features=256,
    activation_function=nn.Tanh()  # Expects a callable module, not a string
)
normalize = Normalize()

# Combine components
custom_model = SentenceTransformer(modules=[transformer, pooling, dense, normalize])

# Use custom model
embeddings = custom_model.encode(["Custom architecture example"])
print(f"Custom embedding shape: {embeddings.shape}")
```

### Additional Components

```python
from sentence_transformers.models import (
    CNN,                   # Convolutional layers
    LSTM,                  # LSTM layers
    BoW,                   # Bag of words
    WordEmbeddings,        # Word embeddings layer
    WordWeights,           # TF-IDF weighting
    StaticEmbedding,       # Static embeddings (Word2Vec, GloVe)
    WeightedLayerPooling,  # Weighted pooling across layers
    CLIPModel,             # CLIP integration
    Router,                # Multi-encoder routing
    Dropout,               # Dropout layer
    LayerNorm              # Layer normalization
)
```

## Performance Optimization

### Memory-Efficient Training

```python
def create_memory_efficient_model(base_model_name, target_dim=256):
    """Create a model whose output embeddings are reduced to target_dim dimensions."""
    from torch import nn
    from sentence_transformers import SentenceTransformer
    from sentence_transformers.models import Transformer, Pooling, Dense, Normalize

    transformer = Transformer(base_model_name, max_seq_length=256)
    pooling = Pooling(transformer.get_word_embedding_dimension(), pooling_mode='mean')

    # Add dimension reduction so that stored embeddings take less memory
    dense = Dense(
        in_features=pooling.get_sentence_embedding_dimension(),
        out_features=target_dim,
        activation_function=nn.Tanh()  # Expects a callable module, not a string
    )
    normalize = Normalize()

    return SentenceTransformer(modules=[transformer, pooling, dense, normalize])

# Create efficient model
efficient_model = create_memory_efficient_model('bert-base-uncased', target_dim=128)
```

### Inference Optimization

```python
def optimize_for_inference(model, sentences, batch_size=64):
    """Optimized inference with batching and no gradients."""
    import torch

    model.eval()  # Set to evaluation mode
    embeddings = []

    with torch.no_grad():  # Disable gradient computation
        for i in range(0, len(sentences), batch_size):
            batch = sentences[i:i + batch_size]
            batch_embeddings = model.encode(
                batch,
                batch_size=len(batch),
                show_progress_bar=False,
                convert_to_tensor=False,
                normalize_embeddings=True  # For cosine similarity
            )
            embeddings.extend(batch_embeddings)

    return embeddings

# Optimized inference
sentences = [f"Sentence {i}" for i in range(1000)]
fast_embeddings = optimize_for_inference(model, sentences)
```

## Debugging and Logging

### LoggingHandler

```python
from sentence_transformers import LoggingHandler
import logging

class LoggingHandler(logging.Handler):
    def emit(self, record: logging.LogRecord) -> None:
        """Emit log record without interfering with tqdm progress bars."""
        pass
```
`{ .api }`

Custom logging handler that works seamlessly with tqdm progress bars.

**Usage Examples**:

```python
import logging
from sentence_transformers import LoggingHandler

# Set up logging
logging.basicConfig(
    format='%(asctime)s - %(message)s',
    datefmt='%Y-%m-%d %H:%M:%S',
    level=logging.INFO,
    handlers=[LoggingHandler()]
)

logger = logging.getLogger(__name__)

# Use with training
def train_with_logging(model, trainer):
    logger.info("Starting training...")

    trainer.train()

    logger.info("Training completed!")
    logger.info(f"Model saved to {trainer.args.output_dir}")
```

## Data Processing Utilities

### Legacy Dataset Classes (Deprecated)

```python
# Note: These are deprecated in favor of HuggingFace Datasets
from sentence_transformers.datasets import SentencesDataset, ParallelSentencesDataset
from sentence_transformers.readers import InputExample
```

### Modern Data Processing

```python
def create_training_dataset(examples, format_type="triplet"):
    """Create training dataset in various formats."""
    from datasets import Dataset

    if format_type == "triplet":
        # Format: anchor, positive, negative
        formatted_examples = [
            {
                "anchor": ex["anchor"],
                "positive": ex["positive"],
                "negative": ex["negative"]
            }
            for ex in examples
        ]
    elif format_type == "pairs":
        # Format: sentence1, sentence2, label
        formatted_examples = [
            {
                "sentence1": ex["sentence1"],
                "sentence2": ex["sentence2"],
                "label": ex["label"]
            }
            for ex in examples
        ]
    else:
        # Fail early instead of hitting an undefined variable below
        raise ValueError(f"Unknown format_type: {format_type!r}")

    return Dataset.from_list(formatted_examples)

# Example usage
examples = [
    {
        "anchor": "Python programming",
        "positive": "Coding in Python",
        "negative": "Java development"
    }
]

dataset = create_training_dataset(examples, format_type="triplet")
```

## Utility Functions for Analysis

```python
def analyze_model_performance(model, test_sentences):
    """Analyze model performance characteristics."""
    import time
    import numpy as np

    # Encoding speed test
    start_time = time.time()
    embeddings = model.encode(test_sentences, batch_size=32)
    encoding_time = time.time() - start_time

    # Embedding analysis
    embedding_dim = embeddings.shape[1]
    embedding_norms = np.linalg.norm(embeddings, axis=1)

    # Similarity analysis
    similarities = np.dot(embeddings, embeddings.T)

    results = {
        "encoding_speed": len(test_sentences) / encoding_time,
        "embedding_dimension": embedding_dim,
        "avg_embedding_norm": np.mean(embedding_norms),
        "std_embedding_norm": np.std(embedding_norms),
        "avg_similarity": np.mean(similarities[np.triu_indices_from(similarities, k=1)]),
        "similarity_std": np.std(similarities[np.triu_indices_from(similarities, k=1)])
    }

    return results

# Analyze model
test_texts = ["Sample sentence " + str(i) for i in range(100)]
performance = analyze_model_performance(model, test_texts)

for metric, value in performance.items():
    print(f"{metric}: {value:.4f}")
```

## Logging and Debugging

### LoggingHandler

Custom logging handler that integrates with tqdm progress bars for clean output during training and inference.

```python { .api }
class LoggingHandler(logging.Handler):
    def __init__(self, level=logging.NOTSET) -> None: ...
    def emit(self, record) -> None: ...
```

**Usage Example**:

```python
import logging
from sentence_transformers import LoggingHandler

# Set up logging with tqdm-compatible handler
logger = logging.getLogger("sentence_transformers")
logger.setLevel(logging.INFO)
logger.addHandler(LoggingHandler())

# Now logging output won't interfere with progress bars
logger.info("Training started")
```

## Batch Sampling (Modern Training)

### DefaultBatchSampler

Default batch sampler used in the SentenceTransformer library, equivalent to PyTorch's BatchSampler with epoch support.

```python { .api }
class DefaultBatchSampler(BatchSampler):
    def __init__(
        self,
        sampler,
        batch_size: int,
        drop_last: bool = False
    ) -> None: ...

    def set_epoch(self, epoch: int) -> None: ...
```

### MultiDatasetDefaultBatchSampler

Batch sampler for training on multiple datasets simultaneously with balanced sampling.

```python { .api }
class MultiDatasetDefaultBatchSampler(BatchSampler):
    def __init__(
        self,
        samplers,
        batch_sizes: list[int],
        drop_last: bool = False
    ) -> None: ...

    def set_epoch(self, epoch: int) -> None: ...
```
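One common reason for a batch sampler to expose `set_epoch` is to reshuffle the data differently in every epoch while keeping each epoch reproducible. The standard-library sketch below illustrates that pattern with a hypothetical `EpochShuffledBatches` class; the `seed + epoch` seeding scheme is an assumption chosen for illustration, not a description of how the samplers above are implemented.

```python
import random
from typing import Iterator

class EpochShuffledBatches:
    """Hypothetical stand-in: yields index batches, reshuffled deterministically per epoch."""

    def __init__(self, num_samples: int, batch_size: int,
                 drop_last: bool = False, seed: int = 0) -> None:
        self.num_samples = num_samples
        self.batch_size = batch_size
        self.drop_last = drop_last
        self.seed = seed
        self.epoch = 0

    def set_epoch(self, epoch: int) -> None:
        # Stored here, used when iteration starts
        self.epoch = epoch

    def __iter__(self) -> Iterator[list[int]]:
        indices = list(range(self.num_samples))
        # A fresh generator seeded from (seed + epoch) makes each epoch reproducible
        random.Random(self.seed + self.epoch).shuffle(indices)
        for start in range(0, self.num_samples, self.batch_size):
            batch = indices[start:start + self.batch_size]
            if len(batch) < self.batch_size and self.drop_last:
                break  # Skip the incomplete final batch
            yield batch

batches = EpochShuffledBatches(num_samples=10, batch_size=4, drop_last=True)

batches.set_epoch(0)
epoch0 = list(batches)

batches.set_epoch(1)
epoch1 = list(batches)

print(epoch0)
print(epoch1)
```

With `drop_last=True`, the two leftover samples are skipped, so each epoch yields two full batches of four indices.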

## Legacy Components (Deprecated)

These components are included for backwards compatibility but are deprecated in favor of the modern training framework.

### Legacy Dataset Classes

```python { .api }
class SentencesDataset:
    """Deprecated: Use SentenceTransformerTrainer instead"""
    def __init__(self, examples: list, model) -> None: ...

class ParallelSentencesDataset:
    """Deprecated: Use SentenceTransformerTrainer instead"""
    def __init__(self, student_model, teacher_model) -> None: ...
```

### Legacy Input Format

```python { .api }
class InputExample:
    """Deprecated: Use standard data formats instead"""
    def __init__(
        self,
        guid: str = "",
        texts: list[str] | None = None,
        label: int | float = 0
    ) -> None: ...
```

**Migration Note**: These legacy components exist for compatibility with the old `model.fit()` training approach. For new projects, use the modern `SentenceTransformerTrainer` class instead.

## Best Practices

1. **Quantization**: Use int8 for a balance between size and quality; reserve binary for cases where maximum compression matters most
2. **Export**: Export to ONNX for deployment and cross-platform compatibility
3. **Hard Negatives**: Use hard negative mining to improve contrastive learning
4. **Batch Processing**: Process in batches for memory efficiency
5. **Caching**: Cache embeddings for repeated use
6. **Monitoring**: Use LoggingHandler for training monitoring
7. **Profiling**: Profile inference speed and memory usage for optimization
8. **Testing**: Test that exported models match the original model's outputs
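
For the last point, one way to check an exported model against the original is to encode the same sentences with both and compare the resulting arrays. The sketch below uses a hypothetical helper, `compare_embeddings`, with synthetic arrays standing in for the two sets of embeddings; the tolerance of `1e-4` is an arbitrary assumption and will usually need to be looser for quantized exports.

```python
import numpy as np

def compare_embeddings(reference, candidate, atol: float = 1e-4) -> dict:
    """Report how closely two embedding matrices of identical shape agree."""
    reference = np.asarray(reference, dtype=np.float64)
    candidate = np.asarray(candidate, dtype=np.float64)
    if reference.shape != candidate.shape:
        raise ValueError(f"Shape mismatch: {reference.shape} vs {candidate.shape}")

    # Largest element-wise deviation
    max_abs_diff = float(np.max(np.abs(reference - candidate)))

    # Lowest row-wise cosine similarity between matching embeddings
    ref_n = reference / np.linalg.norm(reference, axis=1, keepdims=True)
    cand_n = candidate / np.linalg.norm(candidate, axis=1, keepdims=True)
    min_cosine = float(np.min(np.sum(ref_n * cand_n, axis=1)))

    return {
        "max_abs_diff": max_abs_diff,
        "min_cosine": min_cosine,
        "within_tolerance": max_abs_diff <= atol,
    }

# Synthetic stand-ins for original and exported model outputs
reference = np.array([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]])
candidate = reference + 1e-5  # small numerical drift

report = compare_embeddings(reference, candidate)
print(report)
```

Reporting the minimum cosine similarity alongside the absolute difference helps when an export changes the scale of the embeddings slightly but preserves their direction.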