
# Utilities

The sentence-transformers package provides utility functions for model optimization, quantization, export to different formats, similarity computation, and training enhancements.

## Model Quantization

### quantize_embeddings

```python
def quantize_embeddings(
    embeddings: Tensor | np.ndarray,
    precision: Literal["float32", "int8", "uint8", "binary", "ubinary"],
    ranges: np.ndarray | None = None,
    calibration_embeddings: np.ndarray | None = None
) -> np.ndarray
```
`{ .api }`

Quantize embeddings to a lower precision to reduce memory usage and speed up similarity search.

**Parameters**:
- `embeddings`: Unquantized (e.g. float) embeddings to quantize to the given precision
- `precision`: The precision to convert to ("float32", "int8", "uint8", "binary", "ubinary")
- `ranges`: Ranges for quantization of embeddings. Used for int8 quantization, where the ranges are the minimum and maximum values for each dimension, given as a 2D array with shape (2, embedding_dim)
- `calibration_embeddings`: Embeddings used for calibration during quantization. Used for int8 quantization to compute the ranges when `ranges` is not provided

**Returns**: Quantized embeddings with the specified precision

**Usage Examples**:

```python
import numpy as np
from sentence_transformers import quantize_embeddings, SentenceTransformer

# Generate sample embeddings
model = SentenceTransformer('all-MiniLM-L6-v2')
sentences = ["Hello world", "How are you?", "Machine learning is great"]
embeddings = model.encode(sentences)

# Float32 quantization (no change, returns same embeddings)
quantized_embs = quantize_embeddings(embeddings, precision="float32")
print(f"Original size: {embeddings.nbytes} bytes")
print(f"Quantized size: {quantized_embs.nbytes} bytes")

# Int8 quantization with calibration
calibration_data = model.encode(["Sample sentence " + str(i) for i in range(100)])
quantized_int8 = quantize_embeddings(
    embeddings,
    precision="int8",
    calibration_embeddings=calibration_data
)
print(f"Int8 size: {quantized_int8.nbytes} bytes")

# Binary quantization (extreme compression: 8 dimensions are packed into each byte)
binary_embs = quantize_embeddings(embeddings, precision="binary")
print(f"Binary size: {binary_embs.nbytes} bytes")
```
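
To make the precisions more concrete, here is a minimal pure-Python sketch of the arithmetic behind scalar (`int8`) and unsigned binary quantization. It assumes each dimension's `[min, max]` range is split into 255 steps and that binary quantization thresholds at zero before packing bits. The helpers `scalar_quantize_int8` and `pack_ubinary` are hypothetical names for illustration, not part of sentence-transformers, and the library's own implementation may differ in detail.

```python
def scalar_quantize_int8(vector, mins, maxs):
    """Map each float to an int8 bucket using per-dimension [min, max] ranges."""
    quantized = []
    for value, lo, hi in zip(vector, mins, maxs):
        step = (hi - lo) / 255 or 1.0  # fall back to 1.0 for constant dimensions
        bucket = int((value - lo) / step) - 128
        quantized.append(max(-128, min(127, bucket)))  # clamp out-of-range values
    return quantized


def pack_ubinary(vector):
    """Threshold at zero and pack each group of 8 dimensions into one byte."""
    packed = []
    for start in range(0, len(vector), 8):
        group = vector[start:start + 8]
        byte = 0
        for value in group:
            byte = (byte << 1) | (1 if value > 0 else 0)
        byte <<= 8 - len(group)  # left-align a final partial group
        packed.append(byte)
    return packed


# Toy 8-dimensional embedding with an assumed range of [-1, 1] per dimension
embedding = [0.5, -0.25, 0.0, 1.0, -1.0, 0.75, -0.5, 0.1]
mins = [-1.0] * 8
maxs = [1.0] * 8

int8_values = scalar_quantize_int8(embedding, mins, maxs)
ubinary_values = pack_ubinary(embedding)
print(int8_values)
print(ubinary_values)
```

The sketch shows why the savings differ so much: scalar quantization keeps one byte per dimension, while binary quantization keeps one bit per dimension.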

## Model Export

### export_optimized_onnx_model

```python
def export_optimized_onnx_model(
    model: SentenceTransformer,
    optimization_config: OptimizationConfig | Literal["O1", "O2", "O3", "O4"],
    model_name_or_path: str,
    push_to_hub: bool = False,
    create_pr: bool = False,
    file_suffix: str | None = None
) -> None
```
`{ .api }`

Optimize a SentenceTransformer model that uses the ONNX backend and save the optimized ONNX file for deployment.

**Parameters**:
- `model`: SentenceTransformer model to export; it must be loaded with `backend="onnx"`
- `optimization_config`: Optimization level ("O1", "O2", "O3", "O4") or an Optimum `OptimizationConfig`
- `model_name_or_path`: Local path or Hugging Face Hub repository name where the optimized model is saved
- `push_to_hub`: Whether to push the exported model to the Hugging Face Hub
- `create_pr`: Whether to open a pull request when pushing to the Hub
- `file_suffix`: Suffix for the exported file name; derived from the optimization level when omitted

### export_dynamic_quantized_onnx_model

```python
def export_dynamic_quantized_onnx_model(
    model: SentenceTransformer,
    quantization_config: QuantizationConfig | Literal["arm64", "avx2", "avx512", "avx512_vnni"],
    model_name_or_path: str,
    push_to_hub: bool = False,
    create_pr: bool = False,
    file_suffix: str | None = None
) -> None
```
`{ .api }`

Export model to dynamically quantized (int8) ONNX format.

**Parameters**:
- `model`: SentenceTransformer model to export; it must be loaded with `backend="onnx"`
- `quantization_config`: Target instruction set ("arm64", "avx2", "avx512", "avx512_vnni") or an Optimum `QuantizationConfig`
- `model_name_or_path`: Local path or Hugging Face Hub repository name where the quantized model is saved
- `push_to_hub`, `create_pr`, `file_suffix`: Same meaning as in `export_optimized_onnx_model`

### export_static_quantized_openvino_model

```python
def export_static_quantized_openvino_model(
    model: SentenceTransformer,
    quantization_config: OVQuantizationConfig | dict | None,
    model_name_or_path: str,
    dataset_name: str | None = None,
    dataset_config_name: str | None = None,
    dataset_split: str | None = None,
    column_name: str | None = None,
    push_to_hub: bool = False,
    create_pr: bool = False,
    file_suffix: str = "qint8_quantized"
) -> None
```
`{ .api }`

Export model to statically quantized OpenVINO format for Intel hardware optimization.

**Parameters**:
- `model`: SentenceTransformer model to export; it must be loaded with `backend="openvino"`
- `quantization_config`: OpenVINO quantization configuration, or `None` for the default
- `model_name_or_path`: Local path or Hugging Face Hub repository name where the quantized model is saved
- `dataset_name`, `dataset_config_name`, `dataset_split`, `column_name`: Hugging Face dataset (and text column) used for static quantization calibration
- `push_to_hub`, `create_pr`, `file_suffix`: Same meaning as in `export_optimized_onnx_model`

**Usage Examples**:

```python
import numpy as np
import onnxruntime as ort
from sentence_transformers import SentenceTransformer
from sentence_transformers.backend import (
    export_optimized_onnx_model,
    export_dynamic_quantized_onnx_model,
    export_static_quantized_openvino_model
)

# Load the model with the ONNX backend (requires the optional ONNX dependencies)
onnx_model = SentenceTransformer('all-MiniLM-L6-v2', backend="onnx")

# Export to optimized ONNX
export_optimized_onnx_model(
    model=onnx_model,
    optimization_config="O2",
    model_name_or_path="./exported_model"
)

# Export to quantized ONNX for even faster CPU inference
export_dynamic_quantized_onnx_model(
    model=onnx_model,
    quantization_config="avx512_vnni",
    model_name_or_path="./exported_model"
)

# Export to OpenVINO for Intel hardware, calibrating on a Hugging Face dataset
openvino_model = SentenceTransformer('all-MiniLM-L6-v2', backend="openvino")
export_static_quantized_openvino_model(
    model=openvino_model,
    quantization_config=None,
    model_name_or_path="./exported_model",
    dataset_name="glue",
    dataset_config_name="sst2",
    dataset_split="train",
    column_name="sentence"
)

# Use the exported ONNX model with ONNX Runtime
# (the optimized file is written to an onnx/ subfolder of the target path)
ort_session = ort.InferenceSession("./exported_model/onnx/model_O2.onnx")

# Tokenize input
inputs = onnx_model.tokenizer("Hello world", return_tensors="np", padding=True, truncation=True)

# Run inference, feeding only the inputs the exported graph declares
# (some architectures also expect token_type_ids)
ort_inputs = {
    node.name: inputs[node.name].astype(np.int64)
    for node in ort_session.get_inputs()
}
onnx_outputs = ort_session.run(None, ort_inputs)

print(f"ONNX output shape: {onnx_outputs[0].shape}")
```
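
Transformer ONNX exports commonly return token-level hidden states rather than one vector per sentence, so a pooling step is still needed to obtain sentence embeddings when calling ONNX Runtime directly. The sketch below shows attention-mask-aware mean pooling in plain Python. It assumes the first session output has shape `(batch, tokens, dim)` and that the model uses mean pooling; `mean_pool` is a hypothetical helper, not a library function.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average token vectors per sentence, ignoring padded positions (mask == 0)."""
    pooled = []
    for tokens, mask in zip(token_embeddings, attention_mask):
        totals = [0.0] * len(tokens[0])
        count = 0
        for vector, keep in zip(tokens, mask):
            if keep:
                count += 1
                for i, value in enumerate(vector):
                    totals[i] += value
        count = max(count, 1)  # guard against an all-padding row
        pooled.append([total / count for total in totals])
    return pooled


# Two sentences, three token positions each, 2-dimensional toy embeddings
token_embeddings = [
    [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]],  # last position is padding
    [[2.0, 0.0], [4.0, 2.0], [6.0, 4.0]],
]
attention_mask = [[1, 1, 0], [1, 1, 1]]

sentence_embeddings = mean_pool(token_embeddings, attention_mask)
print(sentence_embeddings)
```

With NumPy arrays the same computation is usually written as a masked sum divided by the mask sum; the loop form is used here only to keep the steps explicit.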

## Training Utilities

### mine_hard_negatives

```python
def mine_hard_negatives(
    dataset: Dataset,
    model: SentenceTransformer,
    anchor_column_name: str | None = None,
    positive_column_name: str | None = None,
    corpus: list[str] | None = None,
    cross_encoder: CrossEncoder | None = None,
    range_min: int = 0,
    range_max: int | None = None,
    max_score: float | None = None,
    min_score: float | None = None,
    absolute_margin: float | None = None,
    relative_margin: float | None = None,
    num_negatives: int = 3,
    sampling_strategy: Literal["random", "top"] = "top",
    batch_size: int = 32,
    use_faiss: bool = False,
    verbose: bool = True,
    # further options (prompts, output format, multi-process encoding) omitted
) -> Dataset
```
`{ .api }`

Mine hard negative examples for improved contrastive training.

**Parameters**:
- `dataset`: Dataset of (anchor, positive) pairs to mine negatives for
- `model`: SentenceTransformer model used to embed anchors and candidate passages
- `anchor_column_name`, `positive_column_name`: Column names to use; by default the first two columns
- `corpus`: Optional extra candidate passages; by default candidates are drawn from the positives
- `cross_encoder`: Optional CrossEncoder used to rescore candidates
- `range_min`, `range_max`: Rank window of the most similar candidates to consider
- `max_score`, `min_score`: Similarity bounds a candidate must satisfy
- `absolute_margin`, `relative_margin`: Required gap between the positive's score and a negative's score
- `num_negatives`: Number of hard negatives to return per pair
- `sampling_strategy`: Take the "top" candidates or sample them at "random"
- `batch_size`: Batch size for encoding

**Returns**: Dataset with anchor, positive, and hard negative columns

**Usage Examples**:

```python
from datasets import Dataset
from sentence_transformers import SentenceTransformer, mine_hard_negatives
from sentence_transformers.losses import TripletLoss

model = SentenceTransformer('all-MiniLM-L6-v2')

# Prepare (anchor, positive) pairs; real datasets are typically much larger
pairs = Dataset.from_list([
    {"anchor": "What is Python?", "positive": "Python is a programming language"},
    {"anchor": "What is Java used for?", "positive": "Java is used for software development"},
    {"anchor": "How does machine learning work?", "positive": "Machine learning uses algorithms"},
    {"anchor": "What is deep learning?", "positive": "Deep learning is a subset of ML"},
    {"anchor": "What are cars?", "positive": "Cars are vehicles"},
    {"anchor": "What are trucks?", "positive": "Trucks are large vehicles"},
])

# Mine hard negatives: similar passages that are not the anchor's positive
hard_negatives = mine_hard_negatives(
    dataset=pairs,
    model=model,
    num_negatives=2,
    range_max=4,  # keep the candidate window smaller than this tiny corpus
    sampling_strategy="top"
)

print("Hard negative examples:")
for example in hard_negatives.select(range(min(3, len(hard_negatives)))):  # Show first 3
    print(f"Anchor: {example['anchor']}")
    print(f"Positive: {example['positive']}")
    print(f"Hard Negative: {example['negative']}")
    print()

# Use hard negatives in training: the result is already a Dataset
# with anchor, positive, and negative columns
train_dataset = hard_negatives
triplet_loss = TripletLoss(model)

# Train with hard negatives (improves model performance)
```
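
The selection logic behind hard negative mining can be sketched without the library: score every candidate against the anchor, discard candidates that score too close to the positive (they are likely false negatives), and keep the highest-scoring remainder. This is an illustrative approximation that assumes an absolute margin below the positive's score and uses toy 2-D vectors; `select_hard_negatives` is a hypothetical helper and not the library's implementation.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_hard_negatives(anchor, positive, candidates, margin=0.1, top_k=2):
    """Return the names of the top_k candidates scoring below positive_score - margin."""
    positive_score = cosine(anchor, positive)
    scored = [(cosine(anchor, vector), name) for name, vector in candidates.items()]
    eligible = [(score, name) for score, name in scored if score < positive_score - margin]
    eligible.sort(reverse=True)  # hardest (most similar) negatives first
    return [name for _, name in eligible[:top_k]]


anchor = [1.0, 0.0]
positive = [0.9, 0.1]
candidates = {
    "near_duplicate": [0.95, 0.05],   # too close to the positive, filtered out
    "related": [0.7, 0.7],
    "loosely_related": [0.3, 0.9],
    "unrelated": [-1.0, 0.0],
}

hard_negatives = select_hard_negatives(anchor, positive, candidates)
print(hard_negatives)
```

The margin is the main design lever: a larger margin removes more potential false negatives at the cost of easier, less informative negatives.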

## Similarity Functions

The `SimilarityFunction` enum provides standardized similarity computation methods:

```python
from sentence_transformers import SimilarityFunction

class SimilarityFunction(Enum):
    COSINE = "cosine"
    DOT_PRODUCT = "dot"
    DOT = "dot"  # Alias for DOT_PRODUCT
    EUCLIDEAN = "euclidean"
    MANHATTAN = "manhattan"
```
`{ .api }`

**Usage Examples**:

```python
import torch
import torch.nn.functional as F
from sentence_transformers import SentenceTransformer, SimilarityFunction

# Use with SentenceTransformer
model = SentenceTransformer('all-MiniLM-L6-v2', similarity_fn_name=SimilarityFunction.COSINE)

# Manual similarity computation
def compute_similarity(embeddings1, embeddings2, similarity_fn):
    """Compute row-wise similarity between two equally shaped sets of embeddings."""
    if similarity_fn == SimilarityFunction.COSINE:
        return F.cosine_similarity(embeddings1, embeddings2, dim=-1)
    elif similarity_fn == SimilarityFunction.DOT_PRODUCT:
        return torch.sum(embeddings1 * embeddings2, dim=-1)
    elif similarity_fn == SimilarityFunction.EUCLIDEAN:
        # Negative distance, so that larger values mean "more similar"
        return -torch.norm(embeddings1 - embeddings2, p=2, dim=-1)
    elif similarity_fn == SimilarityFunction.MANHATTAN:
        return -torch.norm(embeddings1 - embeddings2, p=1, dim=-1)
    raise ValueError(f"Unsupported similarity function: {similarity_fn}")

# Example usage
emb1 = model.encode(["First sentence"])
emb2 = model.encode(["Second sentence"])

# Iterating an Enum yields each unique member once, so the DOT alias
# does not appear separately from DOT_PRODUCT
for sim_fn in SimilarityFunction:
    sim_score = compute_similarity(
        torch.tensor(emb1),
        torch.tensor(emb2),
        sim_fn
    )
    print(f"{sim_fn.value}: {sim_score.item():.4f}")
```

## Batch Samplers

### DefaultBatchSampler

```python
class DefaultBatchSampler:
    def __init__(
        self,
        dataset: Dataset,
        batch_size: int,
        drop_last: bool = False,
        generator: torch.Generator | None = None
    ) -> None: ...
```
`{ .api }`

Standard batch sampler for single dataset training.

### MultiDatasetDefaultBatchSampler

```python
class MultiDatasetDefaultBatchSampler:
    def __init__(
        self,
        datasets: dict[str, Dataset],
        batch_sizes: dict[str, int] | int,
        sampling_strategy: str = "proportional",
        generator: torch.Generator | None = None
    ) -> None: ...
```
`{ .api }`

Batch sampler for multi-dataset training with different sampling strategies.

**Parameters**:
- `datasets`: Dictionary of dataset names to Dataset objects
- `batch_sizes`: Batch size per dataset or single batch size
- `sampling_strategy`: "proportional" or "round_robin"
- `generator`: Random generator for reproducibility

**Usage Examples**:

```python
from sentence_transformers import DefaultBatchSampler, MultiDatasetDefaultBatchSampler
from datasets import Dataset

# Single dataset sampler
dataset = Dataset.from_list([{"text": f"Example {i}"} for i in range(1000)])
sampler = DefaultBatchSampler(
    dataset=dataset,
    batch_size=32,
    drop_last=True
)

# Multi-dataset sampler
dataset1 = Dataset.from_list([{"text": f"Dataset1 {i}"} for i in range(500)])
dataset2 = Dataset.from_list([{"text": f"Dataset2 {i}"} for i in range(300)])

multi_sampler = MultiDatasetDefaultBatchSampler(
    datasets={"ds1": dataset1, "ds2": dataset2},
    batch_sizes={"ds1": 32, "ds2": 16},
    sampling_strategy="proportional"
)

# Use in training
from sentence_transformers import SentenceTransformerTrainer, SentenceTransformerTrainingArguments

args = SentenceTransformerTrainingArguments(output_dir="./output")

trainer = SentenceTransformerTrainer(
    model=model,  # a SentenceTransformer loaded as in the earlier examples
    args=args,
    train_dataset={"ds1": dataset1, "ds2": dataset2},
    # Sampler is automatically configured based on datasets
)
```
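
The difference between the two sampling strategies is easiest to see as an ordering of dataset names, one entry per batch. The sketch below is a plain-Python illustration under two assumptions: round-robin alternates between datasets and stops when the smallest one runs out of batches, and proportional visits each dataset once per batch it holds, in a seeded random order. The function names are hypothetical and this is not the library's implementation.

```python
import random


def round_robin_order(batch_counts):
    """Alternate between datasets; stop once the smallest dataset runs out of batches."""
    rounds = min(batch_counts.values())
    return [name for _ in range(rounds) for name in batch_counts]


def proportional_order(batch_counts, seed=0):
    """Visit each dataset once per batch it holds, in a seeded random order."""
    order = [name for name, count in batch_counts.items() for _ in range(count)]
    random.Random(seed).shuffle(order)
    return order


# Batches per dataset for 500 samples at batch size 32 and 300 samples at
# batch size 16, dropping incomplete batches
batch_counts = {"ds1": 500 // 32, "ds2": 300 // 16}

rr_order = round_robin_order(batch_counts)
prop_order = proportional_order(batch_counts)
print(len(rr_order), len(prop_order))
```

Round-robin gives every dataset equal weight but leaves batches of larger datasets unused, whereas proportional sampling uses every batch and therefore lets larger datasets dominate.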

## Model Components

The `sentence_transformers.models` module provides modular components for building custom architectures:

### Core Components

```python
from sentence_transformers.models import (
    Transformer,  # BERT, RoBERTa, etc.
    Pooling,      # Mean, max, CLS pooling
    Dense,        # Linear transformation
    Normalize     # L2 normalization
)
```

**Usage Examples**:

```python
from torch import nn
from sentence_transformers import SentenceTransformer
from sentence_transformers.models import Transformer, Pooling, Dense, Normalize

# Build custom model architecture
transformer = Transformer('distilbert-base-uncased', max_seq_length=256)
pooling = Pooling(
    word_embedding_dimension=transformer.get_word_embedding_dimension(),
    pooling_mode='mean'
)
dense = Dense(
    in_features=pooling.get_sentence_embedding_dimension(),
    out_features=256,
    activation_function=nn.Tanh()  # pass an activation module rather than a string
)
normalize = Normalize()

# Combine components
custom_model = SentenceTransformer(modules=[transformer, pooling, dense, normalize])

# Use custom model
embeddings = custom_model.encode(["Custom architecture example"])
print(f"Custom embedding shape: {embeddings.shape}")
```

### Additional Components

```python
from sentence_transformers.models import (
    CNN,                   # Convolutional layers
    LSTM,                  # LSTM layers
    BoW,                   # Bag of words
    WordEmbeddings,        # Word embeddings layer
    WordWeights,           # TF-IDF weighting
    StaticEmbedding,       # Static (non-contextual) token embeddings
    WeightedLayerPooling,  # Weighted pooling across layers
    CLIPModel,             # CLIP integration
    Router,                # Multi-encoder routing
    Dropout,               # Dropout layer
    LayerNorm              # Layer normalization
)
```

## Performance Optimization

### Memory-Efficient Training

```python
def create_memory_efficient_model(base_model_name, target_dim=256):
    """Create memory-efficient model with reduced dimensions."""
    from torch import nn
    from sentence_transformers import SentenceTransformer
    from sentence_transformers.models import Transformer, Pooling, Dense, Normalize

    transformer = Transformer(base_model_name, max_seq_length=256)
    pooling = Pooling(transformer.get_word_embedding_dimension(), pooling_mode='mean')

    # Add dimension reduction for memory efficiency
    dense = Dense(
        in_features=pooling.get_sentence_embedding_dimension(),
        out_features=target_dim,
        activation_function=nn.Tanh()  # pass an activation module rather than a string
    )
    normalize = Normalize()

    return SentenceTransformer(modules=[transformer, pooling, dense, normalize])

# Create efficient model
efficient_model = create_memory_efficient_model('bert-base-uncased', target_dim=128)
```

### Inference Optimization

```python
def optimize_for_inference(model, sentences, batch_size=64):
    """Optimized inference with batching and no gradients."""
    import torch

    model.eval()  # Set to evaluation mode
    embeddings = []

    with torch.no_grad():  # Disable gradient computation
        for i in range(0, len(sentences), batch_size):
            batch = sentences[i:i + batch_size]
            batch_embeddings = model.encode(
                batch,
                batch_size=len(batch),
                show_progress_bar=False,
                convert_to_tensor=False,
                normalize_embeddings=True  # For cosine similarity
            )
            embeddings.extend(batch_embeddings)

    return embeddings

# Optimized inference (`model` is a SentenceTransformer loaded as in the earlier examples)
sentences = [f"Sentence {i}" for i in range(1000)]
fast_embeddings = optimize_for_inference(model, sentences)
```

## Debugging and Logging

### LoggingHandler

```python
from sentence_transformers import LoggingHandler
import logging

class LoggingHandler(logging.Handler):
    def emit(self, record: logging.LogRecord) -> None:
        """Emit log record without interfering with tqdm progress bars."""
        pass
```
`{ .api }`

Custom logging handler that works seamlessly with tqdm progress bars.

**Usage Examples**:

```python
import logging
from sentence_transformers import LoggingHandler

# Set up logging
logging.basicConfig(
    format='%(asctime)s - %(message)s',
    datefmt='%Y-%m-%d %H:%M:%S',
    level=logging.INFO,
    handlers=[LoggingHandler()]
)

logger = logging.getLogger(__name__)

# Use with training
def train_with_logging(model, trainer):
    logger.info("Starting training...")

    trainer.train()

    logger.info("Training completed!")
    logger.info(f"Model saved to {trainer.args.output_dir}")
```

## Data Processing Utilities

### Legacy Dataset Classes (Deprecated)

```python
# Note: These are deprecated in favor of HuggingFace Datasets
from sentence_transformers.datasets import SentencesDataset, ParallelSentencesDataset
from sentence_transformers.readers import InputExample
```

### Modern Data Processing

```python
def create_training_dataset(examples, format_type="triplet"):
    """Create training dataset in various formats."""
    from datasets import Dataset

    if format_type == "triplet":
        # Format: anchor, positive, negative
        formatted_examples = [
            {
                "anchor": ex["anchor"],
                "positive": ex["positive"],
                "negative": ex["negative"]
            }
            for ex in examples
        ]
    elif format_type == "pairs":
        # Format: sentence1, sentence2, label
        formatted_examples = [
            {
                "sentence1": ex["sentence1"],
                "sentence2": ex["sentence2"],
                "label": ex["label"]
            }
            for ex in examples
        ]
    else:
        # Fail early instead of hitting an undefined variable below
        raise ValueError(f"Unknown format_type: {format_type!r}")

    return Dataset.from_list(formatted_examples)

# Example usage
examples = [
    {
        "anchor": "Python programming",
        "positive": "Coding in Python",
        "negative": "Java development"
    }
]

dataset = create_training_dataset(examples, format_type="triplet")
```

## Utility Functions for Analysis

```python
def analyze_model_performance(model, test_sentences):
    """Analyze model performance characteristics."""
    import time
    import numpy as np

    # Encoding speed test
    start_time = time.perf_counter()
    embeddings = model.encode(test_sentences, batch_size=32)
    encoding_time = time.perf_counter() - start_time

    # Embedding analysis
    embedding_dim = embeddings.shape[1]
    embedding_norms = np.linalg.norm(embeddings, axis=1)

    # Similarity analysis: normalize first so the dot product is cosine similarity
    normalized = embeddings / embedding_norms[:, np.newaxis]
    similarities = np.dot(normalized, normalized.T)
    pairwise = similarities[np.triu_indices_from(similarities, k=1)]  # unique pairs only

    results = {
        "encoding_speed": len(test_sentences) / encoding_time,  # sentences per second
        "embedding_dimension": embedding_dim,
        "avg_embedding_norm": np.mean(embedding_norms),
        "std_embedding_norm": np.std(embedding_norms),
        "avg_similarity": np.mean(pairwise),
        "similarity_std": np.std(pairwise)
    }

    return results

# Analyze model (`model` is a SentenceTransformer loaded as in the earlier examples)
test_texts = ["Sample sentence " + str(i) for i in range(100)]
performance = analyze_model_performance(model, test_texts)

for metric, value in performance.items():
    print(f"{metric}: {value:.4f}")
```

## Logging and Debugging

### LoggingHandler

Custom logging handler that integrates with tqdm progress bars for clean output during training and inference.

```python { .api }
class LoggingHandler(logging.Handler):
    def __init__(self, level=logging.NOTSET) -> None: ...
    def emit(self, record) -> None: ...
```

**Usage Example**:

```python
import logging
from sentence_transformers import LoggingHandler

# Set up logging with tqdm-compatible handler
logger = logging.getLogger("sentence_transformers")
logger.setLevel(logging.INFO)
logger.addHandler(LoggingHandler())

# Now logging output won't interfere with progress bars
logger.info("Training started")
```

## Batch Sampling (Modern Training)

### DefaultBatchSampler

Default batch sampler used in the SentenceTransformer library, equivalent to PyTorch's BatchSampler with epoch support.

```python { .api }
class DefaultBatchSampler(BatchSampler):
    def __init__(
        self,
        sampler,
        batch_size: int,
        drop_last: bool = False
    ) -> None: ...

    def set_epoch(self, epoch: int) -> None: ...
```
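
Epoch support usually means the shuffling seed is derived from the epoch number, so each epoch can use a different order that is still reproducible. The following is a minimal sketch of that idea, assuming the seed is `seed + epoch`; `EpochAwareBatchSampler` is a hypothetical class written for illustration and is not the library's implementation.

```python
import random


class EpochAwareBatchSampler:
    """Yield shuffled index batches whose order depends on the current epoch."""

    def __init__(self, num_samples, batch_size, drop_last=False, seed=0):
        self.num_samples = num_samples
        self.batch_size = batch_size
        self.drop_last = drop_last
        self.seed = seed
        self.epoch = 0

    def set_epoch(self, epoch):
        # Called once per epoch so the shuffle below changes between epochs
        self.epoch = epoch

    def __iter__(self):
        indices = list(range(self.num_samples))
        random.Random(self.seed + self.epoch).shuffle(indices)
        for start in range(0, len(indices), self.batch_size):
            batch = indices[start:start + self.batch_size]
            if len(batch) < self.batch_size and self.drop_last:
                break  # skip the final incomplete batch
            yield batch


sampler = EpochAwareBatchSampler(num_samples=10, batch_size=4, drop_last=True)

sampler.set_epoch(0)
epoch0_batches = list(sampler)
epoch0_repeat = list(sampler)  # same epoch, so the same order again

sampler.set_epoch(1)
epoch1_batches = list(sampler)
print(epoch0_batches, epoch1_batches)
```

Deriving the seed from the epoch keeps runs reproducible across restarts, because resuming at epoch `n` reproduces exactly the order that epoch would have had in an uninterrupted run.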

### MultiDatasetDefaultBatchSampler

Batch sampler for training on multiple datasets simultaneously with balanced sampling.

```python { .api }
class MultiDatasetDefaultBatchSampler(BatchSampler):
    def __init__(
        self,
        samplers,
        batch_sizes: list[int],
        drop_last: bool = False
    ) -> None: ...

    def set_epoch(self, epoch: int) -> None: ...
```

## Legacy Components (Deprecated)

These components are included for backwards compatibility but are deprecated in favor of the modern training framework.

### Legacy Dataset Classes

```python { .api }
class SentencesDataset:
    """Deprecated: Use SentenceTransformerTrainer instead"""
    def __init__(self, examples: list, model) -> None: ...

class ParallelSentencesDataset:
    """Deprecated: Use SentenceTransformerTrainer instead"""
    def __init__(self, student_model, teacher_model) -> None: ...
```

### Legacy Input Format

```python { .api }
class InputExample:
    """Deprecated: Use standard data formats instead"""
    def __init__(
        self,
        guid: str = "",
        texts: list[str] | None = None,
        label: int | float = 0
    ) -> None: ...
```

**Migration Note**: These legacy components exist for compatibility with the old `model.fit()` training approach. For new projects, use the modern `SentenceTransformerTrainer` class instead.

## Best Practices

1. **Quantization**: Choose the precision that fits the use case, for example `int8` to balance memory savings and quality, or `binary` for maximum compression
2. **Export**: Export to ONNX for deployment and cross-platform compatibility
3. **Hard Negatives**: Use hard negative mining to improve contrastive learning
4. **Batch Processing**: Process in batches for memory efficiency
5. **Caching**: Cache embeddings for repeated use
6. **Monitoring**: Use LoggingHandler for training monitoring
7. **Profiling**: Profile inference speed and memory usage for optimization
8. **Testing**: Test that exported models match the original model's outputs
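
As an illustration of the caching recommendation, the sketch below memoizes embeddings by a hash of the input text so that repeated sentences are encoded only once. `EmbeddingCache` and `fake_encode` are hypothetical stand-ins written for this example; in practice the encoder passed in would be something like `model.encode`, and the store could be a file or database rather than an in-memory dict.

```python
import hashlib


class EmbeddingCache:
    """Wrap an encode function and reuse results for texts seen before."""

    def __init__(self, encode_fn):
        self.encode_fn = encode_fn
        self._store = {}
        self.misses = 0  # number of texts that actually had to be encoded

    @staticmethod
    def _key(text):
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    def encode(self, texts):
        # Encode only unseen texts, de-duplicated while preserving order
        missing = [t for t in dict.fromkeys(texts) if self._key(t) not in self._store]
        if missing:
            self.misses += len(missing)
            for text, vector in zip(missing, self.encode_fn(missing)):
                self._store[self._key(text)] = vector
        return [self._store[self._key(t)] for t in texts]


def fake_encode(texts):
    """Toy encoder: [character count, space count] per text."""
    return [[float(len(t)), float(t.count(" "))] for t in texts]


cache = EmbeddingCache(fake_encode)
first = cache.encode(["hello world", "hi", "hello world"])
second = cache.encode(["hi", "new sentence"])
print(cache.misses)
```

Only three distinct texts are ever passed to the encoder here, even though five embeddings are returned across the two calls.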