# sentence-transformers

The sentence-transformers package provides state-of-the-art sentence, text and image embeddings using transformer models. It supports training and fine-tuning of custom embedding models, offering a comprehensive toolkit for semantic search, clustering, and similarity tasks.

## Package Information

- **Version**: 5.1.0
- **Organization**: sentence-transformers
- **License**: Apache 2.0
- **Homepage**: https://www.sbert.net/
- **Repository**: https://github.com/UKPLab/sentence-transformers

## Core Imports

```python
# Main transformer classes
from sentence_transformers import SentenceTransformer, CrossEncoder, SparseEncoder

# Training components (top-level imports)
from sentence_transformers import (
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
    CrossEncoderTrainer,
    CrossEncoderTrainingArguments,
    SparseEncoderTrainer,
    SparseEncoderTrainingArguments,
)

# Additional components (top-level imports from __all__)
from sentence_transformers import (
    LoggingHandler,
    SentencesDataset,
    ParallelSentencesDataset,
    InputExample,
    DefaultBatchSampler,
    MultiDatasetDefaultBatchSampler,
)

# Utility functions (top-level imports)
from sentence_transformers import (
    SimilarityFunction,
    quantize_embeddings,
    export_optimized_onnx_model,
    export_dynamic_quantized_onnx_model,
    export_static_quantized_openvino_model,
)
from sentence_transformers.util import mine_hard_negatives

# Loss functions (module-level imports)
from sentence_transformers.losses import (
    CosineSimilarityLoss,
    MultipleNegativesRankingLoss,
    TripletLoss,
    MatryoshkaLoss,
)

# Model components (module-level imports)
from sentence_transformers.models import Transformer, Pooling, Dense, Normalize

# Evaluation (module-level imports)
from sentence_transformers.evaluation import (
    EmbeddingSimilarityEvaluator,
    InformationRetrievalEvaluator,
    BinaryClassificationEvaluator,
)
```

## Basic Usage

### Encoding Sentences

```python
from sentence_transformers import SentenceTransformer

# Load a pre-trained model
model = SentenceTransformer('all-MiniLM-L6-v2')

# Encode sentences
sentences = ['This is an example sentence', 'Each sentence is converted']
embeddings = model.encode(sentences)

# Calculate similarity
similarity = model.similarity(embeddings[0], embeddings[1])
print(f"Similarity: {similarity}")
```
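By default, `model.similarity()` computes cosine similarity between embeddings. The computation it performs can be sketched in plain Python (a conceptual illustration, not the library's vectorized implementation):

```python
import math

def cosine_similarity(a, b):
    # Dot product of the two vectors
    dot = sum(x * y for x, y in zip(a, b))
    # L2 norms of each vector
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Identical directions score 1.0; orthogonal vectors score 0.0
print(cosine_similarity([1.0, 0.0], [1.0, 1.0]))  # ~0.707
```

Because cosine similarity is normalized, it depends only on the angle between embeddings, not their magnitudes.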

### Cross-Encoder for Reranking

```python
from sentence_transformers import CrossEncoder

# Load cross-encoder model
cross_encoder = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')

# Score sentence pairs
pairs = [('How many people live in Berlin?', 'Berlin has a population of 3,520,031')]
scores = cross_encoder.predict(pairs)
print(f"Relevance score: {scores[0]}")
```

## Architecture

The sentence-transformers package is built around several core concepts:

### Transformer Types

- **Bi-Encoder (SentenceTransformer)**: Encodes sentences independently into dense vectors
- **Cross-Encoder**: Jointly processes sentence pairs for classification/ranking tasks
- **Sparse Encoder**: Produces sparse embeddings for efficient retrieval

### Model Components

Models are composed of modular components that can be stacked:

- **Transformer**: Core language model (BERT, RoBERTa, etc.)
- **Pooling**: Strategy for converting token embeddings to sentence embeddings
- **Dense**: Linear transformation layers
- **Normalize**: L2 normalization of embeddings
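To make the pipeline concrete, here is what mean pooling followed by normalization does, sketched in plain Python (conceptual only; the real `Pooling` and `Normalize` modules operate on batched tensors):

```python
import math

def mean_pooling(token_embeddings):
    # Average the per-token vectors into one fixed-size sentence vector,
    # which is what a mean-mode Pooling module does
    dim = len(token_embeddings[0])
    n = len(token_embeddings)
    return [sum(tok[i] for tok in token_embeddings) / n for i in range(dim)]

def l2_normalize(vec):
    # Scale the vector to unit length, which is what Normalize does
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

tokens = [[1.0, 2.0], [3.0, 4.0]]          # toy per-token embeddings
sentence_vec = l2_normalize(mean_pooling(tokens))
```

After normalization, the dot product between two sentence vectors equals their cosine similarity, which is why `Normalize` is often the final module.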

### Training Framework

Modern training uses the `SentenceTransformerTrainer` class with:

- Multiple loss functions for different tasks
- Multi-dataset training support
- Integration with HuggingFace Trainer
- Flexible batch sampling strategies

## Capabilities

### Core Transformers

Encode text, documents, and queries into dense vector representations using pre-trained or custom models. Supports batch processing, multi-GPU inference, and various similarity functions.

**Key APIs**: `SentenceTransformer.encode()` `{ .api }`

**[Learn more about Core Transformers →](./core-transformers.md)**

### Cross-Encoders

Joint encoding of sentence pairs for tasks requiring direct comparison, such as reranking, textual entailment, and semantic textual similarity. Typically more accurate than bi-encoders for pairwise tasks.

**Key APIs**: `CrossEncoder.predict()` `{ .api }`

**[Learn more about Cross-Encoders →](./cross-encoder.md)**

### Sparse Encoders

Generate sparse embeddings that combine the efficiency of traditional sparse retrieval with neural approaches. Ideal for large-scale retrieval systems where storage and computation efficiency are critical.

**Key APIs**: `SparseEncoder.encode()` `{ .api }`

**[Learn more about Sparse Encoders →](./sparse-encoder.md)**
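Conceptually, a sparse embedding maps a handful of vocabulary indices to weights, and scoring only touches dimensions that both vectors share — which is what makes inverted-index retrieval cheap. A plain-Python sketch of the idea (not the `SparseEncoder` API):

```python
def sparse_dot(query, doc):
    # Only overlapping non-zero dimensions contribute to the score,
    # so documents sharing no terms with the query cost nothing
    return sum(w * doc[i] for i, w in query.items() if i in doc)

query = {3: 0.8, 17: 0.5}   # vocabulary index -> weight
doc = {3: 0.6, 42: 0.9}
score = sparse_dot(query, doc)   # only index 3 overlaps: 0.8 * 0.6
```

Dense embeddings, by contrast, require a full dot product over every dimension for every candidate.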

### Training Framework

Comprehensive training system supporting supervised fine-tuning, contrastive learning, and multi-task training. Built on HuggingFace Trainer with specialized components for embedding models.

**Key APIs**: `SentenceTransformerTrainer.train()` `{ .api }`

**[Learn more about Training →](./training.md)**

### Loss Functions

Extensive collection of loss functions for different learning objectives including contrastive learning, triplet loss, multiple negatives ranking, and specialized losses for efficient training.

**Key APIs**: `MultipleNegativesRankingLoss()` `{ .api }`

**[Learn more about Loss Functions →](./loss-functions.md)**
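For intuition, multiple negatives ranking treats each (anchor, positive) pair in a batch as the true match and every other positive in the batch as a free negative, then applies cross-entropy over the similarity scores. A plain-Python sketch of that objective (conceptual, not the library implementation):

```python
import math

def mnr_loss(sim_matrix):
    # sim_matrix[i][j] is the similarity between anchor i and positive j.
    # The diagonal holds the true pairs; every off-diagonal column acts
    # as an in-batch negative. Loss is mean cross-entropy over rows.
    losses = []
    for i, row in enumerate(sim_matrix):
        log_denom = math.log(sum(math.exp(s) for s in row))
        losses.append(log_denom - row[i])
    return sum(losses) / len(losses)

# When each anchor is most similar to its own positive, the loss is small
print(mnr_loss([[0.9, 0.1], [0.2, 0.8]]))
```

This is why the loss works with simple (anchor, positive) pairs: larger batches automatically supply more negatives.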

### Evaluation Suite

Comprehensive evaluation framework for measuring model performance across various tasks including semantic similarity, information retrieval, classification, and clustering.

**Key APIs**: `EmbeddingSimilarityEvaluator()` `{ .api }`

**[Learn more about Evaluation →](./evaluation.md)**

### Utilities & Export

Tools for model optimization, quantization, export to different formats (ONNX, OpenVINO), similarity computation, and hard negative mining for improved training.

**Key APIs**: `quantize_embeddings()` `{ .api }`

**[Learn more about Utilities →](./utilities.md)**
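To illustrate the idea behind embedding quantization, here is a conceptual scalar int8 sketch (not the actual `quantize_embeddings()` implementation, which supports several precisions and calibration strategies):

```python
def quantize_int8(embedding):
    # Map floats in [lo, hi] onto 256 integer buckets (-128..127),
    # trading a little precision for 4x less storage than float32
    lo, hi = min(embedding), max(embedding)
    scale = (hi - lo) / 255 or 1.0
    return [round((x - lo) / scale) - 128 for x in embedding], lo, scale

def dequantize_int8(quantized, lo, scale):
    # Recover an approximation of the original floats
    return [(q + 128) * scale + lo for q in quantized]

q, lo, scale = quantize_int8([0.12, -0.5, 0.33, 0.0])
approx = dequantize_int8(q, lo, scale)   # close to the original values
```

In practice, quantized embeddings are used directly for approximate similarity search, with the precision loss usually costing little retrieval quality.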

## Model Hub Integration

The package integrates seamlessly with the HuggingFace Model Hub, allowing you to:

- Load thousands of pre-trained models
- Save and share custom models
- Generate model cards automatically
- Use version control and collaboration features

## Performance Considerations

- Use batch processing with the `batch_size` parameter for efficiency
- Set `show_progress_bar=False` for production use
- Consider model quantization for deployment
- Use multi-process encoding for large datasets
- Choose appropriate pooling strategies based on your task
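Batch processing amounts to encoding inputs in fixed-size chunks, which is what `encode()` does internally when given `batch_size`. The chunking idea in plain Python:

```python
def batches(items, batch_size):
    # Yield fixed-size chunks; processing chunk-by-chunk bounds peak
    # memory while still amortizing per-call overhead across inputs
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

sentences = [f"sentence {i}" for i in range(10)]
chunks = list(batches(sentences, batch_size=4))   # chunk sizes: 4, 4, 2
```

Choosing `batch_size` is a memory/throughput trade-off: larger batches keep the GPU busier but risk out-of-memory errors on long inputs.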