# Core Transformers

The `SentenceTransformer` class is the main interface for loading, using, and customizing bi-encoder models that map sentences and text to dense vector embeddings.

## SentenceTransformer Class

### Constructor

```python
SentenceTransformer(
    model_name_or_path: str | None = None,
    modules: Iterable[nn.Module] | None = None,
    device: str | None = None,
    prompts: dict[str, str] | None = None,
    default_prompt_name: str | None = None,
    similarity_fn_name: str | SimilarityFunction | None = None,
    cache_folder: str | None = None,
    trust_remote_code: bool = False,
    revision: str | None = None,
    local_files_only: bool = False,
    token: bool | str | None = None,
    use_auth_token: bool | str | None = None,
    truncate_dim: int | None = None,
    model_kwargs: dict[str, Any] | None = None,
    tokenizer_kwargs: dict[str, Any] | None = None,
    config_kwargs: dict[str, Any] | None = None,
    model_card_data: SentenceTransformerModelCardData | None = None,
    backend: Literal["torch", "onnx", "openvino"] = "torch"
)
```
`{ .api }`

Initialize a SentenceTransformer model.

**Parameters**:
- `model_name_or_path`: Model identifier from HuggingFace Hub or local path
- `modules`: Iterable of PyTorch modules to create custom model architecture
- `device`: Device to run the model on ('cpu', 'cuda', 'mps', 'npu', etc.)
- `prompts`: Dictionary mapping prompt names to prompt text for different tasks
- `default_prompt_name`: Name of the prompt in `prompts` to use by default
- `similarity_fn_name`: Similarity function for embeddings comparison
- `cache_folder`: Custom cache directory for models
- `trust_remote_code`: Allow execution of custom code shipped with remote models
- `revision`: Specific model revision/branch to load
- `local_files_only`: Only use locally cached files
- `token`: HuggingFace authentication token
- `use_auth_token`: Deprecated argument, use `token` instead
- `truncate_dim`: Truncate embeddings to this dimension
- `model_kwargs`: Additional keyword arguments passed to the underlying model
- `tokenizer_kwargs`: Additional keyword arguments passed to the tokenizer
- `config_kwargs`: Additional keyword arguments passed to the model config
- `model_card_data`: Model card data object for generating model cards
- `backend`: Backend to use for inference ("torch", "onnx", "openvino")

### Core Encoding Methods

```python
def encode(
    sentences: str | list[str] | np.ndarray,
    prompt_name: str | None = None,
    prompt: str | None = None,
    batch_size: int = 32,
    show_progress_bar: bool | None = None,
    output_value: Literal["sentence_embedding", "token_embeddings"] | None = "sentence_embedding",
    precision: Literal["float32", "int8", "uint8", "binary", "ubinary"] = "float32",
    convert_to_numpy: bool = True,
    convert_to_tensor: bool = False,
    device: str | list[str | torch.device] | None = None,
    normalize_embeddings: bool = False,
    truncate_dim: int | None = None,
    pool: dict[Literal["input", "output", "processes"], Any] | None = None,
    chunk_size: int | None = None,
    **kwargs
) -> list[Tensor] | np.ndarray | Tensor | dict[str, Tensor] | list[dict[str, Tensor]]
```
`{ .api }`

Encode sentences into embeddings.

**Parameters**:
- `sentences`: Input text(s) to encode
- `prompt_name`: Name of the prompt to use for encoding, looked up in the model's `prompts` dictionary
- `prompt`: Prompt text to use for encoding; when set, `prompt_name` is ignored
- `batch_size`: Batch size for processing
- `show_progress_bar`: Display progress bar during encoding
- `output_value`: Type of embeddings to return ('sentence_embedding', 'token_embeddings', or None for all)
- `precision`: Precision to use for embeddings ("float32", "int8", "uint8", "binary", "ubinary")
- `convert_to_numpy`: Return numpy arrays instead of tensors
- `convert_to_tensor`: Return a single PyTorch tensor; overrides `convert_to_numpy`
- `device`: Device(s) for computation (single device or list for multi-process)
- `normalize_embeddings`: L2 normalize the embeddings
- `truncate_dim`: Dimension to truncate sentence embeddings to
- `pool`: Multi-process pool for encoding, as returned by `start_multi_process_pool`
- `chunk_size`: Size of chunks for multi-process encoding
- `**kwargs`: Additional keyword arguments

**Returns**: Embeddings as numpy arrays, tensors, or lists
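
To make the `truncate_dim` and `normalize_embeddings` options more concrete, here is a minimal conceptual sketch in plain Python. It is not the library's implementation: the helper names `truncate` and `l2_normalize` and the example vector are hypothetical, and the order shown (truncate first, then normalize) is an assumption.

```python
import math

def truncate(vec, dim):
    """Keep only the first `dim` components (all of them if dim is None)."""
    return list(vec) if dim is None else list(vec[:dim])

def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return list(vec) if norm == 0.0 else [x / norm for x in vec]

# Made-up 4-dimensional embedding, used only for illustration.
embedding = [3.0, 4.0, 12.0, 5.0]

shortened = truncate(embedding, 2)   # keeps the first two components
unit = l2_normalize(shortened)       # rescaled so its length is 1

print(shortened)
print(unit)
```

Truncation only makes sense for models trained so that the leading dimensions carry most of the information, so treat this as an illustration of the mechanics rather than a recommendation.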

```python
def encode_query(
    sentences: str | list[str] | np.ndarray,
    prompt_name: str | None = None,
    prompt: str | None = None,
    batch_size: int = 32,
    show_progress_bar: bool | None = None,
    output_value: Literal["sentence_embedding", "token_embeddings"] | None = "sentence_embedding",
    precision: Literal["float32", "int8", "uint8", "binary", "ubinary"] = "float32",
    convert_to_numpy: bool = True,
    convert_to_tensor: bool = False,
    device: str | list[str | torch.device] | None = None,
    normalize_embeddings: bool = False,
    truncate_dim: int | None = None,
    pool: dict[Literal["input", "output", "processes"], Any] | None = None,
    chunk_size: int | None = None,
    **kwargs
) -> list[Tensor] | np.ndarray | Tensor | dict[str, Tensor] | list[dict[str, Tensor]]
```
`{ .api }`

Encode queries for retrieval tasks, applying a query-specific prompt when one is configured.

```python
def encode_document(
    sentences: str | list[str] | np.ndarray,
    prompt_name: str | None = None,
    prompt: str | None = None,
    batch_size: int = 32,
    show_progress_bar: bool | None = None,
    output_value: Literal["sentence_embedding", "token_embeddings"] | None = "sentence_embedding",
    precision: Literal["float32", "int8", "uint8", "binary", "ubinary"] = "float32",
    convert_to_numpy: bool = True,
    convert_to_tensor: bool = False,
    device: str | list[str | torch.device] | None = None,
    normalize_embeddings: bool = False,
    truncate_dim: int | None = None,
    pool: dict[Literal["input", "output", "processes"], Any] | None = None,
    chunk_size: int | None = None,
    **kwargs
) -> list[Tensor] | np.ndarray | Tensor | dict[str, Tensor] | list[dict[str, Tensor]]
```
`{ .api }`

Encode documents for retrieval tasks, applying a document-specific prompt when one is configured.
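
The interplay of `prompts`, `default_prompt_name`, `prompt_name`, and `prompt` can be sketched in plain Python. This is a hedged illustration, not the library's code: `resolve_prompt` and `apply_prompt` are hypothetical helpers, the prompt strings are made up, and the precedence shown (explicit `prompt` text, then `prompt_name`, then the default name) is an assumption based on the parameter descriptions.

```python
def resolve_prompt(prompts, default_prompt_name=None, prompt_name=None, prompt=None):
    """Return the prompt text to prepend, or None when no prompt applies."""
    if prompt is not None:
        # Explicit prompt text wins over any named prompt.
        return prompt
    name = prompt_name if prompt_name is not None else default_prompt_name
    if name is None:
        return None
    if name not in prompts:
        raise ValueError(f"Unknown prompt name: {name!r}")
    return prompts[name]

def apply_prompt(texts, prompt):
    """Prepend the prompt to every text; leave texts untouched if prompt is None."""
    return list(texts) if prompt is None else [prompt + t for t in texts]

# Hypothetical prompt table, as might be passed via the `prompts` argument.
prompts = {"query": "query: ", "document": "passage: "}

as_query = apply_prompt(["what is ML?"], resolve_prompt(prompts, prompt_name="query"))
as_doc = apply_prompt(["ML is a field of AI."], resolve_prompt(prompts, prompt_name="document"))
print(as_query)
print(as_doc)
```

Under this reading, `encode_query` and `encode_document` behave like `encode` with a task-appropriate prompt name chosen for you.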

### Similarity Methods

```python
def similarity(
    embeddings1: Tensor | npt.NDArray[np.float32],
    embeddings2: Tensor | npt.NDArray[np.float32]
) -> Tensor
```
`{ .api }`

Compute similarity between two sets of embeddings using the model's similarity function. The result is a matrix with one score for every combination of an embedding from the first set and an embedding from the second.

```python
def similarity_pairwise(
    embeddings1: Tensor | npt.NDArray[np.float32],
    embeddings2: Tensor | npt.NDArray[np.float32]
) -> Tensor
```
`{ .api }`

Compute the similarity between corresponding pairs, `embeddings1[i]` and `embeddings2[i]`. The two sets must have the same length, and the result holds one score per pair.
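
The difference between an all-against-all similarity matrix and element-wise pair scores can be sketched without the library. The following plain-Python example uses cosine similarity and made-up 2-dimensional vectors; `similarity_matrix` and `similarity_pairs` are hypothetical helpers that mirror the shapes one would expect from `similarity` and `similarity_pairwise`, under the assumption that the latter scores `embeddings1[i]` against `embeddings2[i]`.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def similarity_matrix(A, B):
    """Score every vector in A against every vector in B: len(A) x len(B)."""
    return [[cosine(a, b) for b in B] for a in A]

def similarity_pairs(A, B):
    """Score A[i] against B[i] only: one value per pair."""
    if len(A) != len(B):
        raise ValueError("both sets must have the same length")
    return [cosine(a, b) for a, b in zip(A, B)]

A = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0, 0.0], [1.0, 1.0]]

matrix = similarity_matrix(A, B)   # 2 x 2 nested list
pairs = similarity_pairs(A, B)     # 2 values, equal to the matrix diagonal
print(matrix)
print(pairs)
```

When only matched pairs matter, the pairwise form avoids computing the off-diagonal entries of the matrix.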

### Model Inspection Methods

```python
def get_sentence_embedding_dimension() -> int | None
```
`{ .api }`

Get the dimension of sentence embeddings.

```python
def get_max_seq_length() -> int | None
```
`{ .api }`

Get the maximum sequence length the model can handle.

```python
def tokenize(
    texts: list[str] | list[dict] | list[tuple[str, str]],
    **kwargs
) -> dict[str, Tensor]
```
`{ .api }`

Tokenize input texts using the model's tokenizer.

### Model Persistence

```python
def save(
    path: str,
    model_name: str | None = None,
    create_model_card: bool = True,
    train_datasets: list[str] | None = None,
    safe_serialization: bool = True
) -> None
```
`{ .api }`

Save the model to a local directory.

```python
def save_pretrained(
    save_directory: str,
    **kwargs
) -> None
```
`{ .api }`

Save model using HuggingFace format.

```python
def save_to_hub(
    repo_id: str,
    organization: str | None = None,
    token: str | None = None,
    private: bool | None = None,
    safe_serialization: bool = True,
    commit_message: str = "Add new SentenceTransformer model.",
    local_model_path: str | None = None,
    exist_ok: bool = False,
    replace_model_card: bool = False,
    train_datasets: list[str] | None = None
) -> str
```
`{ .api }`

Save and push model to HuggingFace Hub. Recent releases deprecate this method in favor of `push_to_hub`.

```python
def push_to_hub(
    repo_id: str,
    token: str | None = None,
    private: bool | None = None,
    safe_serialization: bool = True,
    commit_message: str | None = None,
    local_model_path: str | None = None,
    exist_ok: bool = False,
    replace_model_card: bool = False,
    train_datasets: list[str] | None = None,
    revision: str | None = None,
    create_pr: bool = False
) -> str
```
`{ .api }`

Push existing model to HuggingFace Hub.

### Evaluation and Processing

```python
def evaluate(
    evaluator: SentenceEvaluator,
    output_path: str | None = None
) -> float | dict[str, float]
```
`{ .api }`

Evaluate the model using a provided evaluator.

```python
def forward(
    input: dict[str, torch.Tensor],
    **kwargs
) -> dict[str, torch.Tensor]
```
`{ .api }`

Forward pass through the model.

### Multi-Processing Support

```python
def start_multi_process_pool(
    target_devices: list[str] | None = None
) -> dict[Literal["input", "output", "processes"], Any]
```
`{ .api }`

Start a multi-process pool for parallel encoding.

```python
@staticmethod
def stop_multi_process_pool(pool: dict[Literal["input", "output", "processes"], Any]) -> None
```
`{ .api }`

Stop a multi-process pool.

```python
def encode_multi_process(
    sentences: list[str],
    pool: dict[Literal["input", "output", "processes"], Any],
    prompt_name: str | None = None,
    prompt: str | None = None,
    batch_size: int = 32,
    chunk_size: int | None = None,
    show_progress_bar: bool | None = None,
    precision: Literal["float32", "int8", "uint8", "binary", "ubinary"] = "float32",
    normalize_embeddings: bool = False,
    truncate_dim: int | None = None
) -> np.ndarray
```
`{ .api }`

Encode sentences using multi-processing for improved performance.
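
The role of `chunk_size` can be illustrated with a small stdlib sketch: the input is split into chunks, the chunks are encoded by parallel workers, and the results are stitched back together in the original order. Everything here is an assumption made for illustration: `fake_encode` is a hypothetical stand-in for a model, and threads replace the per-device worker processes a real pool would use.

```python
from concurrent.futures import ThreadPoolExecutor

def fake_encode(chunk):
    """Hypothetical stand-in for a model: one 1-dimensional 'embedding' per sentence."""
    return [[float(len(sentence))] for sentence in chunk]

def chunked(items, chunk_size):
    """Split a list into consecutive chunks of at most `chunk_size` items."""
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]

def encode_in_chunks(sentences, chunk_size, workers=2):
    """Encode chunks in parallel and return embeddings in input order."""
    chunks = chunked(sentences, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as executor:
        # executor.map yields results in the order the chunks were submitted.
        results = list(executor.map(fake_encode, chunks))
    return [embedding for chunk_result in results for embedding in chunk_result]

sentences = [f"sentence {i}" for i in range(10)]
embeddings = encode_in_chunks(sentences, chunk_size=4)
print(len(embeddings))
```

Smaller chunks balance load across workers more evenly, while larger chunks reduce the overhead of passing data between them.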

### Properties

```python
@property
def device() -> torch.device
```
`{ .api }`

Current device of the model.

```python
@property
def tokenizer() -> PreTrainedTokenizer
```
`{ .api }`

Access to the model's tokenizer.

```python
@property
def max_seq_length() -> int
```
`{ .api }`

Maximum sequence length supported by the model.

```python
@property
def similarity_fn_name() -> Literal["cosine", "dot", "euclidean", "manhattan"]
```
`{ .api }`

Name of the similarity function used by the model.

```python
@property
def transformers_model() -> PreTrainedModel | None
```
`{ .api }`

Access to the underlying transformer model.

## Usage Examples

### Basic Encoding

```python
from sentence_transformers import SentenceTransformer

# Load pre-trained model
model = SentenceTransformer('all-MiniLM-L6-v2')

# Encode single sentence
embedding = model.encode("Hello world")
print(f"Embedding shape: {embedding.shape}")

# Encode multiple sentences
sentences = [
    "The cat sits on the mat",
    "A feline rests on a rug",
    "Dogs are great pets"
]
embeddings = model.encode(sentences)
print(f"Embeddings shape: {embeddings.shape}")
```

### Similarity Computation

```python
# Compute similarity between two sentences
sentence1 = "The weather is nice today"
sentence2 = "Today has beautiful weather"

emb1 = model.encode(sentence1)
emb2 = model.encode(sentence2)

similarity = model.similarity(emb1, emb2)
print(f"Similarity: {similarity.item():.4f}")

# Similarities among several sentences
embeddings = model.encode([
    "Python is a programming language",
    "Java is used for software development",
    "I love pizza",
    "Pasta is delicious"
])

# Compute the full similarity matrix (every sentence against every other).
# similarity_pairwise would instead score only matching indices.
similarities = model.similarity(embeddings, embeddings)
print(f"Similarity matrix shape: {similarities.shape}")
```

### Asymmetric Retrieval

```python
# For retrieval tasks with different prompts
queries = ["What is machine learning?", "How do neural networks work?"]
documents = [
    "Machine learning is a subset of artificial intelligence",
    "Neural networks are computational models inspired by biological neurons",
    "Pizza recipes vary by region and preference"
]

# Encode with task-specific methods
query_embeddings = model.encode_query(queries)
doc_embeddings = model.encode_document(documents)

# Compute retrieval similarities
similarities = model.similarity(query_embeddings, doc_embeddings)
```
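
Once a query-by-document score matrix is available, retrieval comes down to ranking each row. A minimal sketch in plain Python follows; the `top_k` helper and the score values are hypothetical and stand in for the tensor returned by the similarity call.

```python
def top_k(scores_row, k):
    """Return the k best (document_index, score) pairs, highest score first."""
    ranked = sorted(range(len(scores_row)), key=lambda j: scores_row[j], reverse=True)
    return [(j, scores_row[j]) for j in ranked[:k]]

# Made-up scores: 2 queries (rows) against 3 documents (columns).
scores = [
    [0.82, 0.35, 0.05],
    [0.40, 0.77, 0.02],
]

for query_index, row in enumerate(scores):
    print(query_index, top_k(row, 2))
```

For large document collections a full sort per query becomes costly, and a partial selection or an approximate nearest-neighbor index is the usual alternative.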

### Custom Model Creation

```python
import torch.nn as nn

from sentence_transformers import SentenceTransformer
from sentence_transformers.models import Transformer, Pooling, Dense

# Create custom model architecture
transformer = Transformer('distilbert-base-uncased', max_seq_length=256)
pooling = Pooling(transformer.get_word_embedding_dimension(), pooling_mode='mean')
# The activation is passed as a module instance rather than a string
dense = Dense(pooling.get_sentence_embedding_dimension(), 256, activation_function=nn.Tanh())

# Combine modules
model = SentenceTransformer(modules=[transformer, pooling, dense])

# Use the custom model
embeddings = model.encode(["Custom model example"])
```

### Performance Optimization

```python
# Multi-process encoding for large datasets
# (on platforms that spawn worker processes, run this under
# `if __name__ == "__main__":`)
sentences = ["sentence " + str(i) for i in range(10000)]

# Start multi-process pool
pool = model.start_multi_process_pool(['cuda:0', 'cuda:1'])

# Encode using multiple GPUs
embeddings = model.encode_multi_process(sentences, pool, batch_size=64)

# Clean up
model.stop_multi_process_pool(pool)

# Normalized embeddings for cosine similarity
embeddings = model.encode(sentences, normalize_embeddings=True)
```

### Model Persistence

```python
# Save model locally
model.save('./my-sentence-transformer')

# Push to HuggingFace Hub (push_to_hub supersedes the older save_to_hub)
model.push_to_hub('my-username/my-sentence-transformer')

# Load saved model
loaded_model = SentenceTransformer('./my-sentence-transformer')
```

## SimilarityFunction Enum

```python
from sentence_transformers import SimilarityFunction

class SimilarityFunction(Enum):
    COSINE = "cosine"
    DOT_PRODUCT = "dot"
    DOT = "dot"  # Alias for DOT_PRODUCT
    EUCLIDEAN = "euclidean"
    MANHATTAN = "manhattan"
```
`{ .api }`

Enumeration of available similarity functions for comparing embeddings.
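
As a rough guide to what each option computes, the four functions can be written out in plain Python. This is a conceptual sketch, not the library's code: the function names and the lookup table are hypothetical, and expressing the Euclidean and Manhattan variants as negated distances (so that larger always means more similar) is an assumption about the convention.

```python
import math

def cosine(a, b):
    """Dot product divided by the product of the vector lengths."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def dot(a, b):
    """Plain dot product."""
    return sum(x * y for x, y in zip(a, b))

def euclidean(a, b):
    """Negated L2 distance (assumed convention: higher is more similar)."""
    return -math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Negated L1 distance (assumed convention: higher is more similar)."""
    return -sum(abs(x - y) for x, y in zip(a, b))

# Hypothetical lookup keyed by the enum's string values.
SIMILARITY = {"cosine": cosine, "dot": dot, "euclidean": euclidean, "manhattan": manhattan}

a = [1.0, 2.0, 2.0]
b = [2.0, 4.0, 4.0]   # same direction as a, twice the length
for name, fn in SIMILARITY.items():
    print(name, fn(a, b))
```

The example vectors point in the same direction, so cosine similarity is maximal while the distance-based scores still register the difference in magnitude.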

### Usage with SentenceTransformer

```python
# Set similarity function during initialization
model = SentenceTransformer(
    'all-MiniLM-L6-v2',
    similarity_fn_name=SimilarityFunction.COSINE
)

# Or use string names
model = SentenceTransformer(
    'all-MiniLM-L6-v2',
    similarity_fn_name='euclidean'
)
```

## Best Practices

1. **Batch Processing**: Tune `batch_size` to your hardware; larger batches raise throughput until memory runs out
2. **Device Management**: Specify `device` explicitly for consistent behavior
3. **Normalization**: Pass `normalize_embeddings=True` when comparing with cosine similarity; for unit-length vectors the dot product gives the same score
4. **Model Selection**: Choose models appropriate for your task and domain
5. **Caching**: Reuse a loaded model instance, and set `cache_folder` to control where downloaded models are cached
6. **Multi-Processing**: Use multi-process encoding for large datasets