Tessl Tile for pypi/sentence-transformers@5.1.0

# Loss Functions

The sentence-transformers package provides an extensive collection of loss functions designed for different learning objectives and training scenarios. These losses enable contrastive learning, supervised fine-tuning, and specialized training approaches.

## Import Statement

```python
from sentence_transformers.losses import (
    CosineSimilarityLoss,
    MultipleNegativesRankingLoss,
    TripletLoss,
    MatryoshkaLoss,
    # ... other loss functions
)
```

## Core Loss Functions

### CosineSimilarityLoss

```python
class CosineSimilarityLoss(torch.nn.Module):
    def __init__(
        self,
        model: SentenceTransformer,
        loss_fct: torch.nn.Module = torch.nn.MSELoss(),
        cos_score_transformation: torch.nn.Module = torch.nn.Identity()
    )
```
`{ .api }`

Computes the cosine similarity between the two sentence embeddings of a pair and compares it with a target similarity score using `loss_fct`.

**Parameters**:
- `model`: SentenceTransformer model
- `loss_fct`: Loss function applied to the cosine similarity and the target score (default: MSELoss)
- `cos_score_transformation`: Transformation applied to the cosine score before the loss (default: Identity)

**Use Case**: Regression on similarity scores, semantic textual similarity tasks
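The arithmetic behind this loss can be sketched in plain Python. This is a minimal illustration assuming the default `MSELoss` and identity transformation, not the library's implementation; the helper names `cosine` and `cosine_similarity_mse` and the toy 2-D vectors are made up for the example.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def cosine_similarity_mse(pairs, labels):
    """Mean squared error between predicted cosine scores and target scores."""
    predictions = [cosine(u, v) for u, v in pairs]
    return sum((p - y) ** 2 for p, y in zip(predictions, labels)) / len(labels)

# Toy embeddings: one identical pair (cosine 1) and one orthogonal pair (cosine 0)
pairs = [([1.0, 0.0], [1.0, 0.0]), ([1.0, 0.0], [0.0, 1.0])]

perfect = cosine_similarity_mse(pairs, labels=[1.0, 0.0])
off_target = cosine_similarity_mse(pairs, labels=[0.5, 0.5])
```

With labels that match the cosine scores the loss is zero; with both labels at 0.5 each pair is off by 0.5, so the mean squared error is 0.25.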

### MultipleNegativesRankingLoss

```python
class MultipleNegativesRankingLoss(torch.nn.Module):
    def __init__(
        self,
        model: SentenceTransformer,
        scale: float = 20.0,
        similarity_fct: callable = cos_sim
    )
```
`{ .api }`

Contrastive loss using in-batch negatives. For each anchor, the paired positive should score higher than every other positive in the batch, which act as negatives.

**Parameters**:
- `model`: SentenceTransformer model
- `scale`: Factor the similarities are multiplied by before the softmax
- `similarity_fct`: Function to compute similarities

**Use Case**: Asymmetric retrieval tasks, contrastive learning with large batches
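The in-batch-negatives idea can be written as a cross-entropy over scaled similarities, where the "correct class" for anchor `i` is positive `i`. The following plain-Python sketch assumes cosine similarity and the default `scale=20.0`; the function names and toy vectors are illustrative, not the library's code.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def in_batch_ranking_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where positives[i] is the target for anchors[i]."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, p) for p in positives]
        peak = max(logits)  # subtract the max for numerical stability
        log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)

anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.0], [0.0, 1.0]]  # each positive matches its own anchor
swapped = [[0.0, 1.0], [1.0, 0.0]]  # each positive matches the other anchor

low = in_batch_ranking_loss(anchors, aligned)
high = in_batch_ranking_loss(anchors, swapped)
```

Because every other positive in the batch is a negative, a larger batch gives each anchor more negatives to be ranked against.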

### MultipleNegativesSymmetricRankingLoss

```python
class MultipleNegativesSymmetricRankingLoss(torch.nn.Module):
    def __init__(
        self,
        model: SentenceTransformer,
        scale: float = 20.0,
        similarity_fct: callable = cos_sim
    )
```
`{ .api }`

Symmetric version of MultipleNegativesRankingLoss that optimizes both (A, B) and (B, A) directions.

**Parameters**:
- `model`: SentenceTransformer model
- `scale`: Scaling factor for similarities
- `similarity_fct`: Function to compute similarities

**Use Case**: Symmetric retrieval tasks, bidirectional similarity learning
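One way to picture the symmetric variant is as the average of two ranking losses over the same score matrix: once row-wise (anchor to positive) and once column-wise (positive to anchor). The sketch below assumes the scaled similarity matrix is already computed; the function names and the example scores are hypothetical.

```python
import math

def cross_entropy_rows(scores):
    """Mean cross-entropy over rows, with the diagonal entry as the target."""
    total = 0.0
    for i, row in enumerate(scores):
        peak = max(row)
        log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in row))
        total += log_sum_exp - row[i]
    return total / len(scores)

def symmetric_ranking_loss(scores):
    transposed = [list(column) for column in zip(*scores)]
    forward = cross_entropy_rows(scores)       # anchor -> positive
    backward = cross_entropy_rows(transposed)  # positive -> anchor
    return (forward + backward) / 2

# scores[i][j]: scaled similarity between anchor i and positive j
scores = [[3.0, 0.0],
          [1.0, 2.0]]
forward_only = cross_entropy_rows(scores)
loss = symmetric_ranking_loss(scores)
```

In this example the two directions disagree (the column-wise loss is smaller than the row-wise one), which is exactly the information the one-directional loss ignores.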

### TripletLoss

```python
class TripletLoss(torch.nn.Module):
    def __init__(
        self,
        model: SentenceTransformer,
        distance_metric: TripletDistanceMetric = TripletDistanceMetric.EUCLIDEAN,
        triplet_margin: float = 5
    )
```
`{ .api }`

Classic triplet loss with anchor, positive, and negative examples. The loss is zero once the negative is at least `triplet_margin` farther from the anchor than the positive.

**Parameters**:
- `model`: SentenceTransformer model
- `distance_metric`: Distance metric for triplet computation
- `triplet_margin`: Minimum gap required between the anchor-negative and anchor-positive distances

**Enum TripletDistanceMetric**:
- `COSINE`: Cosine distance
- `EUCLIDEAN`: Euclidean distance
- `MANHATTAN`: Manhattan distance

**Use Case**: Learning embeddings with explicit positive/negative relationships
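For a single triplet the loss reduces to `max(d(anchor, positive) - d(anchor, negative) + margin, 0)`. A minimal sketch with Euclidean distance and the default margin of 5 follows; the helper names and points are chosen only so the distances are easy to check by hand.

```python
import math

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=5.0):
    """Zero once the negative is at least `margin` farther away than the positive."""
    gap = euclidean(anchor, positive) - euclidean(anchor, negative)
    return max(gap + margin, 0.0)

anchor = [0.0, 0.0]
positive = [3.0, 4.0]       # distance 5 from the anchor
far_negative = [6.0, 8.0]   # distance 10 from the anchor
near_negative = [0.0, 1.0]  # distance 1 from the anchor

satisfied = triplet_loss(anchor, positive, far_negative)  # 5 - 10 + 5 = 0
violated = triplet_loss(anchor, positive, near_negative)  # 5 - 1 + 5 = 9
```

Note that a margin of 5 is sized for Euclidean distances; with cosine distance, which lies in [0, 2], a much smaller margin is usually appropriate.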

## Advanced Loss Functions

### MatryoshkaLoss

```python
class MatryoshkaLoss(torch.nn.Module):
    def __init__(
        self,
        model: SentenceTransformer,
        loss: torch.nn.Module,
        matryoshka_dims: list[int],
        matryoshka_weights: list[float] | None = None
    )
```
`{ .api }`

Wrapper loss for Matryoshka Representation Learning, enabling models to produce useful embeddings at multiple dimensions.

**Parameters**:
- `model`: SentenceTransformer model
- `loss`: Base loss function to wrap
- `matryoshka_dims`: List of embedding dimensions to optimize
- `matryoshka_weights`: Weights for each dimension (uniform if None)

**Use Case**: Creating models that work well at multiple embedding dimensions
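Conceptually, the wrapper evaluates the base loss on embeddings truncated to each dimension in `matryoshka_dims` and adds the results up with the given weights. The sketch below illustrates that idea for a single pair; `pair_loss` is a hypothetical stand-in for the wrapped loss (here simply `1 - cosine`), and all names are assumptions for the example.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def pair_loss(u, v):
    # Hypothetical stand-in for the wrapped base loss
    return 1.0 - cosine(u, v)

def matryoshka_pair_loss(u, v, dims, weights=None):
    weights = weights or [1.0] * len(dims)
    # Truncate both embeddings to each dimension and sum the weighted losses
    return sum(w * pair_loss(u[:d], v[:d]) for d, w in zip(dims, weights))

u = [1.0, 0.0, 1.0, 0.0]
v = [1.0, 0.0, 0.0, 1.0]

# Full 4 dims: cosine 0.5, loss 0.5. First 2 dims: identical, loss 0.
total = matryoshka_pair_loss(u, v, dims=[4, 2])
```

Because every truncated prefix contributes to the total, the model is pushed to place the most useful information in the leading dimensions.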

### Matryoshka2dLoss

```python
class Matryoshka2dLoss(torch.nn.Module):
    def __init__(
        self,
        model: SentenceTransformer,
        loss: torch.nn.Module,
        matryoshka_dims: list[int],
        n_layers_per_step: int = 1
    )
```
`{ .api }`

2D Matryoshka loss that optimizes across both embedding dimensions and transformer layers.

**Parameters**:
- `model`: SentenceTransformer model
- `loss`: Base loss function
- `matryoshka_dims`: Embedding dimensions to optimize
- `n_layers_per_step`: Number of layers per optimization step

**Use Case**: Early exit capabilities and progressive inference

### MSELoss

```python
class MSELoss(torch.nn.Module):
    def __init__(
        self,
        model: SentenceTransformer
    )
```
`{ .api }`

Mean Squared Error between the sentence embedding computed by the model and a target embedding supplied as the label, typically produced by a teacher model.

**Use Case**: Knowledge distillation of embeddings, extending a model to new languages

### MarginMSELoss

```python
class MarginMSELoss(torch.nn.Module):
    def __init__(
        self,
        model: SentenceTransformer,
        similarity_fct: callable = pairwise_dot_score
    )
```
`{ .api }`

MSE between the model's margin `sim(query, positive) - sim(query, negative)` and a gold margin supplied as the label, usually computed by a teacher model.

**Use Case**: Distilling teacher scores into an embedding model using (query, positive, negative) triplets
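The margin-based formulation can be sketched as follows: the student's margin is the difference between its query-positive and query-negative scores, and the loss is the squared error against the teacher's margin. The example assumes dot-product similarity; the function names and toy vectors are illustrative only.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def margin_mse(queries, positives, negatives, teacher_margins):
    """Mean squared error between student margins and teacher margins."""
    student_margins = [
        dot(q, p) - dot(q, n)
        for q, p, n in zip(queries, positives, negatives)
    ]
    errors = [(s - t) ** 2 for s, t in zip(student_margins, teacher_margins)]
    return sum(errors) / len(errors)

loss = margin_mse(
    queries=[[1.0, 2.0]],
    positives=[[3.0, 1.0]],  # dot product with the query: 5
    negatives=[[1.0, 0.0]],  # dot product with the query: 1
    teacher_margins=[3.0],   # student margin is 4, so the error is 1
)
```

Only the difference between the two scores is supervised, so the student is free to choose its own absolute score scale.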

## Specialized Loss Functions

### ContrastiveLoss

```python
class ContrastiveLoss(torch.nn.Module):
    def __init__(
        self,
        model: SentenceTransformer,
        distance_metric: SiameseDistanceMetric = SiameseDistanceMetric.COSINE_DISTANCE,
        margin: float = 0.5,
        size_average: bool = True
    )
```
`{ .api }`

Classic contrastive loss for siamese networks with binary similarity labels. Similar pairs (label 1) are pulled together; dissimilar pairs (label 0) are pushed apart until their distance exceeds `margin`.

**Parameters**:
- `model`: SentenceTransformer model
- `distance_metric`: Distance metric to use
- `margin`: Distance that negative pairs should exceed
- `size_average`: Average the loss over the batch if True, otherwise sum it

**Enum SiameseDistanceMetric**:
- `EUCLIDEAN`: Euclidean distance
- `MANHATTAN`: Manhattan distance
- `COSINE_DISTANCE`: Cosine distance

**Use Case**: Binary similarity classification, siamese networks
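A minimal sketch of the per-pair arithmetic, assuming the distances have already been computed with the chosen metric and that the loss is averaged over the batch. The function name and the numbers are made up for illustration.

```python
def contrastive_loss(distances, labels, margin=0.5):
    per_pair = []
    for d, y in zip(distances, labels):
        pull = y * d ** 2                            # similar pairs: shrink the distance
        push = (1 - y) * max(margin - d, 0.0) ** 2   # dissimilar pairs: push past the margin
        per_pair.append(0.5 * (pull + push))
    return sum(per_pair) / len(per_pair)

# One similar pair at distance 0.2 and one dissimilar pair at distance 0.1
loss = contrastive_loss(distances=[0.2, 0.1], labels=[1, 0])

# A dissimilar pair already beyond the margin contributes nothing
no_penalty = contrastive_loss(distances=[0.7], labels=[0])
```

Here the similar pair contributes 0.5 * 0.04 = 0.02 and the dissimilar pair 0.5 * 0.16 = 0.08, giving a mean of 0.05.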

### SoftmaxLoss

```python
class SoftmaxLoss(torch.nn.Module):
    def __init__(
        self,
        model: SentenceTransformer,
        sentence_embedding_dimension: int,
        num_labels: int,
        concatenation_sent_rep: bool = True,
        concatenation_sent_difference: bool = True,
        concatenation_sent_multiplication: bool = False
    )
```
`{ .api }`

Classification loss using softmax over sentence pair representations.

**Parameters**:
- `model`: SentenceTransformer model
- `sentence_embedding_dimension`: Dimension of sentence embeddings
- `num_labels`: Number of classification labels
- `concatenation_sent_rep`: Include individual sentence representations
- `concatenation_sent_difference`: Include element-wise difference
- `concatenation_sent_multiplication`: Include element-wise product

**Use Case**: Natural language inference, text classification
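The three `concatenation_*` flags decide which vectors are joined into the feature vector that the softmax classifier sees. The sketch below shows one plausible reading of that construction for a single pair `(u, v)`, using the absolute element-wise difference; `pair_features` is a hypothetical helper, not a library function.

```python
def pair_features(u, v, use_rep=True, use_difference=True, use_multiplication=False):
    features = []
    if use_rep:
        features += u + v                                    # both sentence embeddings
    if use_difference:
        features += [abs(a - b) for a, b in zip(u, v)]       # element-wise |u - v|
    if use_multiplication:
        features += [a * b for a, b in zip(u, v)]            # element-wise u * v
    return features

u, v = [1.0, 2.0], [3.0, 1.0]
default_features = pair_features(u, v)
all_features = pair_features(u, v, use_multiplication=True)
```

With the default flags the classifier input is three times the embedding dimension; enabling the product adds a fourth block.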

## Batch-Based Triplet Losses

### BatchHardTripletLoss

```python
class BatchHardTripletLoss(torch.nn.Module):
    def __init__(
        self,
        model: SentenceTransformer,
        distance_metric: callable = BatchHardTripletLossDistanceFunction.eucledian_distance,
        margin: float = 5
    )
```
`{ .api }`

Batch hard triplet loss. For each anchor in the batch it mines the hardest positive (the farthest example with the same label) and the hardest negative (the closest example with a different label).

**Parameters**:
- `model`: SentenceTransformer model
- `distance_metric`: Function that returns the pairwise distances used for triplet mining
- `margin`: Triplet margin

**Helper class BatchHardTripletLossDistanceFunction** (static methods):
- `cosine_distance`: Cosine distance
- `eucledian_distance`: Euclidean distance (the method name is spelled this way in the library)

**Use Case**: Metric learning with automatic hard negative mining
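Batch-hard mining can be illustrated on a toy batch of labelled points. The sketch uses 1-D points with absolute difference as a stand-in for an embedding distance and assumes every label has at least one example from another class in the batch; all names are illustrative.

```python
def batch_hard_triplet_loss(points, labels, margin=5.0):
    def distance(i, j):
        return abs(points[i] - points[j])  # 1-D stand-in for an embedding distance

    n = len(points)
    losses = []
    for i in range(n):
        positives = [distance(i, j) for j in range(n) if j != i and labels[j] == labels[i]]
        negatives = [distance(i, j) for j in range(n) if labels[j] != labels[i]]
        hardest_positive = max(positives, default=0.0)  # farthest same-label example
        hardest_negative = min(negatives)               # closest other-label example
        losses.append(max(hardest_positive - hardest_negative + margin, 0.0))
    return sum(losses) / n

points = [0.0, 1.0, 3.0, 4.0]
labels = [0, 0, 1, 1]

loss = batch_hard_triplet_loss(points, labels, margin=5.0)   # (3 + 4 + 4 + 3) / 4
separated = batch_hard_triplet_loss(points, labels, margin=1.0)
```

Each batch needs several examples per label for the mining to find positives, which is why these losses take `(text, label)` rows rather than explicit triplets.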

### BatchSemiHardTripletLoss

```python
class BatchSemiHardTripletLoss(torch.nn.Module):
    def __init__(
        self,
        model: SentenceTransformer,
        distance_metric: callable = BatchHardTripletLossDistanceFunction.eucledian_distance,
        margin: float = 5
    )
```
`{ .api }`

Batch semi-hard triplet loss that mines semi-hard negatives: negatives that are farther from the anchor than the positive but still within the margin.

**Use Case**: More stable training than hard negative mining

### BatchHardSoftMarginTripletLoss

```python
class BatchHardSoftMarginTripletLoss(torch.nn.Module):
    def __init__(
        self,
        model: SentenceTransformer,
        distance_metric: callable = BatchHardTripletLossDistanceFunction.eucledian_distance
    )
```
`{ .api }`

Batch hard triplet loss with soft margin (no explicit margin parameter).

**Use Case**: Triplet learning without manual margin tuning

### BatchAllTripletLoss

```python
class BatchAllTripletLoss(torch.nn.Module):
    def __init__(
        self,
        model: SentenceTransformer,
        distance_metric: callable = BatchHardTripletLossDistanceFunction.eucledian_distance,
        margin: float = 5
    )
```
`{ .api }`

Uses all valid triplets in a batch for training.

**Use Case**: Comprehensive triplet learning when computational resources allow

## Contrastive and Tension Losses

### OnlineContrastiveLoss

```python
class OnlineContrastiveLoss(torch.nn.Module):
    def __init__(
        self,
        model: SentenceTransformer,
        distance_metric: SiameseDistanceMetric = SiameseDistanceMetric.COSINE_DISTANCE,
        margin: float = 0.5
    )
```
`{ .api }`

Variant of ContrastiveLoss that selects the hard positive pairs (positives that are far apart) and hard negative pairs (negatives that are close) within each batch and computes the loss only on those.

**Use Case**: Binary similarity data where ContrastiveLoss would be used, often with better results
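The hard-pair selection can be sketched on precomputed distances. The version below assumes the batch contains at least one positive and one negative pair and sums the loss over the selected pairs; the function name and the numbers are illustrative, not the library's code.

```python
def online_contrastive_loss(distances, labels, margin=0.5):
    positives = [d for d, y in zip(distances, labels) if y == 1]
    negatives = [d for d, y in zip(distances, labels) if y == 0]
    # Hard negatives are closer than the farthest positive;
    # hard positives are farther than the closest negative.
    hard_negatives = [d for d in negatives if d < max(positives)]
    hard_positives = [d for d in positives if d > min(negatives)]
    positive_loss = sum(d ** 2 for d in hard_positives)
    negative_loss = sum(max(margin - d, 0.0) ** 2 for d in hard_negatives)
    return positive_loss + negative_loss

# Two positive pairs (0.1, 0.6) and two negative pairs (0.3, 0.9)
loss = online_contrastive_loss(
    distances=[0.1, 0.6, 0.3, 0.9],
    labels=[1, 1, 0, 0],
)
```

Only the positive pair at 0.6 and the negative pair at 0.3 are kept, giving 0.36 + 0.04 = 0.40; the easy pairs at 0.1 and 0.9 are ignored.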

### ContrastiveTensionLoss

```python
class ContrastiveTensionLoss(torch.nn.Module):
    def __init__(
        self,
        model: SentenceTransformer
    )
```
`{ .api }`

Unsupervised Contrastive Tension (CT) objective. Two copies of the model encode the two sides of each pair; pairs made of the same sentence are trained to score higher than pairs of randomly combined sentences.

**Use Case**: Unsupervised training from raw sentences, together with ContrastiveTensionDataLoader

### ContrastiveTensionLossInBatchNegatives

```python
class ContrastiveTensionLossInBatchNegatives(torch.nn.Module):
    def __init__(
        self,
        model: SentenceTransformer,
        scale: float = 20.0,
        similarity_fct: callable = cos_sim
    )
```
`{ .api }`

In-batch version of contrastive tension loss.

**Use Case**: Efficient contrastive learning with in-batch negatives

### ContrastiveTensionDataLoader

```python
class ContrastiveTensionDataLoader:
    def __init__(
        self,
        sentences: list,
        batch_size: int,
        pos_neg_ratio: int = 8
    )
```
`{ .api }`

Specialized data loader for contrastive tension training.

**Parameters**:
- `sentences`: Training sentences
- `batch_size`: Batch size
- `pos_neg_ratio`: One pair in every `pos_neg_ratio` is a positive (identical-sentence) pair; the rest are random negatives

## Advanced and Specialized Losses

### AnglELoss

```python
class AnglELoss(torch.nn.Module):
    def __init__(
        self,
        model: SentenceTransformer,
        scale: float = 20.0
    )
```
`{ .api }`

AnglE (Angle-optimized Text Embeddings) loss. It is a variant of CoSENTLoss that replaces cosine similarity with an angle-based similarity computed in complex space, which avoids the saturation zones of the cosine function.

**Use Case**: Sentence pairs with float similarity labels, as an alternative to CoSENTLoss or CosineSimilarityLoss

### CoSENTLoss

```python
class CoSENTLoss(torch.nn.Module):
    def __init__(
        self,
        model: SentenceTransformer,
        scale: float = 20.0,
        similarity_fct: callable = pairwise_cos_sim
    )
```
`{ .api }`

CoSENT (Cosine Sentence) loss. It is a ranking loss over sentence pairs: whenever one pair has a higher label than another, its predicted similarity should also be higher.

**Use Case**: Sentence pairs with float similarity labels, as an alternative to CosineSimilarityLoss
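The ranking idea behind CoSENT can be sketched as `log(1 + sum(exp(scale * (s_i - s_j))))`, summed over all pairs of examples where example `j` has the higher label and should therefore have the higher similarity. The plain-Python version below assumes the similarities are already computed; the function name and values are illustrative.

```python
import math

def cosent_loss(similarities, labels, scale=20.0):
    terms = [0.0]  # the leading zero yields the "1 +" inside the logarithm
    for s_i, y_i in zip(similarities, labels):
        for s_j, y_j in zip(similarities, labels):
            if y_i < y_j:  # example j should score higher than example i
                terms.append(scale * (s_i - s_j))
    peak = max(terms)  # log-sum-exp with the max subtracted for stability
    return peak + math.log(sum(math.exp(t - peak) for t in terms))

# Predicted similarities agree with the label order
ordered = cosent_loss(similarities=[0.9, 0.1], labels=[1.0, 0.0])
# Predicted similarities contradict the label order
inverted = cosent_loss(similarities=[0.1, 0.9], labels=[1.0, 0.0])
```

Only the relative order of the labels matters here, not their absolute values, which distinguishes this loss from a regression on the similarity score.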

### GISTEmbedLoss

```python
class GISTEmbedLoss(torch.nn.Module):
    def __init__(
        self,
        model: SentenceTransformer,
        guide: SentenceTransformer,
        temperature: float = 0.01
    )
```
`{ .api }`

GIST (Guided In-sample Selection of Training Negatives) embedding loss. It is a contrastive loss with in-batch negatives in which a guide model filters out candidates that are likely false negatives.

**Parameters**:
- `model`: Model to train
- `guide`: Guide model used to select in-batch negatives
- `temperature`: Temperature applied to the similarity scores

**Use Case**: Contrastive training on pair or triplet data where in-batch negatives may include false negatives
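The guide's role can be illustrated with a simple mask: a candidate that the guide model scores higher than the true positive is treated as a likely false negative and dropped from the contrastive loss. This is a simplified sketch of that selection rule under the assumption of no extra margin; `keep_mask` is a hypothetical helper.

```python
def keep_mask(guide_similarities, positive_index):
    """True where a candidate may be used, False where the guide flags it."""
    threshold = guide_similarities[positive_index]
    return [
        j == positive_index or similarity <= threshold
        for j, similarity in enumerate(guide_similarities)
    ]

# Guide similarities between one anchor and three in-batch candidates;
# candidate 0 is the labelled positive.
mask = keep_mask([0.8, 0.9, 0.3], positive_index=0)
```

Candidate 1 scores 0.9 with the guide, above the positive's 0.8, so it is excluded; candidate 2 stays as an ordinary negative.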

### CachedGISTEmbedLoss

```python
class CachedGISTEmbedLoss(torch.nn.Module):
    def __init__(
        self,
        model: SentenceTransformer,
        guide: SentenceTransformer,
        mini_batch_size: int = 32
    )
```
`{ .api }`

Cached version of GISTEmbedLoss. It uses gradient caching to process the batch in chunks of `mini_batch_size`, so large batch sizes fit in limited GPU memory.

**Use Case**: Guided contrastive training with large batch sizes on limited memory

### DenoisingAutoEncoderLoss

```python
class DenoisingAutoEncoderLoss(torch.nn.Module):
    def __init__(
        self,
        model: SentenceTransformer,
        decoder_name_or_path: str | None = None,
        tie_encoder_decoder: bool = True
    )
```
`{ .api }`

Denoising autoencoder loss for self-supervised learning (TSDAE). The encoder embeds a corrupted sentence and a decoder reconstructs the original sentence from that embedding.

**Parameters**:
- `model`: SentenceTransformer encoder
- `decoder_name_or_path`: Decoder model path
- `tie_encoder_decoder`: Whether to tie encoder and decoder weights

**Use Case**: Self-supervised pre-training, unsupervised learning

### MegaBatchMarginLoss

```python
class MegaBatchMarginLoss(torch.nn.Module):
    def __init__(
        self,
        model: SentenceTransformer,
        positive_margin: float = 0.8,
        negative_margin: float = 0.3,
        use_mini_batched_version: bool = True,
        mini_batch_size: int = 50
    )
```
`{ .api }`

Margin-based loss designed for very large batches of (anchor, positive) pairs. For each anchor it finds the hardest negative among the other positives in the batch and applies the margins to the positive and negative similarities.

**Use Case**: Large-scale training on paraphrase-style pairs with massive batches

### DistillKLDivLoss

```python
class DistillKLDivLoss(torch.nn.Module):
    def __init__(
        self,
        model: SentenceTransformer,
        similarity_fct: callable = pairwise_dot_score,
        temperature: float = 1.0
    )
```
`{ .api }`

Knowledge distillation using the KL divergence between the student's and the teacher's score distributions over a query's candidate passages. The teacher is not passed to the loss; its scores are supplied as the `label` column of the dataset.

**Use Case**: Model distillation, compression
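For one query, the divergence is computed between two softmax distributions over the same candidates: one from the student's scores and one from the teacher's. The sketch below is a plain-Python illustration; the `temperature ** 2` rescaling follows the usual distillation convention and is an assumption here, as are all function names.

```python
import math

def softmax(scores, temperature=1.0):
    scaled = [s / temperature for s in scores]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distill_kl(student_scores, teacher_scores, temperature=1.0):
    """KL(teacher || student) over one query's candidate passages."""
    student = softmax(student_scores, temperature)
    teacher = softmax(teacher_scores, temperature)
    kl = sum(t * (math.log(t) - math.log(s)) for t, s in zip(teacher, student))
    return kl * temperature ** 2  # assumed rescaling, see the note above

matched = distill_kl(student_scores=[2.0, 0.0], teacher_scores=[2.0, 0.0])
mismatched = distill_kl(student_scores=[0.0, 0.0], teacher_scores=[2.0, 0.0])
```

The loss is zero when the student ranks the candidates with the same relative confidence as the teacher and grows as the two distributions diverge.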

### AdaptiveLayerLoss

```python
class AdaptiveLayerLoss(torch.nn.Module):
    def __init__(
        self,
        model: SentenceTransformer,
        loss: torch.nn.Module,
        n_layers_per_step: int = 1
    )
```
`{ .api }`

Wrapper loss that applies the base loss to embeddings taken from intermediate transformer layers as well as the final layer, so the model still produces useful embeddings when later layers are removed.

**Use Case**: Models that can be shortened to fewer layers for faster inference

## Cached Loss Functions

### CachedMultipleNegativesRankingLoss

```python
class CachedMultipleNegativesRankingLoss(torch.nn.Module):
    def __init__(
        self,
        model: SentenceTransformer,
        scale: float = 20.0,
        similarity_fct: callable = cos_sim,
        mini_batch_size: int = 32
    )
```
`{ .api }`

Memory-efficient version of MultipleNegativesRankingLoss. It uses gradient caching to process the batch in chunks of `mini_batch_size`, which allows much larger batch sizes at the cost of some speed.

### CachedMultipleNegativesSymmetricRankingLoss

```python
class CachedMultipleNegativesSymmetricRankingLoss(torch.nn.Module):
    def __init__(
        self,
        model: SentenceTransformer,
        scale: float = 20.0,
        similarity_fct: callable = cos_sim,
        mini_batch_size: int = 32
    )
```
`{ .api }`

Cached symmetric version for memory efficiency.

## Usage Examples

### Basic Contrastive Learning

```python
from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MultipleNegativesRankingLoss

# Initialize model and loss
model = SentenceTransformer('distilbert-base-uncased')
loss = MultipleNegativesRankingLoss(model, scale=20.0)

# Prepare data (anchor-positive pairs)
train_data = [
    {"anchor": "The cat sits on the mat", "positive": "A feline rests on a rug"},
    {"anchor": "Python programming language", "positive": "Coding with Python"}
]

train_dataset = Dataset.from_list(train_data)

# Training with contrastive loss
args = SentenceTransformerTrainingArguments(
    output_dir='./contrastive-training',
    per_device_train_batch_size=64,  # Larger batches work better
    num_train_epochs=3
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=loss
)

trainer.train()
```

### Triplet Learning

```python
from sentence_transformers.losses import TripletLoss, TripletDistanceMetric

# Triplet loss with cosine distance
# (model, args, and Dataset are defined in the first example)
triplet_loss = TripletLoss(
    model=model,
    distance_metric=TripletDistanceMetric.COSINE,
    triplet_margin=0.5
)

# Prepare triplet data
triplet_data = [
    {
        "anchor": "The cat sits on the mat",
        "positive": "A feline rests on a rug",
        "negative": "Dogs are great pets"
    }
]

triplet_dataset = Dataset.from_list(triplet_data)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=triplet_dataset,
    loss=triplet_loss
)

trainer.train()
```

### Matryoshka Representation Learning

```python
from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss

# Base loss
base_loss = MultipleNegativesRankingLoss(model)

# Matryoshka loss with multiple dimensions
matryoshka_loss = MatryoshkaLoss(
    model=model,
    loss=base_loss,
    matryoshka_dims=[768, 512, 256, 128, 64],
    matryoshka_weights=[1, 1, 1, 1, 1]  # Equal weights
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=matryoshka_loss
)

trainer.train()

# Test at different dimensions
embeddings_full = model.encode(["Test"], truncate_dim=None)
embeddings_256 = model.encode(["Test"], truncate_dim=256)
embeddings_64 = model.encode(["Test"], truncate_dim=64)
```

### Similarity Regression

```python
from sentence_transformers.losses import CosineSimilarityLoss
import torch.nn as nn

# Cosine similarity loss with different transformations
mse_loss = CosineSimilarityLoss(
    model=model,
    loss_fct=nn.MSELoss(),
    cos_score_transformation=nn.Identity()
)

# Variant that passes the cosine score through a sigmoid before the loss
sigmoid_loss = CosineSimilarityLoss(
    model=model,
    loss_fct=nn.MSELoss(),
    cos_score_transformation=nn.Sigmoid()
)

# Prepare similarity data
similarity_data = [
    {"sentence1": "The cat sits", "sentence2": "A cat is sitting", "label": 0.9},
    {"sentence1": "Dogs bark", "sentence2": "Cars are fast", "label": 0.1}
]

similarity_dataset = Dataset.from_list(similarity_data)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=similarity_dataset,
    loss=mse_loss
)

trainer.train()
```

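### Knowledge Distillation

```python
import torch
from sentence_transformers.losses import DistillKLDivLoss

# Teacher model (larger, pre-trained)
teacher_model = SentenceTransformer('all-mpnet-base-v2')

# Student model (smaller)
student_model = SentenceTransformer('distilbert-base-uncased')

# (query, positive, negative) triplets to be scored by the teacher
distill_dataset = Dataset.from_dict({
    "query": ["What is Python?"],
    "positive": ["Python is a programming language"],
    "negative": ["Dogs are great pets"],
})

def compute_labels(batch):
    # The teacher's similarity scores become the soft labels for the student
    emb_queries = teacher_model.encode(batch["query"])
    emb_positives = teacher_model.encode(batch["positive"])
    emb_negatives = teacher_model.encode(batch["negative"])
    pos_scores = teacher_model.similarity_pairwise(emb_queries, emb_positives)
    neg_scores = teacher_model.similarity_pairwise(emb_queries, emb_negatives)
    return {"label": torch.stack([pos_scores, neg_scores], dim=1)}

distill_dataset = distill_dataset.map(compute_labels, batched=True)

# Distillation loss: only the student is passed; teacher scores come from "label"
distill_loss = DistillKLDivLoss(model=student_model)

trainer = SentenceTransformerTrainer(
    model=student_model,
    args=args,
    train_dataset=distill_dataset,
    loss=distill_loss
)

trainer.train()
```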

### Multi-Task Learning

```python
from sentence_transformers.losses import MultipleNegativesRankingLoss, SoftmaxLoss

# Combine different losses for multi-task learning
contrastive_loss = MultipleNegativesRankingLoss(model)
classification_loss = SoftmaxLoss(
    model=model,
    sentence_embedding_dimension=model.get_sentence_embedding_dimension(),
    num_labels=3  # For NLI: entailment, contradiction, neutral
)

# Illustrative NLI data: sentence pairs with integer class labels
nli_dataset = Dataset.from_list([
    {"premise": "A man is eating", "hypothesis": "A person eats", "label": 0},
    {"premise": "A man is eating", "hypothesis": "Nobody is eating", "label": 1}
])

# Multi-dataset training: dataset keys must match the loss keys
datasets = {
    "similarity": train_dataset,  # anchor-positive pairs from the first example
    "classification": nli_dataset
}

losses = {
    "similarity": contrastive_loss,
    "classification": classification_loss
}

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=datasets,
    loss=losses
)

trainer.train()
```

### Advanced Batch Mining

```python
from sentence_transformers.losses import BatchHardTripletLoss, BatchHardTripletLossDistanceFunction

# Hard negative mining within batches
batch_hard_loss = BatchHardTripletLoss(
    model=model,
    distance_metric=BatchHardTripletLossDistanceFunction.cosine_distance,
    margin=0.2
)

# Use with datasets that have class labels
class_data = [
    {"text": "Python programming", "label": 0},
    {"text": "Coding in Python", "label": 0},
    {"text": "Machine learning", "label": 1},
    {"text": "AI algorithms", "label": 1}
]

class_dataset = Dataset.from_list(class_data)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=class_dataset,
    loss=batch_hard_loss
)

trainer.train()
```

## Best Practices

1. **Loss Selection**: Choose loss functions based on your data format and task
2. **Batch Size**: Use larger batches (64+) for contrastive losses when possible
3. **Scaling**: Adjust scale parameters based on your similarity function
4. **Negative Sampling**: Consider hard negative mining for improved performance
5. **Multi-Task**: Combine different losses for comprehensive training
6. **Progressive Training**: Use Matryoshka or adaptive losses for efficiency
7. **Evaluation**: Monitor performance on validation sets during training
8. **Hyperparameter Tuning**: Experiment with margins, scales, and learning rates
