A framework for embeddings, retrieval, and reranking that computes dense, sparse, and cross-encoder embeddings using state-of-the-art transformer models
—
The sentence-transformers package provides an extensive collection of loss functions designed for different learning objectives and training scenarios. These losses enable contrastive learning, supervised fine-tuning, and specialized training approaches.
from sentence_transformers.losses import (
CosineSimilarityLoss,
MultipleNegativesRankingLoss,
TripletLoss,
MatryoshkaLoss,
# ... other loss functions
)

class CosineSimilarityLoss(torch.nn.Module):
def __init__(
self,
model: SentenceTransformer,
loss_fct: torch.nn.Module = torch.nn.MSELoss(),
cos_score_transformation: torch.nn.Module = torch.nn.Identity()
){ .api }
Loss function that measures cosine similarity between sentence pairs with target similarity scores.
Parameters:
model: SentenceTransformer model
loss_fct: Loss function to apply to cosine similarities (default: MSELoss)
cos_score_transformation: Transformation applied to cosine scores

Use Case: Regression on similarity scores, semantic textual similarity tasks
class MultipleNegativesRankingLoss(torch.nn.Module):
def __init__(
self,
model: SentenceTransformer,
scale: float = 20.0,
similarity_fct: callable = cos_sim
){ .api }
Contrastive loss using in-batch negatives. Optimizes for positive pairs while treating other examples in the batch as negatives.
Parameters:
model: SentenceTransformer model
scale: Scaling factor for similarities
similarity_fct: Function to compute similarities

Use Case: Asymmetric retrieval tasks, contrastive learning with large batches
class MultipleNegativesSymmetricRankingLoss(torch.nn.Module):
def __init__(
self,
model: SentenceTransformer,
scale: float = 20.0,
similarity_fct: callable = cos_sim
){ .api }
Symmetric version of MultipleNegativesRankingLoss that optimizes both (A, B) and (B, A) directions.
Parameters:
model: SentenceTransformer model
scale: Scaling factor for similarities
similarity_fct: Function to compute similarities

Use Case: Symmetric retrieval tasks, bidirectional similarity learning
class TripletLoss(torch.nn.Module):
def __init__(
self,
model: SentenceTransformer,
distance_metric: TripletDistanceMetric = TripletDistanceMetric.EUCLIDEAN,
triplet_margin: float = 5
){ .api }
Classic triplet loss with anchor, positive, and negative examples.
Parameters:
model: SentenceTransformer model
distance_metric: Distance metric for triplet computation
triplet_margin: Margin between positive and negative distances

Enum TripletDistanceMetric:
COSINE: Cosine distance
EUCLIDEAN: Euclidean distance
MANHATTAN: Manhattan distance
DOT_PRODUCT: Dot product distance

Use Case: Learning embeddings with explicit positive/negative relationships
class MatryoshkaLoss(torch.nn.Module):
def __init__(
self,
model: SentenceTransformer,
loss: torch.nn.Module,
matryoshka_dims: list[int],
matryoshka_weights: list[float] | None = None
){ .api }
Wrapper loss for Matryoshka Representation Learning, enabling models to produce useful embeddings at multiple dimensions.
Parameters:
model: SentenceTransformer model
loss: Base loss function to wrap
matryoshka_dims: List of embedding dimensions to optimize
matryoshka_weights: Weights for each dimension (uniform if None)

Use Case: Creating models that work well at multiple embedding dimensions
class Matryoshka2dLoss(torch.nn.Module):
def __init__(
self,
model: SentenceTransformer,
loss: torch.nn.Module,
matryoshka_dims: list[int],
n_layers_per_step: int = 1
){ .api }
2D Matryoshka loss that optimizes across both embedding dimensions and transformer layers.
Parameters:
model: SentenceTransformer model
loss: Base loss function
matryoshka_dims: Embedding dimensions to optimize
n_layers_per_step: Number of layers per optimization step

Use Case: Early exit capabilities and progressive inference
class MSELoss(torch.nn.Module):
def __init__(
self,
model: SentenceTransformer
){ .api }
Mean Squared Error loss for regression tasks with continuous similarity scores.
Use Case: Direct regression on similarity scores, knowledge distillation
class MarginMSELoss(torch.nn.Module):
def __init__(
self,
model: SentenceTransformer
){ .api }
MSE loss with margin-based formulation for triplet-like data.
Use Case: Triplet data with continuous similarity scores
class ContrastiveLoss(torch.nn.Module):
def __init__(
self,
model: SentenceTransformer,
distance_metric: SiameseDistanceMetric = SiameseDistanceMetric.EUCLIDEAN,
margin: float = 0.5,
size_average: bool = True
){ .api }
Classic contrastive loss for siamese networks with binary similarity labels.
Parameters:
model: SentenceTransformer model
distance_metric: Distance metric to use
margin: Margin for negative pairs
size_average: Whether to average the loss

Enum SiameseDistanceMetric:
EUCLIDEAN: Euclidean distance
MANHATTAN: Manhattan distance
COSINE_DISTANCE: Cosine distance

Use Case: Binary similarity classification, siamese networks
class SoftmaxLoss(torch.nn.Module):
def __init__(
self,
model: SentenceTransformer,
sentence_embedding_dimension: int,
num_labels: int,
concatenation_sent_rep: bool = True,
concatenation_sent_difference: bool = True,
concatenation_sent_multiplication: bool = False
){ .api }
Classification loss using softmax over sentence pair representations.
Parameters:
model: SentenceTransformer model
sentence_embedding_dimension: Dimension of sentence embeddings
num_labels: Number of classification labels
concatenation_sent_rep: Include individual sentence representations
concatenation_sent_difference: Include element-wise difference
concatenation_sent_multiplication: Include element-wise product

Use Case: Natural language inference, text classification
class BatchHardTripletLoss(torch.nn.Module):
def __init__(
self,
model: SentenceTransformer,
distance_function: BatchHardTripletLossDistanceFunction = BatchHardTripletLossDistanceFunction.cosine_distance,
margin: float = 5
){ .api }
Batch hard triplet loss that mines the hardest positive and negative pairs within each batch.
Parameters:
model: SentenceTransformer model
distance_function: Distance function for triplet mining
margin: Triplet margin

Enum BatchHardTripletLossDistanceFunction:
cosine_distance: Cosine distance
euclidean_distance: Euclidean distance

Use Case: Metric learning with automatic hard negative mining
class BatchSemiHardTripletLoss(torch.nn.Module):
def __init__(
self,
model: SentenceTransformer,
distance_function: BatchHardTripletLossDistanceFunction = BatchHardTripletLossDistanceFunction.cosine_distance,
margin: float = 5
){ .api }
Batch semi-hard triplet loss that mines semi-hard negatives (harder than positive but within margin).
Use Case: More stable training than hard negative mining
class BatchHardSoftMarginTripletLoss(torch.nn.Module):
def __init__(
self,
model: SentenceTransformer,
distance_function: BatchHardTripletLossDistanceFunction = BatchHardTripletLossDistanceFunction.cosine_distance
){ .api }
Batch hard triplet loss with soft margin (no explicit margin parameter).
Use Case: Triplet learning without manual margin tuning
class BatchAllTripletLoss(torch.nn.Module):
def __init__(
self,
model: SentenceTransformer,
distance_function: BatchHardTripletLossDistanceFunction = BatchHardTripletLossDistanceFunction.cosine_distance,
margin: float = 5
){ .api }
Uses all valid triplets in a batch for training.
Use Case: Comprehensive triplet learning when computational resources allow
class OnlineContrastiveLoss(torch.nn.Module):
def __init__(
self,
model: SentenceTransformer,
distance_metric: SiameseDistanceMetric = SiameseDistanceMetric.COSINE_DISTANCE,
margin: float = 0.5,
size_average: bool = True
){ .api }
Variant of contrastive loss that computes the loss only over the hard pairs in each batch: positive pairs that are far apart and negative pairs that are close together.
Use Case: Binary similarity data; often performs better than plain ContrastiveLoss
class ContrastiveTensionLoss(torch.nn.Module):
def __init__(
self,
model: SentenceTransformer
){ .api }
Contrastive Tension (CT) loss for unsupervised learning: two independently updated copies of the model are trained to assign high scores to identical sentences and low scores to different sentences.
Use Case: Unsupervised sentence embedding learning from raw text
class ContrastiveTensionLossInBatchNegatives(torch.nn.Module):
def __init__(
self,
model: SentenceTransformer,
scale: float = 20.0,
similarity_fct: callable = cos_sim
){ .api }
In-batch version of contrastive tension loss.
Use Case: Efficient contrastive learning with in-batch negatives
class ContrastiveTensionDataLoader:
def __init__(
self,
examples: list,
batch_size: int = 32,
pos_neg_ratio: int = 4
){ .api }
Specialized data loader for contrastive tension training.
Parameters:
examples: Training examples
batch_size: Batch size
pos_neg_ratio: Ratio of positives to negatives

class AnglELoss(torch.nn.Module):
def __init__(
self,
model: SentenceTransformer,
scale: float = 20.0
){ .api }
AnglE (Angle-optimized Text Embeddings) loss, a CoSENT variant that optimizes angle differences in complex space to avoid the saturation zones of the cosine function.
Use Case: State-of-the-art performance on text embedding benchmarks
class CoSENTLoss(torch.nn.Module):
def __init__(
self,
model: SentenceTransformer,
scale: float = 20.0,
similarity_fct: callable = cos_sim
){ .api }
CoSENT (Cosine Sentence) loss for optimized sentence embeddings.
Use Case: Improved sentence similarity learning
class GISTEmbedLoss(torch.nn.Module):
def __init__(
self,
model: SentenceTransformer,
guide: SentenceTransformer
){ .api }
GISTEmbed (Guided In-sample Selection of Training Negatives) loss that uses a guide model to filter likely false negatives out of in-batch contrastive training.
Parameters:
model: Model to train
guide: Guide model used to identify and mask false negatives

Use Case: Higher-quality in-batch negatives, improved contrastive training
class CachedGISTEmbedLoss(torch.nn.Module):
def __init__(
self,
model: SentenceTransformer,
guide: SentenceTransformer,
mini_batch_size: int = 32
){ .api }
Cached version of GISTEmbedLoss that uses gradient caching for memory-efficient training with large batch sizes.
Use Case: GIST-style training with large batches under memory constraints
class DenoisingAutoEncoderLoss(torch.nn.Module):
def __init__(
self,
model: SentenceTransformer,
decoder_name_or_path: str = None,
tie_encoder_decoder: bool = True
){ .api }
Denoising autoencoder loss for self-supervised learning.
Parameters:
model: SentenceTransformer encoder
decoder_name_or_path: Decoder model path
tie_encoder_decoder: Whether to tie encoder and decoder weights

Use Case: Self-supervised pre-training, unsupervised learning
class MegaBatchMarginLoss(torch.nn.Module):
def __init__(
self,
model: SentenceTransformer,
positive_margin: float = 0.85,
negative_margin: float = 0.60,
use_mini_batched_version: bool = True,
mini_batch_size: int = 50
){ .api }
Margin-based loss designed for very large batch training.
Use Case: Large-scale contrastive learning with massive batches
class DistillKLDivLoss(torch.nn.Module):
def __init__(
self,
model: SentenceTransformer,
similarity_fct: callable = pairwise_dot_score,
temperature: float = 1.0
){ .api }
Knowledge distillation loss that minimizes the KL divergence between the student's similarity distribution over candidate passages and precomputed teacher similarity scores supplied as labels.
Use Case: Model distillation, compression
class AdaptiveLayerLoss(torch.nn.Module):
def __init__(
self,
model: SentenceTransformer,
loss: torch.nn.Module,
n_layers_per_step: int = 1
){ .api }
Adaptive loss that progressively uses more transformer layers during training.
Use Case: Progressive training, computational efficiency
class CachedMultipleNegativesRankingLoss(torch.nn.Module):
def __init__(
self,
model: SentenceTransformer,
scale: float = 20.0,
similarity_fct: callable = cos_sim,
mini_batch_size: int = 32
){ .api }
Memory-efficient cached version of MultipleNegativesRankingLoss for large datasets.
class CachedMultipleNegativesSymmetricRankingLoss(torch.nn.Module):
def __init__(
self,
model: SentenceTransformer,
scale: float = 20.0,
similarity_fct: callable = cos_sim,
mini_batch_size: int = 32
){ .api }
Cached symmetric version for memory efficiency.
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import MultipleNegativesRankingLoss
from datasets import Dataset
# Initialize model and loss
model = SentenceTransformer('distilbert-base-uncased')
loss = MultipleNegativesRankingLoss(model, scale=20.0)
# Prepare data (anchor-positive pairs)
train_data = [
{"anchor": "The cat sits on the mat", "positive": "A feline rests on a rug"},
{"anchor": "Python programming language", "positive": "Coding with Python"}
]
train_dataset = Dataset.from_list(train_data)
# Training with contrastive loss
from sentence_transformers import SentenceTransformerTrainer, SentenceTransformerTrainingArguments
args = SentenceTransformerTrainingArguments(
output_dir='./contrastive-training',
per_device_train_batch_size=64, # Larger batches work better
num_train_epochs=3
)
trainer = SentenceTransformerTrainer(
model=model,
args=args,
train_dataset=train_dataset,
loss=loss
)
trainer.train()

from sentence_transformers.losses import TripletLoss, TripletDistanceMetric
# Triplet loss with cosine distance
triplet_loss = TripletLoss(
model=model,
distance_metric=TripletDistanceMetric.COSINE,
triplet_margin=0.5
)
# Prepare triplet data
triplet_data = [
{
"anchor": "The cat sits on the mat",
"positive": "A feline rests on a rug",
"negative": "Dogs are great pets"
}
]
triplet_dataset = Dataset.from_list(triplet_data)
trainer = SentenceTransformerTrainer(
model=model,
args=args,
train_dataset=triplet_dataset,
loss=triplet_loss
)
trainer.train()

from sentence_transformers.losses import MatryoshkaLoss
# Base loss
base_loss = MultipleNegativesRankingLoss(model)
# Matryoshka loss with multiple dimensions
matryoshka_loss = MatryoshkaLoss(
model=model,
loss=base_loss,
matryoshka_dims=[768, 512, 256, 128, 64],
matryoshka_weights=[1, 1, 1, 1, 1] # Equal weights
)
trainer = SentenceTransformerTrainer(
model=model,
args=args,
train_dataset=train_dataset,
loss=matryoshka_loss
)
trainer.train()
# Test at different dimensions
embeddings_full = model.encode(["Test"], truncate_dim=None)
embeddings_256 = model.encode(["Test"], truncate_dim=256)
embeddings_64 = model.encode(["Test"], truncate_dim=64)

from sentence_transformers.losses import CosineSimilarityLoss
import torch.nn as nn
# Cosine similarity loss with different transformations
mse_loss = CosineSimilarityLoss(
model=model,
loss_fct=nn.MSELoss(),
cos_score_transformation=nn.Identity()
)
# For scores in [0, 1] range
sigmoid_loss = CosineSimilarityLoss(
model=model,
loss_fct=nn.MSELoss(),
cos_score_transformation=nn.Sigmoid()
)
# Prepare similarity data
similarity_data = [
{"sentence1": "The cat sits", "sentence2": "A cat is sitting", "label": 0.9},
{"sentence1": "Dogs bark", "sentence2": "Cars are fast", "label": 0.1}
]
similarity_dataset = Dataset.from_list(similarity_data)
trainer = SentenceTransformerTrainer(
model=model,
args=args,
train_dataset=similarity_dataset,
loss=mse_loss
)
trainer.train()

from sentence_transformers.losses import DistillKLDivLoss
# Teacher model (larger, pre-trained) scores (query, passage) pairs offline;
# those scores become the "label" column of the training dataset
teacher_model = SentenceTransformer('all-mpnet-base-v2')
# Student model (smaller)
student_model = SentenceTransformer('distilbert-base-uncased')
# Distillation loss (teacher scores arrive via the dataset labels)
distill_loss = DistillKLDivLoss(student_model)
trainer = SentenceTransformerTrainer(
model=student_model,
args=args,
train_dataset=train_dataset,
loss=distill_loss
)
trainer.train()

from sentence_transformers.losses import SoftmaxLoss
# Combine different losses for multi-task learning
contrastive_loss = MultipleNegativesRankingLoss(model)
classification_loss = SoftmaxLoss(
model=model,
sentence_embedding_dimension=768,
num_labels=3 # For NLI: entailment, contradiction, neutral
)
# Multi-dataset training
datasets = {
"similarity": similarity_dataset,
"classification": nli_dataset
}
losses = {
"similarity": contrastive_loss,
"classification": classification_loss
}
trainer = SentenceTransformerTrainer(
model=model,
args=args,
train_dataset=datasets,
loss=losses
)
trainer.train()

from sentence_transformers.losses import BatchHardTripletLoss, BatchHardTripletLossDistanceFunction
# Hard negative mining within batches
batch_hard_loss = BatchHardTripletLoss(
model=model,
distance_function=BatchHardTripletLossDistanceFunction.cosine_distance,
margin=0.2
)
# Use with datasets that have class labels
class_data = [
{"text": "Python programming", "label": 0},
{"text": "Coding in Python", "label": 0},
{"text": "Machine learning", "label": 1},
{"text": "AI algorithms", "label": 1}
]
class_dataset = Dataset.from_list(class_data)
trainer = SentenceTransformerTrainer(
model=model,
args=args,
train_dataset=class_dataset,
loss=batch_hard_loss
)
trainer.train()

Install with Tessl CLI
npx tessl i tessl/pypi-sentence-transformers