
tessl/pypi-sentence-transformers

Embeddings, Retrieval, and Reranking framework for computing dense, sparse, and cross-encoder embeddings using state-of-the-art transformer models

sentence-transformers

The sentence-transformers package provides state-of-the-art sentence, text, and image embeddings using transformer models. It supports training and fine-tuning custom embedding models, offering a comprehensive toolkit for semantic search, clustering, and similarity tasks.

Package Information

  • Version: 5.1.0
  • Organization: sentence-transformers
  • License: Apache 2.0
  • Homepage: https://www.sbert.net/
  • Repository: https://github.com/UKPLab/sentence-transformers

Core Imports

# Main transformer classes
from sentence_transformers import SentenceTransformer, CrossEncoder, SparseEncoder

# Training components (top-level imports)
from sentence_transformers import (
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
    CrossEncoderTrainer,
    CrossEncoderTrainingArguments,  
    SparseEncoderTrainer,
    SparseEncoderTrainingArguments
)

# Additional components (top-level imports from __all__)
from sentence_transformers import (
    LoggingHandler,
    SentencesDataset,
    ParallelSentencesDataset,
    InputExample,
    DefaultBatchSampler,
    MultiDatasetDefaultBatchSampler
)

# Utility functions (top-level imports)
from sentence_transformers import (
    SimilarityFunction,
    quantize_embeddings,
    export_optimized_onnx_model,
    export_dynamic_quantized_onnx_model,
    export_static_quantized_openvino_model
)
from sentence_transformers.util import mine_hard_negatives

# Loss functions (module-level imports)
from sentence_transformers.losses import (
    CosineSimilarityLoss,
    MultipleNegativesRankingLoss,
    TripletLoss,
    MatryoshkaLoss
)

# Model components (module-level imports)
from sentence_transformers.models import Transformer, Pooling, Dense, Normalize

# Evaluation (module-level imports)
from sentence_transformers.evaluation import (
    EmbeddingSimilarityEvaluator,
    InformationRetrievalEvaluator,
    BinaryClassificationEvaluator
)

Basic Usage

Encoding Sentences

from sentence_transformers import SentenceTransformer

# Load a pre-trained model
model = SentenceTransformer('all-MiniLM-L6-v2')

# Encode sentences
sentences = ['This is an example sentence', 'Each sentence is converted']
embeddings = model.encode(sentences)

# Calculate similarity
similarity = model.similarity(embeddings[0], embeddings[1])
print(f"Similarity: {similarity}")

Cross-Encoder for Reranking

from sentence_transformers import CrossEncoder

# Load cross-encoder model
cross_encoder = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')

# Score sentence pairs
pairs = [('How many people live in Berlin?', 'Berlin has a population of 3,520,031')]
scores = cross_encoder.predict(pairs)
print(f"Relevance score: {scores[0]}")

Architecture

The sentence-transformers package is built around several core concepts:

Transformer Types

  • Bi-Encoder (SentenceTransformer): Encodes sentences independently into dense vectors
  • Cross-Encoder: Jointly processes sentence pairs for classification/ranking tasks
  • Sparse Encoder: Produces sparse embeddings for efficient retrieval

Model Components

Models are composed of modular components that can be stacked:

  • Transformer: Core language model (BERT, RoBERTa, etc.)
  • Pooling: Strategy for converting token embeddings to sentence embeddings
  • Dense: Linear transformation layers
  • Normalize: L2 normalization of embeddings
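As a rough illustration of what the Pooling component does, here is a minimal mean-pooling sketch in plain NumPy. Shapes and names are illustrative only, not the package's internals:

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """token_embeddings: (seq_len, dim); attention_mask: (seq_len,) of 0/1."""
    mask = attention_mask[:, None]                    # (seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=0)    # sum over real tokens only
    count = mask.sum()                                # number of real tokens
    return summed / count

tokens = np.array([[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]])  # last row is padding
mask = np.array([1, 1, 0])
sentence_embedding = mean_pool(tokens, mask)
print(sentence_embedding)  # [2. 3.]
```

Masking matters: averaging over padding tokens would drag the sentence embedding toward zero for short sentences in a padded batch.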

Training Framework

Modern training uses the SentenceTransformerTrainer class with:

  • Multiple loss functions for different tasks
  • Multi-dataset training support
  • Integration with HuggingFace Trainer
  • Flexible batch sampling strategies

Capabilities

Core Transformers

Encode text, documents, and queries into dense vector representations using pre-trained or custom models. Supports batch processing, multi-GPU inference, and various similarity functions.

Key APIs: SentenceTransformer.encode() { .api }

Learn more about Core Transformers →

Cross-Encoders

Joint encoding of sentence pairs for tasks requiring direct comparison like reranking, textual entailment, and semantic textual similarity. Typically more accurate than bi-encoders for pairwise tasks.

Key APIs: CrossEncoder.predict() { .api }

Learn more about Cross-Encoders →

Sparse Encoders

Generate sparse embeddings that combine the efficiency of traditional sparse retrieval with neural approaches. Ideal for large-scale retrieval systems where storage and computation efficiency are critical.

Key APIs: SparseEncoder.encode() { .api }

Learn more about Sparse Encoders →
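To see why sparse embeddings are efficient at scale, consider this toy scoring sketch. Plain dicts stand in for the encoder's sparse output; only dimensions that are non-zero in both vectors contribute to the score:

```python
# Toy sparse vectors: term -> weight (real SparseEncoder output is a sparse tensor)
query = {"berlin": 1.2, "population": 0.8}
doc = {"berlin": 0.9, "city": 0.4, "population": 0.5}

def sparse_dot(a: dict, b: dict) -> float:
    # Iterate over the smaller vector and look up matches in the larger one
    small, large = (a, b) if len(a) <= len(b) else (b, a)
    return sum(weight * large.get(term, 0.0) for term, weight in small.items())

score = sparse_dot(query, doc)
print(f"{score:.2f}")  # 1.2*0.9 + 0.8*0.5 = 1.48
```

Because most dimensions are zero, scoring cost depends on the number of active terms rather than the full vocabulary size, which is what makes inverted-index-style retrieval fast.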

Training Framework

Comprehensive training system supporting supervised fine-tuning, contrastive learning, and multi-task training. Built on HuggingFace Trainer with specialized components for embedding models.

Key APIs: SentenceTransformerTrainer.train() { .api }

Learn more about Training →

Loss Functions

Extensive collection of loss functions for different learning objectives including contrastive learning, triplet loss, multiple negatives ranking, and specialized losses for efficient training.

Key APIs: MultipleNegativesRankingLoss() { .api }

Learn more about Loss Functions →
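The core idea behind MultipleNegativesRankingLoss can be sketched in plain NumPy: within a batch of (anchor, positive) pairs, every other positive acts as an in-batch negative, and the loss is cross-entropy over the anchor-vs-positives similarity matrix. This is a conceptual sketch, not the library's implementation:

```python
import numpy as np

def in_batch_negatives_loss(anchors: np.ndarray, positives: np.ndarray,
                            scale: float = 20.0) -> float:
    # L2-normalize so the dot products below are cosine similarities
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = scale * a @ p.T                      # (batch, batch) similarity matrix
    # Cross-entropy with the diagonal (each anchor's true positive) as the target
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))

anchors = np.array([[1.0, 0.0], [0.0, 1.0]])
positives = np.array([[0.9, 0.1], [0.1, 0.9]])
print(in_batch_negatives_loss(anchors, positives))  # near zero: pairs already aligned
```

This is why the loss benefits from large batches: more in-batch negatives make the ranking problem harder and the training signal stronger.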

Evaluation Suite

Comprehensive evaluation framework for measuring model performance across various tasks including semantic similarity, information retrieval, classification, and clustering.

Key APIs: EmbeddingSimilarityEvaluator() { .api }

Learn more about Evaluation →
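For intuition, EmbeddingSimilarityEvaluator's headline metric is the Spearman rank correlation between model similarity scores and human gold scores. A tiny hand-rolled version (assuming no tied scores) looks like this:

```python
import numpy as np

def spearman(x, y) -> float:
    # Rank each value, then compute Pearson correlation on the ranks
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    rx -= rx.mean()
    ry -= ry.mean()
    return float((rx @ ry) / np.sqrt((rx @ rx) * (ry @ ry)))

model_scores = [0.91, 0.35, 0.70, 0.10]   # cosine similarities from a model
gold_scores = [5.0, 2.0, 4.0, 1.0]        # human similarity judgements
print(spearman(model_scores, gold_scores))  # 1.0 (identical rankings)
```

Rank correlation is used because only the ordering of similarities matters; absolute cosine values are not comparable across models.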

Utilities & Export

Tools for model optimization, quantization, export to different formats (ONNX, OpenVINO), similarity computation, and hard negative mining for improved training.

Key APIs: quantize_embeddings() { .api }

Learn more about Utilities →
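The idea behind embedding quantization can be sketched in NumPy: map each float dimension onto 8-bit integers plus a per-dimension offset and scale, trading a little precision for a large storage reduction. The real quantize_embeddings calibration differs; this is only illustrative:

```python
import numpy as np

def quantize_uint8(embeddings: np.ndarray):
    # Per-dimension min/max define the quantization range
    lo = embeddings.min(axis=0)
    scale = (embeddings.max(axis=0) - lo) / 255.0
    quantized = np.round((embeddings - lo) / scale).astype(np.uint8)
    return quantized, lo, scale

def dequantize(quantized: np.ndarray, lo: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return quantized.astype(np.float64) * scale + lo

emb = np.array([[0.0, 1.0], [1.0, -1.0], [0.5, 0.0]])   # toy float embeddings
quantized, lo, scale = quantize_uint8(emb)
print(quantized.nbytes, "bytes vs", emb.nbytes, "bytes")  # 6 bytes vs 48 bytes
```

For large corpora the 8x reduction (float64 to uint8 here; 4x from float32) often costs only a small drop in retrieval quality, which is why quantization is a standard deployment step.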

Model Hub Integration

The package integrates seamlessly with the HuggingFace Model Hub, allowing you to:

  • Load thousands of pre-trained models
  • Save and share custom models
  • Generate model cards automatically
  • Collaborate using versioned model repositories

Performance Considerations

  • Batch inputs with the batch_size parameter for efficiency
  • Set show_progress_bar=False in production
  • Consider model quantization for deployment
  • Use multi-process encoding for large datasets
  • Choose a pooling strategy appropriate to your task
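The batching advice above amounts to the following pattern, which model.encode(batch_size=...) applies internally. A stand-in encoder keeps the sketch self-contained; in practice you would pass batches to the model instead:

```python
def encode_in_batches(sentences, encode_fn, batch_size=32):
    # Process fixed-size chunks instead of one sentence at a time
    embeddings = []
    for start in range(0, len(sentences), batch_size):
        batch = sentences[start:start + batch_size]
        embeddings.extend(encode_fn(batch))
    return embeddings

fake_encode = lambda batch: [[float(len(s))] for s in batch]  # stand-in for model.encode
result = encode_in_batches(["a", "bb", "ccc"], fake_encode, batch_size=2)
print(result)  # [[1.0], [2.0], [3.0]]
```

Batching amortizes tokenization and forward-pass overhead and keeps the GPU saturated; batch size is usually tuned to the largest value that fits in memory.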