tessl/pypi-transformers

State-of-the-art Machine Learning for JAX, PyTorch and TensorFlow


Pipelines

High-level, task-oriented interface for common machine learning operations. Pipelines abstract away model selection, preprocessing, and postprocessing, providing immediate access to state-of-the-art capabilities across text, vision, audio, and multimodal domains.

Capabilities

Pipeline Factory

Central factory function that creates task-specific pipeline instances with automatic model and tokenizer selection.

def pipeline(
    task: str = None,
    model: str = None,
    config: str = None,
    tokenizer: str = None,
    feature_extractor: str = None,
    image_processor: str = None,
    processor: str = None,
    framework: str = None,
    revision: str = None,
    use_fast: bool = True,
    token: Union[str, bool] = None,
    device: Union[int, str, torch.device] = None,
    device_map: Union[str, Dict] = None,
    dtype: Union[str, torch.dtype] = "auto",
    trust_remote_code: bool = False,
    model_kwargs: Dict[str, Any] = None,
    pipeline_class = None,
    **kwargs
) -> Pipeline:
    """
    Create a pipeline for a specific ML task.
    
    Args:
        task: The task name (e.g., "text-classification", "question-answering")
        model: Model name or path (defaults to task-specific default)
        config: Configuration name or path (auto-detected if None)
        tokenizer: Tokenizer name or path (defaults to model's tokenizer)
        feature_extractor: Feature extractor for audio/vision tasks
        image_processor: Image processor for vision tasks
        processor: Processor combining tokenizer and feature extraction
        framework: "pt" (PyTorch), "tf" (TensorFlow), or auto-detect
        revision: Model revision/branch to use
        use_fast: Use fast tokenizer implementation when available
        token: Hugging Face authentication token
        device: Device for inference (int, str, or torch.device)
        device_map: Advanced device mapping for multi-GPU
        dtype: Data type for model weights ("auto", torch.float16, etc.)
        trust_remote_code: Allow custom code from model repos
        model_kwargs: Additional arguments for model initialization
        pipeline_class: Custom pipeline class to use
    
    Returns:
        Task-specific pipeline instance
    """
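Usage example (the checkpoint name below is the common SST-2 sentiment example; any compatible model id or local path works):

```python
from transformers import pipeline

# Pin the model explicitly instead of relying on the task default.
classifier = pipeline(
    "text-classification",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)
result = classifier("Pipelines make inference easy.")
print(result)  # e.g. [{'label': 'POSITIVE', 'score': ...}]
```

Omitting `model` falls back to a task-specific default checkpoint; pinning it makes results reproducible across library upgrades.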

Text Classification

Classify text into predefined categories with confidence scores and label mapping.

class TextClassificationPipeline(Pipeline):
    def __call__(
        self,
        inputs: Union[str, List[str]],
        top_k: int = None,
        function_to_apply: str = "default"
    ) -> Union[Dict, List[Dict]]:
        """
        Classify input text(s).
        
        Args:
            inputs: Text string or list of strings to classify
            top_k: Return top-k predictions (None for all)
            function_to_apply: "softmax", "sigmoid", or "none"
        
        Returns:
            Dictionary with 'label' and 'score' keys, or list of such dicts
        """

Usage example:

classifier = pipeline("text-classification")
result = classifier("I love this movie!")
# Output: {'label': 'POSITIVE', 'score': 0.9998}

# Multi-class with top-k
classifier = pipeline("text-classification", model="cardiffnlp/twitter-roberta-base-emotion")
results = classifier("I'm so excited about this!", top_k=3)
# Output: [{'label': 'joy', 'score': 0.8}, {'label': 'optimism', 'score': 0.15}, ...]

Token Classification

Identify and classify individual tokens in text for named entity recognition, part-of-speech tagging, and other token-level tasks.

class TokenClassificationPipeline(Pipeline):
    def __call__(
        self,
        inputs: Union[str, List[str]],
        aggregation_strategy: str = "none",
        ignore_labels: List[str] = None
    ) -> Union[List[Dict], List[List[Dict]]]:
        """
        Classify tokens in input text(s).
        
        Args:
            inputs: Text string or list of strings
            aggregation_strategy: "simple", "first", "average", "max", or "none"
            ignore_labels: Labels to filter out from results
        
        Returns:
            List of entity dictionaries with 'entity' ('entity_group' when an aggregation strategy is used), 'score', 'word', 'start', 'end' (plus 'index' when aggregation_strategy="none")
        """

Usage example:

ner = pipeline("ner", aggregation_strategy="simple")
entities = ner("Apple Inc. was founded by Steve Jobs in Cupertino.")
# Output (aggregated results use the 'entity_group' key): [
#     {'entity_group': 'ORG', 'score': 0.999, 'word': 'Apple Inc.', 'start': 0, 'end': 10},
#     {'entity_group': 'PER', 'score': 0.998, 'word': 'Steve Jobs', 'start': 26, 'end': 36},
#     {'entity_group': 'LOC', 'score': 0.992, 'word': 'Cupertino', 'start': 40, 'end': 49}
# ]

Question Answering

Extract answers from context text given a question, with confidence scores and answer span positions.

class QuestionAnsweringPipeline(Pipeline):
    def __call__(
        self,
        question: str,
        context: str,
        top_k: int = 1,
        doc_stride: int = 128,
        max_answer_len: int = 15,
        max_seq_len: int = 384,
        max_question_len: int = 64,
        handle_impossible_answer: bool = False
    ) -> Union[Dict, List[Dict]]:
        """
        Extract answers from context given a question.
        
        Args:
            question: Question to answer
            context: Context text containing the answer
            top_k: Number of answers to return
            doc_stride: Overlap between context chunks
            max_answer_len: Maximum answer length in tokens
            max_seq_len: Maximum sequence length
            max_question_len: Maximum question length
            handle_impossible_answer: Allow "impossible to answer" responses
        
        Returns:
            Dictionary with 'answer', 'score', 'start', 'end' keys
        """

Usage example:

qa = pipeline("question-answering")
result = qa(
    question="Where was Apple founded?",
    context="Apple Inc. was founded by Steve Jobs in Cupertino, California."
)
# Output: {'answer': 'Cupertino, California', 'score': 0.95, 'start': 40, 'end': 61}

Text Generation

Generate text continuations using autoregressive language models with extensive control over generation parameters.

class TextGenerationPipeline(Pipeline):
    def __call__(
        self,
        text_inputs: Union[str, List[str]],
        return_full_text: bool = True,
        clean_up_tokenization_spaces: bool = False,
        **generate_kwargs
    ) -> Union[List[Dict], List[List[Dict]]]:
        """
        Generate text continuations.
        
        Args:
            text_inputs: Input text(s) to continue
            return_full_text: Include input in output
            clean_up_tokenization_spaces: Clean tokenization artifacts
            **generate_kwargs: Additional generation parameters (max_length, temperature, etc.)
        
        Returns:
            List of dictionaries with 'generated_text' key
        """

Usage example:

generator = pipeline("text-generation", model="gpt2")
outputs = generator(
    "The future of artificial intelligence is",
    max_length=50,
    num_return_sequences=2,
    temperature=0.8
)
# Output: [
#     {'generated_text': 'The future of artificial intelligence is bright and full of possibilities...'},
#     {'generated_text': 'The future of artificial intelligence is uncertain but promising...'}
# ]

Text Summarization

Generate concise summaries of longer texts using sequence-to-sequence models.

class SummarizationPipeline(Pipeline):
    def __call__(
        self,
        documents: Union[str, List[str]],
        return_text: bool = True,
        return_tensors: bool = False,
        clean_up_tokenization_spaces: bool = False,
        **generate_kwargs
    ) -> Union[List[Dict], List[List[Dict]]]:
        """
        Summarize input documents.
        
        Args:
            documents: Text(s) to summarize
            return_text: Return text summaries
            return_tensors: Return tensor outputs
            clean_up_tokenization_spaces: Clean tokenization artifacts
            **generate_kwargs: Generation parameters (max_length, min_length, etc.)
        
        Returns:
            List of dictionaries with 'summary_text' key
        """
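Usage example (the default checkpoint is selected automatically; the exact summary wording will vary by model and version):

```python
from transformers import pipeline

summarizer = pipeline("summarization")
article = (
    "The tower is 324 metres tall, about the same height as an 81-storey "
    "building, and was the tallest man-made structure in the world for "
    "41 years until the Chrysler Building in New York was finished in 1930."
)
# max_length / min_length are measured in tokens, not characters.
summary = summarizer(article, max_length=40, min_length=10)
print(summary[0]["summary_text"])
```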

Translation

Translate text between languages using sequence-to-sequence models.

class TranslationPipeline(Pipeline):
    def __call__(
        self,
        text: Union[str, List[str]],
        return_text: bool = True,
        clean_up_tokenization_spaces: bool = False,
        **generate_kwargs
    ) -> Union[List[Dict], List[List[Dict]]]:
        """
        Translate input text.
        
        Args:
            text: Text(s) to translate
            return_text: Return translated text
            clean_up_tokenization_spaces: Clean tokenization artifacts
            **generate_kwargs: Generation parameters
        
        Returns:
            List of dictionaries with 'translation_text' key
        """
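Usage example (the language pair is encoded in the task name; the translated wording depends on the default checkpoint):

```python
from transformers import pipeline

# "translation_en_to_fr" selects a default English-to-French checkpoint.
translator = pipeline("translation_en_to_fr")
result = translator("Machine learning is fascinating.")
print(result[0]["translation_text"])
```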

Image Classification

Classify images into predefined categories with confidence scores.

class ImageClassificationPipeline(Pipeline):
    def __call__(
        self,
        images: Union[str, "PIL.Image", List],
        top_k: int = 5
    ) -> Union[List[Dict], List[List[Dict]]]:
        """
        Classify input image(s).
        
        Args:
            images: Image path, PIL Image, or list of images
            top_k: Number of top predictions to return
        
        Returns:
            List of dictionaries with 'label' and 'score' keys
        """
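Usage example (the COCO image URL is a common test fixture from the Hugging Face documentation; predicted labels depend on the default checkpoint):

```python
from transformers import pipeline

clf = pipeline("image-classification")
# Pipelines accept local paths, URLs, or PIL images.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
preds = clf(url, top_k=3)
for p in preds:
    print(f"{p['label']}: {p['score']:.3f}")
```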

Object Detection

Detect and locate objects in images with bounding boxes and confidence scores.

class ObjectDetectionPipeline(Pipeline):
    def __call__(
        self,
        images: Union[str, "PIL.Image", List],
        threshold: float = 0.9
    ) -> Union[List[Dict], List[List[Dict]]]:
        """
        Detect objects in image(s).
        
        Args:
            images: Image path, PIL Image, or list of images
            threshold: Confidence threshold for detections
        
        Returns:
            List of dictionaries with 'score', 'label', 'box' keys
        """
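Usage example (same COCO test image as above; the detected objects depend on the default checkpoint):

```python
from transformers import pipeline

detector = pipeline("object-detection")
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
detections = detector(url, threshold=0.9)
for d in detections:
    box = d["box"]  # dict with 'xmin', 'ymin', 'xmax', 'ymax' pixel coordinates
    print(d["label"], round(d["score"], 3), box)
```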

Automatic Speech Recognition

Convert speech audio to text with support for various audio formats and languages.

class AutomaticSpeechRecognitionPipeline(Pipeline):
    def __call__(
        self,
        inputs: Union[np.ndarray, bytes, str],
        return_timestamps: Union[bool, str] = False,
        generate_kwargs: Dict = None
    ) -> Union[Dict, List[Dict]]:
        """
        Transcribe speech to text.
        
        Args:
            inputs: Audio array, bytes, or file path
            return_timestamps: Include timestamps (True for segment/chunk level, "word" for word level, model permitting)
            generate_kwargs: Additional generation parameters
        
        Returns:
            Dictionary with 'text' key and optional timestamps
        """
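Usage example (the URL points to a small public sample clip used in Hugging Face documentation examples; decoding compressed audio requires ffmpeg to be installed):

```python
from transformers import pipeline

asr = pipeline("automatic-speech-recognition")
# Inputs may be a file path or URL, raw bytes, or a numpy array of samples.
result = asr("https://huggingface.co/datasets/Narsil/asr_dummy/resolve/main/1.flac")
print(result["text"])
```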

Zero-Shot Classification

Classify text into arbitrary categories without task-specific training.

class ZeroShotClassificationPipeline(Pipeline):
    def __call__(
        self,
        sequences: Union[str, List[str]],
        candidate_labels: List[str],
        hypothesis_template: str = "This example is {}.",
        multi_label: bool = False
    ) -> Union[Dict, List[Dict]]:
        """
        Classify text into arbitrary categories.
        
        Args:
            sequences: Text(s) to classify
            candidate_labels: Possible classification labels
            hypothesis_template: Template for label hypotheses
            multi_label: Allow multiple labels per input
        
        Returns:
            Dictionary with 'sequence', 'labels', 'scores' keys
        """

Usage example:

classifier = pipeline("zero-shot-classification")
result = classifier(
    "This is a movie review about a great film.",
    candidate_labels=["movie", "sports", "technology", "politics"]
)
# Output: {
#     'sequence': 'This is a movie review about a great film.',
#     'labels': ['movie', 'technology', 'politics', 'sports'],
#     'scores': [0.85, 0.08, 0.04, 0.03]
# }

Fill Mask

Predict masked tokens in text using masked language models.

class FillMaskPipeline(Pipeline):
    def __call__(
        self,
        inputs: Union[str, List[str]],
        top_k: int = 5
    ) -> Union[List[Dict], List[List[Dict]]]:
        """
        Fill masked tokens in text.
        
        Args:
            inputs: Text containing the model's mask token ([MASK] for BERT-style models, <mask> for RoBERTa-style), or a list of such texts
            top_k: Number of predictions per mask
        
        Returns:
            List of dictionaries with 'score', 'token', 'token_str', 'sequence' keys
        """
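Usage example. The mask token varies by model, so it is safest to read it from the pipeline's tokenizer rather than hard-coding it:

```python
from transformers import pipeline

unmasker = pipeline("fill-mask")
# e.g. "[MASK]" for BERT, "<mask>" for RoBERTa-family models
mask = unmasker.tokenizer.mask_token
results = unmasker(f"The capital of France is {mask}.", top_k=3)
for r in results:
    print(r["token_str"], round(r["score"], 3))
```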

Image Text To Text

Generate text descriptions from images with optional text prompts, supporting multimodal understanding tasks.

class ImageTextToTextPipeline(Pipeline):
    def __call__(
        self,
        images,
        prompt: str = None,
        **kwargs
    ) -> Union[str, List[str]]:
        """
        Generate text from images with optional prompts.
        
        Args:
            images: Single image or list of images (PIL, numpy array, or paths)
            prompt: Optional text prompt to guide generation
            
        Returns:
            Generated text string or list of strings
        """

Video Classification

Classify video content into predefined categories with temporal understanding.

class VideoClassificationPipeline(Pipeline):
    def __call__(
        self,
        videos,
        top_k: int = 5
    ) -> Union[List[Dict], List[List[Dict]]]:
        """
        Classify video content.
        
        Args:
            videos: Video file path(s) or video tensor(s)
            top_k: Number of top predictions to return
            
        Returns:
            List of classification results with 'label' and 'score'
        """

Depth Estimation

Estimate depth information from single images for 3D scene understanding.

class DepthEstimationPipeline(Pipeline):
    def __call__(
        self,
        images
    ) -> Union[Dict, List[Dict]]:
        """
        Estimate depth from images.
        
        Args:
            images: Single image or list of images
            
        Returns:
            Dictionary with 'predicted_depth' and 'depth' keys
        """
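Usage example (same COCO test image as in the vision examples; note the default depth model is large and slow to download):

```python
from transformers import pipeline

depth_estimator = pipeline("depth-estimation")
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
out = depth_estimator(url)
# 'predicted_depth' is the raw model output tensor;
# 'depth' is a PIL image rescaled for visualization.
print(out["predicted_depth"].shape)
out["depth"].save("depth_map.png")
```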

Conversational

Engage in multi-turn conversations with context-aware response generation.

class ConversationalPipeline(Pipeline):
    def __call__(
        self,
        conversations,
        clean_up_tokenization_spaces: bool = False,
        **generate_kwargs
    ) -> Union[Conversation, List[Conversation]]:
        """
        Generate conversational responses.
        
        Args:
            conversations: Conversation object(s) with history
            clean_up_tokenization_spaces: Remove extra spaces in output
            **generate_kwargs: Additional generation parameters
            
        Returns:
            Updated Conversation object(s) with new responses
        """

Pipeline Base Class

All pipelines inherit from the base Pipeline class:

class Pipeline:
    def __init__(
        self,
        model: PreTrainedModel,
        tokenizer: PreTrainedTokenizer = None,
        feature_extractor = None,
        modelcard: ModelCard = None,
        framework: str = None,
        task: str = "",
        args_parser = None,
        device: int = -1,
        torch_dtype = None,
        binary_output: bool = False
    )
    
    def save_pretrained(
        self,
        save_directory: str,
        safe_serialization: bool = True,
        **kwargs
    ) -> None:
        """Save pipeline components to directory."""
    
    def __call__(self, inputs, **kwargs):
        """Process inputs through the pipeline."""
    
    def predict(self, inputs, **kwargs):
        """Alias for __call__."""
    
    def transform(self, inputs, **kwargs):
        """Alias for __call__."""
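A save-and-reload round trip can be sketched as follows; `save_pretrained` writes every component (model weights, tokenizer, config) to one directory, which can then be passed back as the `model` argument:

```python
import tempfile

from transformers import pipeline

pipe = pipeline("text-classification")
with tempfile.TemporaryDirectory() as save_dir:
    pipe.save_pretrained(save_dir)  # writes model, tokenizer, and config files
    # Reload the whole pipeline from the saved directory.
    reloaded = pipeline("text-classification", model=save_dir)
    print(reloaded("Saved and reloaded."))
```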

Available Pipeline Tasks

Complete list of supported pipeline tasks:

  • Text: "text-classification", "token-classification", "question-answering", "fill-mask", "summarization", "translation", "text2text-generation", "text-generation", "zero-shot-classification", "conversational"
  • Vision: "image-classification", "image-segmentation", "image-to-text", "image-to-image", "object-detection", "depth-estimation", "zero-shot-image-classification", "zero-shot-object-detection", "keypoint-matching", "mask-generation"
  • Audio: "automatic-speech-recognition", "audio-classification", "text-to-audio", "zero-shot-audio-classification"
  • Video: "video-classification"
  • Multimodal: "visual-question-answering", "document-question-answering", "image-text-to-text", "feature-extraction"

Each task automatically selects appropriate default models when no specific model is provided.
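Because the set of registered tasks varies between releases, the authoritative list for the installed version can be queried programmatically (a minimal sketch using the library's `get_supported_tasks` helper):

```python
from transformers.pipelines import get_supported_tasks

# Returns the sorted task names registered in the installed version.
tasks = get_supported_tasks()
print(len(tasks), "tasks, e.g.", tasks[:3])
```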
