tessl/pypi-pytorch-transformers

Repository of pre-trained NLP Transformer models: BERT & RoBERTa, GPT & GPT-2, Transformer-XL, XLNet and XLM

  • Workspace: tessl
  • Visibility: Public
  • Describes: pkg:pypi/pytorch-transformers@1.2.x (PyPI)

To install, run

npx @tessl/cli install tessl/pypi-pytorch-transformers@1.2.0


PyTorch Transformers

A comprehensive Python library providing state-of-the-art pre-trained transformer models for Natural Language Processing (NLP) tasks. PyTorch Transformers includes PyTorch implementations, pre-trained model weights, usage scripts and conversion utilities for major transformer architectures including BERT, GPT/GPT-2, Transformer-XL, XLNet, XLM, RoBERTa, and DistilBERT.

Package Information

  • Package Name: pytorch-transformers
  • Package Type: Library
  • Language: Python
  • Installation: pip install pytorch-transformers

Core Imports

import pytorch_transformers

Common patterns for working with models and tokenizers:

from pytorch_transformers import AutoModel, AutoTokenizer
from pytorch_transformers import BertModel, BertTokenizer
from pytorch_transformers import GPT2Model, GPT2Tokenizer

Basic Usage

import torch
from pytorch_transformers import AutoModel, AutoTokenizer

# Load a pre-trained model and tokenizer
model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
model.eval()

# Tokenize input text (tokenizers in this library are not callable;
# use encode() to turn text into token ids)
text = "Hello, how are you?"
input_ids = torch.tensor([tokenizer.encode(text)])

# Get model outputs (models return plain tuples; the first element
# is the sequence of hidden states)
with torch.no_grad():
    outputs = model(input_ids)
last_hidden_states = outputs[0]

# For specific tasks like sequence classification
from pytorch_transformers import AutoModelForSequenceClassification
classifier = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")

Architecture

The library follows a consistent design pattern across all transformer architectures:

  • Auto Classes: Factory classes that automatically select the appropriate model/tokenizer based on model name
  • Base Classes: Abstract base classes (PreTrainedModel, PreTrainedTokenizer, PretrainedConfig) providing common interfaces
  • Model-Specific Classes: Dedicated implementations for each transformer architecture with specialized task-specific variants
  • Configuration Classes: Parameter containers for model initialization and customization
  • Tokenizers: Architecture-specific text preprocessing with consistent encode/decode interfaces

This unified design enables seamless switching between different transformer architectures while maintaining consistent APIs for various NLP tasks including language modeling, sequence classification, question answering, and token classification.
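
To make this concrete, here is a small illustrative sketch (not taken from the package documentation) that runs the same load/encode/forward steps against two architectures. It assumes the standard "bert-base-uncased" and "gpt2" checkpoint names and relies on the library's convention of returning plain tuples from the forward pass.

import torch
from pytorch_transformers import (BertModel, BertTokenizer,
                                  GPT2Model, GPT2Tokenizer)

# The same loading/encoding/forward pattern works for every architecture.
for model_class, tokenizer_class, checkpoint in [
    (BertModel, BertTokenizer, "bert-base-uncased"),
    (GPT2Model, GPT2Tokenizer, "gpt2"),
]:
    tokenizer = tokenizer_class.from_pretrained(checkpoint)
    model = model_class.from_pretrained(checkpoint)
    model.eval()

    input_ids = torch.tensor([tokenizer.encode("Hello, how are you?")])
    with torch.no_grad():
        outputs = model(input_ids)   # models return tuples
    hidden_states = outputs[0]       # (batch, seq_len, hidden_size)
    print(checkpoint, hidden_states.shape)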

Capabilities

Auto Classes

Factory classes that automatically select and instantiate the appropriate model, tokenizer, or configuration based on model name patterns. These provide the most convenient way to work with pre-trained models without needing to know the specific architecture.

class AutoTokenizer:
    @classmethod
    def from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs): ...

class AutoModel:
    @classmethod
    def from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs): ...

class AutoConfig:
    @classmethod
    def from_pretrained(cls, pretrained_model_name_or_path, **kwargs): ...

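A minimal usage sketch: the checkpoint name alone determines which concrete classes the auto factories instantiate. It assumes the standard "roberta-base" shortcut name and that RoBERTa is wired into the auto classes at this version.

from pytorch_transformers import AutoConfig, AutoTokenizer, AutoModel

# "roberta-base" should resolve to RobertaConfig / RobertaTokenizer / RobertaModel
model_name = "roberta-base"

config = AutoConfig.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

print(type(config).__name__, type(tokenizer).__name__, type(model).__name__)
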
See docs/auto-classes.md for the full auto classes reference.

Base Classes

Core abstract base classes that define the common interface shared by all models, tokenizers, and configurations. These classes provide essential methods like from_pretrained() and save_pretrained() that enable consistent model and tokenizer loading/saving across all architectures.

class PreTrainedModel:
    @classmethod
    def from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs): ...
    
    def save_pretrained(self, save_directory): ...
    def resize_token_embeddings(self, new_num_tokens): ...

class PreTrainedTokenizer:
    @classmethod
    def from_pretrained(cls, pretrained_model_name_or_path, **kwargs): ...
    
    def save_pretrained(self, save_directory): ...
    def tokenize(self, text): ...
    def encode(self, text): ...
    def decode(self, token_ids): ...

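A brief illustrative sketch of the shared interface: tokenize/encode/decode on the tokenizer, plus a save_pretrained()/from_pretrained() round-trip to a local directory (the "./my-bert" path is an arbitrary example).

import os
from pytorch_transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

# tokenize -> string pieces, encode -> token ids, decode -> text again
tokens = tokenizer.tokenize("Hello, how are you?")
ids = tokenizer.encode("Hello, how are you?")
text = tokenizer.decode(ids)

# Round-trip through a local directory (the directory must already exist)
save_dir = "./my-bert"
os.makedirs(save_dir, exist_ok=True)
tokenizer.save_pretrained(save_dir)
model.save_pretrained(save_dir)

tokenizer = BertTokenizer.from_pretrained(save_dir)
model = BertModel.from_pretrained(save_dir)
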
See docs/base-classes.md for the full base class reference.

BERT Models

BERT (Bidirectional Encoder Representations from Transformers) models for various NLP tasks including masked language modeling, next sentence prediction, sequence classification, token classification, and question answering.

class BertModel:
    @classmethod
    def from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs): ...

class BertForSequenceClassification:
    @classmethod  
    def from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs): ...

class BertTokenizer:
    @classmethod
    def from_pretrained(cls, pretrained_model_name_or_path, **kwargs): ...

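As a hedged sketch of a task-specific head, the example below sets up a two-way classifier. The add_special_tokens flag on encode() and the (loss, logits, ...) tuple ordering are assumptions based on this version's conventions.

import torch
from pytorch_transformers import BertConfig, BertTokenizer, BertForSequenceClassification

# Configure a two-way classification head via the config object
config = BertConfig.from_pretrained("bert-base-uncased")
config.num_labels = 2

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", config=config)

# add_special_tokens adds the [CLS]/[SEP] markers BERT expects
input_ids = torch.tensor([tokenizer.encode("This library is great", add_special_tokens=True)])
labels = torch.tensor([1])

# With labels the model returns (loss, logits, ...); without labels, (logits, ...)
loss, logits = model(input_ids, labels=labels)[:2]
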
See docs/bert-models.md for the full BERT reference.

GPT-2 Models

GPT-2 (Generative Pre-trained Transformer 2) models for language generation tasks, including standard language modeling and multi-task models with both language modeling and classification heads.

class GPT2Model:
    @classmethod
    def from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs): ...

class GPT2LMHeadModel:
    @classmethod
    def from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs): ...

class GPT2Tokenizer:
    @classmethod
    def from_pretrained(cls, pretrained_model_name_or_path, **kwargs): ...

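This version predates a built-in generation helper, so the sketch below performs greedy decoding by hand: run GPT2LMHeadModel, take the argmax of the final position's logits, append it to the context, and repeat.

import torch
from pytorch_transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

input_ids = torch.tensor([tokenizer.encode("The quick brown fox")])

# Greedy decoding: append the most likely next token 20 times
with torch.no_grad():
    for _ in range(20):
        logits = model(input_ids)[0]                      # (1, seq_len, vocab_size)
        next_id = logits[0, -1].argmax().view(1, 1)
        input_ids = torch.cat([input_ids, next_id], dim=1)

print(tokenizer.decode(input_ids[0].tolist()))
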
See docs/gpt2-models.md for the full GPT-2 reference.

Other Transformer Models

Additional transformer architectures including OpenAI GPT, Transformer-XL, XLNet, XLM, RoBERTa, and DistilBERT, each with their specific model variants and tokenizers optimized for different NLP tasks and languages.

class XLNetModel:
    @classmethod
    def from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs): ...

class RobertaModel:
    @classmethod
    def from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs): ...

class DistilBertModel:
    @classmethod
    def from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs): ...

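The same pattern carries over unchanged; as a rough sketch, feature extraction with RoBERTa (assuming the standard "roberta-base" checkpoint) looks like this:

import torch
from pytorch_transformers import RobertaTokenizer, RobertaModel

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
model = RobertaModel.from_pretrained("roberta-base")
model.eval()

input_ids = torch.tensor([tokenizer.encode("Hello, how are you?")])
with torch.no_grad():
    features = model(input_ids)[0]   # (batch, seq_len, hidden_size)
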
See docs/other-models.md for the full reference on these architectures.

Optimization

Specialized optimizers and learning rate schedulers designed for transformer training, including AdamW optimizer with weight decay fix and various warmup schedules commonly used in transformer fine-tuning.

class AdamW:
    def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), eps=1e-6, weight_decay=0.0, correct_bias=True): ...

class WarmupLinearSchedule:
    def __init__(self, optimizer, warmup_steps, t_total, last_epoch=-1): ...

class WarmupCosineSchedule:
    def __init__(self, optimizer, warmup_steps, t_total, cycles=0.5, last_epoch=-1): ...

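A hedged sketch of a typical fine-tuning step wiring these together: AdamW over the model parameters and a WarmupLinearSchedule stepped once per optimization step. The dummy batch, step counts, and hyperparameter values are illustrative only.

import torch
from pytorch_transformers import (BertForSequenceClassification,
                                  AdamW, WarmupLinearSchedule)

model = BertForSequenceClassification.from_pretrained("bert-base-uncased")

num_train_steps = 1000   # illustrative value
# correct_bias=False mimics the original BERT/TF optimizer behaviour
optimizer = AdamW(model.parameters(), lr=2e-5, correct_bias=False)
scheduler = WarmupLinearSchedule(optimizer, warmup_steps=100, t_total=num_train_steps)

# One training step with dummy data
input_ids = torch.randint(0, 30522, (2, 16))   # (batch, seq_len); 30522 = BERT vocab size
labels = torch.tensor([0, 1])

loss = model(input_ids, labels=labels)[0]
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
optimizer.step()
scheduler.step()        # advance the warmup/decay schedule
optimizer.zero_grad()
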
See docs/optimization.md for the full optimization reference.

File Utilities

File handling utilities for downloading, caching, and managing pre-trained model files. These utilities handle automatic download of model weights and configurations from remote repositories with local caching support.

def cached_path(url_or_filename, cache_dir=None): ...

PYTORCH_TRANSFORMERS_CACHE: str
PYTORCH_PRETRAINED_BERT_CACHE: str

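A short sketch of cached_path(): given a URL it downloads the file into the local cache (or reuses an existing copy) and returns the filesystem path. The URL here is looked up from an archive map rather than hard-coded, to avoid guessing exact file locations.

from pytorch_transformers import cached_path, PYTORCH_TRANSFORMERS_CACHE
from pytorch_transformers import BERT_PRETRAINED_MODEL_ARCHIVE_MAP

# Resolve the remote weights URL for a checkpoint, download it (or reuse the
# cached copy), and get back the local file path.
url = BERT_PRETRAINED_MODEL_ARCHIVE_MAP["bert-base-uncased"]
local_path = cached_path(url, cache_dir=PYTORCH_TRANSFORMERS_CACHE)
print(local_path)
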
See docs/file-utilities.md for the full file utilities reference.

Constants

__version__: str = "1.2.0"

# Model file names
WEIGHTS_NAME: str = "pytorch_model.bin"
CONFIG_NAME: str = "config.json"  
TF_WEIGHTS_NAME: str = "model.ckpt"

# Archive maps (model name to URL mappings for pre-trained models)
BERT_PRETRAINED_MODEL_ARCHIVE_MAP: Dict[str, str]
GPT2_PRETRAINED_MODEL_ARCHIVE_MAP: Dict[str, str]
XLNET_PRETRAINED_MODEL_ARCHIVE_MAP: Dict[str, str]
# ... and similar maps for all other architectures
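
An illustrative snippet reading a few of these constants; the exact set of shortcut names in each archive map depends on the installed version.

import pytorch_transformers
from pytorch_transformers import WEIGHTS_NAME, CONFIG_NAME
from pytorch_transformers import GPT2_PRETRAINED_MODEL_ARCHIVE_MAP

print(pytorch_transformers.__version__)   # e.g. "1.2.0"
print(WEIGHTS_NAME, CONFIG_NAME)          # pytorch_model.bin config.json

# Archive maps enumerate the shortcut names of hosted checkpoints
print(sorted(GPT2_PRETRAINED_MODEL_ARCHIVE_MAP.keys()))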

Special Token Properties

All tokenizers support standard special tokens:

# Special tokens available on all tokenizers
bos_token: str  # Beginning of sequence
eos_token: str  # End of sequence  
unk_token: str  # Unknown token
sep_token: str  # Separator token
pad_token: str  # Padding token
cls_token: str  # Classification token
mask_token: str # Mask token for MLM
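
A small sketch inspecting the special tokens on a BERT tokenizer; attributes a particular tokenizer does not define (e.g. bos/eos for BERT) may come back as None.

from pytorch_transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# BERT defines [CLS], [SEP], [MASK], [PAD] and [UNK]
print(tokenizer.cls_token, tokenizer.sep_token, tokenizer.mask_token)
print(tokenizer.pad_token, tokenizer.unk_token)

# Map a special token back to its vocabulary id
print(tokenizer.convert_tokens_to_ids(tokenizer.mask_token))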