Repository of pre-trained NLP Transformer models: BERT & RoBERTa, GPT & GPT-2, Transformer-XL, XLNet and XLM
npx @tessl/cli install tessl/pypi-pytorch-transformers@1.2.0

A comprehensive Python library providing state-of-the-art pre-trained transformer models for Natural Language Processing (NLP) tasks. PyTorch Transformers includes PyTorch implementations, pre-trained model weights, usage scripts, and conversion utilities for major transformer architectures including BERT, GPT/GPT-2, Transformer-XL, XLNet, XLM, RoBERTa, and DistilBERT.
pip install pytorch-transformers

import pytorch_transformers

Common patterns for working with models and tokenizers:
from pytorch_transformers import AutoModel, AutoTokenizer
from pytorch_transformers import BertModel, BertTokenizer
from pytorch_transformers import GPT2Model, GPT2Tokenizer

import torch
from pytorch_transformers import AutoModel, AutoTokenizer
# Load a pre-trained model and tokenizer
model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
# Tokenize and encode input text (tokenizers in pytorch-transformers are not callable; use encode())
text = "Hello, how are you?"
input_ids = torch.tensor([tokenizer.encode(text, add_special_tokens=True)])

# Get model outputs (models in this library return tuples, not output objects)
with torch.no_grad():
    outputs = model(input_ids)
last_hidden_states = outputs[0]
# For specific tasks like sequence classification
from pytorch_transformers import AutoModelForSequenceClassification
classifier = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")

The library follows a consistent design pattern across all transformer architectures: each architecture provides a model class, a configuration class, and a tokenizer class that share the same loading and saving interface.
This unified design enables seamless switching between different transformer architectures while maintaining consistent APIs for various NLP tasks including language modeling, sequence classification, question answering, and token classification.
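For example, a minimal sketch of swapping architectures behind the same three calls; the weight names are the shortcut identifiers listed in the pretrained archive maps:

import torch
from pytorch_transformers import (BertModel, BertTokenizer,
                                  GPT2Model, GPT2Tokenizer,
                                  XLNetModel, XLNetTokenizer)

# Each architecture pairs a model class, a tokenizer class, and a pre-trained weight name
MODELS = [(BertModel, BertTokenizer, "bert-base-uncased"),
          (GPT2Model, GPT2Tokenizer, "gpt2"),
          (XLNetModel, XLNetTokenizer, "xlnet-base-cased")]

for model_class, tokenizer_class, weights_name in MODELS:
    tokenizer = tokenizer_class.from_pretrained(weights_name)
    model = model_class.from_pretrained(weights_name)
    input_ids = torch.tensor([tokenizer.encode("Hello, how are you?", add_special_tokens=True)])
    with torch.no_grad():
        hidden_states = model(input_ids)[0]  # first element of the output tuple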
Factory classes that automatically select and instantiate the appropriate model, tokenizer, or configuration based on model name patterns. These provide the most convenient way to work with pre-trained models without needing to know the specific architecture.
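A short sketch of the name-based dispatch; the comments note the concrete classes these names resolve to:

from pytorch_transformers import AutoConfig, AutoTokenizer, AutoModel

# The "gpt2" name pattern resolves to the GPT-2 classes ...
config = AutoConfig.from_pretrained("gpt2")        # -> GPT2Config
tokenizer = AutoTokenizer.from_pretrained("gpt2")  # -> GPT2Tokenizer
model = AutoModel.from_pretrained("gpt2")          # -> GPT2Model

# ... while a "roberta-*" name resolves to the RoBERTa classes
roberta = AutoModel.from_pretrained("roberta-base")  # -> RobertaModel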
class AutoTokenizer:
    @classmethod
    def from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs): ...

class AutoModel:
    @classmethod
    def from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs): ...

class AutoConfig:
    @classmethod
    def from_pretrained(cls, pretrained_model_name_or_path, **kwargs): ...

Core abstract base classes that define the common interface shared by all models, tokenizers, and configurations. These classes provide essential methods like from_pretrained() and save_pretrained() that enable consistent model and tokenizer loading/saving across all architectures.
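A small sketch of the shared save/load interface; the ./my-bert directory is an arbitrary example path:

import os
from pytorch_transformers import BertModel, BertTokenizer

model = BertModel.from_pretrained("bert-base-uncased")
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# save_pretrained() expects an existing directory
os.makedirs("./my-bert", exist_ok=True)
model.save_pretrained("./my-bert")      # writes weights + config
tokenizer.save_pretrained("./my-bert")  # writes vocabulary files

# Reload later from that same directory
model = BertModel.from_pretrained("./my-bert")
tokenizer = BertTokenizer.from_pretrained("./my-bert")

# Tokenizer round trip: text -> tokens -> ids -> text
tokens = tokenizer.tokenize("Hello, how are you?")  # wordpiece tokens
ids = tokenizer.encode("Hello, how are you?")       # list of vocabulary ids
text = tokenizer.decode(ids)                        # back to a string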
class PreTrainedModel:
    @classmethod
    def from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs): ...
    def save_pretrained(self, save_directory): ...
    def resize_token_embeddings(self, new_num_tokens): ...

class PreTrainedTokenizer:
    @classmethod
    def from_pretrained(cls, pretrained_model_name_or_path, **kwargs): ...
    def save_pretrained(self, save_directory): ...
    def tokenize(self, text): ...
    def encode(self, text): ...
    def decode(self, token_ids): ...

BERT (Bidirectional Encoder Representations from Transformers) models for various NLP tasks including masked language modeling, next sentence prediction, sequence classification, token classification, and question answering.
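A sketch of the sequence classification head in use; the label and the num_labels value are illustrative only:

import torch
from pytorch_transformers import BertForSequenceClassification, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

input_ids = torch.tensor([tokenizer.encode("This movie was great!", add_special_tokens=True)])
labels = torch.tensor([1])  # arbitrary positive-class label for illustration

# With labels, the output tuple starts with (loss, logits, ...)
outputs = model(input_ids, labels=labels)
loss, logits = outputs[:2]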
class BertModel:
    @classmethod
    def from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs): ...

class BertForSequenceClassification:
    @classmethod
    def from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs): ...

class BertTokenizer:
    @classmethod
    def from_pretrained(cls, pretrained_model_name_or_path, **kwargs): ...

GPT-2 (Generative Pre-trained Transformer 2) models for language generation tasks, including standard language modeling and multi-task models with both language modeling and classification heads.
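A sketch of greedy next-token generation with the language-modeling head; pytorch-transformers has no generation helper on the model itself, so the decoding loop is written out by hand, and the prompt is arbitrary:

import torch
from pytorch_transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

generated = tokenizer.encode("The weather today is")
with torch.no_grad():
    for _ in range(20):  # generate 20 additional tokens
        input_ids = torch.tensor([generated])
        logits = model(input_ids)[0]                   # shape: (batch, seq_len, vocab_size)
        next_token = int(torch.argmax(logits[0, -1]))  # greedy pick of the next token
        generated.append(next_token)

print(tokenizer.decode(generated))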
class GPT2Model:
    @classmethod
    def from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs): ...

class GPT2LMHeadModel:
    @classmethod
    def from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs): ...

class GPT2Tokenizer:
    @classmethod
    def from_pretrained(cls, pretrained_model_name_or_path, **kwargs): ...

Additional transformer architectures including OpenAI GPT, Transformer-XL, XLNet, XLM, RoBERTa, and DistilBERT, each with its specific model variants and tokenizers optimized for different NLP tasks and languages.
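These follow the same loading pattern, for example with RoBERTa and DistilBERT (DistilBERT trades some accuracy for a smaller, faster encoder):

import torch
from pytorch_transformers import (RobertaModel, RobertaTokenizer,
                                  DistilBertModel, DistilBertTokenizer)

roberta_tok = RobertaTokenizer.from_pretrained("roberta-base")
roberta = RobertaModel.from_pretrained("roberta-base")

distil_tok = DistilBertTokenizer.from_pretrained("distilbert-base-uncased")
distilbert = DistilBertModel.from_pretrained("distilbert-base-uncased")

text = "Hello, how are you?"
with torch.no_grad():
    # Both models expose the same tuple-style outputs; index 0 is the last hidden state
    roberta_hidden = roberta(torch.tensor([roberta_tok.encode(text, add_special_tokens=True)]))[0]
    distil_hidden = distilbert(torch.tensor([distil_tok.encode(text, add_special_tokens=True)]))[0]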
class XLNetModel:
    @classmethod
    def from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs): ...

class RobertaModel:
    @classmethod
    def from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs): ...

class DistilBertModel:
    @classmethod
    def from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs): ...

Specialized optimizers and learning rate schedulers designed for transformer training, including AdamW optimizer with weight decay fix and various warmup schedules commonly used in transformer fine-tuning.
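A minimal fine-tuning step; the learning rate, warmup_steps, t_total, and the toy sentence/label are placeholder values:

import torch
from pytorch_transformers import BertForSequenceClassification, BertTokenizer, AdamW, WarmupLinearSchedule

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased")

# correct_bias=False reproduces the behaviour of the original TensorFlow BERT optimizer
optimizer = AdamW(model.parameters(), lr=2e-5, correct_bias=False)
scheduler = WarmupLinearSchedule(optimizer, warmup_steps=10, t_total=100)

# One illustrative training step on a single toy example
input_ids = torch.tensor([tokenizer.encode("A toy training sentence.", add_special_tokens=True)])
labels = torch.tensor([0])

loss = model(input_ids, labels=labels)[0]  # first element of the output tuple is the loss
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
optimizer.step()
scheduler.step()       # advance the warmup/decay schedule once per optimization step
optimizer.zero_grad()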
class AdamW:
    def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), eps=1e-6, weight_decay=0.0, correct_bias=True): ...

class WarmupLinearSchedule:
    def __init__(self, optimizer, warmup_steps, t_total, last_epoch=-1): ...

class WarmupCosineSchedule:
    def __init__(self, optimizer, warmup_steps, t_total, cycles=0.5, last_epoch=-1): ...

File handling utilities for downloading, caching, and managing pre-trained model files. These utilities handle automatic download of model weights and configurations from remote repositories with local caching support.
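A small sketch of the caching utilities, assuming the names below are importable from the package root as listed in this section; the cache_dir path is arbitrary:

from pytorch_transformers import BertModel, cached_path
from pytorch_transformers import PYTORCH_TRANSFORMERS_CACHE, BERT_PRETRAINED_MODEL_ARCHIVE_MAP

# Default cache directory used when no cache_dir is passed
print(PYTORCH_TRANSFORMERS_CACHE)

# from_pretrained() downloads and caches weights/config on first use;
# cache_dir overrides the default location (the path here is arbitrary)
model = BertModel.from_pretrained("bert-base-uncased", cache_dir="/tmp/pt_transformers_cache")

# cached_path() resolves a remote file to a local cached copy
weights_url = BERT_PRETRAINED_MODEL_ARCHIVE_MAP["bert-base-uncased"]
local_weights_file = cached_path(weights_url)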
def cached_path(url_or_filename, cache_dir=None): ...

PYTORCH_TRANSFORMERS_CACHE: str
PYTORCH_PRETRAINED_BERT_CACHE: str

__version__: str = "1.2.0"
# Model file names
WEIGHTS_NAME: str = "pytorch_model.bin"
CONFIG_NAME: str = "config.json"
TF_WEIGHTS_NAME: str = "model.ckpt"
# Archive maps (model name to URL mappings for pre-trained models)
BERT_PRETRAINED_MODEL_ARCHIVE_MAP: Dict[str, str]
GPT2_PRETRAINED_MODEL_ARCHIVE_MAP: Dict[str, str]
XLNET_PRETRAINED_MODEL_ARCHIVE_MAP: Dict[str, str]
# ... and similar maps for all other architectures

All tokenizers support standard special tokens:
# Special tokens available on all tokenizers
bos_token: str # Beginning of sequence
eos_token: str # End of sequence
unk_token: str # Unknown token
sep_token: str # Separator token
pad_token: str # Padding token
cls_token: str # Classification token
mask_token: str # Mask token for MLM
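A brief illustration with BERT; the values in the comments are BERT's defaults, and tokenizers that do not define a given token (BERT, for instance, has no bos_token/eos_token) return None for it:

from pytorch_transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

print(tokenizer.cls_token)   # '[CLS]'
print(tokenizer.sep_token)   # '[SEP]'
print(tokenizer.mask_token)  # '[MASK]'
print(tokenizer.pad_token)   # '[PAD]'
print(tokenizer.unk_token)   # '[UNK]'

# Look up the vocabulary id of a special token, e.g. for masked language modeling
mask_id = tokenizer.convert_tokens_to_ids(tokenizer.mask_token)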