Repository of pre-trained NLP Transformer models: BERT & RoBERTa, GPT & GPT-2, Transformer-XL, XLNet and XLM
npx @tessl/cli install tessl/pypi-pytorch-transformers@1.2.00
# PyTorch Transformers

PyTorch Transformers is a Python library of state-of-the-art pre-trained transformer models for Natural Language Processing (NLP). It includes PyTorch implementations, pre-trained model weights, usage scripts, and conversion utilities for BERT, GPT/GPT-2, Transformer-XL, XLNet, XLM, RoBERTa, and DistilBERT.

## Package Information

- **Package Name**: pytorch-transformers
- **Package Type**: Library
- **Language**: Python
- **Installation**: `pip install pytorch-transformers`

## Core Imports

```python
import pytorch_transformers
```

Common patterns for working with models and tokenizers:

```python
from pytorch_transformers import AutoModel, AutoTokenizer
from pytorch_transformers import BertModel, BertTokenizer
from pytorch_transformers import GPT2Model, GPT2Tokenizer
```

## Basic Usage

```python
import torch
from pytorch_transformers import AutoModel, AutoTokenizer

# Load a pre-trained model and tokenizer
model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
model.eval()  # disable dropout for inference

# Tokenize input text; tokenizers in this release are not callable, so use encode()
text = "Hello, how are you?"
input_ids = torch.tensor([tokenizer.encode(text, add_special_tokens=True)])

# Get model outputs; models return plain tuples with the hidden states first
with torch.no_grad():
    outputs = model(input_ids)
last_hidden_states = outputs[0]

# For specific tasks like sequence classification
from pytorch_transformers import AutoModelForSequenceClassification
classifier = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
```

## Architecture

The library follows a consistent design pattern across all transformer architectures:

- **Auto Classes**: Factory classes that automatically select the appropriate model/tokenizer based on model name
- **Base Classes**: Abstract base classes (PreTrainedModel, PreTrainedTokenizer, PretrainedConfig) providing common interfaces
- **Model-Specific Classes**: Dedicated implementations for each transformer architecture with specialized task-specific variants
- **Configuration Classes**: Parameter containers for model initialization and customization
- **Tokenizers**: Architecture-specific text preprocessing with consistent encode/decode interfaces

This unified design enables seamless switching between different transformer architectures while maintaining consistent APIs for various NLP tasks including language modeling, sequence classification, question answering, and token classification.

## Capabilities

### Auto Classes

Factory classes that automatically select and instantiate the appropriate model, tokenizer, or configuration based on model name patterns. These provide the most convenient way to work with pre-trained models without needing to know the specific architecture.

```python { .api }
class AutoTokenizer:
    @classmethod
    def from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs): ...

class AutoModel:
    @classmethod
    def from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs): ...

class AutoConfig:
    @classmethod
    def from_pretrained(cls, pretrained_model_name_or_path, **kwargs): ...
```

[Auto Classes](./auto-classes.md)
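
To make the name-based selection concrete, here is a minimal sketch of pattern dispatch in plain Python. `resolve_architecture` and the `PATTERNS` table are hypothetical names invented for this illustration and are not part of the library. The sketch only assumes that more specific substrings (such as `distilbert` and `roberta`) have to be tested before the generic `bert`, since both contain it.

```python
# Hypothetical sketch of name-pattern dispatch; not the library's actual code.
# Order matters: specific substrings must come before the generic "bert".
PATTERNS = [
    ("distilbert", "DistilBertModel"),
    ("roberta", "RobertaModel"),
    ("bert", "BertModel"),
    ("openai-gpt", "OpenAIGPTModel"),
    ("gpt2", "GPT2Model"),
    ("transfo-xl", "TransfoXLModel"),
    ("xlnet", "XLNetModel"),
    ("xlm", "XLMModel"),
]


def resolve_architecture(model_name):
    """Return the class name whose pattern first appears in model_name."""
    for pattern, class_name in PATTERNS:
        if pattern in model_name:
            return class_name
    raise ValueError("Unrecognized model identifier: {!r}".format(model_name))


for name in ("bert-base-uncased", "distilbert-base-uncased", "roberta-base"):
    print(name, "->", resolve_architecture(name))
```

A practical consequence of this style of dispatch is that a locally saved checkpoint needs an architecture hint somewhere in its directory name for the factory to recognize it.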

### Base Classes

Core abstract base classes that define the common interface shared by all models, tokenizers, and configurations. These classes provide essential methods like `from_pretrained()` and `save_pretrained()` that enable consistent model and tokenizer loading/saving across all architectures.

```python { .api }
class PreTrainedModel:
    @classmethod
    def from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs): ...

    def save_pretrained(self, save_directory): ...
    def resize_token_embeddings(self, new_num_tokens): ...

class PreTrainedTokenizer:
    @classmethod
    def from_pretrained(cls, pretrained_model_name_or_path, **kwargs): ...

    def save_pretrained(self, save_directory): ...
    def tokenize(self, text): ...
    def encode(self, text): ...
    def decode(self, token_ids): ...
```

[Base Classes](./base-classes.md)

### BERT Models

BERT (Bidirectional Encoder Representations from Transformers) models for various NLP tasks including masked language modeling, next sentence prediction, sequence classification, token classification, and question answering.

```python { .api }
class BertModel:
    @classmethod
    def from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs): ...

class BertForSequenceClassification:
    @classmethod
    def from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs): ...

class BertTokenizer:
    @classmethod
    def from_pretrained(cls, pretrained_model_name_or_path, **kwargs): ...
```

[BERT Models](./bert-models.md)

### GPT-2 Models

GPT-2 (Generative Pre-trained Transformer 2) models for language generation tasks, including standard language modeling and multi-task models with both language modeling and classification heads.

```python { .api }
class GPT2Model:
    @classmethod
    def from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs): ...

class GPT2LMHeadModel:
    @classmethod
    def from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs): ...

class GPT2Tokenizer:
    @classmethod
    def from_pretrained(cls, pretrained_model_name_or_path, **kwargs): ...
```

[GPT-2 Models](./gpt2-models.md)

### Other Transformer Models

Additional transformer architectures including OpenAI GPT, Transformer-XL, XLNet, XLM, RoBERTa, and DistilBERT, each with their specific model variants and tokenizers optimized for different NLP tasks and languages.

```python { .api }
class XLNetModel:
    @classmethod
    def from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs): ...

class RobertaModel:
    @classmethod
    def from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs): ...

class DistilBertModel:
    @classmethod
    def from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs): ...
```

[Other Models](./other-models.md)

### Optimization

Specialized optimizers and learning rate schedulers designed for transformer training, including the AdamW optimizer with the weight decay fix and several warmup schedules commonly used in transformer fine-tuning. The schedules are classes built on PyTorch's `LambdaLR`, not plain functions.

```python { .api }
class AdamW:
    def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), eps=1e-6, weight_decay=0.0, correct_bias=True): ...

class WarmupLinearSchedule:
    def __init__(self, optimizer, warmup_steps, t_total, last_epoch=-1): ...

class WarmupCosineSchedule:
    def __init__(self, optimizer, warmup_steps, t_total, cycles=0.5, last_epoch=-1): ...
```

[Optimization](./optimization.md)
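
The shape of a warmup schedule is easier to see as a plain function of the step count. The sketch below computes the learning-rate multiplier for a linear-warmup, linear-decay schedule and for a linear-warmup, cosine-decay schedule. The function names `warmup_linear_factor` and `warmup_cosine_factor` are hypothetical, and the formulas are the conventional ones for these schedules, assumed here rather than copied from the library.

```python
import math


def warmup_linear_factor(step, warmup_steps, t_total):
    """Multiplier that ramps linearly 0 -> 1, then decays linearly 1 -> 0."""
    if step < warmup_steps:
        return float(step) / float(max(1, warmup_steps))
    return max(0.0, float(t_total - step) / float(max(1, t_total - warmup_steps)))


def warmup_cosine_factor(step, warmup_steps, t_total, cycles=0.5):
    """Multiplier that ramps linearly 0 -> 1, then follows a cosine curve."""
    if step < warmup_steps:
        return float(step) / float(max(1, warmup_steps))
    progress = float(step - warmup_steps) / float(max(1, t_total - warmup_steps))
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * cycles * 2.0 * progress)))


# Print both multipliers at a few points of a 1000-step run with 100 warmup steps.
for step in (0, 50, 100, 550, 1000):
    linear = warmup_linear_factor(step, 100, 1000)
    cosine = warmup_cosine_factor(step, 100, 1000)
    print(step, round(linear, 3), round(cosine, 3))
```

The multiplier is applied to the optimizer's base learning rate at every step, which is the contract of `torch.optim.lr_scheduler.LambdaLR`.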

### File Utilities

File handling utilities for downloading, caching, and managing pre-trained model files. These utilities handle automatic download of model weights and configurations from remote repositories with local caching support.

```python { .api }
def cached_path(url_or_filename, cache_dir=None): ...

PYTORCH_TRANSFORMERS_CACHE: str
PYTORCH_PRETRAINED_BERT_CACHE: str
```

[File Utilities](./file-utilities.md)
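
One common way to implement this kind of local cache is to derive a stable file name from the download URL, optionally combined with the server's ETag so that an updated remote file gets a fresh cache entry. The sketch below illustrates that idea with the standard library only. `cache_filename`, `cache_location`, the example URL, and the cache directory are all hypothetical, and the exact naming scheme is an assumption rather than documented library behavior.

```python
import hashlib
import os


def cache_filename(url, etag=None):
    """Hash the URL (and the ETag, if given) into a filesystem-safe name."""
    name = hashlib.sha256(url.encode("utf-8")).hexdigest()
    if etag:
        # A changed ETag yields a different name, so a stale file is not reused.
        name += "." + hashlib.sha256(etag.encode("utf-8")).hexdigest()
    return name


def cache_location(url, cache_dir, etag=None):
    """Full path where a downloaded copy of url would be stored."""
    return os.path.join(cache_dir, cache_filename(url, etag))


# Placeholder URL and directory, used only to show the resulting path.
url = "https://example.com/models/bert-base-uncased/pytorch_model.bin"
print(cache_location(url, "/tmp/model_cache", etag="v1"))
```

Hashing keeps the cache flat and avoids characters that are invalid in file names, at the cost of names that are not human-readable.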

## Constants

```python { .api }
__version__: str = "1.2.0"

# Model file names
WEIGHTS_NAME: str = "pytorch_model.bin"
CONFIG_NAME: str = "config.json"
TF_WEIGHTS_NAME: str = "model.ckpt"

# Archive maps (model name to URL mappings for pre-trained models)
BERT_PRETRAINED_MODEL_ARCHIVE_MAP: Dict[str, str]
GPT2_PRETRAINED_MODEL_ARCHIVE_MAP: Dict[str, str]
XLNET_PRETRAINED_MODEL_ARCHIVE_MAP: Dict[str, str]
# ... and similar maps for all other architectures
```

## Special Token Properties

All tokenizers expose the standard special tokens as properties. A given tokenizer only defines the tokens its architecture uses; for example, GPT-2 has no `cls_token` or `mask_token`.

```python { .api }
# Special token properties defined on the tokenizer base class
bos_token: str   # Beginning of sequence
eos_token: str   # End of sequence
unk_token: str   # Unknown token
sep_token: str   # Separator token
pad_token: str   # Padding token
cls_token: str   # Classification token
mask_token: str  # Mask token for MLM
```
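
As an illustration of how `cls_token` and `sep_token` are used, the sketch below wraps one or two token lists in the BERT input layout (`[CLS] A [SEP]` for a single sequence, `[CLS] A [SEP] B [SEP]` for a pair). `build_bert_style_inputs` is a hypothetical helper written for this example. Other architectures arrange their special tokens differently, so treat it as specific to BERT-style models.

```python
def build_bert_style_inputs(tokens_a, tokens_b=None, cls_token="[CLS]", sep_token="[SEP]"):
    """Wrap one or two token lists in the BERT single-sequence or pair layout."""
    tokens = [cls_token] + list(tokens_a) + [sep_token]
    if tokens_b is not None:
        # The second segment is closed by its own separator.
        tokens += list(tokens_b) + [sep_token]
    return tokens


single = build_bert_style_inputs(["hello", "world"])
pair = build_bert_style_inputs(["how", "are", "you"], ["fine", "thanks"])
print(single)
print(pair)
```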