# PyTorch Pretrained BERT

PyTorch implementations of transformer-based language models including Google's BERT, OpenAI's GPT and GPT-2, and Google/CMU's Transformer-XL. This library provides pre-trained models, fine-tuning examples, tokenizers, and model architectures that match the performance of their original TensorFlow implementations, designed for researchers and practitioners working with state-of-the-art language models.
## Package Information

- **Package Name**: pytorch_pretrained_bert
- **Language**: Python
- **Installation**: `pip install pytorch_pretrained_bert`
- **Version**: 0.6.2
## Core Imports

```python
import pytorch_pretrained_bert
```

Common imports for specific functionality:

```python
# BERT models and tokenizer
from pytorch_pretrained_bert import (
    BertTokenizer, BertModel, BertForSequenceClassification,
    BertConfig, BertAdam
)

# OpenAI GPT models
from pytorch_pretrained_bert import (
    OpenAIGPTTokenizer, OpenAIGPTLMHeadModel, OpenAIGPTConfig
)

# GPT-2 models
from pytorch_pretrained_bert import (
    GPT2Tokenizer, GPT2LMHeadModel, GPT2Config
)

# Transformer-XL models
from pytorch_pretrained_bert import (
    TransfoXLTokenizer, TransfoXLLMHeadModel, TransfoXLConfig
)

# Utilities
from pytorch_pretrained_bert import cached_path, WEIGHTS_NAME, CONFIG_NAME
```
## Basic Usage

### BERT for Sequence Classification

```python
import torch
from pytorch_pretrained_bert import BertTokenizer, BertForSequenceClassification

# Load the pre-trained tokenizer and model. The classification head on top of
# BERT is newly initialized, so its outputs are only meaningful after fine-tuning.
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)
model.eval()  # disable dropout for inference

# Tokenize input text and add the [CLS]/[SEP] special tokens BERT expects
text = "Hello, my dog is cute"
tokens = ['[CLS]'] + tokenizer.tokenize(text) + ['[SEP]']
input_ids = tokenizer.convert_tokens_to_ids(tokens)
input_ids = torch.tensor([input_ids])  # shape: (batch_size=1, seq_len)

# Forward pass: without labels, the model returns logits of shape (1, num_labels)
with torch.no_grad():
    logits = model(input_ids)
predictions = torch.nn.functional.softmax(logits, dim=-1)

print(f"Predictions: {predictions}")
```
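The example above encodes a single sentence. For sentence pairs and padded batches, BERT models also take segment ids (`token_type_ids`) and an `attention_mask`. The plain-Python sketch below builds all three lists for one example, assuming the input is already tokenized; `build_bert_inputs` and the toy vocabulary are hypothetical, and with the real library you would map tokens with `tokenizer.convert_tokens_to_ids` and pass the results to the model as tensors.

```python
from typing import Dict, List, Optional, Tuple


def build_bert_inputs(
    tokens_a: List[str],
    tokens_b: Optional[List[str]],
    vocab: Dict[str, int],
    max_len: int,
) -> Tuple[List[int], List[int], List[int]]:
    """Build (input_ids, token_type_ids, attention_mask) for one BERT example.

    Hypothetical helper: layout is [CLS] A [SEP] (B [SEP]), padded with id 0.
    """
    tokens = ["[CLS]"] + tokens_a + ["[SEP]"]
    segment_ids = [0] * len(tokens)  # segment A, including [CLS] and the first [SEP]
    if tokens_b:
        tokens += tokens_b + ["[SEP]"]
        segment_ids += [1] * (len(tokens_b) + 1)  # segment B plus its [SEP]
    if len(tokens) > max_len:
        raise ValueError("sequence longer than max_len; truncate tokens first")

    unk = vocab["[UNK]"]
    input_ids = [vocab.get(t, unk) for t in tokens]
    attention_mask = [1] * len(input_ids)  # 1 = real token

    # Pad to max_len; padded positions get mask 0 so attention ignores them
    padding = max_len - len(input_ids)
    input_ids += [0] * padding
    segment_ids += [0] * padding
    attention_mask += [0] * padding
    return input_ids, segment_ids, attention_mask


if __name__ == "__main__":
    toy_vocab = {"[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3, "hello": 4, "world": 5}
    ids, segs, mask = build_bert_inputs(["hello"], ["world", "!"], toy_vocab, max_len=8)
    print(ids)   # [2, 4, 3, 5, 1, 3, 0, 0]
    print(segs)  # [0, 0, 0, 1, 1, 1, 0, 0]
    print(mask)  # [1, 1, 1, 1, 1, 1, 0, 0]
```

Padding with id 0 assumes `[PAD]` has id 0, as in the toy vocabulary here; check your vocabulary file before relying on that.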
### GPT-2 Text Generation

```python
import torch
from pytorch_pretrained_bert import GPT2Tokenizer, GPT2LMHeadModel

# Load pre-trained GPT-2
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2LMHeadModel.from_pretrained('gpt2')
model.eval()  # disable dropout for inference

# Prepare input
input_text = "The future of artificial intelligence"
input_ids = tokenizer.encode(input_text)
input_ids = torch.tensor([input_ids])

# Forward pass: without labels the model returns (lm_logits, presents)
with torch.no_grad():
    outputs = model(input_ids)
predictions = outputs[0]  # language-modeling logits, shape (1, seq_len, vocab_size)

# Next-token probabilities come from the logits at the last position
next_token_logits = predictions[0, -1, :]
next_token_probs = torch.softmax(next_token_logits, dim=-1)

# Sample the next token
next_token_id = torch.multinomial(next_token_probs, 1).item()
next_token = tokenizer.decode([next_token_id])

print(f"Input: {input_text}")
print(f"Next token: {next_token}")
```
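The example above samples from the full softmax distribution. A common refinement, which is not part of this library's API, is to divide the logits by a temperature and keep only the top-k candidates before sampling. The plain-Python sketch below shows both steps over a list of logits. `sample_next_token` and its defaults are illustrative assumptions. With tensors, you would apply the same steps to `next_token_logits` before calling `torch.multinomial`.

```python
import math
import random
from typing import List, Optional


def sample_next_token(
    logits: List[float],
    temperature: float = 1.0,
    top_k: int = 0,
    rng: Optional[random.Random] = None,
) -> int:
    """Sample an index from logits after temperature scaling and top-k filtering.

    top_k <= 0 keeps every candidate. Hypothetical helper for illustration.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    rng = rng or random.Random()
    scaled = [x / temperature for x in logits]

    # Keep only the k highest-scoring indices (stable sort keeps ties in index order)
    candidates = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)
    if top_k > 0:
        candidates = candidates[:top_k]

    # Numerically stable softmax over the surviving candidates
    max_logit = max(scaled[i] for i in candidates)
    weights = [math.exp(scaled[i] - max_logit) for i in candidates]
    total = sum(weights)
    probs = [w / total for w in weights]

    return rng.choices(candidates, weights=probs, k=1)[0]


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.5, -1.0]
    rng = random.Random(0)
    # With top_k=1, sampling reduces to greedy decoding
    print(sample_next_token(logits, top_k=1, rng=rng))  # 0
    # Lower temperature sharpens the distribution; result is index 0 or 1
    print(sample_next_token(logits, temperature=0.7, top_k=2, rng=rng))
```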
## Architecture

The library is organized around four main transformer architectures:

- **BERT**: Bidirectional encoder for understanding tasks (classification, QA, NER)
- **OpenAI GPT**: Autoregressive (left-to-right) decoder for generation and understanding tasks
- **GPT-2**: Larger autoregressive model with byte-level BPE tokenization
- **Transformer-XL**: Autoregressive model with segment-level recurrence, relative positional encodings, and an adaptive softmax, designed for long contexts

Each model family includes:

- **Configuration classes**: Model hyperparameters and architecture settings
- **Model classes**: Task-specific variants (base model, language-modeling head, classification head)
- **Tokenizer classes**: Text preprocessing and encoding specific to each model
- **Weight loading utilities**: Functions that load weights from the original released checkpoints (mostly TensorFlow)

All model and tokenizer classes support the `from_pretrained()` class method, which downloads pre-trained weights (or vocabularies) on first use and caches them locally.
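The encoder/decoder distinction in the architecture list comes down to which positions each token may attend to. BERT attends bidirectionally. GPT, GPT-2, and Transformer-XL use a causal (lower-triangular) mask, so position `i` sees only positions `0..i`. The plain-Python sketch below builds both mask shapes, with 1 meaning "may attend". It illustrates the concept and is not the library's internal code. Transformer-XL additionally attends to cached memory from earlier segments, which this sketch omits.

```python
from typing import List

Mask = List[List[int]]


def bidirectional_mask(seq_len: int) -> Mask:
    """BERT-style: every position may attend to every position."""
    return [[1] * seq_len for _ in range(seq_len)]


def causal_mask(seq_len: int) -> Mask:
    """GPT-style: position i may attend only to positions 0..i."""
    return [[1 if j <= i else 0 for j in range(seq_len)] for i in range(seq_len)]


if __name__ == "__main__":
    for row in causal_mask(4):
        print(row)
    # [1, 0, 0, 0]
    # [1, 1, 0, 0]
    # [1, 1, 1, 0]
    # [1, 1, 1, 1]
```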
## Capabilities

### BERT Models

Complete BERT model family including base model, task-specific variants, configuration, and tokenization for bidirectional language understanding tasks.

```python { .api }
class BertModel: ...
class BertForSequenceClassification: ...
class BertForQuestionAnswering: ...
class BertTokenizer: ...
class BertConfig: ...
```

[BERT Models](./bert-models.md)
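`BertConfig` hyperparameters fully determine model size. The sketch below counts the parameters of the base encoder (embeddings, encoder layers, pooler, no task head). It assumes the standard BERT layer layout: Q/K/V and attention-output projections, two LayerNorms, and a two-layer feed-forward block, all with biases. The defaults are the `bert-base-uncased` hyperparameters. `num_attention_heads` is omitted because splitting the hidden size into heads does not change the parameter count. `bert_param_count` is a hypothetical helper, not a library function.

```python
def bert_param_count(
    vocab_size: int = 30522,
    hidden_size: int = 768,
    num_hidden_layers: int = 12,
    intermediate_size: int = 3072,
    max_position_embeddings: int = 512,
    type_vocab_size: int = 2,
) -> int:
    """Count parameters of a BERT encoder plus pooler (no task-specific head)."""
    h, i = hidden_size, intermediate_size

    def linear(n_in: int, n_out: int) -> int:
        return n_in * n_out + n_out  # weight matrix + bias vector

    layer_norm = 2 * h  # weight (gain) + bias

    # Word, position, and token-type embedding tables, then one LayerNorm
    embeddings = (vocab_size + max_position_embeddings + type_vocab_size) * h + layer_norm
    per_layer = (
        4 * linear(h, h)   # query, key, value, attention output projection
        + layer_norm
        + linear(h, i)     # feed-forward up-projection
        + linear(i, h)     # feed-forward down-projection
        + layer_norm
    )
    pooler = linear(h, h)  # dense layer applied to the [CLS] hidden state
    return embeddings + num_hidden_layers * per_layer + pooler


if __name__ == "__main__":
    print(bert_param_count())  # 109482240
```

The word-embedding table alone accounts for about 23.4M of these parameters, which is why vocabulary size matters when comparing model variants.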
### Tokenizers

Tokenization utilities for all supported model types, handling text preprocessing, encoding, decoding, and vocabulary management with model-specific tokenization strategies. `BertTokenizer` combines `BasicTokenizer` (punctuation splitting and optional lower-casing) with `WordpieceTokenizer` (sub-word splitting against the vocabulary).

```python { .api }
class BertTokenizer: ...
class BasicTokenizer: ...
class WordpieceTokenizer: ...
class OpenAIGPTTokenizer: ...
class GPT2Tokenizer: ...
class TransfoXLTokenizer: ...
```

[Tokenizers](./tokenizers.md)
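WordPiece splits each word with a greedy longest-match-first search. It takes the longest prefix found in the vocabulary, then repeats on the remainder using `##`-prefixed continuation pieces. The sketch below is a standalone re-implementation for a single word, for illustration only. It is not the library's `WordpieceTokenizer` class. The toy vocabulary and the 100-character cap are assumptions.

```python
from typing import List, Set


def wordpiece_tokenize(
    word: str,
    vocab: Set[str],
    unk_token: str = "[UNK]",
    max_chars: int = 100,
) -> List[str]:
    """Split one word into WordPiece sub-tokens, greedy longest-match-first.

    Continuation pieces carry a "##" prefix. If any part of the word cannot be
    matched, the whole word becomes unk_token.
    """
    if len(word) > max_chars:
        return [unk_token]
    pieces: List[str] = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Shrink the candidate from the right until it is in the vocabulary
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk_token]
        pieces.append(match)
        start = end
    return pieces


if __name__ == "__main__":
    toy_vocab = {"un", "##aff", "##able", "aff", "able"}
    print(wordpiece_tokenize("unaffable", toy_vocab))  # ['un', '##aff', '##able']
    print(wordpiece_tokenize("xyz", toy_vocab))        # ['[UNK]']
```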
### GPT Models

OpenAI GPT, GPT-2, and Transformer-XL model families with their configurations and tokenizers for autoregressive language modeling and text generation tasks.

```python { .api }
class OpenAIGPTLMHeadModel: ...
class GPT2LMHeadModel: ...
class TransfoXLLMHeadModel: ...
```

[GPT Models](./gpt-models.md)
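GPT-2's tokenizer runs BPE over bytes rather than Unicode characters, so any string can be encoded without an unknown token. To let BPE merges operate on printable strings, each of the 256 byte values is first mapped to a printable Unicode character. The sketch below reproduces that mapping as published with OpenAI's GPT-2 encoder. Treat it as an illustration rather than this package's exact code. It explains why GPT-2 tokens display a leading space as `Ġ`.

```python
from typing import Dict


def bytes_to_unicode() -> Dict[int, str]:
    """Map every byte value 0-255 to a distinct printable Unicode character."""
    # Printable byte values map to the character with the same code point
    printable = (
        list(range(ord("!"), ord("~") + 1))  # ASCII '!'..'~'
        + list(range(0xA1, 0xAC + 1))        # Latin-1 range starting at inverted '!'
        + list(range(0xAE, 0xFF + 1))        # Latin-1 range after the soft hyphen
    )
    byte_values = printable[:]
    code_points = printable[:]
    # Remaining bytes (control characters, space, etc.) are shifted above 255
    offset = 0
    for b in range(256):
        if b not in byte_values:
            byte_values.append(b)
            code_points.append(256 + offset)
            offset += 1
    return dict(zip(byte_values, (chr(c) for c in code_points)))


def to_byte_level(text: str) -> str:
    """Rewrite text as the printable byte-level string that BPE operates on."""
    table = bytes_to_unicode()
    return "".join(table[b] for b in text.encode("utf-8"))


if __name__ == "__main__":
    print(to_byte_level(" future"))  # Ġfuture  (space byte 32 maps to U+0120)
```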
### Optimizers

Adam variants with built-in learning-rate schedules (warmup and decay) for transformer training. `BertAdam` implements decoupled weight decay and, unlike standard Adam, does not apply bias correction. `OpenAIAdam` is the variant used for OpenAI GPT and does apply bias correction.

```python { .api }
class BertAdam: ...
class OpenAIAdam: ...
```

[Optimizers](./optimizers.md)
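`BertAdam` scales its learning rate over training according to `warmup` (a fraction) and `t_total` (the total number of steps). The sketch below computes a linear-warmup, linear-decay multiplier from training progress. It is assumed to match the shape of BertAdam's default `'warmup_linear'` schedule: a ramp from 0 to the peak rate over the first `warmup` fraction of steps, then a linear decay to 0 at `t_total`. Check the optimizer source for exact edge-case behavior. `warmup_linear` here is a standalone helper, not an import from the package.

```python
def warmup_linear(step: int, t_total: int, warmup: float = 0.1) -> float:
    """Learning-rate multiplier: linear warmup to 1.0, then linear decay to 0.0.

    step counts from 0; warmup is the fraction of t_total spent warming up.
    """
    if t_total <= 0 or not 0.0 < warmup < 1.0:
        raise ValueError("need t_total > 0 and 0 < warmup < 1")
    progress = step / t_total
    if progress < warmup:
        return progress / warmup
    # Decay linearly from 1.0 at the end of warmup to 0.0 at t_total (clamped)
    return max((1.0 - progress) / (1.0 - warmup), 0.0)


if __name__ == "__main__":
    base_lr, t_total = 2e-5, 1000
    for step in (0, 50, 100, 550, 1000):
        print(step, base_lr * warmup_linear(step, t_total))
```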
### Utilities

Utilities for downloading and caching pre-trained files, plus functions that load original TensorFlow checkpoints into PyTorch models.

```python { .api }
def cached_path(url_or_filename, cache_dir=None): ...
def load_tf_weights_in_bert(model, tf_checkpoint_path): ...
```

[Utilities](./utilities.md)
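`cached_path()` accepts either a local file path or a URL. For a URL, it downloads the file once and reuses the cached copy on later calls. The sketch below shows a minimal URL-keyed cache. It assumes the cache filename is derived from a SHA-256 hash of the URL. That naming scheme is an assumption, and the real function also handles ETags, S3 URLs, and downloads to a temporary file first. `cache_filename`, `resolve`, and the caller-supplied `fetch` function are hypothetical and stand in for real network access.

```python
import hashlib
import os
import tempfile
from typing import Callable, List


def cache_filename(url: str) -> str:
    """Deterministic, filesystem-safe name for a URL (hypothetical scheme)."""
    return hashlib.sha256(url.encode("utf-8")).hexdigest()


def resolve(url_or_path: str, cache_dir: str, fetch: Callable[[str], bytes]) -> str:
    """Return a local path: existing files pass through, URLs are fetched once."""
    if os.path.exists(url_or_path):
        return url_or_path
    if not url_or_path.startswith(("http://", "https://")):
        raise FileNotFoundError(url_or_path)
    os.makedirs(cache_dir, exist_ok=True)
    path = os.path.join(cache_dir, cache_filename(url_or_path))
    if not os.path.exists(path):  # download only on a cache miss
        data = fetch(url_or_path)
        with open(path, "wb") as f:
            f.write(data)
    return path


if __name__ == "__main__":
    calls: List[str] = []

    def fake_fetch(url: str) -> bytes:
        calls.append(url)
        return b"weights"

    with tempfile.TemporaryDirectory() as cache:
        url = "https://example.com/model.bin"  # placeholder URL
        first = resolve(url, cache, fake_fetch)
        second = resolve(url, cache, fake_fetch)
        print(first == second, len(calls))  # True 1
```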
## Common Patterns

### Loading Pre-trained Models

All model and tokenizer classes support the standard `from_pretrained()` pattern:

```python
from pytorch_pretrained_bert import BertModel, BertTokenizer

# Download (or load from cache) the pre-trained weights and configuration
model = BertModel.from_pretrained('bert-base-uncased')

# Load with a custom cache directory
model = BertModel.from_pretrained('bert-base-uncased', cache_dir='./models/')

# Load the matching tokenizer vocabulary
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
```
### Fine-tuning Setup

```python
from pytorch_pretrained_bert import BertForSequenceClassification, BertAdam

# Load model for fine-tuning (3-way classification)
model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=3)

# Total number of optimization steps; these values are placeholders for your dataset
num_train_examples = 10000
batch_size = 32
num_epochs = 3
num_train_steps = (num_train_examples // batch_size) * num_epochs

# Apply weight decay to all parameters except biases and LayerNorm weights
param_optimizer = list(model.named_parameters())
no_decay = ['bias', 'LayerNorm.bias', 'LayerNorm.weight']
optimizer_grouped_parameters = [
    {'params': [p for n, p in param_optimizer if not any(nd in n for nd in no_decay)], 'weight_decay': 0.01},
    {'params': [p for n, p in param_optimizer if any(nd in n for nd in no_decay)], 'weight_decay': 0.0}
]

# Linear warmup over the first 10% of steps, then the schedule's decay phase
optimizer = BertAdam(optimizer_grouped_parameters,
                     lr=2e-5,
                     warmup=0.1,
                     t_total=num_train_steps)
```
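The `no_decay` grouping in the fine-tuning setup relies on substring matching against parameter names. The sketch below applies the same rule to plain name strings, so you can check which parameters receive weight decay without loading a model. The parameter names are hypothetical examples in the style of `model.named_parameters()` for a BERT classifier. The sketch also shows one common way to compute `t_total` with gradient accumulation, where one optimizer update happens every `grad_accum_steps` batches. The counts are placeholders.

```python
from typing import Dict, List, Sequence

NO_DECAY = ("bias", "LayerNorm.bias", "LayerNorm.weight")


def split_by_decay(names: Sequence[str], no_decay: Sequence[str] = NO_DECAY) -> Dict[str, List[str]]:
    """Partition parameter names into decay / no-decay groups by substring match."""
    groups: Dict[str, List[str]] = {"decay": [], "no_decay": []}
    for name in names:
        key = "no_decay" if any(nd in name for nd in no_decay) else "decay"
        groups[key].append(name)
    return groups


def total_optimizer_steps(num_examples: int, batch_size: int,
                          grad_accum_steps: int, num_epochs: int) -> int:
    """Optimizer updates over training: one update per grad_accum_steps batches."""
    return (num_examples // (batch_size * grad_accum_steps)) * num_epochs


if __name__ == "__main__":
    names = [
        "bert.embeddings.word_embeddings.weight",
        "bert.embeddings.LayerNorm.weight",
        "bert.encoder.layer.0.attention.self.query.weight",
        "bert.encoder.layer.0.attention.self.query.bias",
        "classifier.weight",
        "classifier.bias",
    ]
    groups = split_by_decay(names)
    print(groups["no_decay"])
    # ['bert.embeddings.LayerNorm.weight', 'bert.encoder.layer.0.attention.self.query.bias', 'classifier.bias']
    print(total_optimizer_steps(10000, 32, 2, 3))  # 468
```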
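### Converting TensorFlow Checkpoints

```python
import torch
from pytorch_pretrained_bert import BertConfig, BertForPreTraining, load_tf_weights_in_bert

# Paths to the original Google checkpoint files (placeholders; adjust to your download)
config_file = 'bert_config.json'
tf_checkpoint_path = 'bert_model.ckpt'

# Create a PyTorch model with the checkpoint's architecture. BertForPreTraining matches
# the variable layout of Google's pre-training checkpoints (TensorFlow must be installed).
config = BertConfig.from_json_file(config_file)
model = BertForPreTraining(config)

# Load the TensorFlow weights, then save them as a PyTorch state dict
load_tf_weights_in_bert(model, tf_checkpoint_path)
torch.save(model.state_dict(), 'pytorch_model.bin')
```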
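`load_tf_weights_in_bert` walks each TensorFlow variable name and follows it through the PyTorch module tree. For this to work, the target model must expose the same nesting, for example the top-level `bert` attribute in `BertForPreTraining`. The sketch below shows a simplified version of that name translation as plain strings. It assumes these conventions: TF `kernel`, `gamma`, and `output_weights` map to PyTorch `weight`; `beta` and `output_bias` map to `bias`; `*_embeddings` tables gain a `.weight` suffix; and `layer_N` becomes an index. This is an illustration, not the loader itself. The real function also transposes `kernel` matrices and skips optimizer slots such as `adam_m` and `adam_v`.

```python
import re

# Leaf renames assumed to mirror the BERT loader's conventions
LEAF_RENAMES = {
    "kernel": "weight",
    "gamma": "weight",
    "output_weights": "weight",
    "beta": "bias",
    "output_bias": "bias",
}


def tf_to_pytorch_name(tf_name: str) -> str:
    """Translate a TF BERT variable name into a dotted PyTorch parameter path."""
    parts = []
    for segment in tf_name.split("/"):
        indexed = re.fullmatch(r"([A-Za-z]+)_(\d+)", segment)
        if indexed:  # e.g. layer_3 -> layer.3 (index into a ModuleList)
            parts.extend([indexed.group(1), indexed.group(2)])
        else:
            parts.append(LEAF_RENAMES.get(segment, segment))
    if parts[-1].endswith("_embeddings"):
        parts.append("weight")  # embedding tables store their matrix in .weight
    return ".".join(parts)


if __name__ == "__main__":
    for name in (
        "bert/encoder/layer_3/attention/self/query/kernel",
        "bert/embeddings/LayerNorm/gamma",
        "bert/embeddings/word_embeddings",
    ):
        print(name, "->", tf_to_pytorch_name(name))
    # Expected targets:
    # bert.encoder.layer.3.attention.self.query.weight
    # bert.embeddings.LayerNorm.weight
    # bert.embeddings.word_embeddings.weight
```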