# GPT-2 Models

GPT-2 (Generative Pre-trained Transformer 2) models for text generation and language modeling. GPT-2 is autoregressive: causal (left-to-right) self-attention lets each position attend only to earlier tokens, and text is generated by repeatedly predicting the next token in the sequence.

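To illustrate what left-to-right attention means, here is a minimal sketch of a causal mask in plain Python. The helper name `causal_mask` is hypothetical and not part of the library; it only shows which positions each token may attend to.

```python
def causal_mask(seq_len):
    """Return a seq_len x seq_len matrix where entry [i][j] is 1 if
    position i may attend to position j (that is, j <= i), else 0."""
    return [[1 if j <= i else 0 for j in range(seq_len)] for i in range(seq_len)]


mask = causal_mask(4)
for row in mask:
    print(row)
# [1, 0, 0, 0]
# [1, 1, 0, 0]
# [1, 1, 1, 0]
# [1, 1, 1, 1]
```

Each row is one query position: the first token sees only itself, while the last token sees the whole prefix.
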
## Capabilities

### GPT2Config

Configuration class for GPT-2 models containing all hyperparameters and architecture specifications.

```python { .api }
class GPT2Config(PretrainedConfig):
    def __init__(
        self,
        vocab_size=50257,
        n_positions=1024,
        n_ctx=1024,
        n_embd=768,
        n_layer=12,
        n_head=12,
        n_inner=None,
        activation_function="gelu_new",
        resid_pdrop=0.1,
        embd_pdrop=0.1,
        attn_pdrop=0.1,
        layer_norm_epsilon=1e-5,
        initializer_range=0.02,
        **kwargs
    ):
        """
        Configuration for GPT-2 models.

        Parameters:
        - vocab_size (int): Vocabulary size
        - n_positions (int): Maximum sequence length for positional embeddings
        - n_ctx (int): Context size (same as n_positions)
        - n_embd (int): Embedding dimensionality
        - n_layer (int): Number of transformer blocks
        - n_head (int): Number of attention heads per layer
        - n_inner (int): Inner dimensionality of the feed-forward layers (4 * n_embd if None)
        - activation_function (str): Activation function ("gelu_new", "relu", "swish")
        - resid_pdrop (float): Residual connection dropout probability
        - embd_pdrop (float): Embedding dropout probability
        - attn_pdrop (float): Attention dropout probability
        - layer_norm_epsilon (float): Layer normalization epsilon
        - initializer_range (float): Standard deviation used to initialize weight matrices
        """
```

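Some sizes are implied by the hyperparameters rather than set directly. The sketch below derives two of them: the per-head dimension (assuming, as is usual for multi-head attention, that `n_embd` is split evenly across `n_head` heads) and the feed-forward inner size with its documented `4 * n_embd` default. The helper `derived_sizes` is hypothetical and not a library function.

```python
def derived_sizes(n_embd=768, n_head=12, n_inner=None):
    """Compute sizes implied by the hyperparameters (illustrative helper)."""
    if n_embd % n_head != 0:
        raise ValueError("n_embd must be divisible by n_head")
    return {
        "head_dim": n_embd // n_head,
        # n_inner falls back to 4 * n_embd when it is not given.
        "n_inner": 4 * n_embd if n_inner is None else n_inner,
    }


sizes = derived_sizes()
print(sizes)  # {'head_dim': 64, 'n_inner': 3072}
```
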
### GPT2Model

Base GPT-2 model that outputs a contextualized hidden state for each input token, without a task-specific head.

```python { .api }
class GPT2Model(PreTrainedModel):
    def __init__(self, config):
        """
        Initialize GPT-2 base model.

        Parameters:
        - config (GPT2Config): Model configuration
        """

    def forward(
        self,
        input_ids=None,
        past=None,
        attention_mask=None,
        token_type_ids=None,
        position_ids=None,
        head_mask=None,
        inputs_embeds=None
    ):
        """
        Forward pass through GPT-2 model.

        Parameters:
        - input_ids (torch.Tensor): Token IDs of shape (batch_size, sequence_length)
        - past (Tuple[torch.Tensor]): Pre-computed attention key/value states from earlier steps, reused to speed up generation
        - attention_mask (torch.Tensor): Mask that keeps attention away from padding tokens
        - token_type_ids (torch.Tensor): Segment token indices
        - position_ids (torch.Tensor): Position indices
        - head_mask (torch.Tensor): Mask to nullify selected heads
        - inputs_embeds (torch.Tensor): Pre-computed embeddings, passed instead of input_ids

        Returns:
        BaseModelOutputWithPast: Object with last_hidden_state and past_key_values
        """
```

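The idea behind `past` can be sketched with a toy single-head attention in plain Python: keys and values of earlier tokens are cached, so each new step only processes the newest token. This is a simplified illustration, not the library's implementation; the function `attend` is hypothetical, and each toy vector doubles as query, key and value (no learned projections).

```python
import math


def attend(query, keys, values):
    """Scaled dot-product attention of one query over the given keys/values."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    peak = max(scores)
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [
        sum(w * v[d] for w, v in zip(weights, values)) / total
        for d in range(dim)
    ]


# Toy token vectors; for brevity each one serves as query, key and value.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Without a cache: rebuild keys/values from the whole prefix at every step.
full = [
    attend(tokens[t], tokens[: t + 1], tokens[: t + 1])
    for t in range(len(tokens))
]

# With a cache: append the new key/value once and reuse the earlier ones.
cache_k, cache_v, cached = [], [], []
for tok in tokens:
    cache_k.append(tok)
    cache_v.append(tok)
    cached.append(attend(tok, cache_k, cache_v))

print(cached == full)  # True
```

Both paths give the same outputs; the cached path simply avoids recomputing keys and values for tokens that were already processed.
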
**Usage Example:**

```python
from pytorch_transformers import GPT2Model, GPT2Tokenizer
import torch

# Load model and tokenizer
model = GPT2Model.from_pretrained("gpt2")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

# Prepare input
text = "The future of artificial intelligence is"
inputs = tokenizer(text, return_tensors="pt")

# Get model outputs
with torch.no_grad():
    outputs = model(**inputs)

# Access representations
last_hidden_state = outputs.last_hidden_state  # Shape: (1, seq_len, 768)
past_key_values = outputs.past_key_values  # For efficient generation

print(f"Hidden state shape: {last_hidden_state.shape}")
print(f"Number of past layers: {len(past_key_values) if past_key_values else 0}")
```

### GPT2LMHeadModel

GPT-2 model with a language modeling head (a projection from hidden states to vocabulary logits) for next-token prediction and text generation.

```python { .api }
class GPT2LMHeadModel(PreTrainedModel):
    def __init__(self, config):
        """
        Initialize GPT-2 for language modeling.

        Parameters:
        - config (GPT2Config): Model configuration
        """

    def forward(
        self,
        input_ids=None,
        past=None,
        attention_mask=None,
        token_type_ids=None,
        position_ids=None,
        head_mask=None,
        inputs_embeds=None,
        labels=None
    ):
        """
        Forward pass for language modeling.

        Parameters:
        - input_ids (torch.Tensor): Token IDs
        - past (Tuple[torch.Tensor]): Pre-computed attention key/value states
        - attention_mask (torch.Tensor): Attention mask
        - token_type_ids (torch.Tensor): Segment token indices
        - position_ids (torch.Tensor): Position indices
        - head_mask (torch.Tensor): Head mask
        - inputs_embeds (torch.Tensor): Pre-computed embeddings
        - labels (torch.Tensor): Language modeling labels; they are shifted inside the model, so labels can be set to input_ids

        Returns:
        CausalLMOutputWithPast: Object with loss, logits, and past_key_values
        """

    def generate(
        self,
        input_ids=None,
        max_length=20,
        do_sample=False,
        temperature=1.0,
        top_k=0,
        top_p=1.0,
        repetition_penalty=1.0,
        pad_token_id=None,
        eos_token_id=None,
        **kwargs
    ):
        """
        Generate text using the language model.

        Parameters:
        - input_ids (torch.Tensor): Input token IDs as prompt
        - max_length (int): Maximum length of the generated sequence, prompt included
        - do_sample (bool): Whether to use sampling (True) or greedy decoding (False)
        - temperature (float): Sampling temperature (higher = more random)
        - top_k (int): Top-k sampling (0 = disabled)
        - top_p (float): Nucleus sampling threshold (1.0 = disabled)
        - repetition_penalty (float): Penalty for repeated tokens (1.0 = no penalty)
        - pad_token_id (int): Padding token ID
        - eos_token_id (int): End-of-sequence token ID

        Returns:
        torch.Tensor: Generated token IDs
        """
```

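How the `labels` are scored against the logits can be sketched in plain Python: the prediction at position `t` is compared with the token at position `t + 1`, and the last position has nothing left to predict. The function `causal_lm_loss` is a hypothetical helper written for this illustration, not the library's loss code.

```python
import math


def causal_lm_loss(logits, labels):
    """Mean cross-entropy where logits[t] is scored against labels[t + 1]."""
    losses = []
    for step_logits, target in zip(logits[:-1], labels[1:]):
        peak = max(step_logits)
        log_norm = peak + math.log(sum(math.exp(x - peak) for x in step_logits))
        losses.append(log_norm - step_logits[target])
    return sum(losses) / len(losses)


# Toy example: vocabulary of 3 tokens, sequence of 3 token IDs.
input_ids = [0, 2, 1]
logits = [
    [0.0, 0.0, 0.0],  # after token 0: uniform guess for the next token
    [0.0, 0.0, 0.0],  # after token 2: uniform guess for the next token
    [0.0, 0.0, 0.0],  # last position: no next token, so it is not scored
]
loss = causal_lm_loss(logits, labels=input_ids)
print(round(loss, 4))  # 1.0986, i.e. ln(3) for a uniform guess
```

Passing the same sequence as inputs and labels works because the one-position shift happens inside the loss computation.
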
**Usage Example:**

```python
from pytorch_transformers import GPT2LMHeadModel, GPT2Tokenizer
import torch

# Load model and tokenizer
model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

# Set pad token
tokenizer.pad_token = tokenizer.eos_token

# Encode the prompt
prompt = "The future of artificial intelligence"
inputs = tokenizer.encode(prompt, return_tensors="pt")

# Generate with different strategies
with torch.no_grad():
    # Greedy generation
    greedy_output = model.generate(
        inputs,
        max_length=50,
        do_sample=False,
        pad_token_id=tokenizer.eos_token_id
    )

    # Sampling with temperature
    sample_output = model.generate(
        inputs,
        max_length=50,
        do_sample=True,
        temperature=0.8,
        top_k=50,
        top_p=0.9,
        pad_token_id=tokenizer.eos_token_id
    )

# Decode generated text
greedy_text = tokenizer.decode(greedy_output[0], skip_special_tokens=True)
sample_text = tokenizer.decode(sample_output[0], skip_special_tokens=True)

print(f"Greedy: {greedy_text}")
print(f"Sampled: {sample_text}")
```

### GPT2DoubleHeadsModel

GPT-2 model with a language modeling head and a multiple-choice classification head, for multi-task learning.

```python { .api }
class GPT2DoubleHeadsModel(PreTrainedModel):
    def __init__(self, config):
        """
        Initialize GPT-2 with double heads.

        Parameters:
        - config (GPT2Config): Model configuration
        """

    def forward(
        self,
        input_ids=None,
        past=None,
        attention_mask=None,
        token_type_ids=None,
        position_ids=None,
        head_mask=None,
        inputs_embeds=None,
        mc_token_ids=None,
        lm_labels=None,
        mc_labels=None
    ):
        """
        Forward pass for double heads model.

        Parameters:
        - input_ids (torch.Tensor): Token IDs
        - past (Tuple[torch.Tensor]): Pre-computed attention key/value states
        - attention_mask (torch.Tensor): Attention mask
        - token_type_ids (torch.Tensor): Segment token indices
        - position_ids (torch.Tensor): Position indices
        - head_mask (torch.Tensor): Head mask
        - inputs_embeds (torch.Tensor): Pre-computed embeddings
        - mc_token_ids (torch.Tensor): Index of the classification token in each input sequence, read by the classification head
        - lm_labels (torch.Tensor): Language modeling labels
        - mc_labels (torch.Tensor): Multiple choice labels

        Returns:
        GPT2DoubleHeadsModelOutput: Object with lm_loss, mc_loss, lm_logits, mc_logits, past_key_values
        """
```

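The role of `mc_token_ids` can be sketched with nested lists: for every candidate sequence, the classification head reads the hidden state at one chosen position. The helper `select_mc_states` and the toy values are hypothetical and only illustrate the indexing.

```python
def select_mc_states(hidden_states, mc_token_ids):
    """Pick, for every choice, the hidden vector at its classification index.

    hidden_states: [num_choices][seq_len][hidden_size] nested lists
    mc_token_ids:  [num_choices] positions, each in range(seq_len)
    """
    return [choice[idx] for choice, idx in zip(hidden_states, mc_token_ids)]


# Two candidate sequences, three positions each, hidden size 2.
hidden_states = [
    [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],
    [[1.1, 1.2], [1.3, 1.4], [1.5, 1.6]],
]
# The classification token sits at position 2 in the first choice
# and at position 1 in the second (for example, because of padding).
selected = select_mc_states(hidden_states, mc_token_ids=[2, 1])
print(selected)  # [[0.5, 0.6], [1.3, 1.4]]
```
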
### GPT2Tokenizer

Byte-pair encoding (BPE) tokenizer for GPT-2 models.

```python { .api }
class GPT2Tokenizer(PreTrainedTokenizer):
    def __init__(
        self,
        vocab_file,
        merges_file,
        errors="replace",
        unk_token="<|endoftext|>",
        bos_token="<|endoftext|>",
        eos_token="<|endoftext|>",
        add_prefix_space=False,
        **kwargs
    ):
        """
        Initialize GPT-2 tokenizer.

        Parameters:
        - vocab_file (str): Path to vocabulary file
        - merges_file (str): Path to BPE merges file
        - errors (str): Error handling when decoding bytes to UTF-8 ("replace", "ignore", "strict")
        - unk_token (str): Unknown token
        - bos_token (str): Beginning of sequence token
        - eos_token (str): End of sequence token
        - add_prefix_space (bool): Whether to add a leading space to the input before tokenizing
        """

    def encode(
        self,
        text,
        add_special_tokens=True,
        max_length=None,
        stride=0,
        truncation_strategy="longest_first",
        **kwargs
    ):
        """
        Encode text to token IDs using BPE.

        Parameters:
        - text (str): Input text to encode
        - add_special_tokens (bool): Whether to add special tokens
        - max_length (int): Maximum sequence length
        - stride (int): Stride for sliding window
        - truncation_strategy (str): How to truncate long sequences

        Returns:
        List[int]: List of token IDs
        """
```

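The merge step at the heart of BPE can be sketched in a few lines. This toy version works on characters and uses a made-up merge table, whereas the GPT-2 tokenizer operates on bytes and reads its merges from `merges_file`; the function `bpe` and the `merges` list are hypothetical.

```python
def bpe(word, merge_ranks):
    """Apply byte-pair merges to one word, lowest rank (highest priority) first."""
    symbols = list(word)
    while len(symbols) > 1:
        pairs = [(symbols[i], symbols[i + 1]) for i in range(len(symbols) - 1)]
        # Choose the adjacent pair that was learned earliest.
        best = min(pairs, key=lambda p: merge_ranks.get(p, float("inf")))
        if best not in merge_ranks:
            break  # no learned merge applies any more
        merged, i = [], 0
        while i < len(symbols):
            if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == best:
                merged.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return symbols


# Hypothetical merge table: earlier entries have higher priority.
merges = [("l", "o"), ("lo", "w"), ("e", "r")]
merge_ranks = {pair: rank for rank, pair in enumerate(merges)}

print(bpe("lower", merge_ranks))  # ['low', 'er']
print(bpe("slow", merge_ranks))   # ['s', 'low']
```

Words never seen during training still get encoded, because anything without an applicable merge falls back to smaller pieces.
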
**Usage Example:**

```python
from pytorch_transformers import GPT2Tokenizer

# Load tokenizer
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

# GPT-2 uses the same token for BOS, EOS, and UNK; no PAD token is set by default
print(f"Special token: {tokenizer.eos_token}")  # <|endoftext|>

# Tokenize text
text = "Hello, how are you today?"
tokens = tokenizer.tokenize(text)
token_ids = tokenizer.encode(text)

print(f"Tokens: {tokens}")
print(f"Token IDs: {token_ids}")

# Decode back
decoded = tokenizer.decode(token_ids)
print(f"Decoded: {decoded}")

# Handle multiple sequences
# Padding needs a pad token, so reuse the end-of-text token
tokenizer.pad_token = tokenizer.eos_token
texts = ["First sentence.", "Second sentence."]
encoded = tokenizer(
    texts,
    padding=True,
    truncation=True,
    return_tensors="pt"
)
print(f"Batch shape: {encoded['input_ids'].shape}")
```

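What padding a batch produces can be sketched without the library: shorter sequences are filled up to the longest one, and an attention mask marks which positions hold real tokens. The helper `pad_batch` and the token IDs are made up for illustration; right-side padding is an assumption of this sketch.

```python
def pad_batch(sequences, pad_token_id):
    """Right-pad token ID lists to a common length and build attention masks."""
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids.append(seq + [pad_token_id] * n_pad)
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# Made-up token IDs; 50256 is the ID of <|endoftext|> in the GPT-2 vocabulary.
batch = pad_batch([[11, 22, 33], [44]], pad_token_id=50256)
print(batch["input_ids"])       # [[11, 22, 33], [44, 50256, 50256]]
print(batch["attention_mask"])  # [[1, 1, 1], [1, 0, 0]]
```
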
## Utility Functions

### load_tf_weights_in_gpt2

```python { .api }
def load_tf_weights_in_gpt2(model, gpt2_checkpoint_path):
    """
    Load TensorFlow GPT-2 checkpoint weights into a PyTorch GPT-2 model.

    Parameters:
    - model (GPT2Model): PyTorch GPT-2 model
    - gpt2_checkpoint_path (str): Path to TensorFlow checkpoint directory

    Returns:
    GPT2Model: Model with loaded weights
    """
```

## Archive Maps

```python { .api }
GPT2_PRETRAINED_CONFIG_ARCHIVE_MAP: Dict[str, str]
# Maps model names to download URLs for configurations

GPT2_PRETRAINED_MODEL_ARCHIVE_MAP: Dict[str, str]
# Maps model names to download URLs for pre-trained weights
```

**Available Pre-trained Models:**
- `gpt2`: 12-layer, 768-hidden, 12-heads, 117M parameters (small)
- `gpt2-medium`: 24-layer, 1024-hidden, 16-heads, 345M parameters
- `gpt2-large`: 36-layer, 1280-hidden, 20-heads, 762M parameters
- `gpt2-xl`: 48-layer, 1600-hidden, 25-heads, 1558M parameters

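A rough parameter count can be derived from the layer and hidden sizes listed above. The sketch below assumes a particular weight layout (token and position embeddings, a fused query/key/value projection, an attention output projection, a two-layer feed-forward block with inner size `4 * n_embd`, two layer norms per block, a final layer norm, and an output head that shares the token embedding matrix). The function `gpt2_param_count` is hypothetical and not part of the library.

```python
def gpt2_param_count(n_layer, n_embd, vocab_size=50257, n_positions=1024):
    """Count weights under the assumed GPT-2 layout (illustrative only)."""
    embeddings = vocab_size * n_embd + n_positions * n_embd
    attention = (n_embd * 3 * n_embd + 3 * n_embd) + (n_embd * n_embd + n_embd)
    n_inner = 4 * n_embd
    mlp = (n_embd * n_inner + n_inner) + (n_inner * n_embd + n_embd)
    layer_norms = 2 * (2 * n_embd)  # two layer norms per block, weight + bias
    final_layer_norm = 2 * n_embd
    return embeddings + n_layer * (attention + mlp + layer_norms) + final_layer_norm


for name, n_layer, n_embd in [
    ("gpt2", 12, 768),
    ("gpt2-medium", 24, 1024),
    ("gpt2-large", 36, 1280),
    ("gpt2-xl", 48, 1600),
]:
    print(f"{name}: {gpt2_param_count(n_layer, n_embd) / 1e6:.0f}M")
```

Under these assumptions the smallest configuration comes to roughly 124M weights, so the sizes in the list are best read as nominal labels rather than exact counts.
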
## Text Generation Strategies

GPT-2 models support various text generation strategies:

**Greedy Decoding**: Always selects the most likely next token
```python
output = model.generate(input_ids, do_sample=False)
```

**Sampling**: Draws the next token at random from the predicted probability distribution; a `temperature` below 1.0 sharpens the distribution, a value above 1.0 flattens it
```python
output = model.generate(input_ids, do_sample=True, temperature=0.8)
```

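The effect of temperature can be sketched in plain Python: logits are divided by the temperature before the softmax, and a token is then drawn from the resulting probabilities. The function name and the toy logits are hypothetical; this is an illustration of the technique, not the library's sampling code.

```python
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    """Turn logits into probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # made-up scores for a 3-token vocabulary
sharp = softmax_with_temperature(logits, temperature=0.5)
flat = softmax_with_temperature(logits, temperature=2.0)
print([round(p, 3) for p in sharp])
print([round(p, 3) for p in flat])

# Draw one token ID from the sharpened distribution.
rng = random.Random(0)
next_token = rng.choices(range(len(logits)), weights=sharp, k=1)[0]
```

The lower temperature concentrates more probability on the top-scoring token, which makes sampled text more predictable.
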
**Top-k Sampling**: Samples from the k most likely tokens
```python
output = model.generate(input_ids, do_sample=True, top_k=50)
```

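Top-k filtering can be sketched as masking every logit outside the k largest before sampling. The helper `top_k_filter` is hypothetical; note that in this simple version, ties at the threshold may keep more than k tokens.

```python
def top_k_filter(logits, k):
    """Keep the k largest logits and set the rest to -inf (k <= 0 disables it)."""
    if k <= 0 or k >= len(logits):
        return list(logits)
    threshold = sorted(logits, reverse=True)[k - 1]
    return [x if x >= threshold else float("-inf") for x in logits]


logits = [1.5, 0.2, 3.0, -1.0]
print(top_k_filter(logits, k=2))  # [1.5, -inf, 3.0, -inf]
```

A logit of `-inf` becomes probability zero after the softmax, so masked tokens can never be drawn.
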
**Nucleus (Top-p) Sampling**: Samples from the smallest set of most likely tokens whose cumulative probability reaches at least p
```python
output = model.generate(input_ids, do_sample=True, top_p=0.9)
```

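Nucleus filtering can be sketched as follows: tokens are sorted by probability and kept until their cumulative mass reaches `top_p`; everything else is masked. The helper `top_p_filter` and the toy distribution are hypothetical.

```python
import math


def top_p_filter(logits, top_p):
    """Keep the smallest set of top tokens whose probability mass reaches top_p."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    order = sorted(range(len(logits)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = set(), 0.0
    for i in order:
        keep.add(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break  # the nucleus is complete once the mass reaches top_p
    return [x if i in keep else float("-inf") for i, x in enumerate(logits)]


# Probabilities are 0.5, 0.3, 0.15, 0.05 for these logits.
logits = [math.log(0.5), math.log(0.3), math.log(0.15), math.log(0.05)]
filtered = top_p_filter(logits, top_p=0.9)
print([i for i, x in enumerate(filtered) if x != float("-inf")])  # [0, 1, 2]
```

Unlike top-k, the number of surviving tokens adapts to the distribution: a confident prediction keeps few candidates, an uncertain one keeps many.
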
**Combined Strategies**: Use multiple techniques together
```python
output = model.generate(
    input_ids,
    do_sample=True,
    temperature=0.8,
    top_k=50,
    top_p=0.9,
    repetition_penalty=1.1
)
```
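
One common way to apply a repetition penalty is to rescale the scores of tokens that have already been generated, so they become less likely to be picked again. The sketch below assumes that formulation (divide positive scores by the penalty, multiply negative ones); the helper `apply_repetition_penalty` is hypothetical and may differ from what the library does internally.

```python
def apply_repetition_penalty(logits, generated_ids, penalty):
    """Lower the scores of tokens that already appear in generated_ids."""
    adjusted = list(logits)
    for token_id in set(generated_ids):
        score = adjusted[token_id]
        # Dividing a positive score or multiplying a negative one both
        # push the token toward being less likely when penalty > 1.
        adjusted[token_id] = score / penalty if score > 0 else score * penalty
    return adjusted


logits = [2.0, -1.0, 0.5, 3.0]
print(apply_repetition_penalty(logits, generated_ids=[0, 1, 0], penalty=2.0))
# [1.0, -2.0, 0.5, 3.0]
```

A penalty of 1.0 leaves all scores unchanged, which matches the documented default.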