Tessl Tile for pypi/pytorch-transformers@1.2.0

# GPT-2 Models

GPT-2 (Generative Pre-trained Transformer 2) models for text generation and language modeling. GPT-2 uses causal (left-to-right) self-attention: each position attends only to itself and earlier positions, and the model generates text by repeatedly predicting the next token in a sequence.
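
To make the left-to-right constraint concrete, here is a minimal pure-Python sketch of a causal attention mask. The helper name `causal_mask` is illustrative only and is not part of the library.

```python
def causal_mask(seq_len):
    """Return a seq_len x seq_len matrix where entry [i][j] is 1 if
    position i may attend to position j (that is, j <= i), else 0."""
    return [[1 if j <= i else 0 for j in range(seq_len)] for i in range(seq_len)]


mask = causal_mask(4)
for row in mask:
    print(row)
# [1, 0, 0, 0]
# [1, 1, 0, 0]
# [1, 1, 1, 0]
# [1, 1, 1, 1]
```

Row `i` describes what token `i` can see, so the first token sees only itself and the last token sees the whole prefix.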

## Capabilities

### GPT2Config

Configuration class for GPT-2 models containing all hyperparameters and architecture specifications.

```python { .api }
class GPT2Config(PretrainedConfig):
    def __init__(
        self,
        vocab_size=50257,
        n_positions=1024,
        n_ctx=1024,
        n_embd=768,
        n_layer=12,
        n_head=12,
        n_inner=None,
        activation_function="gelu_new",
        resid_pdrop=0.1,
        embd_pdrop=0.1,
        attn_pdrop=0.1,
        layer_norm_epsilon=1e-5,
        initializer_range=0.02,
        **kwargs
    ):
        """
        Configuration for GPT-2 models.

        Parameters:
        - vocab_size (int): Vocabulary size
        - n_positions (int): Maximum sequence length for positional embeddings
        - n_ctx (int): Size of the causal mask (normally equal to n_positions)
        - n_embd (int): Embedding and hidden-state dimensionality
        - n_layer (int): Number of transformer blocks
        - n_head (int): Number of attention heads per layer
        - n_inner (int): Inner dimensionality of the feed-forward layers (4 * n_embd if None)
        - activation_function (str): Activation function ("gelu_new", "relu", "swish")
        - resid_pdrop (float): Residual connection dropout probability
        - embd_pdrop (float): Embedding dropout probability
        - attn_pdrop (float): Attention dropout probability
        - layer_norm_epsilon (float): Layer normalization epsilon
        - initializer_range (float): Standard deviation used for weight initialization
        """
```
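
As a rough illustration of how these hyperparameters determine model size, the sketch below estimates the parameter count from a configuration. It assumes the standard GPT-2 block layout (fused query/key/value projection, two-layer feed-forward, two layer norms per block) and an output projection tied to the token embeddings. The function `estimate_gpt2_parameters` is hypothetical and not part of the library.

```python
def estimate_gpt2_parameters(vocab_size=50257, n_positions=1024, n_embd=768,
                             n_layer=12, n_inner=None):
    """Rough parameter count under the assumptions stated above.

    n_head is not an argument: it splits n_embd into heads but adds no weights.
    """
    inner = n_inner if n_inner is not None else 4 * n_embd
    # Token and position embedding tables
    embeddings = vocab_size * n_embd + n_positions * n_embd
    # Fused q/k/v projection plus output projection (weights and biases)
    attention = (n_embd * 3 * n_embd + 3 * n_embd) + (n_embd * n_embd + n_embd)
    # Two linear layers of the feed-forward network (weights and biases)
    feed_forward = (n_embd * inner + inner) + (inner * n_embd + n_embd)
    # Two layer norms per block, each with a scale and a bias vector
    layer_norms = 2 * (2 * n_embd)
    per_block = attention + feed_forward + layer_norms
    final_layer_norm = 2 * n_embd
    return embeddings + n_layer * per_block + final_layer_norm


print(estimate_gpt2_parameters())  # 124439808 for the default configuration
print(768 // 12)                   # 64, the per-head dimension (n_embd / n_head)
```

Because the count scales with `n_layer * n_embd ** 2`, widening the model grows it much faster than adding vocabulary or positions.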

### GPT2Model

Base GPT-2 transformer that outputs contextualized hidden states for each token, without any task-specific head.

```python { .api }
class GPT2Model(PreTrainedModel):
    def __init__(self, config):
        """
        Initialize GPT-2 base model.

        Parameters:
        - config (GPT2Config): Model configuration
        """

    def forward(
        self,
        input_ids=None,
        past=None,
        attention_mask=None,
        token_type_ids=None,
        position_ids=None,
        head_mask=None,
        inputs_embeds=None
    ):
        """
        Forward pass through GPT-2 model.

        Parameters:
        - input_ids (torch.Tensor): Token IDs of shape (batch_size, sequence_length)
        - past (Tuple[torch.Tensor]): Pre-computed attention keys and values from earlier steps, for efficient generation
        - attention_mask (torch.Tensor): Attention mask to avoid attending to padding tokens
        - token_type_ids (torch.Tensor): Segment token indices
        - position_ids (torch.Tensor): Position indices
        - head_mask (torch.Tensor): Mask to nullify selected heads
        - inputs_embeds (torch.Tensor): Pre-computed embeddings, as an alternative to input_ids

        Returns:
        BaseModelOutputWithPast: Object with last_hidden_state and past_key_values
        """
```

**Usage Example:**

```python
from pytorch_transformers import GPT2Model, GPT2Tokenizer
import torch

# Load model and tokenizer
model = GPT2Model.from_pretrained("gpt2")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model.eval()  # make sure dropout is disabled for inference

# Prepare input: encode to a list of IDs and add a batch dimension
text = "The future of artificial intelligence is"
input_ids = torch.tensor([tokenizer.encode(text)])

# Get model outputs
with torch.no_grad():
    outputs = model(input_ids)

# Index the outputs by position: hidden states first, then cached keys/values
last_hidden_state = outputs[0]  # Shape: (1, seq_len, 768)
past_key_values = outputs[1]    # For efficient generation

print(f"Hidden state shape: {last_hidden_state.shape}")
print(f"Number of past layers: {len(past_key_values) if past_key_values else 0}")
```

### GPT2LMHeadModel

GPT-2 model with a language modeling head on top, for text generation and for training with a next-token prediction loss.

```python { .api }
class GPT2LMHeadModel(PreTrainedModel):
    def __init__(self, config):
        """
        Initialize GPT-2 for language modeling.

        Parameters:
        - config (GPT2Config): Model configuration
        """

    def forward(
        self,
        input_ids=None,
        past=None,
        attention_mask=None,
        token_type_ids=None,
        position_ids=None,
        head_mask=None,
        inputs_embeds=None,
        labels=None
    ):
        """
        Forward pass for language modeling.

        Parameters:
        - input_ids (torch.Tensor): Token IDs
        - past (Tuple[torch.Tensor]): Pre-computed attention keys and values
        - attention_mask (torch.Tensor): Attention mask
        - token_type_ids (torch.Tensor): Segment token indices
        - position_ids (torch.Tensor): Position indices
        - head_mask (torch.Tensor): Head mask
        - inputs_embeds (torch.Tensor): Pre-computed embeddings
        - labels (torch.Tensor): Language modeling labels; they are shifted inside the model, so labels can be set to input_ids

        Returns:
        CausalLMOutputWithPast: Object with loss, logits, and past_key_values
        """

    def generate(
        self,
        input_ids=None,
        max_length=20,
        do_sample=False,
        temperature=1.0,
        top_k=0,
        top_p=1.0,
        repetition_penalty=1.0,
        pad_token_id=None,
        eos_token_id=None,
        **kwargs
    ):
        """
        Generate text using the language model.

        Parameters:
        - input_ids (torch.Tensor): Input token IDs as prompt
        - max_length (int): Maximum length of the generated sequence, including the prompt
        - do_sample (bool): Whether to use sampling (True) or greedy decoding (False)
        - temperature (float): Sampling temperature (higher = more random)
        - top_k (int): Top-k sampling (0 = disabled)
        - top_p (float): Nucleus sampling threshold (1.0 = disabled)
        - repetition_penalty (float): Penalty for repeated tokens (1.0 = no penalty)
        - pad_token_id (int): Padding token ID
        - eos_token_id (int): End-of-sequence token ID

        Returns:
        torch.Tensor: Generated token IDs
        """
```

**Usage Example:**

```python
from pytorch_transformers import GPT2LMHeadModel, GPT2Tokenizer
import torch

# Load model and tokenizer
model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

# GPT-2 defines no pad token, so reuse the end-of-sequence token
tokenizer.pad_token = tokenizer.eos_token

# Encode the prompt and add a batch dimension
prompt = "The future of artificial intelligence"
input_ids = torch.tensor([tokenizer.encode(prompt)])

# Generate with different strategies
with torch.no_grad():
    # Greedy generation
    greedy_output = model.generate(
        input_ids,
        max_length=50,
        do_sample=False,
        pad_token_id=tokenizer.eos_token_id
    )

    # Sampling with temperature
    sample_output = model.generate(
        input_ids,
        max_length=50,
        do_sample=True,
        temperature=0.8,
        top_k=50,
        top_p=0.9,
        pad_token_id=tokenizer.eos_token_id
    )

# Decode generated text
greedy_text = tokenizer.decode(greedy_output[0], skip_special_tokens=True)
sample_text = tokenizer.decode(sample_output[0], skip_special_tokens=True)

print(f"Greedy: {greedy_text}")
print(f"Sampled: {sample_text}")
```
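
The `labels` argument of `forward` feeds a next-token prediction loss. The following is a minimal pure-Python sketch of that alignment, not the library's implementation: the logits at position `t` are scored against the token at position `t + 1`. The helper names `log_softmax` and `causal_lm_loss` and the toy values are assumptions made for illustration.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    peak = max(logits)
    log_total = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return [x - log_total for x in logits]


def causal_lm_loss(logits, labels):
    """Mean cross-entropy where the logits at position t predict labels[t + 1].

    logits: one list of vocabulary scores per position
    labels: token IDs for the same positions (can simply be the input IDs)
    """
    shifted_logits = logits[:-1]  # the last position has no next token to predict
    shifted_labels = labels[1:]   # the first token is never a prediction target
    losses = [-log_softmax(row)[target]
              for row, target in zip(shifted_logits, shifted_labels)]
    return sum(losses) / len(losses)


# Toy example: vocabulary of 3 tokens, sequence of 3 positions
toy_logits = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [5.0, 1.0, 1.0]]
toy_labels = [2, 0, 1]
print(causal_lm_loss(toy_logits, toy_labels))  # uniform predictions give ln(3)
```

The third row of logits never contributes, because nothing follows the last token.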

### GPT2DoubleHeadsModel

GPT-2 model with both a language modeling head and a multiple-choice classification head, for multi-task learning.

```python { .api }
class GPT2DoubleHeadsModel(PreTrainedModel):
    def __init__(self, config):
        """
        Initialize GPT-2 with double heads.

        Parameters:
        - config (GPT2Config): Model configuration
        """

    def forward(
        self,
        input_ids=None,
        past=None,
        attention_mask=None,
        token_type_ids=None,
        position_ids=None,
        head_mask=None,
        inputs_embeds=None,
        mc_token_ids=None,
        lm_labels=None,
        mc_labels=None
    ):
        """
        Forward pass for double heads model.

        Parameters:
        - input_ids (torch.Tensor): Token IDs of shape (batch_size, num_choices, sequence_length)
        - past (Tuple[torch.Tensor]): Pre-computed attention keys and values
        - attention_mask (torch.Tensor): Attention mask
        - token_type_ids (torch.Tensor): Segment token indices
        - position_ids (torch.Tensor): Position indices
        - head_mask (torch.Tensor): Head mask
        - inputs_embeds (torch.Tensor): Pre-computed embeddings
        - mc_token_ids (torch.Tensor): Index of the classification token in each input sequence, shape (batch_size, num_choices)
        - lm_labels (torch.Tensor): Language modeling labels
        - mc_labels (torch.Tensor): Index of the correct choice for each example, shape (batch_size,)

        Returns:
        GPT2DoubleHeadsModelOutput: Object with lm_loss, mc_loss, lm_logits, mc_logits, past_key_values
        """
```
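
Since `mc_token_ids` holds positions rather than token IDs, a small sketch of input preparation may help. It uses plain Python lists and made-up token IDs; the helper `build_multiple_choice_batch`, the classification token ID `50257`, and the padding ID `0` are all hypothetical.

```python
def build_multiple_choice_batch(choices, cls_token_id, pad_token_id):
    """Arrange the tokenized candidates of one example.

    choices: list of token-ID lists, one per candidate
    Returns (input_ids, mc_token_ids): input_ids has shape
    (num_choices, seq_len) and mc_token_ids gives, for each choice,
    the position of its classification token.
    """
    with_cls = [ids + [cls_token_id] for ids in choices]
    seq_len = max(len(ids) for ids in with_cls)
    input_ids = [ids + [pad_token_id] * (seq_len - len(ids)) for ids in with_cls]
    mc_token_ids = [len(ids) - 1 for ids in with_cls]  # positions, not token IDs
    return input_ids, mc_token_ids


# Hypothetical IDs: 50257 stands in for an added classification token, 0 for padding
input_ids, mc_token_ids = build_multiple_choice_batch(
    [[10, 11, 12], [10, 11, 13, 14]], cls_token_id=50257, pad_token_id=0
)
print(input_ids)     # [[10, 11, 12, 50257, 0], [10, 11, 13, 14, 50257]]
print(mc_token_ids)  # [3, 4]
```

Wrapping both results in one more list adds the batch dimension before they are converted to tensors.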

### GPT2Tokenizer

Byte-level byte-pair encoding (BPE) tokenizer for GPT-2 models.

```python { .api }
class GPT2Tokenizer(PreTrainedTokenizer):
    def __init__(
        self,
        vocab_file,
        merges_file,
        errors="replace",
        unk_token="<|endoftext|>",
        bos_token="<|endoftext|>",
        eos_token="<|endoftext|>",
        add_prefix_space=False,
        **kwargs
    ):
        """
        Initialize GPT-2 tokenizer.

        Parameters:
        - vocab_file (str): Path to vocabulary file
        - merges_file (str): Path to BPE merges file
        - errors (str): How to handle bytes that cannot be decoded as UTF-8 ("replace", "ignore", "strict")
        - unk_token (str): Unknown token
        - bos_token (str): Beginning of sequence token
        - eos_token (str): End of sequence token
        - add_prefix_space (bool): Whether to add a leading space so the first word is tokenized like any other word
        """

    def encode(
        self,
        text,
        add_special_tokens=True,
        max_length=None,
        stride=0,
        truncation_strategy="longest_first",
        **kwargs
    ):
        """
        Encode text to token IDs using BPE.

        Parameters:
        - text (str): Input text to encode
        - add_special_tokens (bool): Whether to add special tokens
        - max_length (int): Maximum sequence length
        - stride (int): Stride for sliding window
        - truncation_strategy (str): How to truncate long sequences

        Returns:
        List[int]: List of token IDs
        """
```

**Usage Example:**

```python
from pytorch_transformers import GPT2Tokenizer
import torch

# Load tokenizer
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

# GPT-2 uses the same token for BOS, EOS, and UNK; no PAD token is set by default
print(f"Special token: {tokenizer.eos_token}")  # <|endoftext|>

# Tokenize text
text = "Hello, how are you today?"
tokens = tokenizer.tokenize(text)
token_ids = tokenizer.encode(text)

print(f"Tokens: {tokens}")
print(f"Token IDs: {token_ids}")

# Decode back
decoded = tokenizer.decode(token_ids)
print(f"Decoded: {decoded}")

# Handle multiple sequences: encode each one, then right-pad to a common
# length with the EOS ID, since no pad token is defined
texts = ["First sentence.", "Second sentence."]
encoded = [tokenizer.encode(t) for t in texts]
pad_id = tokenizer.convert_tokens_to_ids(tokenizer.eos_token)
max_len = max(len(ids) for ids in encoded)
input_ids = torch.tensor([ids + [pad_id] * (max_len - len(ids)) for ids in encoded])
print(f"Batch shape: {input_ids.shape}")
```
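
For intuition about what the merges file does, here is a toy sketch of the BPE merge loop. It works on plain characters with a made-up merge table. The real tokenizer first maps text to bytes and pre-splits it into words, which this sketch omits, and `bpe_merge` and `toy_merges` are illustrative names only.

```python
def bpe_merge(symbols, merge_ranks):
    """Apply BPE merges to a sequence of symbols.

    merge_ranks maps a symbol pair to its priority (lower = merged earlier),
    playing the role of the learned merges file.
    """
    symbols = list(symbols)
    while len(symbols) > 1:
        pairs = [(symbols[i], symbols[i + 1]) for i in range(len(symbols) - 1)]
        best = min(pairs, key=lambda pair: merge_ranks.get(pair, float("inf")))
        if best not in merge_ranks:
            break  # no learned merge applies any more
        merged, i = [], 0
        while i < len(symbols):
            if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == best:
                merged.append(symbols[i] + symbols[i + 1])  # fuse the pair
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return symbols


# Toy merge table (hypothetical, not GPT-2's real merges)
toy_merges = {("l", "o"): 0, ("lo", "w"): 1, ("e", "r"): 2}
print(bpe_merge("lower", toy_merges))  # ['low', 'er']
```

Each resulting symbol would then be looked up in the vocabulary file to obtain its token ID.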

## Utility Functions

### load_tf_weights_in_gpt2

```python { .api }
def load_tf_weights_in_gpt2(model, gpt2_checkpoint_path):
    """
    Load TensorFlow GPT-2 checkpoint weights into a PyTorch GPT-2 model.

    Parameters:
    - model (GPT2Model): PyTorch GPT-2 model
    - gpt2_checkpoint_path (str): Path to TensorFlow checkpoint directory

    Returns:
    GPT2Model: Model with loaded weights
    """
```

## Archive Maps

```python { .api }
GPT2_PRETRAINED_CONFIG_ARCHIVE_MAP: Dict[str, str]
# Maps model names to download URLs for configurations

GPT2_PRETRAINED_MODEL_ARCHIVE_MAP: Dict[str, str]
# Maps model names to download URLs for pre-trained weights
```

**Available Pre-trained Models:**
- `gpt2`: 12-layer, 768-hidden, 12-heads, 117M parameters (small)
- `gpt2-medium`: 24-layer, 1024-hidden, 16-heads, 345M parameters
- `gpt2-large`: 36-layer, 1280-hidden, 20-heads, 762M parameters
- `gpt2-xl`: 48-layer, 1600-hidden, 25-heads, 1558M parameters

## Text Generation Strategies

GPT-2 models support several text generation strategies:

**Greedy Decoding**: Always selects the most likely next token
```python
output = model.generate(input_ids, do_sample=False)
```

**Sampling**: Draws the next token at random from the predicted probability distribution; a `temperature` below 1.0 sharpens the distribution and a value above 1.0 flattens it
```python
output = model.generate(input_ids, do_sample=True, temperature=0.8)
```
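
The difference between the two strategies above comes down to how one token is chosen from the logits at each step. The sketch below shows a single step in pure Python; the helper names and the toy logits are assumptions for illustration and are not library functions.

```python
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    """Convert logits to probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_pick(logits):
    """Greedy decoding step: index of the highest-scoring token."""
    return max(range(len(logits)), key=lambda i: logits[i])


def sample_pick(logits, temperature=1.0, rng=random):
    """Sampling step: draw a token index from the tempered distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


toy_logits = [2.0, 1.0, 0.1]
print(greedy_pick(toy_logits))                    # 0
print(softmax_with_temperature(toy_logits, 0.5))  # sharper than at temperature 1.0
print(sample_pick(toy_logits, temperature=0.8, rng=random.Random(0)))
```

Passing a seeded `random.Random` instance makes the sampled choice reproducible, which is convenient for testing.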

**Top-k Sampling**: Samples only from the k most likely tokens
```python
output = model.generate(input_ids, do_sample=True, top_k=50)
```

**Nucleus (Top-p) Sampling**: Samples from the smallest set of most likely tokens whose cumulative probability reaches p
```python
output = model.generate(input_ids, do_sample=True, top_p=0.9)
```

**Combined Strategies**: Use multiple techniques together
```python
output = model.generate(
    input_ids,
    do_sample=True,
    temperature=0.8,
    top_k=50,
    top_p=0.9,
    repetition_penalty=1.1
)
```
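
The filtering behind `top_k`, `top_p`, and `repetition_penalty` can be sketched on a plain list of logits. This is one common formulation written for illustration, assuming top-k is applied before top-p and that the penalty divides positive scores and multiplies negative ones. It is not the library's code, and the function names are hypothetical.

```python
import math


def softmax(logits):
    """Probabilities from logits; -inf entries receive probability 0."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def apply_repetition_penalty(logits, previous_ids, penalty=1.0):
    """Lower the scores of tokens that were already generated (penalty > 1.0)."""
    adjusted = list(logits)
    for token_id in set(previous_ids):
        score = adjusted[token_id]
        adjusted[token_id] = score / penalty if score > 0 else score * penalty
    return adjusted


def top_k_top_p_filter(logits, top_k=0, top_p=1.0):
    """Set logits outside the top-k set and outside the nucleus to -inf."""
    filtered = list(logits)
    order = sorted(range(len(filtered)), key=lambda i: filtered[i], reverse=True)
    if top_k > 0:
        for i in order[top_k:]:
            filtered[i] = float("-inf")
    if top_p < 1.0:
        probs = softmax(filtered)
        cumulative = 0.0
        for rank, i in enumerate(order):
            cumulative += probs[i]
            if cumulative >= top_p:
                # Smallest prefix whose mass reaches top_p; drop the rest
                for j in order[rank + 1:]:
                    filtered[j] = float("-inf")
                break
    return filtered


toy_logits = [3.0, 2.0, 1.0, 0.0]
print(top_k_top_p_filter(toy_logits, top_k=1))             # [3.0, -inf, -inf, -inf]
print(top_k_top_p_filter(toy_logits, top_p=0.8))           # [3.0, 2.0, -inf, -inf]
print(apply_repetition_penalty(toy_logits, [0], penalty=1.5))  # [2.0, 2.0, 1.0, 0.0]
```

After filtering, the next token would be sampled from the softmax of the remaining logits, so removed tokens can never be chosen.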
