# GPT-2 Models


GPT-2 (Generative Pre-trained Transformer 2) models for text generation and language modeling tasks. GPT-2 uses autoregressive (left-to-right) attention to generate coherent text by predicting the next token in a sequence.
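
As a minimal illustration of left-to-right attention, the sketch below builds the causal mask in plain Python: position `i` may attend only to positions `j <= i`. The helper name `causal_mask` is hypothetical and is not part of the library.

```python
def causal_mask(seq_len):
    """Return a seq_len x seq_len matrix; entry [i][j] is True if position i may attend to j."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]

# Print the mask for a 4-token sequence: "x" = visible, "." = masked out.
mask = causal_mask(4)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

Each row gains one more visible position than the row before it, which is why the model can only condition on earlier tokens when predicting the next one.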

## Capabilities

### GPT2Config

Configuration class for GPT-2 models containing all hyperparameters and architecture specifications.

```python { .api }
class GPT2Config(PretrainedConfig):
    def __init__(
        self,
        vocab_size=50257,
        n_positions=1024,
        n_ctx=1024,
        n_embd=768,
        n_layer=12,
        n_head=12,
        n_inner=None,
        activation_function="gelu_new",
        resid_pdrop=0.1,
        embd_pdrop=0.1,
        attn_pdrop=0.1,
        layer_norm_epsilon=1e-5,
        initializer_range=0.02,
        **kwargs
    ):
        """
        Configuration for GPT-2 models.

        Parameters:
        - vocab_size (int): Vocabulary size
        - n_positions (int): Maximum sequence length for positional embeddings
        - n_ctx (int): Context size (same as n_positions)
        - n_embd (int): Embedding dimensionality
        - n_layer (int): Number of transformer blocks
        - n_head (int): Number of attention heads per layer
        - n_inner (int): Inner dimensionality in feed-forward (4 * n_embd if None)
        - activation_function (str): Activation function ("gelu_new", "relu", "swish")
        - resid_pdrop (float): Residual connection dropout probability
        - embd_pdrop (float): Embedding dropout probability
        - attn_pdrop (float): Attention dropout probability
        - layer_norm_epsilon (float): Layer normalization epsilon
        - initializer_range (float): Weight initialization range
        """
```
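
Two quantities follow directly from these hyperparameters: the per-head dimension and the feed-forward width. The sketch below derives both in plain Python. The function `derived_dims` is hypothetical, and the divisibility check assumes standard multi-head attention, where `n_embd` is split evenly across heads.

```python
def derived_dims(n_embd=768, n_head=12, n_inner=None):
    """Return (per-head dimension, feed-forward inner dimension)."""
    if n_embd % n_head != 0:
        raise ValueError("n_embd must be divisible by n_head")
    head_dim = n_embd // n_head
    # n_inner falls back to 4 * n_embd when left as None.
    ff_dim = n_inner if n_inner is not None else 4 * n_embd
    return head_dim, ff_dim

print(derived_dims())  # (64, 3072)
```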

### GPT2Model

Base GPT-2 model that outputs contextualized hidden states, without a task-specific head.

```python { .api }
class GPT2Model(PreTrainedModel):
    def __init__(self, config):
        """
        Initialize GPT-2 base model.

        Parameters:
        - config (GPT2Config): Model configuration
        """

    def forward(
        self,
        input_ids=None,
        past=None,
        attention_mask=None,
        token_type_ids=None,
        position_ids=None,
        head_mask=None,
        inputs_embeds=None
    ):
        """
        Forward pass through GPT-2 model.

        Parameters:
        - input_ids (torch.Tensor): Token IDs of shape (batch_size, sequence_length)
        - past (Tuple[torch.Tensor]): Pre-computed key and value states from earlier steps, for efficient generation
        - attention_mask (torch.Tensor): Attention mask to avoid attending to padding tokens
        - token_type_ids (torch.Tensor): Segment token indices
        - position_ids (torch.Tensor): Position indices
        - head_mask (torch.Tensor): Mask to nullify selected heads
        - inputs_embeds (torch.Tensor): Pre-computed embeddings

        Returns:
        BaseModelOutputWithPast: Object with last_hidden_state and past_key_values
        """
```

**Usage Example:**

```python
from pytorch_transformers import GPT2Model, GPT2Tokenizer
import torch

# Load model and tokenizer
model = GPT2Model.from_pretrained("gpt2")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

# Prepare input
text = "The future of artificial intelligence is"
inputs = tokenizer(text, return_tensors="pt")

# Get model outputs
with torch.no_grad():
    outputs = model(**inputs)

# Access representations
last_hidden_state = outputs.last_hidden_state  # Shape: (1, seq_len, 768)
past_key_values = outputs.past_key_values  # For efficient generation

print(f"Hidden state shape: {last_hidden_state.shape}")
print(f"Number of past layers: {len(past_key_values) if past_key_values else 0}")
```

### GPT2LMHeadModel

GPT-2 model with a language modeling head for text generation and language modeling tasks.

```python { .api }
class GPT2LMHeadModel(PreTrainedModel):
    def __init__(self, config):
        """
        Initialize GPT-2 for language modeling.

        Parameters:
        - config (GPT2Config): Model configuration
        """

    def forward(
        self,
        input_ids=None,
        past=None,
        attention_mask=None,
        token_type_ids=None,
        position_ids=None,
        head_mask=None,
        inputs_embeds=None,
        labels=None
    ):
        """
        Forward pass for language modeling.

        Parameters:
        - input_ids (torch.Tensor): Token IDs
        - past (Tuple[torch.Tensor]): Pre-computed key and value states
        - attention_mask (torch.Tensor): Attention mask
        - token_type_ids (torch.Tensor): Segment token indices
        - position_ids (torch.Tensor): Position indices
        - head_mask (torch.Tensor): Head mask
        - inputs_embeds (torch.Tensor): Pre-computed embeddings
        - labels (torch.Tensor): Language modeling labels (typically a copy of input_ids; the one-position shift is applied inside the model)

        Returns:
        CausalLMOutputWithPast: Object with loss, logits, and past_key_values
        """

    def generate(
        self,
        input_ids=None,
        max_length=20,
        do_sample=False,
        temperature=1.0,
        top_k=0,
        top_p=1.0,
        repetition_penalty=1.0,
        pad_token_id=None,
        eos_token_id=None,
        **kwargs
    ):
        """
        Generate text using the language model.

        Parameters:
        - input_ids (torch.Tensor): Input token IDs as prompt
        - max_length (int): Maximum length of generated sequence
        - do_sample (bool): Whether to use sampling or greedy decoding
        - temperature (float): Sampling temperature (higher = more random)
        - top_k (int): Top-k sampling (0 = disabled)
        - top_p (float): Nucleus sampling threshold (1.0 = disabled)
        - repetition_penalty (float): Penalty for repeated tokens
        - pad_token_id (int): Padding token ID
        - eos_token_id (int): End-of-sequence token ID

        Returns:
        torch.Tensor: Generated token IDs
        """
```

**Usage Example:**

```python
from pytorch_transformers import GPT2LMHeadModel, GPT2Tokenizer
import torch

# Load model and tokenizer
model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

# Set pad token
tokenizer.pad_token = tokenizer.eos_token

# Generate text
prompt = "The future of artificial intelligence"
inputs = tokenizer.encode(prompt, return_tensors="pt")

# Generate with different strategies
with torch.no_grad():
    # Greedy generation
    greedy_output = model.generate(
        inputs,
        max_length=50,
        do_sample=False,
        pad_token_id=tokenizer.eos_token_id
    )

    # Sampling with temperature
    sample_output = model.generate(
        inputs,
        max_length=50,
        do_sample=True,
        temperature=0.8,
        top_k=50,
        top_p=0.9,
        pad_token_id=tokenizer.eos_token_id
    )

# Decode generated text
greedy_text = tokenizer.decode(greedy_output[0], skip_special_tokens=True)
sample_text = tokenizer.decode(sample_output[0], skip_special_tokens=True)

print(f"Greedy: {greedy_text}")
print(f"Sampled: {sample_text}")
```
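
A causal language modeling loss scores the logits at position `t` against the token at position `t + 1`. The plain-Python sketch below shows that alignment on toy numbers. It assumes the shift happens inside the loss, as in Hugging Face's GPT-2 implementations, so `labels` can simply be a copy of `input_ids`. The function name and toy values are hypothetical.

```python
import math

def shifted_lm_loss(logits, labels):
    """Mean cross-entropy where logits[t] is scored against labels[t + 1]."""
    losses = []
    # Drop the last logit row (nothing to predict) and the first label (nothing predicts it).
    for step_logits, target in zip(logits[:-1], labels[1:]):
        log_norm = math.log(sum(math.exp(x) for x in step_logits))
        losses.append(log_norm - step_logits[target])
    return sum(losses) / len(losses)

# Toy vocabulary of 3 tokens, sequence of 3 positions.
logits = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 2.0]]
labels = [1, 0, 1]  # same values as input_ids
print(round(shifted_lm_loss(logits, labels), 4))
```

Only two of the three positions contribute to the loss, because the final position has no next token to predict.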

### GPT2DoubleHeadsModel

GPT-2 model with both language modeling and classification heads for multi-task learning.

```python { .api }
class GPT2DoubleHeadsModel(PreTrainedModel):
    def __init__(self, config):
        """
        Initialize GPT-2 with double heads.

        Parameters:
        - config (GPT2Config): Model configuration
        """

    def forward(
        self,
        input_ids=None,
        past=None,
        attention_mask=None,
        token_type_ids=None,
        position_ids=None,
        head_mask=None,
        inputs_embeds=None,
        mc_token_ids=None,
        lm_labels=None,
        mc_labels=None
    ):
        """
        Forward pass for double heads model.

        Parameters:
        - input_ids (torch.Tensor): Token IDs
        - past (Tuple[torch.Tensor]): Pre-computed key and value states
        - attention_mask (torch.Tensor): Attention mask
        - token_type_ids (torch.Tensor): Segment token indices
        - position_ids (torch.Tensor): Position indices
        - head_mask (torch.Tensor): Head mask
        - inputs_embeds (torch.Tensor): Pre-computed embeddings
        - mc_token_ids (torch.Tensor): Index of the classification token in each input sequence, read by the classification head
        - lm_labels (torch.Tensor): Language modeling labels
        - mc_labels (torch.Tensor): Multiple choice labels

        Returns:
        GPT2DoubleHeadsModelOutput: Object with lm_loss, mc_loss, lm_logits, mc_logits, past_key_values
        """
```
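
The classification head reads one hidden state per candidate sequence, selected by `mc_token_ids`. The sketch below assumes these are positions in the sequence (commonly the last non-padding token of each choice) rather than vocabulary IDs. The helper `last_token_positions` and the token values are hypothetical.

```python
def last_token_positions(choices, pad_id):
    """For each padded choice, return the index of its last non-padding token."""
    positions = []
    for ids in choices:
        real = [i for i, tok in enumerate(ids) if tok != pad_id]
        positions.append(real[-1] if real else 0)
    return positions

PAD = 0  # hypothetical padding ID
# Two candidate continuations, padded to the same length.
choices = [[11, 12, 13, 99, PAD], [11, 12, 14, 15, 99]]
print(last_token_positions(choices, PAD))  # [3, 4]
```

In a real batch these positions would be wrapped in a tensor of shape `(batch_size, num_choices)` before being passed as `mc_token_ids`.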

### GPT2Tokenizer

Byte-pair encoding (BPE) tokenizer for GPT-2 models.

```python { .api }
class GPT2Tokenizer(PreTrainedTokenizer):
    def __init__(
        self,
        vocab_file,
        merges_file,
        errors="replace",
        unk_token="<|endoftext|>",
        bos_token="<|endoftext|>",
        eos_token="<|endoftext|>",
        add_prefix_space=False,
        **kwargs
    ):
        """
        Initialize GPT-2 tokenizer.

        Parameters:
        - vocab_file (str): Path to vocabulary file
        - merges_file (str): Path to BPE merges file
        - errors (str): Error handling when decoding bytes to UTF-8 ("replace", "ignore", "strict")
        - unk_token (str): Unknown token
        - bos_token (str): Beginning of sequence token
        - eos_token (str): End of sequence token
        - add_prefix_space (bool): Whether to add a leading space to the input before tokenizing
        """

    def encode(
        self,
        text,
        add_special_tokens=True,
        max_length=None,
        stride=0,
        truncation_strategy="longest_first",
        **kwargs
    ):
        """
        Encode text to token IDs using BPE.

        Parameters:
        - text (str): Input text to encode
        - add_special_tokens (bool): Whether to add special tokens
        - max_length (int): Maximum sequence length
        - stride (int): Stride for sliding window
        - truncation_strategy (str): How to truncate long sequences

        Returns:
        List[int]: List of token IDs
        """
```

**Usage Example:**

```python
from pytorch_transformers import GPT2Tokenizer

# Load tokenizer
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

# GPT-2 uses the same token for BOS, EOS, and UNK; no PAD token is set by default
print(f"Special token: {tokenizer.eos_token}")  # <|endoftext|>

# Tokenize text
text = "Hello, how are you today?"
tokens = tokenizer.tokenize(text)
token_ids = tokenizer.encode(text)

print(f"Tokens: {tokens}")
print(f"Token IDs: {token_ids}")

# Decode back
decoded = tokenizer.decode(token_ids)
print(f"Decoded: {decoded}")

# Handle multiple sequences (padding needs a pad token, so reuse EOS)
tokenizer.pad_token = tokenizer.eos_token
texts = ["First sentence.", "Second sentence."]
encoded = tokenizer(
    texts,
    padding=True,
    truncation=True,
    return_tensors="pt"
)
print(f"Batch shape: {encoded['input_ids'].shape}")
```
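
To give a feel for what BPE does, here is a toy sketch of the merge loop on a single word. It is not the tokenizer's actual implementation: the real GPT-2 tokenizer works on bytes and pre-splits text with a regular expression, and the merge table below is hypothetical.

```python
def bpe(word, merges):
    """Apply BPE merges to one word; `merges` is an ordered list of symbol pairs."""
    ranks = {pair: i for i, pair in enumerate(merges)}
    symbols = list(word)
    while len(symbols) > 1:
        pairs = [(symbols[i], symbols[i + 1]) for i in range(len(symbols) - 1)]
        # Pick the adjacent pair that appears earliest in the merge table.
        best = min(pairs, key=lambda p: ranks.get(p, float("inf")))
        if best not in ranks:
            break  # no applicable merge left
        merged, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                merged.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return symbols

merges = [("l", "o"), ("lo", "w"), ("e", "r")]  # hypothetical merge table
print(bpe("lower", merges))  # ['low', 'er']
```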

## Utility Functions

### load_tf_weights_in_gpt2

```python { .api }
def load_tf_weights_in_gpt2(model, gpt2_checkpoint_path):
    """
    Load TensorFlow GPT-2 checkpoint weights into a PyTorch GPT-2 model.

    Parameters:
    - model (GPT2Model): PyTorch GPT-2 model
    - gpt2_checkpoint_path (str): Path to TensorFlow checkpoint directory

    Returns:
    GPT2Model: Model with loaded weights
    """
```

## Archive Maps

```python { .api }
GPT2_PRETRAINED_CONFIG_ARCHIVE_MAP: Dict[str, str]
# Maps model names to download URLs for configurations

GPT2_PRETRAINED_MODEL_ARCHIVE_MAP: Dict[str, str]
# Maps model names to download URLs for pre-trained weights
```

**Available Pre-trained Models:**

- `gpt2`: 12-layer, 768-hidden, 12-heads, 117M parameters (small)
- `gpt2-medium`: 24-layer, 1024-hidden, 16-heads, 345M parameters
- `gpt2-large`: 36-layer, 1280-hidden, 20-heads, 762M parameters
- `gpt2-xl`: 48-layer, 1600-hidden, 25-heads, 1558M parameters
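
The parameter counts listed above are the commonly quoted figures. As a rough cross-check, the sketch below counts parameters directly from the hyperparameters, assuming tied input/output embeddings, the default vocabulary and context sizes, and a feed-forward width of `4 * n_embd`. Under those assumptions the totals can differ somewhat from the quoted ones (for example, about 124M for `gpt2`). The function name is hypothetical.

```python
def gpt2_param_count(n_layer, n_embd, vocab_size=50257, n_positions=1024):
    """Estimate trainable parameters, assuming tied input/output embeddings."""
    embeddings = vocab_size * n_embd + n_positions * n_embd  # token + position tables
    # Per block: attention (4n^2 + 4n), feed-forward with 4n inner width (8n^2 + 5n),
    # and two layer norms (4n).
    per_block = 12 * n_embd**2 + 13 * n_embd
    final_norm = 2 * n_embd
    return embeddings + n_layer * per_block + final_norm

sizes = {
    "gpt2": (12, 768),
    "gpt2-medium": (24, 1024),
    "gpt2-large": (36, 1280),
    "gpt2-xl": (48, 1600),
}
for name, (n_layer, n_embd) in sizes.items():
    print(f"{name}: {gpt2_param_count(n_layer, n_embd) / 1e6:.0f}M")
```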

## Text Generation Strategies

GPT-2 models support various text generation strategies:

**Greedy Decoding**: Always selects the most likely next token
```python
output = model.generate(input_ids, do_sample=False)
```

**Sampling**: Randomly samples from the probability distribution
```python
output = model.generate(input_ids, do_sample=True, temperature=0.8)
```
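
To show what `temperature` does to the distribution, the following plain-Python sketch divides the logits by the temperature before the softmax and then draws one token. The helper name and the toy logits are hypothetical; this is an illustration of the idea, not the library's code.

```python
import math
import random

def softmax_with_temperature(logits, temperature=1.0):
    """Softmax over logits / temperature (lower temperature = sharper distribution)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]
sharp = softmax_with_temperature(logits, temperature=0.5)
flat = softmax_with_temperature(logits, temperature=2.0)
print([round(p, 3) for p in sharp])
print([round(p, 3) for p in flat])

# Draw one token index from the sharper distribution.
rng = random.Random(0)
token = rng.choices(range(len(logits)), weights=sharp, k=1)[0]
print(token)
```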

**Top-k Sampling**: Samples from the k most likely tokens
```python
output = model.generate(input_ids, do_sample=True, top_k=50)
```
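
A minimal sketch of the top-k filtering step, assuming it is applied to raw logits before the softmax: everything below the k-th largest logit is set to negative infinity so it receives zero probability. The function `top_k_filter` is hypothetical, and in this simple version ties at the threshold are all kept.

```python
def top_k_filter(logits, k):
    """Keep the k largest logits; mask the rest with -inf. k <= 0 disables filtering."""
    if k <= 0 or k >= len(logits):
        return list(logits)
    threshold = sorted(logits, reverse=True)[k - 1]
    return [x if x >= threshold else float("-inf") for x in logits]

print(top_k_filter([1.0, 3.0, 2.0, 0.5], k=2))  # [-inf, 3.0, 2.0, -inf]
```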

**Nucleus (Top-p) Sampling**: Samples from the smallest set of most likely tokens whose cumulative probability reaches p
```python
output = model.generate(input_ids, do_sample=True, top_p=0.9)
```
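
The nucleus can be sketched in plain Python as follows: sort tokens by probability, keep adding them until the running total reaches `p`, then renormalize over the kept set. The function `top_p_filter` and the toy probabilities are hypothetical.

```python
def top_p_filter(probs, p):
    """Return {token_index: renormalized_prob} for the smallest set with total mass >= p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}

nucleus = top_p_filter([0.5, 0.3, 0.15, 0.05], p=0.9)
print(sorted(nucleus))  # [0, 1, 2]
```

With `p=0.9` the first two tokens cover only 0.8 of the mass, so the third is included and the least likely token is dropped.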

**Combined Strategies**: Use multiple techniques together
```python
output = model.generate(
    input_ids,
    do_sample=True,
    temperature=0.8,
    top_k=50,
    top_p=0.9,
    repetition_penalty=1.1
)
```
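
One common way to implement a repetition penalty, assumed here rather than taken from this library's source, is to lower the logit of every token that has already been generated: positive logits are divided by the penalty and negative logits are multiplied by it. The function name and values below are hypothetical.

```python
def apply_repetition_penalty(logits, generated_ids, penalty):
    """Make previously generated tokens less likely; penalty = 1.0 leaves logits unchanged."""
    out = list(logits)
    for token_id in set(generated_ids):
        if out[token_id] > 0:
            out[token_id] /= penalty
        else:
            out[token_id] *= penalty
    return out

print(apply_repetition_penalty([2.0, -1.0, 0.5], generated_ids=[0, 1], penalty=2.0))
# [1.0, -2.0, 0.5]
```

In a combined setup, a step like this would typically run on the logits before temperature scaling and top-k or top-p filtering.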