# PyTorch Pretrained BERT

PyTorch implementations of transformer-based language models including Google's BERT, OpenAI's GPT and GPT-2, and Google/CMU's Transformer-XL. This library provides pre-trained models, fine-tuning examples, tokenizers, and model architectures that match the performance of their original TensorFlow implementations, designed for researchers and practitioners working with state-of-the-art language models.

## Package Information

- **Package Name**: pytorch_pretrained_bert
- **Language**: Python
- **Installation**: `pip install pytorch_pretrained_bert`
- **Version**: 0.6.2

## Core Imports

```python
import pytorch_pretrained_bert
```

Common imports for specific functionality:

```python
# BERT models and tokenizer
from pytorch_pretrained_bert import (
    BertTokenizer, BertModel, BertForSequenceClassification,
    BertConfig, BertAdam
)

# OpenAI GPT models
from pytorch_pretrained_bert import (
    OpenAIGPTTokenizer, OpenAIGPTLMHeadModel, OpenAIGPTConfig
)

# GPT-2 models
from pytorch_pretrained_bert import (
    GPT2Tokenizer, GPT2LMHeadModel, GPT2Config
)

# Transformer-XL models
from pytorch_pretrained_bert import (
    TransfoXLTokenizer, TransfoXLLMHeadModel, TransfoXLConfig
)

# Utilities
from pytorch_pretrained_bert import cached_path, WEIGHTS_NAME, CONFIG_NAME
```

## Basic Usage

### BERT for Sequence Classification

```python
import torch
from pytorch_pretrained_bert import BertTokenizer, BertForSequenceClassification

# Load pre-trained model and tokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)
model.eval()  # Disable dropout for inference

# Tokenize input text and add the special tokens BERT expects
text = "Hello, my dog is cute"
tokens = ['[CLS]'] + tokenizer.tokenize(text) + ['[SEP]']
input_ids = tokenizer.convert_tokens_to_ids(tokens)
input_ids = torch.tensor([input_ids])

# Forward pass: without labels, the model returns the logits tensor directly
with torch.no_grad():
    logits = model(input_ids)
    predictions = torch.nn.functional.softmax(logits, dim=-1)

# The classification head is newly initialized, so these values are only
# meaningful after fine-tuning
print(f"Predictions: {predictions}")
```
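
The example above feeds a single sentence. For sentence pairs or padded batches, BERT models also accept segment ids and an attention mask (the `token_type_ids` and `attention_mask` arguments of the forward pass). The following is a minimal sketch of how those lists could be assembled; `build_bert_inputs` is a hypothetical helper written for illustration, not part of the library, and it works on already-tokenized input.

```python
def build_bert_inputs(tokens_a, tokens_b=None, max_len=16):
    """Hypothetical helper: assemble tokens, segment ids, and attention mask."""
    # First segment: [CLS] tokens_a [SEP], all with segment id 0
    tokens = ["[CLS]"] + list(tokens_a) + ["[SEP]"]
    segment_ids = [0] * len(tokens)

    # Optional second segment: tokens_b [SEP], all with segment id 1
    if tokens_b:
        tokens += list(tokens_b) + ["[SEP]"]
        segment_ids += [1] * (len(tokens_b) + 1)

    if len(tokens) > max_len:
        raise ValueError("sequence longer than max_len")

    # Real tokens get mask 1, padding gets mask 0
    attention_mask = [1] * len(tokens)
    pad = max_len - len(tokens)
    tokens += ["[PAD]"] * pad
    segment_ids += [0] * pad
    attention_mask += [0] * pad
    return tokens, segment_ids, attention_mask


tokens, segment_ids, attention_mask = build_bert_inputs(
    ["hello", ",", "my", "dog"], ["it", "is", "cute"], max_len=12
)
print(tokens)
print(segment_ids)
print(attention_mask)
```

The token list would then go through `tokenizer.convert_tokens_to_ids`, and the three lists would be wrapped in tensors before being passed to the model.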

### GPT-2 Text Generation

```python
import torch
from pytorch_pretrained_bert import GPT2Tokenizer, GPT2LMHeadModel

# Load pre-trained GPT-2
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2LMHeadModel.from_pretrained('gpt2')
model.eval()  # Disable dropout for inference

# Prepare input
input_text = "The future of artificial intelligence"
input_ids = tokenizer.encode(input_text)
input_ids = torch.tensor([input_ids])

# Forward pass to get next token predictions
with torch.no_grad():
    outputs = model(input_ids)
    predictions = outputs[0]  # Language modeling logits

# Get next token probabilities
next_token_logits = predictions[0, -1, :]
next_token_probs = torch.softmax(next_token_logits, dim=-1)

# Sample next token
next_token_id = torch.multinomial(next_token_probs, 1).item()
next_token = tokenizer.decode([next_token_id])

print(f"Input: {input_text}")
print(f"Next token: {next_token}")
```
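
The example above samples one token. Generating longer text repeats the same step: score the sequence, sample a token, append it, and score again. The sketch below shows that loop in plain Python so it runs without downloading a model. `toy_next_token_logits` is a made-up stand-in for the model call, and `generate` is a hypothetical helper, not a library function.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def generate(next_token_logits, prompt_ids, num_new_tokens, rng):
    """Autoregressive sampling: append one sampled token id per iteration."""
    ids = list(prompt_ids)
    for _ in range(num_new_tokens):
        probs = softmax(next_token_logits(ids))
        next_id = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        ids.append(next_id)
    return ids


def toy_next_token_logits(ids):
    """Stand-in for a language model over a 5-token vocabulary.

    It strongly favors (last id + 1) mod 5; a real model would return
    the logits for the last position instead.
    """
    vocab_size = 5
    logits = [0.0] * vocab_size
    logits[(ids[-1] + 1) % vocab_size] = 5.0
    return logits


generated = generate(toy_next_token_logits, [0, 1], 4, random.Random(0))
print(generated)
```

With the real model, the stand-in would be replaced by a forward pass on the growing `input_ids` tensor, taking the logits at the last position as in the example above.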

## Architecture

The library is organized around four main transformer architectures:

- **BERT**: Bidirectional encoder for understanding tasks (classification, QA, NER)
- **OpenAI GPT**: Autoregressive decoder for generation and understanding
- **GPT-2**: Larger autoregressive model with byte-level BPE tokenization
- **Transformer-XL**: Extended-context transformer with segment-level recurrence, relative positional encodings, and an adaptive softmax

Each model family includes:

- **Configuration classes**: Model hyperparameters and architecture settings
- **Model classes**: Various task-specific variants (base model, language modeling head, classification head)
- **Tokenizer classes**: Text preprocessing and encoding specific to each model
- **Weight loading utilities**: Functions to convert from original TensorFlow checkpoints

All models support the `from_pretrained()` class method for loading pre-trained weights with automatic download and caching.

## Capabilities

### BERT Models

Complete BERT model family including base model, task-specific variants, configuration, and tokenization for bidirectional language understanding tasks.

```python { .api }
class BertModel: ...
class BertForSequenceClassification: ...
class BertForQuestionAnswering: ...
class BertTokenizer: ...
class BertConfig: ...
```

[BERT Models](./bert-models.md)

### Tokenizers

Tokenization utilities for all supported model types, handling text preprocessing, encoding, decoding, and vocabulary management with model-specific tokenization strategies.

```python { .api }
class BertTokenizer: ...
class BasicTokenizer: ...
class WordpieceTokenizer: ...
class OpenAIGPTTokenizer: ...
class GPT2Tokenizer: ...
class TransfoXLTokenizer: ...
```

[Tokenizers](./tokenizers.md)
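
To give a feel for what a WordPiece-style tokenizer does, here is a minimal sketch of greedy longest-match-first subword splitting over a toy vocabulary. It illustrates the general technique only: `wordpiece_tokenize` and `toy_vocab` are invented for this example, and the library's `WordpieceTokenizer` may differ in details such as length limits and vocabulary handling.

```python
def wordpiece_tokenize(word, vocab, unk_token="[UNK]"):
    """Split one word into the longest matching vocabulary pieces, left to right."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        current = None
        # Try the longest remaining substring first, then shrink it
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece  # Continuation pieces carry a '##' prefix
            if piece in vocab:
                current = piece
                break
            end -= 1
        if current is None:
            # No piece matched at this position: the whole word is unknown
            return [unk_token]
        pieces.append(current)
        start = end
    return pieces


toy_vocab = {"un", "##aff", "##able", "play", "##ing"}
print(wordpiece_tokenize("unaffable", toy_vocab))  # ['un', '##aff', '##able']
print(wordpiece_tokenize("playing", toy_vocab))    # ['play', '##ing']
print(wordpiece_tokenize("xyz", toy_vocab))        # ['[UNK]']
```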

### GPT Models

OpenAI GPT, GPT-2, and Transformer-XL model families with their configurations and tokenizers for autoregressive language modeling and text generation tasks.

```python { .api }
class OpenAIGPTLMHeadModel: ...
class GPT2LMHeadModel: ...
class TransfoXLLMHeadModel: ...
```

[GPT Models](./gpt-models.md)

### Optimizers

Specialized optimizers with learning rate scheduling designed for transformer training, including BERT-specific and OpenAI-specific Adam variants.

```python { .api }
class BertAdam: ...
class OpenAIAdam: ...
```

[Optimizers](./optimizers.md)
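
As an illustration of the learning rate scheduling mentioned above, the sketch below computes a linear warmup followed by a linear decay, the kind of schedule commonly paired with BERT fine-tuning. The function names and the exact formula are assumptions made for this example; the schedule implemented inside `BertAdam` may differ in detail.

```python
def warmup_linear(progress, warmup=0.1):
    """Learning rate multiplier for a training progress value in [0, 1].

    Rises linearly from 0 to 1 during the warmup fraction, then decays
    linearly to 0 at the end of training.
    """
    if progress < warmup:
        return progress / warmup
    return max((1.0 - progress) / (1.0 - warmup), 0.0)


def lr_at_step(step, base_lr=2e-5, warmup=0.1, t_total=1000):
    """Effective learning rate at a given optimizer step (illustrative helper)."""
    return base_lr * warmup_linear(step / t_total, warmup)


for step in (0, 50, 100, 550, 1000):
    print(step, lr_at_step(step))
```

Here `warmup` and `t_total` play the same role as the `BertAdam` arguments of the same name: the fraction of training spent warming up and the total number of optimizer steps.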

### Utilities

File handling, caching, and model loading utilities for automatic download, caching of pre-trained models, and conversion from TensorFlow checkpoints.

```python { .api }
def cached_path(url_or_filename, cache_dir=None): ...
def load_tf_weights_in_bert(model, tf_checkpoint_path): ...
```

[Utilities](./utilities.md)
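
One common way to cache downloads is to derive a stable file name from the URL, optionally combined with the server's ETag so that a changed remote file gets a new cache entry. The sketch below shows that idea with the standard library. `url_to_cache_filename` and the example URL are hypothetical, and the naming scheme is an assumption rather than a description of what `cached_path` does internally.

```python
import hashlib


def url_to_cache_filename(url, etag=None):
    """Derive a deterministic cache file name from a URL and optional ETag."""
    name = hashlib.sha256(url.encode("utf-8")).hexdigest()
    if etag:
        # A different ETag yields a different name, so stale files are not reused
        name += "." + hashlib.sha256(etag.encode("utf-8")).hexdigest()
    return name


url = "https://example.com/model.bin"  # Placeholder URL for illustration
print(url_to_cache_filename(url))
print(url_to_cache_filename(url, etag='"abc123"'))
```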

## Common Patterns

### Loading Pre-trained Models

All model classes support the standard `from_pretrained()` pattern:

```python
from pytorch_pretrained_bert import BertModel, BertTokenizer

# Load model with default configuration
model = BertModel.from_pretrained('bert-base-uncased')

# Load with custom cache directory
model = BertModel.from_pretrained('bert-base-uncased', cache_dir='./models/')

# Load tokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
```

### Fine-tuning Setup

```python
from pytorch_pretrained_bert import BertForSequenceClassification, BertAdam

# Load model for fine-tuning
model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=3)

# Total number of optimizer updates. Placeholder value: normally
# (batches per epoch) * (number of epochs) for your dataset
num_train_steps = 1000

# Setup optimizer with learning rate scheduling;
# biases and LayerNorm parameters are excluded from weight decay
param_optimizer = list(model.named_parameters())
no_decay = ['bias', 'LayerNorm.bias', 'LayerNorm.weight']
optimizer_grouped_parameters = [
    {'params': [p for n, p in param_optimizer if not any(nd in n for nd in no_decay)], 'weight_decay': 0.01},
    {'params': [p for n, p in param_optimizer if any(nd in n for nd in no_decay)], 'weight_decay': 0.0}
]

optimizer = BertAdam(optimizer_grouped_parameters,
                     lr=2e-5,
                     warmup=0.1,
                     t_total=num_train_steps)
```
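
The grouping in the setup above relies on substring matching against parameter names. The sketch below applies the same test to a handful of illustrative names so the split is easy to inspect without loading a model. `split_by_weight_decay` is a hypothetical helper, and the names are examples of the naming pattern rather than an exact listing of the model's parameters.

```python
def split_by_weight_decay(param_names,
                          no_decay=("bias", "LayerNorm.bias", "LayerNorm.weight")):
    """Partition parameter names into (decayed, not decayed) by substring match."""
    decayed, not_decayed = [], []
    for name in param_names:
        if any(nd in name for nd in no_decay):
            not_decayed.append(name)
        else:
            decayed.append(name)
    return decayed, not_decayed


# Illustrative parameter names
names = [
    "bert.encoder.layer.0.attention.self.query.weight",
    "bert.encoder.layer.0.attention.self.query.bias",
    "bert.encoder.layer.0.output.LayerNorm.weight",
    "bert.encoder.layer.0.output.LayerNorm.bias",
    "classifier.weight",
]

decayed, not_decayed = split_by_weight_decay(names)
print("weight decay 0.01:", decayed)
print("weight decay 0.0: ", not_decayed)
```

Because the test is a plain substring check, any parameter whose name contains `bias` lands in the no-decay group, which is why the two `LayerNorm` entries are needed only for the `LayerNorm.weight` case.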

### Converting TensorFlow Checkpoints

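```python
from pytorch_pretrained_bert import BertConfig, BertModel, load_tf_weights_in_bert

# Placeholder paths: point these at your own checkpoint files
config = BertConfig.from_json_file('bert_config.json')
tf_checkpoint_path = 'bert_model.ckpt'

# Create PyTorch model
model = BertModel(config)

# Load TensorFlow weights (requires TensorFlow to be installed)
load_tf_weights_in_bert(model, tf_checkpoint_path)
```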