Tessl Tile for pypi/pytorch-pretrained-bert@0.6.0

# PyTorch Pretrained BERT

PyTorch implementations of transformer-based language models including Google's BERT, OpenAI's GPT and GPT-2, and Google/CMU's Transformer-XL. This library provides pre-trained models, fine-tuning examples, tokenizers, and model architectures that match the performance of their original TensorFlow implementations, designed for researchers and practitioners working with state-of-the-art language models.

## Package Information

- **Package Name**: pytorch_pretrained_bert
- **Language**: Python
- **Installation**: `pip install pytorch_pretrained_bert`
- **Version**: 0.6.2

## Core Imports

```python
import pytorch_pretrained_bert
```

Common imports for specific functionality:

```python
# BERT models and tokenizer
from pytorch_pretrained_bert import (
    BertTokenizer, BertModel, BertForSequenceClassification,
    BertConfig, BertAdam
)

# OpenAI GPT models
from pytorch_pretrained_bert import (
    OpenAIGPTTokenizer, OpenAIGPTLMHeadModel, OpenAIGPTConfig
)

# GPT-2 models
from pytorch_pretrained_bert import (
    GPT2Tokenizer, GPT2LMHeadModel, GPT2Config
)

# Transformer-XL models
from pytorch_pretrained_bert import (
    TransfoXLTokenizer, TransfoXLLMHeadModel, TransfoXLConfig
)

# Utilities
from pytorch_pretrained_bert import cached_path, WEIGHTS_NAME, CONFIG_NAME
```

## Basic Usage

### BERT for Sequence Classification

```python
import torch
from pytorch_pretrained_bert import BertTokenizer, BertForSequenceClassification

# Load pre-trained model and tokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)
model.eval()  # disable dropout for inference

# Tokenize input text and add the special tokens BERT expects
text = "Hello, my dog is cute"
tokens = ['[CLS]'] + tokenizer.tokenize(text) + ['[SEP]']
input_ids = tokenizer.convert_tokens_to_ids(tokens)
input_ids = torch.tensor([input_ids])

# Forward pass: without labels, the model returns the logits tensor
with torch.no_grad():
    logits = model(input_ids)
    predictions = torch.nn.functional.softmax(logits, dim=-1)

# The classification head is newly initialized, so these scores are
# only meaningful after fine-tuning
print(f"Predictions: {predictions}")
```
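
For sentence-pair tasks, BERT also takes segment ids and an attention mask alongside the token ids. The following is a minimal pure-Python sketch of how those three sequences line up; the helper name `build_bert_inputs` and the toy tokens are illustrative assumptions, not part of the library.

```python
def build_bert_inputs(tokens_a, tokens_b, max_len, pad_token="[PAD]"):
    """Assemble tokens, segment ids, and attention mask for a sentence pair.

    Hypothetical helper for illustration; not a library function.
    """
    tokens = ["[CLS]"] + tokens_a + ["[SEP]"] + tokens_b + ["[SEP]"]
    if len(tokens) > max_len:
        raise ValueError("sequence longer than max_len; truncate the inputs first")
    # Segment 0 covers [CLS], sentence A and its [SEP]; segment 1 covers the rest
    segment_ids = [0] * (len(tokens_a) + 2) + [1] * (len(tokens_b) + 1)
    # 1 marks real tokens, 0 marks padding
    attention_mask = [1] * len(tokens)
    padding = max_len - len(tokens)
    tokens = tokens + [pad_token] * padding
    segment_ids = segment_ids + [0] * padding
    attention_mask = attention_mask + [0] * padding
    return tokens, segment_ids, attention_mask


tokens, segment_ids, attention_mask = build_bert_inputs(
    ["hello", ",", "my", "dog"], ["it", "is", "cute"], max_len=12
)
print(tokens)
print(segment_ids)
print(attention_mask)
```

After converting `tokens` with `tokenizer.convert_tokens_to_ids`, the two id lists would be passed to the model as the `token_type_ids` and `attention_mask` arguments.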

### GPT-2 Text Generation

```python
import torch
from pytorch_pretrained_bert import GPT2Tokenizer, GPT2LMHeadModel

# Load pre-trained GPT-2
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2LMHeadModel.from_pretrained('gpt2')
model.eval()  # disable dropout for inference

# Prepare input
input_text = "The future of artificial intelligence"
input_ids = tokenizer.encode(input_text)
input_ids = torch.tensor([input_ids])

# Forward pass to get next token predictions
with torch.no_grad():
    outputs = model(input_ids)
    predictions = outputs[0]  # Language modeling logits

    # Get next token probabilities
    next_token_logits = predictions[0, -1, :]
    next_token_probs = torch.softmax(next_token_logits, dim=-1)

    # Sample next token
    next_token_id = torch.multinomial(next_token_probs, 1).item()
    next_token = tokenizer.decode([next_token_id])

print(f"Input: {input_text}")
print(f"Next token: {next_token}")
```
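
Sampling straight from the full softmax, as above, can pick very unlikely tokens. A common refinement is temperature scaling combined with top-k filtering. The sketch below shows the idea on a plain list of logits; `sample_next_token` and the toy logits are hypothetical, and nothing here is a library API.

```python
import math
import random


def sample_next_token(logits, temperature=1.0, top_k=0, rng=random):
    """Sample an index from logits after temperature scaling and optional top-k filtering.

    Hypothetical helper for illustration; returns (sampled index, probabilities).
    """
    scaled = [x / temperature for x in logits]
    if top_k > 0:
        # Keep only the k highest logits; the rest get zero probability
        cutoff = sorted(scaled, reverse=True)[min(top_k, len(scaled)) - 1]
        scaled = [x if x >= cutoff else float("-inf") for x in scaled]
    # Numerically stable softmax
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]
    index = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return index, probs


toy_logits = [2.0, 1.0, 0.5, -1.0]
index, probs = sample_next_token(
    toy_logits, temperature=0.7, top_k=2, rng=random.Random(0)
)
print(index, probs)
```

Lower temperatures concentrate probability on the highest logits, and `top_k=1` reduces to greedy decoding. If several logits tie at the cutoff, this sketch keeps all of them.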

## Architecture

The library is organized around four main transformer architectures:

- **BERT**: Bidirectional encoder for understanding tasks (classification, QA, NER)
- **OpenAI GPT**: Autoregressive decoder for generation and understanding
- **GPT-2**: Larger autoregressive model with byte-level BPE tokenization
- **Transformer-XL**: Extended-context autoregressive transformer with segment-level recurrence, relative positional encodings, and an adaptive softmax output layer

Each model family includes:
- **Configuration classes**: Model hyperparameters and architecture settings
- **Model classes**: Various task-specific variants (base model, language modeling head, classification head)
- **Tokenizer classes**: Text preprocessing and encoding specific to each model
- **Weight loading utilities**: Functions to convert from original TensorFlow checkpoints

All models support the `from_pretrained()` class method for loading pre-trained weights with automatic download and caching.

## Capabilities

### BERT Models

Complete BERT model family including base model, task-specific variants, configuration, and tokenization for bidirectional language understanding tasks.

```python { .api }
class BertModel: ...
class BertForSequenceClassification: ...
class BertForQuestionAnswering: ...
class BertTokenizer: ...
class BertConfig: ...
```

[BERT Models](./bert-models.md)

### Tokenizers

Tokenization utilities for all supported model types, handling text preprocessing, encoding, decoding, and vocabulary management with model-specific tokenization strategies.

```python { .api }
class BertTokenizer: ...
class BasicTokenizer: ...
class WordpieceTokenizer: ...
class OpenAIGPTTokenizer: ...
class GPT2Tokenizer: ...
class TransfoXLTokenizer: ...
```

[Tokenizers](./tokenizers.md)
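
To make the "model-specific tokenization strategies" concrete, here is a simplified sketch of the greedy longest-match-first lookup that WordPiece tokenization is based on. The function name, the toy vocabulary, and the omission of details such as a maximum word length are assumptions for illustration; the library's `WordpieceTokenizer` may differ in its details.

```python
def wordpiece_tokenize(word, vocab, unk_token="[UNK]"):
    """Split one word into sub-word pieces by greedy longest-match-first lookup.

    Hypothetical helper for illustration; not the library's implementation.
    """
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink from the right
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry a ## prefix
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # No piece matches at this position: the whole word is unknown
            return [unk_token]
        pieces.append(match)
        start = end
    return pieces


toy_vocab = {"un", "##aff", "##able", "##a", "play", "##ing"}
print(wordpiece_tokenize("unaffable", toy_vocab))  # ['un', '##aff', '##able']
print(wordpiece_tokenize("playing", toy_vocab))    # ['play', '##ing']
print(wordpiece_tokenize("xyz", toy_vocab))        # ['[UNK]']
```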

### GPT Models

OpenAI GPT, GPT-2, and Transformer-XL model families with their configurations and tokenizers for autoregressive language modeling and text generation tasks.

```python { .api }
class OpenAIGPTLMHeadModel: ...
class GPT2LMHeadModel: ...
class TransfoXLLMHeadModel: ...
```

[GPT Models](./gpt-models.md)

### Optimizers

Specialized optimizers with learning rate scheduling designed for transformer training, including BERT-specific and OpenAI-specific Adam variants.

```python { .api }
class BertAdam: ...
class OpenAIAdam: ...
```

[Optimizers](./optimizers.md)
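
The learning rate scheduling mentioned above typically means a linear warmup followed by a linear decay. The sketch below shows that general shape as a multiplier on the base learning rate. It is an assumption about the schedule's shape rather than a copy of the library's code, and the function names are hypothetical.

```python
def warmup_linear_multiplier(progress, warmup=0.1):
    """Learning-rate multiplier for training progress in [0, 1].

    Rises linearly from 0 to 1 during the warmup fraction, then
    decays linearly back to 0 at the end of training.
    """
    if progress < warmup:
        return progress / warmup
    return max((1.0 - progress) / (1.0 - warmup), 0.0)


def lr_at_step(step, t_total, base_lr=2e-5, warmup=0.1):
    """Effective learning rate at a given optimizer step (hypothetical helper)."""
    return base_lr * warmup_linear_multiplier(step / t_total, warmup)


for step in (0, 50, 100, 550, 1000):
    print(step, lr_at_step(step, t_total=1000))
```

Here `warmup` and `t_total` play the same roles as the `BertAdam` arguments of the same names: the fraction of training spent warming up and the total number of optimizer steps.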

### Utilities

File handling, caching, and model loading utilities for automatic download, caching of pre-trained models, and conversion from TensorFlow checkpoints.

```python { .api }
def cached_path(url_or_filename, cache_dir=None): ...
def load_tf_weights_in_bert(model, tf_checkpoint_path): ...
```

[Utilities](./utilities.md)
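
One way such a cache can map remote files to local paths is to hash the URL, and optionally the server's ETag, into a stable filename. The sketch below illustrates that idea only. The helper names and the example URL are hypothetical, it performs no download, and it should not be read as the exact behavior of `cached_path`.

```python
import hashlib
import os


def url_to_cache_filename(url, etag=None):
    """Derive a stable cache filename from a URL and an optional ETag."""
    name = hashlib.sha256(url.encode("utf-8")).hexdigest()
    if etag:
        # A changed ETag yields a new filename, so stale files are never reused
        name += "." + hashlib.sha256(etag.encode("utf-8")).hexdigest()
    return name


def resolve_cache_path(url_or_filename, cache_dir, etag=None):
    """Map URLs into the cache directory; local paths pass through unchanged."""
    if url_or_filename.startswith(("http://", "https://", "s3://")):
        return os.path.join(cache_dir, url_to_cache_filename(url_or_filename, etag))
    return url_or_filename


example_url = "https://example.com/models/weights.bin"  # hypothetical URL
print(resolve_cache_path(example_url, "./models", etag="v1"))
print(resolve_cache_path("./local/weights.bin", "./models"))
```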

## Common Patterns

### Loading Pre-trained Models

All model classes support the standard `from_pretrained()` pattern:

```python
from pytorch_pretrained_bert import BertModel, BertTokenizer

# Load model with default configuration
model = BertModel.from_pretrained('bert-base-uncased')

# Load with custom cache directory
model = BertModel.from_pretrained('bert-base-uncased', cache_dir='./models/')

# Load tokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
```

### Fine-tuning Setup

```python
from pytorch_pretrained_bert import BertForSequenceClassification, BertAdam

# Load model for fine-tuning
model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=3)

# Total number of optimizer steps. Placeholder value: in practice this is
# (batches per epoch) * (number of epochs)
num_train_steps = 1000

# Exclude biases and LayerNorm parameters from weight decay
param_optimizer = list(model.named_parameters())
no_decay = ['bias', 'LayerNorm.bias', 'LayerNorm.weight']
optimizer_grouped_parameters = [
    {'params': [p for n, p in param_optimizer if not any(nd in n for nd in no_decay)], 'weight_decay': 0.01},
    {'params': [p for n, p in param_optimizer if any(nd in n for nd in no_decay)], 'weight_decay': 0.0}
]

# Setup optimizer with learning rate scheduling
optimizer = BertAdam(optimizer_grouped_parameters,
                     lr=2e-5,
                     warmup=0.1,
                     t_total=num_train_steps)
```
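
The parameter grouping above relies on substring matching against parameter names. The pure-Python sketch below runs the same filter on a handful of illustrative names to show which group each one lands in. The names are examples of the naming pattern rather than an exhaustive or guaranteed list, and `split_by_weight_decay` is a hypothetical helper.

```python
def split_by_weight_decay(param_names,
                          no_decay=("bias", "LayerNorm.bias", "LayerNorm.weight")):
    """Partition parameter names into (decayed, not decayed) by substring match."""
    decayed, not_decayed = [], []
    for name in param_names:
        if any(nd in name for nd in no_decay):
            not_decayed.append(name)
        else:
            decayed.append(name)
    return decayed, not_decayed


# Illustrative parameter names, assumed for this example
names = [
    "bert.encoder.layer.0.attention.self.query.weight",
    "bert.encoder.layer.0.attention.self.query.bias",
    "bert.encoder.layer.0.attention.output.LayerNorm.weight",
    "bert.encoder.layer.0.attention.output.LayerNorm.bias",
    "classifier.weight",
]
decayed, not_decayed = split_by_weight_decay(names)
print("weight decay 0.01:", decayed)
print("weight decay 0.0: ", not_decayed)
```

Because the match is by substring, the `'bias'` entry already covers `'LayerNorm.bias'`; listing both is redundant but harmless.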
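
### Converting TensorFlow Checkpoints

```python
from pytorch_pretrained_bert import BertConfig, BertForPreTraining, load_tf_weights_in_bert

# Placeholder paths: point these at the files of a TensorFlow BERT checkpoint
config = BertConfig.from_json_file('bert_config.json')
tf_checkpoint_path = 'bert_model.ckpt'

# Create PyTorch model. BertForPreTraining mirrors the checkpoint's variable
# layout (encoder plus pre-training heads)
model = BertForPreTraining(config)

# Load TensorFlow weights
load_tf_weights_in_bert(model, tf_checkpoint_path)
```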
