# PyTorch Transformers

A comprehensive Python library providing state-of-the-art pre-trained transformer models for Natural Language Processing (NLP) tasks. PyTorch Transformers includes PyTorch implementations, pre-trained model weights, usage scripts and conversion utilities for major transformer architectures including BERT, GPT/GPT-2, Transformer-XL, XLNet, XLM, RoBERTa, and DistilBERT.

## Package Information

- **Package Name**: pytorch-transformers
- **Package Type**: Library
- **Language**: Python
- **Installation**: `pip install pytorch-transformers`

## Core Imports

```python
import pytorch_transformers
```

Common patterns for working with models and tokenizers:

```python
from pytorch_transformers import AutoModel, AutoTokenizer
from pytorch_transformers import BertModel, BertTokenizer
from pytorch_transformers import GPT2Model, GPT2Tokenizer
```

## Basic Usage

```python
import torch
from pytorch_transformers import AutoModel, AutoTokenizer

# Load a pre-trained model and tokenizer
model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
model.eval()  # disable dropout for inference

# Tokenize input text; encode() returns a list of token ids
text = "Hello, how are you?"
input_ids = torch.tensor([tokenizer.encode(text, add_special_tokens=True)])

# Get model outputs; models in this library return tuples
with torch.no_grad():
    outputs = model(input_ids)
last_hidden_states = outputs[0]  # shape: (batch_size, sequence_length, hidden_size)

# For specific tasks like sequence classification
from pytorch_transformers import AutoModelForSequenceClassification
classifier = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
```

## Architecture

The library follows a consistent design pattern across all transformer architectures:

- **Auto Classes**: Factory classes that automatically select the appropriate model/tokenizer based on model name
- **Base Classes**: Abstract base classes (`PreTrainedModel`, `PreTrainedTokenizer`, `PretrainedConfig`) providing common interfaces
- **Model-Specific Classes**: Dedicated implementations for each transformer architecture with specialized task-specific variants
- **Configuration Classes**: Parameter containers for model initialization and customization
- **Tokenizers**: Architecture-specific text preprocessing with consistent encode/decode interfaces

Because every architecture shares these interfaces, switching between transformer architectures usually means changing only the class or model name. The same API covers language modeling, sequence classification, question answering, and token classification.

## Capabilities

### Auto Classes

Factory classes that automatically select and instantiate the appropriate model, tokenizer, or configuration based on model name patterns. These provide the most convenient way to work with pre-trained models without needing to know the specific architecture.

```python { .api }
class AutoTokenizer:
    @classmethod
    def from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs): ...

class AutoModel:
    @classmethod
    def from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs): ...

class AutoConfig:
    @classmethod
    def from_pretrained(cls, pretrained_model_name_or_path, **kwargs): ...
```

[Auto Classes](./auto-classes.md)
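
The name-based selection performed by the auto classes can be pictured as an ordered substring match. The following is a minimal sketch, not the library's code: `resolve_architecture` and the `_PATTERNS` table are hypothetical names, and the pattern order is an assumption chosen so that more specific names (such as `distilbert` and `roberta`) are tested before `bert`, which they contain.

```python
# Hypothetical illustration of name-pattern dispatch; not the library's implementation.
_PATTERNS = [
    ("distilbert", "DistilBertModel"),
    ("roberta", "RobertaModel"),
    ("bert", "BertModel"),
    ("openai-gpt", "OpenAIGPTModel"),
    ("gpt2", "GPT2Model"),
    ("transfo-xl", "TransfoXLModel"),
    ("xlnet", "XLNetModel"),
    ("xlm", "XLMModel"),
]


def resolve_architecture(model_name):
    """Return the class name whose pattern first appears in model_name."""
    for pattern, class_name in _PATTERNS:
        if pattern in model_name:
            return class_name
    raise ValueError("Unrecognized model identifier: {!r}".format(model_name))


print(resolve_architecture("bert-base-uncased"))
print(resolve_architecture("roberta-base"))
```

Ordering matters in a scheme like this: if `bert` were checked first, `roberta-base` would resolve to the wrong class.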

### Base Classes

Core abstract base classes that define the common interface shared by all models, tokenizers, and configurations. These classes provide essential methods like `from_pretrained()` and `save_pretrained()` that enable consistent model and tokenizer loading/saving across all architectures.

```python { .api }
class PreTrainedModel:
    @classmethod
    def from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs): ...

    def save_pretrained(self, save_directory): ...
    def resize_token_embeddings(self, new_num_tokens): ...

class PreTrainedTokenizer:
    @classmethod
    def from_pretrained(cls, pretrained_model_name_or_path, **kwargs): ...

    def save_pretrained(self, save_directory): ...
    def tokenize(self, text): ...
    def encode(self, text): ...
    def decode(self, token_ids): ...
```

[Base Classes](./base-classes.md)
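
The `save_pretrained()` / `from_pretrained()` pairing amounts to a round trip through a directory on disk. The sketch below illustrates that contract with a hypothetical `TinyConfig` class that is not part of the library; it assumes a JSON file named `config.json` (the `CONFIG_NAME` value listed under Constants) and handles only local directories.

```python
import json
import os
import tempfile

CONFIG_NAME = "config.json"  # same file name the document lists under Constants


class TinyConfig:
    """Hypothetical stand-in for a configuration class; not part of the library."""

    def __init__(self, hidden_size=768, num_layers=12):
        self.hidden_size = hidden_size
        self.num_layers = num_layers

    def save_pretrained(self, save_directory):
        # Assumption for this sketch: the target directory must already exist.
        if not os.path.isdir(save_directory):
            raise ValueError("save_directory must be an existing directory")
        path = os.path.join(save_directory, CONFIG_NAME)
        with open(path, "w", encoding="utf-8") as f:
            json.dump(self.__dict__, f, indent=2, sort_keys=True)

    @classmethod
    def from_pretrained(cls, pretrained_model_name_or_path, **kwargs):
        # Only local directories are handled here; the real classes also accept model names.
        path = os.path.join(pretrained_model_name_or_path, CONFIG_NAME)
        with open(path, encoding="utf-8") as f:
            values = json.load(f)
        values.update(kwargs)  # keyword arguments override stored values
        return cls(**values)


with tempfile.TemporaryDirectory() as tmp_dir:
    TinyConfig(hidden_size=256).save_pretrained(tmp_dir)
    restored = TinyConfig.from_pretrained(tmp_dir, num_layers=4)
    print(restored.hidden_size, restored.num_layers)
```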

### BERT Models

BERT (Bidirectional Encoder Representations from Transformers) models for various NLP tasks including masked language modeling, next sentence prediction, sequence classification, token classification, and question answering.

```python { .api }
class BertModel:
    @classmethod
    def from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs): ...

class BertForSequenceClassification:
    @classmethod
    def from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs): ...

class BertTokenizer:
    @classmethod
    def from_pretrained(cls, pretrained_model_name_or_path, **kwargs): ...
```

[BERT Models](./bert-models.md)

### GPT-2 Models

GPT-2 (Generative Pre-trained Transformer 2) models for language generation tasks, including standard language modeling and multi-task models with both language modeling and classification heads.

```python { .api }
class GPT2Model:
    @classmethod
    def from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs): ...

class GPT2LMHeadModel:
    @classmethod
    def from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs): ...

class GPT2Tokenizer:
    @classmethod
    def from_pretrained(cls, pretrained_model_name_or_path, **kwargs): ...
```

[GPT-2 Models](./gpt2-models.md)

### Other Transformer Models

Additional transformer architectures including OpenAI GPT, Transformer-XL, XLNet, XLM, RoBERTa, and DistilBERT, each with their specific model variants and tokenizers optimized for different NLP tasks and languages.

```python { .api }
class XLNetModel:
    @classmethod
    def from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs): ...

class RobertaModel:
    @classmethod
    def from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs): ...

class DistilBertModel:
    @classmethod
    def from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs): ...
```

[Other Models](./other-models.md)

### Optimization

Specialized optimizers and learning rate schedulers designed for transformer training, including an AdamW optimizer with the weight decay fix and several warmup schedules commonly used in transformer fine-tuning.

```python { .api }
class AdamW:
    def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), eps=1e-6, weight_decay=0.0, correct_bias=True): ...

# The warmup schedules are classes (learning rate schedulers), not functions
class WarmupLinearSchedule:
    def __init__(self, optimizer, warmup_steps, t_total, last_epoch=-1): ...

class WarmupCosineSchedule:
    def __init__(self, optimizer, warmup_steps, t_total, cycles=0.5, last_epoch=-1): ...
```

[Optimization](./optimization.md)
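
To make the shape of the warmup schedules concrete, the following pure-Python sketch computes the learning rate multiplier as a function of the training step. It is an illustration under assumptions, not the library's code: the function names `warmup_linear_multiplier` and `warmup_cosine_multiplier` are hypothetical, and the formulas assume linear warmup over `warmup_steps` followed by linear or cosine decay until `t_total`.

```python
import math


def warmup_linear_multiplier(step, warmup_steps, t_total):
    """Rise linearly from 0 to 1 during warmup, then decay linearly to 0 at t_total."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    return max(0.0, (t_total - step) / max(1, t_total - warmup_steps))


def warmup_cosine_multiplier(step, warmup_steps, t_total, cycles=0.5):
    """Linear warmup, then a cosine curve; cycles=0.5 is a half cosine from 1 down to 0."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, t_total - warmup_steps)
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * cycles * 2.0 * progress)))


base_lr = 5e-5  # illustrative value, not a library default
for step in (0, 50, 100, 550, 1000):
    lr = base_lr * warmup_linear_multiplier(step, warmup_steps=100, t_total=1000)
    print(step, lr)
```

The effective learning rate at each step is the optimizer's base rate times this multiplier, so the rate peaks at the end of warmup and reaches zero at `t_total`.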

### File Utilities

File handling utilities for downloading, caching, and managing pre-trained model files. These utilities handle automatic download of model weights and configurations from remote repositories with local caching support.

```python { .api }
def cached_path(url_or_filename, cache_dir=None): ...

PYTORCH_TRANSFORMERS_CACHE: str
PYTORCH_PRETRAINED_BERT_CACHE: str
```

[File Utilities](./file-utilities.md)
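
One common way to implement local caching of remote files is to derive a stable file name from the URL, optionally combined with the server's ETag. The sketch below shows that idea only; `cache_filename` and `lookup` are hypothetical helpers, the hashing scheme is an assumption, and the URL is a placeholder rather than a real download location.

```python
import hashlib
import os


def cache_filename(url, etag=None):
    """Hypothetical helper: derive a stable cache file name from a URL and optional ETag."""
    name = hashlib.sha256(url.encode("utf-8")).hexdigest()
    if etag:
        # A changed ETag yields a new name, so an updated remote file is fetched again.
        name += "." + hashlib.sha256(etag.encode("utf-8")).hexdigest()
    return name


def lookup(url, cache_dir, etag=None):
    """Return the cached path if the file is already present, else None."""
    path = os.path.join(cache_dir, cache_filename(url, etag))
    return path if os.path.exists(path) else None


example_url = "https://example.com/models/example-pytorch_model.bin"  # placeholder URL
print(cache_filename(example_url))
print(lookup(example_url, cache_dir="/tmp/nonexistent-cache-dir"))
```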

## Constants

```python { .api }
__version__: str = "1.2.0"

# Model file names
WEIGHTS_NAME: str = "pytorch_model.bin"
CONFIG_NAME: str = "config.json"
TF_WEIGHTS_NAME: str = "model.ckpt"

# Archive maps (model name to URL mappings for pre-trained models)
BERT_PRETRAINED_MODEL_ARCHIVE_MAP: Dict[str, str]
GPT2_PRETRAINED_MODEL_ARCHIVE_MAP: Dict[str, str]
XLNET_PRETRAINED_MODEL_ARCHIVE_MAP: Dict[str, str]
# ... and similar maps for all other architectures
```
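
An archive map and the file-name constants together let a loader turn a single `pretrained_model_name_or_path` argument into a concrete weights location. The following is a rough sketch of that resolution under stated assumptions: `resolve_weights_location` and `EXAMPLE_ARCHIVE_MAP` are hypothetical, and the URL is a placeholder.

```python
import os

WEIGHTS_NAME = "pytorch_model.bin"

# Placeholder map; the URL is illustrative, not a real download location.
EXAMPLE_ARCHIVE_MAP = {
    "example-model": "https://example.com/example-model-pytorch_model.bin",
}


def resolve_weights_location(name_or_path, archive_map):
    """Hypothetical helper: map a shortcut name or local directory to a weights location."""
    if name_or_path in archive_map:
        return archive_map[name_or_path]  # known shortcut name -> remote URL
    if os.path.isdir(name_or_path):
        return os.path.join(name_or_path, WEIGHTS_NAME)  # local directory -> file inside it
    return name_or_path  # otherwise treat the argument as a direct file path or URL


print(resolve_weights_location("example-model", EXAMPLE_ARCHIVE_MAP))
print(resolve_weights_location(".", EXAMPLE_ARCHIVE_MAP))
```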

## Special Token Properties

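All tokenizers expose the standard special-token attributes, although a given tokenizer defines only the tokens its model uses (GPT-2, for example, has no padding token by default):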

```python { .api }
# Special tokens available on all tokenizers
bos_token: str   # Beginning of sequence
eos_token: str   # End of sequence
unk_token: str   # Unknown token
sep_token: str   # Separator token
pad_token: str   # Padding token
cls_token: str   # Classification token
mask_token: str  # Mask token for MLM
```