Tessl Tile for pypi/pytorch-transformers@1.2.0

# BERT Models

BERT (Bidirectional Encoder Representations from Transformers) models for a range of NLP tasks. BERT attends to context on both sides of each token, which makes it well suited to language-understanding tasks such as classification, question answering, and token-level prediction.

## Capabilities

### BertConfig

Configuration class for BERT models, holding the hyperparameters that define the architecture.

```python { .api }
class BertConfig(PretrainedConfig):
    def __init__(
        self,
        vocab_size_or_config_json_file=30522,
        hidden_size=768,
        num_hidden_layers=12,
        num_attention_heads=12,
        intermediate_size=3072,
        hidden_act="gelu",
        hidden_dropout_prob=0.1,
        attention_probs_dropout_prob=0.1,
        max_position_embeddings=512,
        type_vocab_size=2,
        initializer_range=0.02,
        layer_norm_eps=1e-12,
        **kwargs
    ):
        """
        Configuration for BERT models.

        Parameters:
        - vocab_size_or_config_json_file (int or str): Vocabulary size, or path to a JSON configuration file
        - hidden_size (int): Hidden layer dimensionality
        - num_hidden_layers (int): Number of transformer layers
        - num_attention_heads (int): Number of attention heads per layer
        - intermediate_size (int): Feed-forward layer dimensionality
        - hidden_act (str): Activation function ("gelu", "relu", "swish")
        - hidden_dropout_prob (float): Dropout probability for hidden layers
        - attention_probs_dropout_prob (float): Dropout for attention probabilities
        - max_position_embeddings (int): Maximum sequence length
        - type_vocab_size (int): Number of token type embeddings
        - initializer_range (float): Standard deviation used for weight initialization
        - layer_norm_eps (float): Layer normalization epsilon
        """
```
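
The hyperparameters above constrain one another: BERT-style attention splits `hidden_size` evenly across the heads, so `hidden_size` has to be divisible by `num_attention_heads`. The following library-independent sketch checks that constraint and derives the per-head width. The helper name `attention_head_size` is hypothetical and not part of pytorch-transformers.

```python
def attention_head_size(hidden_size=768, num_attention_heads=12):
    """Per-head width implied by a BERT-style configuration (hypothetical helper)."""
    if hidden_size % num_attention_heads != 0:
        raise ValueError(
            f"hidden_size ({hidden_size}) is not a multiple of "
            f"num_attention_heads ({num_attention_heads})"
        )
    return hidden_size // num_attention_heads


base_head = attention_head_size()           # BertConfig defaults: 768 / 12
large_head = attention_head_size(1024, 16)  # bert-large values: 1024 / 16
print(base_head, large_head)                # 64 64
```

Both the base and large configurations end up with 64-dimensional heads; the larger model adds heads and layers rather than widening each head.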

### BertModel

Base BERT model for encoding sequences into contextualized representations.

```python { .api }
class BertModel(BertPreTrainedModel):
    def __init__(self, config):
        """
        Initialize BERT base model.

        Parameters:
        - config (BertConfig): Model configuration
        """

    def forward(
        self,
        input_ids,
        attention_mask=None,
        token_type_ids=None,
        position_ids=None,
        head_mask=None
    ):
        """
        Forward pass through BERT model.

        Parameters:
        - input_ids (torch.Tensor): Token IDs of shape (batch_size, sequence_length)
        - attention_mask (torch.Tensor): Mask with 1 for real tokens and 0 for padding
        - token_type_ids (torch.Tensor): Segment token indices for sentence pairs
        - position_ids (torch.Tensor): Position indices
        - head_mask (torch.Tensor): Mask to nullify selected heads

        Returns:
        tuple: (sequence_output, pooled_output), followed by hidden_states and
        attentions when enabled via config.output_hidden_states / config.output_attentions
        """
```

**Usage Example:**

```python
from pytorch_transformers import BertModel, BertTokenizer
import torch

# Load model and tokenizer
model = BertModel.from_pretrained("bert-base-uncased")
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# Prepare input: encode() returns a list of token IDs, so add the batch dimension
text = "The quick brown fox jumps over the lazy dog."
input_ids = torch.tensor([tokenizer.encode(text, add_special_tokens=True)])

# Get model outputs (a plain tuple in this version)
with torch.no_grad():
    outputs = model(input_ids)

# Access representations by position in the tuple
last_hidden_state = outputs[0]  # Shape: (1, seq_len, 768)
pooled_output = outputs[1]      # Shape: (1, 768)

print(f"Sequence representation shape: {last_hidden_state.shape}")
print(f"Pooled representation shape: {pooled_output.shape}")
```
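
When sequences of different lengths are batched, they are usually padded to a common length, and `attention_mask` marks real tokens with 1 and padding with 0. Below is a minimal pure-Python sketch of that preparation step. It assumes a padding ID of 0, and the token IDs are illustrative; `pad_batch` is a hypothetical helper, not a library function.

```python
def pad_batch(sequences, pad_id=0):
    """Pad token-ID lists to equal length and build the matching attention mask."""
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        padding = max_len - len(seq)
        input_ids.append(list(seq) + [pad_id] * padding)
        attention_mask.append([1] * len(seq) + [0] * padding)
    return input_ids, attention_mask


# Two already-encoded sequences of different length (illustrative IDs)
ids, mask = pad_batch([[101, 7592, 102], [101, 7592, 2088, 999, 102]])
print(ids)   # [[101, 7592, 102, 0, 0], [101, 7592, 2088, 999, 102]]
print(mask)  # [[1, 1, 1, 0, 0], [1, 1, 1, 1, 1]]
```

Both nested lists can be wrapped in `torch.tensor(...)` and passed to the model as `input_ids` and `attention_mask`.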

### BertPreTrainedModel

Abstract base class for all BERT models. It handles weight initialization and provides a simple interface for downloading and loading pre-trained models.

```python { .api }
class BertPreTrainedModel(PreTrainedModel):
    config_class = BertConfig
    pretrained_model_archive_map = BERT_PRETRAINED_MODEL_ARCHIVE_MAP
    load_tf_weights = load_tf_weights_in_bert
    base_model_prefix = "bert"

    def _init_weights(self, module):
        """
        Initialize the weights for BERT models.

        Parameters:
        - module (nn.Module): Module to initialize
        """
```

**Usage Example:**

```python
from pytorch_transformers import BertPreTrainedModel, BertModel, BertConfig

# BertPreTrainedModel is typically used as a base class for custom BERT models
class CustomBertModel(BertPreTrainedModel):
    def __init__(self, config):
        super().__init__(config)
        # Attribute name matches base_model_prefix so pre-trained weights can be located
        self.bert = BertModel(config)
        # Apply BERT-style initialization to every submodule
        self.apply(self._init_weights)

    def forward(self, input_ids):
        # Custom forward implementation: return the final hidden states
        return self.bert(input_ids)[0]

# Build a randomly initialized model from the default configuration
config = BertConfig()
model = CustomBertModel(config)
```

### BertForPreTraining

BERT model for pre-training with both masked language modeling and next sentence prediction heads.

```python { .api }
class BertForPreTraining(BertPreTrainedModel):
    def __init__(self, config):
        """
        Initialize BERT for pre-training with MLM and NSP heads.

        Parameters:
        - config (BertConfig): Model configuration
        """

    def forward(
        self,
        input_ids,
        attention_mask=None,
        token_type_ids=None,
        position_ids=None,
        head_mask=None,
        masked_lm_labels=None,
        next_sentence_label=None
    ):
        """
        Forward pass for pre-training with MLM and NSP tasks.

        Parameters:
        - input_ids (torch.Tensor): Token IDs
        - attention_mask (torch.Tensor): Attention mask
        - token_type_ids (torch.Tensor): Segment token indices
        - position_ids (torch.Tensor): Position indices
        - head_mask (torch.Tensor): Head mask
        - masked_lm_labels (torch.Tensor): Labels for MLM loss (-1 = position ignored)
        - next_sentence_label (torch.Tensor): Labels for NSP loss

        Returns:
        tuple: (loss,) when both label tensors are given, then prediction_scores and
        seq_relationship_score, followed by optional hidden_states and attentions
        """
```

**Usage Example:**

```python
from pytorch_transformers import BertForPreTraining, BertTokenizer
import torch

# Load model and tokenizer
model = BertForPreTraining.from_pretrained("bert-base-uncased")
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# Prepare a sentence pair: [CLS] text_a [SEP] text_b [SEP]
text_a = "The cat sat on the"
text_b = "mat and slept peacefully"
ids = tokenizer.encode(text_a, text_b, add_special_tokens=True)
input_ids = torch.tensor([ids])

# Segment IDs: 0 up to and including the first [SEP], 1 afterwards
first_sep = ids.index(tokenizer.convert_tokens_to_ids(tokenizer.sep_token))
token_type_ids = torch.tensor([[0] * (first_sep + 1) + [1] * (len(ids) - first_sep - 1)])

# Record the true token as the label first, then mask it in the input
masked_lm_labels = torch.full_like(input_ids, -1)  # -1 = ignored by the loss
masked_lm_labels[0, 4] = input_ids[0, 4]
input_ids[0, 4] = tokenizer.convert_tokens_to_ids(tokenizer.mask_token)  # Mask "on"

# Add NSP label (0 = sentence B follows A, 1 = random sentence B)
next_sentence_label = torch.tensor([0])

# Forward pass; with both labels given, the first output is the summed MLM + NSP loss
outputs = model(input_ids,
                token_type_ids=token_type_ids,
                masked_lm_labels=masked_lm_labels,
                next_sentence_label=next_sentence_label)
loss, prediction_scores, seq_relationship_score = outputs[:3]

print(f"Total pre-training loss (MLM + NSP): {loss.item():.3f}")
print(f"NSP predictions: {torch.softmax(seq_relationship_score, dim=-1)}")
```

### BertForNextSentencePrediction

BERT model with only a next sentence prediction head, used to decide whether the second sentence follows the first.

```python { .api }
class BertForNextSentencePrediction(BertPreTrainedModel):
    def __init__(self, config):
        """
        Initialize BERT for next sentence prediction task.

        Parameters:
        - config (BertConfig): Model configuration
        """

    def forward(
        self,
        input_ids,
        attention_mask=None,
        token_type_ids=None,
        position_ids=None,
        head_mask=None,
        next_sentence_label=None
    ):
        """
        Forward pass for next sentence prediction.

        Parameters:
        - input_ids (torch.Tensor): Token IDs for sentence pair
        - attention_mask (torch.Tensor): Attention mask
        - token_type_ids (torch.Tensor): Segment token indices (0 for sentence A, 1 for sentence B)
        - position_ids (torch.Tensor): Position indices
        - head_mask (torch.Tensor): Head mask
        - next_sentence_label (torch.Tensor): Labels (0=consecutive, 1=random)

        Returns:
        tuple: (loss,) when next_sentence_label is given, then seq_relationship_score,
        followed by optional hidden_states and attentions
        """
```

**Usage Example:**

```python
from pytorch_transformers import BertForNextSentencePrediction, BertTokenizer
import torch

# Load model and tokenizer
model = BertForNextSentencePrediction.from_pretrained("bert-base-uncased")
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# Prepare sentence pairs
sentence_a = "The weather is nice today"
sentence_b = "I think I'll go for a walk"  # Consecutive sentence
sentence_c = "Machine learning is fascinating"  # Random sentence

def encode_pair(first, second):
    """Return input_ids and token_type_ids tensors for a sentence pair."""
    ids = tokenizer.encode(first, second, add_special_tokens=True)
    first_sep = ids.index(tokenizer.convert_tokens_to_ids(tokenizer.sep_token))
    segments = [0] * (first_sep + 1) + [1] * (len(ids) - first_sep - 1)
    return torch.tensor([ids]), torch.tensor([segments])

# Encode pairs
consecutive_ids, consecutive_segments = encode_pair(sentence_a, sentence_b)
random_ids, random_segments = encode_pair(sentence_a, sentence_c)

# Predict; without labels, the first output is the NSP score tensor
with torch.no_grad():
    consecutive_scores = model(consecutive_ids, token_type_ids=consecutive_segments)[0]
    random_scores = model(random_ids, token_type_ids=random_segments)[0]

# Get predictions (0=consecutive, 1=random)
consecutive_probs = torch.softmax(consecutive_scores, dim=-1)
random_probs = torch.softmax(random_scores, dim=-1)

print(f"Consecutive pair - P(consecutive): {consecutive_probs[0, 0]:.3f}")
print(f"Random pair - P(consecutive): {random_probs[0, 0]:.3f}")
```

### BertForMaskedLM

BERT model with a language modeling head for masked language modeling (MLM) tasks.

```python { .api }
class BertForMaskedLM(BertPreTrainedModel):
    def __init__(self, config):
        """
        Initialize BERT for masked language modeling.

        Parameters:
        - config (BertConfig): Model configuration
        """

    def forward(
        self,
        input_ids,
        attention_mask=None,
        token_type_ids=None,
        position_ids=None,
        head_mask=None,
        masked_lm_labels=None
    ):
        """
        Forward pass for masked language modeling.

        Parameters:
        - input_ids (torch.Tensor): Token IDs with [MASK] tokens
        - attention_mask (torch.Tensor): Attention mask
        - token_type_ids (torch.Tensor): Segment token indices
        - position_ids (torch.Tensor): Position indices
        - head_mask (torch.Tensor): Head mask
        - masked_lm_labels (torch.Tensor): True token IDs at masked positions, -1 elsewhere

        Returns:
        tuple: (loss,) when masked_lm_labels is given, then prediction_scores,
        followed by optional hidden_states and attentions
        """
```
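
The MLM labels pair each masked input position with its original token and mark every other position as ignored. The pure-Python sketch below builds both sequences from a list of token IDs. It assumes -1 as the ignored-label value (the value used in the pre-training usage example in this document) and a `[MASK]` ID of 103, which is an assumption about the vocabulary; `mask_positions` is a hypothetical helper.

```python
def mask_positions(token_ids, positions, mask_id, ignore_index=-1):
    """Return (masked_input_ids, labels) for the given positions to mask."""
    masked = list(token_ids)
    labels = [ignore_index] * len(token_ids)
    for pos in positions:
        labels[pos] = token_ids[pos]  # remember the true token before masking
        masked[pos] = mask_id
    return masked, labels


# Illustrative IDs; 103 stands in for the [MASK] token ID (assumed)
masked_ids, labels = mask_positions([101, 1996, 4937, 102], positions=[2], mask_id=103)
print(masked_ids)  # [101, 1996, 103, 102]
print(labels)      # [-1, -1, 4937, -1]
```

Recording the label before overwriting the input avoids the common mistake of copying the already-masked IDs into the label tensor.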

### BertForSequenceClassification

BERT model with a classification head for sequence-level classification tasks.

```python { .api }
class BertForSequenceClassification(BertPreTrainedModel):
    def __init__(self, config):
        """
        Initialize BERT for sequence classification.

        Parameters:
        - config (BertConfig): Model configuration with num_labels
        """

    def forward(
        self,
        input_ids,
        attention_mask=None,
        token_type_ids=None,
        position_ids=None,
        head_mask=None,
        labels=None
    ):
        """
        Forward pass for sequence classification.

        Parameters:
        - input_ids (torch.Tensor): Token IDs
        - attention_mask (torch.Tensor): Attention mask
        - token_type_ids (torch.Tensor): Segment token indices
        - position_ids (torch.Tensor): Position indices
        - head_mask (torch.Tensor): Head mask
        - labels (torch.Tensor): Classification labels

        Returns:
        tuple: (loss,) when labels is given, then logits,
        followed by optional hidden_states and attentions
        """
```

**Usage Example:**

```python
from pytorch_transformers import BertForSequenceClassification, BertTokenizer
import torch

# Load model for binary classification.
# The classification head is randomly initialized until the model is fine-tuned.
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=2
)
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# Prepare input
text = "This movie is fantastic!"
input_ids = torch.tensor([tokenizer.encode(text, add_special_tokens=True)])

# Get predictions; without labels, the first output is the logits tensor
with torch.no_grad():
    logits = model(input_ids)[0]
    predictions = torch.nn.functional.softmax(logits, dim=-1)

print(f"Positive probability: {predictions[0][1].item():.3f}")
```

### BertForQuestionAnswering

BERT model with a span classification head for extractive question answering.

```python { .api }
class BertForQuestionAnswering(BertPreTrainedModel):
    def __init__(self, config):
        """
        Initialize BERT for question answering.

        Parameters:
        - config (BertConfig): Model configuration
        """

    def forward(
        self,
        input_ids,
        attention_mask=None,
        token_type_ids=None,
        position_ids=None,
        head_mask=None,
        start_positions=None,
        end_positions=None
    ):
        """
        Forward pass for question answering.

        Parameters:
        - input_ids (torch.Tensor): Token IDs for question and context
        - attention_mask (torch.Tensor): Attention mask
        - token_type_ids (torch.Tensor): Segment IDs (0 for question, 1 for context)
        - position_ids (torch.Tensor): Position indices
        - head_mask (torch.Tensor): Head mask
        - start_positions (torch.Tensor): Start positions of answer spans
        - end_positions (torch.Tensor): End positions of answer spans

        Returns:
        tuple: (loss,) when both position tensors are given, then start_logits and
        end_logits, followed by optional hidden_states and attentions
        """
```

**Usage Example:**

```python
from pytorch_transformers import BertForQuestionAnswering, BertTokenizer
import torch

# Load model and tokenizer.
# The span head of "bert-base-uncased" is untrained, so the extracted answer is
# only meaningful after fine-tuning on a QA dataset.
model = BertForQuestionAnswering.from_pretrained("bert-base-uncased")
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# Prepare question and context
question = "What is the capital of France?"
context = "France is a country in Europe. The capital of France is Paris."

# Encode as [CLS] question [SEP] context [SEP] and build segment IDs
ids = tokenizer.encode(question, context, add_special_tokens=True)
first_sep = ids.index(tokenizer.convert_tokens_to_ids(tokenizer.sep_token))
input_ids = torch.tensor([ids])
token_type_ids = torch.tensor([[0] * (first_sep + 1) + [1] * (len(ids) - first_sep - 1)])

# Get answer span predictions
with torch.no_grad():
    start_scores, end_scores = model(input_ids, token_type_ids=token_type_ids)[:2]

# Find the most likely start and end positions
start_idx = torch.argmax(start_scores).item()
end_idx = torch.argmax(end_scores).item()

# Extract answer (empty if the predicted end precedes the start)
answer = tokenizer.decode(ids[start_idx:end_idx + 1])
print(f"Answer: {answer}")
```
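
Taking the argmax of the start and end scores independently can produce an end position that precedes the start. A common remedy is to search for the best-scoring pair with `start <= end` and a bounded span length. The sketch below does this in plain Python over per-token score lists; `best_span` and its `max_answer_len` parameter are hypothetical names, and the scores are made up for illustration.

```python
def best_span(start_scores, end_scores, max_answer_len=30):
    """Return the (start, end) pair with the highest summed score, start <= end."""
    best, best_score = (0, 0), float("-inf")
    for start, start_score in enumerate(start_scores):
        last = min(start + max_answer_len, len(end_scores))
        for end in range(start, last):
            score = start_score + end_scores[end]
            if score > best_score:
                best_score, best = score, (start, end)
    return best


start_scores = [0.1, 0.2, 3.0, 0.5]
end_scores = [2.5, 0.1, 0.3, 2.0]
# Independent argmax would pick start=2 and end=0, which is not a valid span
print(best_span(start_scores, end_scores))  # (2, 3)
```

The double loop is quadratic in sequence length in the worst case, but the `max_answer_len` bound keeps the work per start position constant.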

### BertForTokenClassification

BERT model with a token classification head for token-level tasks such as named entity recognition.

```python { .api }
class BertForTokenClassification(BertPreTrainedModel):
    def __init__(self, config):
        """
        Initialize BERT for token classification.

        Parameters:
        - config (BertConfig): Model configuration with num_labels
        """

    def forward(
        self,
        input_ids,
        attention_mask=None,
        token_type_ids=None,
        position_ids=None,
        head_mask=None,
        labels=None
    ):
        """
        Forward pass for token classification.

        Parameters:
        - input_ids (torch.Tensor): Token IDs
        - attention_mask (torch.Tensor): Attention mask
        - token_type_ids (torch.Tensor): Segment token indices
        - position_ids (torch.Tensor): Position indices
        - head_mask (torch.Tensor): Head mask
        - labels (torch.Tensor): Token-level labels

        Returns:
        tuple: (loss,) when labels is given, then logits of shape
        (batch_size, sequence_length, num_labels), followed by optional
        hidden_states and attentions
        """
```
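
Token-level labels are usually annotated per word, while BERT operates on WordPiece tokens, so one word may map to several label positions. One common convention, sketched below, labels the first piece of each word and marks continuation pieces as ignored. The ignore value of -100 is PyTorch's default `ignore_index` for `CrossEntropyLoss` and is an assumption about how the loss is configured; `align_labels` is a hypothetical helper, and the word pieces are supplied by hand rather than produced by the tokenizer.

```python
IGNORE_INDEX = -100  # PyTorch CrossEntropyLoss default; assumed, not read from the library


def align_labels(word_pieces, word_labels, ignore_index=IGNORE_INDEX):
    """Expand one label per word to one label per WordPiece token."""
    tokens, labels = [], []
    for pieces, label in zip(word_pieces, word_labels):
        for i, piece in enumerate(pieces):
            tokens.append(piece)
            # Only the first piece of a word carries the word's label
            labels.append(label if i == 0 else ignore_index)
    return tokens, labels


pieces = [["hugging", "##face"], ["is"], ["in"], ["brooklyn"]]
word_labels = [1, 0, 0, 2]  # hypothetical tag IDs, e.g. 1 = ORG, 2 = LOC
tokens, labels = align_labels(pieces, word_labels)
print(tokens)  # ['hugging', '##face', 'is', 'in', 'brooklyn']
print(labels)  # [1, -100, 0, 0, 2]
```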

### BertForMultipleChoice

BERT model for multiple choice tasks, with a classification head that scores each candidate choice.

```python { .api }
class BertForMultipleChoice(BertPreTrainedModel):
    def __init__(self, config):
        """
        Initialize BERT for multiple choice.

        Parameters:
        - config (BertConfig): Model configuration
        """

    def forward(
        self,
        input_ids,
        attention_mask=None,
        token_type_ids=None,
        position_ids=None,
        head_mask=None,
        labels=None
    ):
        """
        Forward pass for multiple choice.

        Parameters:
        - input_ids (torch.Tensor): Token IDs of shape (batch_size, num_choices, seq_len)
        - attention_mask (torch.Tensor): Attention mask
        - token_type_ids (torch.Tensor): Segment token indices
        - position_ids (torch.Tensor): Position indices
        - head_mask (torch.Tensor): Head mask
        - labels (torch.Tensor): Correct choice indices

        Returns:
        tuple: (loss,) when labels is given, then logits of shape
        (batch_size, num_choices), followed by optional hidden_states and attentions
        """
```
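
The three-dimensional input shape can be read as follows: every choice is encoded as its own sequence, the choices are flattened into one large batch for the encoder, and the per-sequence scores are regrouped so that each example gets one score per choice. The pure-Python sketch below mirrors that reshaping on nested lists. The helper names and the score values are hypothetical; no model is run.

```python
def flatten_choices(batch):
    """(batch, num_choices, seq_len) nested lists -> (batch * num_choices, seq_len)."""
    return [choice for example in batch for choice in example]


def regroup_scores(flat_scores, num_choices):
    """One score per flattened row -> (batch, num_choices)."""
    return [flat_scores[i:i + num_choices]
            for i in range(0, len(flat_scores), num_choices)]


batch = [
    [[101, 11, 102], [101, 12, 102], [101, 13, 102]],  # example 0, three choices
    [[101, 21, 102], [101, 22, 102], [101, 23, 102]],  # example 1, three choices
]
flat = flatten_choices(batch)                # 6 rows of length 3
scores = [0.1, 0.7, 0.2, 0.9, 0.05, 0.05]    # made-up per-row scores
grouped = regroup_scores(scores, num_choices=3)
predicted = [row.index(max(row)) for row in grouped]
print(predicted)  # [1, 0]
```

The `labels` argument uses the same indexing as `predicted`: one integer per example naming the correct choice.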

### BertTokenizer

WordPiece tokenizer for BERT models. It handles special tokens and splits out-of-vocabulary words into subword units.

```python { .api }
class BertTokenizer(PreTrainedTokenizer):
    def __init__(
        self,
        vocab_file,
        do_lower_case=True,
        do_basic_tokenize=True,
        never_split=None,
        unk_token="[UNK]",
        sep_token="[SEP]",
        pad_token="[PAD]",
        cls_token="[CLS]",
        mask_token="[MASK]",
        tokenize_chinese_chars=True,
        **kwargs
    ):
        """
        Initialize BERT tokenizer.

        Parameters:
        - vocab_file (str): Path to vocabulary file
        - do_lower_case (bool): Whether to lowercase input
        - do_basic_tokenize (bool): Whether to do basic tokenization
        - never_split (List[str]): Tokens never to split
        - unk_token (str): Unknown token
        - sep_token (str): Separator token
        - pad_token (str): Padding token
        - cls_token (str): Classification token
        - mask_token (str): Mask token
        - tokenize_chinese_chars (bool): Whether to tokenize Chinese characters
        """
```
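
WordPiece splits a word by repeatedly taking the longest prefix found in the vocabulary, marking every non-initial piece with `##`. The sketch below is a simplified stand-alone version of that greedy longest-match-first procedure, run on a toy vocabulary. It is not the library's implementation: the `wordpiece` function and `toy_vocab` are hypothetical, and details such as lowercasing, punctuation splitting, and maximum word length are left out.

```python
def wordpiece(word, vocab, unk_token="[UNK]"):
    """Greedy longest-match-first subword split of a single word."""
    pieces, start = [], 0
    while start < len(word):
        end, match = len(word), None
        # Shrink the candidate from the right until it is in the vocabulary
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation marker
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk_token]  # no piece fits: the whole word is unknown
        pieces.append(match)
        start = end
    return pieces


toy_vocab = {"un", "##aff", "##able", "play", "##ing"}
print(wordpiece("unaffable", toy_vocab))  # ['un', '##aff', '##able']
print(wordpiece("playing", toy_vocab))    # ['play', '##ing']
print(wordpiece("xyz", toy_vocab))        # ['[UNK]']
```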

## Utility Functions

### load_tf_weights_in_bert

```python { .api }
def load_tf_weights_in_bert(model, config, tf_checkpoint_path):
    """
    Load TensorFlow BERT checkpoint weights into a PyTorch BERT model.

    Parameters:
    - model (BertModel): PyTorch BERT model
    - config (BertConfig): Model configuration
    - tf_checkpoint_path (str): Path to TensorFlow checkpoint

    Returns:
    BertModel: Model with loaded weights
    """
```

## Archive Maps

```python { .api }
BERT_PRETRAINED_MODEL_ARCHIVE_MAP: Dict[str, str]
# Maps model names to download URLs for pre-trained weights

BERT_PRETRAINED_CONFIG_ARCHIVE_MAP: Dict[str, str]
# Maps model names to download URLs for configurations
```

**Available Pre-trained Models:**
- `bert-base-uncased`: 12-layer, 768-hidden, 12-heads, 110M parameters
- `bert-large-uncased`: 24-layer, 1024-hidden, 16-heads, 340M parameters
- `bert-base-cased`: 12-layer, 768-hidden, 12-heads, 110M parameters (cased)
- `bert-large-cased`: 24-layer, 1024-hidden, 16-heads, 340M parameters (cased)
- `bert-base-multilingual-uncased`: 12-layer, 768-hidden, 12-heads, 110M parameters (multilingual)
- `bert-base-multilingual-cased`: 12-layer, 768-hidden, 12-heads, 110M parameters (multilingual, cased)
- `bert-base-chinese`: 12-layer, 768-hidden, 12-heads, 110M parameters (Chinese)
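
The parameter counts in this list can be approximately reproduced from the configuration values. The sketch below counts the weights and biases of the encoder only (embeddings, transformer layers, pooler) and excludes task-specific heads. It assumes the standard BERT layer layout; `bert_parameter_count` is a hypothetical helper, and the 4096 intermediate size for the large model is an assumption (four times its hidden size, as in the base model).

```python
def bert_parameter_count(vocab_size=30522, hidden_size=768, num_hidden_layers=12,
                         intermediate_size=3072, max_position_embeddings=512,
                         type_vocab_size=2):
    """Approximate encoder parameter count for a BERT-style configuration."""
    h, i = hidden_size, intermediate_size
    layer_norm = 2 * h  # scale and bias
    # Word, position and token-type embeddings plus their LayerNorm
    embeddings = (vocab_size + max_position_embeddings + type_vocab_size) * h + layer_norm
    # Query, key, value and output projections (weights + biases) plus LayerNorm
    attention = 4 * (h * h + h) + layer_norm
    # Two feed-forward projections (weights + biases) plus LayerNorm
    feed_forward = (h * i + i) + (i * h + h) + layer_norm
    pooler = h * h + h
    return embeddings + num_hidden_layers * (attention + feed_forward) + pooler


base = bert_parameter_count()
large = bert_parameter_count(hidden_size=1024, num_hidden_layers=24, intermediate_size=4096)
print(base)   # 109482240, listed above as 110M
print(large)  # 335141888, listed above as 340M
```

The number of attention heads does not appear in the formula because the heads partition the existing projection matrices instead of adding weights.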
