# Big Modeling

Device management utilities for handling large models that exceed single-device memory, using CPU/disk offloading, automatic device mapping, and efficient initialization strategies. These utilities enable training and inference with models that would otherwise be impossible to run.

## Capabilities

### CPU Offloading

Functions for offloading model parameters to CPU memory when not in use and moving them to the execution device automatically during the forward pass.

```python { .api }
def cpu_offload(
    model: torch.nn.Module,
    execution_device: torch.device | None = None,
    offload_buffers: bool = False,
    state_dict: dict[str, torch.Tensor] | None = None,
    preload_module_classes: list[str] | None = None
):
    """
    Offload a model to CPU with automatic device management hooks.

    Model parameters are moved to CPU and automatically transferred to the
    execution device during the forward pass, then moved back to CPU.

    Parameters:
    - model: Model to offload to CPU
    - execution_device: Device to use during computation (default: auto-detect)
    - offload_buffers: Whether to also offload buffer tensors
    - state_dict: Optional state dict to use for the model parameters
    - preload_module_classes: Module classes to preload on the execution device

    Returns:
        Model with CPU offloading hooks attached
    """

def cpu_offload_with_hook(
    model: torch.nn.Module,
    execution_device: torch.device | str | int | None = None,
    prev_module_hook: UserCpuOffloadHook | None = None
):
    """
    Advanced CPU offloading with custom hook chaining.

    Provides more control over offloading behavior and allows chaining
    multiple offloading hooks for complex model architectures.

    Parameters:
    - model: Model to offload
    - execution_device: Computation device
    - prev_module_hook: Previous hook in the chain

    Returns:
        Tuple of (model, hook) for hook chaining
    """
```
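
`cpu_offload_with_hook` is useful for pipelines whose sub-models run in sequence: each hook offloads the previous stage when its own module starts running, and the last stage is offloaded manually. Below is a minimal sketch under the assumption that a CUDA device is available; the two `nn.Linear` stages are placeholders for real sub-models.

```python
import torch
import torch.nn as nn
from accelerate import cpu_offload_with_hook

# Two stages of a hypothetical pipeline (placeholders for real sub-models).
stage_1 = nn.Linear(512, 512)
stage_2 = nn.Linear(512, 512)

# Chain the hooks: when stage_2 starts its forward pass, stage_1 is
# automatically moved back to CPU.
stage_1, hook_1 = cpu_offload_with_hook(stage_1, execution_device="cuda:0")
stage_2, hook_2 = cpu_offload_with_hook(stage_2, execution_device="cuda:0", prev_module_hook=hook_1)

x = torch.randn(8, 512)
out = stage_2(stage_1(x))

# The last model in the chain must be offloaded explicitly.
hook_2.offload()
```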

### Disk Offloading

Functions for offloading model parameters to disk storage, for extremely large models that exceed total system memory.

```python { .api }
def disk_offload(
    model: torch.nn.Module,
    offload_dir: str | os.PathLike,
    execution_device: torch.device | str | int | None = None,
    offload_buffers: bool = False
):
    """
    Offload model parameters to disk storage.

    Parameters are saved to disk and loaded on demand during computation.
    Slower than CPU offloading, but enables handling arbitrarily large models.

    Parameters:
    - model: Model to offload to disk
    - offload_dir: Directory to store offloaded parameters
    - execution_device: Device for computation
    - offload_buffers: Whether to offload buffer tensors

    Returns:
        Model with disk offloading hooks
    """
```

### Device Mapping and Dispatch

Functions for automatically distributing model layers across multiple devices based on memory constraints and performance considerations.

```python { .api }
def dispatch_model(
    model: torch.nn.Module,
    device_map: dict[str, torch.device | str | int] | None = None,
    main_device: torch.device | str | int | None = None,
    state_dict: dict[str, torch.Tensor] | None = None,
    strict: bool = False,
    preload_module_classes: list[str] | None = None
):
    """
    Dispatch model layers across multiple devices.

    Automatically places model components on the specified devices and sets up
    hooks for moving tensors between devices during the forward pass.

    Parameters:
    - model: Model to dispatch across devices
    - device_map: Mapping of layer names to devices
    - main_device: Primary device for model execution
    - state_dict: Optional state dict to load
    - strict: Whether to strictly enforce the device mapping
    - preload_module_classes: Module classes to preload

    Returns:
        Model with device dispatch hooks configured
    """

def infer_auto_device_map(
    model: torch.nn.Module,
    max_memory: dict[int | str, int | str] | None = None,
    no_split_module_classes: list[str] | None = None,
    dtype: torch.dtype | str | None = None,
    special_dtypes: dict[str, torch.dtype | str] | None = None,
    verbose: bool = False
):
    """
    Automatically infer an optimal device map for a model.

    Analyzes the model architecture and memory constraints to determine
    the best placement of layers across the available devices.

    Parameters:
    - model: Model to analyze
    - max_memory: Maximum memory per device (dict of device_id: memory)
    - no_split_module_classes: Module types that shouldn't be split
    - dtype: Data type for memory calculation
    - special_dtypes: Special data types for specific parameters
    - verbose: Whether to print mapping details

    Returns:
        Dict mapping layer names to optimal devices
    """
```
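
A device map can also be written by hand when you already know where each top-level submodule should live. The sketch below (the module names `block1`/`block2` are placeholders for whatever `model.named_children()` reports) pins one block to GPU 0 and keeps the other block's weights in CPU memory, from where they are streamed to the execution device when that block runs.

```python
import torch
import torch.nn as nn
from accelerate import dispatch_model

model = nn.Sequential()
model.add_module("block1", nn.Linear(1024, 1024))
model.add_module("block2", nn.Linear(1024, 1024))

# Keys are submodule names (as reported by model.named_children());
# values are devices: a GPU index, "cpu", or "disk".
device_map = {"block1": 0, "block2": "cpu"}

model = dispatch_model(model, device_map=device_map)

# Inputs are moved between devices automatically by the dispatch hooks.
out = model(torch.randn(4, 1024))
```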

### Efficient Initialization

Functions for memory-efficient model initialization, particularly useful for large models.

```python { .api }
def init_empty_weights(include_buffers: bool = None):
    """
    Context manager for initializing models with empty tensors on the meta device.

    Creates the model structure without allocating memory for parameters,
    enabling initialization of models larger than the available memory.

    Parameters:
    - include_buffers: Whether to initialize buffers as empty too

    Returns:
        Context manager for empty weight initialization
    """

def init_on_device(
    device: torch.device | str,
    include_buffers: bool = False
):
    """
    Context manager to initialize a model directly on the specified device.

    Avoids creating tensors on CPU first, reducing memory usage and
    improving initialization speed for large models.

    Parameters:
    - device: Target device for initialization
    - include_buffers: Whether to initialize buffers on the device

    Returns:
        Context manager for device-specific initialization
    """
```
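
As a quick illustration of `init_on_device`, here is a minimal sketch (assuming a CUDA device is available): parameters are created directly on the GPU instead of being allocated on CPU and copied over afterwards.

```python
import torch
import torch.nn as nn
from accelerate import init_on_device

# Parameters are allocated directly on cuda:0, skipping the usual
# CPU allocation followed by .to(device).
with init_on_device(torch.device("cuda:0")):
    model = nn.Sequential(nn.Linear(4096, 4096), nn.ReLU(), nn.Linear(4096, 4096))

print(next(model.parameters()).device)  # cuda:0
```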

### Checkpoint Loading and Management

Functions for loading and managing model checkpoints with device mapping support.

```python { .api }
def load_checkpoint_and_dispatch(
    model: torch.nn.Module,
    checkpoint: str | os.PathLike,
    device_map: dict[str, torch.device | str | int] | None = None,
    max_memory: dict[int | str, int | str] | None = None,
    no_split_module_classes: list[str] | None = None,
    dtype: torch.dtype | str | None = None,
    offload_folder: str | os.PathLike | None = None,
    offload_state_dict: bool = False,
    strict: bool = False
):
    """
    Load a checkpoint and dispatch the model across devices.

    Combines checkpoint loading with automatic device mapping and
    offloading for large models that exceed device memory.

    Parameters:
    - model: Model to load the checkpoint into
    - checkpoint: Path to the checkpoint file
    - device_map: Manual device mapping (optional)
    - max_memory: Memory constraints per device
    - no_split_module_classes: Modules that shouldn't be split
    - dtype: Data type for parameters
    - offload_folder: Directory for offloaded parameters
    - offload_state_dict: Whether to offload the full state dict
    - strict: Strict checkpoint loading

    Returns:
        Model with the checkpoint loaded and device mapping applied
    """

def load_checkpoint_in_model(
    model: torch.nn.Module,
    checkpoint: str | os.PathLike,
    device_map: dict[str, torch.device | str | int] | None = None,
    offload_folder: str | os.PathLike | None = None,
    dtype: torch.dtype | str | None = None,
    offload_state_dict: bool = False,
    offload_buffers: bool = False,
    keep_in_fp32_modules: list[str] | None = None,
    strict: bool = False
):
    """
    Load a checkpoint into a model with advanced offloading options.

    Provides fine-grained control over checkpoint loading with support
    for mixed precision, selective offloading, and memory optimization.

    Parameters:
    - model: Target model
    - checkpoint: Checkpoint path
    - device_map: Device placement mapping
    - offload_folder: Offloading directory
    - dtype: Target data type
    - offload_state_dict: Offload the entire state dict
    - offload_buffers: Offload buffer tensors
    - keep_in_fp32_modules: Modules to keep in FP32
    - strict: Strict loading mode

    Returns:
        List of missing and unexpected keys from the checkpoint
    """
```
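
When you only need to populate weights into an existing model structure, without re-running dispatch, `load_checkpoint_in_model` can be used on its own. A minimal sketch follows; the small `nn.Sequential` stands in for a real model and `"pytorch_model.bin"` is a placeholder path to a saved state dict or a folder of sharded weights.

```python
import torch
import torch.nn as nn
from accelerate import load_checkpoint_in_model

# Stand-in model; in practice this is usually built under init_empty_weights()
# so that no real memory is allocated up front.
model = nn.Sequential(nn.Linear(256, 256), nn.ReLU(), nn.Linear(256, 10))

# Load weights directly onto the devices given in device_map.
load_checkpoint_in_model(
    model,
    "pytorch_model.bin",       # placeholder checkpoint path
    device_map={"": "cpu"},    # put the whole model on CPU in this sketch
    dtype=torch.float16,
)
```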

## Usage Examples

### Basic CPU Offloading

```python
from accelerate import cpu_offload
import torch
import torch.nn as nn

# Create a large model
model = nn.Sequential(
    nn.Linear(10000, 10000),
    nn.ReLU(),
    nn.Linear(10000, 1000)
)

# Enable CPU offloading: model parameters live on CPU when not in use
model = cpu_offload(model, execution_device="cuda:0")

# Parameters are moved to the GPU automatically during the forward pass
with torch.no_grad():
    output = model(torch.randn(32, 10000, device="cuda:0"))
```

### Automatic Device Mapping

```python
from accelerate import infer_auto_device_map, dispatch_model
import torch

# `model` is assumed to be an already-instantiated nn.Module
# (for very large models, typically built under init_empty_weights()).

# Define memory constraints (in bytes or as a human-readable string)
max_memory = {
    0: "10GB",      # GPU 0 has 10GB available
    1: "10GB",      # GPU 1 has 10GB available
    "cpu": "50GB"   # 50GB of CPU memory available
}

# Automatically determine an optimal device placement
device_map = infer_auto_device_map(
    model,
    max_memory=max_memory,
    no_split_module_classes=["LlamaDecoderLayer", "GPT2Block"]
)

# Apply the device mapping
model = dispatch_model(model, device_map=device_map)
```

### Memory-Efficient Initialization

```python
from accelerate import init_empty_weights, load_checkpoint_and_dispatch

# Initialize the model structure without allocating memory
# (MyLargeModel / config are placeholders for your own model class and config)
with init_empty_weights():
    model = MyLargeModel(config)

# Load the checkpoint with automatic device mapping
model = load_checkpoint_and_dispatch(
    model,
    checkpoint="path/to/checkpoint.bin",
    device_map="auto",
    max_memory={0: "15GB", 1: "15GB", "cpu": "50GB"},
    offload_folder="./offload_weights"
)
```

### Disk Offloading for Extremely Large Models

```python
from accelerate import disk_offload
import tempfile
import torch
import torch.nn as nn

# Stand-in model and input; in practice the model is too large for RAM
model = nn.Sequential(nn.Linear(4096, 4096), nn.ReLU(), nn.Linear(4096, 4096))
input_tensor = torch.randn(8, 4096)

# Create a temporary directory for the offloaded weights
with tempfile.TemporaryDirectory() as temp_dir:
    # Offload the model to disk - enables models larger than total RAM
    model = disk_offload(
        model,
        offload_dir=temp_dir,
        execution_device="cuda:0"
    )

    # Model parameters are loaded from disk on demand
    output = model(input_tensor)
```

### Loading Large Model Checkpoints

```python
import torch
from transformers import AutoConfig, AutoModel
from accelerate import init_empty_weights, load_checkpoint_and_dispatch

# Build the model structure without allocating memory for the weights
config = AutoConfig.from_pretrained("microsoft/DialoGPT-large")
with init_empty_weights(include_buffers=True):
    model = AutoModel.from_config(config, torch_dtype=torch.float16)

# Load and dispatch with automatic device mapping
model = load_checkpoint_and_dispatch(
    model,
    "path/to/sharded/checkpoint",
    device_map="auto",
    max_memory={0: "12GB", "cpu": "30GB"},
    dtype=torch.float16,
    offload_folder="./offload"
)
```