# CLI Commands

Command-line tools for configuration, launching distributed training, memory estimation, and environment management. Together they provide a single interface for setting up and managing distributed training workflows.

## Capabilities

### Configuration Management

Interactive configuration setup and management commands.

```bash { .api }
accelerate config
```

**Description**: Interactive configuration wizard that guides you through setting up a distributed training configuration.

**Options**:
- `--config_file PATH` - Specify a custom config file location (default: `~/.cache/huggingface/accelerate/default_config.yaml`)

**Usage**:
```bash
# Run interactive configuration
accelerate config

# Use custom config file location
accelerate config --config_file ./my_config.yaml
```

**Interactive Options**:
- Compute environment (local machine, SageMaker, etc.)
- Distributed training type (no distributed training, multi-GPU, multi-node, etc.)
- Number of processes/GPUs to use
- Mixed precision training mode (no, fp16, bf16, fp8)
- DeepSpeed or FSDP configuration
- Machine rank and addresses for multi-node training

### Training Launch

Launch distributed training scripts with automatic environment setup.

```bash { .api }
accelerate launch [OPTIONS] SCRIPT [SCRIPT_ARGS...]
```

**Description**: Launch a training script with the distributed environment set up automatically from the saved configuration; individual settings can be overridden with the flags below.

**Common Options**:
- `--config_file PATH` - Use a specific config file
- `--cpu` - Force CPU training even if a GPU is available
- `--multi_gpu` - Use multi-GPU training
- `--mixed_precision {no,fp16,bf16,fp8}` - Mixed precision mode
- `--num_processes NUM` - Number of processes to use
- `--num_machines NUM` - Number of machines for multi-node training
- `--machine_rank RANK` - Rank of the current machine
- `--main_process_ip IP` - IP address of the main process
- `--main_process_port PORT` - Port for main process communication
- `--deepspeed_config_file PATH` - DeepSpeed configuration file
- `--fsdp_config_file PATH` - FSDP configuration file
- `--dynamo_backend BACKEND` - TorchDynamo backend for compilation

**Usage Examples**:
```bash
# Basic single-GPU training
accelerate launch train.py --batch_size 32

# Multi-GPU training with mixed precision
accelerate launch --mixed_precision fp16 --num_processes 4 train.py

# Multi-node training
accelerate launch \
    --num_machines 2 \
    --num_processes 8 \
    --machine_rank 0 \
    --main_process_ip 192.168.1.100 \
    --main_process_port 29500 \
    train.py

# DeepSpeed training
accelerate launch \
    --deepspeed_config_file ds_config.json \
    --num_processes 4 \
    train.py

# With Torch compilation
accelerate launch \
    --dynamo_backend inductor \
    --mixed_precision bf16 \
    train.py
```

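The commands above assume a training script written against the `Accelerator` API. The sketch below is a minimal, hypothetical `train.py` (the model, data, and argument names are placeholders chosen to match the examples above), not a prescribed template:

```python
# train.py - minimal sketch of a script that `accelerate launch` can run.
# The model and dataset are placeholders; only the Accelerator calls are
# the parts Accelerate actually requires.
import argparse

import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--batch_size", type=int, default=32)
    parser.add_argument("--learning_rate", type=float, default=1e-4)
    args = parser.parse_args()

    accelerator = Accelerator()  # reads the config / launch environment automatically

    model = torch.nn.Linear(128, 2)
    optimizer = torch.optim.AdamW(model.parameters(), lr=args.learning_rate)
    dataset = TensorDataset(torch.randn(1024, 128), torch.randint(0, 2, (1024,)))
    dataloader = DataLoader(dataset, batch_size=args.batch_size, shuffle=True)

    # prepare() moves everything to the right device(s) and wraps them for
    # DDP / DeepSpeed / FSDP depending on the active configuration.
    model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

    model.train()
    for inputs, labels in dataloader:
        optimizer.zero_grad()
        loss = torch.nn.functional.cross_entropy(model(inputs), labels)
        accelerator.backward(loss)  # replaces loss.backward()
        optimizer.step()

    accelerator.print("done")  # prints only on the main process


if __name__ == "__main__":
    main()
```

The same script runs unmodified in every mode shown above: `accelerate launch` sets up the process group and environment variables, and `Accelerator` reads them at runtime.
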
### Environment Information

Display environment and configuration information for debugging.

```bash { .api }
accelerate env
```

**Description**: Show detailed information about the current Accelerate installation, hardware, and configuration.

**Output includes**:
- Accelerate version and installation details
- PyTorch version and CUDA availability
- Hardware information (GPUs, memory, etc.)
- Current configuration settings
- Available optional dependencies

**Usage**:
```bash
# Show environment information
accelerate env
```

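When the CLI is not available (for example inside a managed notebook), a rough and much less complete equivalent of this report can be pieced together in Python; this is only a sketch of the same idea, not the command's implementation:

```python
# Minimal sketch: report versions and visible hardware, similar in spirit
# to (but far less complete than) `accelerate env`.
import platform

import torch
import accelerate

print(f"Accelerate version : {accelerate.__version__}")
print(f"PyTorch version    : {torch.__version__}")
print(f"Python version     : {platform.python_version()}")
print(f"CUDA available     : {torch.cuda.is_available()}")
if torch.cuda.is_available():
    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        print(f"GPU {i}: {props.name}, {props.total_memory / 2**30:.1f} GiB")
```
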
### Memory Estimation

Estimate memory requirements for model training and inference.

```bash { .api }
accelerate estimate-memory [OPTIONS] MODEL_NAME
```

**Description**: Estimate GPU memory requirements for training or inference with specific models.

**Options**:
- `--library_name {transformers,timm,diffusers}` - Model library (default: transformers)
- `--dtypes DTYPES` - Data types to test (comma-separated: float32,float16,bfloat16,int8,int4)
- `--num_gpus NUM` - Number of GPUs available
- `--trust_remote_code` - Trust remote code in model loading
- `--access_token TOKEN` - Hugging Face access token

**Usage Examples**:
```bash
# Estimate memory for a model
accelerate estimate-memory microsoft/DialoGPT-medium

# Test multiple data types
accelerate estimate-memory \
    --dtypes float32,float16,bfloat16 \
    --num_gpus 2 \
    microsoft/DialoGPT-large

# With custom library
accelerate estimate-memory \
    --library_name timm \
    --dtypes float16,bfloat16 \
    resnet50
```

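The idea behind the estimate can be reproduced in Python: instantiate the model with empty (meta) weights so no real memory is allocated, then convert the parameter count into bytes per dtype. A minimal sketch assuming a `transformers` model; the CLI produces a more detailed breakdown:

```python
# Minimal sketch of the idea behind `accelerate estimate-memory`:
# build the model with empty (meta) weights, then turn the parameter count
# into bytes for a few dtypes. Real memory use during training is higher
# (gradients, optimizer state, activations).
from accelerate import init_empty_weights
from transformers import AutoConfig, AutoModel

BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2, "int8": 1}

config = AutoConfig.from_pretrained("microsoft/DialoGPT-medium")
with init_empty_weights():
    model = AutoModel.from_config(config)  # no real tensors are allocated

num_params = sum(p.numel() for p in model.parameters())
print(f"parameters: {num_params / 1e6:.1f}M")
for dtype, nbytes in BYTES_PER_PARAM.items():
    print(f"{dtype:>8}: {num_params * nbytes / 2**30:.2f} GiB for weights alone")
```
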
### Training Setup Testing

Test distributed training setup and communication.

```bash { .api }
accelerate test [OPTIONS]
```

**Description**: Test distributed training setup by running a simple training loop to verify configuration.

**Options**:
- `--config_file PATH` - Use specific config file
- `--num_processes NUM` - Override number of processes

**Usage**:
```bash
# Test current configuration
accelerate test

# Test with specific number of processes
accelerate test --num_processes 4
```

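A hand-rolled check in the same spirit can help when debugging a custom setup. The sketch below (a hypothetical `check.py`, not the script `accelerate test` actually runs) verifies that all processes can communicate:

```python
# check.py - minimal sketch of a distributed sanity check.
# Run with: accelerate launch check.py
import torch
from accelerate import Accelerator

accelerator = Accelerator()

# Each process contributes a tensor holding its own rank; gather() should
# return ranks 0..num_processes-1 on every process if communication works.
rank_tensor = torch.tensor([accelerator.process_index], device=accelerator.device)
gathered = accelerator.gather(rank_tensor)

assert gathered.numel() == accelerator.num_processes
accelerator.print(
    f"{accelerator.num_processes} process(es) reachable, ranks: {gathered.tolist()}"
)
```
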
### Model Weight Merging

Merge sharded model checkpoints into a single file.

```bash { .api }
accelerate merge-weights [OPTIONS] INPUT_DIR OUTPUT_DIR
```

**Description**: Merge model weights that have been sharded across multiple files back into a single checkpoint file.

**Options**:
- `--model_name_or_path PATH` - Model name or path for configuration
- `--torch_dtype {float16,bfloat16,float32}` - Target data type
- `--safe_serialization` - Use safetensors format for output

**Usage**:
```bash
# Merge sharded weights
accelerate merge-weights ./sharded_model ./merged_model

# With specific dtype and safe serialization
accelerate merge-weights \
    --torch_dtype float16 \
    --safe_serialization \
    ./sharded_model ./merged_model
```

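For checkpoints that ship a `*.index.json` shard map, the merge can also be done by hand, which is useful for understanding what the command does conceptually. This is a generic sketch of the technique, not the CLI's internal implementation, and it assumes all shards fit in CPU memory:

```python
# Minimal sketch: merge a sharded checkpoint described by an index file
# (e.g. pytorch_model.bin.index.json) into one safetensors file.
import json
from pathlib import Path

import torch
from safetensors.torch import save_file

model_dir = Path("./sharded_model")   # hypothetical input directory
index = json.loads((model_dir / "pytorch_model.bin.index.json").read_text())

# weight_map maps each parameter name to the shard file that stores it.
state_dict = {}
for shard_name in sorted(set(index["weight_map"].values())):
    shard = torch.load(model_dir / shard_name, map_location="cpu")
    state_dict.update(shard)

out_dir = Path("./merged_model")      # hypothetical output directory
out_dir.mkdir(exist_ok=True)
save_file(state_dict, str(out_dir / "model.safetensors"))
print(f"merged {len(state_dict)} tensors")
```
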
### TPU Utilities

TPU-specific utilities and commands.

```bash { .api }
accelerate tpu-config
```

**Description**: Configure TPU-specific settings for training.

**Usage**:
```bash
# Configure TPU settings
accelerate tpu-config
```

### Configuration Migration

Migrate configuration to newer formats.

```bash { .api }
accelerate to-fsdp2 [OPTIONS]
```

**Description**: Convert existing FSDP configuration to FSDP2 format.

**Options**:
- `--config_file PATH` - Input config file
- `--output_file PATH` - Output config file

**Usage**:
```bash
# Convert FSDP config to FSDP2
accelerate to-fsdp2 --config_file old_config.yaml --output_file new_config.yaml
```

## Configuration File Format

The configuration file uses YAML format and contains distributed training settings:

```yaml
# Example configuration file
compute_environment: LOCAL_MACHINE
distributed_type: MULTI_GPU
downcast_bf16: 'no'
gpu_ids: all
machine_rank: 0
main_training_function: main
mixed_precision: fp16
num_machines: 1
num_processes: 4
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false
```

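Because it is plain YAML, the file can also be inspected or edited programmatically; a minimal sketch using PyYAML and the default location mentioned earlier:

```python
# Minimal sketch: inspect the Accelerate config file as plain YAML.
from pathlib import Path

import yaml  # PyYAML

config_path = Path.home() / ".cache/huggingface/accelerate/default_config.yaml"
config = yaml.safe_load(config_path.read_text())

print(f"distributed_type : {config.get('distributed_type')}")
print(f"num_processes    : {config.get('num_processes')}")
print(f"mixed_precision  : {config.get('mixed_precision')}")
```
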
## Usage Examples

### Complete Setup Workflow

```bash
# 1. Configure Accelerate
accelerate config
# Follow interactive prompts to set up configuration

# 2. Test the setup
accelerate test

# 3. Check environment
accelerate env

# 4. Estimate memory for your model
accelerate estimate-memory microsoft/DialoGPT-medium

# 5. Launch training
accelerate launch train.py --learning_rate 1e-4 --batch_size 16
```

### Multi-Node Training Setup

```bash
# On main node (machine rank 0)
accelerate launch \
    --num_machines 2 \
    --num_processes 8 \
    --machine_rank 0 \
    --main_process_ip 192.168.1.100 \
    --main_process_port 29500 \
    --mixed_precision fp16 \
    train.py

# On worker node (machine rank 1)
accelerate launch \
    --num_machines 2 \
    --num_processes 8 \
    --machine_rank 1 \
    --main_process_ip 192.168.1.100 \
    --main_process_port 29500 \
    --mixed_precision fp16 \
    train.py
```

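Inside the script, node- and process-level information is exposed by the `Accelerator` object, which multi-node runs typically need in order to restrict logging or checkpoint writing to a single process. A short sketch:

```python
# Minimal sketch: rank-aware logic inside a script started on every node
# with the `accelerate launch` commands shown above.
from accelerate import Accelerator

accelerator = Accelerator()

print(
    f"global rank {accelerator.process_index}/{accelerator.num_processes}, "
    f"local rank {accelerator.local_process_index}, "
    f"device {accelerator.device}"
)

if accelerator.is_main_process:
    # Runs exactly once across the whole cluster (machine rank 0, local rank 0),
    # e.g. for writing checkpoints or logging metrics.
    print("this is the main process")

accelerator.wait_for_everyone()  # barrier across all processes on all nodes
```
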
### DeepSpeed Integration

```bash
# Create DeepSpeed config file (ds_config.json)
cat > ds_config.json << EOF
{
    "train_batch_size": 16,
    "gradient_accumulation_steps": 4,
    "optimizer": {
        "type": "Adam",
        "params": {
            "lr": 1e-4
        }
    },
    "zero_optimization": {
        "stage": 2,
        "offload_optimizer": {
            "device": "cpu"
        }
    },
    "fp16": {
        "enabled": true
    }
}
EOF

# Launch with DeepSpeed
accelerate launch \
    --deepspeed_config_file ds_config.json \
    --num_processes 4 \
    train.py
```

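Similar DeepSpeed settings can also be supplied from Python through `DeepSpeedPlugin` instead of a JSON file. The sketch below assumes the plugin arguments shown (`zero_stage`, `gradient_accumulation_steps`, `offload_optimizer_device`) match your installed Accelerate version:

```python
# Minimal sketch: configure DeepSpeed from Python instead of ds_config.json.
# Argument names reflect common Accelerate versions; check your version's docs.
from accelerate import Accelerator, DeepSpeedPlugin

deepspeed_plugin = DeepSpeedPlugin(
    zero_stage=2,                    # ZeRO stage 2, as in ds_config.json above
    gradient_accumulation_steps=4,
    offload_optimizer_device="cpu",  # optimizer state offloaded to CPU
)

accelerator = Accelerator(mixed_precision="fp16", deepspeed_plugin=deepspeed_plugin)
# model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)
# The rest of the training loop is unchanged.
```
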
### Memory Optimization Workflow

```bash
# 1. Estimate memory requirements
accelerate estimate-memory \
    --dtypes float32,float16,bfloat16,int8 \
    --num_gpus 2 \
    microsoft/DialoGPT-large

# 2. Based on results, configure with appropriate settings
accelerate config
# Select mixed precision based on memory estimates

# 3. Test configuration
accelerate test

# 4. Launch optimized training
accelerate launch \
    --mixed_precision bf16 \
    --gradient_accumulation_steps 8 \
    train.py --batch_size 4
```

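When memory is tight, a smaller per-device batch is usually paired with gradient accumulation inside the script, which `Accelerator` supports directly. A minimal sketch with placeholder model and data:

```python
# Minimal sketch: gradient accumulation with Accelerate, useful when the
# per-device batch size must stay small to fit in memory.
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

accelerator = Accelerator(gradient_accumulation_steps=8)

model = torch.nn.Linear(64, 2)                # placeholder model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
data = TensorDataset(torch.randn(256, 64), torch.randint(0, 2, (256,)))
dataloader = DataLoader(data, batch_size=4)   # small per-device batch

model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

for inputs, labels in dataloader:
    # Inside accumulate(), gradient synchronization and the optimizer step
    # take effect only every 8th batch; the others just accumulate gradients.
    with accelerator.accumulate(model):
        loss = torch.nn.functional.cross_entropy(model(inputs), labels)
        accelerator.backward(loss)
        optimizer.step()
        optimizer.zero_grad()
```
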
### Development and Debugging

```bash
# Debug distributed setup
accelerate test --num_processes 2

# Check detailed environment info
accelerate env

# Launch with verbose output for debugging
accelerate launch --debug train.py

# Test with different configurations
accelerate launch --cpu train.py                 # Force CPU
accelerate launch --mixed_precision no train.py  # Disable mixed precision
```

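On the script side, a few `Accelerator` helpers keep multi-process debugging output manageable; a short sketch of the ones most commonly used (the seed value is arbitrary):

```python
# Minimal sketch: common debugging helpers inside an Accelerate script.
from accelerate import Accelerator
from accelerate.utils import set_seed

accelerator = Accelerator()
set_seed(42)  # same seed on every process for reproducible debugging runs

# print() from every process interleaves; accelerator.print() only prints
# from the main process.
accelerator.print(f"running on {accelerator.num_processes} process(es)")

if accelerator.is_local_main_process:
    # One process per machine, e.g. for progress bars or local log files.
    print(f"local main process on device {accelerator.device}")

accelerator.wait_for_everyone()  # make sure no process races ahead
```
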
### Checkpoint Management

```bash
# After training with model sharding
ls ./my_model/
# pytorch_model-00001-of-00004.bin
# pytorch_model-00002-of-00004.bin
# pytorch_model-00003-of-00004.bin
# pytorch_model-00004-of-00004.bin
# pytorch_model.bin.index.json

# Merge sharded weights
accelerate merge-weights \
    --safe_serialization \
    ./my_model ./my_model_merged

ls ./my_model_merged/
# model.safetensors (single merged file)
```

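A quick way to confirm that the merge produced a usable file is to load it back and count the tensors; a minimal sketch using the `safetensors` library, with the directory name taken from the example above:

```python
# Minimal sketch: verify the merged checkpoint by loading it back.
from safetensors.torch import load_file

state_dict = load_file("./my_model_merged/model.safetensors")
print(f"{len(state_dict)} tensors")
total_params = sum(t.numel() for t in state_dict.values())
print(f"{total_params / 1e6:.1f}M parameters")
```
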