# CLI Commands

Command-line tools for configuration, launching distributed training, memory estimation, and environment management. Together they provide a single interface for setting up and managing distributed training workflows.

## Capabilities

### Configuration Management

Interactive configuration setup and management commands.

```bash { .api }
accelerate config
```

**Description**: Interactive configuration wizard that guides you through setting up a distributed training configuration.

**Options**:
- `--config_file PATH` - Specify a custom config file location (default: `~/.cache/huggingface/accelerate/default_config.yaml`)

**Usage**:
```bash
# Run interactive configuration
accelerate config

# Use custom config file location
accelerate config --config_file ./my_config.yaml
```

**Interactive Options**:
- Compute environment (local machine, SageMaker, etc.)
- Distributed training type (no distributed training, multi-GPU, multi-node, etc.)
- Number of processes/GPUs to use
- Mixed precision training mode (no, fp16, bf16, fp8)
- DeepSpeed or FSDP configuration
- Machine rank and addresses for multi-node training

### Training Launch

Launch distributed training scripts with automatic environment setup.

```bash { .api }
accelerate launch [OPTIONS] SCRIPT [SCRIPT_ARGS...]
```

**Description**: Launch a training script with the distributed environment set up automatically from the saved configuration; individual settings can be overridden with the flags below.

**Common Options**:
- `--config_file PATH` - Use a specific config file
- `--cpu` - Force CPU training even if a GPU is available
- `--multi_gpu` - Use multi-GPU training
- `--mixed_precision {no,fp16,bf16,fp8}` - Mixed precision mode
- `--num_processes NUM` - Number of processes to use
- `--num_machines NUM` - Number of machines for multi-node training
- `--machine_rank RANK` - Rank of the current machine
- `--main_process_ip IP` - IP address of the main process
- `--main_process_port PORT` - Port for main process communication
- `--deepspeed_config_file PATH` - DeepSpeed configuration file
- `--fsdp_config_file PATH` - FSDP configuration file
- `--dynamo_backend BACKEND` - TorchDynamo backend for compilation

**Usage Examples**:
```bash
# Basic single-GPU training
accelerate launch train.py --batch_size 32

# Multi-GPU training with mixed precision
accelerate launch --mixed_precision fp16 --num_processes 4 train.py

# Multi-node training
accelerate launch \
    --num_machines 2 \
    --num_processes 8 \
    --machine_rank 0 \
    --main_process_ip 192.168.1.100 \
    --main_process_port 29500 \
    train.py

# DeepSpeed training
accelerate launch \
    --deepspeed_config_file ds_config.json \
    --num_processes 4 \
    train.py

# With Torch compilation
accelerate launch \
    --dynamo_backend inductor \
    --mixed_precision bf16 \
    train.py
```

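The commands above assume a training script written against the `Accelerator` API. The sketch below is a minimal, hypothetical `train.py` (the model, data, and argument names are placeholders chosen to match the examples above), not a prescribed template:

```python
# train.py - minimal sketch of a script that `accelerate launch` can run.
# The model and dataset are placeholders; only the Accelerator calls are
# the parts Accelerate actually requires.
import argparse

import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--batch_size", type=int, default=32)
    parser.add_argument("--learning_rate", type=float, default=1e-4)
    args = parser.parse_args()

    accelerator = Accelerator()  # reads the config / launch environment automatically

    model = torch.nn.Linear(128, 2)
    optimizer = torch.optim.AdamW(model.parameters(), lr=args.learning_rate)
    dataset = TensorDataset(torch.randn(1024, 128), torch.randint(0, 2, (1024,)))
    dataloader = DataLoader(dataset, batch_size=args.batch_size, shuffle=True)

    # prepare() moves everything to the right device(s) and wraps them for
    # DDP / DeepSpeed / FSDP depending on the active configuration.
    model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

    model.train()
    for inputs, labels in dataloader:
        optimizer.zero_grad()
        loss = torch.nn.functional.cross_entropy(model(inputs), labels)
        accelerator.backward(loss)  # replaces loss.backward()
        optimizer.step()

    accelerator.print("done")  # prints only on the main process


if __name__ == "__main__":
    main()
```

The same script runs unmodified in every mode shown above: `accelerate launch` sets up the process group and environment variables, and `Accelerator` reads them at runtime.
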
### Environment Information

Display environment and configuration information for debugging.

```bash { .api }
accelerate env
```

**Description**: Show detailed information about the current Accelerate installation, hardware, and configuration.

**Output includes**:
- Accelerate version and installation details
- PyTorch version and CUDA availability
- Hardware information (GPUs, memory, etc.)
- Current configuration settings
- Available optional dependencies

**Usage**:
```bash
# Show environment information
accelerate env
```

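When the CLI is not available (for example inside a managed notebook), a rough and much less complete equivalent of this report can be pieced together in Python; this is only a sketch of the same idea, not the command's implementation:

```python
# Minimal sketch: report versions and visible hardware, similar in spirit
# to (but far less complete than) `accelerate env`.
import platform

import torch
import accelerate

print(f"Accelerate version : {accelerate.__version__}")
print(f"PyTorch version    : {torch.__version__}")
print(f"Python version     : {platform.python_version()}")
print(f"CUDA available     : {torch.cuda.is_available()}")
if torch.cuda.is_available():
    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        print(f"GPU {i}: {props.name}, {props.total_memory / 2**30:.1f} GiB")
```
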
### Memory Estimation

Estimate memory requirements for model training and inference.

```bash { .api }
accelerate estimate-memory [OPTIONS] MODEL_NAME
```

**Description**: Estimate GPU memory requirements for training or inference with specific models.

**Options**:
- `--library_name {transformers,timm,diffusers}` - Model library (default: transformers)
- `--dtypes DTYPES` - Data types to test (comma-separated: float32,float16,bfloat16,int8,int4)
- `--num_gpus NUM` - Number of GPUs available
- `--trust_remote_code` - Trust remote code in model loading
- `--access_token TOKEN` - Hugging Face access token

**Usage Examples**:
```bash
# Estimate memory for a model
accelerate estimate-memory microsoft/DialoGPT-medium

# Test multiple data types
accelerate estimate-memory \
    --dtypes float32,float16,bfloat16 \
    --num_gpus 2 \
    microsoft/DialoGPT-large

# With custom library
accelerate estimate-memory \
    --library_name timm \
    --dtypes float16,bfloat16 \
    resnet50
```

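The idea behind the estimate can be reproduced in Python: instantiate the model with empty (meta) weights so no real memory is allocated, then convert the parameter count into bytes per dtype. A minimal sketch assuming a `transformers` model; the CLI produces a more detailed breakdown:

```python
# Minimal sketch of the idea behind `accelerate estimate-memory`:
# build the model with empty (meta) weights, then turn the parameter count
# into bytes for a few dtypes. Real memory use during training is higher
# (gradients, optimizer state, activations).
from accelerate import init_empty_weights
from transformers import AutoConfig, AutoModel

BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2, "int8": 1}

config = AutoConfig.from_pretrained("microsoft/DialoGPT-medium")
with init_empty_weights():
    model = AutoModel.from_config(config)  # no real tensors are allocated

num_params = sum(p.numel() for p in model.parameters())
print(f"parameters: {num_params / 1e6:.1f}M")
for dtype, nbytes in BYTES_PER_PARAM.items():
    print(f"{dtype:>8}: {num_params * nbytes / 2**30:.2f} GiB for weights alone")
```
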
### Training Setup Testing

Test distributed training setup and communication.

```bash { .api }
accelerate test [OPTIONS]
```

**Description**: Test distributed training setup by running a simple training loop to verify configuration.

**Options**:
- `--config_file PATH` - Use specific config file
- `--num_processes NUM` - Override number of processes

**Usage**:
```bash
# Test current configuration
accelerate test

# Test with specific number of processes
accelerate test --num_processes 4
```

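A hand-rolled check in the same spirit can help when debugging a custom setup. The sketch below (a hypothetical `check.py`, not the script `accelerate test` actually runs) verifies that all processes can communicate:

```python
# check.py - minimal sketch of a distributed sanity check.
# Run with: accelerate launch check.py
import torch
from accelerate import Accelerator

accelerator = Accelerator()

# Each process contributes a tensor holding its own rank; gather() should
# return ranks 0..num_processes-1 on every process if communication works.
rank_tensor = torch.tensor([accelerator.process_index], device=accelerator.device)
gathered = accelerator.gather(rank_tensor)

assert gathered.numel() == accelerator.num_processes
accelerator.print(
    f"{accelerator.num_processes} process(es) reachable, ranks: {gathered.tolist()}"
)
```
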
### Model Weight Merging

Merge sharded model checkpoints into a single file.

```bash { .api }
accelerate merge-weights [OPTIONS] INPUT_DIR OUTPUT_DIR
```

**Description**: Merge model weights that have been sharded across multiple files back into a single checkpoint file.

**Options**:
- `--model_name_or_path PATH` - Model name or path for configuration
- `--torch_dtype {float16,bfloat16,float32}` - Target data type
- `--safe_serialization` - Use safetensors format for output

**Usage**:
```bash
# Merge sharded weights
accelerate merge-weights ./sharded_model ./merged_model

# With specific dtype and safe serialization
accelerate merge-weights \
    --torch_dtype float16 \
    --safe_serialization \
    ./sharded_model ./merged_model
```

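For checkpoints that ship a `*.index.json` shard map, the merge can also be done by hand, which is useful for understanding what the command does conceptually. This is a generic sketch of the technique, not the CLI's internal implementation, and it assumes all shards fit in CPU memory:

```python
# Minimal sketch: merge a sharded checkpoint described by an index file
# (e.g. pytorch_model.bin.index.json) into one safetensors file.
import json
from pathlib import Path

import torch
from safetensors.torch import save_file

model_dir = Path("./sharded_model")   # hypothetical input directory
index = json.loads((model_dir / "pytorch_model.bin.index.json").read_text())

# weight_map maps each parameter name to the shard file that stores it.
state_dict = {}
for shard_name in sorted(set(index["weight_map"].values())):
    shard = torch.load(model_dir / shard_name, map_location="cpu")
    state_dict.update(shard)

out_dir = Path("./merged_model")      # hypothetical output directory
out_dir.mkdir(exist_ok=True)
save_file(state_dict, str(out_dir / "model.safetensors"))
print(f"merged {len(state_dict)} tensors")
```
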
### TPU Utilities

TPU-specific utilities and commands.

```bash { .api }
accelerate tpu-config
```

**Description**: Configure TPU-specific settings for training.

**Usage**:
```bash
# Configure TPU settings
accelerate tpu-config
```

### Configuration Migration

Migrate configuration to newer formats.

```bash { .api }
accelerate to-fsdp2 [OPTIONS]
```

**Description**: Convert existing FSDP configuration to FSDP2 format.

**Options**:
- `--config_file PATH` - Input config file
- `--output_file PATH` - Output config file

**Usage**:
```bash
# Convert FSDP config to FSDP2
accelerate to-fsdp2 --config_file old_config.yaml --output_file new_config.yaml
```

## Configuration File Format

The configuration file uses YAML format and contains distributed training settings:

```yaml
# Example configuration file
compute_environment: LOCAL_MACHINE
distributed_type: MULTI_GPU
downcast_bf16: 'no'
gpu_ids: all
machine_rank: 0
main_training_function: main
mixed_precision: fp16
num_machines: 1
num_processes: 4
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false
```

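Because it is plain YAML, the file can also be inspected or edited programmatically; a minimal sketch using PyYAML and the default location mentioned earlier:

```python
# Minimal sketch: inspect the Accelerate config file as plain YAML.
from pathlib import Path

import yaml  # PyYAML

config_path = Path.home() / ".cache/huggingface/accelerate/default_config.yaml"
config = yaml.safe_load(config_path.read_text())

print(f"distributed_type : {config.get('distributed_type')}")
print(f"num_processes    : {config.get('num_processes')}")
print(f"mixed_precision  : {config.get('mixed_precision')}")
```
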
## Usage Examples

### Complete Setup Workflow

```bash
# 1. Configure Accelerate
accelerate config
# Follow interactive prompts to set up configuration

# 2. Test the setup
accelerate test

# 3. Check environment
accelerate env

# 4. Estimate memory for your model
accelerate estimate-memory microsoft/DialoGPT-medium

# 5. Launch training
accelerate launch train.py --learning_rate 1e-4 --batch_size 16
```

### Multi-Node Training Setup

```bash
# On main node (machine rank 0)
accelerate launch \
    --num_machines 2 \
    --num_processes 8 \
    --machine_rank 0 \
    --main_process_ip 192.168.1.100 \
    --main_process_port 29500 \
    --mixed_precision fp16 \
    train.py

# On worker node (machine rank 1)
accelerate launch \
    --num_machines 2 \
    --num_processes 8 \
    --machine_rank 1 \
    --main_process_ip 192.168.1.100 \
    --main_process_port 29500 \
    --mixed_precision fp16 \
    train.py
```

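Inside the script, node- and process-level information is exposed by the `Accelerator` object, which multi-node runs typically need in order to restrict logging or checkpoint writing to a single process. A short sketch:

```python
# Minimal sketch: rank-aware logic inside a script started on every node
# with the `accelerate launch` commands shown above.
from accelerate import Accelerator

accelerator = Accelerator()

print(
    f"global rank {accelerator.process_index}/{accelerator.num_processes}, "
    f"local rank {accelerator.local_process_index}, "
    f"device {accelerator.device}"
)

if accelerator.is_main_process:
    # Runs exactly once across the whole cluster (machine rank 0, local rank 0),
    # e.g. for writing checkpoints or logging metrics.
    print("this is the main process")

accelerator.wait_for_everyone()  # barrier across all processes on all nodes
```
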
### DeepSpeed Integration

```bash
# Create DeepSpeed config file (ds_config.json)
cat > ds_config.json << EOF
{
    "train_batch_size": 16,
    "gradient_accumulation_steps": 4,
    "optimizer": {
        "type": "Adam",
        "params": {
            "lr": 1e-4
        }
    },
    "zero_optimization": {
        "stage": 2,
        "offload_optimizer": {
            "device": "cpu"
        }
    },
    "fp16": {
        "enabled": true
    }
}
EOF

# Launch with DeepSpeed
accelerate launch \
    --deepspeed_config_file ds_config.json \
    --num_processes 4 \
    train.py
```

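Similar DeepSpeed settings can also be supplied from Python through `DeepSpeedPlugin` instead of a JSON file. The sketch below assumes the plugin arguments shown (`zero_stage`, `gradient_accumulation_steps`, `offload_optimizer_device`) match your installed Accelerate version:

```python
# Minimal sketch: configure DeepSpeed from Python instead of ds_config.json.
# Argument names reflect common Accelerate versions; check your version's docs.
from accelerate import Accelerator, DeepSpeedPlugin

deepspeed_plugin = DeepSpeedPlugin(
    zero_stage=2,                    # ZeRO stage 2, as in ds_config.json above
    gradient_accumulation_steps=4,
    offload_optimizer_device="cpu",  # optimizer state offloaded to CPU
)

accelerator = Accelerator(mixed_precision="fp16", deepspeed_plugin=deepspeed_plugin)
# model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)
# The rest of the training loop is unchanged.
```
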
### Memory Optimization Workflow

```bash
# 1. Estimate memory requirements
accelerate estimate-memory \
    --dtypes float32,float16,bfloat16,int8 \
    --num_gpus 2 \
    microsoft/DialoGPT-large

# 2. Based on results, configure with appropriate settings
accelerate config
# Select mixed precision based on memory estimates

# 3. Test configuration
accelerate test

# 4. Launch optimized training
accelerate launch \
    --mixed_precision bf16 \
    --gradient_accumulation_steps 8 \
    train.py --batch_size 4
```

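When memory is tight, a smaller per-device batch is usually paired with gradient accumulation inside the script, which `Accelerator` supports directly. A minimal sketch with placeholder model and data:

```python
# Minimal sketch: gradient accumulation with Accelerate, useful when the
# per-device batch size must stay small to fit in memory.
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

accelerator = Accelerator(gradient_accumulation_steps=8)

model = torch.nn.Linear(64, 2)                # placeholder model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
data = TensorDataset(torch.randn(256, 64), torch.randint(0, 2, (256,)))
dataloader = DataLoader(data, batch_size=4)   # small per-device batch

model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

for inputs, labels in dataloader:
    # Inside accumulate(), gradient synchronization and the optimizer step
    # take effect only every 8th batch; the others just accumulate gradients.
    with accelerator.accumulate(model):
        loss = torch.nn.functional.cross_entropy(model(inputs), labels)
        accelerator.backward(loss)
        optimizer.step()
        optimizer.zero_grad()
```
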
### Development and Debugging

```bash
# Debug distributed setup
accelerate test --num_processes 2

# Check detailed environment info
accelerate env

# Launch with verbose output for debugging
accelerate launch --debug train.py

# Test with different configurations
accelerate launch --cpu train.py                 # Force CPU
accelerate launch --mixed_precision no train.py  # Disable mixed precision
```

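On the script side, a few `Accelerator` helpers keep multi-process debugging output manageable; a short sketch of the ones most commonly used (the seed value is arbitrary):

```python
# Minimal sketch: common debugging helpers inside an Accelerate script.
from accelerate import Accelerator
from accelerate.utils import set_seed

accelerator = Accelerator()
set_seed(42)  # same seed on every process for reproducible debugging runs

# print() from every process interleaves; accelerator.print() only prints
# from the main process.
accelerator.print(f"running on {accelerator.num_processes} process(es)")

if accelerator.is_local_main_process:
    # One process per machine, e.g. for progress bars or local log files.
    print(f"local main process on device {accelerator.device}")

accelerator.wait_for_everyone()  # make sure no process races ahead
```
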
### Checkpoint Management

```bash
# After training with model sharding
ls ./my_model/
# pytorch_model-00001-of-00004.bin
# pytorch_model-00002-of-00004.bin
# pytorch_model-00003-of-00004.bin
# pytorch_model-00004-of-00004.bin
# pytorch_model.bin.index.json

# Merge sharded weights
accelerate merge-weights \
    --safe_serialization \
    ./my_model ./my_model_merged

ls ./my_model_merged/
# model.safetensors (single merged file)
```

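A quick way to confirm that the merge produced a usable file is to load it back and count the tensors; a minimal sketch using the `safetensors` library, with the directory name taken from the example above:

```python
# Minimal sketch: verify the merged checkpoint by loading it back.
from safetensors.torch import load_file

state_dict = load_file("./my_model_merged/model.safetensors")
print(f"{len(state_dict)} tensors")
total_params = sum(t.numel() for t in state_dict.values())
print(f"{total_params / 1e6:.1f}M parameters")
```
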