# CLI Commands

Command-line tools for configuration, launching distributed training, memory estimation, and environment management. These tools provide an easy interface for setting up and managing distributed training workflows.

## Capabilities

### Configuration Management

Interactive configuration setup and management commands.

```bash { .api }
accelerate config
```

**Description**: Interactive configuration wizard that guides users through setting up distributed training configuration.

**Options**:
- `--config_file PATH` - Specify custom config file location (default: `~/.cache/huggingface/accelerate/default_config.yaml`)

**Usage**:
```bash
# Run interactive configuration
accelerate config

# Use custom config file location
accelerate config --config_file ./my_config.yaml
```

**Interactive Options**:

- Compute environment (local machine, SageMaker, etc.)
- Distributed training type (no distributed, multi-GPU, multi-node, etc.)
- Number of processes/GPUs to use
- Mixed precision training mode (no, fp16, bf16, fp8)
- DeepSpeed or FSDP configuration
- Machine rank and addresses for multi-node training
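The wizard's answers end up in a plain YAML file, so for scripted setups the file can also be written by hand. A minimal sketch using a subset of the keys the wizard emits (the path and values here are illustrative, not the default location):

```shell
# Sketch: hand-write a minimal config instead of running the wizard.
# Keys mirror the YAML format accelerate uses; values are illustrative.
cat > ./accelerate_config.yaml << 'EOF'
compute_environment: LOCAL_MACHINE
distributed_type: MULTI_GPU
mixed_precision: fp16
num_machines: 1
num_processes: 4
EOF

# Point later commands at it explicitly:
# accelerate launch --config_file ./accelerate_config.yaml train.py
```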

### Training Launch

Launch distributed training scripts with automatic environment setup.

```bash { .api }
accelerate launch [OPTIONS] SCRIPT [SCRIPT_ARGS...]
```

**Description**: Launch training scripts with automatic distributed training setup based on configuration.

**Common Options**:
- `--config_file PATH` - Use specific config file
- `--cpu` - Force CPU usage even if GPU available
- `--multi_gpu` - Use multi-GPU training
- `--mixed_precision {no,fp16,bf16,fp8}` - Mixed precision mode
- `--num_processes NUM` - Number of processes to use
- `--num_machines NUM` - Number of machines for multi-node training
- `--machine_rank RANK` - Rank of current machine
- `--main_process_ip IP` - IP address of main process
- `--main_process_port PORT` - Port for main process communication
- `--deepspeed_config_file PATH` - DeepSpeed configuration file
- `--fsdp_config_file PATH` - FSDP configuration file
- `--dynamo_backend BACKEND` - Torch Dynamo backend for compilation

**Usage Examples**:

```bash
# Basic single-GPU training
accelerate launch train.py --batch_size 32

# Multi-GPU training with mixed precision
accelerate launch --mixed_precision fp16 --num_processes 4 train.py

# Multi-node training
accelerate launch \
    --num_machines 2 \
    --num_processes 8 \
    --machine_rank 0 \
    --main_process_ip 192.168.1.100 \
    --main_process_port 29500 \
    train.py

# DeepSpeed training
accelerate launch \
    --deepspeed_config_file ds_config.json \
    --num_processes 4 \
    train.py

# With Torch compilation
accelerate launch \
    --dynamo_backend inductor \
    --mixed_precision bf16 \
    train.py
```
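In practice `--num_processes` usually tracks the number of visible GPUs. A small helper (hypothetical, not part of accelerate) can derive it so the same launch script works across hosts:

```shell
# Hypothetical helper: pick a process count from the visible GPUs,
# falling back to 1 on CPU-only machines.
detect_gpus() {
  if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi -L | wc -l | tr -d ' '
  else
    echo 1
  fi
}

# NUM=$(detect_gpus)
# accelerate launch --num_processes "$NUM" train.py
```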

### Environment Information

Display environment and configuration information for debugging.

```bash { .api }
accelerate env
```

**Description**: Show detailed information about the current Accelerate installation, hardware, and configuration.

**Output includes**:
- Accelerate version and installation details
- PyTorch version and CUDA availability
- Hardware information (GPUs, memory, etc.)
- Current configuration settings
- Available optional dependencies

**Usage**:
```bash
# Show environment information
accelerate env
```

### Memory Estimation

Estimate memory requirements for model training and inference.

```bash { .api }
accelerate estimate-memory [OPTIONS] MODEL_NAME
```

**Description**: Estimate GPU memory requirements for training or inference with specific models.

**Options**:
- `--library_name {transformers,timm,diffusers}` - Model library (default: transformers)
- `--dtypes DTYPES` - Data types to test (comma-separated: float32,float16,bfloat16,int8,int4)
- `--num_gpus NUM` - Number of GPUs available
- `--trust_remote_code` - Trust remote code in model loading
- `--access_token TOKEN` - Hugging Face access token

**Usage Examples**:
```bash
# Estimate memory for a model
accelerate estimate-memory microsoft/DialoGPT-medium

# Test multiple data types
accelerate estimate-memory \
    --dtypes float32,float16,bfloat16 \
    --num_gpus 2 \
    microsoft/DialoGPT-large

# With custom library
accelerate estimate-memory \
    --library_name timm \
    --dtypes float16,bfloat16 \
    resnet50
```
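As a cross-check on the tool's output, the dominant weights term can be computed by hand: memory ≈ parameter count × bytes per parameter (2 for float16/bfloat16, 4 for float32). Training adds gradients and optimizer state on top. A sketch with an illustrative parameter count:

```shell
# Rough weights-only estimate; ignores activations, gradients, and
# optimizer state, which dominate during training.
PARAMS=355000000   # illustrative ~355M-parameter model
BYTES=2            # float16 / bfloat16
echo "$(( PARAMS * BYTES / 1024 / 1024 )) MiB"   # prints "677 MiB"
```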

### Training Setup Testing

Test distributed training setup and communication.

```bash { .api }
accelerate test [OPTIONS]
```

**Description**: Test distributed training setup by running a simple training loop to verify configuration.

**Options**:
- `--config_file PATH` - Use specific config file
- `--num_processes NUM` - Override number of processes

**Usage**:
```bash
# Test current configuration
accelerate test

# Test with specific number of processes
accelerate test --num_processes 4
```

### Model Weight Merging

Merge sharded model checkpoints into a single file.

```bash { .api }
accelerate merge-weights [OPTIONS] INPUT_DIR OUTPUT_DIR
```

**Description**: Merge model weights that have been sharded across multiple files back into a single checkpoint file.

**Options**:
- `--model_name_or_path PATH` - Model name or path for configuration
- `--torch_dtype {float16,bfloat16,float32}` - Target data type
- `--safe_serialization` - Use safetensors format for output

**Usage**:
```bash
# Merge sharded weights
accelerate merge-weights ./sharded_model ./merged_model

# With specific dtype and safe serialization
accelerate merge-weights \
    --torch_dtype float16 \
    --safe_serialization \
    ./sharded_model ./merged_model
```

### TPU Utilities

TPU-specific utilities and commands.

```bash { .api }
accelerate tpu-config
```

**Description**: Configure TPU-specific settings for training.

**Usage**:
```bash
# Configure TPU settings
accelerate tpu-config
```

### Configuration Migration

Migrate configuration to newer formats.

```bash { .api }
accelerate to-fsdp2 [OPTIONS]
```

**Description**: Convert existing FSDP configuration to FSDP2 format.

**Options**:
- `--config_file PATH` - Input config file
- `--output_file PATH` - Output config file

**Usage**:
```bash
# Convert FSDP config to FSDP2
accelerate to-fsdp2 --config_file old_config.yaml --output_file new_config.yaml
```

## Configuration File Format

The configuration file uses YAML format and contains distributed training settings:

```yaml
# Example configuration file
compute_environment: LOCAL_MACHINE
distributed_type: MULTI_GPU
downcast_bf16: 'no'
gpu_ids: all
machine_rank: 0
main_training_function: main
mixed_precision: fp16
num_machines: 1
num_processes: 4
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false
```
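When scripting around this file, individual top-level settings can be pulled out with `sed`. A quick sketch; a real YAML parser is more robust, especially for nested keys:

```shell
# Read a top-level scalar setting from an accelerate config file.
get_setting() {  # usage: get_setting KEY FILE
  sed -n "s/^$1: *//p" "$2"
}

# e.g.: get_setting num_processes ~/.cache/huggingface/accelerate/default_config.yaml
```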

## Usage Examples

### Complete Setup Workflow

```bash
# 1. Configure Accelerate
accelerate config
# Follow interactive prompts to set up configuration

# 2. Test the setup
accelerate test

# 3. Check environment
accelerate env

# 4. Estimate memory for your model
accelerate estimate-memory microsoft/DialoGPT-medium

# 5. Launch training
accelerate launch train.py --learning_rate 1e-4 --batch_size 16
```

### Multi-Node Training Setup

```bash
# On main node (machine rank 0)
accelerate launch \
    --num_machines 2 \
    --num_processes 8 \
    --machine_rank 0 \
    --main_process_ip 192.168.1.100 \
    --main_process_port 29500 \
    --mixed_precision fp16 \
    train.py

# On worker node (machine rank 1)
accelerate launch \
    --num_machines 2 \
    --num_processes 8 \
    --machine_rank 1 \
    --main_process_ip 192.168.1.100 \
    --main_process_port 29500 \
    --mixed_precision fp16 \
    train.py
```
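The two invocations differ only in `--machine_rank`, so both nodes can share one launch script by deriving the rank from the hostname. A sketch with placeholder host names:

```shell
# Map a hostname to its position in a fixed host list (rank 0, 1, ...).
rank_for_host() {  # usage: rank_for_host HOSTNAME HOST0 HOST1 ...
  host=$1; shift
  rank=0
  for h in "$@"; do
    if [ "$h" = "$host" ]; then echo "$rank"; return 0; fi
    rank=$((rank + 1))
  done
  return 1   # hostname not in the list
}

# RANK=$(rank_for_host "$(hostname)" node-a node-b)   # placeholder names
# accelerate launch --num_machines 2 --machine_rank "$RANK" ... train.py
```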

### DeepSpeed Integration

```bash
# Create DeepSpeed config file (ds_config.json)
cat > ds_config.json << EOF
{
  "train_batch_size": 16,
  "gradient_accumulation_steps": 4,
  "optimizer": {
    "type": "Adam",
    "params": {
      "lr": 1e-4
    }
  },
  "zero_optimization": {
    "stage": 2,
    "offload_optimizer": {
      "device": "cpu"
    }
  },
  "fp16": {
    "enabled": true
  }
}
EOF

# Launch with DeepSpeed
accelerate launch \
    --deepspeed_config_file ds_config.json \
    --num_processes 4 \
    train.py
```
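A malformed `ds_config.json` typically only surfaces after all processes have spun up, so validating it before launching is a cheap safeguard (assumes `python3` is on PATH):

```shell
# Fail fast on broken JSON before launching.
validate_json() { python3 -m json.tool "$1" > /dev/null; }

# validate_json ds_config.json && accelerate launch \
#     --deepspeed_config_file ds_config.json --num_processes 4 train.py
```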

### Memory Optimization Workflow

```bash
# 1. Estimate memory requirements
accelerate estimate-memory \
    --dtypes float32,float16,bfloat16,int8 \
    --num_gpus 2 \
    microsoft/DialoGPT-large

# 2. Based on results, configure with appropriate settings
accelerate config
# Select mixed precision based on memory estimates

# 3. Test configuration
accelerate test

# 4. Launch optimized training
accelerate launch \
    --mixed_precision bf16 \
    --gradient_accumulation_steps 8 \
    train.py --batch_size 4
```

### Development and Debugging

```bash
# Debug distributed setup
accelerate test --num_processes 2

# Check detailed environment info
accelerate env

# Launch with verbose output for debugging
accelerate launch --debug train.py

# Test with different configurations
accelerate launch --cpu train.py                 # Force CPU
accelerate launch --mixed_precision no train.py  # Disable mixed precision
```

### Checkpoint Management

```bash
# After training with model sharding
ls ./my_model/
# pytorch_model-00001-of-00004.bin
# pytorch_model-00002-of-00004.bin
# pytorch_model-00003-of-00004.bin
# pytorch_model-00004-of-00004.bin
# pytorch_model.bin.index.json

# Merge sharded weights
accelerate merge-weights \
    --safe_serialization \
    ./my_model ./my_model_merged

ls ./my_model_merged/
# model.safetensors (single merged file)
```
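Before merging, it can help to confirm the directory actually contains the expected shard files. A small helper whose filename pattern matches the listing above:

```shell
# Count shard files in a checkpoint directory.
count_shards() {  # usage: count_shards DIR
  ls "$1"/pytorch_model-*-of-*.bin 2>/dev/null | wc -l | tr -d ' '
}

# count_shards ./my_model   # expect 4 for the listing above
```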