# Notebook Cleaning

Clean notebook metadata and outputs, and configure git integration for notebooks. The cleaning system removes superfluous metadata, clears outputs, and makes notebooks git-friendly by handling merge conflicts and cell IDs.

## Capabilities

### Main Cleaning Functions

Clean notebooks and remove unnecessary metadata for version control.

```python { .api }
def nbdev_clean(fname: str = None, clear_all: bool = False,
                disp: bool = False, read_input_dir: str = None,
                write_input_dir: str = None):
    """
    Clean notebooks in project.

    Args:
        fname: Specific notebook file or glob pattern to clean
        clear_all: Remove all metadata and outputs (overrides settings)
        disp: Display cleaning progress and changes
        read_input_dir: Directory to read notebooks from
        write_input_dir: Directory to write cleaned notebooks to

    Removes unnecessary metadata, cell outputs, and execution counts
    from notebooks to make them git-friendly and reduce diff noise.
    """

def clean_nb(nb):
    """
    Clean a single notebook object.

    Args:
        nb: Notebook object to clean

    Returns:
        Cleaned notebook with metadata and outputs removed
        according to configuration settings.
    """
```

**Usage Examples:**

```python
from nbdev.clean import nbdev_clean, clean_nb
from execnb.nbio import read_nb, write_nb

# Clean all notebooks in project
nbdev_clean()

# Clean specific notebook file
nbdev_clean('notebooks/01_core.ipynb')

# Clean with verbose output
nbdev_clean(disp=True)

# Clean and remove all metadata/outputs
nbdev_clean(clear_all=True)

# Clean notebook object directly
nb = read_nb('example.ipynb')
cleaned_nb = clean_nb(nb)
write_nb(cleaned_nb, 'cleaned_example.ipynb')
```

### Trust Management

Trust notebooks to enable execution of JavaScript and other dynamic content.

```python { .api }
def nbdev_trust(fname: str = None, force_all: bool = False):
    """
    Trust notebooks matching fname pattern.

    Args:
        fname: Notebook name or glob pattern to trust
        force_all: Trust notebooks even if they haven't changed

    Trusts notebooks for execution by signing them with Jupyter's
    trust system. Only processes notebooks that have changed since
    last trust operation unless force_all is True.
    """
```
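
The "changed since last trust operation" behavior can be pictured as a timestamp comparison. The sketch below is only an illustration of that selection logic using the standard library: it does not sign anything, and the helper `notebooks_needing_trust` and the marker file name `.last_checked` are hypothetical names chosen for the example, not part of the documented API.

```python
import os
import tempfile
from pathlib import Path

def notebooks_needing_trust(nb_dir, marker_name=".last_checked", force_all=False):
    """List notebooks modified after the (hypothetical) marker file was last touched."""
    nb_dir = Path(nb_dir)
    marker = nb_dir / marker_name
    notebooks = sorted(nb_dir.glob("*.ipynb"))
    # With force_all, or when no previous run was recorded, select everything.
    if force_all or not marker.exists():
        return notebooks
    last_checked = marker.stat().st_mtime
    return [nb for nb in notebooks if nb.stat().st_mtime > last_checked]

with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    old_nb, new_nb, marker = tmp / "old.ipynb", tmp / "new.ipynb", tmp / ".last_checked"
    for path in (old_nb, new_nb, marker):
        path.write_text("{}", encoding="utf-8")
    # Set explicit timestamps: old.ipynb before the marker, new.ipynb after it.
    os.utime(old_nb, (1000, 1000))
    os.utime(marker, (2000, 2000))
    os.utime(new_nb, (3000, 3000))
    changed = [p.name for p in notebooks_needing_trust(tmp)]
    everything = [p.name for p in notebooks_needing_trust(tmp, force_all=True)]

print(changed)     # ['new.ipynb']
print(everything)  # ['new.ipynb', 'old.ipynb']
```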

**Usage Examples:**

```python
from nbdev.clean import nbdev_trust

# Trust all notebooks in project
nbdev_trust()

# Trust specific notebook
nbdev_trust('notebooks/analysis.ipynb')

# Force trust all notebooks regardless of change status
nbdev_trust(force_all=True)

# Trust notebooks matching pattern
nbdev_trust('notebooks/experimental/*.ipynb')
```

### Git Hook Installation

Install git hooks for automatic notebook cleaning and processing.

```python { .api }
def nbdev_install_hooks():
    """
    Install git hooks for notebook processing.

    Installs pre-commit and other git hooks that automatically:
    - Clean notebooks before commits
    - Handle notebook merge conflicts
    - Process notebook metadata
    - Ensure consistent notebook formatting
    """
```

**Usage Example:**

```python
from nbdev.clean import nbdev_install_hooks

# Install git hooks for automatic cleaning
nbdev_install_hooks()
# Now git commits will automatically clean notebooks
```

### Jupyter Integration

Configure Jupyter to work seamlessly with nbdev cleaning.

```python { .api }
def clean_jupyter():
    """
    Clean Jupyter-specific metadata and configuration.

    Removes Jupyter-specific metadata that can cause merge conflicts
    or unnecessary version control noise, including:
    - Kernel specifications that may vary between environments
    - Execution timing information
    - Widget state that doesn't serialize well
    """
```

### File Processing

Process and write notebooks with cleaning applied.

```python { .api }
def process_write(nb, fname: str):
    """
    Process and write notebook with cleaning.

    Args:
        nb: Notebook object to process
        fname: Output filename for processed notebook

    Applies all configured cleaning operations and writes
    the cleaned notebook to the specified file.
    """
```

## Cleaning Configuration

Control cleaning behavior through `settings.ini` configuration:

### Metadata Control

```ini
# Remove cell IDs from notebooks
clean_ids = True

# Remove all metadata and outputs
clear_all = False

# Preserve specific metadata keys
allowed_metadata_keys = language_info,kernelspec
allowed_cell_metadata_keys = tags,id
```
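
As a rough illustration of how these options could be read into Python values, the sketch below parses the same keys with the standard `configparser` module. It assumes the options sit in a `[DEFAULT]` section and that list values are comma-separated. The helper name `load_cleaning_options` is hypothetical and is not part of nbdev.

```python
from configparser import ConfigParser

# Example settings text; placing the keys under [DEFAULT] is an assumption.
SETTINGS_TEXT = """
[DEFAULT]
clean_ids = True
clear_all = False
allowed_metadata_keys = language_info,kernelspec
allowed_cell_metadata_keys = tags,id
"""

def load_cleaning_options(text):
    """Parse cleaning options from settings text into plain Python values."""
    parser = ConfigParser()
    parser.read_string(text)
    section = parser["DEFAULT"]

    def split_keys(raw):
        # Turn "a, b,c" into ["a", "b", "c"], ignoring empty entries.
        return [key.strip() for key in raw.split(",") if key.strip()]

    return {
        "clean_ids": section.getboolean("clean_ids", fallback=True),
        "clear_all": section.getboolean("clear_all", fallback=False),
        "allowed_metadata_keys": split_keys(
            section.get("allowed_metadata_keys", fallback="")),
        "allowed_cell_metadata_keys": split_keys(
            section.get("allowed_cell_metadata_keys", fallback="")),
    }

options = load_cleaning_options(SETTINGS_TEXT)
print(options["clean_ids"], options["allowed_metadata_keys"])
```

The fallback values used when a key is missing are illustrative choices for this sketch, not documented defaults.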

### Jupyter Hooks

```ini
# Enable Jupyter git hooks
jupyter_hooks = True
```

## What Gets Cleaned

### Cell-Level Cleaning

- **Execution counts**: Removed to prevent meaningless diffs
- **Output data**: Cleared unless specifically configured to keep
- **Cell IDs**: Removed if `clean_ids=True`
- **Execution timing**: Stripped from metadata
- **Widget states**: Removed as they don't serialize consistently

### Notebook-Level Cleaning

- **Kernel information**: Standardized or removed
- **Language info**: Kept only if in `allowed_metadata_keys`
- **Custom metadata**: Filtered based on configuration
- **Notebook format**: Kept at a consistent version
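
The filtering idea behind notebook-level cleaning can be sketched with plain dictionaries. This is a minimal illustration that assumes the notebook has already been loaded as a dict. The helper `filter_notebook_metadata` is a hypothetical name, not an nbdev function, and it covers only the allow-list filtering, not kernel standardization.

```python
def filter_notebook_metadata(nb, allowed_keys):
    """Return a copy of nb whose top-level metadata keeps only allowed_keys."""
    kept = {key: value for key, value in nb.get("metadata", {}).items()
            if key in allowed_keys}
    # Build a new dict so the caller's notebook is left untouched.
    return {**nb, "metadata": kept}

notebook = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {
        "kernelspec": {"name": "python3", "display_name": "Python 3"},
        "language_info": {"name": "python", "version": "3.11.4"},
        "widgets": {"state": {}},
    },
    "cells": [],
}

cleaned = filter_notebook_metadata(notebook, allowed_keys={"kernelspec"})
print(sorted(cleaned["metadata"]))  # ['kernelspec']
```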

### Example Before/After

**Before Cleaning:**

```json
{
  "cell_type": "code",
  "execution_count": 42,
  "id": "a1b2c3d4-e5f6-7890",
  "metadata": {
    "scrolled": true,
    "execution": {"iopub.execute_input": "2023-01-01T12:00:00.000Z"}
  },
  "outputs": [{"output_type": "stream", "text": "Hello World"}],
  "source": "print('Hello World')"
}
```

**After Cleaning:**

```json
{
  "cell_type": "code",
  "execution_count": null,
  "metadata": {},
  "outputs": [],
  "source": "print('Hello World')"
}
```
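
The transformation from the "before" cell to the "after" cell can be approximated in a few lines of standard-library Python. This is a sketch of the idea, not nbdev's implementation: `clean_cell_sketch` and its parameters are hypothetical, and it always clears outputs, whereas the real behavior depends on configuration.

```python
import copy
import json

def clean_cell_sketch(cell, clean_ids=True, allowed_metadata_keys=()):
    """Return a cleaned copy of a notebook cell dict (illustrative only)."""
    cleaned = copy.deepcopy(cell)
    if cleaned.get("cell_type") == "code":
        # Execution counts and outputs change on every run, so reset them.
        cleaned["execution_count"] = None
        cleaned["outputs"] = []
    if clean_ids:
        cleaned.pop("id", None)
    # Keep only explicitly allowed metadata keys.
    metadata = cleaned.get("metadata", {})
    cleaned["metadata"] = {key: value for key, value in metadata.items()
                           if key in allowed_metadata_keys}
    return cleaned

before = {
    "cell_type": "code",
    "execution_count": 42,
    "id": "a1b2c3d4-e5f6-7890",
    "metadata": {
        "scrolled": True,
        "execution": {"iopub.execute_input": "2023-01-01T12:00:00.000Z"},
    },
    "outputs": [{"output_type": "stream", "text": "Hello World"}],
    "source": "print('Hello World')",
}

after = clean_cell_sketch(before)
print(json.dumps(after, indent=2))
```

Passing `clean_ids=False` or a non-empty `allowed_metadata_keys` in this sketch mirrors the `clean_ids` and `allowed_cell_metadata_keys` settings described earlier.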

## Git Integration

### Pre-commit Hooks

After `nbdev_install_hooks()` has been run, the installed git pre-commit hooks clean notebooks automatically before each commit:

```bash
# Before each commit, hooks run:
nbdev_clean --clear_all  # Clean all notebooks
# Then proceed with commit
```

### Merge Conflict Resolution

Cleaning helps resolve notebook merge conflicts by:

- Removing execution counts that often conflict
- Standardizing metadata format
- Clearing outputs that may differ between runs
- Normalizing cell IDs and other volatile data
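
To see why the points above reduce conflicts, consider two people running the same notebook: the source is identical, but cell IDs, execution counts, and outputs differ. The sketch below uses only the standard library. `strip_volatile`, `serialize`, and `make_run` are hypothetical helpers written for this example, and the serialization format is a stand-in for the file that git compares.

```python
import json

def strip_volatile(nb):
    """Return a copy of nb without cell IDs, execution counts, or outputs."""
    cells = []
    for cell in nb["cells"]:
        cell = {key: value for key, value in cell.items() if key != "id"}
        if cell.get("cell_type") == "code":
            cell["execution_count"] = None
            cell["outputs"] = []
        cells.append(cell)
    return {**nb, "cells": cells}

def serialize(nb):
    """Serialize deterministically, as a stand-in for the file git compares."""
    return json.dumps(nb, indent=1, sort_keys=True)

def make_run(cell_id, execution_count, text):
    """Build a one-cell notebook as it might look after one person's run."""
    return {
        "nbformat": 4,
        "nbformat_minor": 5,
        "metadata": {},
        "cells": [{
            "cell_type": "code",
            "id": cell_id,
            "execution_count": execution_count,
            "metadata": {},
            "outputs": [{"output_type": "stream", "name": "stdout", "text": text}],
            "source": "import time; print(time.time())",
        }],
    }

alice = make_run("a1b2c3d4", 7, "1700000000.1\n")
bob = make_run("e5f6a7b8", 42, "1700000360.9\n")

same_before = serialize(alice) == serialize(bob)
same_after = serialize(strip_volatile(alice)) == serialize(strip_volatile(bob))
print(same_before, same_after)  # False True
```

Once the volatile fields are gone, both versions serialize to the same text, so there is nothing left for git to flag as a conflict.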

## Advanced Usage

### Selective Cleaning

```python
from nbdev.clean import nbdev_clean

# Clean only specific types of notebooks
nbdev_clean('notebooks/experiments/*.ipynb')

# Clean with custom settings
nbdev_clean(clear_all=True, disp=True)
```

### Integration with Workflow

```python
from nbdev.clean import nbdev_clean, nbdev_trust
from nbdev.export import nb_export

# Complete workflow
def prepare_notebooks():
    """Prepare notebooks for version control and export."""
    # Clean notebooks
    nbdev_clean(disp=True)

    # Trust for execution
    nbdev_trust()

    # Export to modules
    nb_export()

    print("Notebooks prepared successfully")

prepare_notebooks()
```

### Custom Cleaning Pipeline

```python
from nbdev.clean import clean_nb, process_write
from execnb.nbio import read_nb
from pathlib import Path

def custom_clean_workflow(nb_path):
    """Custom cleaning with additional processing."""
    nb = read_nb(nb_path)

    # Apply standard cleaning
    cleaned_nb = clean_nb(nb)

    # Custom processing
    # Remove specific metadata, add custom fields, etc.

    # Write back
    process_write(cleaned_nb, nb_path)

# Apply to all notebooks
for nb_file in Path('notebooks').glob('*.ipynb'):
    custom_clean_workflow(nb_file)
```

## Best Practices

### Regular Cleaning

```bash
# Add to your development workflow
nbdev_clean  # Before committing changes
nbdev_trust  # After pulling changes
```

### Team Collaboration

- Install hooks on all development machines: `nbdev_install_hooks()`
- Use consistent cleaning settings across the team
- Clean notebooks before sharing or committing
- Trust notebooks after pulling from the shared repository

### CI/CD Integration

```yaml
# In GitHub Actions
- name: Clean notebooks
  run: nbdev_clean --clear_all

- name: Verify notebooks are clean
  run: |
    nbdev_clean
    git diff --exit-code  # Fail if notebooks weren't already clean
```
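
The verification step can also be approximated without git, for example in a local script. The sketch below is one possible check using only the standard library: the helper `find_unclean_notebooks` is hypothetical, and it is stricter than the configurable cleaning described above because it treats any remaining output or execution count as unclean.

```python
import json
import tempfile
from pathlib import Path

def find_unclean_notebooks(root):
    """Return notebooks under root that still carry outputs or execution counts."""
    unclean = []
    for path in sorted(Path(root).rglob("*.ipynb")):
        nb = json.loads(path.read_text(encoding="utf-8"))
        for cell in nb.get("cells", []):
            if cell.get("cell_type") != "code":
                continue
            if cell.get("outputs") or cell.get("execution_count") is not None:
                unclean.append(path)
                break  # One offending cell is enough to flag the notebook.
    return unclean

def write_demo_notebook(path, execution_count, outputs):
    """Write a one-cell notebook used only for this demonstration."""
    nb = {
        "nbformat": 4,
        "nbformat_minor": 5,
        "metadata": {},
        "cells": [{
            "cell_type": "code",
            "execution_count": execution_count,
            "metadata": {},
            "outputs": outputs,
            "source": "print('hi')",
        }],
    }
    path.write_text(json.dumps(nb), encoding="utf-8")

with tempfile.TemporaryDirectory() as tmp:
    write_demo_notebook(Path(tmp) / "clean.ipynb", None, [])
    write_demo_notebook(
        Path(tmp) / "dirty.ipynb", 3,
        [{"output_type": "stream", "name": "stdout", "text": "hi\n"}])
    unclean_names = [path.name for path in find_unclean_notebooks(tmp)]

print(unclean_names)  # ['dirty.ipynb']
```

A CI job could exit with a non-zero status when the returned list is non-empty, which has the same effect as the `git diff --exit-code` step.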