# Notebook Cleaning

Clean notebook metadata and outputs, and configure git integration for notebooks. The cleaning system removes superfluous metadata, clears outputs, and makes notebooks git-friendly by handling merge conflicts and cell IDs.

## Capabilities

### Main Cleaning Functions

Clean notebooks and remove unnecessary metadata for version control.

```python { .api }
def nbdev_clean(fname: str = None, clear_all: bool = False,
                disp: bool = False, read_input_dir: str = None,
                write_input_dir: str = None):
    """
    Clean notebooks in project.

    Args:
        fname: Specific notebook file or glob pattern to clean
        clear_all: Remove all metadata and outputs (overrides settings)
        disp: Display cleaning progress and changes
        read_input_dir: Directory to read notebooks from
        write_input_dir: Directory to write cleaned notebooks to

    Removes unnecessary metadata, cell outputs, and execution counts
    from notebooks to make them git-friendly and reduce diff noise.
    """

def clean_nb(nb):
    """
    Clean a single notebook object.

    Args:
        nb: Notebook object to clean

    Returns:
        Cleaned notebook with metadata and outputs removed
        according to configuration settings.
    """
```

**Usage Examples:**

```python
from nbdev.clean import nbdev_clean, clean_nb
from execnb.nbio import read_nb, write_nb

# Clean all notebooks in project
nbdev_clean()

# Clean specific notebook file
nbdev_clean('notebooks/01_core.ipynb')

# Clean with verbose output
nbdev_clean(disp=True)

# Clean and remove all metadata/outputs
nbdev_clean(clear_all=True)

# Clean notebook object directly (the object is updated in place)
nb = read_nb('example.ipynb')
clean_nb(nb)
write_nb(nb, 'cleaned_example.ipynb')
```

### Trust Management

Trust notebooks to enable execution of JavaScript and other dynamic content.

```python { .api }
def nbdev_trust(fname: str = None, force_all: bool = False):
    """
    Trust notebooks matching fname pattern.

    Args:
        fname: Notebook name or glob pattern to trust
        force_all: Trust notebooks even if they haven't changed

    Trusts notebooks for execution by signing them with Jupyter's
    trust system. Only processes notebooks that have changed since
    last trust operation unless force_all is True.
    """
```

**Usage Examples:**

```python
from nbdev.clean import nbdev_trust

# Trust all notebooks in project
nbdev_trust()

# Trust specific notebook
nbdev_trust('notebooks/analysis.ipynb')

# Force trust all notebooks regardless of change status
nbdev_trust(force_all=True)

# Trust notebooks matching pattern
nbdev_trust('notebooks/experimental/*.ipynb')
```

### Git Hook Installation

Install git hooks for automatic notebook cleaning and processing.

```python { .api }
def nbdev_install_hooks():
    """
    Install git hooks for notebook processing.

    Installs pre-commit and other git hooks that automatically:
    - Clean notebooks before commits
    - Handle notebook merge conflicts
    - Process notebook metadata
    - Ensure consistent notebook formatting
    """
```

**Usage Example:**

```python
from nbdev.clean import nbdev_install_hooks

# Install git hooks for automatic cleaning
nbdev_install_hooks()
# Now git commits will automatically clean notebooks
```

### Jupyter Integration

Configure Jupyter to work seamlessly with nbdev cleaning.

```python { .api }
def clean_jupyter():
    """
    Clean Jupyter-specific metadata and configuration.

    Removes Jupyter-specific metadata that can cause merge conflicts
    or unnecessary version control noise, including:
    - Kernel specifications that may vary between environments
    - Execution timing information
    - Widget state that doesn't serialize well
    """
```

### File Processing

Process and write notebooks with cleaning applied.

```python { .api }
def process_write(nb, fname: str):
    """
    Process and write notebook with cleaning.

    Args:
        nb: Notebook object to process
        fname: Output filename for processed notebook

    Applies all configured cleaning operations and writes
    the cleaned notebook to the specified file.
    """
```

## Cleaning Configuration

Control cleaning behavior through `settings.ini` configuration:

### Metadata Control

```ini
# Remove cell IDs from notebooks
clean_ids = True

# Remove all metadata and outputs
clear_all = False

# Preserve specific metadata keys
allowed_metadata_keys = language_info,kernelspec
allowed_cell_metadata_keys = tags,id
```

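As a rough illustration of how these options could be read programmatically, the sketch below parses a settings fragment with Python's standard `configparser`. It assumes the options sit under a `[DEFAULT]` section and that key lists are comma-separated as written above. `load_clean_settings` is a hypothetical helper for this example, not part of nbdev, and nbdev's own parsing may differ.

```python
from configparser import ConfigParser

# Hypothetical settings fragment; the [DEFAULT] section header is an assumption.
SETTINGS = """
[DEFAULT]
clean_ids = True
clear_all = False
allowed_metadata_keys = language_info,kernelspec
allowed_cell_metadata_keys = tags,id
"""

def load_clean_settings(text):
    """Parse the cleaning-related options from settings.ini-style text."""
    parser = ConfigParser()
    parser.read_string(text)
    section = parser["DEFAULT"]

    def split_keys(name):
        # Turn "a,b" into ["a", "b"], ignoring blanks and stray whitespace.
        return [k.strip() for k in section.get(name, "").split(",") if k.strip()]

    return {
        "clean_ids": section.getboolean("clean_ids", fallback=True),
        "clear_all": section.getboolean("clear_all", fallback=False),
        "allowed_metadata_keys": split_keys("allowed_metadata_keys"),
        "allowed_cell_metadata_keys": split_keys("allowed_cell_metadata_keys"),
    }

settings = load_clean_settings(SETTINGS)
print(settings)
```

Returning a plain dictionary keeps the sketch independent of any particular configuration object.
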
### Jupyter Hooks

```ini
# Enable Jupyter git hooks
jupyter_hooks = True
```

## What Gets Cleaned

### Cell-Level Cleaning

- **Execution counts**: Removed to prevent meaningless diffs
- **Output data**: Cleared unless configured to be kept
- **Cell IDs**: Removed if `clean_ids=True`
- **Execution timing**: Stripped from metadata
- **Widget states**: Removed as they don't serialize consistently

### Notebook-Level Cleaning

- **Kernel information**: Standardized or removed
- **Language info**: Kept only if in `allowed_metadata_keys`
- **Custom metadata**: Filtered based on configuration
- **Notebook format**: Kept at a consistent version

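The notebook-level filtering listed above can be sketched with plain dictionaries. This is a simplified illustration rather than nbdev's implementation: `filter_notebook_metadata` is a hypothetical helper, and the sample metadata is invented for the example.

```python
def filter_notebook_metadata(nb, allowed_keys):
    """Keep only the top-level metadata keys listed in allowed_keys (in place)."""
    allowed = set(allowed_keys)
    nb["metadata"] = {
        key: value
        for key, value in nb.get("metadata", {}).items()
        if key in allowed
    }
    return nb

# Invented sample notebook with three top-level metadata entries
notebook = {
    "cells": [],
    "metadata": {
        "kernelspec": {"name": "python3", "display_name": "Python 3"},
        "language_info": {"name": "python", "version": "3.11.4"},
        "toc": {"base_numbering": 1},
    },
    "nbformat": 4,
    "nbformat_minor": 5,
}

filter_notebook_metadata(notebook, ["kernelspec"])
print(sorted(notebook["metadata"]))  # → ['kernelspec']
```
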
### Example Before/After

**Before Cleaning:**
```json
{
  "cell_type": "code",
  "execution_count": 42,
  "id": "a1b2c3d4-e5f6-7890",
  "metadata": {
    "scrolled": true,
    "execution": {"iopub.execute_input": "2023-01-01T12:00:00.000Z"}
  },
  "outputs": [{"output_type": "stream", "text": "Hello World"}],
  "source": "print('Hello World')"
}
```

**After Cleaning:**
```json
{
  "cell_type": "code",
  "execution_count": null,
  "metadata": {},
  "outputs": [],
  "source": "print('Hello World')"
}
```

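The transformation shown in this before/after pair can be reproduced with a few lines of standard-library Python. The sketch below is only an approximation of the behavior described here, not nbdev's actual code. `strip_cell` and its parameters are hypothetical names chosen for the example.

```python
import copy
import json

def strip_cell(cell, clean_ids=True, allowed_cell_metadata_keys=()):
    """Return a cleaned copy of a cell (simplified illustration)."""
    cleaned = copy.deepcopy(cell)
    if cleaned.get("cell_type") == "code":
        # Execution counts and outputs change on every run, so reset them.
        cleaned["execution_count"] = None
        cleaned["outputs"] = []
    if clean_ids:
        cleaned.pop("id", None)
    # Keep only explicitly allowed metadata keys.
    allowed = set(allowed_cell_metadata_keys)
    cleaned["metadata"] = {
        key: value
        for key, value in cleaned.get("metadata", {}).items()
        if key in allowed
    }
    return cleaned

before = {
    "cell_type": "code",
    "execution_count": 42,
    "id": "a1b2c3d4-e5f6-7890",
    "metadata": {
        "scrolled": True,
        "execution": {"iopub.execute_input": "2023-01-01T12:00:00.000Z"},
    },
    "outputs": [{"output_type": "stream", "text": "Hello World"}],
    "source": "print('Hello World')",
}

after = strip_cell(before)
print(json.dumps(after, indent=2))
```

Working on a deep copy leaves the original cell untouched, which makes it easy to compare the two versions.
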
## Git Integration

### Pre-commit Hooks

Once `nbdev_install_hooks()` has been run, the git hooks clean notebooks automatically before each commit:

```bash
# Before each commit, hooks run:
nbdev_clean --clear_all  # Clean all notebooks
# Then proceed with commit
```

### Merge Conflict Resolution

Cleaning helps resolve notebook merge conflicts by:

- Removing execution counts that often conflict
- Standardizing metadata format
- Clearing outputs that may differ between runs
- Normalizing cell IDs and other volatile data

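To see why removing volatile fields reduces conflicts, consider two copies of the same notebook that were executed separately on different branches. The sketch below uses invented data and a hypothetical `normalize` helper (not an nbdev function) to show that the copies differ as stored but become identical once the volatile fields are stripped.

```python
def normalize(nb):
    """Return a copy of nb without volatile per-run fields (simplified)."""
    cells = []
    for cell in nb["cells"]:
        cell = {key: value for key, value in cell.items() if key != "id"}
        if cell.get("cell_type") == "code":
            cell["execution_count"] = None
            cell["outputs"] = []
        cell["metadata"] = {}
        cells.append(cell)
    return {**nb, "cells": cells}

def make_nb(execution_count, output_text, cell_id):
    """Build a one-cell notebook; only the volatile fields vary between calls."""
    return {
        "cells": [{
            "cell_type": "code",
            "execution_count": execution_count,
            "id": cell_id,
            "metadata": {},
            "outputs": [{"output_type": "stream", "name": "stdout", "text": output_text}],
            "source": "print(random.random())",
        }],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 5,
    }

ours = make_nb(3, "0.4213\n", "a1b2")
theirs = make_nb(17, "0.9071\n", "c3d4")

print(ours == theirs)                        # → False
print(normalize(ours) == normalize(theirs))  # → True
```

Because the source is the same in both copies, a merge of the normalized notebooks has nothing to reconcile.
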
## Advanced Usage

### Selective Cleaning

```python
from nbdev.clean import nbdev_clean

# Clean only the notebooks matching a glob pattern
nbdev_clean('notebooks/experiments/*.ipynb')

# Clear all metadata and outputs, displaying progress
nbdev_clean(clear_all=True, disp=True)
```

### Integration with Workflow

```python
from nbdev.clean import nbdev_clean, nbdev_trust
from nbdev.doclinks import nbdev_export

# Complete workflow
def prepare_notebooks():
    """Prepare notebooks for version control and export."""
    # Clean notebooks
    nbdev_clean(disp=True)

    # Trust for execution
    nbdev_trust()

    # Export all notebooks to modules
    # (nb_export handles a single notebook and requires its name)
    nbdev_export()

    print("Notebooks prepared successfully")

prepare_notebooks()
```

### Custom Cleaning Pipeline

```python
from nbdev.clean import clean_nb
from execnb.nbio import read_nb, write_nb
from pathlib import Path

def custom_clean_workflow(nb_path):
    """Custom cleaning with additional processing."""
    nb = read_nb(nb_path)

    # Apply standard cleaning (the notebook object is updated in place)
    clean_nb(nb)

    # Custom processing
    # Remove specific metadata, add custom fields, etc.

    # Write back
    write_nb(nb, nb_path)

# Apply to all notebooks
for nb_file in Path('notebooks').glob('*.ipynb'):
    custom_clean_workflow(nb_file)
```

## Best Practices

### Regular Cleaning

```bash
# Add to your development workflow
nbdev_clean  # Before committing changes
nbdev_trust  # After pulling changes
```

### Team Collaboration

- Install hooks on all development machines: `nbdev_install_hooks()`
- Use consistent cleaning settings across the team
- Clean notebooks before sharing or committing
- Trust notebooks after pulling from the shared repository

### CI/CD Integration

```yaml
# In GitHub Actions
- name: Clean notebooks
  run: nbdev_clean --clear_all

- name: Verify notebooks are clean
  run: |
    nbdev_clean
    git diff --exit-code  # Fail if notebooks weren't already clean
```
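
Where running `git diff` is not convenient, a similar check could be scripted directly. The sketch below is a hypothetical, standard-library-only alternative: it treats a notebook as unclean if any code cell still has outputs or an execution count, which is a simplification of what cleaning covers. The helper names and the temporary demo notebooks are invented for the example.

```python
import json
import tempfile
from pathlib import Path

def unclean_cells(nb):
    """Return indices of code cells that still carry outputs or execution counts."""
    return [
        index
        for index, cell in enumerate(nb.get("cells", []))
        if cell.get("cell_type") == "code"
        and (cell.get("outputs") or cell.get("execution_count") is not None)
    ]

def find_unclean_notebooks(root):
    """Map each unclean notebook under root to its offending cell indices."""
    report = {}
    for path in sorted(Path(root).rglob("*.ipynb")):
        nb = json.loads(path.read_text(encoding="utf-8"))
        bad = unclean_cells(nb)
        if bad:
            report[path.name] = bad
    return report

def demo_nb(execution_count, outputs):
    """Build a one-cell notebook for the demonstration."""
    cell = {"cell_type": "code", "execution_count": execution_count,
            "metadata": {}, "outputs": outputs, "source": "x = 1"}
    return {"cells": [cell], "metadata": {}, "nbformat": 4, "nbformat_minor": 5}

# Demo: one clean and one unclean notebook in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "clean.ipynb").write_text(json.dumps(demo_nb(None, [])), encoding="utf-8")
    Path(tmp, "dirty.ipynb").write_text(json.dumps(demo_nb(7, [])), encoding="utf-8")
    report = find_unclean_notebooks(tmp)

print(report)  # → {'dirty.ipynb': [0]}
```

A CI step built on this idea could exit with a non-zero status whenever the report is non-empty.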