# Image Scrapers

System for extracting and processing images from executed Python code, with built-in matplotlib support and extensible architecture for custom scrapers. Scrapers automatically capture visualizations generated during example execution.

## Capabilities

### Matplotlib Scraper

The primary built-in scraper for capturing matplotlib figures.

```python { .api }
def matplotlib_scraper(block, block_vars, gallery_conf, **kwargs):
    """
    Scrapes matplotlib figures from code execution.

    Automatically detects and saves matplotlib figures created during
    code block execution, handling both explicit plt.show() calls and
    figures created but not explicitly shown.

    Parameters:
    - block: tuple, the code block as (label, content, line number)
    - block_vars: dict, execution metadata such as 'image_path_iterator',
      'example_globals' (the example's namespace), and 'src_file'
    - gallery_conf: dict, gallery configuration options
    - **kwargs: Additional keyword arguments used when saving figures
      (for example format or dpi)

    Returns:
    str: RST that embeds the saved images (typically produced by figure_rst)
    """
```

#### Usage in Configuration

```python
# conf.py
from sphinx_gallery.scrapers import matplotlib_scraper

sphinx_gallery_conf = {
    'image_scrapers': ['matplotlib'],  # Default scraper, referenced by name
    # or pass the callable directly (a dict cannot hold both keys at once):
    # 'image_scrapers': [matplotlib_scraper],
}
```

#### Automatic Figure Detection

The matplotlib scraper automatically:
- Captures all open matplotlib figures
- Handles multiple figures per code block
- Supports subplots and complex layouts
- Saves in PNG format with configurable DPI
- Generates thumbnails for gallery display
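
Save options such as the output format reach the scraper as keyword arguments. The following is a minimal sketch of a wrapper that requests SVG output instead of PNG. The class name `MatplotlibSVGScraper` is illustrative, and the sketch assumes the scraper accepts a `format` keyword, as the Sphinx-Gallery documentation describes. The import is deferred so that the class can be defined even where the package is not installed; calling it still requires sphinx-gallery and matplotlib.

```python
class MatplotlibSVGScraper:
    """Callable wrapper that forwards to matplotlib_scraper with format='svg'."""

    def __repr__(self):
        # A stable repr avoids the config looking different on every build.
        return self.__class__.__name__

    def __call__(self, *args, **kwargs):
        # Deferred import: sphinx-gallery is only needed when the scraper runs.
        from sphinx_gallery.scrapers import matplotlib_scraper

        return matplotlib_scraper(*args, format="svg", **kwargs)


# conf.py usage (sketch)
sphinx_gallery_conf = {
    "image_scrapers": (MatplotlibSVGScraper(),),
}
```

A class with an explicit `__repr__` is used rather than a lambda so that the configuration value has a stable textual representation between builds.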

### Figure Saving System

Main function for saving figures using configured scrapers.

```python { .api }
def save_figures(block, block_vars, gallery_conf):
    """
    Main function to save figures using configured scrapers.

    Iterates through all configured scrapers and saves any figures
    they detect from the executed code block.

    Parameters:
    - block: tuple, the code block as (label, content, line number)
    - block_vars: dict, execution metadata (image path iterator, example namespace, file paths)
    - gallery_conf: dict, gallery configuration

    Returns:
    str: Combined RST from all scrapers, embedding every saved image
    """
```
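
The dispatch described above can be sketched as a loop over the configured scrapers. This is a simplified stand-in, not the library's implementation: `run_scrapers` and the two toy scrapers are hypothetical names, and the sketch assumes each scraper returns a string of RST that is concatenated into the final result.

```python
def run_scrapers(block, block_vars, gallery_conf):
    """Simplified stand-in for the dispatch loop; not the library's code."""
    all_rst = ""
    for scraper in gallery_conf["image_scrapers"]:
        rst = scraper(block, block_vars, gallery_conf)
        if not isinstance(rst, str):
            # Fail loudly so a misbehaving scraper is easy to identify.
            raise TypeError(
                f"scraper {scraper!r} returned {type(rst).__name__}, expected str"
            )
        all_rst += rst
    return all_rst


# Two toy scrapers: one reports an image, the other finds nothing.
def found_one(block, block_vars, gallery_conf):
    return ".. image-sg:: /images/demo_001.png\n"


def found_none(block, block_vars, gallery_conf):
    return ""


conf = {"image_scrapers": [found_one, found_none]}
combined = run_scrapers(("code", "print('hi')", 1), {}, conf)
```

Because every scraper runs for every code block, a scraper that finds nothing should return an empty string rather than `None`.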

### Image Path Iterator

Utility class for generating sequential image filenames.

```python { .api }
class ImagePathIterator:
    """
    Iterator for generating sequential image paths.

    Generates sequential filenames for images within an example,
    ensuring unique names and proper organization.
    """

    def __init__(self, image_path):
        """
        Parameters:
        - image_path: str, base image path template
        """

    def __iter__(self):
        """Returns iterator instance."""

    def __next__(self):
        """
        Returns:
        str: Next sequential image filename
        """
```

#### Usage Example

```python
from sphinx_gallery.scrapers import ImagePathIterator

iterator = ImagePathIterator('/path/to/images/sphx_glr_example_{:03d}.png')
first_image = next(iterator)  # /path/to/images/sphx_glr_example_001.png
second_image = next(iterator)  # /path/to/images/sphx_glr_example_002.png
```
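
To make the numbering behavior concrete, here is a small stand-in that can be run without Sphinx-Gallery installed. `SequentialPaths` is a hypothetical class written for illustration; it assumes the template is filled with a counter that starts at 1, which matches the usage example above.

```python
class SequentialPaths:
    """Simplified stand-in showing how a '{:03d}' template yields numbered paths."""

    def __init__(self, template):
        self.template = template
        self.paths = []  # every path handed out so far

    def __iter__(self):
        return self

    def __next__(self):
        # Numbering starts at 1 and grows with each path handed out.
        path = self.template.format(len(self.paths) + 1)
        self.paths.append(path)
        return path


paths = SequentialPaths("images/sphx_glr_demo_{:03d}.png")
first = next(paths)   # images/sphx_glr_demo_001.png
second = next(paths)  # images/sphx_glr_demo_002.png
```

Keeping the list of issued paths lets a caller find out afterwards how many images a code block produced.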

### RST Generation

Function for generating RST code to embed images in documentation.

```python { .api }
def figure_rst(figure_list, sources_dir, fig_titles="", srcsetpaths=None):
    """
    Generates RST code for embedding images in documentation.

    Creates properly formatted RST image directives with responsive
    srcset support and appropriate styling classes.

    Parameters:
    - figure_list: list, image filenames to embed
    - sources_dir: str, source directory path for resolving relative paths
    - fig_titles: str or list, titles for images (optional)
    - srcsetpaths: list, responsive image paths for srcset (optional)

    Returns:
    str: RST code for embedding the images
    """
```
133
134
#### Generated RST Example
135
136
```rst
137
.. image-sg:: /auto_examples/images/sphx_glr_plot_001.png
138
:alt: Plot output
139
:srcset: /auto_examples/images/sphx_glr_plot_001.png, /auto_examples/images/sphx_glr_plot_001_2x.png 2x
140
:class: sphx-glr-single-img
141
```
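
The shape of such a directive can be illustrated with a few lines of string formatting. This is only a sketch: `single_image_rst` is a hypothetical helper, it omits `:srcset:` handling, and real scrapers would normally call `figure_rst` instead.

```python
def single_image_rst(image_path, alt="", css_class="sphx-glr-single-img"):
    """Build one image-sg directive; options are indented under the directive."""
    lines = [
        f".. image-sg:: {image_path}",
        f"   :alt: {alt}",
        f"   :class: {css_class}",
        "",  # trailing newline so following RST starts on a fresh line
    ]
    return "\n".join(lines)


rst = single_image_rst(
    "/auto_examples/images/sphx_glr_plot_001.png", alt="Plot output"
)
```

The three-space indentation of the option lines matters: RST treats unindented lines as the end of the directive.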

### Module Cleanup

Function for resetting Python modules between example executions.

```python { .api }
def clean_modules(gallery_conf, fname, when):
    """
    Resets/cleans modules between example executions.

    Runs each reset function listed in gallery_conf['reset_modules']
    (for example the built-in matplotlib and seaborn resetters) so that
    state does not leak from one example into the next.

    Parameters:
    - gallery_conf: dict, gallery configuration with 'reset_modules'
    - fname: str, current filename being processed
    - when: str, when cleanup is happening ('before' or 'after')
    """
```
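
Besides the built-in names, `reset_modules` can hold user-defined callables. The sketch below shows one such function under the assumption that it is called with `(gallery_conf, fname)`, as the Sphinx-Gallery documentation describes. The function name `reset_random_state` and the environment variable `MY_EXAMPLE_FLAG` are hypothetical.

```python
import os
import random


def reset_random_state(gallery_conf, fname):
    """Hypothetical reset function: give every example the same starting state."""
    # Reseed so examples that draw random numbers are reproducible.
    random.seed(0)
    # Drop a variable that an earlier example may have set.
    os.environ.pop("MY_EXAMPLE_FLAG", None)


# conf.py usage (sketch): mix built-in names with custom callables
sphinx_gallery_conf = {
    "reset_modules": ("matplotlib", reset_random_state),
}
```

Resetting shared state this way keeps the output of one example independent of whichever example happened to run before it.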

## Custom Scrapers

### Creating Custom Scrapers

You can create custom scrapers for other visualization libraries:

```python
from sphinx_gallery.scrapers import figure_rst, matplotlib_scraper

def plotly_scraper(block, block_vars, gallery_conf):
    """Custom scraper for Plotly figures."""
    import plotly.graph_objects as go
    import plotly.io as pio

    image_path_iterator = block_vars['image_path_iterator']
    figures = []

    # Check for plotly figures in the example's namespace.
    # Note: a production scraper should also skip figures it already saved.
    for var_value in block_vars['example_globals'].values():
        if isinstance(var_value, go.Figure):
            img_fname = next(image_path_iterator)

            # Save as static image (needs a static export engine such as kaleido)
            pio.write_image(var_value, img_fname)
            figures.append(img_fname)

    # Scrapers return RST that embeds the saved images
    return figure_rst(figures, gallery_conf['src_dir'])

# Configuration
sphinx_gallery_conf = {
    'image_scrapers': [matplotlib_scraper, plotly_scraper],
}
```

### Scraper Requirements

Custom scrapers must:

1. Accept `(block, block_vars, gallery_conf)` parameters
2. Return a string of RST that embeds the saved images (typically built with `figure_rst`), or an empty string if nothing was saved
3. Handle cleanup of any temporary resources
4. Use the `image_path_iterator` provided in `block_vars` for filenames
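
A scraper skeleton that follows these rules might look like the sketch below. It collects PNG files that an example wrote next to its source file and copies them to the paths issued by the iterator. The sketch assumes that `block_vars` carries `'src_file'` and `'image_path_iterator'` and that the return value is a string of RST, as in Sphinx-Gallery's own scrapers. `PNGFileScraper` and `embed_rst` are hypothetical names; `embed_rst` stands in for `figure_rst` so the sketch runs without the package.

```python
import glob
import os
import shutil


def embed_rst(image_paths):
    """Hypothetical stand-in for figure_rst: one image directive per path."""
    return "".join(f".. image:: {path}\n\n" for path in image_paths)


class PNGFileScraper:
    """Collects PNG files that an example wrote next to its source file."""

    def __init__(self):
        self.seen = set()  # files already captured by earlier code blocks

    def __repr__(self):
        return "PNGFileScraper"

    def __call__(self, block, block_vars, gallery_conf):
        src_dir = os.path.dirname(block_vars["src_file"])
        saved = []
        for png in sorted(glob.glob(os.path.join(src_dir, "*.png"))):
            if png in self.seen:
                continue  # do not embed the same file twice
            self.seen.add(png)
            # Filenames come from the provided iterator, never invented here.
            target = next(block_vars["image_path_iterator"])
            shutil.copyfile(png, target)
            saved.append(target)
        # Always return a string, empty when nothing new was found.
        return embed_rst(saved)
```

Tracking already-seen files matters because the scraper is invoked after every code block, not once per example.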

### Multi-Library Support

Configure multiple scrapers for different visualization libraries:

```python
from sphinx_gallery.scrapers import matplotlib_scraper

def mayavi_scraper(block, block_vars, gallery_conf):
    """Scraper for Mayavi 3D visualizations."""
    # Implementation for Mayavi figure detection and saving goes here
    return ''  # Placeholder: no images captured

def seaborn_scraper(block, block_vars, gallery_conf):
    """Scraper for Seaborn statistical plots."""
    # Seaborn draws through matplotlib, so matplotlib_scraper already handles it.
    # This is just an example of how you might extend it.
    return ''  # Placeholder: no images captured

sphinx_gallery_conf = {
    'image_scrapers': [
        matplotlib_scraper,
        mayavi_scraper,
        'plotly',  # Resolved by name; works only if the package exposes a Sphinx-Gallery scraper
    ],
}
```

## Configuration Options

### Image Quality and Format

```python
sphinx_gallery_conf = {
    'image_scrapers': ['matplotlib'],
    # Compress images and thumbnails with optipng; entries starting with '-' are optipng flags
    'compress_images': ('images', 'thumbnails', '-o1'),
    'thumbnail_size': (200, 200),  # Thumbnail dimensions in pixels
}
```

### Module Management

```python
sphinx_gallery_conf = {
    'reset_modules': ('matplotlib', 'seaborn'),  # Built-in resetters run between examples
    'capture_repr': ('_repr_html_', '__repr__'),  # Representation methods to capture, in priority order
}
```

## Advanced Usage

### Error Handling in Scrapers

```python
from sphinx_gallery.scrapers import figure_rst

def robust_scraper(block, block_vars, gallery_conf):
    """Example of robust error handling in scrapers."""
    figures = []

    try:
        # Scraper logic here: save images and append their paths to figures
        pass
    except Exception as e:
        # Log error but don't break build
        print(f"Scraper error: {e}")

    # Embed whatever was saved before any error occurred
    return figure_rst(figures, gallery_conf['src_dir'])
```
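
The same policy can be factored out so that it applies to any scraper. The sketch below is one way to do it with a decorator and the standard `logging` module. `never_fail` and `flaky_scraper` are hypothetical names, and the sketch assumes a scraper may signal "no images" by returning an empty string.

```python
import functools
import logging

logger = logging.getLogger(__name__)


def never_fail(scraper):
    """Hypothetical decorator: log scraper exceptions and return empty RST."""

    @functools.wraps(scraper)
    def wrapper(block, block_vars, gallery_conf):
        try:
            return scraper(block, block_vars, gallery_conf)
        except Exception:
            # logger.exception records the traceback along with the message.
            logger.exception("Scraper %s failed", scraper.__name__)
            return ""

    return wrapper


@never_fail
def flaky_scraper(block, block_vars, gallery_conf):
    raise RuntimeError("renderer unavailable")


result = flaky_scraper(("code", "", 1), {}, {})  # logs the error, returns ""
```

Swallowing errors keeps the build going but can hide missing images, so the logged traceback is worth checking when a gallery page looks incomplete.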

### Conditional Scraping

```python
import sys

from sphinx_gallery.scrapers import figure_rst

def conditional_scraper(block, block_vars, gallery_conf):
    """Scraper that only runs under certain conditions."""

    # Only run if specific library is imported ('my_viz_lib' is a placeholder name)
    if 'my_viz_lib' not in sys.modules:
        return ''

    saved_images = []
    # Scraping logic here: append each saved image path to saved_images
    return figure_rst(saved_images, gallery_conf['src_dir'])
```

### Integration with Sphinx Events

Scrapers integrate with Sphinx's build process:

1. **Code Execution**: Example code runs in isolated namespace
2. **Scraper Execution**: All configured scrapers run after each code block
3. **Image Processing**: Images are processed, resized, and optimized
4. **RST Generation**: Image directives are added to generated RST
5. **HTML Generation**: Final HTML includes responsive images

## Troubleshooting

### Common Issues

- **Missing Images**: Ensure figures are created before scraper runs
- **Memory Issues**: Use `reset_modules` to clean up between examples
- **Format Issues**: Check that scraper saves in supported formats (PNG, JPG)
- **Path Issues**: Use provided `image_path_iterator` for consistent naming

### Debugging Scrapers

```python
sphinx_gallery_conf = {
    'log_level': {'examples_log_level': 'DEBUG'},  # Enable debug logging
    'only_warn_on_example_error': True,  # Continue on errors
}
```