# Image Scrapers

A system for extracting and processing images from executed Python code, with built-in matplotlib support and an extensible architecture for custom scrapers. Scrapers automatically capture the visualizations generated while examples execute.

## Capabilities

### Matplotlib Scraper

The primary built-in scraper, which captures matplotlib figures.

```python { .api }
def matplotlib_scraper(block, block_vars, gallery_conf, **kwargs):
    """
    Scrapes matplotlib figures from code execution.

    Automatically detects and saves matplotlib figures created during
    code block execution, handling both explicit plt.show() calls and
    figures created but not explicitly shown.

    Parameters:
    - block: tuple, the code block as (label, content, line_number)
    - block_vars: dict, block variables such as 'image_path_iterator',
      'example_globals', 'src_file', and 'target_file'
    - gallery_conf: dict, gallery configuration options
    - **kwargs: Additional keyword arguments passed to savefig (e.g. format='svg')

    Returns:
    str: RST that embeds the saved images (typically built with figure_rst)
    """
```

#### Usage in Configuration

```python
# conf.py
from sphinx_gallery.scrapers import matplotlib_scraper

sphinx_gallery_conf = {
    'image_scrapers': ['matplotlib'],  # Default scraper, referenced by name
    # or pass the scraper callable directly (use one form, not both):
    # 'image_scrapers': [matplotlib_scraper],
}
```

#### Automatic Figure Detection

The matplotlib scraper automatically:

- Captures all open matplotlib figures
- Handles multiple figures per code block
- Supports subplots and complex layouts
- Saves in PNG format with configurable DPI
- Generates thumbnails for gallery display

### Figure Saving System

The main entry point for saving figures with the configured scrapers.

```python { .api }
def save_figures(block, block_vars, gallery_conf):
    """
    Main function to save figures using configured scrapers.

    Iterates through all configured scrapers and saves any figures
    they detect from the executed code block.

    Parameters:
    - block: tuple, the code block as (label, content, line_number)
    - block_vars: dict, block variables including 'image_path_iterator'
    - gallery_conf: dict, gallery configuration

    Returns:
    str: Combined RST embedding the images saved by all scrapers
    """
```

### Image Path Iterator

Utility class for generating sequential image filenames.

```python { .api }
class ImagePathIterator:
    """
    Iterator for generating sequential image paths.

    Generates sequential filenames for images within an example,
    ensuring unique names and proper organization.
    """

    def __init__(self, image_path):
        """
        Parameters:
        - image_path: str, base image path template
        """

    def __iter__(self):
        """Returns iterator instance."""

    def __next__(self):
        """
        Returns:
        str: Next sequential image filename
        """
```

#### Usage Example

```python
from sphinx_gallery.scrapers import ImagePathIterator

iterator = ImagePathIterator('/path/to/images/sphx_glr_example_{:03d}.png')
first_image = next(iterator)   # sphx_glr_example_001.png
second_image = next(iterator)  # sphx_glr_example_002.png
```
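
To make the numbering behavior concrete, here is a minimal stand-in that mimics the documented iterator without importing Sphinx-Gallery. The class name `SequentialPathIterator` and its `paths` attribute are assumptions made for illustration; this is a sketch of the idea, not the library's implementation.

```python
class SequentialPathIterator:
    """Hypothetical stand-in that mimics the documented iterator behavior."""

    def __init__(self, image_path):
        self.image_path = image_path  # template containing one format field
        self.paths = []               # every path handed out so far

    def __iter__(self):
        return self

    def __next__(self):
        # Numbering starts at 1, matching the 001/002 names in the usage example.
        path = self.image_path.format(len(self.paths) + 1)
        self.paths.append(path)
        return path


it = SequentialPathIterator('/tmp/images/sphx_glr_example_{:03d}.png')
first, second = next(it), next(it)
```

Keeping a record of the paths already issued is one simple way to guarantee that two scrapers sharing the same iterator never write to the same file.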

### RST Generation

Function for generating RST code to embed images in documentation.

```python { .api }
def figure_rst(figure_list, sources_dir, fig_titles="", srcsetpaths=None):
    """
    Generates RST code for embedding images in documentation.

    Creates properly formatted RST image directives with responsive
    srcset support and appropriate styling classes.

    Parameters:
    - figure_list: list, image filenames to embed
    - sources_dir: str, source directory path for resolving relative paths
    - fig_titles: str or list, titles for images (optional)
    - srcsetpaths: list, responsive image paths for srcset (optional)

    Returns:
    str: RST code for embedding the images
    """
```

#### Generated RST Example

```rst
.. image-sg:: /auto_examples/images/sphx_glr_plot_001.png
   :alt: Plot output
   :srcset: /auto_examples/images/sphx_glr_plot_001.png, /auto_examples/images/sphx_glr_plot_001_2x.png 2x
   :class: sphx-glr-single-img
```
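
The `:srcset:` value in the example follows the usual pattern of a plain path for the base image followed by density-suffixed alternatives. The sketch below shows how such a string could be assembled; `build_srcset` and its `{density: path}` input are hypothetical and are not part of `figure_rst`.

```python
def build_srcset(paths_by_density):
    """Join a {density: path} mapping into a srcset string.

    The 1x entry carries no suffix; other densities get '<n>x' appended.
    """
    parts = []
    for density in sorted(paths_by_density):
        path = paths_by_density[density]
        if density == 1:
            parts.append(path)
        else:
            parts.append(f"{path} {density:g}x")
    return ", ".join(parts)


srcset = build_srcset({
    1: "/auto_examples/images/sphx_glr_plot_001.png",
    2: "/auto_examples/images/sphx_glr_plot_001_2x.png",
})
```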

### Module Cleanup

Function for resetting Python modules between example executions.

```python { .api }
def clean_modules(gallery_conf, fname, when):
    """
    Resets/cleans modules between example executions.

    Runs the reset functions configured in 'reset_modules' (for example,
    unloading or resetting a module) to ensure a clean execution
    environment for each example.

    Parameters:
    - gallery_conf: dict, gallery configuration with 'reset_modules'
    - fname: str, current filename being processed
    - when: str, when cleanup is happening ('before' or 'after')
    """
```
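
Besides the built-in names, `reset_modules` can also hold your own callables that take `(gallery_conf, fname)`. The sketch below assumes that convention and uses a hypothetical `reset_random_seed` function that reseeds the standard `random` module, so every example starts from the same state.

```python
import random


def reset_random_seed(gallery_conf, fname):
    """Hypothetical reset function: reseed `random` before each example."""
    random.seed(0)


# conf.py fragment: mix a built-in name with the custom callable.
sphinx_gallery_conf = {
    'reset_modules': ('matplotlib', reset_random_seed),
}

# Calling the function twice yields the same first random number.
reset_random_seed(sphinx_gallery_conf, 'plot_example.py')
first_draw = random.random()
reset_random_seed(sphinx_gallery_conf, 'plot_example.py')
second_draw = random.random()
```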

## Custom Scrapers

### Creating Custom Scrapers

You can create custom scrapers for other visualization libraries:

```python
from sphinx_gallery.scrapers import figure_rst, matplotlib_scraper

def plotly_scraper(block, block_vars, gallery_conf):
    """Custom scraper for Plotly figures."""
    import plotly.graph_objects as go
    import plotly.io as pio

    image_path_iterator = block_vars['image_path_iterator']
    image_paths = []

    # Check for plotly figures in the example's execution namespace
    for var_value in block_vars['example_globals'].values():
        if isinstance(var_value, go.Figure):
            img_fname = next(image_path_iterator)

            # Save as static image (needs a static export backend such as kaleido)
            pio.write_image(var_value, img_fname)
            image_paths.append(img_fname)

    # Scrapers return RST that embeds the saved images
    return figure_rst(image_paths, gallery_conf['src_dir'])

# Configuration
sphinx_gallery_conf = {
    'image_scrapers': [matplotlib_scraper, plotly_scraper],
}
```

### Scraper Requirements

Custom scrapers must:

1. Accept `(block, block_vars, gallery_conf)` parameters
2. Return an RST string that embeds the saved images (for example, the output of `figure_rst`)
3. Handle cleanup of any temporary resources
4. Use the provided `image_path_iterator` for filenames
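
As a rough illustration of these requirements, the sketch below copies PNG files that sit next to the example source into the paths supplied by `image_path_iterator`. The class name `PNGCopyScraper`, the simplified `.. image::` output, and the fake inputs in the demo are all assumptions; a real scraper would normally build its RST with `figure_rst`.

```python
import shutil
import tempfile
from pathlib import Path


class PNGCopyScraper:
    """Hypothetical scraper: embed PNGs found beside the example file."""

    def __init__(self):
        self.seen = set()  # files already embedded by an earlier code block

    def __call__(self, block, block_vars, gallery_conf):
        src_dir = Path(block_vars['src_file']).parent
        path_iterator = block_vars['image_path_iterator']
        saved = []
        for png in sorted(src_dir.glob('*.png')):
            if png in self.seen:
                continue
            self.seen.add(png)
            target = next(path_iterator)  # requirement 4: use the iterator
            shutil.copyfile(png, target)
            saved.append(target)
        # Requirement 2: return RST (simplified here for illustration).
        return ''.join(f'.. image:: /{Path(p).name}\n\n' for p in saved)


# Demo with stand-in inputs; nothing here comes from a real build.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    (tmp / 'src').mkdir()
    (tmp / 'out').mkdir()
    (tmp / 'src' / 'diagram.png').write_bytes(b'placeholder bytes')
    paths = (str(tmp / 'out' / f'sphx_glr_demo_{i:03d}.png') for i in range(1, 100))
    block_vars = {'src_file': str(tmp / 'src' / 'demo.py'),
                  'image_path_iterator': paths}
    scraper = PNGCopyScraper()
    rst_first = scraper(None, block_vars, {})
    rst_second = scraper(None, block_vars, {})
    copied = (tmp / 'out' / 'sphx_glr_demo_001.png').exists()
```

Tracking already-seen files matters because scrapers run after every code block, so a scraper without that bookkeeping would embed the same image repeatedly.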

### Multi-Library Support

Configure multiple scrapers for different visualization libraries:

```python
from plotly.io._sg_scraper import plotly_sg_scraper
from sphinx_gallery.scrapers import matplotlib_scraper

def mayavi_scraper(block, block_vars, gallery_conf):
    """Scraper for Mayavi 3D visualizations."""
    # Implementation for Mayavi figure detection and saving
    return ""  # placeholder: embeds no images

def seaborn_scraper(block, block_vars, gallery_conf):
    """Scraper for Seaborn statistical plots."""
    # Seaborn uses matplotlib backend, so matplotlib_scraper handles it
    # This is just an example of how you might extend it
    return ""  # placeholder: embeds no images

sphinx_gallery_conf = {
    'image_scrapers': [
        matplotlib_scraper,
        mayavi_scraper,
        plotly_sg_scraper,  # Scraper shipped with plotly (if installed)
    ],
}
```

## Configuration Options

### Image Quality and Format

```python
sphinx_gallery_conf = {
    'image_scrapers': ['matplotlib'],
    # Directories to compress with optipng; entries starting with '-'
    # are passed through as optipng options
    'compress_images': ('images', 'thumbnails', '-o3'),
    'thumbnail_size': (200, 200),  # Thumbnail dimensions
}
```

### Module Management

```python
sphinx_gallery_conf = {
    'reset_modules': ('matplotlib', 'seaborn'),  # Reset between examples
    'capture_repr': ('_repr_html_', '__repr__'),  # Methods used to capture object representations
}
```

## Advanced Usage

### Error Handling in Scrapers

```python
from sphinx_gallery.scrapers import figure_rst

def robust_scraper(block, block_vars, gallery_conf):
    """Example of robust error handling in scrapers."""
    image_paths = []

    try:
        # Scraper logic here: save images and append their paths
        pass
    except Exception as e:
        # Log error but don't break build
        print(f"Scraper error: {e}")

    return figure_rst(image_paths, gallery_conf['src_dir'])
```

### Conditional Scraping

```python
def conditional_scraper(block, block_vars, gallery_conf):
    """Scraper that only runs under certain conditions."""

    # Only run if specific library is imported in the example
    if 'my_viz_lib' not in block_vars['example_globals']:
        return ""

    # Scraping logic here: save images and build the RST that embeds them
    rst = ""
    return rst
```

### Integration with Sphinx Events

Scrapers integrate with Sphinx's build process:

1. **Code Execution**: Example code runs in isolated namespace
2. **Scraper Execution**: All configured scrapers run after each code block
3. **Image Processing**: Images are processed, resized, and optimized
4. **RST Generation**: Image directives are added to generated RST
5. **HTML Generation**: Final HTML includes responsive images
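
The scraper-execution step can be pictured as a small loop that calls every configured scraper for a code block and joins the RST each one returns. The driver below is a hypothetical sketch of that idea (`run_scrapers` and the two fake scrapers are invented for illustration), not the code Sphinx-Gallery actually runs.

```python
def run_scrapers(scrapers, block, block_vars, gallery_conf):
    """Hypothetical driver: call each scraper and join the RST they return."""
    pieces = []
    for scraper in scrapers:
        rst = scraper(block, block_vars, gallery_conf)
        if not isinstance(rst, str):
            # Fail early with a clear message instead of a confusing join error.
            raise TypeError(
                f'{scraper!r} returned {type(rst).__name__}, expected str'
            )
        pieces.append(rst)
    return ''.join(pieces)


def fake_matplotlib(block, block_vars, gallery_conf):
    """Stand-in scraper that pretends it saved one image."""
    return '.. image:: /images/plot_001.png\n\n'


def fake_empty(block, block_vars, gallery_conf):
    """Stand-in scraper that found nothing to save."""
    return ''


combined = run_scrapers(
    [fake_matplotlib, fake_empty], ('code', 'x = 1', 1), {}, {}
)
```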

## Troubleshooting

### Common Issues

- **Missing Images**: Ensure figures are created before the scraper runs
- **Memory Issues**: Use `reset_modules` to clean up between examples
- **Format Issues**: Check that the scraper saves in supported formats (PNG, JPG)
- **Path Issues**: Use the provided `image_path_iterator` for consistent naming
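
When images go missing, it can help to see what each scraper actually returned. One possible approach, sketched below, is to wrap a scraper in a logging decorator; the `logged` helper, the logger name, and `demo_scraper` are assumptions for illustration only.

```python
import functools
import logging

logger = logging.getLogger('scraper_debug')  # hypothetical logger name


def logged(scraper):
    """Wrap a scraper and log how many image directives it produced."""
    @functools.wraps(scraper)
    def wrapper(block, block_vars, gallery_conf):
        rst = scraper(block, block_vars, gallery_conf)
        n_images = rst.count('.. image')  # also matches '.. image-sg::'
        logger.debug('%s produced %d image directive(s) for %s',
                     scraper.__name__, n_images, block_vars.get('src_file'))
        return rst
    return wrapper


@logged
def demo_scraper(block, block_vars, gallery_conf):
    """Stand-in scraper used only to exercise the wrapper."""
    return '.. image-sg:: /images/demo_001.png\n\n'


result = demo_scraper(None, {'src_file': 'demo.py'}, {})
```

A count of zero for a block that clearly draws a figure points at the first issue in the list: the figure did not exist, or was already closed, when the scraper ran.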

### Debugging Scrapers

```python
sphinx_gallery_conf = {
    'log_level': {'examples_log_level': 'DEBUG'},  # Enable debug logging
    'only_warn_on_example_error': True,  # Continue on errors
}
```