tessl/pypi-sphinx-gallery

A Sphinx extension that builds an HTML gallery of examples from any set of Python scripts.

Image Scrapers

Sphinx-Gallery's scraper system extracts and processes the images produced by executed example code. It ships with matplotlib support and can be extended with custom scrapers, which run after each code block to capture the visualizations that block generated.

Capabilities

Matplotlib Scraper

The primary built-in scraper for capturing matplotlib figures.

def matplotlib_scraper(block, block_vars, gallery_conf, **kwargs):
    """
    Scrapes matplotlib figures from code execution.
    
    Saves every matplotlib figure that is open after a code block has
    run, whether or not the block called plt.show().
    
    Parameters:
    - block: tuple, the (label, content, line_number) of the code block
    - block_vars: dict, configuration and runtime variables, including
      'image_path_iterator', 'example_globals', and 'src_file'
    - gallery_conf: dict, gallery configuration options
    - **kwargs: Additional keyword arguments passed on to savefig (for example format='svg')
    
    Returns:
    str: RST that embeds the saved images (empty if no figures were found)
    """

Usage in Configuration

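# conf.py
sphinx_gallery_conf = {
    # Default: the built-in matplotlib scraper, selected by name
    'image_scrapers': ('matplotlib',),
}

# Alternatively, pass the scraper callable itself:
# from sphinx_gallery.scrapers import matplotlib_scraper
# sphinx_gallery_conf = {'image_scrapers': (matplotlib_scraper,)}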

Automatic Figure Detection

The matplotlib scraper automatically:

  • Captures all open matplotlib figures
  • Handles multiple figures per code block
  • Supports subplots and complex layouts
  • Saves PNG files by default, forwarding extra keyword arguments such as dpi to savefig
  • Provides the images from which Sphinx-Gallery builds gallery thumbnails
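
Because matplotlib_scraper accepts **kwargs, a setting such as DPI can be fixed by wrapping the scraper in a small callable. The sketch below shows one way to do this. `ScraperWithKwargs` and `fake_scraper` are hypothetical names used only for illustration, and the stand-in scraper lets the snippet run without matplotlib installed. In a real conf.py the wrapped callable would be sphinx_gallery.scrapers.matplotlib_scraper.

```python
class ScraperWithKwargs:
    """Hypothetical wrapper that calls a scraper with fixed extra keyword arguments."""

    def __init__(self, scraper, **fixed_kwargs):
        self.scraper = scraper
        self.fixed_kwargs = fixed_kwargs

    def __repr__(self):
        # A stable repr (no memory address) keeps the configuration value
        # looking the same from one build to the next.
        return f"ScraperWithKwargs({self.fixed_kwargs!r})"

    def __call__(self, block, block_vars, gallery_conf, **kwargs):
        # Keyword arguments given at call time override the fixed ones.
        merged = {**self.fixed_kwargs, **kwargs}
        return self.scraper(block, block_vars, gallery_conf, **merged)


def fake_scraper(block, block_vars, gallery_conf, **kwargs):
    """Stand-in for matplotlib_scraper; reports the keyword arguments it received."""
    return f"scraped with {sorted(kwargs.items())}"


high_dpi_scraper = ScraperWithKwargs(fake_scraper, dpi=200)
result = high_dpi_scraper(("code", "plt.plot([1, 2])", 1), {}, {})
print(result)
```

The wrapper instance can then be listed in 'image_scrapers' in place of the plain scraper.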

Figure Saving System

Main function for saving figures using configured scrapers.

def save_figures(block, block_vars, gallery_conf):
    """
    Main function to save figures using configured scrapers.
    
    Iterates through all configured scrapers and saves any figures
    they detect from the executed code block.
    
    Parameters:
    - block: tuple, the (label, content, line_number) of the code block
    - block_vars: dict, configuration and runtime variables
    - gallery_conf: dict, gallery configuration
    
    Returns:
    str: RST embedding the images produced by all scrapers
    """

Image Path Iterator

Utility class for generating sequential image filenames.

class ImagePathIterator:
    """
    Iterator for generating sequential image paths.
    
    Generates sequential filenames for images within an example,
    ensuring unique names and proper organization.
    """
    
    def __init__(self, image_path):
        """
        Parameters:
        - image_path: str, format-string template with one numeric field,
          for example 'sphx_glr_example_{:03d}.png'
        """
    
    def __iter__(self):
        """Returns iterator instance."""
    
    def __next__(self):
        """
        Returns:
        str: Next sequential image filename
        """

Usage Example

from sphinx_gallery.scrapers import ImagePathIterator

iterator = ImagePathIterator('/path/to/images/sphx_glr_example_{:03d}.png')
first_image = next(iterator)   # '/path/to/images/sphx_glr_example_001.png'
second_image = next(iterator)  # '/path/to/images/sphx_glr_example_002.png'

RST Generation

Function for generating RST code to embed images in documentation.

def figure_rst(figure_list, sources_dir, fig_titles="", srcsetpaths=None):
    """
    Generates RST code for embedding images in documentation.
    
    Creates properly formatted RST image directives with responsive
    srcset support and appropriate styling classes.
    
    Parameters:
    - figure_list: list, image filenames to embed
    - sources_dir: str, source directory path for resolving relative paths
    - fig_titles: str or list, titles for images (optional)
    - srcsetpaths: list of dict, one per image, mapping resolution multipliers to image paths for srcset (optional)
    
    Returns:
    str: RST code for embedding the images
    """

Generated RST Example

.. image-sg:: /auto_examples/images/sphx_glr_plot_001.png
    :alt: Plot output
    :srcset: /auto_examples/images/sphx_glr_plot_001.png, /auto_examples/images/sphx_glr_plot_001_2x.png 2x
    :class: sphx-glr-single-img
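
To make the shape of that output concrete, the following sketch assembles a comparable directive for a single image. It is a simplified illustration and not the library's implementation: `single_image_rst` is a hypothetical helper, it ignores srcset and multi-image layouts, and it uses posixpath so the result is the same on every platform.

```python
import posixpath


def single_image_rst(image_path, sources_dir, alt=""):
    """Build an image-sg directive for one image (simplified, hypothetical helper)."""
    # Express the image path relative to the documentation source directory,
    # then add a leading slash so Sphinx resolves it from the source root.
    rel = posixpath.relpath(image_path, sources_dir)
    lines = [
        f".. image-sg:: /{rel}",
        f"    :alt: {alt}",
        "    :class: sphx-glr-single-img",
    ]
    return "\n".join(lines) + "\n"


rst = single_image_rst(
    "/docs/auto_examples/images/sphx_glr_plot_001.png",
    "/docs",
    alt="Plot output",
)
print(rst)
```

In a real scraper, figure_rst should be called instead so that srcset entries and the CSS classes stay consistent with the rest of the gallery.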

Module Cleanup

Function for resetting Python modules between example executions.

def clean_modules(gallery_conf, fname, when):
    """
    Resets/cleans modules between example executions.
    
    Calls each reset function configured in 'reset_modules' so that
    state left behind by one example (for instance matplotlib rcParams
    or an imported seaborn) does not leak into the next.
    
    Parameters:
    - gallery_conf: dict, gallery configuration with 'reset_modules'
    - fname: str, current filename being processed
    - when: str, when cleanup is happening ('before' or 'after')
    """

Custom Scrapers

Creating Custom Scrapers

You can create custom scrapers for other visualization libraries:

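import plotly.io as pio
from plotly.graph_objects import Figure
from sphinx_gallery.scrapers import figure_rst, matplotlib_scraper

def plotly_scraper(block, block_vars, gallery_conf):
    """Custom scraper for Plotly figures."""
    image_path_iterator = block_vars['image_path_iterator']
    image_names = []
    
    # The example's variables live in block_vars['example_globals'].
    # A figure that stays in the namespace is saved again after later blocks.
    for var_value in block_vars['example_globals'].values():
        if isinstance(var_value, Figure):
            img_fname = next(image_path_iterator)
            
            # Save as static image (needs a Plotly image-export backend such as kaleido)
            pio.write_image(var_value, img_fname)
            image_names.append(img_fname)
    
    # Scrapers return RST that embeds the images, not a list of filenames
    return figure_rst(image_names, gallery_conf['src_dir'])

# Configuration
sphinx_gallery_conf = {
    'image_scrapers': [matplotlib_scraper, plotly_scraper],
}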

Scraper Requirements

Custom scrapers must:

  1. Accept (block, block_vars, gallery_conf) parameters
  2. Return an RST string that embeds the saved images (typically built with figure_rst), or an empty string if nothing was saved
  3. Handle cleanup of any temporary resources
  4. Use block_vars['image_path_iterator'] to obtain filenames
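
A skeleton that follows these requirements might look like the sketch below. Several parts are assumptions made so the snippet runs with the standard library alone: `BytesScraper`, the `pending_images` variable convention, `demo_rst_builder`, and `numbered_paths` are all hypothetical. In real use, `rst_builder` would be sphinx_gallery.scrapers.figure_rst and the path iterator would be the one Sphinx-Gallery supplies.

```python
import os
import tempfile


class BytesScraper:
    """Hypothetical scraper skeleton; `rst_builder` stands in for figure_rst."""

    def __init__(self, rst_builder):
        self.rst_builder = rst_builder

    def __repr__(self):
        return "BytesScraper"

    def __call__(self, block, block_vars, gallery_conf):
        image_path_iterator = block_vars["image_path_iterator"]
        # Assumed convention: the example leaves raw image bytes in a
        # variable named `pending_images`. Popping it is the cleanup step,
        # so the same data is not saved again after the next block.
        pending = block_vars["example_globals"].pop("pending_images", [])
        saved = []
        for payload in pending:
            path = next(image_path_iterator)  # filenames come from the iterator
            with open(path, "wb") as fh:
                fh.write(payload)
            saved.append(path)
        # Return RST, not the list of filenames.
        return self.rst_builder(saved, gallery_conf["src_dir"])


def demo_rst_builder(paths, sources_dir):
    """Simplified stand-in for figure_rst."""
    return "".join(
        f".. image:: /{os.path.relpath(p, sources_dir)}\n" for p in paths
    )


def numbered_paths(template):
    """Simplified stand-in for ImagePathIterator."""
    n = 0
    while True:
        n += 1
        yield template.format(n)


with tempfile.TemporaryDirectory() as src_dir:
    block_vars = {
        "image_path_iterator": numbered_paths(os.path.join(src_dir, "img_{:03d}.png")),
        "example_globals": {"pending_images": [b"fake-png-bytes"]},
    }
    scraper = BytesScraper(demo_rst_builder)
    rst = scraper(("code", "", 1), block_vars, {"src_dir": src_dir})
    saved_exists = os.path.exists(os.path.join(src_dir, "img_001.png"))
    print(rst)
```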

Multi-Library Support

Configure multiple scrapers for different visualization libraries:

from sphinx_gallery.scrapers import matplotlib_scraper

def mayavi_scraper(block, block_vars, gallery_conf):
    """Scraper for Mayavi 3D visualizations."""
    # Implementation for Mayavi figure detection and saving goes here
    return ''  # placeholder: no images saved, so no RST

# Seaborn draws through matplotlib, so matplotlib_scraper already captures
# its plots and no separate seaborn scraper is needed.

sphinx_gallery_conf = {
    'image_scrapers': [
        matplotlib_scraper,
        mayavi_scraper,
        # A string other than 'matplotlib' is treated as a module name; that
        # module must provide _get_sg_image_scraper() (for example 'pyvista')
    ],
}

Configuration Options

Image Quality and Format

sphinx_gallery_conf = {
    'image_scrapers': ('matplotlib',),
    'compress_images': ('images', 'thumbnails'),  # Compress with optipng (must be installed)
    'thumbnail_size': (200, 200),  # Maximum thumbnail size in pixels
}

Module Management

sphinx_gallery_conf = {
    'reset_modules': ('matplotlib', 'seaborn'),   # Built-in reset functions, run between examples
    'capture_repr': ('_repr_html_', '__repr__'),  # Methods tried, in order, to capture a block's last expression
}
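
Entries in 'reset_modules' can also be callables that take (gallery_conf, fname). The sketch below shows a hypothetical reset function, `reset_random_seed`, that reseeds Python's random module so every example starts from the same state. The two calls at the bottom only simulate what happens between examples; the filenames are made up.

```python
import random


def reset_random_seed(gallery_conf, fname):
    """Hypothetical reset function: reseed the random module for each example."""
    random.seed(0)


# Simulate two example runs with a reset before each one.
reset_random_seed({}, "plot_first.py")
first = random.random()
reset_random_seed({}, "plot_second.py")
second = random.random()
print(first == second)
```

In conf.py such a function would be listed alongside the built-in names, for example 'reset_modules': ('matplotlib', reset_random_seed).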

Advanced Usage

Error Handling in Scrapers

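import logging
from sphinx_gallery.scrapers import figure_rst

logger = logging.getLogger(__name__)

def robust_scraper(block, block_vars, gallery_conf):
    """Example of robust error handling in scrapers."""
    image_names = []
    
    try:
        # Scraper logic here: save images and append their paths to image_names
        pass
    except Exception as e:
        # Log error but don't break build
        logger.warning("Scraper error: %s", e)
    
    # An empty list yields an empty RST string
    return figure_rst(image_names, gallery_conf['src_dir'])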
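
The same error handling can be factored into a decorator so that any scraper gets it without repeating the try/except. This is a sketch under assumptions: `tolerate_errors` and the two demo scrapers are hypothetical names, and returning an empty string on failure is one possible policy (it silently drops that block's images from the page).

```python
import functools
import logging

logger = logging.getLogger("scraper_sketch")


def tolerate_errors(scraper):
    """Hypothetical decorator: log a scraper's exception and return empty RST."""

    @functools.wraps(scraper)
    def wrapper(block, block_vars, gallery_conf, **kwargs):
        try:
            return scraper(block, block_vars, gallery_conf, **kwargs)
        except Exception:
            # exc_info=True attaches the traceback to the log record.
            logger.warning("Scraper %s failed", scraper.__name__, exc_info=True)
            return ""

    return wrapper


@tolerate_errors
def failing_scraper(block, block_vars, gallery_conf):
    raise RuntimeError("renderer not available")


@tolerate_errors
def working_scraper(block, block_vars, gallery_conf):
    return ".. image:: /images/ok_001.png\n"


failed = failing_scraper(("code", "", 1), {}, {})
worked = working_scraper(("code", "", 1), {}, {})
print(repr(failed), repr(worked))
```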

Conditional Scraping

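import sys
from sphinx_gallery.scrapers import figure_rst

def conditional_scraper(block, block_vars, gallery_conf):
    """Scraper that only runs under certain conditions."""
    
    # Only run if the (hypothetical) library my_viz_lib has been imported
    if 'my_viz_lib' not in sys.modules:
        return ''
    
    saved_images = []
    # Scraping logic here: save images and append their paths to saved_images
    return figure_rst(saved_images, gallery_conf['src_dir'])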

Integration with Sphinx Events

Scrapers integrate with Sphinx's build process:

  1. Code Execution: Example code runs in isolated namespace
  2. Scraper Execution: All configured scrapers run after each code block
  3. Image Processing: Images are processed, resized, and optimized
  4. RST Generation: Image directives are added to generated RST
  5. HTML Generation: Final HTML includes responsive images
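
Steps 1, 2, and 4 can be sketched in a few lines of plain Python. This is an illustration of the flow and not Sphinx-Gallery's code: `run_example` and `count_scraper` are hypothetical, image processing and HTML generation (steps 3 and 5) are left out, and the stand-in scraper emits a note instead of saving an image.

```python
def run_example(code_blocks, scrapers, gallery_conf):
    """Execute code blocks in one namespace and scrape after each (illustrative)."""
    example_globals = {"__name__": "__main__"}
    block_vars = {"example_globals": example_globals}
    rst_chunks = []
    for lineno, source in code_blocks:
        # Step 1: run the block in the example's own namespace.
        exec(compile(source, "<example>", "exec"), example_globals)
        block = ("code", source, lineno)
        # Step 2: every configured scraper runs after every block.
        for scraper in scrapers:
            rst_chunks.append(scraper(block, block_vars, gallery_conf))
    # Step 4: the collected RST is what ends up in the generated page.
    return "".join(rst_chunks)


def count_scraper(block, block_vars, gallery_conf):
    """Stand-in scraper: reports the value of `total` instead of saving an image."""
    total = block_vars["example_globals"].get("total")
    return f".. note:: total={total}\n"


blocks = [(1, "total = 1 + 2"), (5, "total *= 10")]
page_rst = run_example(blocks, [count_scraper], {})
print(page_rst)
```

Because the namespace is shared across blocks, the second block sees the value the first one assigned, which is also why a scraper should avoid re-saving objects it has already captured.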

Troubleshooting

Common Issues

  • Missing Images: Ensure figures are created before scraper runs
  • Memory Issues: Use reset_modules to clean up between examples
  • Format Issues: Check that scraper saves in supported formats (PNG, JPG)
  • Path Issues: Use provided image_path_iterator for consistent naming
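
When images go missing, a quick check is to confirm that every path a scraper obtained from image_path_iterator was actually written. The helper below is a hypothetical debugging aid, not part of Sphinx-Gallery. It reports paths that do not exist or point to empty files, and the temporary directory only simulates an image folder.

```python
import os
import tempfile


def missing_images(image_paths):
    """Return the paths that do not exist or are empty files."""
    return [
        p for p in image_paths
        if not os.path.isfile(p) or os.path.getsize(p) == 0
    ]


with tempfile.TemporaryDirectory() as tmp:
    written = os.path.join(tmp, "sphx_glr_example_001.png")
    with open(written, "wb") as fh:
        fh.write(b"\x89PNG")  # placeholder bytes, not a valid image
    never_written = os.path.join(tmp, "sphx_glr_example_002.png")

    problems = missing_images([written, never_written])
    only_missing_reported = problems == [never_written]
    print(only_missing_reported)
```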

Debugging Scrapers

sphinx_gallery_conf = {
    'only_warn_on_example_error': True,  # Report example errors as warnings and continue
}
# For more verbose build output, run sphinx-build with -v
