tessl/pypi-nbsphinx

Jupyter Notebook Tools for Sphinx - a Sphinx extension that provides a source parser for .ipynb files with custom directives


Text Processing

Utilities for converting notebook content between markup formats, chiefly Markdown to reStructuredText (reST) via pandoc, together with HTML parsers and helper functions that turn notebook markup into input Sphinx can process.

Capabilities

Markdown to RST Conversion

Core function for converting Markdown text to reStructuredText with LaTeX math support and custom filters.

def markdown2rst(text):
    """
    Convert a Markdown string to reST via pandoc.
    
    This is very similar to nbconvert.filters.markdown.markdown2rst(),
    except that it uses a pandoc filter to convert raw LaTeX blocks to
    "math" directives (instead of "raw:: latex" directives).
    
    Parameters:
    - text: str, Markdown text to convert
    
    Returns:
    str: Converted reStructuredText with proper math directive formatting,
         image definitions, and citation processing
    """

Usage example:

from nbsphinx import markdown2rst

# Convert Markdown with math to RST (the conversion runs through pandoc, so pandoc must be installed)
markdown_text = """
# My Title

This is some text with inline math $x = y + z$ and display math:

$$
\\int_0^\\infty e^{-x} dx = 1
$$

![Image](image.png)
"""

rst_text = markdown2rst(markdown_text)
print(rst_text)
# Output includes proper RST math directives and image handling

Pandoc Wrapper

Direct interface to pandoc for format conversion with optional filter functions.

def pandoc(source, fmt, to, filter_func=None):
    """
    Convert a string in format `fmt` to format `to` via pandoc.
    
    This is based on nbconvert.utils.pandoc.pandoc() and extended to
    allow passing a filter function.
    
    Parameters:
    - source: str, source text to convert
    - fmt: str, input format ('markdown', 'html', etc.)
    - to: str, output format ('rst', 'latex', etc.) 
    - filter_func: callable, optional filter function for JSON processing
    
    Returns:
    str: Converted text in target format
    """

Usage example:

from nbsphinx import pandoc

# Basic conversion
html_text = "<p>Hello <strong>world</strong></p>"
rst_text = pandoc(html_text, 'html', 'rst')

# With custom filter
def my_filter(json_text):
    # json_text is the pandoc JSON AST serialized as a string;
    # return a (possibly modified) JSON string
    return json_text

rst_text = pandoc(html_text, 'html', 'rst', filter_func=my_filter)
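
Because `filter_func` receives the pandoc JSON AST as text and returns text, a filter can be written and exercised without running pandoc. The sketch below rewrites raw LaTeX blocks into display math, in the spirit of the filter that `markdown2rst` is described as using. The name `latex_blocks_to_math` is hypothetical, and the node shapes (`RawBlock`, `Para`, `Math`, `DisplayMath`) are an assumption based on the pandoc JSON AST of recent pandoc versions; it is not nbsphinx's own filter.

```python
import json


def latex_blocks_to_math(json_text):
    """Hypothetical filter: turn raw LaTeX blocks into display math."""

    def object_hook(obj):
        # json.loads calls this hook for every decoded JSON object.
        if obj.get('t') == 'RawBlock' and obj['c'][0] == 'latex':
            # Assumed AST shape: a paragraph holding one display-math node.
            return {
                't': 'Para',
                'c': [{'t': 'Math', 'c': [{'t': 'DisplayMath'}, obj['c'][1]]}],
            }
        return obj

    return json.dumps(json.loads(json_text, object_hook=object_hook))


# Hand-written AST fragment standing in for pandoc's JSON output.
ast = {
    'pandoc-api-version': [1, 22],
    'meta': {},
    'blocks': [
        {'t': 'RawBlock', 'c': ['latex', r'\begin{equation}x\end{equation}']},
        {'t': 'Para', 'c': [{'t': 'Str', 'c': 'text'}]},
    ],
}
filtered = json.loads(latex_blocks_to_math(json.dumps(ast)))
print(filtered['blocks'][0]['t'])
```

A function like this could then be passed as `filter_func=latex_blocks_to_math` so that it is applied to the intermediate JSON during conversion.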

Legacy Compatibility

Compatibility wrapper for nbconvert 5.0 and later, whose RST template calls convert_pandoc instead of markdown2rst.

def convert_pandoc(text, from_format, to_format):
    """
    Simple wrapper for markdown2rst.
    
    In nbconvert version 5.0, the use of markdown2rst in the RST
    template was replaced by the new filter function convert_pandoc.
    
    Parameters:
    - text: str, text to convert
    - from_format: str, input format (must be 'markdown')
    - to_format: str, output format (must be 'rst')
    
    Returns:
    str: Converted reStructuredText
    
    Raises:
    ValueError: If formats other than markdown->rst are requested
    """

HTML Parsing

Specialized HTML parsers for handling citations and images in notebook content.

class CitationParser(html.parser.HTMLParser):
    """
    HTML parser for citation elements.
    
    Processes HTML elements with citation data attributes
    and converts them to Sphinx citation references.
    
    Methods:
    - handle_starttag(tag, attrs): Process opening tags
    - handle_endtag(tag): Process closing tags  
    - handle_startendtag(tag, attrs): Process self-closing tags
    - reset(): Reset parser state
    
    Attributes:
    - starttag: str, current opening tag
    - endtag: str, current closing tag
    - cite: str, formatted citation reference
    """

class ImgParser(html.parser.HTMLParser):
    """
    Turn HTML <img> tags into raw RST blocks.
    
    Converts HTML image elements to reStructuredText image directives
    with proper attribute handling and data URI support.
    
    Methods:
    - handle_starttag(tag, attrs): Process opening img tags
    - handle_startendtag(tag, attrs): Process self-closing img tags
    - reset(): Reset parser state
    
    Attributes:
    - obj: dict, pandoc AST object for the image
    - definition: str, RST image directive definition
    """

Usage example:

from nbsphinx import CitationParser, ImgParser

# Parse citations
citation_html = '<span data-cite="author2023">Citation text</span>'
parser = CitationParser()
parser.feed(citation_html)
print(parser.cite)  # :cite:`author2023`

# Parse images  
img_html = '<img src="plot.png" alt="My Plot" width="500">'
img_parser = ImgParser()
img_parser.feed(img_html)
print(img_parser.definition)  # RST image directive
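
To show the mechanism without depending on nbsphinx, here is a minimal stand-alone parser built on the standard library's `html.parser.HTMLParser`. The class name `DataCiteParser` is hypothetical; it only mirrors the documented idea of reading a `data-cite` attribute and formatting a `:cite:` role, and it omits the `starttag`/`endtag` bookkeeping listed above.

```python
import html.parser


class DataCiteParser(html.parser.HTMLParser):
    """Hypothetical stand-in showing the data-cite lookup."""

    def reset(self):
        # HTMLParser.__init__ calls reset(), so state is initialised here.
        super().reset()
        self.cite = ''

    def handle_starttag(self, tag, attrs):
        self._check_cite(attrs)

    def handle_startendtag(self, tag, attrs):
        self._check_cite(attrs)

    def _check_cite(self, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        for name, value in attrs:
            if name == 'data-cite' and value:
                self.cite = ':cite:`' + value + '`'


parser = DataCiteParser()
parser.feed('<span data-cite="author2023">Citation text</span>')
print(parser.cite)
```

Initialising state in `reset()` rather than `__init__()` lets one parser instance be reused across several HTML fragments by calling `reset()` between them.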

Utility Functions

Helper functions for text processing and content extraction.

def _extract_gallery_or_toctree(cell):
    """
    Extract links from Markdown cell and create gallery/toctree.
    
    Parameters:
    - cell: NotebookNode, notebook cell with gallery metadata
    
    Returns:
    str: RST directive for gallery or toctree
    """

def _get_empty_lines(text):
    """
    Get number of empty lines before and after code.
    
    Parameters:
    - text: str, code text to analyze
    
    Returns:
    tuple: (before, after) - number of empty lines
    """

def _get_output_type(output):
    """
    Choose appropriate output data types for HTML and LaTeX.
    
    Parameters:
    - output: NotebookNode, a single output of a notebook code cell
    
    Returns:
    tuple: (html_datatype, latex_datatype) - appropriate MIME types
    """

def _local_file_from_reference(node, document):
    """
    Get local file path from document reference node.
    
    Parameters:
    - node: docutils node with reference
    - document: docutils document containing the node
    
    Returns:
    str: Local file path or None if not a local file reference
    """

Format Constants

Pre-defined MIME type priorities for different output formats.

# Display data priority for HTML output
DISPLAY_DATA_PRIORITY_HTML = (
    'application/vnd.jupyter.widget-state+json',
    'application/vnd.jupyter.widget-view+json', 
    'application/javascript',
    'text/html',
    'text/markdown',
    'image/svg+xml',
    'text/latex',
    'image/png',
    'image/jpeg',
    'text/plain',
)

# Display data priority for LaTeX output  
DISPLAY_DATA_PRIORITY_LATEX = (
    'text/latex',
    'application/pdf',
    'image/png', 
    'image/jpeg',
    'image/svg+xml',
    'text/markdown',
    'text/plain',
)

# Thumbnail MIME type mappings
THUMBNAIL_MIME_TYPES = {
    'image/svg+xml': '.svg',
    'image/png': '.png', 
    'image/jpeg': '.jpg',
}

These constants control how different types of notebook output are prioritized and processed for display in HTML and LaTeX formats.
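
One way to read a priority tuple is "take the first MIME type that the output actually provides". The sketch below applies that rule to both tuples; `first_available` and the sample `output_data` are hypothetical, the tuples are copied from the listing above so the example is self-contained, and the fallback of returning `None` is an assumption.

```python
DISPLAY_DATA_PRIORITY_HTML = (
    'application/vnd.jupyter.widget-state+json',
    'application/vnd.jupyter.widget-view+json',
    'application/javascript',
    'text/html',
    'text/markdown',
    'image/svg+xml',
    'text/latex',
    'image/png',
    'image/jpeg',
    'text/plain',
)

DISPLAY_DATA_PRIORITY_LATEX = (
    'text/latex',
    'application/pdf',
    'image/png',
    'image/jpeg',
    'image/svg+xml',
    'text/markdown',
    'text/plain',
)


def first_available(data, priority):
    """Return the highest-priority MIME type present in `data`, else None."""
    for mime_type in priority:
        if mime_type in data:
            return mime_type
    return None


# Hypothetical output bundle offering three representations of one figure.
output_data = {
    'text/plain': '<Figure>',
    'image/png': 'base64-data-placeholder',
    'image/svg+xml': '<svg/>',
}

print(first_available(output_data, DISPLAY_DATA_PRIORITY_HTML))
print(first_available(output_data, DISPLAY_DATA_PRIORITY_LATEX))
```

With the same output, the HTML tuple prefers SVG while the LaTeX tuple prefers PNG, which is why separate priority lists are kept per output format.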
