Jupyter Notebook Tools for Sphinx - a Sphinx extension that provides a source parser for .ipynb files with custom directives
Utilities for converting between formats, handling Markdown/RST conversion, and processing notebook content. These functions provide the text transformation capabilities needed for converting notebook markup to Sphinx-compatible formats.
Core function for converting Markdown text to reStructuredText with LaTeX math support and custom filters.
def markdown2rst(text):
    """
    Convert a Markdown string to reST via pandoc.
    This is very similar to nbconvert.filters.markdown.markdown2rst(),
    except that it uses a pandoc filter to convert raw LaTeX blocks to
    "math" directives (instead of "raw:: latex" directives).
    Parameters:
    - text: str, Markdown text to convert
    Returns:
    str: Converted reStructuredText with proper math directive formatting,
         image definitions, and citation processing
    """

Usage example:
from nbsphinx import markdown2rst
# Convert Markdown with math to RST
markdown_text = """
# My Title
This is some text with inline math $x = y + z$ and display math:
$$
\\int_0^\\infty e^{-x} dx = 1
$$

"""
rst_text = markdown2rst(markdown_text)
print(rst_text)
# Output includes proper RST math directives and image handling

Direct interface to pandoc for format conversion with optional filter functions.
def pandoc(source, fmt, to, filter_func=None):
    """
    Convert a string in format `from` to format `to` via pandoc.
    This is based on nbconvert.utils.pandoc.pandoc() and extended to
    allow passing a filter function.
    Parameters:
    - source: str, source text to convert
    - fmt: str, input format ('markdown', 'html', etc.)
    - to: str, output format ('rst', 'latex', etc.)
    - filter_func: callable, optional filter function for JSON processing
    Returns:
    str: Converted text in target format
    """

Usage example:
from nbsphinx import pandoc
# Basic conversion
html_text = "<p>Hello <strong>world</strong></p>"
rst_text = pandoc(html_text, 'html', 'rst')
# With custom filter
def my_filter(json_text):
    # Custom processing of pandoc JSON AST
    return json_text

rst_text = pandoc(html_text, 'html', 'rst', filter_func=my_filter)

Compatibility wrapper for the convert_pandoc filter that nbconvert 5.0 introduced in its RST template.
def convert_pandoc(text, from_format, to_format):
    """
    Simple wrapper for markdown2rst.
    In nbconvert version 5.0, the use of markdown2rst in the RST
    template was replaced by the new filter function convert_pandoc.
    Parameters:
    - text: str, text to convert
    - from_format: str, input format (must be 'markdown')
    - to_format: str, output format (must be 'rst')
    Returns:
    str: Converted reStructuredText
    Raises:
    ValueError: If formats other than markdown->rst are requested
    """

Specialized HTML parsers for handling citations and images in notebook content.
class CitationParser(html.parser.HTMLParser):
    """
    HTML parser for citation elements.
    Processes HTML elements with citation data attributes
    and converts them to Sphinx citation references.
    Methods:
    - handle_starttag(tag, attrs): Process opening tags
    - handle_endtag(tag): Process closing tags
    - handle_startendtag(tag, attrs): Process self-closing tags
    - reset(): Reset parser state
    Attributes:
    - starttag: str, current opening tag
    - endtag: str, current closing tag
    - cite: str, formatted citation reference
    """
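The attribute-based approach described for CitationParser can be sketched with the standard library alone. The class below, `DataCiteSketch`, is a hypothetical, simplified stand-in written for illustration; it is not nbsphinx's implementation and only assumes that a `data-cite` attribute maps to a `:cite:` role.

```python
import html.parser


class DataCiteSketch(html.parser.HTMLParser):
    """Hypothetical, simplified parser; not nbsphinx's own class."""

    def reset(self):
        # HTMLParser.__init__() calls reset(), so state is set up here
        super().reset()
        self.cite = ''

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        for name, value in attrs:
            if name == 'data-cite':
                self.cite = ':cite:`' + value + '`'


sketch = DataCiteSketch()
sketch.feed('<span data-cite="author2023">Citation text</span>')
print(sketch.cite)
```

Keeping the state in `reset()` means the same parser object can be reused for several HTML fragments by calling `reset()` between them.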

class ImgParser(html.parser.HTMLParser):
    """
    Turn HTML <img> tags into raw RST blocks.
    Converts HTML image elements to reStructuredText image directives
    with proper attribute handling and data URI support.
    Methods:
    - handle_starttag(tag, attrs): Process opening img tags
    - handle_startendtag(tag, attrs): Process self-closing img tags
    - reset(): Reset parser state
    Attributes:
    - obj: dict, pandoc AST object for the image
    - definition: str, RST image directive definition
    """

Usage example:
from nbsphinx import CitationParser, ImgParser
# Parse citations
citation_html = '<span data-cite="author2023">Citation text</span>'
parser = CitationParser()
parser.feed(citation_html)
print(parser.cite) # :cite:`author2023`
# Parse images
img_html = '<img src="plot.png" alt="My Plot" width="500">'
img_parser = ImgParser()
img_parser.feed(img_html)
print(img_parser.definition)  # RST image directive

Helper functions for text processing and content extraction.
def _extract_gallery_or_toctree(cell):
    """
    Extract links from Markdown cell and create gallery/toctree.
    Parameters:
    - cell: NotebookNode, notebook cell with gallery metadata
    Returns:
    str: RST directive for gallery or toctree
    """

def _get_empty_lines(text):
    """
    Get number of empty lines before and after code.
    Parameters:
    - text: str, code text to analyze
    Returns:
    tuple: (before, after) - number of empty lines
    """
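As a rough illustration of the (before, after) result described above, the following sketch counts leading and trailing newline characters. The name `count_empty_lines` is hypothetical, and the logic is an assumption about what "empty lines" means here (lines containing only whitespace are not treated as empty); it is not taken from nbsphinx.

```python
def count_empty_lines(text):
    """Return (before, after): newlines at the start and end of text."""
    # Leading newlines are whatever lstrip('\n') removes
    before = len(text) - len(text.lstrip('\n'))
    # Trailing newlines are whatever rstrip('\n') removes
    after = len(text) - len(text.rstrip('\n'))
    return before, after


code = '\n\nx = 1\ny = 2\n'
print(count_empty_lines(code))
```

For a string that consists only of newlines, both counts cover the whole string, so a caller that cares about that case would need to handle it separately.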

def _get_output_type(output):
    """
    Choose appropriate output data types for HTML and LaTeX.
    Parameters:
    - output: NotebookNode, notebook output cell
    Returns:
    tuple: (html_datatype, latex_datatype) - appropriate MIME types
    """
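One plausible way to arrive at such a pair is to walk a priority list per target format and take the first MIME type present in the output. The sketch below assumes exactly that; `pick_output_types`, `first_available`, and the shortened priority tuples are hypothetical names and values for illustration, and a plain dict stands in for a NotebookNode.

```python
# Shortened, illustrative priority lists (assumed, not nbsphinx's constants)
HTML_PRIORITY = ('text/html', 'image/svg+xml', 'image/png', 'text/plain')
LATEX_PRIORITY = ('text/latex', 'image/png', 'image/svg+xml', 'text/plain')


def first_available(data, priority):
    """Return the first MIME type from priority that is a key in data."""
    for mime_type in priority:
        if mime_type in data:
            return mime_type
    return None  # nothing suitable found


def pick_output_types(output):
    """Return (html_datatype, latex_datatype) for a dict-like output."""
    data = output.get('data', {})
    return (first_available(data, HTML_PRIORITY),
            first_available(data, LATEX_PRIORITY))


output = {
    'output_type': 'execute_result',
    'data': {
        'text/plain': 'x',
        'text/html': '<b>x</b>',
        'text/latex': '$x$',
    },
}
print(pick_output_types(output))
```

The same output can therefore be rendered from different representations depending on the builder, which is why two values are returned instead of one.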

def _local_file_from_reference(node, document):
    """
    Get local file path from document reference node.
    Parameters:
    - node: docutils node with reference
    - document: docutils document containing the node
    Returns:
    str: Local file path or None if not a local file reference
    """

Pre-defined MIME type priorities for different output formats.
# Display data priority for HTML output
DISPLAY_DATA_PRIORITY_HTML = (
    'application/vnd.jupyter.widget-state+json',
    'application/vnd.jupyter.widget-view+json',
    'application/javascript',
    'text/html',
    'text/markdown',
    'image/svg+xml',
    'text/latex',
    'image/png',
    'image/jpeg',
    'text/plain',
)

# Display data priority for LaTeX output
DISPLAY_DATA_PRIORITY_LATEX = (
    'text/latex',
    'application/pdf',
    'image/png',
    'image/jpeg',
    'image/svg+xml',
    'text/markdown',
    'text/plain',
)

# Thumbnail MIME type mappings
THUMBNAIL_MIME_TYPES = {
    'image/svg+xml': '.svg',
    'image/png': '.png',
    'image/jpeg': '.jpg',
}

These constants control how different types of notebook output are prioritized and processed for display in HTML and LaTeX formats.
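To make the role of the thumbnail mapping more concrete, here is a minimal sketch of how a file suffix might be chosen for an output that offers several image formats. The mapping is copied from the listing above; the helper `thumbnail_suffix` is hypothetical, and the assumption that the dictionary order doubles as a preference order is mine, not something the listing states.

```python
# Copied from the constant listed above
THUMBNAIL_MIME_TYPES = {
    'image/svg+xml': '.svg',
    'image/png': '.png',
    'image/jpeg': '.jpg',
}


def thumbnail_suffix(data):
    """Return the suffix for the first supported MIME type found in data.

    Hypothetical helper: relies on dict insertion order (Python 3.7+)
    as an assumed preference order.
    """
    for mime_type, suffix in THUMBNAIL_MIME_TYPES.items():
        if mime_type in data:
            return suffix
    return None  # no image representation usable as a thumbnail


data = {
    'text/plain': '<Figure>',
    'image/jpeg': '<base64 data>',
    'image/png': '<base64 data>',
}
print(thumbnail_suffix(data))
```

An output with only `text/plain` yields `None`, which a caller could treat as "no thumbnail available".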