Tessl Tile for pypi/nbsphinx@0.9.0

# Text Processing

Utilities for converting between formats, handling Markdown/RST conversion, and processing notebook content. These functions provide the text transformation capabilities needed for converting notebook markup to Sphinx-compatible formats.

## Capabilities

### Markdown to RST Conversion

Core function for converting Markdown text to reStructuredText with LaTeX math support and custom filters.

```python { .api }
def markdown2rst(text):
    """
    Convert a Markdown string to reST via pandoc.

    This is very similar to nbconvert.filters.markdown.markdown2rst(),
    except that it uses a pandoc filter to convert raw LaTeX blocks to
    "math" directives (instead of "raw:: latex" directives).

    Parameters:
    - text: str, Markdown text to convert

    Returns:
    str: Converted reStructuredText with proper math directive formatting,
         image definitions, and citation processing
    """
```

Usage example:

```python
from nbsphinx import markdown2rst

# Convert Markdown with math to RST
markdown_text = """
# My Title

This is some text with inline math $x = y + z$ and display math:

$$
\\int_0^\\infty e^{-x} dx = 1
$$

![Image](image.png)
"""

rst_text = markdown2rst(markdown_text)
print(rst_text)
# Output includes proper RST math directives and image handling
```

### Pandoc Wrapper

Direct interface to pandoc for format conversion with optional filter functions.

```python { .api }
def pandoc(source, fmt, to, filter_func=None):
    """
    Convert a string in format `fmt` to format `to` via pandoc.

    This is based on nbconvert.utils.pandoc.pandoc() and extended to
    allow passing a filter function.

    Parameters:
    - source: str, source text to convert
    - fmt: str, input format ('markdown', 'html', etc.)
    - to: str, output format ('rst', 'latex', etc.)
    - filter_func: callable, optional filter function for JSON processing

    Returns:
    str: Converted text in target format
    """
```

Usage example:

```python
from nbsphinx import pandoc

# Basic conversion
html_text = "<p>Hello <strong>world</strong></p>"
rst_text = pandoc(html_text, 'html', 'rst')

# With custom filter
def my_filter(json_text):
    # Custom processing of pandoc JSON AST
    return json_text

rst_text = pandoc(html_text, 'html', 'rst', filter_func=my_filter)
```
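
As a rough illustration of what a `filter_func` might do, the sketch below assumes the callable receives pandoc's JSON AST as a string and returns a JSON string, as in the example above. The function name `uppercase_str_nodes` and the hand-written AST are hypothetical. The sample only mimics the general shape of pandoc's JSON output (a `blocks` list of nodes with `t` and `c` keys), and nothing here calls pandoc itself.

```python
import json


def uppercase_str_nodes(json_text):
    """Hypothetical filter: upper-case every Str node in a pandoc JSON AST."""
    def walk(node):
        if isinstance(node, dict):
            if node.get('t') == 'Str' and isinstance(node.get('c'), str):
                return {'t': 'Str', 'c': node['c'].upper()}
            return {key: walk(value) for key, value in node.items()}
        if isinstance(node, list):
            return [walk(item) for item in node]
        return node  # plain strings, numbers, None

    return json.dumps(walk(json.loads(json_text)))


# Hand-written stand-in for pandoc's JSON output (assumed shape, not real output)
sample_ast = json.dumps({
    'pandoc-api-version': [1, 23],
    'meta': {},
    'blocks': [
        {'t': 'Para', 'c': [
            {'t': 'Str', 'c': 'Hello'},
            {'t': 'Space'},
            {'t': 'Str', 'c': 'world'},
        ]},
    ],
})

filtered = json.loads(uppercase_str_nodes(sample_ast))
print(filtered['blocks'][0]['c'][0]['c'])
```

A function of this shape could presumably be passed as `filter_func=uppercase_str_nodes` to `pandoc()`; keeping the filter a pure string-to-string function makes it easy to test without pandoc installed.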

### Legacy Compatibility

Compatibility wrapper that lets the `convert_pandoc` filter, which nbconvert's RST template has used since version 5.0, delegate to `markdown2rst`.

```python { .api }
def convert_pandoc(text, from_format, to_format):
    """
    Simple wrapper for markdown2rst.

    In nbconvert version 5.0, the use of markdown2rst in the RST
    template was replaced by the new filter function convert_pandoc.

    Parameters:
    - text: str, text to convert
    - from_format: str, input format (must be 'markdown')
    - to_format: str, output format (must be 'rst')

    Returns:
    str: Converted reStructuredText

    Raises:
    ValueError: If formats other than markdown->rst are requested
    """
```

### HTML Parsing

Specialized HTML parsers for handling citations and images in notebook content.

```python { .api }
class CitationParser(html.parser.HTMLParser):
    """
    HTML parser for citation elements.

    Processes HTML elements with citation data attributes
    and converts them to Sphinx citation references.

    Methods:
    - handle_starttag(tag, attrs): Process opening tags
    - handle_endtag(tag): Process closing tags
    - handle_startendtag(tag, attrs): Process self-closing tags
    - reset(): Reset parser state

    Attributes:
    - starttag: str, current opening tag
    - endtag: str, current closing tag
    - cite: str, formatted citation reference
    """

class ImgParser(html.parser.HTMLParser):
    """
    Turn HTML <img> tags into raw RST blocks.

    Converts HTML image elements to reStructuredText image directives
    with proper attribute handling and data URI support.

    Methods:
    - handle_starttag(tag, attrs): Process opening img tags
    - handle_startendtag(tag, attrs): Process self-closing img tags
    - reset(): Reset parser state

    Attributes:
    - obj: dict, pandoc AST object for the image
    - definition: str, RST image directive definition
    """
```

Usage example:

```python
from nbsphinx import CitationParser, ImgParser

# Parse citations
citation_html = '<span data-cite="author2023">Citation text</span>'
parser = CitationParser()
parser.feed(citation_html)
print(parser.cite)  # :cite:`author2023`

# Parse images
img_html = '<img src="plot.png" alt="My Plot" width="500">'
img_parser = ImgParser()
img_parser.feed(img_html)
print(img_parser.definition)  # RST image directive
```
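
To show the general pattern these parsers build on, here is a minimal stand-alone sketch that uses only the standard library's `html.parser.HTMLParser`. The class `DataCiteCollector` is hypothetical and is not nbsphinx's implementation. It simply assumes that citations are marked with a `data-cite` attribute, as in the example above, and formats each key as a `:cite:` role.

```python
import html.parser


class DataCiteCollector(html.parser.HTMLParser):
    """Hypothetical parser: collect data-cite keys as :cite: roles."""

    def reset(self):
        # HTMLParser.__init__() calls reset(), so state is initialized here
        super().reset()
        self.cites = []

    def handle_starttag(self, tag, attrs):
        self._collect(attrs)

    def handle_startendtag(self, tag, attrs):
        self._collect(attrs)

    def _collect(self, attrs):
        # attrs is a list of (name, value) pairs
        for name, value in attrs:
            if name == 'data-cite' and value:
                self.cites.append(':cite:`' + value + '`')


collector = DataCiteCollector()
collector.feed('<p>See <cite data-cite="author2023">Author</cite> and '
               '<span data-cite="other2020"/>.</p>')
collector.close()
print(collector.cites)
```

Initializing state in `reset()` rather than `__init__()` means the same parser instance can be reused for several fragments by calling `reset()` between them.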

### Utility Functions

Helper functions for text processing and content extraction.

```python { .api }
def _extract_gallery_or_toctree(cell):
    """
    Extract links from Markdown cell and create gallery/toctree.

    Parameters:
    - cell: NotebookNode, notebook cell with gallery metadata

    Returns:
    str: RST directive for gallery or toctree
    """

def _get_empty_lines(text):
    """
    Get number of empty lines before and after code.

    Parameters:
    - text: str, code text to analyze

    Returns:
    tuple: (before, after) - number of empty lines
    """

def _get_output_type(output):
    """
    Choose appropriate output data types for HTML and LaTeX.

    Parameters:
    - output: NotebookNode, notebook output cell

    Returns:
    tuple: (html_datatype, latex_datatype) - appropriate MIME types
    """

def _local_file_from_reference(node, document):
    """
    Get local file path from document reference node.

    Parameters:
    - node: docutils node with reference
    - document: docutils document containing the node

    Returns:
    str: Local file path or None if not a local file reference
    """
```
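
As an illustration of the `(before, after)` return value described for `_get_empty_lines`, the following is a stand-alone sketch rather than nbsphinx's own code. The function name `count_empty_lines` is hypothetical, and it assumes that "empty lines" means leading and trailing newline characters around the code.

```python
def count_empty_lines(text):
    """Hypothetical helper: count newlines before and after the code."""
    before = len(text) - len(text.lstrip('\n'))
    after = len(text) - len(text.rstrip('\n'))
    if not text.strip('\n'):
        # Only newlines: count them once, as leading lines
        after = 0
    return before, after


print(count_empty_lines('\n\nx = 1\ny = 2\n'))  # (2, 1)
```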

## Format Constants

Pre-defined MIME type priorities for different output formats.

```python { .api }
# Display data priority for HTML output
DISPLAY_DATA_PRIORITY_HTML = (
    'application/vnd.jupyter.widget-state+json',
    'application/vnd.jupyter.widget-view+json',
    'application/javascript',
    'text/html',
    'text/markdown',
    'image/svg+xml',
    'text/latex',
    'image/png',
    'image/jpeg',
    'text/plain',
)

# Display data priority for LaTeX output
DISPLAY_DATA_PRIORITY_LATEX = (
    'text/latex',
    'application/pdf',
    'image/png',
    'image/jpeg',
    'image/svg+xml',
    'text/markdown',
    'text/plain',
)

# Thumbnail MIME type mappings
THUMBNAIL_MIME_TYPES = {
    'image/svg+xml': '.svg',
    'image/png': '.png',
    'image/jpeg': '.jpg',
}
```

These constants control how different types of notebook output are prioritized and processed for display in HTML and LaTeX formats.
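
One way such a priority tuple can be used is to pick the first MIME type that an output actually provides. The sketch below is an assumption about usage, not nbsphinx's code: `pick_mime_type` is a hypothetical helper, the `output_data` dictionary is made up, and the two priority tuples are shortened copies of the constants above.

```python
# Shortened copies of the priority tuples listed above
HTML_PRIORITY = ('text/html', 'image/svg+xml', 'text/latex',
                 'image/png', 'image/jpeg', 'text/plain')
LATEX_PRIORITY = ('text/latex', 'application/pdf', 'image/png',
                  'image/jpeg', 'image/svg+xml', 'text/plain')


def pick_mime_type(data, priority):
    """Hypothetical helper: first MIME type in `priority` present in `data`."""
    for mime_type in priority:
        if mime_type in data:
            return mime_type
    return None  # nothing displayable for this format


# Made-up output data, keyed by MIME type
output_data = {
    'text/plain': '<Figure size 640x480>',
    'image/png': 'iVBORw0KGgo...',
    'image/svg+xml': '<svg></svg>',
}

print(pick_mime_type(output_data, HTML_PRIORITY))   # image/svg+xml
print(pick_mime_type(output_data, LATEX_PRIORITY))  # image/png
```

With the same output data, the HTML ordering prefers the SVG image while the LaTeX ordering falls back to PNG, which is the kind of per-format difference the two tuples encode.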
