# Block and Inline Parsing

Low-level parsing components that handle the conversion of Markdown text into structured tokens. The parsing system is split into block-level elements (paragraphs, headings, lists) and inline elements (bold, italic, links), with state management for tracking parsing progress and context.

## Capabilities

### Block Parser

Handles block-level Markdown elements like headings, paragraphs, lists, code blocks, and blockquotes.

```python { .api }
class BlockParser(Parser[BlockState]):
    """
    Parser for block-level Markdown elements.

    Handles elements that form document structure: headings, paragraphs,
    lists, code blocks, blockquotes, tables, etc.
    """

    def __init__(self):
        """Initialize block parser with default rules."""

    def parse(self, state: BlockState, rules: Optional[List[str]] = None) -> None:
        """
        Parse state source and populate with block tokens.

        Parameters:
        - state: BlockState to parse and populate with tokens
        - rules: Optional list of rules to use for parsing
        """
```
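
The scanning loop that `parse()` performs can be modeled with the standard library alone. The sketch below is a simplified, hypothetical model, not mistune's source: `RULES`, `parse_blocks`, and the two toy rules are illustrative. It assumes the rules are joined into one alternation of named groups compiled with `re.M`, that `m.lastgroup` (the outermost group, i.e. the rule name) selects the handler, that each handler returns the position to resume from, and that text no rule claims becomes a `paragraph` token whose `text` is left for the inline stage.

```python
import re
from typing import Any, Callable, Dict, List, Tuple

Token = Dict[str, Any]
Handler = Callable[["re.Match[str]", List[Token]], int]


def parse_heading(m: "re.Match[str]", tokens: List[Token]) -> int:
    # ATX heading: the number of '#' characters gives the level
    tokens.append({
        "type": "heading",
        "attrs": {"level": len(m.group("hashes"))},
        "text": m.group("title"),  # left for the inline stage
    })
    return m.end()


def parse_blank(m: "re.Match[str]", tokens: List[Token]) -> int:
    tokens.append({"type": "blank_line"})
    return m.end()


# Hypothetical rule table: name -> (pattern, handler); order is priority
RULES: Dict[str, Tuple[str, Handler]] = {
    "heading": (r"^(?P<hashes>#{1,6})[ \t]+(?P<title>.*)$\n?", parse_heading),
    "blank_line": (r"(?:^[ \t]*\n)+", parse_blank),
}


def parse_blocks(src: str) -> List[Token]:
    # One alternation of named groups; m.lastgroup is the outer (rule) group
    scanner = re.compile(
        "|".join(f"(?P<{name}>{pattern})" for name, (pattern, _) in RULES.items()),
        re.M,
    )
    tokens: List[Token] = []
    cursor = 0
    while cursor < len(src):
        m = scanner.search(src, cursor)
        if m is None:
            break
        if m.start() > cursor:
            # Unclaimed text before the match becomes a paragraph
            tokens.append({"type": "paragraph", "text": src[cursor:m.start()]})
        _, handler = RULES[m.lastgroup]
        cursor = handler(m, tokens)
    if cursor < len(src):
        tokens.append({"type": "paragraph", "text": src[cursor:]})
    return tokens


print([t["type"] for t in parse_blocks("# Title\n\nSome text\n")])
# ['heading', 'blank_line', 'paragraph']
```

Rule order acts as priority: when two alternatives could match at the same position, the one listed first wins, which is why registration order (and the `before` argument of `register()`) matters.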

### Inline Parser

Processes inline Markdown elements within block content like emphasis, links, code spans, and images.

```python { .api }
class InlineParser(Parser[InlineState]):
    """
    Parser for inline-level Markdown elements.

    Handles elements within block content: bold, italic, links,
    images, code spans, line breaks, etc.
    """

    def __init__(self, hard_wrap: bool = False):
        """
        Initialize inline parser.

        Parameters:
        - hard_wrap: Whether to convert line breaks to <br> tags
        """

    def __call__(self, text: str, env: MutableMapping[str, Any]) -> List[Dict[str, Any]]:
        """
        Process text and return inline tokens.

        Parameters:
        - text: Text to process
        - env: Environment mapping for parsing context

        Returns:
        List of inline tokens
        """
```
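
Inline parsing can be modeled the same way, with two differences: unclaimed text becomes `text` leaf tokens, and container rules parse their inner text recursively to build `children`. This is a hypothetical, deliberately small sketch (`INLINE_RULES` and `parse_inline` are illustrative names); the real `InlineParser` handles many more syntax cases than these three toy rules.

```python
import re
from typing import Any, Dict, List

Token = Dict[str, Any]

# Hypothetical rules; earlier alternatives win when several match at one position
INLINE_RULES = {
    "codespan": r"`(?P<code>[^`]+)`",
    "strong": r"\*\*(?P<strong_text>.+?)\*\*",
    "emphasis": r"\*(?P<em_text>[^*]+)\*",
}
SCANNER = re.compile("|".join(f"(?P<{k}>{p})" for k, p in INLINE_RULES.items()))


def parse_inline(src: str) -> List[Token]:
    tokens: List[Token] = []
    pos = 0
    for m in SCANNER.finditer(src):
        if m.start() > pos:
            # Text between matches is a leaf token
            tokens.append({"type": "text", "raw": src[pos:m.start()]})
        if m.lastgroup == "codespan":
            tokens.append({"type": "codespan", "raw": m.group("code")})
        elif m.lastgroup == "strong":
            # Container: parse the inner text recursively into children
            tokens.append({"type": "strong", "children": parse_inline(m.group("strong_text"))})
        else:
            tokens.append({"type": "emphasis", "children": parse_inline(m.group("em_text"))})
        pos = m.end()
    if pos < len(src):
        tokens.append({"type": "text", "raw": src[pos:]})
    return tokens


print(parse_inline("a **b** `c`"))
# [{'type': 'text', 'raw': 'a '}, {'type': 'strong', 'children': [{'type': 'text', 'raw': 'b'}]},
#  {'type': 'text', 'raw': ' '}, {'type': 'codespan', 'raw': 'c'}]
```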

### Block State

State management for block-level parsing including cursor position, token accumulation, and parsing environment.

```python { .api }
class BlockState:
    """
    State management for block-level parsing.

    Tracks parsing progress, accumulated tokens, and contextual information
    during the block parsing process.

    Attributes:
    - src: str - Source text being parsed
    - tokens: List[Dict[str, Any]] - Accumulated parsed tokens
    - cursor: int - Current position in source text
    - cursor_max: int - Maximum position (length of source)
    - list_tight: bool - Whether the current list uses tight formatting
    - parent: Any - Parent parsing context
    - env: MutableMapping[str, Any] - Environment variables and data (shared with the parent state, if any)
    """

    def __init__(self, parent: Optional[Any] = None):
        """
        Initialize block parsing state.

        Parameters:
        - parent: Parent state context
        """

    def child_state(self, src: str) -> Self:
        """
        Create child state for nested parsing.

        Parameters:
        - src: Source text for child state

        Returns:
        New BlockState instance with this state as parent
        """

    def process(self, text: str) -> None:
        """
        Load source text into the state.

        Sets src to text and cursor_max to its length; tokens are
        produced later by BlockParser.parse().

        Parameters:
        - text: Text to process
        """
```
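
The parent/child relationship lets nested constructs such as blockquotes and list items be parsed with the same block rules. A minimal stand-in, assuming only the attributes documented above, shows the key property: a child state gets its own source and token list but shares its parent's `env` mapping, so data recorded while parsing nested content stays visible to the whole document. The class `MiniBlockState` and its `depth()` helper are hypothetical.

```python
from typing import Any, Dict, List, MutableMapping, Optional


class MiniBlockState:
    """Hypothetical stand-in for BlockState, reduced to the documented fields."""

    def __init__(self, parent: Optional["MiniBlockState"] = None) -> None:
        self.src = ""
        self.tokens: List[Dict[str, Any]] = []
        self.cursor = 0
        self.cursor_max = 0
        self.list_tight = True
        self.parent = parent
        # Assumption: children reuse the parent's env instead of copying it
        self.env: MutableMapping[str, Any] = parent.env if parent is not None else {}

    def process(self, src: str) -> None:
        # Only records the source; a parser fills self.tokens later
        self.src = src
        self.cursor_max = len(src)

    def child_state(self, src: str) -> "MiniBlockState":
        child = self.__class__(self)
        child.process(src)
        return child

    def depth(self) -> int:
        # Hypothetical helper: number of ancestors above this state
        d, node = 0, self.parent
        while node is not None:
            d, node = d + 1, node.parent
        return d


root = MiniBlockState()
root.process("> quoted\n")
child = root.child_state("quoted\n")
child.env["seen_quote"] = True
print(root.env, child.depth())  # {'seen_quote': True} 1
```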

### Inline State

State management for inline-level parsing within block elements.

```python { .api }
class InlineState:
    """
    State management for inline-level parsing.

    Tracks parsing of inline elements within block content, including
    accumulated tokens, the enclosing inline context, and environment data.

    Attributes:
    - src: str - Source text being parsed
    - tokens: List[Dict[str, Any]] - Accumulated inline tokens
    - in_image / in_link / in_emphasis / in_strong: bool - Whether parsing is
      currently inside an image, link, emphasis, or strong span
    - env: MutableMapping[str, Any] - Environment variables and data
    """

    def __init__(self, env: MutableMapping[str, Any]):
        """
        Initialize inline parsing state.

        Parameters:
        - env: Environment mapping for parsing context (normally the block state's env)
        """

    def append_token(self, token: Dict[str, Any]) -> None:
        """
        Add token to the token list.

        Parameters:
        - token: Token to add
        """
```

### Base Parser

Abstract base class providing common parsing functionality.

```python { .api }
ST = TypeVar('ST', bound=Union[BlockState, InlineState])

class Parser(Generic[ST]):
    """
    Base parser class with common parsing functionality.

    Provides rule registration, method dispatch, and parsing utilities
    for both block and inline parsers.
    """

    def register(
        self,
        name: str,
        pattern: Union[str, None],
        func: Callable,
        before: Optional[str] = None
    ) -> None:
        """
        Register a new parsing rule.

        Parameters:
        - name: Rule name
        - pattern: Regex pattern string or None
        - func: Function to handle matches, called as func(parser, m, state)
        - before: Insert rule before this existing rule
        """
```
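
How `register()` interacts with rule order can be illustrated with a small stand-in. This is a hypothetical model (`MiniParser`, `compile_scanner`, and `dispatch` are illustrative names), assuming the behavior described above: the handler is stored under the rule name, a non-None pattern is added to the specification, and `before` controls where the name lands in the ordered `rules` list, which decides which alternative wins when two rules match at the same position.

```python
import re
from typing import Any, Callable, Dict, List, Optional


class MiniParser:
    """Hypothetical model of rule registration and dispatch."""

    def __init__(self) -> None:
        self.rules: List[str] = []
        self.specification: Dict[str, str] = {}
        self._methods: Dict[str, Callable[[Any, Any], Any]] = {}

    def register(self, name: str, pattern: Optional[str],
                 func: Callable[..., Any], before: Optional[str] = None) -> None:
        # Bind the parser so handlers are called as func(parser, m, state)
        self._methods[name] = lambda m, state: func(self, m, state)
        if pattern is not None:
            self.specification[name] = pattern
        if name not in self.rules:
            if before is not None and before in self.rules:
                self.rules.insert(self.rules.index(before), name)
            else:
                self.rules.append(name)

    def compile_scanner(self) -> "re.Pattern[str]":
        # Rule order becomes alternation order
        parts = [f"(?P<{n}>{self.specification[n]})"
                 for n in self.rules if n in self.specification]
        return re.compile("|".join(parts))

    def dispatch(self, m: "re.Match[str]", state: Any) -> Any:
        return self._methods[m.lastgroup](m, state)


p = MiniParser()
p.register("word", r"\w+", lambda parser, m, state: ("word", m.group()))
p.register("number", r"\d+", lambda parser, m, state: ("number", m.group()), before="word")
m = p.compile_scanner().search("42 apples")
print(p.rules, p.dispatch(m, None))  # ['number', 'word'] ('number', '42')
```

Without `before="word"`, the broader `\w+` rule would be tried first and would claim `42` as a word.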

## Usage Examples

### Custom Block Rule

Adding a custom block-level element:

```python
from mistune import create_markdown

# Named groups are required: mistune joins all block rules into one regex,
# so positional groups like m.group(1) would not line up. [\s\S]*? lets the
# body span several lines (block patterns are compiled with re.M, not re.S).
CUSTOM_BLOCK_PATTERN = (
    r'^:::(?P<custom_type>\w+)\n'
    r'(?P<custom_content>[\s\S]*?)\n'
    r':::[ \t]*$'
)


def custom_block_plugin(md):
    """Add support for custom block syntax: :::type ... :::"""

    def parse_custom_block(block, m, state):
        block_type = m.group('custom_type')
        content = m.group('custom_content').strip()

        # Parse the body as nested blocks in a child state
        child = state.child_state(content)
        block.parse(child)

        # Block rules append their token and return the position to resume from
        state.append_token({
            'type': 'custom_block',
            'attrs': {'block_type': block_type},
            'children': child.tokens,
        })
        return m.end() + 1  # skip the newline after the closing :::

    # Register rule with block parser
    md.block.register('custom_block', CUSTOM_BLOCK_PATTERN, parse_custom_block)

    # Renderer methods receive the renderer as their first argument
    def render_custom_block(renderer, text, block_type):
        return f'<div class="custom-{block_type}">{text}</div>\n'

    if md.renderer and md.renderer.NAME == 'html':
        md.renderer.register('custom_block', render_custom_block)


# Use custom plugin
md = create_markdown()
md.use(custom_block_plugin)

result = md("""
:::warning
This is a **warning** block.
:::
""")
```
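
Before registering a block rule, its pattern can be checked on its own with the `re` module. This sketch assumes block patterns are compiled with `re.M` (so `^` and `$` match at line boundaries) but without `re.S`; under that assumption a body written as `.*?` cannot cross a newline, while `[\s\S]*?` can. The group names `custom_type` and `custom_content` are illustrative.

```python
import re

named = re.compile(
    r"^:::(?P<custom_type>\w+)\n(?P<custom_content>[\s\S]*?)\n:::[ \t]*$", re.M
)
dot_only = re.compile(r"^:::(\w+)\n(.*?)\n:::$", re.M)

src = "Intro\n\n:::warning\nLine one\n\nLine two\n:::\n"

m = named.search(src)
print(m.group("custom_type"))           # warning
print(repr(m.group("custom_content")))  # 'Line one\n\nLine two'
print(dot_only.search(src))             # None: '.' stops at the newline
```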

### Custom Inline Rule

Adding a custom inline element:

```python
from mistune import create_markdown


def emoji_plugin(md):
    """Add support for emoji syntax: :emoji_name:"""

    def parse_emoji(inline, m, state):
        # Inline rules append their token and return the end position
        state.append_token({'type': 'emoji', 'raw': m.group('emoji_name')})
        return m.end()

    # Register with inline parser; use a named group because all inline
    # rules are joined into one regex
    md.inline.register('emoji', r':(?P<emoji_name>\w+):', parse_emoji)

    # Renderer methods receive the renderer as their first argument
    def render_emoji(renderer, emoji_name):
        emoji_map = {
            'smile': '😄',
            'heart': '❤️',
            'thumbsup': '👍',
        }
        return emoji_map.get(emoji_name, f':{emoji_name}:')

    if md.renderer and md.renderer.NAME == 'html':
        md.renderer.register('emoji', render_emoji)


# Use emoji plugin
md = create_markdown()
md.use(emoji_plugin)

result = md('Hello :smile: world :heart:!')
# Output: <p>Hello 😄 world ❤️!</p>
```

### State Access and Analysis

Accessing parsing state for analysis:

```python
from mistune import create_markdown

md = create_markdown()

# Parse with state access
text = """
# Heading 1

This is a paragraph with **bold** text.

## Heading 2

- List item 1
- List item 2
"""

# parse() returns the rendered output and the BlockState
output, state = md.parse(text)


# Analyze tokens
def analyze_tokens(tokens, level=0):
    indent = "  " * level
    for token in tokens:
        print(f"{indent}Token: {token['type']}")
        if 'attrs' in token:
            print(f"{indent}  Attrs: {token['attrs']}")
        if 'children' in token:
            analyze_tokens(token['children'], level + 1)


analyze_tokens(state.tokens)

# Access environment data
print(f"Environment: {state.env}")
```

### Parser Customization

Customizing parser behavior:

```python
from mistune import BlockParser, InlineParser, Markdown, HTMLRenderer

# Create custom parsers
block = BlockParser()
inline = InlineParser(hard_wrap=True)  # Convert line breaks to <br>

# Remove specific rules by modifying the rules list
block.rules.remove('block_quote')  # Disable blockquotes
inline.rules.remove('emphasis')    # Disable emphasis (covers both *italic* and **bold**)

# Create parser with custom components
renderer = HTMLRenderer(escape=False)
md = Markdown(renderer=renderer, block=block, inline=inline)

result = md('This is *not italic*\nThis is a line break.')
```

## Token Structure

Understanding the token format for custom processing:

```python
# Block token structure
block_token = {
    'type': 'heading',           # Token type
    'attrs': {'level': 1},       # Element attributes
    'children': [                # Child tokens (for container elements)
        {
            'type': 'text',
            'raw': 'Heading Text'
        }
    ]
}

# Inline token structure
inline_token = {
    'type': 'strong',            # Token type
    'children': [                # Child tokens
        {
            'type': 'text',
            'raw': 'Bold Text'
        }
    ]
}

# Leaf token structure
text_token = {
    'type': 'text',              # Token type
    'raw': 'Plain text content'  # Raw text content
}
```
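
Because every token is a plain dictionary, custom processing is usually a small recursive walk. The helpers below are hypothetical (`plain_text` and `outline` are not part of mistune) and assume only the token shape shown above: leaf tokens carry `raw`, containers carry `children`, and headings store their level in `attrs`.

```python
from typing import Any, Dict, List, Tuple

Token = Dict[str, Any]


def plain_text(tokens: List[Token]) -> str:
    # Concatenate 'raw' from leaves, recursing into 'children'
    parts = []
    for tok in tokens:
        if "children" in tok:
            parts.append(plain_text(tok["children"]))
        elif "raw" in tok:
            parts.append(tok["raw"])
    return "".join(parts)


def outline(tokens: List[Token]) -> List[Tuple[int, str]]:
    # (level, text) for each top-level heading token
    return [
        (tok["attrs"]["level"], plain_text(tok.get("children", [])))
        for tok in tokens
        if tok["type"] == "heading"
    ]


doc = [
    {"type": "heading", "attrs": {"level": 1},
     "children": [{"type": "text", "raw": "Heading Text"}]},
    {"type": "paragraph", "children": [
        {"type": "text", "raw": "Some "},
        {"type": "strong", "children": [{"type": "text", "raw": "Bold Text"}]},
    ]},
]
print(outline(doc))         # [(1, 'Heading Text')]
print(plain_text(doc[1:]))  # Some Bold Text
```

Tokens without `raw` or `children`, such as `blank_line`, are simply skipped, so the same walk could be pointed at `state.tokens` after `md.parse()`, assuming those tokens follow the shape shown here.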

This parsing architecture makes mistune extensible with custom syntax: new rules and renderer methods are registered by name rather than patched into the parsers, block and inline processing remain separate stages, and each parser scans its input with a single regular expression compiled from its active rules.