# Parsing and AST Manipulation

The Parser class converts CommonMark Markdown text into an Abstract Syntax Tree (AST) of Node objects, which can then be inspected and manipulated programmatically.

## Capabilities

### Parser Class

Main parsing class that converts CommonMark Markdown text into an Abstract Syntax Tree representation.

```python { .api }
class Parser:
    def __init__(self, options={}):
        """
        Initialize a new Parser instance.

        Args:
            options (dict): Configuration options for parsing behavior
        """

    def parse(self, my_input):
        """
        Parse CommonMark text into an AST.

        Args:
            my_input (str): CommonMark Markdown text to parse

        Returns:
            Node: Root node of the parsed AST
        """
```

### Node Class

Represents individual nodes in the Abstract Syntax Tree, with methods for tree manipulation and traversal.

```python { .api }
class Node:
    def __init__(self, node_type, sourcepos):
        """
        Create a new Node.

        Args:
            node_type (str): Type of the node (e.g., 'document', 'paragraph', 'text', 'heading')
            sourcepos (SourcePos): Source position [[start_line, start_col], [end_line, end_col]],
                or None; the parser generally creates inline nodes without one
        """

    def walker(self):
        """
        Create a NodeWalker for traversing this node and its descendants.

        Returns:
            NodeWalker: Walker positioned at this node
        """

    def append_child(self, child):
        """
        Append a child node to this node, making it the last child.
        The child is first unlinked from any previous position.

        Args:
            child (Node): Node to append as child
        """

    def prepend_child(self, child):
        """
        Prepend a child node to this node, making it the first child.
        The child is first unlinked from any previous position.

        Args:
            child (Node): Node to prepend as child
        """

    def unlink(self):
        """
        Detach this node from its parent and siblings.
        Its own children stay attached to it.
        """

    def insert_after(self, sibling):
        """
        Insert sibling immediately after this node, under the same parent.
        sibling is first unlinked from any previous position; this node does not move.

        Args:
            sibling (Node): Node to place after this node
        """

    def insert_before(self, sibling):
        """
        Insert sibling immediately before this node, under the same parent.
        sibling is first unlinked from any previous position; this node does not move.

        Args:
            sibling (Node): Node to place before this node
        """

    def pretty(self):
        """
        Print this node's attributes to stdout using pprint.
        Only this node's own attribute dictionary is printed; child nodes are not expanded.

        Returns:
            None: Prints to stdout rather than returning a value
        """

    def normalize(self):
        """
        Normalize this node's subtree by merging adjacent text nodes.
        """

    def is_container(self):
        """
        Check whether this node's type can contain child nodes.

        Returns:
            bool: True for container types (e.g., 'document', 'paragraph', 'heading',
                'emph', 'strong', 'link'), False for leaf types (e.g., 'text', 'code_block')
        """
```

### NodeWalker Class

Walks the tree below a root node depth-first, in document order. Container nodes (those for which `Node.is_container()` is True) produce two events, one when entering and one when exiting; leaf nodes such as `text` produce a single entering event.

```python { .api }
class NodeWalker:
    def __init__(self, root):
        """
        Create a NodeWalker starting at the specified root node.

        Args:
            root (Node): Root node to start traversal from
        """

    def nxt(self):
        """
        Return the next event in the traversal and advance the walker.

        Returns:
            WalkEvent or None: Dictionary with 'node' (Node) and 'entering' (bool) keys,
                or None once traversal is complete
        """

    def resume_at(self, node, entering):
        """
        Reposition the walker so that the next call to nxt() returns
        node with the given entering flag.

        Args:
            node (Node): Node to resume at
            entering (bool): True to resume when entering the node, False when exiting it
        """
```

## Usage Examples

### Basic Parsing

```python
from commonmark import Parser

parser = Parser()
markdown = """
# Hello World

This is a paragraph with **bold** text.
"""

ast = parser.parse(markdown)
ast.pretty()  # Print the root node's attributes to stdout (children are not expanded)
```

### AST Manipulation

```python
from commonmark import HtmlRenderer, Parser
from commonmark.node import Node

parser = Parser()
ast = parser.parse("# Original Title")

# Create a new text node; inline nodes generally carry no source position
new_text = Node('text', None)
new_text.literal = "New Title"

# Replace the title text
title_node = ast.first_child        # heading node
old_text = title_node.first_child   # original text node
old_text.insert_after(new_text)     # place the new text right after the old one
old_text.unlink()                   # then detach the old text

print(HtmlRenderer().render(ast))   # <h1>New Title</h1>
```

### Tree Traversal

```python
from commonmark import Parser

parser = Parser()
ast = parser.parse("""
# Title

Some text with **bold** and *italic*.
""")

walker = ast.walker()
event = walker.nxt()
while event is not None:
    node, entering = event['node'], event['entering']
    if entering:
        print(f"Entering: {node.t}")
        # Every node has a literal attribute; it is typically None for container nodes
        if node.literal:
            print(f"  Content: '{node.literal}'")
    event = walker.nxt()
```

## Types

```python { .api }
# Source position: [[start_line, start_col], [end_line, end_col]], 1-based;
# may be None (e.g., for inline nodes)
SourcePos = list[list[int]] | None

# Node types
NodeType = str  # 'document', 'paragraph', 'text', 'strong', 'emph', 'heading', etc.

# Walk event returned by NodeWalker.nxt(); nxt() returns None once traversal ends
WalkEvent = dict[str, Node | bool]  # {'node': Node, 'entering': bool}
```

### Node Properties

Commonly used node attributes:

- `node.t`: Node type (e.g., 'document', 'paragraph', 'text', 'strong', 'emph', 'heading')
- `node.literal`: Text content of leaf nodes such as `text`, `code`, `code_block`, `html_inline` and `html_block` (str or None)
- `node.first_child`: First child node (Node or None)
- `node.last_child`: Last child node (Node or None)
- `node.parent`: Parent node (Node or None)
- `node.nxt`: Next sibling node (Node or None)
- `node.prv`: Previous sibling node (Node or None)
- `node.sourcepos`: Source position [[start_line, start_col], [end_line, end_col]], or None for nodes without one
- `node.string_content`: Raw text accumulated for a block during parsing (str, or None once its inline content has been parsed); mostly of internal use
- `node.info`: Info string of fenced code blocks, e.g. 'python' (str or None)
- `node.destination`: URL for links and images (str or None)
- `node.title`: Title for links and images (str or None)
- `node.level`: Heading level 1-6 for heading nodes (int or None)
- `node.list_data`: List metadata for list and item nodes, with keys such as 'type' ('bullet' or 'ordered'), 'tight' and 'start' (dict)