# Helper Functions and Utilities

High-level helpers for common parsing patterns. They provide pre-built expressions and parse actions for frequently encountered tasks such as delimited lists, nested expressions, and markup parsing, so complex parsers need less hand-written grammar.

**Required imports for type annotations:**

```python
from typing import Union, Optional, Iterable, Callable
from pyparsing import ParserElement, ParseExpression, ParseResults
```

## Capabilities

### List and Array Parsing

Functions for parsing delimited lists and counted arrays.

```python { .api }
def delimited_list(expr: ParserElement,
                   delim: str = ",",
                   combine: bool = False) -> ParserElement:
    """Create parser for delimited lists."""

class DelimitedList(ParseExpression):
    """Parse delimited lists with customizable delimiters."""

    def __init__(self,
                 expr: ParserElement,
                 delim: str = ",",
                 combine: bool = False): ...
```

```python { .api }
def counted_array(expr: ParserElement,
                  int_expr: Optional[ParserElement] = None) -> ParserElement:
    """Create parser for counted arrays (count followed by elements)."""
```

**Usage examples:**
```python
from pyparsing import Word, alphas, alphanums, counted_array, delimited_list

# Parse comma-separated values (delimiters are suppressed from the results)
csv_row = delimited_list(Word(alphanums))
# Matches: "apple,banana,cherry" -> ['apple', 'banana', 'cherry']

# Parse counted array (the leading count is consumed, not returned)
items = counted_array(Word(alphas))
# Matches: "3 red green blue" -> ['red', 'green', 'blue']

# Custom delimiter
pipe_list = delimited_list(Word(alphas), delim="|")
# Matches: "one|two|three" -> ['one', 'two', 'three']
```
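
The list helpers above can be exercised end to end with a short script. This is a minimal sketch, assuming pyparsing 3.x is installed; the input strings and variable names are illustrative only. It also shows `combine=True`, which keeps the delimiters and returns the whole list as one token.

```python
from pyparsing import Word, alphas, alphanums, counted_array, delimited_list

# Comma-separated values: whitespace around items is skipped by default.
csv_row = delimited_list(Word(alphanums))
row = csv_row.parse_string("apple, banana, cherry").as_list()

# combine=True joins items and delimiters back into a single token.
joined = delimited_list(Word(alphas), delim="|", combine=True)
combined = joined.parse_string("one|two|three").as_list()

# counted_array reads the leading integer and then exactly that many items;
# anything after them is left unparsed.
items = counted_array(Word(alphas))
counted = items.parse_string("2 red green blue").as_list()

print(row)       # ['apple', 'banana', 'cherry']
print(combined)  # ['one|two|three']
print(counted)   # ['red', 'green']
```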

### String Choice and Alternatives

Functions for creating choice expressions from strings.

```python { .api }
def one_of(strs: Union[Iterable[str], str],
           caseless: bool = False,
           use_regex: bool = True,
           as_keyword: bool = False,
           *,
           # Backward compatibility parameters
           useRegex: bool = True,
           asKeyword: bool = False) -> ParserElement:
    """Create an expression that matches any one of the given alternatives,
    trying longer alternatives before their shorter prefixes."""
```

**Usage examples:**
```python
from pyparsing import one_of

# Simple string alternatives (space-separated in one string)
boolean = one_of("true false")
# Matches either "true" or "false"

# Case-insensitive matching
direction = one_of("North South East West", caseless=True)
# Matches "north", "SOUTH", "East", etc.

# Keyword matching (with word boundaries)
operator = one_of("and or not", as_keyword=True)
# Matches "and" but not the "and" inside "band"
```
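
The following sketch shows two behaviours that are easy to miss: overlapping alternatives such as `<` and `<=` are matched longest first, and `as_keyword=True` rejects matches embedded in longer words. It assumes pyparsing 3.x; the sample strings are made up for illustration.

```python
from pyparsing import one_of

# "<=" is tried before "<" even though "<" is listed first.
comparison = one_of("< <= > >= = !=")
found_ops = [match[0] for match in comparison.search_string("a <= b != c < d")]

# With as_keyword=True, "and" inside "band" and "or" inside "orbit" are skipped.
logic = one_of("and or not", as_keyword=True)
found_words = [match[0] for match in logic.search_string("band and orbit or not")]

print(found_ops)    # ['<=', '!=', '<']
print(found_words)  # ['and', 'or', 'not']
```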

### Nested Expression Parsing

Functions for parsing nested structures with delimiters.

```python { .api }
def nested_expr(opener: str = "(",
                closer: str = ")",
                content: Optional[ParserElement] = None,
                ignore_expr: ParserElement = quoted_string(),
                *,
                # Backward compatibility parameter
                ignoreExpr: ParserElement = quoted_string()) -> ParserElement:
    """Create parser for nested expressions with delimiters."""
```

**Usage examples:**
```python
from pyparsing import Word, alphas, c_style_comment, delimited_list, nested_expr

# Parse nested parentheses
nested_parens = nested_expr("(", ")")
# Matches: "(a (b c) d)" -> [['a', ['b', 'c'], 'd']]

# Parse nested brackets with specific content
# (content is matched between nested groups, so a comma cannot
# directly precede or follow a nested "[...]")
bracket_list = nested_expr("[", "]", content=delimited_list(Word(alphas)))
# Matches: "[apple [banana, cherry] date]" -> [['apple', ['banana', 'cherry'], 'date']]

# Parse nested braces ignoring comments
code_block = nested_expr("{", "}", ignore_expr=c_style_comment)
```
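
Because `nested_expr` returns nested lists, the result can be walked with ordinary recursion. The sketch below assumes pyparsing 3.x; `depth` is a hypothetical helper written for this example, not part of the library.

```python
from pyparsing import nested_expr

parens = nested_expr("(", ")")
tree = parens.parse_string("(a (b c) d)").as_list()


def depth(node) -> int:
    """Return the nesting depth of a list-of-lists structure (strings count as 0)."""
    if not isinstance(node, list):
        return 0
    return 1 + max((depth(child) for child in node), default=0)


print(tree)            # [['a', ['b', 'c'], 'd']]
print(depth(tree[0]))  # 2
```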

### HTML/XML Parsing Utilities

Functions for parsing markup languages.

```python { .api }
def make_html_tags(tag_str: str) -> tuple:
    """Create opening and closing HTML tag parsers (tag name matched case-insensitively)."""

def make_xml_tags(tag_str: str) -> tuple:
    """Create opening and closing XML tag parsers (tag name matched case-sensitively)."""
```

```python { .api }
def replace_html_entity(tokens: ParseResults) -> str:
    """Replace HTML entities with their character equivalents."""
```

**Usage examples:**
```python
from pyparsing import (SkipTo, common_html_entity, make_html_tags,
                       make_xml_tags, replace_html_entity)

# Create HTML tag parsers
div_start, div_end = make_html_tags("div")
div_content = div_start + SkipTo(div_end) + div_end

# Create XML tag parsers (attributes on the opening tag are parsed as named results)
para_start, para_end = make_xml_tags("para")
para_with_attrs = para_start + SkipTo(para_end) + para_end

# Handle HTML entities (copy first so the shared expression is not modified)
entity_parser = common_html_entity.copy().set_parse_action(replace_html_entity)
```

### Dictionary and Key-Value Parsing

Functions for parsing dictionary-like structures.

```python { .api }
def dict_of(key: ParserElement, value: ParserElement) -> ParserElement:
    """Create parser for dictionary-like structures."""
```

**Usage examples:**
```python
from pyparsing import QuotedString, Word, alphas, dict_of

# Parse key-value pairs; dict_of already repeats and groups the pairs,
# so no extra Dict(OneOrMore(...)) wrapper is needed
config_dict = dict_of(Word(alphas), QuotedString('"'))
# Matches: 'name "John"' -> as_dict() == {'name': 'John'}
# Matches: 'name "John" city "Paris"' -> as_dict() == {'name': 'John', 'city': 'Paris'}
```
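
A short sketch of how the parsed result can be read back, assuming pyparsing 3.x; the keys and values are illustrative. The same result supports list-style access, key lookup, and conversion to a plain `dict`.

```python
from pyparsing import QuotedString, Word, alphas, dict_of

config = dict_of(Word(alphas), QuotedString('"'))
result = config.parse_string('name "John" city "Paris"')

pairs = result.as_list()     # grouped [key, value] pairs, in input order
settings = result.as_dict()  # plain dict keyed by the parsed keys
city = result["city"]        # direct lookup on the ParseResults

print(pairs)     # [['name', 'John'], ['city', 'Paris']]
print(settings)  # {'name': 'John', 'city': 'Paris'}
print(city)      # Paris
```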

### Infix Notation Parsing

Function for parsing infix mathematical and logical expressions.

```python { .api }
def infix_notation(base_expr: ParserElement,
                   op_list: list,
                   lpar: str = "(",
                   rpar: str = ")") -> ParserElement:
    """Create parser for infix notation expressions."""

class OpAssoc(Enum):
    """Enumeration for operator associativity."""
    LEFT = 1
    RIGHT = 2
```

**Usage example:**
```python
from pyparsing import OpAssoc, Word, infix_notation, nums, one_of

# Parse arithmetic expressions
# Each tuple is (operator, number of operands, associativity);
# precedence comes from list order, highest first
number = Word(nums)
arith_expr = infix_notation(number, [
    ('^', 2, OpAssoc.RIGHT),           # Exponentiation, right associative
    (one_of('* /'), 2, OpAssoc.LEFT),  # Multiplication and division
    (one_of('+ -'), 2, OpAssoc.LEFT),  # Addition and subtraction
])
# Parses: "2 + 3 * 4" -> [['2', '+', ['3', '*', '4']]]
```
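
The nested lists produced by `infix_notation` can be evaluated with a small recursive function. This is a sketch under stated assumptions: pyparsing 3.x is installed, `pyparsing_common.integer` is used so operands arrive as `int`, and `OPS` and `evaluate` are hypothetical names introduced for this example.

```python
import operator

from pyparsing import OpAssoc, infix_notation, one_of, pyparsing_common

# Operators are listed from highest to lowest precedence.
arith = infix_notation(pyparsing_common.integer, [
    ("^", 2, OpAssoc.RIGHT),
    (one_of("* /"), 2, OpAssoc.LEFT),
    (one_of("+ -"), 2, OpAssoc.LEFT),
])

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul,
       "/": operator.truediv, "^": operator.pow}


def evaluate(node):
    """Fold a parsed group [operand, op, operand, op, ...] from left to right."""
    if not isinstance(node, list):
        return node
    value = evaluate(node[0])
    for op, rhs in zip(node[1::2], node[2::2]):
        value = OPS[op](value, evaluate(rhs))
    return value


tree = arith.parse_string("2 + 3 * 4", parse_all=True).as_list()
print(tree)            # [[2, '+', [3, '*', 4]]]
print(evaluate(tree))  # 14
```

A left-to-right fold is enough here because right-associative chains arrive already nested, while left-associative chains arrive as one flat group.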

### Previous Match Functions

Functions for matching previously parsed content.

```python { .api }
def match_previous_literal(expr: ParserElement) -> ParserElement:
    """Create parser that matches the exact text previously matched by expr."""

def match_previous_expr(expr: ParserElement) -> ParserElement:
    """Create parser that re-applies expr and requires the same tokens as its previous match."""
```

**Usage examples:**
```python
from pyparsing import SkipTo, Word, alphas, match_previous_expr, match_previous_literal

# Match repeated literals
first_word = Word(alphas)
repeat_word = match_previous_literal(first_word)
pattern = first_word + ":" + repeat_word
# Matches: "hello:hello" but not "hello:world"

# Match repeated expressions
tag_name = Word(alphas)
open_tag = "<" + tag_name + ">"
close_tag = "</" + match_previous_expr(tag_name) + ">"
xml_element = open_tag + SkipTo(close_tag) + close_tag
```
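
A mismatch between the first and the repeated match surfaces as a `ParseException`, which the sketch below turns into a boolean. It assumes pyparsing 3.x; `is_repeated` is a hypothetical helper name used only for this example.

```python
from pyparsing import ParseException, Word, alphas, match_previous_literal

first_word = Word(alphas)
pattern = first_word + ":" + match_previous_literal(first_word)


def is_repeated(text: str) -> bool:
    """Return True when the word after ':' repeats the word before it."""
    try:
        pattern.parse_string(text, parse_all=True)
        return True
    except ParseException:
        return False


print(is_repeated("hello:hello"))  # True
print(is_repeated("hello:world"))  # False
```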

### Text Transformation Utilities

Functions for transforming parsed text.

```python { .api }
def original_text_for(expr: ParserElement,
                      as_string: bool = True,
                      *,
                      # Backward compatibility parameter
                      asString: bool = True) -> ParserElement:
    """Return original text instead of parsed tokens."""

def ungroup(expr: ParserElement) -> ParserElement:
    """Remove grouping from expression results."""
```

**Usage examples:**
```python
from pyparsing import Group, Word, alphas, nums, original_text_for, ungroup

# Get original text of complex expression
date_pattern = Word(nums) + "/" + Word(nums) + "/" + Word(nums)
date_text = original_text_for(date_pattern)
# Returns ['12/25/2023'] instead of ['12', '/', '25', '/', '2023']

# Remove unwanted grouping
grouped_items = Group(Word(alphas) + Word(nums))
flat_items = ungroup(grouped_items)
# "abc 123" -> ['abc', '123'] instead of [['abc', '123']]
```
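
The difference between token results and original text is clearest when the input contains whitespace that tokenizing would discard. A minimal sketch, assuming pyparsing 3.x and illustrative inputs:

```python
from pyparsing import Group, Word, alphas, nums, original_text_for, ungroup

date_pattern = Word(nums) + "/" + Word(nums) + "/" + Word(nums)

# Plain parsing yields separate tokens and drops the spaces between them.
tokens = date_pattern.parse_string("12 / 25 / 2023").as_list()

# original_text_for returns the matched slice of the input as a single token.
raw = original_text_for(date_pattern).parse_string("12 / 25 / 2023").as_list()

# ungroup removes one level of nesting added by Group.
grouped = Group(Word(alphas) + Word(nums))
nested = grouped.parse_string("abc 123").as_list()
flat = ungroup(grouped).parse_string("abc 123").as_list()

print(tokens)  # ['12', '/', '25', '/', '2023']
print(raw)     # ['12 / 25 / 2023']
print(nested)  # [['abc', '123']]
print(flat)    # ['abc', '123']
```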

### Action Creation Functions

Functions for creating parse actions.

```python { .api }
def replace_with(repl_str: str) -> Callable:
    """Create parse action that replaces tokens with specified string."""

def remove_quotes(s: str, loc: int, tokens: ParseResults) -> str:
    """Parse action to remove surrounding quotes."""

def with_attribute(*args, **attr_dict) -> Callable:
    """Create parse action for matching HTML/XML attributes."""

def with_class(classname: str, namespace: str = "") -> Callable:
    """Create parse action for matching HTML class attributes."""
```

**Usage examples:**
```python
from pyparsing import (Literal, any_open_tag, dbl_quoted_string, remove_quotes,
                       replace_with, with_attribute, with_class)

# Replace matched tokens
placeholder = Literal("TBD").set_parse_action(replace_with("To Be Determined"))

# Remove quotes from strings (QuotedString already strips its quotes,
# so remove_quotes belongs on expressions that keep them)
unquoted_string = dbl_quoted_string.copy().set_parse_action(remove_quotes)

# Match HTML elements with specific attributes
# (copy the shared any_open_tag expression before attaching an action)
div_with_id = any_open_tag.copy().set_parse_action(with_attribute(id="main"))

# Match elements with CSS class
highlighted = any_open_tag.copy().set_parse_action(with_class("highlight"))
```
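
The parse actions above can be checked against small inputs. This sketch assumes pyparsing 3.x; the HTML snippet and all variable names are invented for illustration. It uses `make_html_tags` with a copied opening tag so that `with_attribute` filters which `<div>` elements match.

```python
from pyparsing import (Literal, SkipTo, dbl_quoted_string, make_html_tags,
                       remove_quotes, replace_with, with_attribute)

# replace_with: substitute a fixed string for every match.
placeholder = Literal("TBD").set_parse_action(replace_with("To Be Determined"))
expanded = placeholder.transform_string("Release date: TBD")

# remove_quotes: strip the surrounding quote characters that dbl_quoted_string keeps.
unquoted = dbl_quoted_string.copy().set_parse_action(remove_quotes)
text = unquoted.parse_string('"hello world"')[0]

# with_attribute: accept only opening tags whose id attribute equals "main".
div, div_end = make_html_tags("div")
main_div = div().set_parse_action(with_attribute(id="main"))
html = '<div id="nav">menu</div><div id="main">content</div>'
bodies = [m["body"] for m in (main_div + SkipTo(div_end)("body")).search_string(html)]

print(expanded)  # Release date: To Be Determined
print(text)      # hello world
print(bodies)    # ['content']
```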

### Built-in Helper Expressions

Pre-built parser expressions for common patterns.

```python { .api }
# Comment parsers
c_style_comment: ParserElement        # /* comment */
html_comment: ParserElement           # <!-- comment -->
dbl_slash_comment: ParserElement      # // comment (to end of line)
cpp_style_comment: ParserElement      # /* comment */ or // comment
java_style_comment: ParserElement     # Same as cpp_style_comment
python_style_comment: ParserElement   # # comment

# Line-oriented parsers
rest_of_line: ParserElement           # Everything to end of line

# HTML/XML parsers
any_open_tag: ParserElement           # Any opening HTML/XML tag
any_close_tag: ParserElement          # Any closing HTML/XML tag
common_html_entity: ParserElement     # Common HTML entities (&amp;, &lt;, etc.)

# String parsers
dbl_quoted_string: ParserElement      # "double quoted string"
sgl_quoted_string: ParserElement      # 'single quoted string'
quoted_string: ParserElement          # Either single or double quoted
unicode_string: ParserElement         # Unicode string literals
```

### Advanced Parsing Utilities

Specialized utilities for complex parsing scenarios.

```python { .api }
def condition_as_parse_action(condition: Callable,
                              message: str = "failed user-defined condition") -> Callable:
    """Convert boolean condition to parse action."""

def token_map(func: Callable, *args) -> Callable:
    """Create parse action that maps function over tokens."""

def autoname_elements() -> None:
    """Automatically assign names to parser elements for debugging."""
```

**Usage examples:**
```python
from pyparsing import (OneOrMore, Word, alphas, autoname_elements,
                       condition_as_parse_action, nums, token_map)

# Conditional parsing
positive_int = Word(nums).set_parse_action(
    condition_as_parse_action(lambda t: int(t[0]) > 0, "must be positive")
)

# Transform all tokens
uppercase_words = OneOrMore(Word(alphas)).set_parse_action(token_map(str.upper))

# Enable automatic naming for debugging
# (call after the expressions are defined: it names the ParserElement
# variables that already exist in the calling scope)
identifier = Word(alphas)
number = Word(nums)
autoname_elements()  # identifier and number are named after their variables
```
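
The next sketch chains `token_map` and `condition_as_parse_action` on one expression, so the condition sees converted integers rather than strings. It assumes pyparsing 3.x; `is_positive` is a hypothetical helper for this example.

```python
from pyparsing import (OneOrMore, ParseException, Word, alphas,
                       condition_as_parse_action, nums, token_map)

# Convert to int first, then reject non-positive values.
positive_int = Word(nums).set_parse_action(token_map(int)).add_parse_action(
    condition_as_parse_action(lambda t: t[0] > 0, "must be positive")
)


def is_positive(text: str) -> bool:
    """Return True when text parses as a positive integer."""
    try:
        positive_int.parse_string(text, parse_all=True)
        return True
    except ParseException:
        return False


# token_map applies the function to every token in the match.
uppercase_words = OneOrMore(Word(alphas)).set_parse_action(token_map(str.upper))
shouted = uppercase_words.parse_string("hello world").as_list()

print(positive_int.parse_string("42")[0])      # 42
print(is_positive("42"), is_positive("0"))     # True False
print(shouted)                                 # ['HELLO', 'WORLD']
```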

### Additional Utility Functions

Utilities for locating matches in the input and for building character sets.

```python { .api }
def col(loc: int, strg: str) -> int:
    """Return column number of location in string (1-based)."""

def line(loc: int, strg: str) -> str:
    """Return the line of text that contains the location."""

def lineno(loc: int, strg: str) -> int:
    """Return line number of location in string (1-based)."""

def match_only_at_col(n: int) -> Callable:
    """Create parse action that fails unless the match starts at column n."""

def srange(s: str) -> str:
    """Expand character range expression."""
```

**Usage examples:**
```python
from pyparsing import Word, alphas, col, lineno, match_only_at_col, srange

# Column-specific matching (match_only_at_col returns a parse action)
label = Word(alphas).set_parse_action(match_only_at_col(1))  # Match only at column 1

# Character range expansion
vowels = srange("[aeiouAEIOU]")  # Expands to "aeiouAEIOU"
consonants = srange("[b-df-hj-np-tv-zB-DF-HJ-NP-TV-Z]")

# Position utilities (used in parse actions)
def report_position(s, loc, tokens):
    print(f"Found at line {lineno(loc, s)}, column {col(loc, s)}")
    return tokens

parser = Word(alphas).set_parse_action(report_position)
```
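
The position helpers take the location first and the full input string second, and they can be called outside a parse action as well. A minimal sketch, assuming pyparsing 3.x; the sample text and the `describe` helper are illustrative only.

```python
from pyparsing import Word, alphas, col, line, lineno, match_only_at_col, srange

text = "alpha beta\ngamma"


def describe(word: str):
    """Return (line number, column, line text) for the first occurrence of word."""
    loc = text.index(word)
    return lineno(loc, text), col(loc, text), line(loc, text)


# Only words that start in column 1 survive the parse action.
line_starts = Word(alphas).set_parse_action(match_only_at_col(1))
first_words = [match[0] for match in line_starts.search_string(text)]

# srange expands a regex-style character class into the literal characters.
hex_digits = srange("[0-9a-f]")

print(describe("beta"))   # (1, 7, 'alpha beta')
print(describe("gamma"))  # (2, 1, 'gamma')
print(first_words)        # ['alpha', 'gamma']
print(hex_digits)         # 0123456789abcdef
```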