
# Helper Functions and Utilities

High-level helper functions for common parsing patterns. These utilities simplify the creation of complex parsers by providing pre-built patterns for frequently encountered parsing scenarios like delimited lists, nested expressions, and markup parsing.

**Required imports for type annotations:**

```python
from enum import Enum
from typing import Union, Optional, Iterable, Callable, List, Tuple
from pyparsing import ParserElement, ParseElementEnhance, ParseResults, Suppress, quoted_string
```

## Capabilities

### List and Array Parsing

Functions for parsing various list and array structures.

```python { .api }
def delimited_list(expr: ParserElement,
                   delim: str = ",",
                   combine: bool = False) -> ParserElement:
    """Create parser for delimited lists (since 3.1, a compatibility wrapper around DelimitedList)."""

class DelimitedList(ParseElementEnhance):
    """Parse delimited lists with customizable delimiters."""

    def __init__(self,
                 expr: ParserElement,
                 delim: str = ",",
                 combine: bool = False): ...
```

```python { .api }
def counted_array(expr: ParserElement,
                  int_expr: Optional[ParserElement] = None) -> ParserElement:
    """Create parser for counted arrays (count followed by elements)."""
```

**Usage examples:**

```python
from pyparsing import Word, alphas, alphanums, delimited_list, counted_array

# Parse comma-separated values; the delimiters are suppressed from the results
csv_row = delimited_list(Word(alphanums))
# Matches: "apple,banana,cherry" -> ['apple', 'banana', 'cherry']

# Parse counted array: the leading integer gives the number of elements
items = counted_array(Word(alphas))
# Matches: "3 red green blue" -> ['red', 'green', 'blue']

# Custom delimiter
pipe_list = delimited_list(Word(alphas), delim="|")
# Matches: "one|two|three" -> ['one', 'two', 'three']
```
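A minimal sketch, assuming pyparsing 3.x is installed, of two details not visible in the examples above: `counted_array` consumes exactly as many elements as the count specifies, and `combine=True` keeps the delimiters and returns the whole list as one token. The sample strings are illustrative.

```python
from pyparsing import Word, alphas, delimited_list, counted_array

# counted_array reads the count first, then exactly that many elements;
# anything after them is left unparsed
colors = counted_array(Word(alphas))
two_colors = colors.parse_string("2 red green blue").as_list()

# With combine=True the delimiters are kept and the list is returned
# as a single string (no whitespace allowed between the pieces)
hostname = delimited_list(Word(alphas), delim=".", combine=True)
host = hostname.parse_string("www.example.com").as_list()

print(two_colors)  # ['red', 'green']
print(host)        # ['www.example.com']
```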

### String Choice and Alternatives

Functions for creating choice expressions from strings.

```python { .api }
def one_of(strs: Union[Iterable[str], str],
           caseless: bool = False,
           use_regex: bool = True,
           as_keyword: bool = False,
           *,
           # Backward compatibility parameters
           useRegex: bool = True,
           asKeyword: bool = False) -> ParserElement:
    """Create a single expression matching any of the given alternatives
    (a Regex when possible, otherwise a MatchFirst)."""
```

**Usage examples:**

```python
from pyparsing import one_of

# Simple string alternatives (space-separated)
boolean = one_of("true false")
# Matches either "true" or "false"

# Case-insensitive matching; results are returned as spelled in the definition
direction = one_of("North South East West", caseless=True)
# Matches "north", "SOUTH", "East", etc.

# Keyword matching (with word boundaries)
operator = one_of("and or not", as_keyword=True)
# Matches "and" as a whole word, but not the leading "and" in "android"
```
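The following sketch, assuming pyparsing 3.x, checks two behaviors of `one_of`: with `caseless=True` each match is returned in its defined spelling, and with `as_keyword=True` a match that is only the prefix of a longer word is rejected. The input strings are illustrative.

```python
from pyparsing import one_of, ParseException

direction = one_of("North South East West", caseless=True)
# Input casing is normalized to the defined spelling
normalized = direction.parse_string("sOuTh")[0]

plain_ops = one_of("and or not")
keyword_ops = one_of("and or not", as_keyword=True)

# Without as_keyword, the leading "and" of "android" is accepted
prefix_match = plain_ops.parse_string("android")[0]

# With as_keyword, the same input is rejected
try:
    keyword_ops.parse_string("android")
    keyword_rejected = False
except ParseException:
    keyword_rejected = True

print(normalized, prefix_match, keyword_rejected)  # South and True
```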

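### Nested Expression Parsing

Functions for parsing nested structures with delimiters.

```python { .api }
def nested_expr(opener: Union[str, ParserElement] = "(",
                closer: Union[str, ParserElement] = ")",
                content: Optional[ParserElement] = None,
                ignore_expr: ParserElement = quoted_string(),
                *,
                # Backward compatibility parameter
                ignoreExpr: ParserElement = quoted_string()) -> ParserElement:
    """Create parser for nested expressions with delimiters; text matched by
    ignore_expr (quoted strings by default) is never treated as a delimiter."""
```

**Usage examples:**

```python
from pyparsing import Word, alphas, Suppress, nested_expr, c_style_comment

# Parse nested parentheses
nested_parens = nested_expr("(", ")")
# Matches: "(a (b c) d)" -> [['a', ['b', 'c'], 'd']]

# Parse nested brackets with specific content; commas are matched and suppressed
bracket_list = nested_expr("[", "]", content=Word(alphas) | Suppress(","))
# Matches: "[apple, [banana, cherry], date]" -> [['apple', ['banana', 'cherry'], 'date']]

# Parse nested braces; braces inside /* ... */ comments are not treated as delimiters
# (this replaces the default quoted_string ignore expression)
code_block = nested_expr("{", "}", ignore_expr=c_style_comment)
```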
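By default `nested_expr` uses `quoted_string` as its ignore expression, so delimiters inside quoted strings do not affect nesting. A minimal sketch, assuming pyparsing 3.x, using a made-up Lisp-like snippet; note that the quoted string is kept as a token, quotes included.

```python
from pyparsing import nested_expr

sexpr = nested_expr("(", ")")

# The ")" inside the quoted string does not close the group
parsed = sexpr.parse_string('(print "a)b" (add 1 2))', parse_all=True).as_list()
print(parsed)  # [['print', '"a)b"', ['add', '1', '2']]]
```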

### HTML/XML Parsing Utilities

Functions for parsing markup languages. `make_html_tags` matches tag names case-insensitively; `make_xml_tags` is case-sensitive.

```python { .api }
def make_html_tags(tag_str: Union[str, ParserElement]) -> Tuple[ParserElement, ParserElement]:
    """Create opening and closing HTML tag parsers."""

def make_xml_tags(tag_str: Union[str, ParserElement]) -> Tuple[ParserElement, ParserElement]:
    """Create opening and closing XML tag parsers."""
```

```python { .api }
def replace_html_entity(s: str, l: int, t: ParseResults) -> str:
    """Parse action that replaces an HTML entity with its character equivalent."""
```

**Usage examples:**

```python
from pyparsing import SkipTo, make_html_tags, make_xml_tags, common_html_entity, replace_html_entity

# Create HTML tag parsers
div_start, div_end = make_html_tags("div")
div_content = div_start + SkipTo(div_end) + div_end

# Parse XML with attributes
para_start, para_end = make_xml_tags("para")
para_with_attrs = para_start + SkipTo(para_end) + para_end

# Handle HTML entities; copy() avoids attaching the action to the shared built-in
entity_parser = common_html_entity.copy().set_parse_action(replace_html_entity)
```
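Attributes of a matched opening tag are available as named results, and `replace_html_entity` pairs naturally with `transform_string`. The sketch below assumes pyparsing 3.x; the HTML snippet and the `link_text` results name are illustrative.

```python
from pyparsing import SkipTo, make_html_tags, common_html_entity, replace_html_entity

a_start, a_end = make_html_tags("a")
# Name the skipped text so it can be read back alongside the attributes
link = a_start + SkipTo(a_end)("link_text") + a_end

html = '<p>See <a href="https://example.com/docs">the docs</a> for details.</p>'
links = [(t.link_text, t.href) for t in link.search_string(html)]

# Copy the shared built-in before attaching a parse action to it
entity = common_html_entity.copy().set_parse_action(replace_html_entity)
decoded = entity.transform_string("Fish &amp; Chips &lt;3")

print(links)    # [('the docs', 'https://example.com/docs')]
print(decoded)  # Fish & Chips <3
```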

### Dictionary and Key-Value Parsing

Functions for parsing dictionary-like structures.

```python { .api }
def dict_of(key: ParserElement, value: ParserElement) -> ParserElement:
    """Create parser for one or more key-value pairs with dictionary-style
    named results; equivalent to Dict(OneOrMore(Group(key + value)))."""
```

**Usage examples:**

```python
from pyparsing import Word, alphas, QuotedString, dict_of

# dict_of already wraps the pairs in Dict(OneOrMore(Group(...))),
# so a single call parses one or more key-value pairs
config = dict_of(Word(alphas), QuotedString('"'))
result = config.parse_string('name "John" city "Oslo"')
# result.as_dict() -> {'name': 'John', 'city': 'Oslo'}
# result["name"] -> 'John'
```
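Values need not be single tokens. In this sketch, assuming pyparsing 3.x and an illustrative `key=value` syntax, the `=` is matched but suppressed so each group holds exactly a key and a value; the result supports both key lookup and `as_dict()`.

```python
from pyparsing import Word, Suppress, alphas, alphanums, dict_of

# Each pair is: key "=" value; Suppress keeps "=" out of the grouped tokens
setting = dict_of(Word(alphas), Suppress("=") + Word(alphanums))
result = setting.parse_string("host=localhost port=8080")

print(result["port"])    # 8080
print(result.as_dict())  # {'host': 'localhost', 'port': '8080'}
```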

### Infix Notation Parsing

Function for parsing infix mathematical and logical expressions.

```python { .api }
def infix_notation(base_expr: ParserElement,
                   op_list: List[tuple],
                   lpar: Union[str, ParserElement] = Suppress("("),
                   rpar: Union[str, ParserElement] = Suppress(")")) -> ParserElement:
    """Create parser for infix notation expressions.

    Each op_list entry is (op_expr, num_operands, associativity[, parse_action]),
    where num_operands is 1, 2, or 3. Entries are listed from highest to
    lowest precedence.
    """

class OpAssoc(Enum):
    """Enumeration for operator associativity."""
    LEFT = 1
    RIGHT = 2
```

**Usage example:**

```python
from pyparsing import Word, nums, one_of, infix_notation, OpAssoc

# Parse arithmetic expressions. Precedence comes from list order (highest first);
# the second tuple element is the operand count (2 = binary operator)
number = Word(nums)
arith_expr = infix_notation(number, [
    ('^', 2, OpAssoc.RIGHT),           # Exponentiation: binds tightest, right associative
    (one_of("* /"), 2, OpAssoc.LEFT),  # Multiplication and division
    (one_of("+ -"), 2, OpAssoc.LEFT),  # Addition and subtraction: bind loosest
])
# Parses: "2 + 3 * 4" -> [['2', '+', ['3', '*', '4']]]
```
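Two further checks of this pattern, assuming pyparsing 3.x: parentheses (suppressed from the output by default) override precedence, and a right-associative operator groups from the right. The operator table is repeated, highest precedence first, so the block runs on its own.

```python
from pyparsing import Word, nums, one_of, infix_notation, OpAssoc

number = Word(nums)
arith_expr = infix_notation(number, [
    ("^", 2, OpAssoc.RIGHT),           # highest precedence
    (one_of("* /"), 2, OpAssoc.LEFT),
    (one_of("+ -"), 2, OpAssoc.LEFT),  # lowest precedence
])

precedence = arith_expr.parse_string("2 + 3 * 4", parse_all=True).as_list()
grouped = arith_expr.parse_string("(2 + 3) * 4", parse_all=True).as_list()
right_assoc = arith_expr.parse_string("2 ^ 3 ^ 2", parse_all=True).as_list()

print(precedence)   # [['2', '+', ['3', '*', '4']]]
print(grouped)      # [[['2', '+', '3'], '*', '4']]
print(right_assoc)  # [['2', '^', ['3', '^', '2']]]
```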

### Previous Match Functions

Functions for matching previously parsed content.

```python { .api }
def match_previous_literal(expr: ParserElement) -> ParserElement:
    """Create parser that matches the same literal text expr matched earlier."""

def match_previous_expr(expr: ParserElement) -> ParserElement:
    """Create parser that re-parses with expr and requires the same tokens as before."""
```

**Usage examples:**

```python
from pyparsing import Word, alphas, SkipTo, match_previous_literal, match_previous_expr

# Match repeated literals
first_word = Word(alphas)
repeat_word = match_previous_literal(first_word)
pattern = first_word + ":" + repeat_word
# Matches: "hello:hello" but not "hello:world"

# Match repeated expressions
tag_name = Word(alphas)
open_tag = "<" + tag_name + ">"
close_tag = "</" + match_previous_expr(tag_name) + ">"
xml_element = open_tag + SkipTo(close_tag) + close_tag
```
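The two helpers differ when one value is a prefix of another. In this sketch, assuming pyparsing 3.x, `match_previous_literal` repeats the matched text, so `"1:10"` yields a leading `1:1` match, while `match_previous_expr` re-parses `"10"` and rejects it because its tokens differ from `"1"`. Separate `Word` instances are used because each helper attaches a parse action to the expression it receives.

```python
from pyparsing import Word, nums, match_previous_literal, match_previous_expr, ParseException

first_lit = Word(nums)
by_literal = first_lit + ":" + match_previous_literal(first_lit)

first_expr = Word(nums)
by_expr = first_expr + ":" + match_previous_expr(first_expr)

# The repeated literal "1" matches the start of "10"
literal_result = by_literal.parse_string("1:10").as_list()

# The re-parsed token "10" is compared with "1" and rejected
try:
    by_expr.parse_string("1:10")
    expr_rejected = False
except ParseException:
    expr_rejected = True

print(literal_result, expr_rejected)  # ['1', ':', '1'] True
```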

### Text Transformation Utilities

Functions for transforming parsed text.

```python { .api }
def original_text_for(expr: ParserElement,
                      as_string: bool = True,
                      *,
                      # Backward compatibility parameter
                      asString: bool = True) -> ParserElement:
    """Return the original matched text instead of the parsed tokens."""

def ungroup(expr: ParserElement) -> ParserElement:
    """Remove one level of grouping from expression results."""
```

**Usage examples:**

```python
from pyparsing import Word, Group, alphas, nums, original_text_for, ungroup

# Get original text of complex expression
date_pattern = Word(nums) + "/" + Word(nums) + "/" + Word(nums)
date_text = original_text_for(date_pattern)
# Returns ['12/25/2023'] instead of ['12', '/', '25', '/', '2023']

# Remove unwanted grouping
grouped_items = Group(Word(alphas) + Word(nums))
flat_items = ungroup(grouped_items)
# "abc 123" -> ['abc', '123'] instead of [['abc', '123']]
```
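A runnable check of both transformations, assuming pyparsing 3.x; the inputs are illustrative. `original_text_for` returns the exact source slice, including interior whitespace that individual tokens would drop, and `ungroup` removes one level of `Group` nesting.

```python
from pyparsing import Word, Group, alphas, nums, original_text_for, ungroup

# The original slice keeps the irregular spacing between the two words
pair_text = original_text_for(Word(alphas) + Word(nums))
text_result = pair_text.parse_string("abc   123").as_list()

grouped_items = Group(Word(alphas) + Word(nums))
flat_items = ungroup(grouped_items)
grouped_result = grouped_items.parse_string("abc 123").as_list()
flat_result = flat_items.parse_string("abc 123").as_list()

print(text_result)     # ['abc   123']
print(grouped_result)  # [['abc', '123']]
print(flat_result)     # ['abc', '123']
```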

### Action Creation Functions

Functions for creating parse actions.

```python { .api }
def replace_with(repl_str: str) -> Callable:
    """Create parse action that replaces tokens with specified string."""

def remove_quotes(s: str, loc: int, tokens: ParseResults) -> str:
    """Parse action that strips the first and last characters (the surrounding quotes)."""

def with_attribute(*args, **attr_dict) -> Callable:
    """Create parse action that accepts an HTML/XML opening tag only if it has the given attribute values."""

def with_class(classname: str, namespace: str = "") -> Callable:
    """Create parse action that accepts an HTML opening tag only if its class attribute equals classname."""
```

**Usage examples:**

```python
from pyparsing import Literal, quoted_string, any_open_tag, replace_with, remove_quotes, with_attribute, with_class

# Replace matched tokens
placeholder = Literal("TBD").set_parse_action(replace_with("To Be Determined"))

# Remove quotes from strings: quoted_string keeps its quotes, so strip them here
# (QuotedString already strips quotes and should not be combined with remove_quotes)
unquoted = quoted_string.copy().set_parse_action(remove_quotes)

# Match HTML elements with specific attributes; copy() keeps the shared
# any_open_tag untouched, so the two filters below do not overwrite each other
div_with_id = any_open_tag.copy().add_parse_action(with_attribute(id="main"))

# Match elements with CSS class
highlighted = any_open_tag.copy().add_parse_action(with_class("highlight"))
```
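The sketch below exercises three of these actions, assuming pyparsing 3.x; the HTML snippet and variable names are illustrative. Each action is attached to a `copy()` of a shared expression, and `add_parse_action` is used on the tag parser so any actions it already carries stay in place.

```python
from pyparsing import Literal, quoted_string, make_html_tags, replace_with, remove_quotes, with_attribute

# replace_with substitutes fixed text for each match
placeholder = Literal("TBD").set_parse_action(replace_with("To Be Determined"))
expanded = placeholder.transform_string("Release date: TBD")

# quoted_string keeps the quotes; remove_quotes strips the first and last character
unquoted = quoted_string.copy().set_parse_action(remove_quotes)
bare = unquoted.parse_string("'hello world'")[0]

# with_attribute rejects <td> tags whose id is not "main"
td_start, td_end = make_html_tags("td")
main_td = td_start.copy().add_parse_action(with_attribute(id="main"))
html = '<td id="side">a</td><td id="main">b</td>'
ids = [t["id"] for t in main_td.search_string(html)]

print(expanded)  # Release date: To Be Determined
print(bare)      # hello world
print(ids)       # ['main']
```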

### Built-in Helper Expressions

Pre-built parser expressions for common patterns.

```python { .api }
# Comment parsers
c_style_comment: ParserElement       # /* comment */
html_comment: ParserElement          # <!-- comment -->
dbl_slash_comment: ParserElement     # // comment
cpp_style_comment: ParserElement     # /* comment */ or // comment
java_style_comment: ParserElement    # Same as cpp_style_comment
python_style_comment: ParserElement  # # comment

# Line helpers
rest_of_line: ParserElement          # Everything up to the end of the current line

# HTML/XML parsers
any_open_tag: ParserElement          # Any opening HTML/XML tag
any_close_tag: ParserElement         # Any closing HTML/XML tag
common_html_entity: ParserElement    # Common HTML entities (&amp;, &lt;, etc.)

# String parsers (quotes are kept in the results)
dbl_quoted_string: ParserElement     # "double quoted string"
sgl_quoted_string: ParserElement     # 'single quoted string'
quoted_string: ParserElement         # Either single or double quoted
unicode_string: ParserElement        # Python-style u"..." or u'...' literals
```
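Comment expressions are usually attached with `ignore()`, which lets them appear wherever whitespace is allowed without being mentioned in the grammar. A minimal sketch, assuming pyparsing 3.x and a hypothetical `name = integer ;` statement syntax:

```python
from pyparsing import Word, Group, OneOrMore, alphas, nums, cpp_style_comment

# Hypothetical statement syntax: name = integer ;
assignment = Group(Word(alphas) + "=" + Word(nums) + ";")
program = OneOrMore(assignment)

# Skip both /* ... */ and // comments between tokens
program.ignore(cpp_style_comment)

source = "x = 1; // first\n/* second */ y = 2;"
statements = program.parse_string(source, parse_all=True).as_list()
print(statements)  # [['x', '=', '1', ';'], ['y', '=', '2', ';']]
```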

### Advanced Parsing Utilities

Specialized utilities for complex parsing scenarios.

```python { .api }
def condition_as_parse_action(condition: Callable,
                              message: str = "failed user-defined condition",
                              fatal: bool = False) -> Callable:
    """Convert a boolean condition into a parse action that raises ParseException
    (ParseFatalException if fatal=True) when the condition is false."""

def token_map(func: Callable, *args) -> Callable:
    """Create parse action that calls func(token, *args) on each token."""

def autoname_elements() -> None:
    """Name each unnamed parser element in the caller's scope after its variable name."""
```

**Usage examples:**

```python
from pyparsing import Word, OneOrMore, alphas, nums, condition_as_parse_action, token_map, autoname_elements

# Conditional parsing
positive_int = Word(nums).set_parse_action(
    condition_as_parse_action(lambda t: int(t[0]) > 0, "must be positive")
)

# Transform all tokens
uppercase_words = OneOrMore(Word(alphas)).set_parse_action(token_map(str.upper))

# Automatic naming for debugging: call it after the elements are defined,
# because it only names elements that already exist in the calling scope
word = Word(alphas)
number = Word(nums)
parser = word + number
autoname_elements()  # word, number, and parser are now named after their variables
```
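A runnable check of the first two helpers, assuming pyparsing 3.x. A failed condition surfaces as a `ParseException` carrying the supplied message, and `token_map` forwards extra positional arguments to the mapped function, here the base argument of `int`.

```python
from pyparsing import Word, nums, hexnums, condition_as_parse_action, token_map, ParseException

positive_int = Word(nums).set_parse_action(
    condition_as_parse_action(lambda t: int(t[0]) > 0, "must be positive")
)
accepted = positive_int.parse_string("42")[0]

try:
    positive_int.parse_string("0")
    error_message = None
except ParseException as exc:
    error_message = exc.msg

# token_map(int, 16) calls int(token, 16) on every token
hex_value = Word(hexnums).set_parse_action(token_map(int, 16))
value = hex_value.parse_string("ff")[0]

print(accepted, error_message, value)  # 42 must be positive 255
```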

### Additional Utility Functions

Specialized utility functions for advanced parsing scenarios.

```python { .api }
def col(loc: int, strg: str) -> int:
    """Return the 1-based column number of loc within strg."""

def line(loc: int, strg: str) -> str:
    """Return the text of the line containing loc."""

def lineno(loc: int, strg: str) -> int:
    """Return the 1-based line number of loc within strg."""

def match_only_at_col(n: int) -> Callable:
    """Create a parse action that fails unless the match starts at column n."""

def srange(s: str) -> str:
    """Expand a regex-style character range expression such as "[a-z]"."""
```

**Usage examples:**

```python
from pyparsing import Word, alphas, rest_of_line, match_only_at_col, srange, lineno, col

# Column-specific matching: match_only_at_col returns a parse action,
# so attach it to an expression instead of adding it into a grammar
first_word = Word(alphas).add_parse_action(match_only_at_col(1))
code_line = first_word + rest_of_line

# Character range expansion
vowels = srange("[aeiouAEIOU]")  # Expands to "aeiouAEIOU"
consonants = srange("[b-df-hj-np-tv-zB-DF-HJ-NP-TV-Z]")

# Position utilities (used in parse actions)
def report_position(s, loc, tokens):
    print(f"Found at line {lineno(loc, s)}, column {col(loc, s)}")
    return tokens

parser = Word(alphas).set_parse_action(report_position)
```
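The position helpers can be checked directly, assuming pyparsing 3.x. Line and column numbers are 1-based, and `line` returns the text of the line rather than its number. The last part scans illustrative input with `match_only_at_col(1)` attached as a parse action, so only words that begin a line are reported.

```python
from pyparsing import Word, alphas, col, line, lineno, match_only_at_col, srange

text = "first line\nsecond line"
loc = text.index("line", len("first line\n"))  # the "line" in "second line"

position = (lineno(loc, text), col(loc, text), line(loc, text))
letters = srange("[a-e]")

# Words are reported only when they begin at column 1
starts_line = Word(alphas).add_parse_action(match_only_at_col(1))
leading = [t[0] for t in starts_line.search_string("alpha\n  beta\ngamma")]

print(position)  # (2, 8, 'second line')
print(letters)   # abcde
print(leading)   # ['alpha', 'gamma']
```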