# Syntax Parsing

The `ply.yacc` module provides LALR(1) parsing: it converts a token stream into structured data using grammar rules written in function docstrings. It supports precedence rules, error recovery, parser generation optimization, and debugging output.

## Capabilities

### Parser Creation

Creates a parser instance by analyzing the grammar rules defined in the calling module, or in the object passed as `module`. It uses the LALR(1) algorithm to build the parsing tables and validates the grammar specification.

```python { .api }
def yacc(*, debug=False, module=None, start=None, check_recursion=True, optimize=False, debugfile='parser.out', debuglog=None, errorlog=None):
    """
    Build a parser from grammar rules.

    Parameters:
    - debug: Enable debug mode (default: False)
    - module: Module containing grammar rules (default: calling module)
    - start: Start symbol for grammar (default: first rule)
    - check_recursion: Check for infinite recursion (default: True)
    - optimize: Enable parser optimization (default: False)
    - debugfile: Debug output filename (default: 'parser.out')
    - debuglog: Logger for debug output
    - errorlog: Logger for error messages

    Returns:
    LRParser instance
    """

def format_result(r):
    """
    Format result message for debug mode.

    Parameters:
    - r: Result value to format

    Returns:
    Formatted string representation
    """

def format_stack_entry(r):
    """
    Format stack entry for debug mode.

    Parameters:
    - r: Stack entry to format

    Returns:
    Formatted string representation
    """
```
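
As a rough end-to-end illustration of `yacc()`, the sketch below builds a lexer with `ply.lex`, defines two grammar rules in the same module, and parses a short input. The token names (`NUMBER`, `PLUS`) and the grammar itself are assumptions made up for this example.

```python
import ply.lex as lex
import ply.yacc as yacc

# --- Lexer (token names and patterns are illustrative) ---
tokens = ('NUMBER', 'PLUS')

t_PLUS = r'\+'
t_ignore = ' \t'

def t_NUMBER(t):
    r'\d+'
    t.value = int(t.value)
    return t

def t_error(t):
    # Skip characters the lexer does not recognize
    t.lexer.skip(1)

# --- Grammar: yacc() discovers these p_ functions in the calling module ---
def p_expression_plus(p):
    '''expression : expression PLUS NUMBER'''
    p[0] = p[1] + p[3]

def p_expression_number(p):
    '''expression : NUMBER'''
    p[0] = p[1]

def p_error(p):
    raise SyntaxError(f"unexpected token: {p!r}")

lexer = lex.lex()
parser = yacc.yacc()  # returns an LRParser instance

result = parser.parse("1 + 2 + 3", lexer=lexer)
print(result)
```

Passing `lexer=lexer` explicitly keeps the example independent of whichever lexer was built most recently.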

### LALR(1) Parser

Main parser class implementing the LALR(1) parsing algorithm with support for error recovery and debugging.

```python { .api }
class LRParser:
    def parse(self, input=None, lexer=None, debug=False, tracking=False):
        """
        Parse input using the built grammar.

        Parameters:
        - input: Input string to parse (optional if lexer provided)
        - lexer: Lexer instance for tokenization
        - debug: Enable parse debugging
        - tracking: Enable position tracking (line numbers and lexer positions) for grammar symbols

        Returns:
        Parse result (value of start symbol)
        """

    def errok(self):
        """
        Clear the parser error state.
        Used in error recovery to continue parsing.
        """

    def restart(self):
        """
        Restart parsing from the beginning.
        Clears all parser state and positions.
        """

    def set_defaulted_states(self):
        """
        Set defaulted states for optimized parsing.
        Used internally for parser optimization.
        """

    def disable_defaulted_states(self):
        """
        Disable defaulted states.
        Used internally for parser optimization control.
        """
```

### Production Rule Representation

Represents a grammar production rule and provides access to symbol attributes within grammar rule functions, including line numbers and lexer positions. The `p` parameter in grammar rules is a `YaccProduction` instance.

```python { .api }
class YaccProduction:
    """
    Represents a grammar production rule.
    Used in grammar rule functions to access symbols and their attributes.
    """

    def __getitem__(self, n):
        """
        Get symbol value by index.

        Parameters:
        - n: Symbol index (0 = left-hand side, 1+ = right-hand side)

        Returns:
        Symbol value
        """

    def __setitem__(self, n, v):
        """
        Set symbol value by index.

        Parameters:
        - n: Symbol index (0 = left-hand side, 1+ = right-hand side)
        - v: Value to set
        """

    def __len__(self):
        """
        Get number of symbols in production.

        Returns:
        Number of symbols (including left-hand side)
        """

    def lineno(self, n):
        """
        Get line number for symbol n in grammar rule.

        Parameters:
        - n: Symbol index (0 = left-hand side, 1+ = right-hand side)

        Returns:
        Line number (0 if none was recorded)
        """

    def set_lineno(self, n, lineno):
        """
        Set line number for symbol n.

        Parameters:
        - n: Symbol index
        - lineno: Line number to set
        """

    def linespan(self, n):
        """
        Get line number span for symbol n.

        Parameters:
        - n: Symbol index

        Returns:
        Tuple of (start_line, end_line)
        """

    def lexpos(self, n):
        """
        Get lexer position for symbol n.

        Parameters:
        - n: Symbol index

        Returns:
        Character position (0 if none was recorded)
        """

    def set_lexpos(self, n, lexpos):
        """
        Set lexer position for symbol n.

        Parameters:
        - n: Symbol index
        - lexpos: Character position to set
        """

    def lexspan(self, n):
        """
        Get lexer position span for symbol n.

        Parameters:
        - n: Symbol index

        Returns:
        Tuple of (start_pos, end_pos)
        """

    def error(self):
        """
        Signal a syntax error.
        Triggers error recovery mechanisms.
        """

    # Public attributes
    slice: list    # List of symbols in the production
    stack: list    # Parser stack reference
    lexer: object  # Lexer instance reference
    parser: object # Parser instance reference
```
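
To make the indexing conventions concrete, here is a small sketch that uses `len(p)`, `p[n]`, `p.slice`, and `p.lexpos(n)` inside one rule. The `numbers` grammar, the token names, and the `seen` list are hypothetical and exist only for this illustration.

```python
import ply.lex as lex
import ply.yacc as yacc

tokens = ('NUMBER', 'COMMA')

t_COMMA = r','
t_ignore = ' \t'

def t_NUMBER(t):
    r'\d+'
    t.value = int(t.value)
    return t

def t_error(t):
    t.lexer.skip(1)

seen = []  # (symbol type, lexer position) of each NUMBER, kept for illustration

def p_numbers(p):
    '''numbers : numbers COMMA NUMBER
               | NUMBER'''
    # len(p) counts the left-hand side as well:
    # 4 for the first alternative, 2 for the second.
    if len(p) == 4:
        p[0] = p[1] + [p[3]]
    else:
        p[0] = [p[1]]
    last = len(p) - 1
    # p.slice holds the underlying symbol objects; p.lexpos(n) reads a position.
    seen.append((p.slice[last].type, p.lexpos(last)))

def p_error(p):
    print("Syntax error at", p)

lexer = lex.lex()
parser = yacc.yacc()
values = parser.parse("10, 20, 30", lexer=lexer)
print(values, seen)
```

Checking `len(p)` is one common way to tell alternatives of the same rule apart; splitting them into separate `p_` functions is an equally valid design.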

### Internal Parser Symbol

Internal representation of parser symbols during parsing.

```python { .api }
class YaccSymbol:
    """
    Internal parser symbol representation.
    Used internally by the parser during parsing operations.
    """
```

### Parser Error Handling

Exception hierarchy for different types of parsing errors.

```python { .api }
class YaccError(Exception):
    """Base exception for parser errors."""

class GrammarError(YaccError):
    """
    Exception for grammar specification errors.
    Raised when grammar rules are invalid or conflicting.
    """

class LALRError(YaccError):
    """
    Exception for LALR parsing algorithm errors.
    Raised when the grammar is not LALR(1) parseable.
    """
```
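
The sketch below shows one way a caller might guard parser construction against a broken grammar. `BrokenGrammar` and its `missing_rule` symbol are hypothetical; the assumption is that `yacc()` raises `YaccError` when a rule refers to a symbol that is neither a token nor a rule.

```python
import ply.yacc as yacc

class BrokenGrammar:
    """Hypothetical grammar whose only rule uses an undefined symbol."""
    tokens = ('NUMBER',)

    def p_value(self, p):
        '''value : NUMBER missing_rule'''
        p[0] = p[1]

    def p_error(self, p):
        pass

try:
    # NullLogger suppresses the diagnostic messages for this demonstration
    yacc.yacc(module=BrokenGrammar(), errorlog=yacc.NullLogger())
    build_failed = False
except yacc.YaccError as exc:
    build_failed = True
    print("Parser construction failed:", exc)
```

Catching the base class `YaccError` also covers its subclasses `GrammarError` and `LALRError`.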

### Logging Utilities

Logging classes for parser construction and operation debugging.

```python { .api }
class PlyLogger:
    """
    Logging utility for PLY operations.
    Provides structured logging for parser construction and operation.
    """

class NullLogger:
    """
    Null logging implementation.
    Used when logging is disabled.
    """
```
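
As a sketch of how these loggers can be wired into `yacc()`, the example below captures the debug report in memory and discards error output. It assumes that `PlyLogger` wraps a writable file-like object, which is not stated above; the `total` grammar is made up for the illustration.

```python
import io
import ply.lex as lex
import ply.yacc as yacc

tokens = ('NUMBER',)
t_ignore = ' \t'

def t_NUMBER(t):
    r'\d+'
    t.value = int(t.value)
    return t

def t_error(t):
    t.lexer.skip(1)

def p_total(p):
    '''total : total NUMBER
             | NUMBER'''
    p[0] = p[1] + p[2] if len(p) == 3 else p[1]

def p_error(p):
    pass

debug_buffer = io.StringIO()

lexer = lex.lex()
parser = yacc.yacc(
    debug=True,
    debuglog=yacc.PlyLogger(debug_buffer),  # assumption: PlyLogger takes a file-like object
    errorlog=yacc.NullLogger(),             # discard warnings and errors
)
result = parser.parse("1 2 3", lexer=lexer)
report = debug_buffer.getvalue()            # text that would otherwise go to the debug file
```

Supplying `debuglog` this way avoids writing the debug file to disk, which can be convenient in tests.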

## Grammar Rule Conventions

### Basic Grammar Rules

Define grammar rules as functions with a `p_` prefix whose docstrings contain the BNF productions:

```python
def p_expression_binop(p):
    '''expression : expression PLUS term
                  | expression MINUS term'''
    if p[2] == '+':
        p[0] = p[1] + p[3]
    elif p[2] == '-':
        p[0] = p[1] - p[3]

def p_expression_term(p):
    '''expression : term'''
    p[0] = p[1]

def p_term_factor(p):
    '''term : factor'''
    p[0] = p[1]
```

### Symbol Access

Access symbols in grammar rules through the `p` parameter:

```python
symbol_table = {}  # Maps identifier names to their assigned values

def p_assignment(p):
    '''assignment : ID EQUALS expression'''
    # p[0] = result (left-hand side)
    # p[1] = ID token
    # p[2] = EQUALS token
    # p[3] = expression value
    symbol_table[p[1]] = p[3]
    p[0] = p[3]
```

### Precedence Rules

Define operator precedence and associativity with a module-level `precedence` tuple, listed from lowest to highest precedence:

```python
precedence = (
    ('left', 'PLUS', 'MINUS'),
    ('left', 'TIMES', 'DIVIDE'),
    ('right', 'UMINUS'),  # Unary minus
)

def p_expression_uminus(p):
    '''expression : MINUS expression %prec UMINUS'''
    p[0] = -p[2]
```
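
To see the precedence table at work, the following sketch pairs it with a deliberately ambiguous expression grammar and evaluates a few inputs. The lexer and the `p_expression_*` rule bodies are assumptions written for this example.

```python
import ply.lex as lex
import ply.yacc as yacc

tokens = ('NUMBER', 'PLUS', 'MINUS', 'TIMES', 'DIVIDE')

t_PLUS = r'\+'
t_MINUS = r'-'
t_TIMES = r'\*'
t_DIVIDE = r'/'
t_ignore = ' \t'

def t_NUMBER(t):
    r'\d+'
    t.value = int(t.value)
    return t

def t_error(t):
    t.lexer.skip(1)

# Lowest precedence first; UMINUS is a fictitious token used only with %prec
precedence = (
    ('left', 'PLUS', 'MINUS'),
    ('left', 'TIMES', 'DIVIDE'),
    ('right', 'UMINUS'),
)

def p_expression_binop(p):
    '''expression : expression PLUS expression
                  | expression MINUS expression
                  | expression TIMES expression
                  | expression DIVIDE expression'''
    if p[2] == '+':
        p[0] = p[1] + p[3]
    elif p[2] == '-':
        p[0] = p[1] - p[3]
    elif p[2] == '*':
        p[0] = p[1] * p[3]
    else:
        p[0] = p[1] / p[3]

def p_expression_uminus(p):
    '''expression : MINUS expression %prec UMINUS'''
    p[0] = -p[2]

def p_expression_number(p):
    '''expression : NUMBER'''
    p[0] = p[1]

def p_error(p):
    print("Syntax error at", p)

lexer = lex.lex()
parser = yacc.yacc()

product_first = parser.parse("2 + 3 * 4", lexer=lexer)   # TIMES binds tighter than PLUS
left_assoc = parser.parse("10 - 4 - 3", lexer=lexer)     # grouped as (10 - 4) - 3
unary_first = parser.parse("-2 + 5", lexer=lexer)        # grouped as (-2) + 5
```

Without the `precedence` table this grammar would produce shift/reduce conflicts, so the table is what makes the compact rule set usable.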

### Error Recovery

Handle syntax errors with error productions and recovery:

```python
def p_error(p):
    if p:
        print(f"Syntax error at token {p.type} (line {p.lineno})")
    else:
        print("Syntax error at EOF")

def p_statement_error(p):
    '''statement : error SEMICOLON'''
    print("Syntax error in statement. Skipping to next semicolon.")
    p[0] = None
```
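
A runnable sketch of the same idea follows: a malformed statement is skipped up to the next semicolon and parsing continues with the following statement. The assignment grammar, the token names, and the `errors` list are hypothetical.

```python
import ply.lex as lex
import ply.yacc as yacc

tokens = ('ID', 'EQUALS', 'NUMBER', 'SEMICOLON')

t_ID = r'[A-Za-z_][A-Za-z0-9_]*'
t_EQUALS = r'='
t_SEMICOLON = r';'
t_ignore = ' \t'

def t_NUMBER(t):
    r'\d+'
    t.value = int(t.value)
    return t

def t_error(t):
    t.lexer.skip(1)

errors = []  # token types reported to p_error, kept for illustration

def p_program_many(p):
    '''program : program statement'''
    p[0] = p[1] + [p[2]]

def p_program_one(p):
    '''program : statement'''
    p[0] = [p[1]]

def p_statement_assign(p):
    '''statement : ID EQUALS NUMBER SEMICOLON'''
    p[0] = (p[1], p[3])

def p_statement_error(p):
    '''statement : error SEMICOLON'''
    # Reached after the parser has discarded input up to the next SEMICOLON
    p[0] = None

def p_error(p):
    errors.append(p.type if p else 'EOF')

lexer = lex.lex()
parser = yacc.yacc()
result = parser.parse("a = 1; b = = 2; c = 3;", lexer=lexer)
print(result, errors)
```

The bad statement shows up as `None` in the result, so later processing can decide whether to report it or drop it.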

## Parser Configuration

### Global Configuration Variables

Module-level configuration constants:

```python { .api }
yaccdebug = False          # Global debug mode flag
debug_file = 'parser.out'  # Default debug output filename
error_count = 3            # Number of symbols that must be shifted to leave recovery mode
resultlimit = 40           # Debug result display size limit
MAXINT = sys.maxsize       # Maximum integer value
```

### Start Symbol

By default the parser uses the left-hand side of the first grammar rule as the start symbol; you can also specify it explicitly:

```python
# Automatic start symbol (first rule)
def p_program(p):
    '''program : statement_list'''
    p[0] = p[1]

# Or specify explicitly in yacc() call
parser = yacc.yacc(start='program')
```
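
The explicit form matters when the intended start rule is not the first one in the file. In the sketch below (grammar and names invented for the example) `item` is defined first, so `start='item_list'` is what makes the parser accept a whole list.

```python
import ply.lex as lex
import ply.yacc as yacc

tokens = ('NUMBER',)
t_ignore = ' \t'

def t_NUMBER(t):
    r'\d+'
    t.value = int(t.value)
    return t

def t_error(t):
    t.lexer.skip(1)

# Defined first, so it would be the default start symbol
def p_item(p):
    '''item : NUMBER'''
    p[0] = p[1]

def p_item_list(p):
    '''item_list : item_list item
                 | item'''
    p[0] = p[1] + [p[2]] if len(p) == 3 else [p[1]]

def p_error(p):
    print("Syntax error at", p)

lexer = lex.lex()
parser = yacc.yacc(start='item_list')  # override the default start symbol
result = parser.parse("1 2 3", lexer=lexer)
```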

## Error Recovery Mechanisms

The parser provides several error recovery strategies:

1. **Error productions**: Grammar rules containing the `error` token for local recovery
2. **Global error handler**: `p_error()` function for unhandled syntax errors
3. **Error state management**: `errok()` method to clear the error state
4. **Token synchronization**: Skip tokens until a synchronization point is reached
5. **Parser restart**: `restart()` method for complete recovery
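
As an illustration of the error state management strategy, the sketch below has `p_error()` record the offending token and call `errok()`, which should make the parser drop that token and carry on. The `total` grammar and the `errors` list are assumptions for this example only.

```python
import ply.lex as lex
import ply.yacc as yacc

tokens = ('NUMBER', 'PLUS')

t_PLUS = r'\+'
t_ignore = ' \t'

def t_NUMBER(t):
    r'\d+'
    t.value = int(t.value)
    return t

def t_error(t):
    t.lexer.skip(1)

errors = []  # token types that were discarded, kept for illustration

def p_total_plus(p):
    '''total : total PLUS NUMBER'''
    p[0] = p[1] + p[3]

def p_total_number(p):
    '''total : NUMBER'''
    p[0] = p[1]

def p_error(p):
    if p:
        errors.append(p.type)
        # Clear the error state; returning None drops the bad token
        parser.errok()
    else:
        errors.append('EOF')

lexer = lex.lex()
parser = yacc.yacc()
result = parser.parse("1 + + 2 + 3", lexer=lexer)  # the second PLUS is unexpected
print(result, errors)
```

This style of recovery is simple but coarse: it only helps when deleting a single token yields valid input, which is why error productions are usually preferred for statement-level recovery.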

## Position Tracking

Track source position information through tokens and productions using the `YaccProduction` parameter:

```python
def p_assignment(p):
    '''assignment : ID EQUALS expression'''
    # p is a YaccProduction instance - access position information
    id_line = p.lineno(1)  # Line number of ID
    id_pos = p.lexpos(1)   # Character position of ID
    span = p.linespan(1)   # Line span of ID

    # Set position for result
    p.set_lineno(0, id_line)
    # AST.Assignment stands for an application-defined node class
    p[0] = AST.Assignment(p[1], p[3], line=id_line)
```
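
For non-terminals, spans are only meaningful when the parse is run with `tracking=True`. The sketch below assumes a lexer rule that advances `lineno` on newlines and records the span of a multi-line `numbers` symbol; the grammar and the `spans` dictionary are hypothetical.

```python
import ply.lex as lex
import ply.yacc as yacc

tokens = ('NUMBER',)
t_ignore = ' \t'

def t_NUMBER(t):
    r'\d+'
    t.value = int(t.value)
    return t

def t_newline(t):
    r'\n+'
    # Keep the lexer's line counter current; no token is returned
    t.lexer.lineno += len(t.value)

def t_error(t):
    t.lexer.skip(1)

spans = {}  # filled in by p_program, kept for illustration

def p_program(p):
    '''program : numbers'''
    spans['lines'] = p.linespan(1)  # (first line, last line) of `numbers`
    spans['chars'] = p.lexspan(1)   # (first lexpos, last token's lexpos)
    p[0] = p[1]

def p_numbers(p):
    '''numbers : numbers NUMBER
               | NUMBER'''
    p[0] = p[1] + [p[2]] if len(p) == 3 else [p[1]]

def p_error(p):
    print("Syntax error at", p)

lexer = lex.lex()
parser = yacc.yacc()
values = parser.parse("1\n2\n3", lexer=lexer, tracking=True)
print(values, spans)
```

Note that the end of a `lexspan` is the starting position of the last token, not the position just past it.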

## Global Variables

When `yacc()` is called, it sets a module-level global variable in `ply.yacc`:

- `parse`: Global parse function bound to the most recently created parser

This allows for simplified usage: `result = yacc.parse(input, lexer=lexer)`
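
A minimal sketch of that shortcut, using an invented `total` grammar, might look like this:

```python
import ply.lex as lex
import ply.yacc as yacc

tokens = ('NUMBER',)
t_ignore = ' \t'

def t_NUMBER(t):
    r'\d+'
    t.value = int(t.value)
    return t

def t_error(t):
    t.lexer.skip(1)

def p_total(p):
    '''total : total NUMBER
             | NUMBER'''
    p[0] = p[1] + p[2] if len(p) == 3 else p[1]

def p_error(p):
    print("Syntax error at", p)

lexer = lex.lex()
yacc.yacc()  # the returned parser is not kept; yacc.parse is bound to it

result = yacc.parse("4 5 6", lexer=lexer)
print(result)
```

Because `yacc.parse` always refers to the last parser built, programs that create several parsers are usually better off keeping the returned `LRParser` objects and calling `parse()` on them directly.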
