# Syntax Parsing

The `ply.yacc` module provides LALR(1) parsing capabilities, converting token streams into structured data using grammar rules defined in function docstrings. It supports precedence rules, error recovery, parser generation optimization, and comprehensive debugging.

## Capabilities

### Parser Creation

Creates a parser instance by analyzing grammar rules defined in the calling module. Uses the LALR(1) algorithm to build parsing tables and validate the grammar specification.

```python { .api }
def yacc(*, debug=False, module=None, start=None, check_recursion=True, optimize=False, debugfile='parser.out', debuglog=None, errorlog=None):
    """
    Build a parser from grammar rules.

    Parameters:
    - debug: Enable debug mode (default: False)
    - module: Module containing grammar rules (default: calling module)
    - start: Start symbol for grammar (default: first rule)
    - check_recursion: Check for infinite recursion (default: True)
    - optimize: Enable parser optimization (default: False)
    - debugfile: Debug output filename (default: 'parser.out')
    - debuglog: Logger for debug output
    - errorlog: Logger for error messages

    Returns:
    LRParser instance
    """

def format_result(r):
    """
    Format result message for debug mode.

    Parameters:
    - r: Result value to format

    Returns:
    Formatted string representation
    """

def format_stack_entry(r):
    """
    Format stack entry for debug mode.

    Parameters:
    - r: Stack entry to format

    Returns:
    Formatted string representation
    """
```
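
As a minimal sketch of how `yacc()` is typically called, the example below builds a lexer with `ply.lex` and a parser for sums of integers, then parses a string. The token names (`NUMBER`, `PLUS`) and the `sum` grammar are assumptions made for illustration; they are not part of the API listed above.

```python
import ply.lex as lex
import ply.yacc as yacc

# --- Lexer (illustrative token set) ---
tokens = ('NUMBER', 'PLUS')

t_PLUS = r'\+'
t_ignore = ' \t'

def t_NUMBER(t):
    r'\d+'
    t.value = int(t.value)
    return t

def t_error(t):
    # Skip characters the lexer does not recognize
    t.lexer.skip(1)

# --- Grammar rules, discovered by yacc() in this module ---
def p_sum_add(p):
    '''sum : sum PLUS NUMBER'''
    p[0] = p[1] + p[3]

def p_sum_number(p):
    '''sum : NUMBER'''
    p[0] = p[1]

def p_error(p):
    print("Syntax error at", p)

lexer = lex.lex()
parser = yacc.yacc(debug=False)   # the first rule ('sum') becomes the start symbol
result = parser.parse("1 + 2 + 3", lexer=lexer)
print(result)
```

The value returned by `parse()` is whatever the start symbol's rule assigned to `p[0]`.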

### LALR(1) Parser

Main parser class implementing the LALR(1) parsing algorithm with support for error recovery and debugging.

```python { .api }
class LRParser:
    def parse(self, input=None, lexer=None, debug=False, tracking=False):
        """
        Parse input using the built grammar.

        Parameters:
        - input: Input string to parse (optional if lexer provided)
        - lexer: Lexer instance for tokenization
        - debug: Enable parse debugging
        - tracking: Track line numbers and lexer positions for all grammar symbols

        Returns:
        Parse result (value of start symbol)
        """

    def errok(self):
        """
        Clear the parser error state.
        Used in error recovery to continue parsing.
        """

    def restart(self):
        """
        Restart parsing from the beginning.
        Clears all parser state and positions.
        """

    def set_defaulted_states(self):
        """
        Set defaulted states for optimized parsing.
        Used internally for parser optimization.
        """

    def disable_defaulted_states(self):
        """
        Disable defaulted states.
        Used internally for parser optimization control.
        """
```

### Production Rule Representation

Represents a grammar production rule and provides access to symbol attributes within grammar rule functions, including line numbers and lexer positions. The `p` parameter in grammar rules is a `YaccProduction` instance.

```python { .api }
class YaccProduction:
    """
    Represents a grammar production rule.
    Used in grammar rule functions to access symbols and their attributes.
    """

    def __getitem__(self, n):
        """
        Get symbol value by index.

        Parameters:
        - n: Symbol index (0 = left-hand side, 1+ = right-hand side)

        Returns:
        Symbol value
        """

    def __setitem__(self, n, v):
        """
        Set symbol value by index.

        Parameters:
        - n: Symbol index (0 = left-hand side, 1+ = right-hand side)
        - v: Value to set
        """

    def __len__(self):
        """
        Get number of symbols in production.

        Returns:
        Number of symbols (including left-hand side)
        """

    def lineno(self, n):
        """
        Get line number for symbol n in grammar rule.

        Parameters:
        - n: Symbol index (0 = left-hand side, 1+ = right-hand side)

        Returns:
        Line number (0 if no line number was recorded)
        """

    def set_lineno(self, n, lineno):
        """
        Set line number for symbol n.

        Parameters:
        - n: Symbol index
        - lineno: Line number to set
        """

    def linespan(self, n):
        """
        Get line number span for symbol n.

        Parameters:
        - n: Symbol index

        Returns:
        Tuple of (start_line, end_line)
        """

    def lexpos(self, n):
        """
        Get lexer position for symbol n.

        Parameters:
        - n: Symbol index

        Returns:
        Character position (0 if no position was recorded)
        """

    def set_lexpos(self, n, lexpos):
        """
        Set lexer position for symbol n.

        Parameters:
        - n: Symbol index
        - lexpos: Character position to set
        """

    def lexspan(self, n):
        """
        Get lexer position span for symbol n.

        Parameters:
        - n: Symbol index

        Returns:
        Tuple of (start_pos, end_pos)
        """

    def error(self):
        """
        Signal a syntax error from within a grammar rule (raises SyntaxError).
        Triggers error recovery mechanisms.
        """

    # Public attributes
    slice: list    # List of symbols in the production
    stack: list    # Parser stack reference
    lexer: object  # Lexer instance reference
    parser: object # Parser instance reference
```
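
The sketch below shows one common use of `len(p)`: telling apart the alternatives of a rule when they have different numbers of symbols. Because the count includes the left-hand side, a one-symbol right-hand side gives `len(p) == 2`. The `items` grammar and the token names are assumptions chosen for illustration.

```python
import ply.lex as lex
import ply.yacc as yacc

tokens = ('NUMBER', 'COMMA')

t_COMMA = r','
t_ignore = ' \t'

def t_NUMBER(t):
    r'\d+'
    t.value = int(t.value)
    return t

def t_error(t):
    t.lexer.skip(1)

def p_items(p):
    '''items : items COMMA NUMBER
             | NUMBER'''
    if len(p) == 2:
        # items : NUMBER  (left-hand side + 1 symbol)
        p[0] = [p[1]]
    else:
        # items : items COMMA NUMBER  (left-hand side + 3 symbols)
        p[0] = p[1] + [p[3]]

def p_error(p):
    print("Syntax error at", p)

lexer = lex.lex()
parser = yacc.yacc(debug=False)
items = parser.parse("1, 2, 3", lexer=lexer)
print(items)
```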

### Internal Parser Symbol

Internal representation of parser symbols during parsing.

```python { .api }
class YaccSymbol:
    """
    Internal parser symbol representation.
    Used internally by the parser during parsing operations.
    """
```

### Parser Error Handling

Exception hierarchy for different types of parsing errors.

```python { .api }
class YaccError(Exception):
    """Base exception for parser errors."""

class GrammarError(YaccError):
    """
    Exception for grammar specification errors.
    Raised when grammar rules are invalid or conflicting.
    """

class LALRError(YaccError):
    """
    Exception for LALR parsing algorithm errors.
    Raised when the grammar is not LALR(1) parseable.
    """
```

### Logging Utilities

Logging classes for parser construction and operation debugging.

```python { .api }
class PlyLogger:
    """
    Logging utility for PLY operations.
    Provides structured logging for parser construction and operation.
    """

class NullLogger:
    """
    Null logging implementation.
    Used when logging is disabled.
    """
```

## Grammar Rule Conventions

### Basic Grammar Rules

Define grammar rules using functions with `p_` prefix and BNF in docstrings:

```python
def p_expression_binop(p):
    '''expression : expression PLUS term
                  | expression MINUS term'''
    if p[2] == '+':
        p[0] = p[1] + p[3]
    elif p[2] == '-':
        p[0] = p[1] - p[3]

def p_expression_term(p):
    '''expression : term'''
    p[0] = p[1]

def p_term_factor(p):
    '''term : factor'''
    p[0] = p[1]
```

### Symbol Access

Access symbol values in grammar rules through the `p` parameter:

```python
def p_assignment(p):
    '''assignment : ID EQUALS expression'''
    # p[0] = result (value of the left-hand side)
    # p[1] = value of the ID token
    # p[2] = value of the EQUALS token
    # p[3] = value of the expression
    # symbol_table is assumed to be a dict defined elsewhere in the module
    symbol_table[p[1]] = p[3]
    p[0] = p[3]
```

### Precedence Rules

Define operator precedence and associativity:

```python
precedence = (
    ('left', 'PLUS', 'MINUS'),
    ('left', 'TIMES', 'DIVIDE'),
    ('right', 'UMINUS'),  # Unary minus
)

def p_expression_uminus(p):
    '''expression : MINUS expression %prec UMINUS'''
    p[0] = -p[2]
```
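
To see the precedence table at work, here is a small self-contained calculator sketch. It deliberately uses an ambiguous `expression : expression OP expression` grammar so that the `precedence` tuple, rather than the rule structure, decides how operators bind. The lexer and the test expressions are assumptions added for illustration.

```python
import ply.lex as lex
import ply.yacc as yacc

tokens = ('NUMBER', 'PLUS', 'MINUS', 'TIMES', 'DIVIDE')

t_PLUS = r'\+'
t_MINUS = r'-'
t_TIMES = r'\*'
t_DIVIDE = r'/'
t_ignore = ' \t'

def t_NUMBER(t):
    r'\d+'
    t.value = int(t.value)
    return t

def t_error(t):
    t.lexer.skip(1)

# Lowest precedence first; UMINUS is a fictitious token used only via %prec
precedence = (
    ('left', 'PLUS', 'MINUS'),
    ('left', 'TIMES', 'DIVIDE'),
    ('right', 'UMINUS'),
)

def p_expression_binop(p):
    '''expression : expression PLUS expression
                  | expression MINUS expression
                  | expression TIMES expression
                  | expression DIVIDE expression'''
    left, op, right = p[1], p[2], p[3]
    if op == '+':
        p[0] = left + right
    elif op == '-':
        p[0] = left - right
    elif op == '*':
        p[0] = left * right
    else:
        p[0] = left / right

def p_expression_uminus(p):
    '''expression : MINUS expression %prec UMINUS'''
    p[0] = -p[2]

def p_expression_number(p):
    '''expression : NUMBER'''
    p[0] = p[1]

def p_error(p):
    print("Syntax error at", p)

lexer = lex.lex()
parser = yacc.yacc(debug=False)

mul_first = parser.parse("2 + 3 * 4", lexer=lexer)    # TIMES binds tighter than PLUS
left_assoc = parser.parse("10 - 4 - 3", lexer=lexer)  # grouped as (10 - 4) - 3
unary = parser.parse("-2 + 5", lexer=lexer)           # grouped as (-2) + 5
```

Without the `precedence` tuple, this grammar would produce shift/reduce conflicts when the parser is built.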

### Error Recovery

Handle syntax errors with error productions and recovery:

```python
def p_error(p):
    if p:
        print(f"Syntax error at token {p.type} (line {p.lineno})")
    else:
        print("Syntax error at EOF")

def p_statement_error(p):
    '''statement : error SEMICOLON'''
    print("Syntax error in statement. Skipping to next semicolon.")
    p[0] = None
```

## Parser Configuration

### Global Configuration Variables

Module-level configuration constants:

```python { .api }
yaccdebug = False          # Global debug mode flag
debug_file = 'parser.out'  # Default debug output filename
error_count = 3            # Number of symbols that must be shifted to leave error recovery mode
resultlimit = 40           # Debug result display size limit
MAXINT = sys.maxsize       # Maximum integer value
```

### Start Symbol

The parser automatically uses the first grammar rule as the start symbol, or you can specify it explicitly:

```python
# Automatic start symbol (first rule)
def p_program(p):
    '''program : statement_list'''
    p[0] = p[1]

# Or specify explicitly in yacc() call
parser = yacc.yacc(start='program')
```
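
The following sketch builds two parsers from the same rules, one with the default start symbol and one with an explicit `start=`. The `pair`/`value` grammar is an assumption for illustration. Choosing `value` as the start symbol leaves the `pair` rule unreachable, which PLY reports as warnings; the example passes a `NullLogger` as `errorlog` to keep that output quiet.

```python
import ply.lex as lex
import ply.yacc as yacc

tokens = ('NUMBER', 'COMMA')

t_COMMA = r','
t_ignore = ' \t'

def t_NUMBER(t):
    r'\d+'
    t.value = int(t.value)
    return t

def t_error(t):
    t.lexer.skip(1)

def p_pair(p):
    '''pair : value COMMA value'''
    p[0] = (p[1], p[3])

def p_value(p):
    '''value : NUMBER'''
    p[0] = p[1]

def p_error(p):
    print("Syntax error at", p)

lexer = lex.lex()
quiet = yacc.NullLogger()  # discards grammar warnings such as unreachable symbols

# Default: the first rule in the module ('pair') is the start symbol
pair_parser = yacc.yacc(debug=False, errorlog=quiet)

# Explicit: parse a single 'value' instead
value_parser = yacc.yacc(debug=False, start='value', errorlog=quiet)

pair_result = pair_parser.parse("1, 2", lexer=lexer)
value_result = value_parser.parse("7", lexer=lexer)
```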

## Error Recovery Mechanisms

The parser provides several error recovery strategies:

1. **Error productions**: Grammar rules with `error` token for local recovery
2. **Global error handler**: `p_error()` function for unhandled syntax errors
3. **Error state management**: `errok()` method to clear error state
4. **Token synchronization**: Skip tokens until synchronization point
5. **Parser restart**: `restart()` method for complete recovery
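
The first four strategies can be combined as in the sketch below: an `error` production resynchronizes at the next semicolon, `p_error()` records the offending token, and `errok()` clears the error state once the bad statement has been skipped. The statement grammar and the `errors` list are assumptions made for this example.

```python
import ply.lex as lex
import ply.yacc as yacc

tokens = ('NUMBER', 'SEMICOLON')

t_SEMICOLON = r';'
t_ignore = ' \t'

def t_NUMBER(t):
    r'\d+'
    t.value = int(t.value)
    return t

def t_error(t):
    t.lexer.skip(1)

errors = []  # values of tokens reported to p_error (hypothetical bookkeeping)

def p_program_many(p):
    '''program : program statement'''
    p[0] = p[1] + [p[2]]

def p_program_one(p):
    '''program : statement'''
    p[0] = [p[1]]

def p_statement_number(p):
    '''statement : NUMBER SEMICOLON'''
    p[0] = p[1]

def p_statement_error(p):
    '''statement : error SEMICOLON'''
    # Resynchronized at the semicolon: clear the error state and
    # record the broken statement as None
    parser.errok()
    p[0] = None

def p_error(tok):
    # Global handler: called once when the syntax error is first detected
    errors.append(tok.value if tok else 'EOF')

lexer = lex.lex()
parser = yacc.yacc(debug=False)

# The second statement ("2 2 ;") is malformed
result = parser.parse("1 ; 2 2 ; 3 ;", lexer=lexer)
print(result, errors)
```

Parsing continues after the malformed statement, so the valid statements on either side of it still contribute to the result.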

## Position Tracking

Track source position information through tokens and productions using the `YaccProduction` parameter. Positions of tokens come from the lexer; positions of nonterminals are only recorded when `parse()` is called with `tracking=True`.

```python
def p_assignment(p):
    '''assignment : ID EQUALS expression'''
    # p is a YaccProduction instance - access position information
    id_line = p.lineno(1)   # Line number of ID
    id_pos = p.lexpos(1)    # Character position of ID
    span = p.linespan(1)    # Line span of ID

    # Set position for result
    p.set_lineno(0, id_line)
    # AST.Assignment stands for an application-defined node class
    p[0] = AST.Assignment(p[1], p[3], line=id_line)
```
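
A runnable sketch of position tracking, under the assumption of a tiny assignment grammar and a lexer that counts newlines in a `t_newline` rule. It calls `parse()` with `tracking=True` so that the nonterminal `expression` also carries line and position spans; the result dictionary is purely illustrative.

```python
import ply.lex as lex
import ply.yacc as yacc

tokens = ('ID', 'EQUALS', 'NUMBER')

t_ID = r'[A-Za-z_]\w*'
t_EQUALS = r'='
t_ignore = ' \t'

def t_NUMBER(t):
    r'\d+'
    t.value = int(t.value)
    return t

def t_newline(t):
    r'\n+'
    # Keep the lexer's line counter up to date; no token is returned
    t.lexer.lineno += len(t.value)

def t_error(t):
    t.lexer.skip(1)

def p_assignment(p):
    '''assignment : ID EQUALS expression'''
    p[0] = {
        'name': p[1],
        'value': p[3],
        'line': p.lineno(1),           # line of the ID token
        'pos': p.lexpos(1),            # character offset of the ID token
        'value_lines': p.linespan(3),  # (start_line, end_line) of expression
        'value_span': p.lexspan(3),    # (start_pos, end_pos) of expression
    }

def p_expression(p):
    '''expression : NUMBER'''
    p[0] = p[1]

def p_error(p):
    print("Syntax error at", p)

lexer = lex.lex()
parser = yacc.yacc(debug=False)

# tracking=True propagates positions from tokens up to nonterminals
info = parser.parse("x =\n  42", lexer=lexer, tracking=True)
print(info)
```

Note that the end of a span is the starting position of the last token in the symbol, not the position of its final character.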

## Global Variables

When `yacc()` is called, it sets a module-level variable in `ply.yacc`:

- `parse`: Global parse function bound to the `parse()` method of the most recently created parser

This allows for simplified usage, assuming the module was imported as `yacc`: `result = yacc.parse(input, lexer=lexer)`