# PLY (Python Lex-Yacc)

PLY is a pure-Python implementation of the popular Unix parsing tools lex and yacc. It provides a framework for building lexical analyzers and LALR(1) parsers. It is suited to compilers, interpreters, protocol decoders, and other language-processing tools.

## Package Information

- **Package Name**: ply
- **Language**: Python
- **Installation**: Copy the `ply` package directory from the GitHub repository into your project (current releases are not distributed via PyPI)
- **Repository**: https://github.com/dabeaz/ply
- **Version**: 2022.10.27

## Core Imports

```python
import ply.lex as lex
import ply.yacc as yacc
```

Alternative import patterns:

```python
from ply import lex
from ply import yacc
```

## Basic Usage

```python
import ply.lex as lex
import ply.yacc as yacc

# Define tokens for lexical analysis.
# NAME is tokenized but not used by the grammar below, so yacc.yacc()
# reports it as an unused token (a warning, not an error).
tokens = (
    'NAME',
    'NUMBER',
    'PLUS',
    'MINUS',
    'TIMES',
    'DIVIDE',
    'LPAREN',
    'RPAREN',
)

# Token rules
t_PLUS = r'\+'
t_MINUS = r'-'
t_TIMES = r'\*'
t_DIVIDE = r'/'
t_LPAREN = r'\('
t_RPAREN = r'\)'
t_ignore = ' \t'  # characters skipped between tokens

def t_NAME(t):
    r'[a-zA-Z_][a-zA-Z_0-9]*'
    return t

def t_NUMBER(t):
    r'\d+'
    t.value = int(t.value)
    return t

def t_newline(t):
    r'\n+'
    # Update the line count; returning nothing discards the token
    t.lexer.lineno += len(t.value)

def t_error(t):
    # Report the offending character and skip past it
    print(f"Illegal character '{t.value[0]}'")
    t.lexer.skip(1)

# Build the lexer
lexer = lex.lex()

# Define grammar rules for parsing
def p_expression_binop(p):
    '''expression : expression PLUS term
                  | expression MINUS term'''
    if p[2] == '+':
        p[0] = p[1] + p[3]
    elif p[2] == '-':
        p[0] = p[1] - p[3]

def p_expression_term(p):
    '''expression : term'''
    p[0] = p[1]

def p_term_binop(p):
    '''term : term TIMES factor
            | term DIVIDE factor'''
    if p[2] == '*':
        p[0] = p[1] * p[3]
    elif p[2] == '/':
        p[0] = p[1] / p[3]

def p_term_factor(p):
    '''term : factor'''
    p[0] = p[1]

def p_factor_num(p):
    '''factor : NUMBER'''
    p[0] = p[1]

def p_factor_expr(p):
    '''factor : LPAREN expression RPAREN'''
    p[0] = p[2]

def p_error(p):
    # p is None when the error occurs at end of input
    if p:
        print(f"Syntax error at token {p.type}")
    else:
        print("Syntax error at EOF")

# Build the parser
parser = yacc.yacc()

# Parse input
result = parser.parse("3 + 4 * 2", lexer=lexer)
print(f"Result: {result}")  # Output: Result: 11
```

## Architecture

PLY follows the traditional Unix lex/yacc design: lexing and parsing are two separate but coordinated phases, and rules for both are discovered by naming convention.

- **Lexical Analysis (`lex`)**: Converts raw text into tokens using regular-expression rules, optionally switching between lexer states
- **Syntax Analysis (`yacc`)**: Parses the token stream into structured data using LALR(1) grammar rules
- **Convention-based API**: Discovers rules by introspection; names prefixed with `t_` define token rules and names prefixed with `p_` define grammar rules
- **Error Recovery**: `t_error` and `p_error` hooks handle errors in each phase, and grammar rules can use the special `error` token to resynchronize after a syntax error

The design emphasizes simplicity and educational value while remaining practical for real parsing tasks.
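As an illustration of the convention-based API, the following stdlib-only sketch collects `t_` rules from a namespace and combines them into one master regular expression. It is a simplified model, not PLY's implementation: `discover_rules` and `tokenize` are hypothetical helpers, and only function rules, string rules, and `t_ignore` are modeled. The rule ordering follows what the PLY documentation describes: function rules in definition order, then string rules by decreasing regex length.

```python
import re
import types

def discover_rules(namespace):
    """Collect (token_type, regex, action) triples from t_-prefixed names.

    Function rules keep their definition order (dicts preserve insertion
    order); string rules follow, sorted by decreasing regex length.
    """
    func_rules, str_rules = [], []
    for name, value in namespace.items():
        if not name.startswith('t_') or name in ('t_ignore', 't_error'):
            continue
        if isinstance(value, types.FunctionType):
            func_rules.append((name[2:], value.__doc__, value))
        elif isinstance(value, str):
            str_rules.append((name[2:], value, None))
    str_rules.sort(key=lambda rule: len(rule[1]), reverse=True)
    return func_rules + str_rules

def tokenize(text, namespace):
    """Scan text with one master regex built from the discovered rules."""
    rules = discover_rules(namespace)
    actions = {ttype: action for ttype, _, action in rules}
    # One named group per token type; alternatives are tried left to right
    master = re.compile('|'.join(f'(?P<{ttype}>{regex})' for ttype, regex, _ in rules))
    ignore = namespace.get('t_ignore', '')
    pos, tokens = 0, []
    while pos < len(text):
        if text[pos] in ignore:          # skip ignored characters
            pos += 1
            continue
        m = master.match(text, pos)
        if m is None or m.end() == pos:  # no rule consumes any input here
            raise SyntaxError(f'Illegal character {text[pos]!r} at position {pos}')
        tok = types.SimpleNamespace(type=m.lastgroup, value=m.group(), lexpos=pos)
        action = actions[tok.type]
        if action is not None:
            tok = action(tok)            # a function rule may change or drop the token
        if tok is not None:
            tokens.append((tok.type, tok.value))
        pos = m.end()
    return tokens

def t_NUMBER(t):
    r'\d+'
    t.value = int(t.value)
    return t

spec = {'t_NUMBER': t_NUMBER, 't_PLUS': r'\+', 't_TIMES': r'\*', 't_ignore': ' \t'}
print(tokenize('3 + 4 * 2', spec))
# [('NUMBER', 3), ('PLUS', '+'), ('NUMBER', 4), ('TIMES', '*'), ('NUMBER', 2)]
```

Passing `globals()` instead of the hand-built `spec` dictionary would roughly mimic how `lex.lex()` inspects the calling module. Sorting string rules longest-first is what lets a rule such as `==` win over `=`.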

## Capabilities

### Lexical Analysis

Tokenizes input text by combining regular-expression token rules into a single master pattern. Supports multiple lexer states, line-number tracking, error handling, and token rules defined through naming conventions.
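Lexer states switch which rules are active, for example while inside a comment. The toy scanner below models the state stack behind the `begin`, `push_state`, `pop_state`, and `current_state` methods of the `Lexer` class. It is a hypothetical stdlib-only illustration rather than PLY's code; the rule tables and the `comment` state are invented for the example. In PLY itself, states are declared in a `states` tuple and rules are prefixed with the state name (for example, a hypothetical `t_comment_TEXT`).

```python
import re

class StatefulScanner:
    """Toy model of lexer states: each state has its own ordered rule list."""

    def __init__(self, rules_by_state):
        self.rules = {
            state: [(ttype, re.compile(regex)) for ttype, regex in rules]
            for state, rules in rules_by_state.items()
        }
        self.state = 'INITIAL'
        self.stack = []

    def begin(self, state):          # switch state without touching the stack
        self.state = state

    def push_state(self, state):     # remember the current state, then switch
        self.stack.append(self.state)
        self.begin(state)

    def pop_state(self):             # return to the most recently pushed state
        self.begin(self.stack.pop())

    def current_state(self):
        return self.state

    def scan(self, text):
        pos, tokens = 0, []
        while pos < len(text):
            for ttype, regex in self.rules[self.state]:
                m = regex.match(text, pos)
                if m:
                    break
            else:
                raise SyntaxError(f'no rule matches at {pos} in state {self.state!r}')
            pos = m.end()
            if ttype == 'OPEN':
                self.push_state('comment')   # comments may nest
            elif ttype == 'CLOSE':
                self.pop_state()
            elif self.state == 'INITIAL' and ttype != 'WS':
                tokens.append((ttype, m.group()))
        return tokens

rules = {
    'INITIAL': [('OPEN', r'/\*'), ('WS', r'\s+'), ('WORD', r'\w+')],
    'comment': [('OPEN', r'/\*'), ('CLOSE', r'\*/'), ('TEXT', r'[^*/]+|[*/]')],
}
scanner = StatefulScanner(rules)
print(scanner.scan('a /* x /* y */ z */ b'))  # [('WORD', 'a'), ('WORD', 'b')]
```

Using `push_state`/`pop_state` rather than `begin` is what lets the inner `*/` return to the enclosing comment instead of jumping straight back to `INITIAL`.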

```python { .api }
def lex(*, module=None, object=None, debug=False, reflags=int(re.VERBOSE), debuglog=None, errorlog=None): ...
def TOKEN(r): ...
def runmain(lexer=None, data=None): ...

class Lexer:
    def input(self, s): ...
    def token(self): ...
    def clone(self, object=None): ...
    def begin(self, state): ...
    def push_state(self, state): ...
    def pop_state(self): ...
    def current_state(self): ...
    def skip(self, n): ...
    def __iter__(self): ...
    def __next__(self): ...
    lineno: int
    lexpos: int

class LexToken:
    type: str
    value: any
    lineno: int
    lexpos: int
```
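The `TOKEN` decorator exists because a function rule's regex normally lives in its docstring, which cannot be assembled from variables. The sketch below shows the idea with a hypothetical stand-in decorator, `token_pattern`, that attaches the pattern as a `regex` attribute, plus a hypothetical `rule_regex` lookup that prefers that attribute over the docstring. PLY's own `TOKEN` follows a similar idea, but the exact mechanism inside PLY may differ.

```python
import re

def token_pattern(regex):
    """Hypothetical stand-in for a TOKEN-style decorator."""
    def decorate(func):
        func.regex = regex  # attach the pattern instead of relying on __doc__
        return func
    return decorate

def rule_regex(func):
    """Prefer an attached pattern; fall back to the docstring."""
    return getattr(func, 'regex', func.__doc__)

digit = r'[0-9]'
nondigit = r'[_A-Za-z]'
identifier = nondigit + r'(' + digit + r'|' + nondigit + r')*'

@token_pattern(identifier)
def t_ID(t):
    return t

def t_NUMBER(t):
    r'\d+'
    return t

print(rule_regex(t_ID))      # [_A-Za-z]([0-9]|[_A-Za-z])*
print(rule_regex(t_NUMBER))  # \d+
print(re.fullmatch(rule_regex(t_ID), 'count_2') is not None)  # True
```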

[Lexical Analysis](./lexical-analysis.md)

### Syntax Parsing

Parses token streams with the LALR(1) algorithm, using grammar rules written in the docstrings of `p_` functions. Supports operator precedence declarations, error recovery, debugging output, and resolution of grammar ambiguities.
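Precedence declarations resolve shift/reduce conflicts in ambiguous grammars such as `expression : expression PLUS expression | expression TIMES expression`. The hypothetical helpers below model that decision for a PLY-style `precedence` table, where later rows bind more tightly. This is a simplified sketch, not part of PLY. The rule's precedence is passed in as a single terminal; in PLY it normally comes from the rule's rightmost terminal or a `%prec` override. Terminals missing from the table are treated as level 0 with right associativity, so a conflict with no precedence information falls back to shifting.

```python
def precedence_levels(precedence):
    """Map terminal -> (associativity, level); later rows bind tighter."""
    levels = {}
    for level, (assoc, *terminals) in enumerate(precedence, start=1):
        for terminal in terminals:
            levels[terminal] = (assoc, level)
    return levels

def resolve_shift_reduce(rule_terminal, lookahead, levels):
    """Decide a shift/reduce conflict: returns 'shift', 'reduce' or 'error'.

    rule_terminal supplies the precedence of the rule that could be reduced;
    lookahead is the next input token that could be shifted instead.
    """
    rule_assoc, rule_level = levels.get(rule_terminal, ('right', 0))
    _, token_level = levels.get(lookahead, ('right', 0))
    if token_level < rule_level or (token_level == rule_level and rule_assoc == 'left'):
        return 'reduce'
    if token_level == rule_level and rule_assoc == 'nonassoc':
        return 'error'
    return 'shift'

precedence = (
    ('nonassoc', 'LT', 'GT'),
    ('left', 'PLUS', 'MINUS'),
    ('left', 'TIMES', 'DIVIDE'),
    ('right', 'UMINUS'),
)
levels = precedence_levels(precedence)
print(resolve_shift_reduce('PLUS', 'TIMES', levels))   # shift:  a + b * c is a + (b * c)
print(resolve_shift_reduce('TIMES', 'PLUS', levels))   # reduce: a * b + c is (a * b) + c
print(resolve_shift_reduce('MINUS', 'MINUS', levels))  # reduce: left-associative
print(resolve_shift_reduce('LT', 'LT', levels))        # error:  a < b < c is rejected
```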

```python { .api }
def yacc(*, debug=False, module=None, start=None, check_recursion=True, optimize=False, debugfile='parser.out', debuglog=None, errorlog=None): ...
def format_result(r): ...
def format_stack_entry(r): ...

class LRParser:
    def parse(self, input=None, lexer=None, debug=False, tracking=False): ...
    def errok(self): ...
    def restart(self): ...
    def set_defaulted_states(self): ...
    def disable_defaulted_states(self): ...

class YaccProduction:
    def lineno(self, n): ...
    def set_lineno(self, n, lineno): ...
    def linespan(self, n): ...
    def lexpos(self, n): ...
    def set_lexpos(self, n, lexpos): ...
    def lexspan(self, n): ...
    def error(self): ...
    def __getitem__(self, n): ...
    def __setitem__(self, n, v): ...
    def __len__(self): ...
    slice: list
    stack: list
    lexer: object
    parser: object
```
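Inside a `p_` function, `p[0]` is the slot for the rule's result and `p[1]` through `p[n]` are the values of the right-hand-side symbols. The sketch below models that contract with a hypothetical `Production` class and a hypothetical `reduce_rule` helper that pops the right-hand side off a parse stack. It illustrates the indexing convention only; it is not PLY's `YaccProduction` or its parsing loop. Giving the new symbol the line number of its first right-hand-side symbol is an assumption, roughly matching what PLY documents for `parse(..., tracking=True)`.

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class Symbol:
    type: str
    value: Any = None
    lineno: int = 0

class Production:
    """Hypothetical stand-in for YaccProduction: p[0] is the result slot."""

    def __init__(self, symbols):
        self.slice = symbols            # [lhs, rhs1, rhs2, ...]

    def __getitem__(self, n):
        return self.slice[n].value

    def __setitem__(self, n, value):
        self.slice[n].value = value

    def __len__(self):
        return len(self.slice)

    def lineno(self, n):
        return self.slice[n].lineno

def reduce_rule(stack, lhs, rhs_len, action):
    """Pop rhs_len symbols, run the rule action, and push the new LHS symbol."""
    split = len(stack) - rhs_len
    rhs = stack[split:]
    del stack[split:]
    result = Symbol(lhs, lineno=rhs[0].lineno if rhs else 0)
    action(Production([result] + rhs))  # the action fills in p[0]
    stack.append(result)
    return result

def p_expression_plus(p):
    'expression : expression PLUS term'
    p[0] = p[1] + p[3]

stack = [Symbol('expression', 3, 1), Symbol('PLUS', '+', 1), Symbol('term', 8, 2)]
node = reduce_rule(stack, 'expression', 3, p_expression_plus)
print(node.value, node.lineno, len(stack))  # 11 1 1
```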

[Syntax Parsing](./syntax-parsing.md)

## Types

```python { .api }
class LexError(Exception):
    """Exception raised for lexical analysis errors"""
    text: str

class YaccError(Exception):
    """Base exception for parser errors"""

class GrammarError(YaccError):
    """Exception for grammar specification errors"""

class LALRError(YaccError):
    """Exception for LALR parsing algorithm errors"""

class PlyLogger:
    """Logging utility for PLY operations"""
    def critical(self, msg, *args, **kwargs): ...
    def warning(self, msg, *args, **kwargs): ...
    def error(self, msg, *args, **kwargs): ...
    def info(self, msg, *args, **kwargs): ...
    def debug(self, msg, *args, **kwargs): ...

class NullLogger:
    """Null logging implementation"""
    def debug(self, msg, *args, **kwargs): ...
    def warning(self, msg, *args, **kwargs): ...
    def error(self, msg, *args, **kwargs): ...
    def info(self, msg, *args, **kwargs): ...
    def critical(self, msg, *args, **kwargs): ...

class YaccSymbol:
    """Internal parser symbol representation"""
    def __str__(self): ...
    def __repr__(self): ...

# Configuration constants
yaccdebug: bool = False          # default for yacc(debug=...)
debug_file: str = 'parser.out'   # default name of the debugging output file
error_count: int = 3             # symbols that must be shifted to leave error-recovery mode
resultlimit: int = 40            # size limit for results shown in debug output
MAXINT: int
StringTypes: tuple = (str, bytes)

# Package version
__version__: str = '2022.10.27'
```
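The `debuglog` and `errorlog` parameters of `lex()` and `yacc()` take a logger object. Judging from the `PlyLogger` and `NullLogger` interfaces above, any object with `debug`, `info`, `warning`, `error`, and `critical` methods that accept a printf-style message plus arguments should work, and the standard `logging.Logger` has this shape. The `CollectingLogger` and `has_logger_interface` names below are hypothetical; the class records messages instead of printing them, which can be handy for asserting in tests that a grammar builds without warnings.

```python
import logging

LOG_METHODS = ('debug', 'info', 'warning', 'error', 'critical')

class CollectingLogger:
    """Hypothetical logger with the same methods as PlyLogger; stores records."""

    def __init__(self):
        self.records = []

    def _record(self, level, msg, args):
        # Messages use printf-style placeholders, as in PlyLogger
        self.records.append((level, msg % args if args else msg))

    def debug(self, msg, *args, **kwargs):
        self._record('debug', msg, args)

    def info(self, msg, *args, **kwargs):
        self._record('info', msg, args)

    def warning(self, msg, *args, **kwargs):
        self._record('warning', msg, args)

    def error(self, msg, *args, **kwargs):
        self._record('error', msg, args)

    def critical(self, msg, *args, **kwargs):
        self._record('critical', msg, args)

def has_logger_interface(obj):
    """True if obj provides all five logging methods."""
    return all(callable(getattr(obj, name, None)) for name in LOG_METHODS)

log = CollectingLogger()
log.warning('Token %r defined, but not used', 'NAME')
print(log.records)  # [('warning', "Token 'NAME' defined, but not used")]
print(has_logger_interface(log), has_logger_interface(logging.getLogger('ply')))  # True True
```

An instance could then be passed as `errorlog=CollectingLogger()` when calling `lex.lex()` or `yacc.yacc()`.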