# Lexical Analysis
The `ply.lex` module provides lexical analysis capabilities, converting raw text input into a stream of tokens using regular expressions and finite state machines. It supports multiple lexer states, comprehensive error handling, and flexible token rule definitions.
## Capabilities
### Lexer Creation
Creates a lexer instance by analyzing token rules defined in the calling module or specified object. Automatically discovers token rules through naming conventions and validates the lexical specification.
```python { .api }
def lex(*, module=None, object=None, debug=False, reflags=int(re.VERBOSE), debuglog=None, errorlog=None):
    """
    Build a lexer from token rules.

    Parameters:
    - module: Module containing token rules (default: calling module)
    - object: Object containing token rules (alternative to module)
    - debug: Enable debug mode for lexer construction
    - reflags: Regular expression flags (default: re.VERBOSE)
    - debuglog: Logger for debug output
    - errorlog: Logger for error messages

    Returns:
    Lexer instance
    """
```
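
The rule-discovery step can be illustrated with a standard-library sketch. This is not PLY's actual implementation, and the `rules` dict stands in for a module namespace: the idea is simply to collect `t_`-prefixed entries (skipping the special `t_ignore`) and combine them into one master pattern with named groups.

```python
import re

# Hypothetical rule namespace, mimicking a module that defines t_ rules
rules = {
    't_NUMBER': r'\d+',
    't_PLUS': r'\+',
    't_ignore': ' \t',
}

def build_master_pattern(namespace):
    """Sketch of lex()'s discovery step: gather t_-prefixed string rules
    into a single alternation, one named group per token type."""
    parts = []
    for name, pattern in namespace.items():
        if name.startswith('t_') and name != 't_ignore':
            parts.append(f'(?P<{name[2:]}>{pattern})')
    return re.compile('|'.join(parts))

master = build_master_pattern(rules)
m = master.match('42')
# m.lastgroup names the rule that matched; m.group() is the matched text
```

The named-group trick is what lets a single `re.match` call identify both the token's extent and its type in one step.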
### Token Rule Decorator
Decorator for adding regular expression patterns to token rule functions, enabling more complex token processing while maintaining the pattern association.
```python { .api }
def TOKEN(r):
    """
    Decorator to add a regular expression pattern to a token rule function.

    Parameters:
    - r: Regular expression pattern string

    Returns:
    Decorated function with the regex pattern attached
    """
```
Usage example:
```python
@TOKEN(r'\d+')
def t_NUMBER(t):
    t.value = int(t.value)
    return t
```
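
The decorator itself is small; a sketch consistent with the behavior described above simply attaches the pattern string to the function so the lexer builder can find it later:

```python
def TOKEN(r):
    """Sketch of the TOKEN decorator: attach the regex pattern
    to the rule function as a 'regex' attribute."""
    def set_regex(f):
        f.regex = r
        return f
    return set_regex

@TOKEN(r'\d+')
def t_NUMBER(t):
    t.value = int(t.value)
    return t

# t_NUMBER.regex now holds r'\d+'
```

This pattern is useful when the regex is too long or dynamic to fit comfortably in a docstring.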
### Standalone Lexer Execution
Runs a lexer in standalone mode for testing and debugging purposes, reading input from command line arguments or standard input.
```python { .api }
def runmain(lexer=None, data=None):
    """
    Run lexer in standalone mode.

    Parameters:
    - lexer: Lexer instance to run (default: global lexer)
    - data: Input data to tokenize (default: read from the file named on
      the command line, or from standard input)
    """
```
### Lexer Class
The main lexer class that performs tokenization of input strings. Supports stateful tokenization, error recovery, and position tracking.
```python { .api }
class Lexer:
    def input(self, s):
        """
        Set the input string for tokenization.

        Parameters:
        - s: Input string to tokenize
        """

    def token(self):
        """
        Get the next token from input.

        Returns:
        LexToken instance or None if end of input
        """

    def clone(self, object=None):
        """
        Create a copy of the lexer.

        Parameters:
        - object: Object containing token rules (optional)

        Returns:
        New Lexer instance
        """

    def begin(self, state):
        """
        Change lexer to the specified state.

        Parameters:
        - state: State name to enter
        """

    def push_state(self, state):
        """
        Push the current state onto the state stack and enter a new state.

        Parameters:
        - state: State name to enter
        """

    def pop_state(self):
        """
        Pop a state from the stack and make it the current state.
        """

    def current_state(self):
        """
        Get the current lexer state.

        Returns:
        Current state name
        """

    def skip(self, n):
        """
        Skip n characters in the input.

        Parameters:
        - n: Number of characters to skip
        """

    def __iter__(self):
        """
        Iterator interface for tokenization.

        Returns:
        Iterator object (self)
        """

    def __next__(self):
        """
        Get the next token for the iterator interface.

        Returns:
        Next LexToken, or raises StopIteration at end of input
        """

    # Public attributes
    lineno: int  # Current line number
    lexpos: int  # Current position in input string
```
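
Because of the iterator protocol, a lexer can be consumed with a plain `for` loop or `list()`. A minimal stand-in (not PLY's implementation; the `MiniLexer` class and its pattern are invented for illustration) shows the contract that `input()`, `token()`, and `__iter__`/`__next__` imply:

```python
import re

class MiniLexer:
    """Minimal stand-in following the Lexer interface sketched above:
    token() returns the next match or None, and iteration stops at None."""
    pattern = re.compile(r'\s*(\d+|\+)')  # toy grammar: integers and '+'

    def input(self, s):
        self.data = s
        self.lexpos = 0

    def token(self):
        m = self.pattern.match(self.data, self.lexpos)
        if not m:
            return None
        self.lexpos = m.end()
        return m.group(1)

    def __iter__(self):
        return self

    def __next__(self):
        t = self.token()
        if t is None:
            raise StopIteration
        return t

lexer = MiniLexer()
lexer.input('1 + 2')
tokens = list(lexer)   # consumes the input via the iterator protocol
```

Keeping `token()` as the single source of truth and routing `__next__` through it is the design choice that makes explicit and iterator-style consumption interchangeable.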
### Token Representation
Object representing a lexical token with type, value, and position information.
```python { .api }
class LexToken:
    """
    Token object created by the lexer.

    Attributes:
    - type: Token type string
    - value: Token value (original text or processed value)
    - lineno: Line number where the token appears
    - lexpos: Character position in the input string
    """
    type: str
    value: Any
    lineno: int
    lexpos: int
```
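
Hand-built tokens with these four attributes are handy when unit-testing a parser without running a real lexer. A hypothetical dataclass stand-in (the `FakeToken` name and repr format are illustrative, not part of the PLY API):

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class FakeToken:
    """Hand-built token carrying the same four attributes as LexToken."""
    type: str
    value: Any
    lineno: int
    lexpos: int

    def __repr__(self):
        # Compact repr in the style 'LexToken(type,value,lineno,lexpos)'
        return f'LexToken({self.type},{self.value!r},{self.lineno},{self.lexpos})'

tok = FakeToken('NUMBER', 42, 1, 0)
```

Anything that exposes `type`, `value`, `lineno`, and `lexpos` can stand in for a real token downstream.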
### Lexer Error Handling
Exception class for lexical analysis errors and logging utilities.
```python { .api }
class LexError(Exception):
    """Exception raised for lexical analysis errors."""

class PlyLogger:
    """
    Logging utility for PLY operations.
    Provides structured logging for lexer construction and operation.
    """
```
## Token Rule Conventions
### Basic Token Rules
Define tokens using variables or functions with `t_` prefix:
```python
# Keywords remapped by t_ID below
reserved = {'if': 'IF', 'while': 'WHILE'}

# Every token name must be declared in the tokens list
tokens = ['PLUS', 'MINUS', 'NUMBER', 'ID'] + list(reserved.values())

# Simple tokens with literal regexes
t_PLUS = r'\+'
t_MINUS = r'-'
t_ignore = ' \t'  # Characters to ignore

# Token function with processing
def t_NUMBER(t):
    r'\d+'
    t.value = int(t.value)
    return t

def t_ID(t):
    r'[a-zA-Z_][a-zA-Z_0-9]*'
    # Check for reserved words
    t.type = reserved.get(t.value, 'ID')
    return t
```
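
Routing keywords through `t_ID` rather than giving each its own rule avoids a classic pitfall: a rule like `t_IF = r'if'` would also match the first two characters of an identifier such as `ifelse`. The lookup itself is a one-line dict remap; a self-contained sketch (the `reserved` contents here are illustrative):

```python
# Keyword table: identifier text -> token type
reserved = {'if': 'IF', 'while': 'WHILE'}

def classify(word):
    """Remap identifiers that are reserved words, as t_ID does;
    everything else stays a plain ID."""
    return reserved.get(word, 'ID')
```

Since the identifier regex always matches the whole word first, `ifelse` is classified as an `ID` and only an exact `if` becomes an `IF` token.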
### Special Token Functions
Functions conventionally defined for correct line tracking and error handling (`t_error` is effectively mandatory, since illegal characters otherwise raise `LexError`):
```python
def t_newline(t):
    r'\n+'
    t.lexer.lineno += len(t.value)

def t_error(t):
    print(f"Illegal character '{t.value[0]}'")
    t.lexer.skip(1)
```
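
Note that only line numbers are tracked; columns are not. A common recipe derives the column from `lexpos` and the position of the preceding newline in the full input. A self-contained version (the `find_column` helper name is conventional, not part of the module API):

```python
def find_column(text, lexpos):
    """Compute the 1-based column of a token from the complete input
    text and the token's lexpos attribute."""
    line_start = text.rfind('\n', 0, lexpos) + 1
    return (lexpos - line_start) + 1

src = 'a = 1\nbb = 22'
# 'bb' starts at index 6, which is column 1 of line 2
```

Keeping the full input string around is the only requirement; the lexer itself does not need modification.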
### Multiple States
Support for lexer states to handle context-sensitive tokenization:
```python
states = (
    ('comment', 'exclusive'),
)

# Rule in the default (INITIAL) state: '/*' enters the comment state.
# Note the name: a 't_comment_...' prefix would bind the rule to the
# comment state itself, so the entry rule must not start with a state name.
def t_begin_comment(t):
    r'/\*'
    t.lexer.begin('comment')

# Rule active only in the comment state: '*/' returns to INITIAL
def t_comment_end(t):
    r'\*/'
    t.lexer.begin('INITIAL')

# Everything else inside a comment is skipped
def t_comment_error(t):
    t.lexer.skip(1)
```
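
For nested contexts, `push_state`/`pop_state` are preferable to `begin`, since they remember where to return. The discipline is a plain stack over `begin()`; a sketch of the mechanism (not PLY's code, names invented for illustration):

```python
class StateStack:
    """Sketch of the state-stack behavior behind push_state/pop_state."""
    def __init__(self):
        self.state = 'INITIAL'
        self.stack = []

    def begin(self, state):
        self.state = state

    def push_state(self, state):
        # Save the current state, then switch
        self.stack.append(self.state)
        self.begin(state)

    def pop_state(self):
        # Restore whatever state was active before the last push
        self.begin(self.stack.pop())

s = StateStack()
s.push_state('comment')   # INITIAL saved, now in 'comment'
s.pop_state()             # back to INITIAL
```

This is what makes constructs like comments inside strings (or vice versa) manageable without hard-coding the return state in each rule.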
## Error Recovery
The lexer provides multiple mechanisms for handling errors:
1. **t_error() function**: Called when illegal characters are encountered
2. **skip() method**: Skips characters during error recovery
3. **LexError exception**: Raised for critical lexer errors
4. **Logging**: Comprehensive error reporting through PlyLogger
## Global Variables
When `lex()` is called, it sets global variables for convenience:
- `lexer`: Global lexer instance
- `token`: Global token function (reference to `lexer.token()`)
- `input`: Global input function (reference to `lexer.input()`)
These allow for simplified usage patterns while maintaining access to the full Lexer API.
## Constants
```python { .api }
StringTypes = (str, bytes)  # Acceptable string types for PLY
```