# AST Parsing and Code Analysis

The parsing system converts Python source code into structured Definition objects (modules, classes, functions, and methods) and extracts the docstring of each. It provides a lightweight AST-like representation optimized for docstring analysis.

## Capabilities

### Main Parser

The Parser class tokenizes Python source code and constructs a tree of Definition objects representing the code structure.

```python { .api }
class Parser:
    """
    Parser for Python source code into Definition objects.

    Uses Python's tokenize module to parse source code and construct
    a tree of Definition objects (Module, Class, Function, etc.) with
    extracted docstrings and metadata.
    """

    def __call__(self, filelike, filename):
        """
        Parse Python source code from file-like object.

        Parameters:
        - filelike: file-like object, source code to parse
        - filename: str, filename for error reporting

        Returns:
        Module: Root module definition with children
        """

# Global parser instance
parse = Parser()
```

Usage examples:

```python
from pep257 import parse
from io import StringIO

# Parse source code string
source = '''
"""Module docstring."""

class MyClass:
    """Class docstring."""

    def my_method(self):
        """Method docstring."""
        pass

def my_function():
    """Function docstring."""
    return 42
'''

module = parse(StringIO(source), 'example.py')

# Walk all definitions
for definition in module:
    print(f"{definition.kind} {definition.name}: {definition.docstring}")
```

### Token Processing

Token-level processing classes for parsing Python source code.

```python { .api }
class TokenStream:
    """
    Wrapper around tokenize.generate_tokens for parsing.

    Provides a stream interface over Python tokens with current token
    tracking and movement operations.
    """

    def __init__(self, filelike):
        """
        Initialize token stream.

        Parameters:
        - filelike: file-like object, source to tokenize
        """

    @property
    def current(self):
        """Token: Current token in the stream."""

    @property
    def line(self):
        """int: Current line number."""

    def move(self):
        """
        Move to next token in stream.

        Returns:
        Token: Previous token before move
        """

    def __iter__(self):
        """
        Iterate over all tokens in stream.

        Returns:
        generator: Token objects
        """

class Token:
    """
    Represents a tokenized element from Python source.

    Attributes:
    - kind: TokenKind, type of token
    - value: str, token string value
    - start: tuple, (line, column) start position
    - end: tuple, (line, column) end position
    - source: str, source line containing token
    """

    def __init__(self, kind, value, start, end, source):
        """Initialize token with tokenize output."""

class TokenKind(int):
    """
    Token type identifier with string representation.

    Subclass of int that provides readable __repr__ using token names.
    """

    def __repr__(self):
        """str: Human-readable token kind name."""
```
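The stream interface above can be approximated with the standard library alone. The following is a minimal sketch, not pep257's implementation: `SketchToken` and `SketchTokenStream` are hypothetical names, and the only real API relied on is `tokenize.generate_tokens`, which yields 5-tuples of (type, string, start, end, line).

```python
import io
import tokenize
from collections import namedtuple

# Hypothetical stand-in for the documented Token class.
SketchToken = namedtuple("SketchToken", "kind value start end source")


class SketchTokenStream:
    """Illustrative token stream that tracks a current token (an assumption)."""

    def __init__(self, filelike):
        self._generator = tokenize.generate_tokens(filelike.readline)
        self.current = self._next()
        self.line = self.current.start[0]

    def _next(self):
        # Return the next token, or None once the generator is exhausted.
        raw = next(self._generator, None)
        return None if raw is None else SketchToken(*raw)

    def move(self):
        """Advance one token and return the token that was current."""
        previous = self.current
        self.current = self._next()
        if self.current is not None:
            self.line = self.current.start[0]
        return previous

    def __iter__(self):
        # Yield every remaining token, including the final end marker.
        while self.current is not None:
            yield self.move()


stream = SketchTokenStream(io.StringIO("x = 1\n"))
values = [token.value for token in stream]
```

Keeping `current` separate from `move()` gives a parser one token of lookahead: it can inspect the upcoming token before deciding whether to consume it.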

### Definition Hierarchy

Base classes and hierarchy for representing parsed Python code structures.

```python { .api }
class Value:
    """
    Base class for value objects with field-based initialization.

    Provides common functionality for objects with _fields attribute
    defining the object's structure.
    """

    def __init__(self, *args):
        """Initialize object with field values."""

    def __hash__(self):
        """int: Hash based on object representation."""

    def __eq__(self, other):
        """bool: Equality based on field values."""

    def __repr__(self):
        """str: String representation with field values."""

class Definition(Value):
    """
    Base class for all code definitions (modules, classes, functions).

    Represents a parsed code structure with name, source location,
    docstring, and hierarchical relationships.
    """

    _fields = ('name', '_source', 'start', 'end', 'decorators', 'docstring',
               'children', 'parent')

    @property
    def _human(self):
        """str: Human-readable type name."""

    @property
    def kind(self):
        """str: Definition kind (module, class, function, etc.)."""

    @property
    def module(self):
        """Module: Root module containing this definition."""

    @property
    def all(self):
        """list: __all__ list from containing module."""

    @property
    def _slice(self):
        """slice: Source line slice for this definition."""

    @property
    def source(self):
        """str: Source code for this definition."""

    @property
    def _publicity(self):
        """str: 'public' or 'private' based on is_public."""

    def __iter__(self):
        """
        Iterate over this definition and all children.

        Returns:
        generator: This definition followed by all descendants
        """

    def __str__(self):
        """str: Human-readable description of definition location."""
```
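To make the field-based behaviour of `Value` concrete, here is a small sketch of how such a base class could work. It is written from the docstrings above rather than from the library source, so the names `SketchValue` and `SketchDecorator` and every implementation detail are assumptions.

```python
class SketchValue:
    """Illustrative field-based value object (not pep257's code)."""

    _fields = ()

    def __init__(self, *args):
        if len(args) != len(self._fields):
            raise TypeError(
                f"expected {len(self._fields)} arguments, got {len(args)}"
            )
        # Bind each positional argument to the field name at the same index.
        vars(self).update(zip(self._fields, args))

    def __repr__(self):
        pairs = ", ".join(
            f"{name}={getattr(self, name)!r}" for name in self._fields
        )
        return f"{type(self).__name__}({pairs})"

    def __eq__(self, other):
        # Equal when both objects have the same type and the same field values.
        return type(other) is type(self) and vars(self) == vars(other)

    def __hash__(self):
        # Hash the representation, as the documented __hash__ describes.
        return hash(repr(self))


class SketchDecorator(SketchValue):
    _fields = ("name", "arguments")


first = SketchDecorator("property", "")
second = SketchDecorator("property", "")
```

With this pattern a subclass only declares `_fields`; construction, comparison, hashing, and `repr` all follow from that one tuple.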

### Module Definitions

```python { .api }
class Module(Definition):
    """
    Represents a Python module.

    Root definition type that contains classes, functions, and other
    module-level definitions. Tracks __all__ exports and future imports.
    """

    _fields = ('name', '_source', 'start', 'end', 'decorators', 'docstring',
               'children', 'parent', '_all', 'future_imports')

    is_public = True

    @property
    def module(self):
        """Module: Returns self (modules are their own root)."""

    @property
    def all(self):
        """list: __all__ exports list."""

    def _nest(self, statement_type):
        """
        Get nested definition class for statement type.

        Parameters:
        - statement_type: str, 'def' or 'class'

        Returns:
        type: Function or Class definition class
        """

    def __str__(self):
        """str: Module location description."""

class Package(Module):
    """
    A package is a __init__.py module.

    Special module type representing a Python package's __init__.py file.
    """
```

### Function Definitions

```python { .api }
class Function(Definition):
    """
    Represents a function definition.

    Handles both module-level functions and nested functions with
    appropriate publicity determination.
    """

    @property
    def is_public(self):
        """
        bool: True if function is public.

        Functions are public if:
        - Listed in __all__ (if __all__ exists), or
        - Name doesn't start with underscore (if no __all__)
        """

    def _nest(self, statement_type):
        """
        Get nested definition class for statement type.

        Parameters:
        - statement_type: str, 'def' or 'class'

        Returns:
        type: NestedFunction or NestedClass
        """

class NestedFunction(Function):
    """
    Represents a nested function definition.

    Functions defined inside other functions or methods.
    Always considered private.
    """

    is_public = False

class Method(Function):
    """
    Represents a method defined in a class.

    Methods with special publicity rules considering decorators,
    magic methods, and parent class publicity.
    """

    @property
    def is_public(self):
        """
        bool: True if method is public.

        Methods are public if:
        - Parent class is public, AND
        - Name doesn't start with underscore OR is magic method OR
          is variadic magic method (__init__, __call__, __new__), AND
        - Not a property setter/deleter method
        """
```
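The publicity rules in the docstrings above can be restated as two plain functions. This is a sketch under stated assumptions: `function_is_public` and `method_is_public` are hypothetical helpers operating on simple values, decorators are assumed to be available as name strings such as `"value.setter"`, and the "magic OR variadic magic" condition is collapsed into a single check for names that start and end with a double underscore.

```python
def function_is_public(name, dunder_all=None):
    """Documented rule for functions: __all__ wins when it exists."""
    if dunder_all is not None:
        return name in dunder_all
    return not name.startswith("_")


def method_is_public(name, decorator_names, parent_is_public):
    """Documented rule for methods, applied to plain values."""
    # A decorator named "<method>.setter" or "<method>.deleter" marks a
    # property accessor, which is never public.
    if any(
        decorator in (f"{name}.setter", f"{name}.deleter")
        for decorator in decorator_names
    ):
        return False
    # Magic and variadic magic methods both start and end with "__".
    is_dunder = name.startswith("__") and name.endswith("__")
    name_is_public = is_dunder or not name.startswith("_")
    return parent_is_public and name_is_public


checks = {
    "plain function": function_is_public("helper"),
    "excluded by __all__": function_is_public("helper", ["other"]),
    "constructor": method_is_public("__init__", [], True),
    "setter": method_is_public("value", ["value.setter"], True),
    "private parent": method_is_public("run", [], False),
}
```

Note that when `__all__` exists it takes precedence over the underscore convention, so an underscore-prefixed name listed in `__all__` counts as public under the function rule.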

### Class Definitions

```python { .api }
class Class(Definition):
    """
    Represents a class definition.

    Handles both module-level classes and nested classes with
    appropriate nesting behavior.
    """

    @property
    def is_public(self):
        """
        bool: True if class is public.

        Uses same logic as Function.is_public.
        """

    def _nest(self, statement_type):
        """
        Get nested definition class for statement type.

        Parameters:
        - statement_type: str, 'def' or 'class'

        Returns:
        type: Method or NestedClass
        """

class NestedClass(Class):
    """
    Represents a nested class definition.

    Classes defined inside other classes or functions.
    Always considered private.
    """

    is_public = False
```
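Taken together, the `_nest` methods documented for Module, Function, and Class decide which definition type a nested `def` or `class` statement produces. The sketch below tabulates those documented return values and follows a chain of statements from the module inward. The `NESTING` and `BASE` tables and the `classify` helper are hypothetical, and the sketch assumes that NestedFunction, Method, and NestedClass inherit `_nest` unchanged from their base classes.

```python
# Documented results of _nest(statement_type), keyed by the parent's type.
NESTING = {
    "Module": {"def": "Function", "class": "Class"},
    "Function": {"def": "NestedFunction", "class": "NestedClass"},
    "Class": {"def": "Method", "class": "NestedClass"},
}

# Subclasses from the hierarchy above, mapped to the base whose _nest they
# are assumed to inherit.
BASE = {
    "NestedFunction": "Function",
    "Method": "Function",
    "NestedClass": "Class",
}


def classify(statements):
    """Return the definition type reached by nesting statements in order."""
    kind = "Module"
    for statement_type in statements:
        kind = NESTING[BASE.get(kind, kind)][statement_type]
    return kind


method_kind = classify(["class", "def"])
closure_kind = classify(["def", "def"])
inner_method_kind = classify(["class", "class", "def"])
```

Under these assumptions a `def` directly inside any class becomes a Method, while a `def` inside a function or method becomes a NestedFunction.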

### Decorator Representation

```python { .api }
class Decorator(Value):
    """
    A decorator for function, method or class.

    Represents decorator syntax with name and arguments.
    """

    _fields = ('name', 'arguments')

    def __init__(self, name, arguments):
        """
        Initialize decorator.

        Parameters:
        - name: str, decorator name
        - arguments: str, decorator arguments
        """
```

### Parser Implementation Details

```python { .api }
class Parser:
    """Internal parsing methods and state management."""

    @property
    def current(self):
        """Token: Current token from stream."""

    @property
    def line(self):
        """int: Current line number."""

    def consume(self, kind):
        """
        Consume token of expected kind.

        Parameters:
        - kind: TokenKind, expected token type
        """

    def leapfrog(self, kind, value=None):
        """
        Skip tokens until reaching specified kind/value.

        Parameters:
        - kind: TokenKind, token type to find
        - value: str, optional token value to match
        """

    def parse_docstring(self):
        """
        Parse a single docstring and return its value.

        Returns:
        str or None: Docstring value if found
        """

    def parse_decorators(self):
        """
        Parse decorators into accumulated decorators list.

        Called after first @ is found. Continues until 'def' or 'class'.
        """

    def parse_definitions(self, class_, all=False):
        """
        Parse multiple definitions and yield them.

        Parameters:
        - class_: type, definition class for context
        - all: bool, whether to parse __all__ statements

        Returns:
        generator: Definition objects
        """

    def parse_all(self):
        """
        Parse the __all__ definition in a module.

        Evaluates __all__ content and stores in self.all.

        Raises:
        AllError: If __all__ cannot be parsed
        """

    def parse_module(self):
        """
        Parse a module (and its children) and return Module object.

        Returns:
        Module or Package: Parsed module definition
        """

    def parse_definition(self, class_):
        """
        Parse a definition and return its value in class_ object.

        Parameters:
        - class_: type, definition class to create

        Returns:
        Definition: Parsed definition of specified type
        """

    def parse_from_import_statement(self):
        """
        Parse a 'from x import y' statement.

        Specifically looks for __future__ imports and tracks them
        in self.future_imports.
        """
```
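As a rough illustration of the token-driven style described for `parse_from_import_statement`, the sketch below collects `__future__` imports using only the standard `tokenize` module. The function `future_imports` is a hypothetical helper, not part of the documented API, and it deliberately ignores edge cases such as `as` aliases.

```python
import io
import tokenize


def future_imports(source):
    """Collect the names imported from __future__ in a source string."""
    names = set()
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    for token in tokens:
        if token.type != tokenize.NAME or token.string != "from":
            continue
        # The token right after "from" is the module name for absolute imports.
        module = next(tokens, None)
        if module is None or module.string != "__future__":
            continue
        # Read names up to the end of the logical line, skipping the
        # "import" keyword, commas, and parentheses.
        for token in tokens:
            if token.type == tokenize.NEWLINE:
                break
            if token.type == tokenize.NAME and token.string != "import":
                names.add(token.string)
    return names


example = (
    "from __future__ import annotations, division\n"
    "import os\n"
    "from os import path\n"
)
found = future_imports(example)
```

Both loops share one iterator, so the outer scan resumes exactly where the inner loop stopped, which is similar in spirit to skipping ahead in the token stream with `leapfrog`.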

### Utility Functions

```python { .api }
def humanize(string):
    """
    Convert CamelCase to human readable format.

    Parameters:
    - string: str, CamelCase string

    Returns:
    str: Human-readable string with spaces
    """

def is_magic(name):
    """
    Check if name is a magic method.

    Parameters:
    - name: str, method name

    Returns:
    bool: True if magic method (starts/ends with __ but not variadic)
    """

def is_ascii(string):
    """
    Check if string contains only ASCII characters.

    Parameters:
    - string: str, string to check

    Returns:
    bool: True if all characters are ASCII
    """

def is_blank(string):
    """
    Check if string is blank/whitespace only.

    Parameters:
    - string: str, string to check

    Returns:
    bool: True if string is empty or whitespace
    """

def leading_space(string):
    """
    Extract leading whitespace from string.

    Parameters:
    - string: str, string to analyze

    Returns:
    str: Leading whitespace characters
    """
```
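The helpers above are small enough to sketch in full. The bodies below are plausible implementations inferred from the docstrings, not copied from the library, so treat each one as an assumption about behaviour. In particular, the lowercase output of `humanize` is a choice made for this sketch.

```python
import re

VARIADIC_MAGIC_METHODS = ("__init__", "__call__", "__new__")


def humanize(string):
    # Put a space before each run of capitals that follows another
    # character, then lowercase the result.
    return re.sub(r"(.)([A-Z]+)", r"\1 \2", string).lower()


def is_magic(name):
    # A double-underscore name that is not one of the variadic magic methods.
    return (
        name.startswith("__")
        and name.endswith("__")
        and name not in VARIADIC_MAGIC_METHODS
    )


def is_ascii(string):
    # Every character must fall in the 7-bit ASCII range.
    return all(ord(char) < 128 for char in string)


def is_blank(string):
    # Empty strings and whitespace-only strings both strip to "".
    return not string.strip()


def leading_space(string):
    # r"\s*" always matches, possibly with an empty result.
    return re.match(r"\s*", string).group()


human_name = humanize("NestedFunction")
indent = leading_space("    return 42")
```

Here `human_name` is `"nested function"` and `indent` is the four-space prefix of the input line.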

### Constants

```python { .api }
VARIADIC_MAGIC_METHODS = ('__init__', '__call__', '__new__')
"""tuple: Magic methods that can have variable arguments."""
```

### Exception Classes

```python { .api }
class AllError(Exception):
    """
    Exception for __all__ parsing errors.

    Raised when __all__ variable cannot be parsed or evaluated.
    Provides detailed error message with guidance.
    """

    def __init__(self, message):
        """
        Initialize with error message.

        Parameters:
        - message: str, base error message
        """
```