Annotate AST trees with source code positions
npx @tessl/cli install tessl/pypi-asttokens@3.0.00
# ASTTokens
A Python library that annotates Abstract Syntax Trees (ASTs) with the positions of tokens and text in the source code that generated them. ASTTokens enables tools that work with logical AST nodes to find the particular text that resulted in those nodes, making it essential for automated refactoring, syntax highlighting, and code analysis tools.
## Package Information
- **Package Name**: asttokens
- **Language**: Python
- **Installation**: `pip install asttokens`
## Core Imports
```python
import asttokens
```
For direct access to main classes:
```python
from asttokens import ASTTokens, ASTText, LineNumbers, supports_tokenless
```
For utility functions:
```python
import asttokens.util
# or
from asttokens.util import walk, visit_tree, is_expr, match_token, token_repr
```
## Basic Usage
```python
import ast

import asttokens

# Basic usage: parse and annotate source code
source = "Robot('blue').walk(steps=10*n)"
atok = asttokens.ASTTokens(source, parse=True)

# Find a specific AST node and get its source text
attr_node = next(n for n in ast.walk(atok.tree) if isinstance(n, ast.Attribute))
print(atok.get_text(attr_node))  # Output: Robot('blue').walk

# Use the character offsets of the node's last token to replace the attribute name
start, end = attr_node.last_token.startpos, attr_node.last_token.endpos
print(atok.text[:start] + 'RUN' + atok.text[end:])  # Output: Robot('blue').RUN(steps=10*n)

# Performance-optimized usage where tokenless mode is supported
if asttokens.supports_tokenless():
    astext = asttokens.ASTText(source, tree=ast.parse(source))
    # Look the node up in the tree that ASTText owns, not in atok.tree
    attr_node = next(n for n in ast.walk(astext.tree) if isinstance(n, ast.Attribute))
    text = astext.get_text(attr_node)  # Faster for supported nodes
```
## Architecture
ASTTokens provides a layered architecture for AST-to-source mapping:
- **ASTTokens**: Full-featured class that tokenizes source code and marks AST nodes with `.first_token` and `.last_token` attributes
- **ASTText**: Performance-optimized alternative that uses AST position information when available, falling back to tokenization
- **LineNumbers**: Utility for converting between character offsets and line/column positions
- **Token**: Enhanced token representation with both line/column and character offset positions

The library supports both standard Python `ast` module trees and `astroid` library trees, making it compatible with various Python static analysis tools.
## Capabilities
### Core AST Processing
Main classes for annotating AST trees with source code positions and extracting text from AST nodes. These provide the primary functionality for mapping between AST structures and their corresponding source code.
```python { .api }
class ASTTokens:
    def __init__(self, source_text, parse=False, tree=None, filename='<unknown>', tokens=None): ...
    def get_text(self, node, padded=True) -> str: ...
    def get_text_range(self, node, padded=True) -> tuple[int, int]: ...
    def mark_tokens(self, root_node): ...

class ASTText:
    def __init__(self, source_text, tree=None, filename='<unknown>'): ...
    def get_text(self, node, padded=True) -> str: ...
    def get_text_range(self, node, padded=True) -> tuple[int, int]: ...
```
[Core AST Processing](./ast-processing.md)
### Token Navigation
Functions and methods for navigating and searching through tokenized source code, finding specific tokens by position, type, or content.
```python { .api }
class ASTTokens:
    def get_token_from_offset(self, offset) -> Token: ...
    def get_token(self, lineno, col_offset) -> Token: ...
    def next_token(self, tok, include_extra=False) -> Token: ...
    def prev_token(self, tok, include_extra=False) -> Token: ...
    def find_token(self, start_token, tok_type, tok_str=None, reverse=False) -> Token: ...
```
[Token Navigation](./token-navigation.md)
### Position Utilities
Utilities for converting between different position representations (line/column vs character offsets) and working with source code positions.
```python { .api }
class LineNumbers:
    def __init__(self, text): ...
    def line_to_offset(self, line, column) -> int: ...
    def offset_to_line(self, offset) -> tuple[int, int]: ...
    def from_utf8_col(self, line, utf8_column) -> int: ...

def supports_tokenless(node=None) -> bool: ...
```
[Position Utilities](./position-utilities.md)
### AST Node Utilities
Helper functions for working with AST nodes, including type checking, tree traversal, and node classification utilities.
```python { .api }
def walk(node, include_joined_str=False): ...
def visit_tree(node, previsit, postvisit): ...
def is_expr(node) -> bool: ...
def is_stmt(node) -> bool: ...
def is_module(node) -> bool: ...
```
[AST Node Utilities](./ast-utilities.md)
### Utility Module Access
The `asttokens.util` module provides additional utility functions for advanced use cases including token manipulation, tree traversal, and node type checking. These functions offer fine-grained control over AST processing beyond the main classes.
```python { .api }
import asttokens.util

# Module contains various utility functions accessible as:
# asttokens.util.walk()
# asttokens.util.match_token()
# asttokens.util.is_expr()
# ... and many others documented in sub-docs
```
## Types
```python { .api }
from typing import Tuple, List, Iterator, Optional, Any

class Token:
    """Enhanced token representation with position information."""
    type: int                # Token type from token module
    string: str              # Token text content
    start: Tuple[int, int]   # Starting (row, column) position
    end: Tuple[int, int]     # Ending (row, column) position
    line: str                # Original line text
    index: int               # Token index in token list
    startpos: int            # Starting character offset
    endpos: int              # Ending character offset

    def __str__(self) -> str: ...

# Type aliases for AST nodes with token attributes
AstNode = Any      # Union of ast.AST and astroid nodes with .first_token/.last_token
EnhancedAST = Any  # AST with added token attributes
```