Annotate AST trees with source code positions
A Python library that annotates Abstract Syntax Trees (ASTs) with the positions of tokens and text in the source code that generated them. ASTTokens lets tools that work with logical AST nodes find the particular text that produced those nodes, which is useful for automated refactoring, syntax highlighting, and code analysis tools.
pip install asttokens
import asttokens
For direct access to main classes:
from asttokens import ASTTokens, ASTText, LineNumbers, supports_tokenless
For utility functions:
import asttokens.util
# or
from asttokens.util import walk, visit_tree, is_expr, match_token, token_repr
A complete example:
import asttokens
import asttokens.util
import ast
# Basic usage - parse and annotate source code
source = "Robot('blue').walk(steps=10*n)"
atok = asttokens.ASTTokens(source, parse=True)
# Find a specific AST node and get its source text
attr_node = next(n for n in ast.walk(atok.tree) if isinstance(n, ast.Attribute))
print(atok.get_text(attr_node)) # Output: Robot('blue').walk
# Use the character offsets of the node's last token ('walk') to replace it in the source
start, end = attr_node.last_token.startpos, attr_node.last_token.endpos
print(atok.text[:start] + 'RUN' + atok.text[end:]) # Output: Robot('blue').RUN(steps=10*n)
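# Performance-optimized usage on supported Python versions: ASTText reads positions
# from the AST itself instead of tokenizing the source
if asttokens.supports_tokenless():
    tree = ast.parse(source)
    astext = asttokens.ASTText(source, tree=tree)
    # Look the node up in the tree that ASTText was given
    attr_node = next(n for n in ast.walk(tree) if isinstance(n, ast.Attribute))
    text = astext.get_text(attr_node)  # Robot('blue').walk
ASTTokens provides a layered architecture for AST-to-source mapping. Marking a tree gives each node .first_token and .last_token attributes that point at the tokens where its source text begins and ends; ASTText can often skip tokenizing entirely and compute the same text from the node's own position attributes. The library supports both standard Python ast module trees and astroid library trees, making it compatible with static analysis tools built on either.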
Main classes for annotating AST trees with source code positions and extracting text from AST nodes. These provide the primary functionality for mapping between AST structures and their corresponding source code.
class ASTTokens:
    def __init__(self, source_text, parse=False, tree=None, filename='<unknown>', tokens=None): ...
    def get_text(self, node, padded=True) -> str: ...
    def get_text_range(self, node, padded=True) -> tuple[int, int]: ...
    def mark_tokens(self, root_node): ...
class ASTText:
    def __init__(self, source_text, tree=None, filename='<unknown>'): ...
    def get_text(self, node, padded=True) -> str: ...
    def get_text_range(self, node, padded=True) -> tuple[int, int]: ...
Functions and methods for navigating and searching through tokenized source code, finding specific tokens by position, type, or content.
class ASTTokens:
    def get_token_from_offset(self, offset) -> Token: ...
    def get_token(self, lineno, col_offset) -> Token: ...
    def next_token(self, tok, include_extra=False) -> Token: ...
    def prev_token(self, tok, include_extra=False) -> Token: ...
    def find_token(self, start_token, tok_type, tok_str=None, reverse=False) -> Token: ...
Utilities for converting between different position representations (line/column vs character offsets) and working with source code positions.
class LineNumbers:
    def __init__(self, text): ...
    def line_to_offset(self, line, column) -> int: ...
    def offset_to_line(self, offset) -> tuple[int, int]: ...
    def from_utf8_col(self, line, utf8_column) -> int: ...
def supports_tokenless(node=None) -> bool: ...
Helper functions for working with AST nodes, including type checking, tree traversal, and node classification utilities.
def walk(node, include_joined_str=False): ...
def visit_tree(node, previsit, postvisit): ...
def is_expr(node) -> bool: ...
def is_stmt(node) -> bool: ...
def is_module(node) -> bool: ...
The asttokens.util module provides additional utility functions for advanced use cases including token manipulation, tree traversal, and node type checking. These functions offer fine-grained control over AST processing beyond the main classes.
import asttokens.util
# Module contains various utility functions accessible as:
# asttokens.util.walk()
# asttokens.util.match_token()
# asttokens.util.is_expr()
# ... and others
from typing import Any, Tuple
class Token:
    """Enhanced token representation with position information."""
    type: int               # Token type from the token module
    string: str             # Token text content
    start: Tuple[int, int]  # Starting (row, column) position
    end: Tuple[int, int]    # Ending (row, column) position
    line: str               # Original line text
    index: int              # Token index in the token list
    startpos: int           # Starting character offset
    endpos: int             # Ending character offset
    def __str__(self) -> str: ...
# Type aliases for AST nodes with token attributes
AstNode = Any # Union of ast.AST and astroid nodes with .first_token/.last_token
EnhancedAST = Any  # AST with added token attributes