# Tokenize RT

A wrapper around the stdlib `tokenize` module that roundtrips. The tokenize-rt package provides lossless tokenization by introducing two additional token types, `ESCAPED_NL` and `UNIMPORTANT_WS`, which capture the formatting that `tokenize` normally discards. This makes it possible to build refactoring tools that modify Python source while preserving whitespace, comments, and layout exactly.

## Package Information

- **Package Name**: tokenize-rt
- **Language**: Python
- **Installation**: `pip install tokenize-rt`
## Core Imports

```python
import tokenize_rt
```

The most common imports for working with tokens:

```python
from tokenize_rt import src_to_tokens, tokens_to_src, Token
```

For additional utilities:

```python
from tokenize_rt import (
    ESCAPED_NL, UNIMPORTANT_WS, NON_CODING_TOKENS, NAMED_UNICODE_RE,
    Offset, reversed_enumerate, parse_string_literal,
    rfind_string_parts, curly_escape, _re_partition
)
```
## Basic Usage

```python
from tokenize_rt import src_to_tokens, tokens_to_src, Token

# Convert source code to tokens
source = '''
def hello():
    print("Hello, world!")
'''

# Tokenize with perfect roundtrip capability
tokens = src_to_tokens(source)

# Each token has name, src, line, and utf8_byte_offset
for token in tokens:
    if token.name not in {'UNIMPORTANT_WS', 'ESCAPED_NL'}:
        print(f'{token.name}: {token.src!r}')

# Convert back to source (perfect roundtrip)
reconstructed = tokens_to_src(tokens)
assert source == reconstructed

# Working with specific tokens
name_tokens = [t for t in tokens if t.name == 'NAME']
print(f"Found {len(name_tokens)} NAME tokens")

# Using token matching
for token in tokens:
    if token.matches(name='NAME', src='hello'):
        print(f"Found 'hello' at line {token.line}, offset {token.utf8_byte_offset}")
```
## Capabilities

### Core Tokenization

Convert between Python source code and token representations with perfect roundtrip capability, preserving all formatting including whitespace and escaped newlines.

```python { .api }
def src_to_tokens(src: str) -> list[Token]:
    """
    Convert a Python source code string to a list of tokens.

    Args:
        src (str): Python source code to tokenize

    Returns:
        list[Token]: List of Token objects representing the source
    """

def tokens_to_src(tokens: Iterable[Token]) -> str:
    """
    Convert an iterable of tokens back to a source code string.

    Args:
        tokens (Iterable[Token]): Tokens to convert back to source

    Returns:
        str: Reconstructed source code
    """
```
### Token Data Structures

Data structures for representing tokens and their positions within source code.

```python { .api }
class Offset(NamedTuple):
    """
    Represents a token offset with line and byte position information.
    """
    line: int | None = None
    utf8_byte_offset: int | None = None

class Token(NamedTuple):
    """
    Represents a tokenized element with position information.
    """
    name: str                            # Token type name (from token.tok_name or custom types)
    src: str                             # Source text of the token
    line: int | None = None              # Line number where the token appears
    utf8_byte_offset: int | None = None  # UTF-8 byte offset within the line

    @property
    def offset(self) -> Offset:
        """Return an Offset object for this token."""

    def matches(self, *, name: str, src: str) -> bool:
        """
        Check whether the token matches the given name and source.

        Args:
            name (str): Token name to match
            src (str): Token source to match

        Returns:
            bool: True if both name and src match
        """
```
### Token Navigation Utilities

Helper functions for working with token sequences, particularly useful for code refactoring and analysis tools.

```python { .api }
def reversed_enumerate(tokens: Sequence[Token]) -> Generator[tuple[int, Token]]:
    """
    Yield (index, token) pairs in reverse order.

    Args:
        tokens (Sequence[Token]): Token sequence to enumerate in reverse

    Yields:
        tuple[int, Token]: (index, token) pairs in reverse order
    """

def rfind_string_parts(tokens: Sequence[Token], i: int) -> tuple[int, ...]:
    """
    Find the indices of the string parts of a (joined) string literal.

    Args:
        tokens (Sequence[Token]): Token sequence to search
        i (int): Starting index (should be at the end of the string literal)

    Returns:
        tuple[int, ...]: Indices of the string parts, or an empty tuple if
        the token at i is not part of a string literal
    """
```
### String Literal Processing

Functions for parsing and processing Python string literals, including prefix extraction and escaping utilities.

```python { .api }
def parse_string_literal(src: str) -> tuple[str, str]:
    """
    Parse a string literal's source into (prefix, string) components.

    Args:
        src (str): String literal source code

    Returns:
        tuple[str, str]: (prefix, string) pair

    Example:
        >>> parse_string_literal('f"foo"')
        ('f', '"foo"')
    """

def curly_escape(s: str) -> str:
    """
    Escape curly braces in a string while preserving named unicode escapes.

    Args:
        s (str): String to escape

    Returns:
        str: String with curly braces doubled, except inside named
        unicode escapes
    """
```
### Token Constants

Pre-defined constants for token classification and filtering.

```python { .api }
# Type imports (for reference in signatures)
from re import Pattern

ESCAPED_NL: str
"""Constant for the escaped-newline token type."""

UNIMPORTANT_WS: str
"""Constant for the unimportant-whitespace token type."""

NON_CODING_TOKENS: frozenset[str]
"""
Set of token names that don't affect control flow or code:
{'COMMENT', ESCAPED_NL, 'NL', UNIMPORTANT_WS}
"""

NAMED_UNICODE_RE: Pattern[str]
"""Regular expression pattern for matching named unicode escapes."""
```
### Internal Utilities

Internal helper functions that are exposed and may be useful for advanced use cases.

```python { .api }
def _re_partition(regex: Pattern[str], s: str) -> tuple[str, str, str]:
    """
    Partition a string around the first regex match (internal helper).

    Args:
        regex (Pattern[str]): Compiled regular expression pattern
        s (str): String to partition

    Returns:
        tuple[str, str, str]: (before_match, match, after_match),
        or (s, '', '') if there is no match
    """
```
### Command Line Interface

Command-line tool for tokenizing Python files and inspecting token sequences.

```python { .api }
def main(argv: Sequence[str] | None = None) -> int:
    """
    Command-line interface that tokenizes a file and prints each token
    with its position.

    Args:
        argv (Sequence[str] | None): Command line arguments, or None for sys.argv

    Returns:
        int: Exit code (0 for success)
    """
```
## Advanced Usage Examples

### Token Filtering and Analysis

```python
from tokenize_rt import src_to_tokens, NON_CODING_TOKENS

source = '''
# This is a comment
def func():  # Another comment
    pass
'''

tokens = src_to_tokens(source)

# Filter out non-coding tokens
code_tokens = [t for t in tokens if t.name not in NON_CODING_TOKENS]
print("Code-only tokens:", [t.src for t in code_tokens])

# Find all comments
comments = [t for t in tokens if t.name == 'COMMENT']
print("Comments found:", [t.src for t in comments])
```
### String Literal Processing

```python
from tokenize_rt import src_to_tokens, parse_string_literal, rfind_string_parts

# Parse string prefixes
prefix, string_part = parse_string_literal('f"Hello {name}!"')
print(f"Prefix: {prefix!r}, String: {string_part!r}")

# Find string parts in concatenated strings
source = '"first" "second" "third"'
tokens = src_to_tokens(source)

# rfind_string_parts must start at the end of the string literal,
# i.e. at the last STRING token (not at the trailing NEWLINE/ENDMARKER)
last_string = max(i for i, t in enumerate(tokens) if t.name == 'STRING')
string_indices = rfind_string_parts(tokens, last_string)
print("String part indices:", string_indices)
```
### Token Modification for Refactoring

```python
from tokenize_rt import src_to_tokens, tokens_to_src, Token

source = 'old_name = 42'
tokens = src_to_tokens(source)

# Replace 'old_name' with 'new_name'
modified_tokens = []
for token in tokens:
    if token.matches(name='NAME', src='old_name'):
        # Create a new token with the same position but different source
        modified_tokens.append(Token(
            name=token.name,
            src='new_name',
            line=token.line,
            utf8_byte_offset=token.utf8_byte_offset
        ))
    else:
        modified_tokens.append(token)

result = tokens_to_src(modified_tokens)
print(result)  # new_name = 42
```