# Core Transformation Engine

Core functionality for applying syntax transformations through the plugin and token systems. The engine operates in two phases: plugin-based AST transformations, followed by token-level fixes.

## Capabilities

### Plugin-Based Transformations

Apply all registered plugin transformations to source code through AST analysis.

```python { .api }
def _fix_plugins(contents_text: str, settings: Settings) -> str:
    """
    Apply all plugin-based AST transformations to source code.

    Args:
        contents_text: Python source code to transform
        settings: Configuration settings for transformations

    Returns:
        Transformed source code with plugin fixes applied

    Notes:
        - Returns the original code unchanged if it contains a syntax error
        - Applies token fixup for DEDENT/UNIMPORTANT_WS ordering
        - Processes callbacks in reverse token order so earlier offsets stay valid
    """
```
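
The reverse-order note is easiest to see on plain string offsets. The following is a minimal sketch, not the engine's actual code: `apply_edits` and its `{start: (end, replacement)}` edit format are hypothetical names invented for illustration, and it works on character offsets rather than tokens.

```python
import ast


def apply_edits(source: str, edits: dict[int, tuple[int, str]]) -> str:
    """Apply {start: (end, replacement)} edits to source, last edit first."""
    try:
        ast.parse(source)
    except SyntaxError:
        # Mirrors the documented behavior: unparsable input is returned as-is.
        return source

    # Working from the end of the file means each replacement only shifts
    # text located after the offsets that are still waiting to be processed.
    for start in sorted(edits, reverse=True):
        end, replacement = edits[start]
        source = source[:start] + replacement + source[end:]
    return source


source = "a = set([1])\nb = set([2])\n"
first = source.index("set([1])")
second = source.index("set([2])")
edits = {
    first: (first + len("set([1])"), "{1}"),
    second: (second + len("set([2])"), "{2}"),
}
result = apply_edits(source, edits)
```

Applying the same edits front to back would require re-computing every later offset after each replacement, which is the bookkeeping that reverse iteration avoids.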

### Token-Level Transformations

Apply token-level transformations for string literals, parentheses, and format strings.

```python { .api }
def _fix_tokens(contents_text: str) -> str:
    """
    Apply token-level transformations to source code.

    Args:
        contents_text: Python source code to transform

    Returns:
        Transformed source code with token fixes applied

    Transformations:
        - Fix escape sequences in string literals
        - Remove 'u' prefix from Unicode strings
        - Remove extraneous parentheses
        - Simplify format string literals
        - Convert string.encode() to binary literals
        - Remove encoding cookies from file headers
    """
```
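
Of the listed transformations, encoding-cookie removal is the only one not covered by a dedicated function in this document. The following is a line-based sketch of the idea under stated assumptions: `strip_encoding_cookie` and `COOKIE_RE` are hypothetical names, the regex follows the PEP 263 pattern, and the real engine works on tokens rather than lines.

```python
import re

# PEP 263 style pattern for a "coding" declaration comment (assumed here).
COOKIE_RE = re.compile(r'^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)')


def strip_encoding_cookie(source: str) -> str:
    """Drop an encoding cookie if it appears on line 1 or line 2."""
    lines = source.splitlines(keepends=True)
    for idx in range(min(2, len(lines))):
        if COOKIE_RE.match(lines[idx]):
            del lines[idx]
            break
    return ''.join(lines)


with_cookie = "# -*- coding: utf-8 -*-\nx = 1\n"
with_shebang = "#!/usr/bin/env python\n# coding: latin-1\nx = 1\n"
```

Only the first two lines are inspected because Python itself ignores encoding declarations that appear later in the file.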

### Utility Functions

Core utility functions used throughout the transformation process.

```python { .api }
def inty(s: str) -> bool:
    """
    Check if string represents an integer.

    Args:
        s: String to check

    Returns:
        True if string can be converted to int, False otherwise

    Notes:
        Uses try/except to handle ValueError and TypeError gracefully
    """
```
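
A body consistent with the notes above could look like the following sketch. It is named `inty_sketch` to make clear that it is an illustration written from the docstring, not a copy of the library source.

```python
def inty_sketch(s) -> bool:
    """Return True if int(s) succeeds, False otherwise."""
    try:
        int(s)
        return True
    except (ValueError, TypeError):
        # ValueError: non-numeric text; TypeError: non-string input such as None.
        return False
```

Catching `TypeError` as well as `ValueError` lets callers pass values such as `None` without a separate guard.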

### Configuration Settings

Configuration object controlling transformation behavior.

```python { .api }
class Settings(NamedTuple):
    """
    Configuration settings for pyupgrade transformations.

    Attributes:
        min_version: Minimum Python version tuple (e.g., (3, 10))
        keep_percent_format: Preserve %-style format strings
        keep_mock: Preserve mock imports instead of rewriting them to unittest.mock
        keep_runtime_typing: Preserve typing annotation forms that may be
            evaluated at runtime (skips PEP 585/604 rewrites)
    """
    min_version: Version = (3,)
    keep_percent_format: bool = False
    keep_mock: bool = False
    keep_runtime_typing: bool = False
```
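
Because `Settings` is a `NamedTuple`, instances are immutable and variants are derived with `_replace`. The sketch below uses a stand-in class, `SettingsSketch`, with the same fields so that it runs without pyupgrade installed. The `Version` alias and the `targets_at_least` helper are assumptions made for illustration.

```python
from typing import NamedTuple, Tuple

Version = Tuple[int, ...]  # assumed shape of the Version alias


class SettingsSketch(NamedTuple):
    """Stand-in mirroring the documented fields and defaults."""
    min_version: Version = (3,)
    keep_percent_format: bool = False
    keep_mock: bool = False
    keep_runtime_typing: bool = False


def targets_at_least(settings: SettingsSketch, version: Version) -> bool:
    """Hypothetical helper: tuples compare element by element."""
    return settings.min_version >= version


defaults = SettingsSketch()
py310 = defaults._replace(min_version=(3, 10))
```

Tuple comparison is what makes a version gate such as `min_version >= (3, 9)` work: `(3,)` sorts before `(3, 9)`, while `(3, 10)` sorts after it.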

### Token Ordering Fix

Fix misordered DEDENT and UNIMPORTANT_WS tokens from tokenize-rt.

```python { .api }
def _fixup_dedent_tokens(tokens: list[Token]) -> None:
    """
    Fix misordered DEDENT/UNIMPORTANT_WS tokens.

    Args:
        tokens: Token list to fix in-place

    Notes:
        Addresses a tokenize-rt issue where DEDENT and UNIMPORTANT_WS
        tokens appear in the wrong order in certain indentation patterns.
    """
```
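
The in-place fix can be pictured as a swap of adjacent tokens. This sketch assumes the problematic pattern is an `UNIMPORTANT_WS` token immediately followed by a `DEDENT` token. `Tok` is a hypothetical stand-in for the tokenize-rt `Token` type, reduced to the two fields the sketch needs.

```python
from typing import NamedTuple


class Tok(NamedTuple):
    """Minimal stand-in for a token: a type name and its source text."""
    name: str
    src: str


def fixup_dedent(tokens: list[Tok]) -> None:
    """Swap each UNIMPORTANT_WS/DEDENT pair so DEDENT comes first."""
    for i in range(len(tokens) - 1):
        if tokens[i].name == 'UNIMPORTANT_WS' and tokens[i + 1].name == 'DEDENT':
            tokens[i], tokens[i + 1] = tokens[i + 1], tokens[i]


tokens = [
    Tok('NAME', 'pass'),
    Tok('NEWLINE', '\n'),
    Tok('UNIMPORTANT_WS', '    '),
    Tok('DEDENT', ''),
    Tok('NAME', 'pass'),
]
before = ''.join(t.src for t in tokens)
fixup_dedent(tokens)
after = ''.join(t.src for t in tokens)
```

Since a `DEDENT` token carries no source text, the swap changes token order without changing the reconstructed source.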

## String Literal Processing

### Escape Sequence Constants

Constants used for validating and processing escape sequences in string literals.

```python { .api }
ESCAPE_STARTS: frozenset[str]
"""
Valid escape sequence starting characters.

Contains:
- Newline characters: '\n', '\r'
- Backslash and quote characters: '\\', "'", '"'
- Single-character escapes: 'a', 'b', 'f', 'n', 'r', 't', 'v'
- Octal digits: '0'-'7'
- Hex escape: 'x'
"""

ESCAPE_RE: re.Pattern[str]
"""Regex pattern for matching escape sequences ('\\.', DOTALL)."""

NAMED_ESCAPE_NAME: re.Pattern[str]
"""Regex pattern for matching named Unicode escapes ('{[^}]+}')."""
```
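
To show how the two patterns behave, here is a short sketch that rebuilds them locally from the descriptions above. The lowercase names `escape_re` and `named_escape_name` are local stand-ins, and the exact pattern strings are an assumption based on the docstrings.

```python
import re

escape_re = re.compile(r'\\.', re.DOTALL)
named_escape_name = re.compile(r'\{[^}]+\}')

# Body of a string literal as it appears in source (raw, so backslashes are kept).
body = r'\d+\n\N{BULLET}'
escapes = [m.group() for m in escape_re.finditer(body)]

# DOTALL lets '.' match a newline, so a backslash-newline continuation is found.
continuation = escape_re.findall('one\\\ntwo')

# A named escape is only well formed if '{NAME}' directly follows '\N'.
last = list(escape_re.finditer(body))[-1]
has_name = bool(named_escape_name.match(body, last.end()))
```

Each match is a backslash plus exactly one following character, so the character at index 1 of the match is what gets checked against `ESCAPE_STARTS`.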

### Escape Sequence Fixes

Fix invalid escape sequences in string literals.

```python { .api }
def _fix_escape_sequences(token: Token) -> Token:
    """
    Fix invalid escape sequences in string token.

    Args:
        token: String token to process

    Returns:
        Token with fixed escape sequences

    Logic:
        - Skips raw strings and strings without backslashes
        - Validates escape sequences against Python standards
        - Adds raw prefix if only invalid escapes found
        - Escapes invalid sequences if valid ones also present
    """
```
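
The four rules above can be sketched on a literal that has already been split into prefix and body. This is a simplified illustration, not the library implementation: `fix_escapes` and `VALID` are hypothetical names, and the sketch treats `\u`, `\U`, and `\N` as valid for every literal, ignoring the distinction between text and bytes literals.

```python
import re

ESCAPE = re.compile(r'\\.', re.DOTALL)
# Simplification: \u, \U and \N are accepted for all literals in this sketch.
VALID = frozenset('\n\r\\\'"abfnrtv01234567xuUN')


def fix_escapes(prefix: str, body: str) -> str:
    """Return the literal with invalid escapes made explicit."""
    if 'r' in prefix.lower() or '\\' not in body:
        return prefix + body  # raw string or nothing to inspect

    def is_valid(match: 're.Match[str]') -> bool:
        return match.group()[1] in VALID

    matches = list(ESCAPE.finditer(body))
    has_valid = any(is_valid(m) for m in matches)
    has_invalid = any(not is_valid(m) for m in matches)

    if not has_invalid:
        return prefix + body
    if has_valid:
        # Mixed case: double the backslash of each invalid escape only.
        return prefix + ESCAPE.sub(
            lambda m: m.group() if is_valid(m) else '\\' + m.group(), body,
        )
    # Only invalid escapes: a raw prefix preserves the meaning with less noise.
    return prefix + 'r' + body
```

The raw prefix cannot be used in the mixed case because it would also disable the valid escapes, which is why the two branches differ.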

### Unicode Prefix Removal

Remove unnecessary 'u' prefixes from Unicode string literals.

```python { .api }
def _remove_u_prefix(token: Token) -> Token:
    """
    Remove 'u' prefix from Unicode string literals.

    Args:
        token: String token to process

    Returns:
        Token with 'u'/'U' prefixes removed
    """
```
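
On the source text of a single literal, the transformation amounts to editing the prefix and leaving the quoted part untouched. The sketch below operates on plain strings instead of tokens. `remove_u_prefix` is a hypothetical name, and the input is assumed to be a well-formed string literal.

```python
def remove_u_prefix(literal: str) -> str:
    """Strip 'u'/'U' from the prefix of a string literal's source text."""
    # The prefix is everything before the first quote character.
    quote_at = min(
        pos for pos in (literal.find("'"), literal.find('"')) if pos != -1
    )
    prefix, rest = literal[:quote_at], literal[quote_at:]
    return prefix.replace('u', '').replace('U', '') + rest
```

Splitting at the first quote ensures that a `u` inside the string contents is never touched.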

## Parentheses and Format Processing

### Extraneous Parentheses Removal

Remove unnecessary parentheses around expressions.

```python { .api }
def _fix_extraneous_parens(tokens: list[Token], i: int) -> None:
    """
    Remove extraneous parentheses around expressions.

    Args:
        tokens: Token list to modify in-place
        i: Index of opening parenthesis token

    Notes:
        - Preserves tuple syntax (checks for commas)
        - Preserves yield expressions (checks for yield)
        - Only removes truly redundant parentheses
    """
```
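
The checks in the notes can be sketched on a simplified token stream. The assumptions here are that tokens are plain strings with whitespace and comments already dropped, and that "redundant" means an inner pair of parentheses that exactly fills an outer pair. `strip_double_parens` is a hypothetical name.

```python
OPENING = {'(', '[', '{'}
CLOSING = {')', ']', '}'}


def strip_double_parens(tokens: list[str], i: int) -> None:
    """Remove the inner pair in '((expr))' when tokens[i] is the outer '('."""
    if tokens[i] != '(' or tokens[i + 1] != '(':
        return

    start = i + 1  # index of the inner opening parenthesis
    depth = 1
    j = start
    while depth:
        j += 1
        # A comma or 'yield' directly inside the inner pair gives it meaning.
        if depth == 1 and tokens[j] in {',', 'yield'}:
            return
        if tokens[j] in OPENING:
            depth += 1
        elif tokens[j] in CLOSING:
            depth -= 1
    end = j  # index of the inner closing parenthesis

    if end == start + 1:
        return  # '()' is an empty tuple, not redundant grouping
    if tokens[end + 1] == ')':
        # Delete the later index first so the earlier one stays valid.
        del tokens[end]
        del tokens[start]


call = ['print', '(', '(', '"hi"', ')', ')']
strip_double_parens(call, 1)

tuple_arg = ['f', '(', '(', '1', ',', '2', ')', ')']
strip_double_parens(tuple_arg, 1)

partial = ['f', '(', '(', 'a', ')', '+', 'b', ')']
strip_double_parens(partial, 1)
```

The final check on `tokens[end + 1]` is what keeps `f((a) + b)` intact: the inner pair does not span the whole argument, so removing it is left to other tooling.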

### Format String Simplification

Simplify format string literals by removing redundant format keys.

```python { .api }
def _fix_format_literal(tokens: list[Token], end: int) -> None:
    """
    Simplify format string literals.

    Args:
        tokens: Token list to modify in-place
        end: Index of format method call

    Logic:
        - Removes positional format keys (0, 1, 2, ...)
        - Only processes sequential numeric keys
        - Skips f-strings and malformed format strings
    """
```
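
The key-removal rule can be sketched with the standard library's `string.Formatter`. This is a conservative illustration, not the library's implementation: `drop_positional_keys` is a hypothetical name, it works on the format string's value rather than on tokens, and it leaves keys with attribute or index access (such as `{0.name}`) untouched.

```python
from string import Formatter


def drop_positional_keys(fmt: str) -> str:
    """Rewrite '{0} {1}' as '{} {}' when keys are exactly 0, 1, 2, ..."""
    try:
        parsed = list(Formatter().parse(fmt))
    except ValueError:
        return fmt  # malformed format string: leave it alone

    expected = 0
    pieces = []
    for literal, field, spec, conversion in parsed:
        # Re-escape literal braces that parse() unescaped.
        pieces.append(literal.replace('{', '{{').replace('}', '}}'))
        if field is None:
            continue  # literal text with no replacement field
        if field != str(expected) or '{' in spec:
            return fmt  # not a plain sequential key, or a nested field in the spec
        expected += 1
        pieces.append(
            '{'
            + (f'!{conversion}' if conversion else '')
            + (f':{spec}' if spec else '')
            + '}'
        )
    return ''.join(pieces)
```

Keys must be sequential because `'{1} {0}'.format(a, b)` and `'{} {}'.format(a, b)` produce different output. Only the in-order case is safe to simplify.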

### String Encoding to Binary

Convert string.encode() calls to binary string literals.

```python { .api }
def _fix_encode_to_binary(tokens: list[Token], i: int) -> None:
    """
    Convert string.encode() to binary literals.

    Args:
        tokens: Token list to modify in-place
        i: Index of 'encode' token

    Supported encodings:
        - ASCII, UTF-8: Full conversion
        - ISO-8859-1: Latin-1 compatible conversion
        - Skips non-ASCII or complex escape sequences
    """
```
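
The encoding rules can be sketched as a decision function over a literal's source text. The assumptions are that the literal has no prefix, that `codecs.lookup` is used to normalize encoding aliases, and that `\x` escapes are only safe under Latin-1, where each escape maps to a single byte. `encode_to_bytes_literal` is a hypothetical name.

```python
import codecs
import re
from typing import Optional


def encode_to_bytes_literal(literal: str, encoding: str = 'utf-8') -> Optional[str]:
    """Return a b'' literal equivalent to literal.encode(encoding), or None."""
    try:
        codec = codecs.lookup(encoding).name  # normalizes aliases like 'utf8'
    except LookupError:
        return None

    if codec in {'ascii', 'utf-8'}:
        latin1_ok = False
    elif codec == 'iso8859-1':
        latin1_ok = True
    else:
        return None  # unsupported encoding: leave the call alone

    escapes = set(re.findall(r'\\.', literal, re.DOTALL))
    if (
        not literal.isascii()
        or escapes & {'\\u', '\\U', '\\N'}
        or ('\\x' in escapes and not latin1_ok)
    ):
        return None  # the bytes would differ from the text, so skip
    return 'b' + literal
```

A return value of `None` stands for "do not rewrite", matching the documented behavior of skipping non-ASCII content and complex escape sequences.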

## Usage Examples

### Basic Transformation

```python
from pyupgrade._main import _fix_plugins, _fix_tokens
from pyupgrade._data import Settings

# Apply both transformation phases
source = "set([1, 2, 3])"
settings = Settings(min_version=(3, 8))

# Phase 1: Plugin transformations
transformed = _fix_plugins(source, settings)
# Result: "{1, 2, 3}"

# Phase 2: Token transformations
final = _fix_tokens(transformed)
```

### Custom Settings

```python
from pyupgrade._main import _fix_plugins
from pyupgrade._data import Settings

# Configure for Python 3.10+ while preserving %-formatting and mock imports
settings = Settings(
    min_version=(3, 10),
    keep_percent_format=True,
    keep_mock=True,
    keep_runtime_typing=False,
)

# `source_code` stands for any Python source string to transform
source_code = "import mock\n"
transformed = _fix_plugins(source_code, settings)
```