# String Substitution Functions

String replacement functions that extend the standard `re` module with extra parameters for position control (`pos`, `endpos`), GIL release during matching (`concurrent`), and timeouts (`timeout`), plus format-string replacements (`subf`, `subfn`).

## Capabilities

### Basic String Substitution

Replace occurrences of a pattern with a replacement string or with the return value of a callable. `count` limits the number of replacements, and `pos`/`endpos` restrict the region of the string that is searched.

```python { .api }
def sub(pattern, repl, string, count=0, flags=0, pos=None, endpos=None,
        concurrent=None, timeout=None, ignore_unused=False, **kwargs):
    """
    Return the string obtained by replacing non-overlapping occurrences of pattern with replacement.

    Args:
        pattern (str): Regular expression pattern to find
        repl (str or callable): Replacement string or function
        string (str): String to perform substitutions on
        count (int, optional): Maximum number of replacements (0 = all)
        flags (int, optional): Regex flags to modify matching behavior
        pos (int, optional): Start position for searching (default: 0)
        endpos (int, optional): End position for searching (default: len(string))
        concurrent (bool, optional): Release GIL during matching for multithreading
        timeout (float, optional): Timeout in seconds for matching operation
        ignore_unused (bool, optional): Ignore unused keyword arguments
        **kwargs: Additional pattern compilation arguments

    Returns:
        str: String with replacements made
    """
```

**Usage Examples:**

```python
import regex

# Basic substitution
result = regex.sub(r'\d+', 'X', 'Replace 123 and 456 with X')
print(result)  # 'Replace X and X with X'

# Limited number of replacements
result = regex.sub(r'\d+', 'NUM', 'Values: 1, 2, 3, 4', count=2)
print(result)  # 'Values: NUM, NUM, 3, 4'

# Using replacement function
def uppercase_match(match):
    return match.group().upper()

result = regex.sub(r'\b\w+\b', uppercase_match, 'hello world')
print(result)  # 'HELLO WORLD'

# Position-bounded substitution: only digits at indices 2..7 are replaced;
# text outside that window is returned unchanged
result = regex.sub(r'\d', 'X', '12abc34def56', pos=2, endpos=8)
print(result)  # '12abcXXdef56'

# Using backreferences
result = regex.sub(r'(\w+) (\w+)', r'\2, \1', 'John Doe')
print(result)  # 'Doe, John'

# Named group backreferences
result = regex.sub(r'(?P<first>\w+) (?P<last>\w+)', r'\g<last>, \g<first>', 'Jane Smith')
print(result)  # 'Smith, Jane'
```
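
Two details of replacement templates are easy to miss, and a short sketch may help. It relies only on the `\g<...>` syntax already used above: `\g<0>` stands for the whole match, and `\g<1>` is the safe spelling when a group reference is followed by a literal digit. The variable names are illustrative.

```python
import regex

# \g<0> refers to the entire match, so no capture group is needed to wrap it
wrapped = regex.sub(r'\d+', r'[\g<0>]', 'a1b22')
print(wrapped)  # 'a[1]b[22]'

# \g<1>0 means "group 1, then a literal 0"; r'\10' would be read as group 10
scaled = regex.sub(r'(\d)', r'\g<1>0', 'x1 y2')
print(scaled)  # 'x10 y20'
```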

### Format-Based Substitution

Replace pattern occurrences using Python's `str.format` syntax. In the format string, `{0}` is the entire match, `{1}`, `{2}`, and so on are the numbered groups, and `{name}` refers to a named group.

```python { .api }
def subf(pattern, format, string, count=0, flags=0, pos=None, endpos=None,
         concurrent=None, timeout=None, ignore_unused=False, **kwargs):
    """
    Return the string obtained by replacing pattern occurrences using format string.

    Args:
        pattern (str): Regular expression pattern to find
        format (str or callable): Format string or function using Python format syntax
        string (str): String to perform substitutions on
        count (int, optional): Maximum number of replacements (0 = all)
        flags (int, optional): Regex flags to modify matching behavior
        pos (int, optional): Start position for searching (default: 0)
        endpos (int, optional): End position for searching (default: len(string))
        concurrent (bool, optional): Release GIL during matching for multithreading
        timeout (float, optional): Timeout in seconds for matching operation
        ignore_unused (bool, optional): Ignore unused keyword arguments
        **kwargs: Additional pattern compilation arguments

    Returns:
        str: String with format-based replacements made
    """
```

**Usage Examples:**

```python
import regex

# Format string with numbered groups ({0} is the whole match, groups start at {1})
result = regex.subf(r'(\w+) (\w+)', '{2}, {1}', 'John Doe')
print(result)  # 'Doe, John'

# Format string with named groups
pattern = r'(?P<name>\w+): (?P<value>\d+)'
format_str = '{name} = {value}'
result = regex.subf(pattern, format_str, 'width: 100, height: 200')
print(result)  # 'width = 100, height = 200'

# Format function for complex transformations (receives a Match object)
def format_currency(match):
    amount = float(match.group('amount'))
    return f'${amount:.2f}'

pattern = r'(?P<amount>\d+\.\d+)'
result = regex.subf(pattern, format_currency, 'Price: 19.9, Tax: 2.5')
print(result)  # 'Price: $19.90, Tax: $2.50'
```
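
Because field numbering in format strings differs from what `str.format` users might expect, a small side-by-side sketch can make it concrete. It shows `{0}` as the whole match and the equivalent `sub` template written with backreferences; the variable names are illustrative only.

```python
import regex

# {0} is the whole match; {1} and {2} are the first and second groups
with_format = regex.subf(r'(\w+) (\w+)', '{0} => {2} {1}', 'foo bar')
print(with_format)  # 'foo bar => bar foo'

# The same replacement expressed as a sub() template with backreferences
with_template = regex.sub(r'(\w+) (\w+)', r'\g<0> => \2 \1', 'foo bar')
print(with_template)  # 'foo bar => bar foo'
```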

### Substitution with Count

Perform substitutions and return both the modified string and the number of substitutions made, which is useful for tracking replacement operations.

```python { .api }
def subn(pattern, repl, string, count=0, flags=0, pos=None, endpos=None,
         concurrent=None, timeout=None, ignore_unused=False, **kwargs):
    """
    Return a 2-tuple containing (new_string, number_of_substitutions_made).

    Args:
        pattern (str): Regular expression pattern to find
        repl (str or callable): Replacement string or function
        string (str): String to perform substitutions on
        count (int, optional): Maximum number of replacements (0 = all)
        flags (int, optional): Regex flags to modify matching behavior
        pos (int, optional): Start position for searching (default: 0)
        endpos (int, optional): End position for searching (default: len(string))
        concurrent (bool, optional): Release GIL during matching for multithreading
        timeout (float, optional): Timeout in seconds for matching operation
        ignore_unused (bool, optional): Ignore unused keyword arguments
        **kwargs: Additional pattern compilation arguments

    Returns:
        tuple: (modified_string, substitution_count)
    """
```

**Usage Examples:**

```python
import regex

# Basic substitution with count
result, count = regex.subn(r'\d+', 'NUM', 'Replace 123 and 456')
print(f"Result: '{result}', Replacements: {count}")
# Result: 'Replace NUM and NUM', Replacements: 2

# Limited replacements with count
result, count = regex.subn(r'\w+', 'WORD', 'one two three four', count=2)
print(f"Result: '{result}', Replacements: {count}")
# Result: 'WORD WORD three four', Replacements: 2

# Check if any replacements were made
original = 'No numbers here'
result, count = regex.subn(r'\d+', 'NUM', original)
if count == 0:
    print("No substitutions were made")
else:
    print(f"Made {count} substitutions: {result}")
```
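
One way to use the returned count is to chain several rules and report how often each one fired. The following is a minimal sketch under that assumption; `apply_rules` and the rule list are hypothetical names introduced here, not part of the library.

```python
import regex

def apply_rules(text, rules):
    """Apply (pattern, replacement) pairs in order and tally replacements per pattern."""
    counts = {}
    for pattern, repl in rules:
        # subn returns the new text together with the number of replacements
        text, n = regex.subn(pattern, repl, text)
        counts[pattern] = n
    return text, counts

rules = [(r'\d+', 'NUM'), (r'\s{2,}', ' ')]
cleaned, counts = apply_rules('a  1 b   22', rules)
print(cleaned)  # 'a NUM b NUM'
print(counts)   # {'\\d+': 2, '\\s{2,}': 2}
```

Each rule sees the output of the previous one, so the order of the list matters.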

### Format-Based Substitution with Count

Combine format-based replacement with substitution counting: `subfn` behaves like `subf` but returns a `(new_string, count)` tuple.

```python { .api }
def subfn(pattern, format, string, count=0, flags=0, pos=None, endpos=None,
          concurrent=None, timeout=None, ignore_unused=False, **kwargs):
    """
    Same as subf but also return the number of substitutions made.

    Args:
        pattern (str): Regular expression pattern to find
        format (str or callable): Format string or function using Python format syntax
        string (str): String to perform substitutions on
        count (int, optional): Maximum number of replacements (0 = all)
        flags (int, optional): Regex flags to modify matching behavior
        pos (int, optional): Start position for searching (default: 0)
        endpos (int, optional): End position for searching (default: len(string))
        concurrent (bool, optional): Release GIL during matching for multithreading
        timeout (float, optional): Timeout in seconds for matching operation
        ignore_unused (bool, optional): Ignore unused keyword arguments
        **kwargs: Additional pattern compilation arguments

    Returns:
        tuple: (formatted_string, substitution_count)
    """
```

**Usage Examples:**

```python
import regex

# Format-based substitution with count
pattern = r'(?P<name>\w+): (?P<value>\d+)'
format_str = '{name}={value}'
result, count = regex.subfn(pattern, format_str, 'width: 100, height: 200')
print(f"Result: '{result}', Replacements: {count}")
# Result: 'width=100, height=200', Replacements: 2

# Track format replacements
def format_phone(match):
    area = match.group(1)
    number = match.group(2)
    return f"({area}) {number[:3]}-{number[3:]}"

pattern = r'(\d{3})(\d{7})'
text = 'Call 5551234567 or 8009876543'
result, count = regex.subfn(pattern, format_phone, text)
print(f"Formatted {count} phone numbers: {result}")
# Formatted 2 phone numbers: Call (555) 123-4567 or (800) 987-6543
```

## Advanced Substitution Features

### Replacement Functions

A replacement function receives a Match object for each match and returns the replacement string, which allows complex transformations:

```python
import regex

def smart_replace(match):
    value = match.group()
    if value.isdigit():
        return str(int(value) * 2)  # Double numbers
    else:
        return value.upper()  # Uppercase text

result = regex.sub(r'\w+', smart_replace, 'test 123 hello 456')
print(result)  # 'TEST 246 HELLO 912'
```
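
A common use of replacement functions is a table lookup, where the matched text selects its own replacement. The sketch below assumes a small hypothetical dictionary (`ABBREVIATIONS`) and builds the pattern from its keys; none of these names come from the library.

```python
import regex

# Hypothetical lookup table: matched text -> replacement
ABBREVIATIONS = {'btw': 'by the way', 'imo': 'in my opinion'}

# Build one alternation from the keys, escaping them in case they contain metacharacters
pattern = r'\b(?:' + '|'.join(regex.escape(key) for key in ABBREVIATIONS) + r')\b'

def expand(match):
    # Keys are lowercase, so normalise the matched text before the lookup
    return ABBREVIATIONS[match.group().lower()]

expanded = regex.sub(pattern, expand, 'BTW, imo it works', flags=regex.IGNORECASE)
print(expanded)  # 'by the way, in my opinion it works'
```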

### Conditional Replacements

Use Match object properties for conditional replacements:

```python
import regex

def conditional_replace(match):
    word = match.group()
    if len(word) > 4:
        return word.upper()  # Words longer than 4 characters
    else:
        return word.lower()

result = regex.sub(r'\b\w+\b', conditional_replace, 'Hello World Test')
print(result)  # 'HELLO WORLD test'
```
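
The condition can also depend on whether an optional group took part in the match, since `match.group(name)` returns `None` for a group that did not participate. A minimal sketch, with a hypothetical `accounting` helper that rewrites negative numbers in parentheses:

```python
import regex

def accounting(match):
    # group('neg') is None when the optional minus sign was absent
    if match.group('neg'):
        return f"({match.group('num')})"
    return match.group('num')

formatted = regex.sub(r'(?P<neg>-)?(?P<num>\d+)', accounting, 'net -40 gross 100')
print(formatted)  # 'net (40) gross 100'
```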

### Position-Aware Replacements

Access match position information in replacement functions:

```python
import regex

def position_replace(match):
    start = match.start()  # Offset of the match in the original string
    text = match.group()
    return f"{text}@{start}"

result = regex.sub(r'\w+', position_replace, 'one two three')
print(result)  # 'one@0 two@4 three@8'
```
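
Match positions are also handy for keeping an audit trail of what was replaced and where. The sketch below is one possible shape for that; `redact_and_log` is a hypothetical helper, and the recorded offsets refer to the original string, not the result.

```python
import regex

def redact_and_log(text, pattern):
    """Mask every match and record (start, end, matched_text) for each one."""
    log = []

    def repl(match):
        log.append((match.start(), match.end(), match.group()))
        return '*' * len(match.group())

    return regex.sub(pattern, repl, text), log

redacted, log = redact_and_log('pin 1234, code 99', r'\d+')
print(redacted)  # 'pin ****, code **'
print(log)       # [(4, 8, '1234'), (15, 17, '99')]
```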

### Reverse Pattern Substitution

Use the `REVERSE` flag to search from right to left. When `count` is set, the rightmost matches are the ones replaced:

```python
import regex

# Replace from right to left: count=2 replaces the two rightmost matches
result = regex.sub(r'\d+', 'X', '123abc456def789', flags=regex.REVERSE, count=2)
print(result)  # '123abcXdefX'
```
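
A frequent application is replacing only the last occurrence of a pattern. The sketch below does this with `count=1`, once with the `REVERSE` flag and once with the inline `(?r)` form of the same flag; the variable names are illustrative.

```python
import regex

text = 'a1b22c333'

# Searching starts at the end of the string, so count=1 hits the last match
last_only = regex.sub(r'\d+', 'X', text, flags=regex.REVERSE, count=1)
print(last_only)  # 'a1b22cX'

# Same behaviour with the inline flag instead of the flags argument
last_only_inline = regex.sub(r'(?r)\d+', 'X', text, count=1)
print(last_only_inline)  # 'a1b22cX'
```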

### Fuzzy Pattern Substitution

Combine fuzzy matching with substitutions:

```python
import regex

# Replace approximate matches
# (?e) turns on ENHANCEMATCH; {e<=1} allows at most one error
pattern = r'(?e)(hello){e<=1}'
result = regex.sub(pattern, 'hi', 'helo world, hallo there')
print(result)  # 'hi world, hi there'
```
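
### Concurrent Substitution

Pass `concurrent=True` to release the GIL during matching so that other Python threads can run while a large text is processed, and pass `timeout` (in seconds) to bound potentially slow operations:
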
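```python
import regex

# Release the GIL while matching so other threads can run;
# the string must not be modified during the call
large_text = "word " * 10000  # Large text
result = regex.sub(r'\w+', 'WORD', large_text, concurrent=True)

# Set timeout for potentially slow operations
# (placeholder inputs; substitute your own pattern, replacement, and text)
complex_pattern = r'(\w+\s*)+$'
replacement = ''
text = 'some long input ' * 1000
try:
    result = regex.sub(complex_pattern, replacement, text, timeout=5.0)
except TimeoutError as e:  # Raised when the timeout is exceeded
    print(f"Substitution timed out: {e}")
```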
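
`concurrent=True` only pays off when other threads have work to do. As a rough sketch under that assumption, several documents can be processed in a standard-library thread pool; `redact_numbers` and the sample documents are made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

import regex

def redact_numbers(doc):
    # concurrent=True releases the GIL while the pattern is being matched
    return regex.sub(r'\d+', 'N', doc, concurrent=True)

# Hypothetical input documents
docs = [f'doc {i} has {i * 10} items' for i in range(4)]

with ThreadPoolExecutor(max_workers=2) as pool:
    # pool.map preserves the order of the inputs
    results = list(pool.map(redact_numbers, docs))

print(results[1])  # 'doc N has N items'
```

For inputs this small the thread overhead outweighs any gain; the pattern is meant for long strings where matching dominates the run time.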