# Pattern Matching Functions

Core functions for finding patterns in text with enhanced capabilities beyond the standard `re` module. These functions support advanced features like partial matching, concurrent execution, timeout handling, and position control for precise pattern matching operations.

## Capabilities

### Match at Start

Attempts to match a pattern at the beginning of a string, providing precise control over matching behavior through various parameters.

```python { .api }
def match(pattern, string, flags=0, pos=None, endpos=None, partial=False,
          concurrent=None, timeout=None, ignore_unused=False, **kwargs):
    """
    Try to apply the pattern at the start of the string, returning a Match object or None.

    Args:
        pattern (str): Regular expression pattern to match
        string (str): String to search in
        flags (int, optional): Regex flags to modify matching behavior
        pos (int, optional): Start position for matching (default: 0)
        endpos (int, optional): End position for matching (default: len(string))
        partial (bool, optional): Allow partial matches at end of string
        concurrent (bool, optional): Release GIL during matching for multithreading
        timeout (float, optional): Timeout in seconds for matching operation
        ignore_unused (bool, optional): Ignore unused keyword arguments
        **kwargs: Additional pattern compilation arguments

    Returns:
        Match object if pattern matches at start, None otherwise
    """
```

**Usage Examples:**

```python
import regex

# Basic matching at start
result = regex.match(r'\d+', '123abc')
print(result.group())  # '123'

# Position control
result = regex.match(r'abc', 'xxabcyy', pos=2, endpos=5)
print(result.group())  # 'abc'

# Partial matching at string end
result = regex.match(r'hello world', 'hello wor', partial=True)
print(result.group())  # 'hello wor' (partial match)
print(result.partial)  # True

# Timeout for complex patterns: TimeoutError is raised if the limit is exceeded
try:
    result = regex.match(r'(a+)+b', 'a' * 20, timeout=0.1)  # May time out
except TimeoutError:
    result = None
```

### Full String Match

Matches a pattern against the entire string, ensuring the pattern covers the complete input text.

```python { .api }
def fullmatch(pattern, string, flags=0, pos=None, endpos=None, partial=False,
              concurrent=None, timeout=None, ignore_unused=False, **kwargs):
    """
    Try to apply the pattern against all of the string, returning a Match object or None.

    Args:
        pattern (str): Regular expression pattern to match
        string (str): String to match completely
        flags (int, optional): Regex flags to modify matching behavior
        pos (int, optional): Start position for matching (default: 0)
        endpos (int, optional): End position for matching (default: len(string))
        partial (bool, optional): Allow partial matches at end of string
        concurrent (bool, optional): Release GIL during matching for multithreading
        timeout (float, optional): Timeout in seconds for matching operation
        ignore_unused (bool, optional): Ignore unused keyword arguments
        **kwargs: Additional pattern compilation arguments

    Returns:
        Match object if pattern matches entire string, None otherwise
    """
```

**Usage Examples:**

```python
import regex

# Complete string matching
result = regex.fullmatch(r'\d{3}-\d{2}-\d{4}', '123-45-6789')
print(result.group())  # '123-45-6789'

# Fails on partial match
result = regex.fullmatch(r'\d+', '123abc')
print(result)  # None (doesn't match entire string)

# With position bounds
result = regex.fullmatch(r'abc', 'xxabcyy', pos=2, endpos=5)
print(result.group())  # 'abc'
```
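
The `partial` parameter is most useful when input arrives incrementally. The sketch below assumes a hypothetical helper, `check_input`, that classifies the text typed so far as complete, incomplete, or invalid by inspecting the `partial` attribute of the returned Match object.

```python
import regex

def check_input(pattern, typed):
    """Classify partially typed input against a pattern (hypothetical helper)."""
    m = regex.fullmatch(pattern, typed, partial=True)
    if m is None:
        return 'invalid'  # no continuation of this input can match
    # m.partial is True when the input ran out before the pattern was satisfied
    return 'incomplete' if m.partial else 'complete'

print(check_input(r'\d{4}', '12'))    # incomplete
print(check_input(r'\d{4}', '1234'))  # complete
print(check_input(r'\d{4}', '12a'))   # invalid
```

Using `fullmatch` rather than `match` here means that trailing characters beyond the pattern are rejected instead of being ignored.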

### Search Through String

Scans a string for the first location where the pattern matches. This is the most commonly used pattern matching function.

```python { .api }
def search(pattern, string, flags=0, pos=None, endpos=None, partial=False,
           concurrent=None, timeout=None, ignore_unused=False, **kwargs):
    """
    Search through string looking for a match to the pattern, returning a Match object or None.

    Args:
        pattern (str): Regular expression pattern to search for
        string (str): String to search in
        flags (int, optional): Regex flags to modify matching behavior
        pos (int, optional): Start position for searching (default: 0)
        endpos (int, optional): End position for searching (default: len(string))
        partial (bool, optional): Allow partial matches at end of string
        concurrent (bool, optional): Release GIL during matching for multithreading
        timeout (float, optional): Timeout in seconds for matching operation
        ignore_unused (bool, optional): Ignore unused keyword arguments
        **kwargs: Additional pattern compilation arguments

    Returns:
        Match object for first match found, None if no match
    """
```

**Usage Examples:**

```python
import regex

# Basic search
result = regex.search(r'\d+', 'abc123def')
print(result.group())  # '123'
print(result.span())   # (3, 6)

# Search with position bounds
result = regex.search(r'\w+', 'hello world test', pos=6, endpos=11)
print(result.group())  # 'world'

# Fuzzy search with error tolerance ((?e) is the ENHANCEMATCH flag)
result = regex.search(r'(?e)(hello){i<=1,d<=1,s<=1}', 'helo world')
print(result.group())  # should print 'helo' (matched with 1 deletion)

# Case-insensitive search
result = regex.search(r'python', 'I love PYTHON!', regex.IGNORECASE)
print(result.group())  # 'PYTHON'
```

### Find All Matches

Returns all matches of a pattern in a string as a list. Matches are non-overlapping by default; `overlapped=True` returns overlapping matches as well. Position control is available through `pos` and `endpos`.

```python { .api }
def findall(pattern, string, flags=0, pos=None, endpos=None, overlapped=False,
            concurrent=None, timeout=None, ignore_unused=False, **kwargs):
    """
    Return a list of all matches in the string.

    Args:
        pattern (str): Regular expression pattern to find
        string (str): String to search in
        flags (int, optional): Regex flags to modify matching behavior
        pos (int, optional): Start position for searching (default: 0)
        endpos (int, optional): End position for searching (default: len(string))
        overlapped (bool, optional): Find overlapping matches
        concurrent (bool, optional): Release GIL during matching for multithreading
        timeout (float, optional): Timeout in seconds for matching operation
        ignore_unused (bool, optional): Ignore unused keyword arguments
        **kwargs: Additional pattern compilation arguments

    Returns:
        List of matched strings or tuples (for patterns with groups)
    """
```

**Usage Examples:**

```python
import regex

# Find all numbers
numbers = regex.findall(r'\d+', 'Price: $123, Quantity: 45, Total: $5535')
print(numbers)  # ['123', '45', '5535']

# Find all email addresses
emails = regex.findall(r'\b\w+@\w+\.\w+\b', 'Contact: user@example.com or admin@site.org')
print(emails)  # ['user@example.com', 'admin@site.org']

# Find with groups
matches = regex.findall(r'(\w+):(\d+)', 'port:80, secure:443, admin:8080')
print(matches)  # [('port', '80'), ('secure', '443'), ('admin', '8080')]

# Overlapping matches
matches = regex.findall(r'\w\w', 'abcdef', overlapped=True)
print(matches)  # ['ab', 'bc', 'cd', 'de', 'ef']

# Non-overlapping (default)
matches = regex.findall(r'\w\w', 'abcdef')
print(matches)  # ['ab', 'cd', 'ef']
```
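
The shape of the `findall` result depends on how many capture groups the pattern contains, following the same convention as the standard `re` module. A short illustration, with a made-up input string:

```python
import regex

text = 'port:80, secure:443'

no_groups = regex.findall(r'\w+:\d+', text)       # no groups: whole matches
one_group = regex.findall(r'(\w+):\d+', text)     # one group: that group's text
two_groups = regex.findall(r'(\w+):(\d+)', text)  # several groups: tuples

print(no_groups)   # ['port:80', 'secure:443']
print(one_group)   # ['port', 'secure']
print(two_groups)  # [('port', '80'), ('secure', '443')]
```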

### Find All Matches Iterator

Returns an iterator over all matches, providing memory-efficient processing for large texts or when you need Match objects with full details.

```python { .api }
def finditer(pattern, string, flags=0, pos=None, endpos=None, overlapped=False,
             partial=False, concurrent=None, timeout=None, ignore_unused=False, **kwargs):
    """
    Return an iterator over all matches in the string.

    Args:
        pattern (str): Regular expression pattern to find
        string (str): String to search in
        flags (int, optional): Regex flags to modify matching behavior
        pos (int, optional): Start position for searching (default: 0)
        endpos (int, optional): End position for searching (default: len(string))
        overlapped (bool, optional): Find overlapping matches
        partial (bool, optional): Allow partial matches at end of string
        concurrent (bool, optional): Release GIL during matching for multithreading
        timeout (float, optional): Timeout in seconds for matching operation
        ignore_unused (bool, optional): Ignore unused keyword arguments
        **kwargs: Additional pattern compilation arguments

    Returns:
        Iterator yielding Match objects
    """
```

**Usage Examples:**

```python
import regex

# Iterator over matches with full match info
text = 'Word1: 123, Word2: 456, Word3: 789'
for match in regex.finditer(r'(\w+): (\d+)', text):
    word, number = match.groups()
    start, end = match.span()
    print(f"Found '{word}: {number}' at positions {start}-{end}")

# Memory-efficient processing of large text
def process_large_text(text):
    word_count = 0
    for match in regex.finditer(r'\b\w+\b', text):
        word_count += 1
        # Process one match at a time without storing all matches
    return word_count

# Overlapping matches with iterator
text = 'aaaa'
for match in regex.finditer(r'aa', text, overlapped=True):
    print(f"Match: '{match.group()}' at {match.span()}")
# Output: Match: 'aa' at (0, 2)
#         Match: 'aa' at (1, 3)
#         Match: 'aa' at (2, 4)
```

## Advanced Pattern Features

### Fuzzy Matching

The regex module supports fuzzy (approximate) matching with configurable error limits:

```python
import regex

# Basic fuzzy matching - allow up to 2 errors of any type
# (?e) is the ENHANCEMATCH flag, which tries to improve the fit of the match found
pattern = r'(?e)(python){e<=2}'
result = regex.search(pattern, 'pyhton is great')  # Matches; the swapped 'h' and 't' count as 2 errors

# Specific error types - insertions, deletions, substitutions
pattern = r'(?e)(hello){i<=1,d<=1,s<=1}'  # Allow 1 of each error type
result = regex.search(pattern, 'helo')  # Matches with 1 deletion

# Best match mode - find the best match instead of first
pattern = r'(?be)(test){e<=2}'
result = regex.search(pattern, 'testing text best')  # Finds 'test' (best match)
```
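
To see how many errors a fuzzy match actually used, the Match object exposes a `fuzzy_counts` attribute, a tuple of (substitutions, insertions, deletions). A minimal sketch, reusing the example strings above; the variable names `best` and `typo` are illustrative:

```python
import regex

# BESTMATCH (?b) searches for the match with the fewest errors
best = regex.search(r'(?b)(?:test){e<=2}', 'testing text best')
print(best.group())       # 'test'
print(best.fuzzy_counts)  # (0, 0, 0): substitutions, insertions, deletions

# A misspelling within the error budget still matches; the counts report the cost
typo = regex.fullmatch(r'(?b)(?:hello){e<=1}', 'helo')
print(typo.fuzzy_counts)  # (0, 0, 1): one deletion
```

Checking `fuzzy_counts` is a simple way to reject matches that are technically within the limit but worse than the caller is willing to accept.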

### Version Control

```python
import regex

text = 'sample text containing a pattern'  # placeholder input

# Version 0 (legacy re-compatible behavior)
result = regex.search(r'(?V0)pattern', text)

# Version 1 (enhanced behavior with full case-folding)
result = regex.search(r'(?V1)pattern', text, regex.IGNORECASE)
```
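
One visible difference between the two versions is full case-folding in case-insensitive matches, which the regex documentation describes as enabled by default only in version 1. A small sketch using the German sharp s, which folds to 'ss':

```python
import regex

word = 'stra\N{LATIN SMALL LETTER SHARP S}e'  # 'straße'

v1 = regex.match(r'(?iV1)strasse', word)  # full case-folding: 'ß' matches 'ss'
v0 = regex.match(r'(?iV0)strasse', word)  # simple case-folding only

print(v1.span() if v1 else None)  # (0, 6)
print(v0)                         # None
```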
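
### Concurrent Execution

```python
import regex

# Placeholder inputs; substitute your own pattern and text
complex_pattern = potentially_slow_pattern = r'(a+)+b'
large_text = text = 'a' * 1000 + 'b'

# Enable concurrent execution for long-running matches
result = regex.search(complex_pattern, large_text, concurrent=True)
# Set timeout to prevent runaway regex (raises TimeoutError if exceeded)
result = regex.search(potentially_slow_pattern, text, timeout=5.0)
```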
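
A sketch of how `concurrent=True` and `timeout` might be combined in a multithreaded scan. The helper name `count_words`, the sample documents, and the pool size are illustrative assumptions, not part of the library. Note that the string being searched must not be modified while a concurrent match is running.

```python
import regex
from concurrent.futures import ThreadPoolExecutor

def count_words(text):
    """Count words in one document; return None if matching times out."""
    try:
        # concurrent=True releases the GIL so other threads can run meanwhile
        return len(regex.findall(r'\b\w+\b', text, concurrent=True, timeout=5.0))
    except TimeoutError:
        return None

documents = ['alpha beta gamma ' * 100, 'one two ' * 50, 'solo']

with ThreadPoolExecutor(max_workers=3) as pool:
    counts = list(pool.map(count_words, documents))

print(counts)  # [300, 100, 1]
```

Catching `TimeoutError` inside the worker keeps one slow document from aborting the whole batch; the caller can decide how to treat a `None` result.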