# regex

`regex` is a regular expression library that is backwards-compatible with Python's standard `re` module and adds functionality on top of it: full Unicode 16.0.0 support, fuzzy (approximate) matching, additional flags for fine-grained pattern control, and release of the GIL during matching so that other threads can run.

## Package Information

- **Package Name**: regex
- **Language**: Python
- **Installation**: `pip install regex`
- **Version**: 2025.9.1

## Core Imports

```python
import regex
```

Common usage pattern:

```python
import regex as re  # Drop-in replacement for the standard re module
```

Specific imports:

```python
from regex import match, search, sub, findall, compile
from regex import IGNORECASE, MULTILINE, DOTALL, VERBOSE
from regex import BESTMATCH, ENHANCEMATCH, FULLCASE
```

## Basic Usage

```python
import regex

# Basic pattern matching
pattern = r'\b\w+@\w+\.\w+\b'
text = "Contact us at support@example.com or sales@company.org"
matches = regex.findall(pattern, text)
print(matches)  # ['support@example.com', 'sales@company.org']

# Case-insensitive matching
result = regex.search(r'hello', 'Hello World', regex.IGNORECASE)
if result:
    print(f"Found: {result.group()}")  # Found: Hello

# Fuzzy matching: allow up to 1 insertion, 1 deletion, and 1 substitution.
# The inline (?e) flag turns on ENHANCEMATCH.
pattern = r'(?e)(hello){i<=1,d<=1,s<=1}'
result = regex.search(pattern, 'helo world')  # 'helo' is 'hello' with 1 deletion
if result:
    print(f"Fuzzy match: {result.group()}")  # Fuzzy match: helo

# Pattern compilation for reuse
compiled = regex.compile(r'\d{4}-\d{2}-\d{2}', regex.VERBOSE)
dates = compiled.findall('Dates: 2023-12-25 and 2024-01-01')
print(dates)  # ['2023-12-25', '2024-01-01']
```

## Architecture

The regex module extends Python's regular expression capabilities through several key components:

- **Enhanced Pattern Engine**: Provides backwards compatibility with `re` while adding advanced features
- **Fuzzy Matching System**: Supports approximate matching with configurable error limits
- **Unicode Support**: Full Unicode 16.0.0 support with proper case-folding
- **Flag System**: Scoped and global flags for fine-grained pattern control
- **Multithreading**: Releases the GIL during matching operations so that other Python threads can run concurrently

The module supports both VERSION0 (legacy `re`-compatible) and VERSION1 (enhanced) behaviors, allowing gradual migration while maintaining compatibility.
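
As a rough illustration of how the two behaviors differ, the sketch below selects a version with the inline `(?V0)` and `(?V1)` flags. It assumes, following the library's documented behavior, that set operations such as `--` (set difference) and default full case-folding are VERSION1 features; the sample strings are purely illustrative.

```python
import regex

# Set difference inside a character class (a VERSION1 feature):
# lowercase letters minus the vowels.
consonants = regex.findall(r"(?V1)[[a-z]--[aeiou]]", "regex")
print(consonants)  # ['r', 'g', 'x']

# Under VERSION1, case-insensitive matching uses full case-folding,
# so "ss" in the pattern can match the single character "ß".
sharp_s = "stra\N{LATIN SMALL LETTER SHARP S}e"
v1_match = regex.match(r"(?iV1)strasse", sharp_s)
print(v1_match.span())  # (0, 6)

# Under VERSION0, simple case-folding is used unless FULLCASE is requested,
# so the same pattern is not expected to match here.
v0_match = regex.match(r"(?iV0)strasse", sharp_s)
print(v0_match)  # None
```

Putting the version flag inside the pattern keeps the choice local to that pattern, which is one way to migrate a code base one expression at a time.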

## Capabilities

### Pattern Matching Functions

Core functions for finding patterns in text, including `match`, `search`, `fullmatch`, `findall`, and `finditer`, with additional parameters for position control, partial matching, concurrency, and timeout handling.

```python { .api }
def match(pattern, string, flags=0, pos=None, endpos=None, partial=False,
          concurrent=None, timeout=None, ignore_unused=False, **kwargs):
    """Try to apply pattern at start of string, returning Match object or None"""

def search(pattern, string, flags=0, pos=None, endpos=None, partial=False,
           concurrent=None, timeout=None, ignore_unused=False, **kwargs):
    """Search through string for pattern match, returning Match object or None"""

def findall(pattern, string, flags=0, pos=None, endpos=None, overlapped=False,
            concurrent=None, timeout=None, ignore_unused=False, **kwargs):
    """Return list of all matches in string"""

def finditer(pattern, string, flags=0, pos=None, endpos=None, overlapped=False,
             partial=False, concurrent=None, timeout=None, ignore_unused=False, **kwargs):
    """Return iterator over all matches in string"""

def fullmatch(pattern, string, flags=0, pos=None, endpos=None, partial=False,
              concurrent=None, timeout=None, ignore_unused=False, **kwargs):
    """Try to apply pattern against all of string, returning Match object or None"""
```

[Pattern Matching](./pattern-matching.md)
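
The extra keyword arguments in these signatures are easiest to see side by side. The following is a minimal sketch, not an exhaustive tour: the sample strings, the helper `count_numbers`, and the thread pool setup are illustrative assumptions, and `timeout` is left out because its effect depends on run time.

```python
import regex
from concurrent.futures import ThreadPoolExecutor

text = "a1 b22 c333"

# pos/endpos restrict the region of the string that is searched.
from_pos = regex.search(r"\d+", text, pos=3).group()
up_to_endpos = regex.findall(r"\d+", text, endpos=6)
print(from_pos, up_to_endpos)  # 22 ['1', '22']

# overlapped=True lets successive matches share characters.
plain = regex.findall(r"..", "abcde")
overlapping = regex.findall(r"..", "abcde", overlapped=True)
print(plain)        # ['ab', 'cd']
print(overlapping)  # ['ab', 'bc', 'cd', 'de']

# partial=True reports input that could still grow into a full match.
partial_match = regex.fullmatch(r"\d{4}", "123", partial=True)
print(partial_match.partial)  # True


def count_numbers(chunk):
    """Hypothetical helper: count digit runs in one chunk of text."""
    # concurrent=True asks for the GIL to be released during matching.
    return len(regex.findall(r"\d+", chunk, concurrent=True))


chunks = ["x1 y22", "z333", "no digits"]  # illustrative data
with ThreadPoolExecutor(max_workers=3) as pool:
    counts = list(pool.map(count_numbers, chunks))
print(counts)  # [2, 1, 0]
```

Partial matching is mainly useful for validating input as it is typed: a partial match means "not complete yet, but not wrong so far".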

### String Substitution Functions

Advanced string replacement capabilities including standard substitution, format-based replacement, and variants that return substitution counts. Supports concurrent execution and timeout handling.

```python { .api }
def sub(pattern, repl, string, count=0, flags=0, pos=None, endpos=None,
        concurrent=None, timeout=None, ignore_unused=False, **kwargs):
    """Replace pattern occurrences with replacement string"""

def subf(pattern, format, string, count=0, flags=0, pos=None, endpos=None,
         concurrent=None, timeout=None, ignore_unused=False, **kwargs):
    """Replace pattern occurrences using format string"""

def subn(pattern, repl, string, count=0, flags=0, pos=None, endpos=None,
         concurrent=None, timeout=None, ignore_unused=False, **kwargs):
    """Return (new_string, number_of_substitutions_made) tuple"""

def subfn(pattern, format, string, count=0, flags=0, pos=None, endpos=None,
          concurrent=None, timeout=None, ignore_unused=False, **kwargs):
    """Return (new_string, number_of_substitutions_made) tuple using format string"""
```

[String Substitution](./substitution.md)
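
The difference between the template-based functions (`sub`, `subn`) and the format-based ones (`subf`, `subfn`) can be sketched as follows. The input strings are illustrative; the format-string behavior shown (`{0}` for the whole match, `{1}`, `{2}` or names for groups) follows the library's documented `str.format`-style replacement.

```python
import regex

# sub: classic replacement template with backreferences.
swapped = regex.sub(r"(\w+) (\w+)", r"\2 \1", "foo bar")
print(swapped)  # bar foo

# subf: the replacement is a str.format-style string.
# {0} is the whole match; {1}, {2} are the numbered groups.
formatted = regex.subf(r"(\w+) (\w+)", "{0} => {2} {1}", "foo bar")
print(formatted)  # foo bar => bar foo

# Named groups can be referenced by name in the format string.
named = regex.subf(r"(?P<first>\w+) (?P<second>\w+)", "{second} {first}", "foo bar")
print(named)  # bar foo

# subn / subfn additionally return how many replacements were made.
masked, n_masked = regex.subn(r"\d", "#", "a1b2c3")
print(masked, n_masked)  # a#b#c# 3
swapped_f, n_swapped = regex.subfn(r"(\w+) (\w+)", "{2} {1}", "foo bar")
print(swapped_f, n_swapped)  # bar foo 1

# count limits the number of replacements.
limited = regex.sub(r"\d", "#", "a1b2c3", count=2)
print(limited)  # a#b#c3
```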

### String Splitting Functions

Pattern-based string splitting with support for maximum splits, concurrent execution, and iterator-based processing for memory efficiency with large texts.

```python { .api }
def split(pattern, string, maxsplit=0, flags=0, concurrent=None,
          timeout=None, ignore_unused=False, **kwargs):
    """Split string by pattern occurrences, returning list of substrings"""

def splititer(pattern, string, maxsplit=0, flags=0, concurrent=None,
              timeout=None, ignore_unused=False, **kwargs):
    """Return iterator yielding split string parts"""
```

[String Splitting](./splitting.md)
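
A short sketch of the two splitting functions, using an illustrative comma-separated line. `splititer` yields the same parts as `split`, but lazily, which is the property that matters for large inputs.

```python
import regex

line = "alpha, beta,gamma ,  delta"  # illustrative input
separator = r"\s*,\s*"

# split: returns every part at once as a list.
parts = regex.split(separator, line)
print(parts)  # ['alpha', 'beta', 'gamma', 'delta']

# maxsplit caps the number of splits; the remainder is kept intact.
head_and_rest = regex.split(separator, line, maxsplit=1)
print(head_and_rest)  # ['alpha', 'beta,gamma ,  delta']

# splititer: yields the parts one by one instead of building a list.
lazy_parts = []
for part in regex.splititer(separator, line):
    lazy_parts.append(part)
print(lazy_parts)  # ['alpha', 'beta', 'gamma', 'delta']
```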

### Pattern Compilation and Utilities

Pattern compilation, caching control, template support, and string escaping utilities for preparing literal strings for use in patterns.

```python { .api }
def compile(pattern, flags=0, ignore_unused=False, cache_pattern=None, **kwargs):
    """Compile regular expression pattern, returning Pattern object"""

def escape(pattern, special_only=True, literal_spaces=False):
    """Escape string for use as literal in pattern"""

def purge():
    """Clear the regular expression cache"""

def cache_all(value=True):
    """Set/get whether to cache all patterns"""

def template(pattern, flags=0):
    """Compile a template pattern, returning a Pattern object"""
```

[Compilation and Utilities](./compilation-utilities.md)
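
The sketch below shows a typical combination of these utilities: compiling once for reuse, and escaping user-supplied text so that it is matched literally. The variable names and sample strings are illustrative assumptions.

```python
import regex

# Compile once, then reuse the Pattern object.
date_re = regex.compile(r"(\d{4})-(\d{2})-(\d{2})")
dates = date_re.findall("2023-12-25 and 2024-01-01")
print(dates)  # [('2023', '12', '25'), ('2024', '01', '01')]

# escape() turns arbitrary text into a pattern that matches it literally,
# so "+", "(", ")" and "." lose their special meaning.
user_text = "1+1=2 (approx.)"
literal_re = regex.compile(regex.escape(user_text))
found_literal = literal_re.search("we know 1+1=2 (approx.) holds") is not None
found_lookalike = literal_re.search("we know 111=2 (approx.) holds") is not None
print(found_literal, found_lookalike)  # True False

# purge() empties the module-level cache of compiled patterns.
regex.purge()
```

Without `escape`, the unescaped text `1+1=2` would be read as "one or more `1`, then `1=2`" and would match the look-alike string as well.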

### Advanced Classes and Types

`Pattern` and `Match` objects providing compiled pattern functionality and match result access, plus `Scanner` for tokenization and the `RegexFlag` enumeration for proper flag handling.

```python { .api }
class Pattern:
    """Compiled regular expression pattern object"""
    def match(self, string, pos=None, endpos=None, concurrent=None, partial=False, timeout=None): ...
    def search(self, string, pos=None, endpos=None, concurrent=None, partial=False, timeout=None): ...
    # Additional methods: findall, finditer, sub, split, etc.

class Match:
    """Match object containing match information"""
    def group(self, *groups): ...
    def groups(self, default=None): ...
    def groupdict(self, default=None): ...
    def start(self, group=0): ...
    def end(self, group=0): ...
    def span(self, group=0): ...

class Scanner:
    """Tokenizing scanner using pattern-action pairs"""
    def __init__(self, lexicon, flags=0): ...
    def scan(self, string): ...
```

[Classes and Types](./classes-types.md)
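
To make the class outlines above more concrete, here is a small sketch that reads data out of a `Match` object and tokenizes a string with `Scanner`. The token names (`INT`, `NAME`) and the lexicon are illustrative assumptions, and the action-callback convention `(scanner, token_text)` is assumed to mirror the standard library's `re.Scanner`.

```python
import regex

# Pattern and Match: named groups and positions.
date_re = regex.compile(r"(?P<year>\d{4})-(?P<month>\d{2})")
m = date_re.search("on 2024-01-15")
print(m.group())         # 2024-01
print(m.groups())        # ('2024', '01')
print(m.groupdict())     # {'year': '2024', 'month': '01'}
print(m.span())          # (3, 10)
print(m.start("month"))  # 8

# Scanner: each lexicon entry pairs a pattern with an action.
# An action of None means "match, but emit nothing" (used here for spaces).
scanner = regex.Scanner([
    (r"\d+", lambda sc, tok: ("INT", int(tok))),
    (r"[a-z]+", lambda sc, tok: ("NAME", tok)),
    (r"\s+", None),
])
tokens, remainder = scanner.scan("abc 42 x")
print(tokens)     # [('NAME', 'abc'), ('INT', 42), ('NAME', 'x')]
print(remainder)  # '' (nothing left unscanned)
```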

### Flags and Constants

Comprehensive flag system including standard regex flags, enhanced flags for fuzzy matching and Unicode handling, version control flags, and global constants for controlling library behavior.

```python { .api }
# Standard flags
IGNORECASE = I = 0x2        # Case-insensitive matching
MULTILINE = M = 0x8         # Multi-line mode for ^ and $
DOTALL = S = 0x10           # Make . match any character including newline
VERBOSE = X = 0x40          # Verbose mode allowing comments

# Enhanced flags
BESTMATCH = B = 0x1000      # Find best fuzzy match instead of first
ENHANCEMATCH = E = 0x8000   # Improve fuzzy match fit after finding first
FULLCASE = F = 0x4000       # Full case-folding for Unicode
WORD = W = 0x800            # Unicode word boundaries and line breaks

# Version control
VERSION0 = V0 = 0x2000      # Legacy re-compatible behavior
VERSION1 = V1 = 0x100       # Enhanced behavior mode
DEFAULT_VERSION             # Current default version setting
```

[Flags and Constants](./flags-constants.md)
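
Flags can be passed as arguments, combined with `|`, or written inline in the pattern. The sketch below contrasts a global flag with a scoped one; the sample strings are illustrative, and the scoped `(?i:...)` syntax is assumed to behave as the library documents for scoped flags.

```python
import regex

# Flags combine with | ; the short names are aliases of the long ones.
flags = regex.IGNORECASE | regex.MULTILINE
line_starts = regex.findall(r"^item", "Item\nitem\nITEM", flags)
print(line_starts)  # ['Item', 'item', 'ITEM']
print(regex.I == regex.IGNORECASE)  # True

# Scoped flag: (?i:...) applies only inside the group,
# so "world" must still match case-sensitively.
scoped_hit = regex.search(r"(?i:hello) world", "HELLO world") is not None
scoped_miss = regex.search(r"(?i:hello) world", "HELLO WORLD") is not None
print(scoped_hit, scoped_miss)  # True False

# DOTALL lets "." match a newline as well.
without_dotall = regex.search(r"a.b", "a\nb") is not None
with_dotall = regex.search(r"a.b", "a\nb", regex.DOTALL) is not None
print(without_dotall, with_dotall)  # False True

# VERBOSE allows whitespace and comments inside the pattern.
date_re = regex.compile(
    r"""
    \d{4}  # year
    -
    \d{2}  # month
    """,
    regex.VERBOSE,
)
verbose_match = date_re.search("released 2025-09").group()
print(verbose_match)  # 2025-09
```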

## Types

```python { .api }
class error(Exception):
    """Exception raised for invalid regular expressions"""
    msg: str      # Unformatted error message
    pattern: str  # Regular expression pattern
    pos: int      # Position where compilation failed
    lineno: int   # Line number where compilation failed
    colno: int    # Column number where compilation failed

class RegexFlag(enum.IntFlag):
    """Enumeration of regex flags with proper combination support"""
```
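
A minimal sketch of handling `regex.error`, assuming the attributes listed above. The helper `try_compile` is hypothetical and exists only for this example; the exact wording of the error message is not relied upon.

```python
import regex


def try_compile(pattern):
    """Hypothetical helper: return a compiled pattern, or None if invalid."""
    try:
        return regex.compile(pattern)
    except regex.error as exc:
        # msg is the unformatted message; pos is the offset in the pattern.
        print(f"invalid pattern {pattern!r}: {exc.msg} (pos={exc.pos})")
        return None


ok = try_compile(r"\d+")
bad = try_compile(r"(unclosed")  # the group is never closed
print(ok is not None, bad is None)  # True True

# Combined flags are IntFlag members, so they also behave like integers.
combined = regex.IGNORECASE | regex.MULTILINE
print(int(combined) == 0x2 | 0x8)  # True
```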