RE2 - A regular expression library with linear time guarantees
```bash
npx @tessl/cli install tessl/pypi-google-re2@1.1.0
```

A fast, safe, thread-friendly regular expression library with guaranteed linear-time matching. Unlike traditional regex engines that rely on backtracking (which can take exponential time on some inputs), RE2 matches with finite automata, so running time grows linearly with input length even on malicious input. This makes it well suited to production environments where regex patterns come from untrusted sources.
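To make the backtracking concern concrete, the sketch below times a failing match of the classic nested-quantifier pattern `(a+)+$` using Python's standard `re` module, which backtracks. The helper name `time_failed_match` and the input sizes are illustrative assumptions, not part of the library; if google-re2 happens to be installed, the same helper is pointed at `re2` for comparison.

```python
import re
import time


def time_failed_match(engine, size):
    """Time one failing match of (a+)+$ against 'a' * size + 'b'.

    `engine` is any module exposing match(pattern, text), such as the
    standard `re` module or `re2`.
    """
    text = "a" * size + "b"  # the trailing 'b' forces the match to fail
    start = time.perf_counter()
    result = engine.match(r"(a+)+$", text)
    elapsed = time.perf_counter() - start
    return result, elapsed


if __name__ == "__main__":
    # Sizes are kept small on purpose: a backtracking engine roughly doubles
    # its work for each extra 'a' on this pattern.
    for size in (10, 14, 18):
        _, seconds = time_failed_match(re, size)
        print(f"re   size={size:2d}  {seconds:.6f}s")

    try:
        import re2  # only available if google-re2 is installed
    except ImportError:
        re2 = None
    if re2 is not None:
        for size in (10, 14, 18):
            _, seconds = time_failed_match(re2, size)
            print(f"re2  size={size:2d}  {seconds:.6f}s")
```

Exact timings depend on the machine, so none are quoted here; the point of interest is how quickly the `re` numbers grow as `size` increases.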
```bash
pip install google-re2
```

```python
import re2
```

Individual functions can be imported:

```python
from re2 import compile, search, match, fullmatch, findall, split, sub
```

```python
import re2

# Basic pattern matching
pattern = r'\d{3}-\d{2}-\d{4}'
text = "My SSN is 123-45-6789"

# Search for pattern
match_obj = re2.search(pattern, text)
if match_obj:
    print(f"Found: {match_obj.group()}")  # "Found: 123-45-6789"

# Compile pattern for reuse (more efficient)
compiled_pattern = re2.compile(r'(\w+)@(\w+\.\w+)')
email_text = "Contact john@example.com for info"
match = compiled_pattern.search(email_text)
if match:
    username, domain = match.groups()
    print(f"User: {username}, Domain: {domain}")  # "User: john, Domain: example.com"

# Replace patterns (one 'X' per digit)
result = re2.sub(r'\d', 'X', "Phone: 555-1234")
print(result)  # "Phone: XXX-XXXX"
```

RE2 provides two main interfaces:
module-level functions that mirror Python's re module (search, match, findall, sub), and compiled pattern objects returned by compile for repeated use.

The library prioritizes safety and reliability over raw speed in pathological cases, which makes it suitable for handling untrusted input while remaining fast for typical workloads.
Essential pattern matching functions that provide the primary interface for regular expressions. These functions support searching, matching, and extracting subpatterns from text.
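As a rough illustration of how the five matching functions in this section differ, the sketch below runs each of them over one sample string. The sample text and variable names are made up for the example, and the fallback to the standard `re` module is only an assumption to keep the snippet runnable where google-re2 is not installed.

```python
try:
    import re2 as rx  # provided by the google-re2 package
except ImportError:
    import re as rx  # fallback so the sketch still runs without google-re2

LOG_LINE = "2024-05-01 ERROR disk=91% host=db01"
PAIR = r"(\w+)=(\w+)"
DATE = r"\d{4}-\d{2}-\d{2}"

# search: first match anywhere in the text
first_pair = rx.search(PAIR, LOG_LINE)

# match: the pattern must match at the start of the text
date_prefix = rx.match(DATE, LOG_LINE)

# fullmatch: the pattern must cover the entire text
whole_line_is_date = rx.fullmatch(DATE, LOG_LINE)

# findall: all non-overlapping matches; with two groups, a list of tuples
all_pairs = rx.findall(PAIR, LOG_LINE)

# finditer: an iterator of match objects, useful when positions matter
pair_spans = [m.span() for m in rx.finditer(PAIR, LOG_LINE)]

if __name__ == "__main__":
    print(first_pair.groups())
    print(date_prefix.group())
    print(whole_line_is_date)
    print(all_pairs)
    print(pair_spans)
```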
```python
def search(pattern, text, options=None): ...
def match(pattern, text, options=None): ...
def fullmatch(pattern, text, options=None): ...
def findall(pattern, text, options=None): ...
def finditer(pattern, text, options=None): ...
```

Functions for splitting text and performing substitutions using regular expressions. These operations are fundamental for text processing and data cleaning tasks.
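A minimal sketch of the splitting and substitution functions described here, using made-up sample strings. The `maxsplit` and `count` keywords follow the signatures in this section; the fallback import of the standard `re` module is an assumption added only so the snippet runs without google-re2.

```python
try:
    import re2 as rx  # provided by the google-re2 package
except ImportError:
    import re as rx  # fallback so the sketch still runs without google-re2

SEPARATOR = r"\s*[,;]\s*"
RAW_FIELDS = "alpha, beta;gamma ,delta"

# split: break the text at every separator match
fields = rx.split(SEPARATOR, RAW_FIELDS)

# maxsplit limits the number of splits; the remainder is kept as one piece
first_two_then_rest = rx.split(SEPARATOR, RAW_FIELDS, maxsplit=2)

# sub: replace every match (here, one '#' per digit)
masked = rx.sub(r"\d", "#", "card 4111-1111")

# subn: like sub, but also reports how many replacements were made;
# count=2 stops after the second replacement
replaced_text, replacements = rx.subn(r"\d+", "N", "a1 b22 c333", count=2)

if __name__ == "__main__":
    print(fields)
    print(first_two_then_rest)
    print(masked)
    print(replaced_text, replacements)
```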
```python
def split(pattern, text, maxsplit=0, options=None): ...
def sub(pattern, repl, text, count=0, options=None): ...
def subn(pattern, repl, text, count=0, options=None): ...
```

Pre-compilation of regular expressions for improved performance when patterns are used repeatedly. Compiled patterns provide access to advanced features and optimization options.
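One way to use a pre-compiled pattern is to compile it once at module level and reuse it in a loop, stepping through the text with the `pos` argument. The helper `parse_pairs` and the sample input are hypothetical, and the fallback to the standard `re` module is an assumption so the sketch runs without google-re2.

```python
try:
    import re2 as rx  # provided by the google-re2 package
except ImportError:
    import re as rx  # fallback so the sketch still runs without google-re2

# Compiled once, reused for every call to parse_pairs.
KEY_VALUE = rx.compile(r"(\w+)=(\d+)")


def parse_pairs(line):
    """Collect key=<digits> pairs from `line` into a dict of ints."""
    pairs = {}
    pos = 0
    while True:
        found = KEY_VALUE.search(line, pos)  # resume after the previous match
        if found is None:
            break
        key, value = found.groups()
        pairs[key] = int(value)
        pos = found.end()
    return pairs


if __name__ == "__main__":
    print(parse_pairs("w=10 h=20 junk d=3"))
    print(KEY_VALUE.fullmatch("x=1") is not None)
```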
```python
def compile(pattern, options=None): ...

class _Regexp:
    def search(text, pos=None, endpos=None): ...
    def match(text, pos=None, endpos=None): ...
    def fullmatch(text, pos=None, endpos=None): ...
    # ... additional methods
```

Configuration options that control how RE2 processes regular expressions, including encoding, syntax modes, memory limits, and performance tuning.
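As a hedged sketch of how such options might be applied, the helper below builds an `Options` object, turns off `case_sensitive`, optionally sets `max_mem`, and passes the result to `compile`. The helper name `compile_ignoring_case` is hypothetical, and the import guard is only there so the snippet loads where google-re2 is missing (in which case it returns `None`).

```python
try:
    import re2  # provided by the google-re2 package
except ImportError:
    re2 = None  # keeps the sketch importable where google-re2 is absent


def compile_ignoring_case(pattern, max_mem=None):
    """Compile `pattern` case-insensitively.

    Returns None when google-re2 is not installed.
    """
    if re2 is None:
        return None
    options = re2.Options()
    options.case_sensitive = False
    if max_mem is not None:
        options.max_mem = max_mem  # memory budget in bytes
    return re2.compile(pattern, options=options)


if __name__ == "__main__":
    regex = compile_ignoring_case(r"hello")
    if regex is not None:
        print(regex.search("Say HELLO") is not None)
```

Setting fields on a fresh `Options` object per pattern keeps one pattern's configuration from leaking into another.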
```python
class Options:
    max_mem: int
    encoding: Options.Encoding
    posix_syntax: bool
    longest_match: bool
    case_sensitive: bool
    # ... additional options
```

Specialized functionality for high-performance scenarios including pattern sets for matching multiple patterns simultaneously and filtered matching for optimized multi-pattern operations.
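The sketch below shows one plausible way to match several patterns in a single pass with `Set`, using the `Add`, `Compile`, and `Match` methods listed in this section. Two things here are assumptions rather than facts taken from this document: that an unanchored set is created with `re2.Set.SearchSet()`, and that `Match` returns the indices of the matching patterns (or nothing when none match). The helper names are hypothetical.

```python
try:
    import re2  # provided by the google-re2 package
except ImportError:
    re2 = None  # keeps the sketch importable where google-re2 is absent


def build_search_set(patterns):
    """Build and compile a Set; returns None when google-re2 is missing."""
    if re2 is None:
        return None
    pattern_set = re2.Set.SearchSet()  # assumption: unanchored-set constructor
    for pattern in patterns:
        pattern_set.Add(pattern)  # patterns are indexed in insertion order
    pattern_set.Compile()  # must happen before matching
    return pattern_set


def matching_indices(pattern_set, text):
    """Return sorted indices of patterns that match somewhere in `text`."""
    hits = pattern_set.Match(text)  # assumption: indices, or a falsy value
    return sorted(hits) if hits else []


if __name__ == "__main__":
    alerts = build_search_set(["error", r"warn(ing)?", r"\d+ ms"])
    if alerts is not None:
        print(matching_indices(alerts, "warning: took 35 ms"))
```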
```python
class Set:
    def Add(pattern): ...
    def Compile(): ...
    def Match(text): ...

class Filter:
    def Add(pattern, options=None): ...
    def Compile(): ...
    def Match(text, potential=False): ...
```

```python
class Options:
    """Configuration options for RE2 compilation and matching."""

    class Encoding:
        UTF8: int
        LATIN1: int

    def __init__(self):
        self.max_mem: int = 8388608  # 8 MiB default
        self.encoding: Options.Encoding = Options.Encoding.UTF8
        self.posix_syntax: bool = False
        self.longest_match: bool = False
        self.log_errors: bool = True
        self.literal: bool = False
        self.never_nl: bool = False
        self.dot_nl: bool = False
        self.never_capture: bool = False
        self.case_sensitive: bool = True
        self.perl_classes: bool = True
        self.word_boundary: bool = True
        self.one_line: bool = False

class error(Exception):
    """Exception raised for RE2 compilation and matching errors."""
    pass
```
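Because patterns may come from untrusted sources, it can help to validate them before use by catching the module's `error` exception. The helper `is_valid_pattern` below is a hypothetical sketch; the fallback to the standard `re` module (which also exposes an `error` exception) is an assumption so the snippet runs without google-re2.

```python
try:
    import re2 as rx  # provided by the google-re2 package
except ImportError:
    import re as rx  # fallback so the sketch still runs without google-re2


def is_valid_pattern(pattern):
    """Return True if `pattern` compiles, False if the engine rejects it."""
    try:
        rx.compile(pattern)
    except rx.error:
        return False
    return True


if __name__ == "__main__":
    for candidate in (r"\d+", "(unclosed"):
        print(repr(candidate), is_valid_pattern(candidate))
```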