tessl/pypi-google-re2

RE2 - A regular expression library with linear time guarantees

  • Workspace: tessl
  • Visibility: Public
  • Describes: pkg:pypi/google-re2@1.1.x

To install, run

npx @tessl/cli install tessl/pypi-google-re2@1.1.0

Google RE2

A fast, safe, thread-friendly regular expression library whose matching time is guaranteed to be linear in the length of the input. Traditional regex engines rely on backtracking, which can take exponential time on some patterns. RE2 instead uses automata-based matching, so performance stays predictable even with malicious input. This makes it well suited to production environments where patterns or input come from untrusted sources.
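To make the linear-time claim concrete, here is a toy sketch of state-set simulation, the general idea behind automata-based matching. The hand-built automaton for `(a|b)*abb` and the names `TRANSITIONS` and `nfa_fullmatch` are illustrative assumptions; this is not RE2's actual implementation.

```python
# Toy automaton for the pattern (a|b)*abb, written by hand for illustration.
# States: 0 = start, 1 = seen "a", 2 = seen "ab", 3 = seen "abb" (accepting).
TRANSITIONS = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "b"): {2},
    (2, "b"): {3},
}
ACCEPTING = {3}


def nfa_fullmatch(text):
    """Return (matched, steps), tracking the set of live states per character."""
    states = {0}
    steps = 0
    for ch in text:
        next_states = set()
        for state in states:
            steps += 1  # one unit of work per (live state, character) pair
            next_states |= TRANSITIONS.get((state, ch), set())
        states = next_states
    return bool(states & ACCEPTING), steps


text = "ab" * 1000 + "abb"
matched, steps = nfa_fullmatch(text)
print(matched, steps <= 4 * len(text))  # work is bounded by states * length
```

Because each character is processed once against a bounded set of states, the work grows linearly with the input, with no backtracking over earlier choices.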

Package Information

  • Package Name: google-re2
  • Language: Python (C++ library with Python bindings)
  • Installation: pip install google-re2
  • Version: 1.1.20250805
  • License: BSD-3-Clause

Core Imports

import re2

Individual functions can be imported:

from re2 import compile, search, match, fullmatch, findall, split, sub

Basic Usage

import re2

# Basic pattern matching
pattern = r'\d{3}-\d{2}-\d{4}'
text = "My SSN is 123-45-6789"

# Search for pattern
match_obj = re2.search(pattern, text)
if match_obj:
    print(f"Found: {match_obj.group()}")  # "123-45-6789"

# Compile pattern for reuse (more efficient)
compiled_pattern = re2.compile(r'(\w+)@(\w+\.\w+)')
email_text = "Contact john@example.com for info"
match = compiled_pattern.search(email_text)
if match:
    username, domain = match.groups()
    print(f"User: {username}, Domain: {domain}")  # "User: john, Domain: example.com"

# Replace patterns
result = re2.sub(r'\d', 'X', "Phone: 555-1234")
print(result)  # "Phone: XXX-XXXX"

Architecture

RE2 provides three main interfaces:

  • Python Module Interface: Drop-in replacement for Python's re module with familiar functions like search, match, findall, sub
  • Compiled Pattern Objects: Pre-compiled patterns for better performance in repeated operations
  • Advanced Features: Pattern sets for multi-pattern matching and filtered matching for high-performance scenarios
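One way to lean on the drop-in style of the module interface is an import that prefers `re2` and falls back to the standard library. This is a minimal sketch that assumes the pattern avoids features RE2 does not support (such as backreferences and lookaround); `LOG_LINE` and `parse_pairs` are hypothetical names used only for illustration.

```python
try:
    import re2 as re  # linear-time engine when google-re2 is installed
except ImportError:
    import re  # standard-library fallback with the same basic call shapes

# Compile once at import time and reuse the pattern object.
LOG_LINE = re.compile(r"(\w+)=(\d+)")


def parse_pairs(line):
    """Extract key=value pairs with integer values from a line of text."""
    return {key: int(value) for key, value in LOG_LINE.findall(line)}


print(parse_pairs("retries=3 timeout=30"))  # {'retries': 3, 'timeout': 30}
```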

The library prioritizes safety and predictable running time, even on pathological patterns and inputs. This makes it suitable for handling untrusted input while maintaining good performance for normal use cases.

Capabilities

Core Pattern Matching

Essential pattern matching functions that provide the primary interface for regular expressions. These functions support searching, matching, and extracting subpatterns from text.

def search(pattern, text, options=None): ...
def match(pattern, text, options=None): ...
def fullmatch(pattern, text, options=None): ...
def findall(pattern, text, options=None): ...
def finditer(pattern, text, options=None): ...
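The differences between these functions are easiest to see side by side. The sketch below uses only the call shapes listed above; the sample text is made up, and the standard-library fallback is there only so the example still runs where google-re2 is not installed.

```python
try:
    import re2
except ImportError:
    import re as re2  # fallback so the sketch runs without google-re2

text = "order 66 shipped, order 7 pending"

first = re2.search(r"\d+", text)       # first match anywhere in the text
at_start = re2.match(r"\d+", text)     # anchored at the start, so no match here
whole = re2.fullmatch(r"\d+", "2024")  # must cover the entire text
numbers = re2.findall(r"\d+", text)    # every non-overlapping match as a list
spans = [m.span() for m in re2.finditer(r"\d+", text)]  # lazily yielded matches

print(first.group())  # 66
print(at_start)       # None
print(whole.group())  # 2024
print(numbers)        # ['66', '7']
print(spans)          # [(6, 8), (24, 25)]
```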

Text Processing

Functions for splitting text and performing substitutions using regular expressions. These operations are fundamental for text processing and data cleaning tasks.

def split(pattern, text, maxsplit=0, options=None): ...
def sub(pattern, repl, text, count=0, options=None): ...
def subn(pattern, repl, text, count=0, options=None): ...
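A short sketch of the three functions on made-up data may help. It passes `maxsplit` and `count` as keywords, uses plain replacement strings only, and falls back to the standard library so it runs without google-re2.

```python
try:
    import re2
except ImportError:
    import re as re2  # fallback so the sketch runs without google-re2

csv_line = "alpha, beta,gamma ,delta"

# Split on commas, swallowing any whitespace around them.
fields = re2.split(r"\s*,\s*", csv_line)
# Limit the number of splits; the remainder is returned unsplit.
head_tail = re2.split(r"\s*,\s*", csv_line, maxsplit=1)
# Replace at most four digits.
masked = re2.sub(r"\d", "#", "card 1234 5678", count=4)
# subn also reports how many replacements were made.
collapsed, replacements = re2.subn(r"\s+", " ", "a  b \t c")

print(fields)                   # ['alpha', 'beta', 'gamma', 'delta']
print(head_tail)                # ['alpha', 'beta,gamma ,delta']
print(masked)                   # card #### 5678
print(collapsed, replacements)  # a b c 2
```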

Pattern Compilation

Pre-compilation of regular expressions for improved performance when patterns are used repeatedly. Compiled patterns provide access to advanced features and optimization options.

def compile(pattern, options=None): ...

class _Regexp:
    def search(self, text, pos=None, endpos=None): ...
    def match(self, text, pos=None, endpos=None): ...
    def fullmatch(self, text, pos=None, endpos=None): ...
    # ... additional methods
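As a rough illustration of reusing a compiled pattern, the sketch below compiles once and then restricts matching with `pos` and `endpos`. The name `WORD` and the sample text are assumptions for the example, and the fallback import only keeps it runnable without google-re2.

```python
try:
    import re2
except ImportError:
    import re as re2  # fallback so the sketch runs without google-re2

WORD = re2.compile(r"[a-z]+")  # compile once, reuse for every call below
text = "hello world"

first_word = WORD.search(text).group()           # scan from the beginning
next_word = WORD.search(text, 5).group()         # start scanning at index 5
anchored = WORD.match(text, 6).group()           # must match exactly at index 6
bounded = WORD.fullmatch(text, 0, 5).group()     # must cover text[0:5] entirely

print(first_word, next_word, anchored, bounded)  # hello world world hello
```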

Options and Configuration

Configuration options that control how RE2 processes regular expressions, including encoding, syntax modes, memory limits, and performance tuning.

class Options:
    max_mem: int
    encoding: Options.Encoding
    posix_syntax: bool
    longest_match: bool
    case_sensitive: bool
    # ... additional options
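A minimal sketch of passing options at compile time, assuming an `Options` instance can be created with no arguments and its attributes assigned as listed above. The helper `compile_ignore_case` is hypothetical; it falls back to `re.IGNORECASE` when google-re2 is not installed.

```python
import re

try:
    import re2
except ImportError:
    re2 = None  # the helper below falls back to the standard library


def compile_ignore_case(pattern):
    """Compile a case-insensitive pattern, preferring RE2 when available."""
    if re2 is not None:
        options = re2.Options()
        options.case_sensitive = False  # option name taken from the listing above
        return re2.compile(pattern, options)
    return re.compile(pattern, re.IGNORECASE)


matcher = compile_ignore_case(r"error: (\w+)")
found = matcher.search("ERROR: timeout")
print(found.group(1))  # timeout
```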

Advanced Features

Specialized functionality for high-performance scenarios including pattern sets for matching multiple patterns simultaneously and filtered matching for optimized multi-pattern operations.

class Set:
    def Add(self, pattern): ...
    def Compile(self): ...
    def Match(self, text): ...

class Filter:
    def Add(self, pattern, options=None): ...
    def Compile(self): ...
    def Match(self, text, potential=False): ...
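The sketch below shows how a pattern set might be used to test several rules in one pass. It assumes a `Set.SearchSet()` constructor for unanchored matching, which is not part of the listing above, and that `Match` returns the indices of the matching patterns. The helper `matching_indices` and the rules are hypothetical; without google-re2 it falls back to checking each pattern in turn.

```python
import re

try:
    import re2
except ImportError:
    re2 = None  # the helper below falls back to a per-pattern loop


def matching_indices(patterns, text):
    """Return the sorted indices of the patterns that match somewhere in text."""
    if re2 is not None:
        # Assumption: SearchSet() builds an unanchored set (not shown above).
        pattern_set = re2.Set.SearchSet()
        for pattern in patterns:
            pattern_set.Add(pattern)
        pattern_set.Compile()  # must be called before Match
        return sorted(pattern_set.Match(text) or [])
    return [i for i, pattern in enumerate(patterns) if re.search(pattern, text)]


rules = [r"DROP\s+TABLE", r"<script", r"\.\./"]
print(matching_indices(rules, "GET /../etc/passwd"))  # [2]
```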

Common Types

class Options:
    """Configuration options for RE2 compilation and matching."""
    
    class Encoding:
        UTF8: int
        LATIN1: int
    
    def __init__(self):
        self.max_mem: int = 8388608  # 8MiB default
        self.encoding: Options.Encoding = Options.Encoding.UTF8
        self.posix_syntax: bool = False
        self.longest_match: bool = False
        self.log_errors: bool = True
        self.literal: bool = False
        self.never_nl: bool = False
        self.dot_nl: bool = False
        self.never_capture: bool = False
        self.case_sensitive: bool = True
        self.perl_classes: bool = True
        self.word_boundary: bool = True
        self.one_line: bool = False

class error(Exception):
    """Exception raised for RE2 compilation and matching errors."""
    pass
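When patterns come from untrusted sources, one common approach is to validate them by catching `error` at compile time. This is a sketch under that assumption; `is_valid_pattern` is a hypothetical helper, and the fallback import only keeps the example runnable without google-re2.

```python
try:
    import re2
except ImportError:
    import re as re2  # fallback so the sketch runs without google-re2


def is_valid_pattern(pattern):
    """Return True if the pattern compiles, False if compilation raises error."""
    try:
        re2.compile(pattern)
    except re2.error:
        return False
    return True


print(is_valid_pattern(r"\d+"))        # True
print(is_valid_pattern(r"(unclosed"))  # False
```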