# Pattern Matching Functions

Core functions for finding patterns in text with enhanced capabilities beyond the standard `re` module. These functions support advanced features like partial matching, concurrent execution, timeout handling, and position control for precise pattern matching operations.

## Capabilities

### Match at Start

Attempts to match a pattern at the beginning of a string, providing precise control over matching behavior through various parameters.

```python { .api }
def match(pattern, string, flags=0, pos=None, endpos=None, partial=False,
          concurrent=None, timeout=None, ignore_unused=False, **kwargs):
    """
    Try to apply the pattern at the start of the string, returning a Match object or None.

    Args:
        pattern (str): Regular expression pattern to match
        string (str): String to search in
        flags (int, optional): Regex flags to modify matching behavior
        pos (int, optional): Start position for matching (default: 0)
        endpos (int, optional): End position for matching (default: len(string))
        partial (bool, optional): Allow partial matches at end of string
        concurrent (bool, optional): Release GIL during matching for multithreading
        timeout (float, optional): Timeout in seconds for matching operation
        ignore_unused (bool, optional): Ignore unused keyword arguments
        **kwargs: Additional pattern compilation arguments

    Returns:
        Match object if pattern matches at start, None otherwise
    """
```

**Usage Examples:**

```python
import regex

# Basic matching at start
result = regex.match(r'\d+', '123abc')
print(result.group())  # '123'

# Position control
result = regex.match(r'abc', 'xxabcyy', pos=2, endpos=5)
print(result.group())  # 'abc'

# Partial matching at string end
result = regex.match(r'hello world', 'hello wor', partial=True)
print(result.group())  # 'hello wor' (partial match)

# Timeout for complex patterns (TimeoutError is raised if the limit is exceeded)
result = regex.match(r'(a+)+b', 'a' * 20, timeout=0.1)
```

### Full String Match

Matches a pattern against the entire string, ensuring the pattern covers the complete input text.

```python { .api }
def fullmatch(pattern, string, flags=0, pos=None, endpos=None, partial=False,
              concurrent=None, timeout=None, ignore_unused=False, **kwargs):
    """
    Try to apply the pattern against all of the string, returning a Match object or None.

    Args:
        pattern (str): Regular expression pattern to match
        string (str): String to match completely
        flags (int, optional): Regex flags to modify matching behavior
        pos (int, optional): Start position for matching (default: 0)
        endpos (int, optional): End position for matching (default: len(string))
        partial (bool, optional): Allow partial matches at end of string
        concurrent (bool, optional): Release GIL during matching for multithreading
        timeout (float, optional): Timeout in seconds for matching operation
        ignore_unused (bool, optional): Ignore unused keyword arguments
        **kwargs: Additional pattern compilation arguments

    Returns:
        Match object if pattern matches entire string, None otherwise
    """
```

**Usage Examples:**

```python
import regex

# Complete string matching
result = regex.fullmatch(r'\d{3}-\d{2}-\d{4}', '123-45-6789')
print(result.group())  # '123-45-6789'

# Fails on partial match
result = regex.fullmatch(r'\d+', '123abc')
print(result)  # None (doesn't match entire string)

# With position bounds
result = regex.fullmatch(r'abc', 'xxabcyy', pos=2, endpos=5)
print(result.group())  # 'abc'
```
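
The `partial` parameter pairs naturally with `fullmatch` when validating input as it is being typed. The sketch below is one possible way to use it, not a prescribed pattern: the helper name `classify` and the four-digit format are assumptions made for illustration. It relies on the `partial` attribute of the returned Match object to distinguish a complete match from a truncated one.

```python
import regex

def classify(text, pattern=r'\d{4}'):
    """Classify input as 'complete', 'incomplete', or 'invalid' (hypothetical helper)."""
    m = regex.fullmatch(pattern, text, partial=True)
    if m is None:
        return 'invalid'      # no continuation of this input could match
    if m.partial:
        return 'incomplete'   # input consumed, but the pattern needs more
    return 'complete'         # the whole input matches the whole pattern

for typed in ['1', '123', '1234', '12345', '12x']:
    print(repr(typed), classify(typed))
```

A form field could keep accepting keystrokes while the result is `'incomplete'` and reject the input as soon as it becomes `'invalid'`.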

### Search Through String

Scans a string for the first location where a pattern matches. This is the most commonly used pattern matching function.

```python { .api }
def search(pattern, string, flags=0, pos=None, endpos=None, partial=False,
           concurrent=None, timeout=None, ignore_unused=False, **kwargs):
    """
    Search through string looking for a match to the pattern, returning a Match object or None.

    Args:
        pattern (str): Regular expression pattern to search for
        string (str): String to search in
        flags (int, optional): Regex flags to modify matching behavior
        pos (int, optional): Start position for searching (default: 0)
        endpos (int, optional): End position for searching (default: len(string))
        partial (bool, optional): Allow partial matches at end of string
        concurrent (bool, optional): Release GIL during matching for multithreading
        timeout (float, optional): Timeout in seconds for matching operation
        ignore_unused (bool, optional): Ignore unused keyword arguments
        **kwargs: Additional pattern compilation arguments

    Returns:
        Match object for first match found, None if no match
    """
```

**Usage Examples:**

```python
import regex

# Basic search
result = regex.search(r'\d+', 'abc123def')
print(result.group())  # '123'
print(result.span())   # (3, 6)

# Search with position bounds
result = regex.search(r'\w+', 'hello world test', pos=6, endpos=11)
print(result.group())  # 'world'

# Fuzzy search with error tolerance
result = regex.search(r'(?e)(hello){i<=1,d<=1,s<=1}', 'helo world')
print(result.group())  # 'helo' (found with 1 deletion error)

# Case-insensitive search
result = regex.search(r'python', 'I love PYTHON!', regex.IGNORECASE)
print(result.group())  # 'PYTHON'
```

### Find All Matches

Returns all matches of a pattern in a string as a list. Matches are non-overlapping by default; pass `overlapped=True` to include overlapping matches, and use `pos` and `endpos` to restrict the region that is searched.

```python { .api }
def findall(pattern, string, flags=0, pos=None, endpos=None, overlapped=False,
            concurrent=None, timeout=None, ignore_unused=False, **kwargs):
    """
    Return a list of all matches in the string.

    Args:
        pattern (str): Regular expression pattern to find
        string (str): String to search in
        flags (int, optional): Regex flags to modify matching behavior
        pos (int, optional): Start position for searching (default: 0)
        endpos (int, optional): End position for searching (default: len(string))
        overlapped (bool, optional): Find overlapping matches
        concurrent (bool, optional): Release GIL during matching for multithreading
        timeout (float, optional): Timeout in seconds for matching operation
        ignore_unused (bool, optional): Ignore unused keyword arguments
        **kwargs: Additional pattern compilation arguments

    Returns:
        List of matched strings or tuples (for patterns with groups)
    """
```

**Usage Examples:**

```python
import regex

# Find all numbers
numbers = regex.findall(r'\d+', 'Price: $123, Quantity: 45, Total: $5535')
print(numbers)  # ['123', '45', '5535']

# Find all email addresses
emails = regex.findall(r'\b\w+@\w+\.\w+\b', 'Contact: user@example.com or admin@site.org')
print(emails)  # ['user@example.com', 'admin@site.org']

# Find with groups
matches = regex.findall(r'(\w+):(\d+)', 'port:80, secure:443, admin:8080')
print(matches)  # [('port', '80'), ('secure', '443'), ('admin', '8080')]

# Overlapping matches
matches = regex.findall(r'\w\w', 'abcdef', overlapped=True)
print(matches)  # ['ab', 'bc', 'cd', 'de', 'ef']

# Non-overlapping (default)
matches = regex.findall(r'\w\w', 'abcdef')
print(matches)  # ['ab', 'cd', 'ef']
```
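
The shape of each `findall` result depends on how many capture groups the pattern contains, which mirrors the behavior of the standard `re` module. A small sketch (the sample string is made up for illustration):

```python
import regex

text = 'port:80, secure:443'

no_groups = regex.findall(r'\w+:\d+', text)        # no groups: whole matches
one_group = regex.findall(r'\w+:(\d+)', text)      # one group: that group's text
two_groups = regex.findall(r'(\w+):(\d+)', text)   # several groups: tuples

print(no_groups)   # ['port:80', 'secure:443']
print(one_group)   # ['80', '443']
print(two_groups)  # [('port', '80'), ('secure', '443')]
```

When the full Match object is needed regardless of grouping, `finditer` is usually the better fit.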

### Find All Matches Iterator

Returns an iterator over all matches, providing memory-efficient processing for large texts or when you need Match objects with full details.

```python { .api }
def finditer(pattern, string, flags=0, pos=None, endpos=None, overlapped=False,
             partial=False, concurrent=None, timeout=None, ignore_unused=False, **kwargs):
    """
    Return an iterator over all matches in the string.

    Args:
        pattern (str): Regular expression pattern to find
        string (str): String to search in
        flags (int, optional): Regex flags to modify matching behavior
        pos (int, optional): Start position for searching (default: 0)
        endpos (int, optional): End position for searching (default: len(string))
        overlapped (bool, optional): Find overlapping matches
        partial (bool, optional): Allow partial matches at end of string
        concurrent (bool, optional): Release GIL during matching for multithreading
        timeout (float, optional): Timeout in seconds for matching operation
        ignore_unused (bool, optional): Ignore unused keyword arguments
        **kwargs: Additional pattern compilation arguments

    Returns:
        Iterator yielding Match objects
    """
```

**Usage Examples:**

```python
import regex

# Iterator over matches with full match info
text = 'Word1: 123, Word2: 456, Word3: 789'
for match in regex.finditer(r'(\w+): (\d+)', text):
    word, number = match.groups()
    start, end = match.span()
    print(f"Found '{word}: {number}' at positions {start}-{end}")

# Memory-efficient processing of large text
def process_large_text(text):
    word_count = 0
    for match in regex.finditer(r'\b\w+\b', text):
        word_count += 1
        # Process one match at a time without storing all matches
    return word_count

# Overlapping matches with iterator
text = 'aaaa'
for match in regex.finditer(r'aa', text, overlapped=True):
    print(f"Match: '{match.group()}' at {match.span()}")
# Output: Match: 'aa' at (0, 2)
#         Match: 'aa' at (1, 3)
#         Match: 'aa' at (2, 4)
```

## Advanced Pattern Features

### Fuzzy Matching

The regex module supports fuzzy (approximate) matching with configurable error limits:

```python
import regex

# Basic fuzzy matching - allow up to 2 errors of any type
pattern = r'(?e)(python){e<=2}'
result = regex.search(pattern, 'pyhton is great')  # Matches: the swapped 'ht' is within the 2-error budget

# Specific error types - insertions, deletions, substitutions
pattern = r'(?e)(hello){i<=1,d<=1,s<=1}'  # Allow 1 of each error type
result = regex.search(pattern, 'helo')  # Matches with 1 deletion

# Best match mode - find the best match instead of first
pattern = r'(?be)(test){e<=2}'
result = regex.search(pattern, 'testing text best')  # Finds 'test' (best match)
```
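
To see which errors a fuzzy match actually used, the Match object exposes a `fuzzy_counts` attribute holding the number of substitutions, insertions, and deletions, in that order. The following sketch (patterns chosen purely for illustration) also shows the effect of the `(?e)` ENHANCEMATCH flag, which asks the engine to improve the fit of the match it has found:

```python
import regex

# Without ENHANCEMATCH, the first alternative 'cats' is accepted
# at the cost of one deletion.
raw = regex.fullmatch(r'(?:cats|cat){e<=1}', 'cat')
print(raw.fuzzy_counts)       # (0, 0, 1): substitutions, insertions, deletions

# With ENHANCEMATCH, the engine looks for a better fit and finds
# the exact alternative 'cat'.
enhanced = regex.fullmatch(r'(?e)(?:cats|cat){e<=1}', 'cat')
print(enhanced.fuzzy_counts)  # (0, 0, 0)
```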

### Version Control

```python
import regex

text = 'a pattern in some text'  # placeholder input

# Version 0 (legacy re-compatible behavior)
result = regex.search(r'(?V0)pattern', text)

# Version 1 (enhanced behavior with full case-folding)
result = regex.search(r'(?V1)pattern', text, regex.IGNORECASE)
```
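
The practical difference in case-insensitive matching shows up with characters whose case-folded form has a different length. A minimal sketch, assuming the default behavior in which full case-folding is off in Version 0 and on in Version 1 (the German sample word is only an illustration):

```python
import regex

sharp_s = 'stra\N{LATIN SMALL LETTER SHARP S}e'  # 'straße'

# Version 0: simple case-folding, so 'ss' does not match the sharp s.
v0 = regex.match(r'(?iV0)strasse', sharp_s)

# Version 1: full case-folding, so 'ss' matches the single sharp s character.
v1 = regex.match(r'(?iV1)strasse', sharp_s)

print(v0)         # None
print(v1.span())  # (0, 6)
```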

### Concurrent Execution

```python
import regex

complex_pattern = r'(\w+)@(\w+)\.com'        # placeholder pattern
large_text = 'user@example.com ' * 10_000    # placeholder input

# Enable concurrent execution for long-running matches
result = regex.search(complex_pattern, large_text, concurrent=True)

# Set timeout to prevent runaway regex
result = regex.search(complex_pattern, large_text, timeout=5.0)
```
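
The two options can be combined when several inputs are scanned from worker threads. The sketch below is one possible arrangement rather than a recommended recipe: the helper `safe_search`, the sample documents, and the one-second limit are all assumptions made for illustration. It catches the built-in `TimeoutError` that is raised when the limit is exceeded.

```python
import regex
from concurrent.futures import ThreadPoolExecutor

def safe_search(pattern, text, timeout=1.0):
    """Return the first match, or None on no match or timeout (hypothetical helper)."""
    try:
        # concurrent=True lets other threads run while this search is in progress
        return regex.search(pattern, text, concurrent=True, timeout=timeout)
    except TimeoutError:
        return None

documents = ['order 17 shipped', 'no digits here', 'invoice 2024-06']

# pool.map returns results in the same order as the inputs
with ThreadPoolExecutor(max_workers=3) as pool:
    matches = list(pool.map(lambda doc: safe_search(r'\d+', doc), documents))

found = [m.group() if m else None for m in matches]
print(found)  # ['17', None, '2024']
```

Returning `None` for both "no match" and "timed out" keeps the sketch short; code that needs to tell the two apart would handle the exception separately.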