# Text Processing

Functions for splitting text and performing substitutions with regular expressions. These operations are the basic building blocks for data cleaning and string manipulation.

## Capabilities

### Text Splitting

Splits text into a list using a regular expression pattern as the delimiter, with optional control over the maximum number of splits.

```python { .api }
def split(pattern, text, maxsplit=0, options=None):
    """
    Split text by pattern occurrences.

    Args:
        pattern (str): Regular expression pattern used as delimiter
        text (str): Input text to split
        maxsplit (int): Maximum number of splits (0 = no limit)
        options (Options, optional): Compilation options

    Returns:
        list: List of text segments
    """
```

Example usage:

```python
import re2

# Split on whitespace
text = "apple banana cherry"
parts = re2.split(r'\s+', text)
print(parts)  # ['apple', 'banana', 'cherry']

# Split with limit
text = "one,two,three,four"
parts = re2.split(r',', text, maxsplit=2)
print(parts)  # ['one', 'two', 'three,four']

# Split capturing delimiter
text = "word1,word2;word3"
parts = re2.split(r'([,;])', text)
print(parts)  # ['word1', ',', 'word2', ';', 'word3']
```
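When delimiters appear back to back or at the edges of the input, splitting yields empty strings. The sketch below shows one way to drop them. The helper name `split_fields` is hypothetical, and the fallback to the standard library `re` module is an assumption made only so the example runs where `re2` is not installed.

```python
try:
    import re2  # assumed to expose split() as documented above
except ImportError:
    import re as re2  # stand-in so the sketch runs without re2 installed


def split_fields(line, maxsplit=0):
    """Split on commas or semicolons (with optional surrounding spaces), dropping empty fields."""
    parts = re2.split(r'\s*[,;]\s*', line.strip(), maxsplit=maxsplit)
    # Adjacent or trailing delimiters produce empty strings; filter them out.
    return [part for part in parts if part]


fields = split_fields(" alpha, beta;;gamma , ")
print(fields)  # ['alpha', 'beta', 'gamma']

limited = split_fields("a,b,c", maxsplit=1)
print(limited)  # ['a', 'b,c']
```

Filtering after the split keeps the pattern simple; whether empty fields should be dropped or preserved depends on the data format.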

### Text Substitution

Replaces occurrences of a pattern with a replacement string, with optional control over the number of replacements.

```python { .api }
def sub(pattern, repl, text, count=0, options=None):
    """
    Replace pattern occurrences with replacement string.

    Args:
        pattern (str): Regular expression pattern to match
        repl (str or callable): Replacement string or function
        text (str): Input text to process
        count (int): Maximum number of replacements (0 = all)
        options (Options, optional): Compilation options

    Returns:
        str: Text with replacements made
    """
```

Example usage:

```python
import re2

# Simple replacement
text = "Hello world"
result = re2.sub(r'world', 'universe', text)
print(result)  # "Hello universe"

# Replace with group references
text = "John Smith, Jane Doe"
result = re2.sub(r'(\w+) (\w+)', r'\2, \1', text)
print(result)  # "Smith, John, Doe, Jane"

# Limited replacements
text = "foo foo foo"
result = re2.sub(r'foo', 'bar', text, count=2)
print(result)  # "bar bar foo"

# Using callable replacement
def upper_match(match):
    return match.group().upper()

text = "hello world"
result = re2.sub(r'\w+', upper_match, text)
print(result)  # "HELLO WORLD"
```
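A callable replacement is useful when the new text has to be computed from the captured groups. The following is a minimal sketch, assuming the match object passed to the callable supports `group(n)` as in the standard library. The `UNITS` table and `to_bytes` helper are hypothetical, and the `re` fallback exists only so the example runs without `re2` installed.

```python
try:
    import re2  # assumed to expose sub() as documented above
except ImportError:
    import re as re2  # stand-in so the sketch runs without re2 installed

# Hypothetical lookup table mapping unit suffixes to byte multipliers.
UNITS = {"kb": 1024, "mb": 1024 ** 2}


def to_bytes(match):
    """Convert a size such as '64KB' into its byte count, returned as a string."""
    number = int(match.group(1))
    unit = match.group(2).lower()
    return str(number * UNITS[unit])


text = "cache=64KB heap=2MB"
result = re2.sub(r'(\d+)(KB|MB)', to_bytes, text)
print(result)  # cache=65536 heap=2097152
```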

### Text Substitution with Count

Performs substitution like `sub()` but returns both the modified text and the number of substitutions made.

```python { .api }
def subn(pattern, repl, text, count=0, options=None):
    """
    Replace pattern occurrences and return (result, count).

    Args:
        pattern (str): Regular expression pattern to match
        repl (str or callable): Replacement string or function
        text (str): Input text to process
        count (int): Maximum number of replacements (0 = all)
        options (Options, optional): Compilation options

    Returns:
        tuple: (modified_text, substitution_count)
    """
```

Example usage:

```python
import re2

# Get substitution count (only "over" and "lazy" are exactly four letters long)
text = "The quick brown fox jumps over the lazy dog"
result, num_subs = re2.subn(r'\b\w{4}\b', 'WORD', text)
print(result)  # "The quick brown fox jumps WORD the WORD dog"
print(num_subs)  # 2

# Check if any substitutions were made
text = "no matches here"
result, num_subs = re2.subn(r'\d+', 'NUMBER', text)
if num_subs == 0:
    print("No changes made")
```
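The count returned by `subn()` makes it easy to report how much a cleaning pass changed. Below is a minimal sketch that totals replacements across several lines. The `redact_digits` helper and the sample data are hypothetical, and the `re` fallback is only there so the example runs without `re2` installed.

```python
try:
    import re2  # assumed to expose subn() as documented above
except ImportError:
    import re as re2  # stand-in so the sketch runs without re2 installed


def redact_digits(lines):
    """Replace runs of four or more digits in each line and total the replacements."""
    cleaned = []
    total = 0
    for line in lines:
        new_line, replaced = re2.subn(r'\d{4,}', '[REDACTED]', line)
        cleaned.append(new_line)
        total += replaced
    return cleaned, total


lines = ["card 12345678 on file", "no numbers", "ids 1234 and 99999"]
cleaned, total = redact_digits(lines)
print(cleaned[2])  # ids [REDACTED] and [REDACTED]
print(total)  # 3
```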

### Utility Functions

Additional text processing utilities for escaping special characters and clearing the compiled-pattern cache.

```python { .api }
def escape(pattern):
    """
    Escape special regex characters in pattern.

    Args:
        pattern (str): String to escape

    Returns:
        str: Pattern with special characters escaped
    """

def purge():
    """
    Clear the compiled regular expression cache.

    This function clears the internal LRU cache that stores
    compiled pattern objects for better performance.
    """
```

Example usage:

```python
import re2

# Escape special characters
literal_text = "Price: $19.99 (20% off)"
escaped = re2.escape(literal_text)
# Metacharacters such as $ . ( ) are backslash-escaped; spaces and other
# punctuation may be escaped too, so the exact output is not shown here.
print(escaped)

# Use escaped text as literal pattern (the text must contain the full literal)
text = "Price: $19.99 (20% off) today only"
match = re2.search(escaped, text)
print(match is not None)  # True

# Clear pattern cache (useful for memory management)
re2.purge()
```
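A common use of `escape()` is building a pattern from literal terms that may contain metacharacters. The sketch below joins escaped terms into an alternation and replaces every occurrence. The `build_literal_pattern` helper is hypothetical, and the `re` fallback is an assumption made only so the example runs without `re2` installed.

```python
try:
    import re2  # assumed to expose escape() and sub() as documented above
except ImportError:
    import re as re2  # stand-in so the sketch runs without re2 installed


def build_literal_pattern(terms):
    """Join literal terms into one alternation, escaping regex metacharacters."""
    # Longest terms first so that "C++" is tried before its prefix "C".
    ordered = sorted(terms, key=len, reverse=True)
    return "|".join(re2.escape(term) for term in ordered)


pattern = build_literal_pattern(["C", "C++", "a.b"])
masked = re2.sub(pattern, "_", "C++ and C, plus a.b but not axb")
print(masked)  # _ and _, plus _ but not axb
```

Without escaping, `a.b` would also match `axb` and `C++` would not be a valid literal pattern.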

## Pattern Object Text Processing

When using compiled pattern objects, text processing methods are available as instance methods:

```python { .api }
class _Regexp:
    """Compiled regular expression pattern object."""

    def split(self, text, maxsplit=0):
        """Split text using this pattern as delimiter."""

    def sub(self, repl, text, count=0):
        """Replace matches with replacement string."""

    def subn(self, repl, text, count=0):
        """Replace matches and return (result, count)."""
```

Example usage:

```python
import re2

# Compile pattern once, use multiple times
pattern = re2.compile(r'[,;]\s*')
text1 = "apple, banana; cherry"
text2 = "red,green;blue"

# Split multiple texts with same pattern
parts1 = pattern.split(text1)  # ['apple', 'banana', 'cherry']
parts2 = pattern.split(text2)  # ['red', 'green', 'blue']

# Replace using compiled pattern
result = pattern.sub(' | ', text1)  # "apple | banana | cherry"
```
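Compiling once pays off when the same pattern is applied to many inputs, such as the rows of a delimited file. The following is a minimal sketch under that assumption. The `SEPARATOR` constant and `parse_rows` helper are hypothetical, and the `re` fallback exists only so the example runs without `re2` installed.

```python
try:
    import re2  # assumed to expose compile() with split()/sub() methods as documented above
except ImportError:
    import re as re2  # stand-in so the sketch runs without re2 installed

# Compiled once at module level and reused for every row.
SEPARATOR = re2.compile(r'\s*[|]\s*')


def parse_rows(lines):
    """Split each pipe-delimited line into a list of trimmed cells."""
    return [SEPARATOR.split(line.strip()) for line in lines]


rows = parse_rows(["id | name | city", "1 | Ada | London"])
print(rows)  # [['id', 'name', 'city'], ['1', 'Ada', 'London']]

# The count argument limits how many separators are rewritten.
first_only = SEPARATOR.sub(", ", "a | b | c", count=1)
print(first_only)  # a, b | c
```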