Tessl Tile for npm/re2@1.22.0

# Buffer Support

Direct Buffer processing for efficient text operations without string conversion overhead.

## Capabilities

### Buffer Processing Overview

RE2 provides native support for Node.js Buffers, allowing direct processing of UTF-8 encoded binary data without conversion to JavaScript strings. This is particularly useful for:

- Processing large text files efficiently
- Working with binary protocols containing text patterns
- Avoiding UTF-8 ↔ UTF-16 conversion overhead
- Handling text data that may contain null bytes

**Key Characteristics:**
- All Buffer inputs must be UTF-8 encoded
- Positions and lengths are in bytes, not characters
- Results are returned as Buffers when input is Buffer
- Full Unicode support maintained
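
To make the byte-versus-character distinction concrete, the following sketch compares where the same substring sits in a JavaScript string and in its UTF-8 encoding. It deliberately uses only Node's built-in `Buffer` (no RE2 call is involved), and the sample text is an arbitrary illustration.

```javascript
// Sketch: byte offsets vs. character offsets for the same text.
// Uses only Node's built-in Buffer; the sample text is illustrative.
const text = "世界 42";
const buffer = Buffer.from(text, "utf8");

// In the string, "42" starts after 3 UTF-16 code units: 世, 界, space.
const charOffset = text.indexOf("42");

// In the Buffer, each CJK character takes 3 bytes, so "42" starts at byte 7.
const byteOffset = buffer.indexOf("42", 0, "utf8");

console.log(charOffset);    // 3
console.log(byteOffset);    // 7
console.log(text.length);   // 5
console.log(buffer.length); // 9
```

Offsets reported for Buffer inputs follow the second convention, so they can be passed to `Buffer` methods directly but not to string methods.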

### Buffer Method Signatures

All core RE2 methods accept Buffer inputs. Methods that return matched text return Buffers instead of strings:

```javascript { .api }
/**
 * Buffer-compatible method signatures
 */
regex.exec(buffer: Buffer): RE2BufferExecArray | null;
regex.test(buffer: Buffer): boolean;
regex.match(buffer: Buffer): RE2BufferMatchArray | null;
regex.search(buffer: Buffer): number;
// replacement may also be a function that returns a string or Buffer
regex.replace(buffer: Buffer, replacement: string | Buffer): Buffer;
regex.split(buffer: Buffer, limit?: number): Buffer[];
```

### Buffer Result Types

```javascript { .api }
/**
 * Buffer-specific result interfaces
 */
interface RE2BufferExecArray extends Array<Buffer> {
  index: number;         // Match start position in bytes
  input: Buffer;         // Original Buffer input
  groups?: {             // Named groups as Buffers
    [key: string]: Buffer;
  };
}

interface RE2BufferMatchArray extends Array<Buffer> {
  index?: number;        // Match position in bytes (undefined for global)
  input?: Buffer;        // Original input (undefined for global)
  groups?: {             // Named groups as Buffers
    [key: string]: Buffer;
  };
}
```
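
Because every element and named group in these results is a Buffer, code that ultimately wants strings tends to repeat `.toString()` calls. The sketch below shows one possible way to decode a whole result at once. `decodeMatch` is a hypothetical helper, not part of RE2, and the sample object is hand-built to mimic the `RE2BufferExecArray` shape above rather than produced by an actual RE2 call.

```javascript
// Sketch: convert a Buffer-based match result into plain strings.
// `decodeMatch` is a hypothetical helper, not part of RE2.
function decodeMatch(match) {
  if (match === null) return null;
  const groups = {};
  for (const [name, value] of Object.entries(match.groups ?? {})) {
    // Guard in case a group did not participate in the match.
    groups[name] = Buffer.isBuffer(value) ? value.toString("utf8") : value;
  }
  return {
    captures: Array.from(match, (part) =>
      Buffer.isBuffer(part) ? part.toString("utf8") : part
    ),
    byteIndex: match.index, // still a byte offset, not a character offset
    groups,
  };
}

// Hand-built stand-in shaped like RE2BufferExecArray (not produced by RE2).
const sample = Object.assign([Buffer.from("id=7"), Buffer.from("7")], {
  index: 4,
  input: Buffer.from("foo id=7"),
  groups: { num: Buffer.from("7") },
});

console.log(decodeMatch(sample));
```

The helper keeps the offset under the name `byteIndex` so that callers are less likely to mistake it for a string index.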

### Buffer Usage Examples

**Basic Buffer Operations:**

```javascript
const RE2 = require("re2");

// Create Buffer with UTF-8 text
const buffer = Buffer.from("Hello 世界! Testing 123", "utf8");
const regex = new RE2("\\d+");

// Test with Buffer
console.log(regex.test(buffer)); // true

// Find match in Buffer
const match = regex.exec(buffer);
console.log(match[0].toString()); // "123"
// Each of the two CJK characters takes 3 bytes in UTF-8,
// so the byte position (22) differs from the character position (18)
console.log(match.index);         // 22 (byte position, not character position)

// Search in Buffer
const position = regex.search(buffer);
console.log(position); // 22 (byte position)
```

**Buffer Replacement:**

```javascript
const RE2 = require("re2");

// Replace text in Buffer
const sourceBuffer = Buffer.from("test 123 and 456", "utf8");
const numberRegex = new RE2("\\d+", "g");

// Replace with string (returns Buffer)
const replaced1 = numberRegex.replace(sourceBuffer, "XXX");
console.log(replaced1.toString()); // "test XXX and XXX"

// Replace with Buffer
const replacement = Buffer.from("NUM", "utf8");
const replaced2 = numberRegex.replace(sourceBuffer, replacement);
console.log(replaced2.toString()); // "test NUM and NUM"

// Replace with function
// By default the replacer receives string match arguments and a character offset;
// set replacer.useBuffers = true to receive Buffers and byte offsets instead
const replacer = (match, offset, input) => {
  const num = parseInt(match.toString(), 10);
  return Buffer.from(String(num * 2), "utf8");
};
const doubled = numberRegex.replace(sourceBuffer, replacer);
console.log(doubled.toString()); // "test 246 and 912"
```

**Buffer Splitting:**

```javascript
const RE2 = require("re2");

// Split Buffer by pattern
const data = Buffer.from("apple,banana,cherry", "utf8");
const commaRegex = new RE2(",");

const parts = commaRegex.split(data);
console.log(parts.length); // 3
console.log(parts[0].toString()); // "apple"
console.log(parts[1].toString()); // "banana"
console.log(parts[2].toString()); // "cherry"

// Each part is a Buffer
console.log(Buffer.isBuffer(parts[0])); // true
```

### Named Groups with Buffers

Named capture groups work seamlessly with Buffers:

```javascript
const RE2 = require("re2");

// Named groups in Buffer matching
const emailRegex = new RE2("(?<user>\\w+)@(?<domain>\\w+\\.\\w+)");
const emailBuffer = Buffer.from("Contact: user@example.com", "utf8");

const match = emailRegex.exec(emailBuffer);
console.log(match.groups.user.toString());   // "user"
console.log(match.groups.domain.toString()); // "example.com"

// Groups are also Buffers
console.log(Buffer.isBuffer(match.groups.user)); // true
```

### UTF-8 Length Utilities

RE2 provides utility methods for calculating UTF-8 and UTF-16 lengths:

```javascript { .api }
/**
 * Calculate UTF-8 byte length needed for UTF-16 string
 * @param str - UTF-16 string
 * @returns Number of bytes needed for UTF-8 encoding
 */
RE2.getUtf8Length(str: string): number;

/**
 * Calculate UTF-16 character length for UTF-8 Buffer
 * @param buffer - UTF-8 encoded Buffer
 * @returns Number of characters in UTF-16, or -1 on error
 */
RE2.getUtf16Length(buffer: Buffer): number;
```

**Usage Examples:**

```javascript
const RE2 = require("re2");

// Calculate UTF-8 length for string
const text = "Hello 世界!";
const utf8Length = RE2.getUtf8Length(text);
console.log(utf8Length); // 13 (bytes needed for UTF-8)
console.log(text.length); // 9 (UTF-16 characters)

// Verify with actual Buffer
const buffer = Buffer.from(text, "utf8");
console.log(buffer.length); // 13 (matches calculated length)

// Calculate UTF-16 length for Buffer
const utf16Length = RE2.getUtf16Length(buffer);
console.log(utf16Length); // 9 (UTF-16 characters)

// Error handling
const invalidBuffer = Buffer.from([0xff, 0xfe, 0xfd]); // Invalid UTF-8
const errorResult = RE2.getUtf16Length(invalidBuffer);
console.log(errorResult); // -1 (indicates error)
```
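
A related task is translating a byte offset from a Buffer match into a string index. The sketch below shows one way to do that with Node's built-in `Buffer` alone, so it runs without RE2 installed. `byteToCharOffset` is a hypothetical helper, and it assumes the offset falls on a character boundary, which holds for match positions in valid UTF-8.

```javascript
// Sketch: translate a byte offset in a UTF-8 Buffer to a UTF-16 string index.
// `byteToCharOffset` is a hypothetical helper built on Node's Buffer only.
function byteToCharOffset(buffer, byteOffset) {
  // Decoding the prefix and measuring it gives the UTF-16 length
  // of everything before the offset.
  return buffer.subarray(0, byteOffset).toString("utf8").length;
}

const text = "Hello 世界! Testing 123";
const buffer = Buffer.from(text, "utf8");
const byteOffset = buffer.indexOf("123");
const charOffset = byteToCharOffset(buffer, byteOffset);

console.log(byteOffset);          // 22
console.log(charOffset);          // 18
console.log(text.indexOf("123")); // 18
```

Going by the description above, applying `RE2.getUtf16Length` to the same prefix should give the same count without building an intermediate string, though that variant is not exercised here.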

### Buffer Performance Considerations

**Advantages:**
- No UTF-8 ↔ UTF-16 conversion overhead
- Direct binary data processing
- Memory efficient for large text files
- Preserves exact byte boundaries

**Considerations:**
- Positions and lengths are in bytes, not characters
- Requires UTF-8 encoded input
- Results need `.toString()` for string operations
- More complex when mixing with string operations

**Best Practices:**

```javascript
const RE2 = require("re2");
const fs = require("fs");

// Efficient large file processing
async function processLogFile(filename) {
  // Without an encoding argument, readFile resolves to a Buffer
  const buffer = await fs.promises.readFile(filename);
  const errorRegex = new RE2("ERROR:\\s*(.*)", "g");

  const errors = [];
  let match;
  while ((match = errorRegex.exec(buffer)) !== null) {
    errors.push({
      message: match[1].toString(),
      position: match.index,
      // Byte offsets: the window may begin or end inside a multi-byte character
      context: buffer.subarray(
        Math.max(0, match.index - 50),
        match.index + match[0].length + 50
      ).toString()
    });
  }

  return errors;
}

// Mixed string/Buffer operations
function processWithContext(text) {
  // Use string for simple operations
  const regex = new RE2("\\w+@\\w+\\.\\w+", "g");
  const emails = text.match(regex);

  // Use Buffer for binary operations if needed
  if (emails && emails.length > 0) {
    const buffer = Buffer.from(text, "utf8");
    const firstEmailPos = regex.search(buffer);

    return {
      emails,
      firstEmailBytePosition: firstEmailPos
    };
  }

  return { emails: [], firstEmailBytePosition: -1 };
}
```
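
Since positions are byte offsets, a fixed-width context window around a match can begin or end in the middle of a multi-byte character, which decodes to replacement characters. The sketch below shows one possible way to snap the window to character boundaries. `safeContext` and `isContinuationByte` are hypothetical helpers that use only Node's built-in `Buffer`, and the log line is made up for illustration.

```javascript
// Sketch: take a context window around a byte range without cutting
// a multi-byte UTF-8 character in half. Both helpers are hypothetical.
function isContinuationByte(byte) {
  // UTF-8 continuation bytes have the bit pattern 10xxxxxx.
  return (byte & 0xc0) === 0x80;
}

function safeContext(buffer, start, end, padding) {
  let from = Math.max(0, start - padding);
  let to = Math.min(buffer.length, end + padding);
  // Move both edges forward until they land on a character boundary:
  // the left edge drops a partial character, the right edge completes one.
  while (from < start && isContinuationByte(buffer[from])) from++;
  while (to < buffer.length && isContinuationByte(buffer[to])) to++;
  return buffer.subarray(from, to).toString("utf8");
}

const log = Buffer.from("世界 ERROR: disk full 世界", "utf8");
const start = log.indexOf("ERROR");
const end = start + Buffer.byteLength("ERROR: disk full");

console.log(safeContext(log, start, end, 2)); // " ERROR: disk full 世"
```

Here `start` and `end` stand in for `match.index` and `match.index + match[0].length` from a Buffer match.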

### Binary Data Patterns

RE2 can process Buffers containing binary data with text patterns. The binary bytes in this example are all below 0x80, so the Buffer as a whole is still valid UTF-8:

```javascript
const RE2 = require("re2");

// Create Buffer with mixed binary and text data
const binaryData = Buffer.concat([
  Buffer.from([0x00, 0x01, 0x02]),    // Binary header
  Buffer.from("START", "utf8"),       // Text marker
  Buffer.from([0x03, 0x04]),          // More binary data
  Buffer.from("Hello World", "utf8"), // Text content
  Buffer.from([0x05, 0x06, 0x07])     // Binary footer
]);

// Find text patterns in binary data
const textRegex = new RE2("[A-Z]+");
const textMatch = textRegex.exec(binaryData);
console.log(textMatch[0].toString()); // "START"
console.log(textMatch.index);         // 3 (after binary header)

// Extract all text from binary data
const wordRegex = new RE2("[a-zA-Z]+", "g");
const words = [];
let match;
while ((match = wordRegex.exec(binaryData)) !== null) {
  words.push(match[0].toString());
}
console.log(words); // ["START", "Hello", "World"]
```
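
Because Buffer inputs are expected to be UTF-8 encoded, it can be worth checking framed or binary data before matching against it. The sketch below shows one way to do that with the `TextDecoder` global available in Node.js. `isValidUtf8` is a hypothetical helper, not an RE2 API, and how RE2 itself treats invalid sequences is not covered here.

```javascript
// Sketch: check that a Buffer is valid UTF-8 before pattern matching.
// `isValidUtf8` is a hypothetical helper; TextDecoder is a Node.js global.
function isValidUtf8(buffer) {
  try {
    // With fatal: true, decode() throws a TypeError on malformed input.
    new TextDecoder("utf-8", { fatal: true }).decode(buffer);
    return true;
  } catch {
    return false;
  }
}

const framed = Buffer.concat([
  Buffer.from([0x00, 0x01, 0x02]), // control bytes are valid single-byte UTF-8
  Buffer.from("START", "utf8"),
]);

console.log(isValidUtf8(framed));                          // true
console.log(isValidUtf8(Buffer.from([0xff, 0xfe, 0xfd]))); // false
```

The check decodes the whole Buffer, so for very large inputs it gives back some of the conversion cost that Buffer processing is meant to avoid.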
