# Buffer Support

Direct Buffer processing for efficient text operations without string conversion overhead.

## Capabilities

### Buffer Processing Overview

RE2 provides native support for Node.js Buffers, allowing direct processing of UTF-8 encoded data without conversion to JavaScript strings. This is particularly useful for:

- Processing large text files efficiently
- Working with binary protocols containing text patterns
- Avoiding UTF-8 ↔ UTF-16 conversion overhead
- Handling text data that may contain null bytes

**Key Characteristics:**
- All Buffer inputs must be UTF-8 encoded
- Positions and lengths are in bytes, not characters
- Matched text is returned as Buffers when the input is a Buffer
- Full Unicode support is maintained

### Buffer Method Signatures

All core RE2 methods accept Buffer inputs. Methods that return matched text return Buffers instead of strings:

```javascript { .api }
/**
 * Buffer-compatible method signatures
 */
regex.exec(buffer: Buffer): RE2BufferExecArray | null;
regex.test(buffer: Buffer): boolean;
regex.match(buffer: Buffer): RE2BufferMatchArray | null;
regex.search(buffer: Buffer): number;
// replacement may also be a replacer function (see "Buffer Replacement")
regex.replace(buffer: Buffer, replacement: string | Buffer | Function): Buffer;
regex.split(buffer: Buffer, limit?: number): Buffer[];
```
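
Because the Buffer signatures report byte offsets while the string signatures report character offsets, it can help to normalize inputs once at the edge of a pipeline. The following is a minimal sketch using only Node.js built-ins; `toUtf8Buffer` is a hypothetical helper for illustration, not part of RE2.

```javascript
// Hypothetical helper (not part of RE2): coerce supported inputs to a UTF-8 Buffer
function toUtf8Buffer(input) {
  if (Buffer.isBuffer(input)) return input; // already a Buffer, reuse as-is
  if (input instanceof Uint8Array) {
    // Wrap the same memory without copying
    return Buffer.from(input.buffer, input.byteOffset, input.byteLength);
  }
  if (typeof input === "string") return Buffer.from(input, "utf8");
  throw new TypeError("Expected a string, Buffer, or Uint8Array");
}

// "na\u00efve" has 5 characters but 6 UTF-8 bytes
console.log(toUtf8Buffer("na\u00efve").length); // 6
```

The resulting Buffer can then be passed to any of the Buffer signatures listed above, so every offset downstream is a byte offset.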

### Buffer Result Types

```javascript { .api }
/**
 * Buffer-specific result interfaces
 */
interface RE2BufferExecArray extends Array<Buffer> {
  index: number;        // Match start position in bytes
  input: Buffer;        // Original Buffer input
  groups?: {            // Named groups as Buffers
    [key: string]: Buffer;
  };
}

interface RE2BufferMatchArray extends Array<Buffer> {
  index?: number;       // Match position in bytes (undefined for global)
  input?: Buffer;       // Original input (undefined for global)
  groups?: {            // Named groups as Buffers
    [key: string]: Buffer;
  };
}
```
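
When downstream code expects strings, a result shaped like `RE2BufferExecArray` can be converted in one place. The sketch below assumes only the shape described above; `matchToStrings` is a hypothetical helper, and the match object is built by hand so the example runs without RE2 installed.

```javascript
// Hypothetical helper: convert a Buffer-based match result into plain strings
function matchToStrings(match) {
  if (match === null) return null;
  const toStr = (value) =>
    Buffer.isBuffer(value) ? value.toString("utf8") : value; // leave undefined as-is
  const groups = match.groups
    ? Object.fromEntries(
        Object.entries(match.groups).map(([name, value]) => [name, toStr(value)])
      )
    : undefined;
  return {
    captures: Array.from(match, toStr),
    byteIndex: match.index, // still a byte offset into the original Buffer
    groups,
  };
}

// Stand-in for an exec() result, assembled manually for illustration
const fakeMatch = Object.assign(
  [Buffer.from("user@example.com"), Buffer.from("user"), Buffer.from("example.com")],
  {
    index: 9,
    input: Buffer.from("Contact: user@example.com"),
    groups: { user: Buffer.from("user"), domain: Buffer.from("example.com") },
  }
);

console.log(matchToStrings(fakeMatch).captures);
// [ 'user@example.com', 'user', 'example.com' ]
```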

### Buffer Usage Examples

**Basic Buffer Operations:**

```javascript
const RE2 = require("re2");

// Create Buffer with UTF-8 text
const buffer = Buffer.from("Hello 世界! Testing 123", "utf8");
const regex = new RE2("\\d+");

// Test with Buffer
console.log(regex.test(buffer)); // true

// Find match in Buffer
const match = regex.exec(buffer);
console.log(match[0].toString()); // "123"
// Byte position: "世界" occupies 6 bytes, so this is 22, not the character position 18
console.log(match.index); // 22

// Search in Buffer
const position = regex.search(buffer);
console.log(position); // 22 (byte position)
```
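
The difference between byte and character offsets can be checked with Node.js built-ins alone. This is a small sketch, not RE2 code: `byteToCharOffset` is a hypothetical helper, and it assumes the byte offset falls on a character boundary, which holds for match positions.

```javascript
const text = "Hello 世界! Testing 123";
const buffer = Buffer.from(text, "utf8");

// Buffer.indexOf reports a byte offset; String.indexOf reports a UTF-16 offset
const byteOffset = buffer.indexOf("123");
const charOffset = text.indexOf("123");
console.log(byteOffset); // 22
console.log(charOffset); // 18

// Hypothetical helper: translate a byte offset into a UTF-16 offset
// by decoding only the bytes that precede it
function byteToCharOffset(buf, byteIndex) {
  return buf.toString("utf8", 0, byteIndex).length;
}

console.log(byteToCharOffset(buffer, byteOffset)); // 18
```

Decoding the prefix costs time proportional to the offset, so it is best reserved for the few positions that must be shown to a user.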

**Buffer Replacement:**

```javascript
const RE2 = require("re2");

// Replace text in Buffer
const sourceBuffer = Buffer.from("test 123 and 456", "utf8");
const numberRegex = new RE2("\\d+", "g");

// Replace with string (returns Buffer)
const replaced1 = numberRegex.replace(sourceBuffer, "XXX");
console.log(replaced1.toString()); // "test XXX and XXX"

// Replace with Buffer
const replacement = Buffer.from("NUM", "utf8");
const replaced2 = numberRegex.replace(sourceBuffer, replacement);
console.log(replaced2.toString()); // "test NUM and NUM"

// Replace with function
const replacer = (match, offset, input) => {
  // toString() is safe whether match arrives as a string or a Buffer
  const num = parseInt(match.toString(), 10);
  return Buffer.from(String(num * 2), "utf8");
};
const doubled = numberRegex.replace(sourceBuffer, replacer);
console.log(doubled.toString()); // "test 246 and 912"
```

**Buffer Splitting:**

```javascript
const RE2 = require("re2");

// Split Buffer by pattern
const data = Buffer.from("apple,banana,cherry", "utf8");
const commaRegex = new RE2(",");

const parts = commaRegex.split(data);
console.log(parts.length); // 3
console.log(parts[0].toString()); // "apple"
console.log(parts[1].toString()); // "banana"
console.log(parts[2].toString()); // "cherry"

// Each part is a Buffer
console.log(Buffer.isBuffer(parts[0])); // true
```

### Named Groups with Buffers

Named capture groups work seamlessly with Buffers:

```javascript
const RE2 = require("re2");

// Named groups in Buffer matching
const emailRegex = new RE2("(?<user>\\w+)@(?<domain>\\w+\\.\\w+)");
const emailBuffer = Buffer.from("Contact: user@example.com", "utf8");

const match = emailRegex.exec(emailBuffer);
console.log(match.groups.user.toString()); // "user"
console.log(match.groups.domain.toString()); // "example.com"

// Groups are also Buffers
console.log(Buffer.isBuffer(match.groups.user)); // true
```

### UTF-8 Length Utilities

RE2 provides utility methods for calculating UTF-8 and UTF-16 lengths:

```javascript { .api }
/**
 * Calculate UTF-8 byte length needed for UTF-16 string
 * @param str - UTF-16 string
 * @returns Number of bytes needed for UTF-8 encoding
 */
RE2.getUtf8Length(str: string): number;

/**
 * Calculate UTF-16 character length for UTF-8 Buffer
 * @param buffer - UTF-8 encoded Buffer
 * @returns Number of characters in UTF-16, or -1 on error
 */
RE2.getUtf16Length(buffer: Buffer): number;
```

**Usage Examples:**

```javascript
const RE2 = require("re2");

// Calculate UTF-8 length for string
const text = "Hello 世界!";
const utf8Length = RE2.getUtf8Length(text);
console.log(utf8Length); // 13 (bytes needed for UTF-8)
console.log(text.length); // 9 (UTF-16 characters)

// Verify with actual Buffer
const buffer = Buffer.from(text, "utf8");
console.log(buffer.length); // 13 (matches calculated length)

// Calculate UTF-16 length for Buffer
const utf16Length = RE2.getUtf16Length(buffer);
console.log(utf16Length); // 9 (UTF-16 characters)

// Error handling
const invalidBuffer = Buffer.from([0xff, 0xfe, 0xfd]); // Invalid UTF-8
const errorResult = RE2.getUtf16Length(invalidBuffer);
console.log(errorResult); // -1 (indicates error)
```
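
The same lengths can be cross-checked with Node.js built-ins, which is handy in tests. This sketch does not call RE2 at all. It relies only on `Buffer.byteLength` and `String.prototype.length`, and it assumes that "UTF-16 characters" above means UTF-16 code units, which is what `length` counts.

```javascript
const text = "Hello 世界!";

// UTF-8 byte count without allocating a Buffer
console.log(Buffer.byteLength(text, "utf8")); // 13

// UTF-16 code unit count after decoding a Buffer
const buffer = Buffer.from(text, "utf8");
console.log(buffer.toString("utf8").length); // 9

// Characters outside the Basic Multilingual Plane take 4 UTF-8 bytes
// and 2 UTF-16 code units
const emoji = "\u{1F600}";
console.log(Buffer.byteLength(emoji, "utf8")); // 4
console.log(emoji.length); // 2
```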

### Buffer Performance Considerations

**Advantages:**
- No UTF-8 ↔ UTF-16 conversion overhead
- Direct binary data processing
- Memory efficient for large text files
- Preserves exact byte boundaries

**Considerations:**
- Positions and lengths are in bytes, not characters
- Requires UTF-8 encoded input
- Results need `.toString()` for string operations
- More complex when mixing with string operations

**Best Practices:**

```javascript
const RE2 = require("re2");
const fs = require("fs");

// Efficient large file processing
async function processLogFile(filename) {
  // No encoding argument, so readFile resolves to a Buffer
  const buffer = await fs.promises.readFile(filename);
  const errorRegex = new RE2("ERROR:\\s*(.*)", "g");

  const errors = [];
  let match;
  while ((match = errorRegex.exec(buffer)) !== null) {
    errors.push({
      message: match[1].toString(),
      position: match.index,
      // subarray replaces the deprecated Buffer#slice; offsets are in bytes
      context: buffer.subarray(
        Math.max(0, match.index - 50),
        match.index + match[0].length + 50
      ).toString()
    });
  }

  return errors;
}

// Mixed string/Buffer operations
function processWithContext(text) {
  // Use string for simple operations
  const regex = new RE2("\\w+@\\w+\\.\\w+", "g");
  const emails = text.match(regex);

  // Use Buffer for binary operations if needed
  if (emails && emails.length > 0) {
    const buffer = Buffer.from(text, "utf8");
    const firstEmailPos = regex.search(buffer);

    return {
      emails,
      firstEmailBytePosition: firstEmailPos
    };
  }

  return { emails: [], firstEmailBytePosition: -1 };
}
```
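
One caveat with fixed-width byte windows such as the 50-byte context above: a window edge can land in the middle of a multi-byte character, which decodes to a replacement character. The sketch below shows one way to snap window edges to character boundaries using the UTF-8 rule that continuation bytes have the bit pattern `10xxxxxx`. Both `alignToCharStart` and `contextAround` are hypothetical helpers, not RE2 APIs.

```javascript
// Hypothetical helper: step back until pos no longer points at a
// UTF-8 continuation byte (10xxxxxx)
function alignToCharStart(buf, pos) {
  let p = Math.min(Math.max(pos, 0), buf.length);
  while (p > 0 && p < buf.length && (buf[p] & 0xc0) === 0x80) p--;
  return p;
}

// Hypothetical helper: decode [start - radius, end + radius) without
// splitting a character at either edge
function contextAround(buf, start, end, radius) {
  const from = alignToCharStart(buf, start - radius);
  const to = alignToCharStart(buf, end + radius);
  return buf.subarray(from, to).toString("utf8");
}

const log = Buffer.from("日本語 ERROR: disk full 日本語", "utf8");
const start = log.indexOf("ERROR");
const end = start + Buffer.byteLength("ERROR: disk full");

console.log(contextAround(log, start, end, 2)); // "語 ERROR: disk full "
```

Both edges are moved backwards, so the window may grow slightly at the start and shrink slightly at the end, but it never contains a partial character.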

### Binary Data Patterns

RE2 can find text patterns in Buffers that mix text with binary bytes. The example below uses only byte values below 0x80, each of which is a valid single-byte UTF-8 sequence:

```javascript
const RE2 = require("re2");

// Create Buffer with mixed binary and text data
const binaryData = Buffer.concat([
  Buffer.from([0x00, 0x01, 0x02]),    // Binary header
  Buffer.from("START", "utf8"),       // Text marker
  Buffer.from([0x03, 0x04]),          // More binary data
  Buffer.from("Hello World", "utf8"), // Text content
  Buffer.from([0x05, 0x06, 0x07])     // Binary footer
]);

// Find text patterns in binary data
const textRegex = new RE2("[A-Z]+");
const textMatch = textRegex.exec(binaryData);
console.log(textMatch[0].toString()); // "START"
console.log(textMatch.index); // 3 (after binary header)

// Extract all text from binary data
const wordRegex = new RE2("[a-zA-Z]+", "g");
const words = [];
let match;
while ((match = wordRegex.exec(binaryData)) !== null) {
  words.push(match[0].toString());
}
console.log(words); // ["START", "Hello", "World"]
```
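
Since Buffer inputs are expected to be UTF-8, it may be worth checking untrusted binary data before matching. This page does not describe how invalid bytes are handled during matching, so the sketch below is a precaution rather than a documented requirement. It uses the standard `TextDecoder` with `fatal: true`, and `isValidUtf8` is a hypothetical helper.

```javascript
// Hypothetical helper: report whether a Buffer decodes cleanly as UTF-8
function isValidUtf8(buf) {
  try {
    // With fatal: true, decode() throws a TypeError on malformed input
    new TextDecoder("utf-8", { fatal: true }).decode(buf);
    return true;
  } catch {
    return false;
  }
}

console.log(isValidUtf8(Buffer.from([0x00, 0x01, 0x02, 0x41]))); // true
console.log(isValidUtf8(Buffer.from([0xff, 0xfe, 0xfd])));       // false
```

Control bytes below 0x80 pass the check, while stray bytes such as 0xff or truncated multi-byte sequences do not.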