Bindings for RE2: fast, safe alternative to backtracking regular expression engines.
Direct Buffer processing for efficient text operations without string conversion overhead.
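RE2 provides native support for Node.js Buffers, allowing direct processing of UTF-8 encoded binary data without conversion to JavaScript strings.
Key Characteristics: when the input is a Buffer, matches, captured groups, and split parts are returned as Buffers, and positions (index, the result of search) are byte offsets rather than character offsets.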
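Because offsets reported for Buffer input are byte positions, mapping one back to a character index in the decoded string takes an extra step. The following is a minimal sketch using only Node.js built-ins. `byteOffsetToCharIndex` is a hypothetical helper, not part of re2, and the byte offset is located with `Buffer.indexOf` purely for illustration. It assumes the offset falls on a character boundary, as the start of a match would.

```javascript
// Hypothetical helper (not part of re2): convert a byte offset in a UTF-8
// Buffer into a UTF-16 character index in the decoded string.
function byteOffsetToCharIndex(buffer, byteOffset) {
  // Decode only the bytes before the offset and count UTF-16 code units.
  return buffer.subarray(0, byteOffset).toString("utf8").length;
}

const sampleText = "Hello 世界! Testing 123";
const sampleBuffer = Buffer.from(sampleText, "utf8");

// Byte offset where "123" starts, found with a built-in for illustration.
const byteOffset = sampleBuffer.indexOf("123");
const charIndex = byteOffsetToCharIndex(sampleBuffer, byteOffset);

console.log(byteOffset);                // 22 (each CJK character takes 3 bytes)
console.log(charIndex);                 // 18
console.log(sampleText.indexOf("123")); // 18
```

Decoding the prefix costs time proportional to the offset, so for many lookups on a large Buffer it may be cheaper to convert once and work with strings.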
All core RE2 methods accept Buffer inputs and return appropriate Buffer results:
/**
* Buffer-compatible method signatures
*/
regex.exec(buffer: Buffer): RE2BufferExecArray | null;
regex.test(buffer: Buffer): boolean;
regex.match(buffer: Buffer): RE2BufferMatchArray | null;
regex.search(buffer: Buffer): number;
regex.replace(buffer: Buffer, replacement: string | Buffer): Buffer;
regex.split(buffer: Buffer, limit?: number): Buffer[];
/**
* Buffer-specific result interfaces
*/
interface RE2BufferExecArray extends Array<Buffer> {
  index: number; // Match start position in bytes
  input: Buffer; // Original Buffer input
  groups?: { // Named groups as Buffers
    [key: string]: Buffer;
  };
}
interface RE2BufferMatchArray extends Array<Buffer> {
  index?: number; // Match position in bytes (undefined for global)
  input?: Buffer; // Original input (undefined for global)
  groups?: { // Named groups as Buffers
    [key: string]: Buffer;
  };
}
Basic Buffer Operations:
const RE2 = require("re2");
// Create Buffer with UTF-8 text
const buffer = Buffer.from("Hello 世界! Testing 123", "utf8");
const regex = new RE2("\\d+");
// Test with Buffer
console.log(regex.test(buffer)); // true
// Find match in Buffer
const match = regex.exec(buffer);
console.log(match[0].toString()); // "123"
console.log(match.index); // 22 (byte position; the character position is 18)
// Search in Buffer
const position = regex.search(buffer);
console.log(position); // 22 (byte position)
Buffer Replacement:
const RE2 = require("re2");
// Replace text in Buffer
const sourceBuffer = Buffer.from("test 123 and 456", "utf8");
const numberRegex = new RE2("\\d+", "g");
// Replace with string (returns Buffer)
const replaced1 = numberRegex.replace(sourceBuffer, "XXX");
console.log(replaced1.toString()); // "test XXX and XXX"
// Replace with Buffer
const replacement = Buffer.from("NUM", "utf8");
const replaced2 = numberRegex.replace(sourceBuffer, replacement);
console.log(replaced2.toString()); // "test NUM and NUM"
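// Replace with function
// By default the replacer receives strings even for Buffer input; setting
// replacer.useBuffers = true makes it receive Buffers and byte offsets.
// Calling toString() on the match works in either case.
const replacer = (match, offset, input) => {
  const num = parseInt(match.toString(), 10);
  return Buffer.from(String(num * 2), "utf8");
};
const doubled = numberRegex.replace(sourceBuffer, replacer);
console.log(doubled.toString()); // "test 246 and 912"
Buffer Splitting: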
const RE2 = require("re2");
// Split Buffer by pattern
const data = Buffer.from("apple,banana,cherry", "utf8");
const commaRegex = new RE2(",");
const parts = commaRegex.split(data);
console.log(parts.length); // 3
console.log(parts[0].toString()); // "apple"
console.log(parts[1].toString()); // "banana"
console.log(parts[2].toString()); // "cherry"
// Each part is a Buffer
console.log(Buffer.isBuffer(parts[0])); // true
Named capture groups work seamlessly with Buffers:
const RE2 = require("re2");
// Named groups in Buffer matching
const emailRegex = new RE2("(?<user>\\w+)@(?<domain>\\w+\\.\\w+)");
const emailBuffer = Buffer.from("Contact: user@example.com", "utf8");
const match = emailRegex.exec(emailBuffer);
console.log(match.groups.user.toString()); // "user"
console.log(match.groups.domain.toString()); // "example.com"
// Groups are also Buffers
console.log(Buffer.isBuffer(match.groups.user)); // true
RE2 provides utility methods for calculating UTF-8 and UTF-16 lengths:
/**
* Calculate UTF-8 byte length needed for UTF-16 string
* @param str - UTF-16 string
* @returns Number of bytes needed for UTF-8 encoding
*/
RE2.getUtf8Length(str: string): number;
/**
* Calculate UTF-16 character length for UTF-8 Buffer
* @param buffer - UTF-8 encoded Buffer
* @returns Number of characters in UTF-16, or -1 on error
*/
RE2.getUtf16Length(buffer: Buffer): number;
Usage Examples:
const RE2 = require("re2");
// Calculate UTF-8 length for string
const text = "Hello 世界!";
const utf8Length = RE2.getUtf8Length(text);
console.log(utf8Length); // 13 (bytes needed for UTF-8)
console.log(text.length); // 9 (UTF-16 characters)
// Verify with actual Buffer
const buffer = Buffer.from(text, "utf8");
console.log(buffer.length); // 13 (matches calculated length)
// Calculate UTF-16 length for Buffer
const utf16Length = RE2.getUtf16Length(buffer);
console.log(utf16Length); // 9 (UTF-16 characters)
// Error handling
const invalidBuffer = Buffer.from([0xff, 0xfe, 0xfd]); // Invalid UTF-8
const errorResult = RE2.getUtf16Length(invalidBuffer);
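console.log(errorResult); // -1 (indicates error)
Advantages: Buffers are matched directly, so there is no overhead from converting to and from JavaScript strings.
Considerations: positions and lengths are measured in bytes rather than characters, and results are Buffers, so call .toString() for string operations.
Best Practices: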
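One practice worth sketching: because Buffer offsets are in bytes, a fixed-width context window around a match can cut a multi-byte UTF-8 character in half, and the cut bytes decode to U+FFFD replacement characters. The following is a minimal sketch using only Node.js built-ins. `isContinuationByte` and `sliceOnCharBoundaries` are hypothetical helpers, not part of re2, and the sketch assumes the Buffer holds valid UTF-8.

```javascript
// Hypothetical helpers (not part of re2).

// A UTF-8 continuation byte has the bit pattern 10xxxxxx.
function isContinuationByte(byte) {
  return (byte & 0xc0) === 0x80;
}

// Return the bytes in [start, end), widened so that neither bound falls
// in the middle of a multi-byte character.
function sliceOnCharBoundaries(buffer, start, end) {
  let s = Math.max(0, start);
  let e = Math.min(buffer.length, end);
  // Move the start backward to the first byte of its character.
  while (s > 0 && isContinuationByte(buffer[s])) s--;
  // Move the end forward past any trailing continuation bytes.
  while (e < buffer.length && isContinuationByte(buffer[e])) e++;
  return buffer.subarray(s, e);
}

const sample = Buffer.from("世界 ERROR: disk full 世界", "utf8");

// A plain byte slice starting at byte 1 cuts the first character in half.
const naive = sample.subarray(1, 28).toString();
const safe = sliceOnCharBoundaries(sample, 1, 28).toString();

console.log(naive.includes("\ufffd")); // true
console.log(safe); // "世界 ERROR: disk full 世界"
```

A helper like this could stand in for a plain byte slice when extracting context around `match.index`.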
const RE2 = require("re2");
const fs = require("fs");
// Efficient large file processing
async function processLogFile(filename) {
  const buffer = await fs.promises.readFile(filename);
  const errorRegex = new RE2("ERROR:\\s*(.*)", "g");
  const errors = [];
  let match;
  while ((match = errorRegex.exec(buffer)) !== null) {
    errors.push({
      message: match[1].toString(),
      position: match.index, // byte offset into the file
      // Up to 50 bytes of context on each side (subarray replaces the deprecated slice)
      context: buffer.subarray(
        Math.max(0, match.index - 50),
        match.index + match[0].length + 50
      ).toString()
    });
  }
  return errors;
}
// Mixed string/Buffer operations
function processWithContext(text) {
  // Use string for simple operations
  const regex = new RE2("\\w+@\\w+\\.\\w+", "g");
  const emails = text.match(regex);
  // Use Buffer for binary operations if needed
  if (emails && emails.length > 0) {
    const buffer = Buffer.from(text, "utf8");
    const firstEmailPos = regex.search(buffer);
    return {
      emails,
      firstEmailBytePosition: firstEmailPos
    };
  }
  return { emails: [], firstEmailBytePosition: -1 };
}
RE2 can process Buffers containing binary data with text patterns:
const RE2 = require("re2");
// Create Buffer with mixed binary and text data
const binaryData = Buffer.concat([
  Buffer.from([0x00, 0x01, 0x02]), // Binary header
  Buffer.from("START", "utf8"), // Text marker
  Buffer.from([0x03, 0x04]), // More binary data
  Buffer.from("Hello World", "utf8"), // Text content
  Buffer.from([0x05, 0x06, 0x07]) // Binary footer
]);
// Find text patterns in binary data
const textRegex = new RE2("[A-Z]+");
const textMatch = textRegex.exec(binaryData);
console.log(textMatch[0].toString()); // "START"
console.log(textMatch.index); // 3 (after binary header)
// Extract all text from binary data
const wordRegex = new RE2("[a-zA-Z]+", "g");
const words = [];
let match;
while ((match = wordRegex.exec(binaryData)) !== null) {
  words.push(match[0].toString());
}
console.log(words); // ["START", "Hello", "World"]
Install with Tessl CLI
npx tessl i tessl/npm-re2