tessl/npm-re2

Bindings for RE2: fast, safe alternative to backtracking regular expression engines.

Buffer Support

Direct Buffer processing for efficient text operations without string conversion overhead.

Capabilities

Buffer Processing Overview

RE2 provides native support for Node.js Buffers, allowing direct processing of UTF-8 encoded binary data without conversion to JavaScript strings. This is particularly useful for:

  • Processing large text files efficiently
  • Working with binary protocols containing text patterns
  • Avoiding UTF-8 ↔ UTF-16 conversion overhead
  • Handling text data that may contain null bytes

Key Characteristics:

  • All Buffer inputs must be UTF-8 encoded
  • Positions and lengths are in bytes, not characters
  • Results are returned as Buffers when input is Buffer
  • Full Unicode support maintained
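
Because positions are byte offsets, they do not line up with string indices once the text contains multi-byte characters. The sketch below uses only Node.js built-ins to show the difference; `byteOffsetToCharIndex` is a hypothetical helper, not part of re2, and it assumes the offset falls on a character boundary (as a match start does).

```javascript
// Hypothetical helper (not part of re2): map a byte offset in a UTF-8 Buffer
// to the corresponding index in the decoded JavaScript string.
function byteOffsetToCharIndex(buffer, byteOffset) {
  // Decode only the bytes before the offset; the decoded length is the
  // UTF-16 index. Assumes byteOffset sits on a character boundary.
  return buffer.subarray(0, byteOffset).toString("utf8").length;
}

const sample = Buffer.from("Hello 世界! Testing 123", "utf8");
const byteOffset = sample.indexOf("123"); // Buffer#indexOf returns a byte offset

console.log(byteOffset);                                // 22
console.log(byteOffsetToCharIndex(sample, byteOffset)); // 18
console.log(sample.toString("utf8").indexOf("123"));    // 18
```

The same helper can be applied to a `match.index` obtained from a Buffer input when a string index is needed.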

Buffer Method Signatures

The core RE2 methods accept Buffer inputs. Methods that return matched text return Buffers instead of strings:

/**
 * Buffer-compatible method signatures
 */
regex.exec(buffer: Buffer): RE2BufferExecArray | null;
regex.test(buffer: Buffer): boolean;
regex.match(buffer: Buffer): RE2BufferMatchArray | null;
regex.search(buffer: Buffer): number;
regex.replace(buffer: Buffer, replacement: string | Buffer | ((...args: any[]) => string | Buffer)): Buffer;
regex.split(buffer: Buffer, limit?: number): Buffer[];

Buffer Result Types

/**
 * Buffer-specific result interfaces
 */
interface RE2BufferExecArray extends Array<Buffer> {
  index: number;          // Match start position in bytes
  input: Buffer;         // Original Buffer input
  groups?: {             // Named groups as Buffers
    [key: string]: Buffer;
  };
}

interface RE2BufferMatchArray extends Array<Buffer> {
  index?: number;        // Match position in bytes (undefined for global)
  input?: Buffer;       // Original input (undefined for global)
  groups?: {            // Named groups as Buffers
    [key: string]: Buffer;
  };
}
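
When the rest of an application works with strings, it can help to decode a Buffer result once, at the boundary. The following is a minimal sketch: `decodeMatch` is a hypothetical helper, and `fakeMatch` is a hand-built stand-in that only mimics the `RE2BufferExecArray` shape described above, so the snippet runs without re2 installed.

```javascript
// Hypothetical helper (not part of re2): decode a Buffer-based match result
// into plain strings while keeping the byte offset.
function decodeMatch(match) {
  if (match === null) return null;
  // Leave non-Buffer values (e.g. undefined for unmatched groups) untouched.
  const toText = (value) => (Buffer.isBuffer(value) ? value.toString("utf8") : value);
  const groups = {};
  for (const [name, value] of Object.entries(match.groups || {})) {
    groups[name] = toText(value);
  }
  return {
    byteIndex: match.index,               // still a byte offset
    captures: Array.from(match, toText),  // [fullMatch, group1, ...]
    groups,
  };
}

// Hand-built stand-in with the RE2BufferExecArray shape (assumption for illustration).
const fakeMatch = Object.assign(
  [Buffer.from("user@example.com"), Buffer.from("user")],
  {
    index: 9,
    input: Buffer.from("Contact: user@example.com"),
    groups: { user: Buffer.from("user") },
  }
);

console.log(decodeMatch(fakeMatch));
```

Passing the result of `regex.exec(buffer)` instead of `fakeMatch` should work the same way, as long as the result has the shape shown in the interfaces.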

Buffer Usage Examples

Basic Buffer Operations:

const RE2 = require("re2");

// Create Buffer with UTF-8 text
const buffer = Buffer.from("Hello 世界! Testing 123", "utf8");
const regex = new RE2("\\d+");

// Test with Buffer
console.log(regex.test(buffer)); // true

// Find match in Buffer
const match = regex.exec(buffer);
console.log(match[0].toString()); // "123"
console.log(match.index);         // 22 (byte position; the character position is 18)

// Search in Buffer
const position = regex.search(buffer);
console.log(position); // 22 (byte position)

Buffer Replacement:

const RE2 = require("re2");

// Replace text in Buffer
const sourceBuffer = Buffer.from("test 123 and 456", "utf8");
const numberRegex = new RE2("\\d+", "g");

// Replace with string (returns Buffer)
const replaced1 = numberRegex.replace(sourceBuffer, "XXX");
console.log(replaced1.toString()); // "test XXX and XXX"

// Replace with Buffer
const replacement = Buffer.from("NUM", "utf8");
const replaced2 = numberRegex.replace(sourceBuffer, replacement);
console.log(replaced2.toString()); // "test NUM and NUM"

// Replace with function
const replacer = (match, offset, input) => {
  const num = parseInt(match.toString(), 10);
  return Buffer.from(String(num * 2), "utf8");
};
const doubled = numberRegex.replace(sourceBuffer, replacer);
console.log(doubled.toString()); // "test 246 and 912"

Buffer Splitting:

const RE2 = require("re2");

// Split Buffer by pattern
const data = Buffer.from("apple,banana,cherry", "utf8");
const commaRegex = new RE2(",");

const parts = commaRegex.split(data);
console.log(parts.length); // 3
console.log(parts[0].toString()); // "apple"
console.log(parts[1].toString()); // "banana"
console.log(parts[2].toString()); // "cherry"

// Each part is a Buffer
console.log(Buffer.isBuffer(parts[0])); // true
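
Since the parts are Buffers, comparing them to string literals with `===` never succeeds. The sketch below uses hand-built Buffers as stand-ins for the `split()` output and shows two ways to compare; `bufferEqualsText` is a hypothetical helper name.

```javascript
// Hypothetical helper (not part of re2): compare a Buffer to a string byte-wise.
function bufferEqualsText(buffer, text) {
  return buffer.equals(Buffer.from(text, "utf8"));
}

// Stand-ins for the Buffers returned by split() in the example above.
const fruitParts = [Buffer.from("apple"), Buffer.from("banana"), Buffer.from("cherry")];

console.log(fruitParts[0] === "apple");                  // false: a Buffer is never === a string
console.log(bufferEqualsText(fruitParts[0], "apple"));   // true: byte-wise comparison
console.log(fruitParts[0].toString("utf8") === "apple"); // true: compare after decoding

// Decode once if the parts are only needed as strings.
const fruitNames = fruitParts.map((part) => part.toString("utf8"));
console.log(fruitNames.includes("banana"));              // true
```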

Named Groups with Buffers

Named capture groups are also returned as Buffers when the input is a Buffer:

const RE2 = require("re2");

// Named groups in Buffer matching
const emailRegex = new RE2("(?<user>\\w+)@(?<domain>\\w+\\.\\w+)");
const emailBuffer = Buffer.from("Contact: user@example.com", "utf8");

const match = emailRegex.exec(emailBuffer);
console.log(match.groups.user.toString());   // "user"
console.log(match.groups.domain.toString()); // "example.com"

// Groups are also Buffers
console.log(Buffer.isBuffer(match.groups.user)); // true

UTF-8 Length Utilities

RE2 provides utility methods for calculating UTF-8 and UTF-16 lengths:

/**
 * Calculate UTF-8 byte length needed for UTF-16 string
 * @param str - UTF-16 string
 * @returns Number of bytes needed for UTF-8 encoding
 */
RE2.getUtf8Length(str: string): number;

/**
 * Calculate the length, in UTF-16 code units, of the string encoded in a UTF-8 Buffer
 * @param buffer - UTF-8 encoded Buffer
 * @returns Number of UTF-16 code units, or -1 on error
 */
RE2.getUtf16Length(buffer: Buffer): number;

Usage Examples:

const RE2 = require("re2");

// Calculate UTF-8 length for string
const text = "Hello 世界!";
const utf8Length = RE2.getUtf8Length(text);
console.log(utf8Length); // 13 (bytes needed for UTF-8)
console.log(text.length); // 9 (UTF-16 characters)

// Verify with actual Buffer
const buffer = Buffer.from(text, "utf8");
console.log(buffer.length); // 13 (matches calculated length)

// Calculate UTF-16 length for Buffer
const utf16Length = RE2.getUtf16Length(buffer);
console.log(utf16Length); // 9 (UTF-16 characters)

// Error handling
const invalidBuffer = Buffer.from([0xff, 0xfe, 0xfd]); // Invalid UTF-8
const errorResult = RE2.getUtf16Length(invalidBuffer);
console.log(errorResult); // -1 per the signature above; confirm how your installed version treats invalid bytes
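
For comparison, the same two quantities can be computed with Node.js built-ins. This is a sketch for cross-checking on well-formed input, not a description of how RE2 computes them; `utf8ByteLength` and `utf16LengthOf` are hypothetical wrapper names.

```javascript
// Built-in counterpart to a UTF-8 length calculation: bytes needed to encode a string.
function utf8ByteLength(str) {
  return Buffer.byteLength(str, "utf8");
}

// Built-in counterpart to a UTF-16 length calculation: decode, then count code units.
// Unlike a dedicated length function, this allocates the decoded string.
function utf16LengthOf(buffer) {
  return buffer.toString("utf8").length;
}

const greeting = "Hello 世界!";
console.log(utf8ByteLength(greeting));                    // 13
console.log(utf16LengthOf(Buffer.from(greeting, "utf8"))); // 9

// Characters outside the Basic Multilingual Plane take 4 bytes in UTF-8
// and 2 code units in UTF-16.
const emoji = "😀";
console.log(utf8ByteLength(emoji));                       // 4
console.log(utf16LengthOf(Buffer.from(emoji, "utf8")));   // 2
```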

Buffer Performance Considerations

Advantages:

  • No UTF-8 ↔ UTF-16 conversion overhead
  • Direct binary data processing
  • Memory efficient for large text files
  • Preserves exact byte boundaries

Considerations:

  • Positions and lengths are in bytes, not characters
  • Requires UTF-8 encoded input
  • Results need .toString() for string operations
  • More complex when mixing with string operations
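
Because input must be UTF-8, it can be worth validating Buffers from untrusted sources before matching. The sketch below relies on the standard `TextDecoder` with `fatal: true`, which is available as a global in Node.js; `isValidUtf8` is a hypothetical helper and not part of re2.

```javascript
// Hypothetical helper (not part of re2): check whether a Buffer holds well-formed UTF-8.
function isValidUtf8(buffer) {
  try {
    // With fatal: true, decode() throws a TypeError on malformed input.
    new TextDecoder("utf-8", { fatal: true }).decode(buffer);
    return true;
  } catch {
    return false;
  }
}

console.log(isValidUtf8(Buffer.from("Hello 世界", "utf8")));  // true
console.log(isValidUtf8(Buffer.from([0xff, 0xfe, 0xfd])));   // false

// Text stored in another encoding needs transcoding first.
const latin1Bytes = Buffer.from("café", "latin1");
console.log(isValidUtf8(latin1Bytes));                        // false
const asUtf8 = Buffer.from(latin1Bytes.toString("latin1"), "utf8");
console.log(isValidUtf8(asUtf8));                             // true
```

Validation decodes the whole Buffer, so for very large inputs it trades away some of the conversion savings listed under Advantages.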

Best Practices:

const RE2 = require("re2");
const fs = require("fs");

// Efficient large file processing
async function processLogFile(filename) {
  const buffer = await fs.promises.readFile(filename);
  const errorRegex = new RE2("ERROR:\\s*(.*)", "g");
  
  const errors = [];
  let match;
  while ((match = errorRegex.exec(buffer)) !== null) {
    errors.push({
      message: match[1].toString(),
      position: match.index,
      context: buffer.subarray( // subarray() replaces the deprecated Buffer#slice()
        Math.max(0, match.index - 50),
        match.index + match[0].length + 50
      ).toString()
    });
  }
  
  return errors;
}

// Mixed string/Buffer operations
function processWithContext(text) {
  // Use string for simple operations
  const regex = new RE2("\\w+@\\w+\\.\\w+", "g");
  const emails = text.match(regex);
  
  // Use Buffer for binary operations if needed
  if (emails && emails.length > 0) {
    const buffer = Buffer.from(text, "utf8");
    const firstEmailPos = regex.search(buffer);
    
    return {
      emails,
      firstEmailBytePosition: firstEmailPos
    };
  }
  
  return { emails: [], firstEmailBytePosition: -1 };
}
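
When reporting matches from a log file, a byte position is usually less helpful than a line number. The following sketch converts a byte offset into a 1-based line number using only Buffer methods; `lineNumberAt` is a hypothetical helper, and the sample log content is made up for illustration.

```javascript
// Hypothetical helper (not part of re2): 1-based line number of a byte offset.
function lineNumberAt(buffer, byteOffset) {
  let line = 1;
  let pos = buffer.indexOf(0x0a); // first "\n" byte
  while (pos !== -1 && pos < byteOffset) {
    line += 1;
    pos = buffer.indexOf(0x0a, pos + 1);
  }
  return line;
}

// Made-up log content; the second line contains multi-byte characters.
const sampleLog = Buffer.from("INFO: ok\nWARN: 世界\nERROR: disk full\n", "utf8");
const errorOffset = sampleLog.indexOf("ERROR:"); // stand-in for match.index

console.log(errorOffset);                          // 22
console.log(lineNumberAt(sampleLog, errorOffset)); // 3
```

Counting newline bytes works on raw UTF-8 because the byte 0x0a never appears inside a multi-byte sequence.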

Binary Data Patterns

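RE2 can search Buffers that mix binary data with text, as long as the bytes still form valid UTF-8. The control bytes used below (0x00 to 0x07) are each a valid single-byte UTF-8 sequence: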

const RE2 = require("re2");

// Create Buffer with mixed binary and text data
const binaryData = Buffer.concat([
  Buffer.from([0x00, 0x01, 0x02]), // Binary header
  Buffer.from("START", "utf8"),     // Text marker
  Buffer.from([0x03, 0x04]),       // More binary data
  Buffer.from("Hello World", "utf8"), // Text content
  Buffer.from([0x05, 0x06, 0x07])  // Binary footer
]);

// Find text patterns in binary data
const textRegex = new RE2("[A-Z]+");
const textMatch = textRegex.exec(binaryData);
console.log(textMatch[0].toString()); // "START"
console.log(textMatch.index);         // 3 (after binary header)

// Extract all text from binary data
const wordRegex = new RE2("[a-zA-Z]+", "g");
const words = [];
let match;
while ((match = wordRegex.exec(binaryData)) !== null) {
  words.push(match[0].toString());
}
console.log(words); // ["START", "Hello", "World"]
