
# Buffer Support

Direct Buffer processing for efficient text operations without string conversion overhead.

## Capabilities

### Buffer Processing Overview

RE2 provides native support for Node.js Buffers, allowing direct processing of UTF-8 encoded binary data without conversion to JavaScript strings. This is particularly useful for:

- Processing large text files efficiently
- Working with binary protocols containing text patterns
- Avoiding UTF-8 ↔ UTF-16 conversion overhead
- Handling text data that may contain null bytes

**Key Characteristics:**

- All Buffer inputs must be UTF-8 encoded
- Positions and lengths are in bytes, not characters
- Results are returned as Buffers when input is Buffer
- Full Unicode support maintained
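
Because positions are reported in bytes, a byte offset usually has to be converted before it can index into a JavaScript string. The following is a minimal sketch using only Node's built-in `Buffer`; the helper name `byteToCharIndex` is hypothetical and not part of RE2, and it assumes the offset falls on a character boundary.

```javascript
// Hypothetical helper (not part of RE2): convert a byte offset in a UTF-8
// Buffer to the equivalent index in the decoded JavaScript string.
// Assumes byteOffset falls on a character boundary.
function byteToCharIndex(buffer, byteOffset) {
  return buffer.subarray(0, byteOffset).toString("utf8").length;
}

const text = "Hello 世界! Testing 123";
const buffer = Buffer.from(text, "utf8");

// Each of the two CJK characters takes 3 bytes in UTF-8 but 1 UTF-16 unit.
const byteOffset = buffer.indexOf("123"); // offset in bytes
const charOffset = text.indexOf("123");   // offset in UTF-16 units

console.log(byteOffset, charOffset);               // 22 18
console.log(byteToCharIndex(buffer, byteOffset));  // 18
```

Decoding the prefix costs time proportional to the offset, so for many matches in a large Buffer it may be cheaper to stay in bytes and decode only the slices that are actually displayed.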

### Buffer Method Signatures

All core RE2 methods accept Buffer inputs and return appropriate Buffer results:

```javascript { .api }
/**
 * Buffer-compatible method signatures
 */
regex.exec(buffer: Buffer): RE2BufferExecArray | null;
regex.test(buffer: Buffer): boolean;
regex.match(buffer: Buffer): RE2BufferMatchArray | null;
regex.search(buffer: Buffer): number;
regex.replace(buffer: Buffer, replacement: string | Buffer): Buffer;
regex.split(buffer: Buffer, limit?: number): Buffer[];
```

### Buffer Result Types

```javascript { .api }
/**
 * Buffer-specific result interfaces
 */
interface RE2BufferExecArray extends Array<Buffer> {
  index: number;    // Match start position in bytes
  input: Buffer;    // Original Buffer input
  groups?: {        // Named groups as Buffers
    [key: string]: Buffer;
  };
}

interface RE2BufferMatchArray extends Array<Buffer> {
  index?: number;   // Match position in bytes (undefined for global)
  input?: Buffer;   // Original input (undefined for global)
  groups?: {        // Named groups as Buffers
    [key: string]: Buffer;
  };
}
```
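
Since every element of these result arrays is a Buffer, code that ultimately needs strings tends to repeat `.toString()` calls. One possible way to centralize that is sketched below. `decodeMatch` is a hypothetical helper, not part of RE2, and the match object is built by hand to mimic the `RE2BufferExecArray` shape so the snippet runs without the library.

```javascript
// Hypothetical helper (not part of RE2): decode a Buffer-based match result
// into plain strings while keeping the byte-based index.
function decodeMatch(match) {
  if (match === null) return null;
  const decode = (part) => (part == null ? undefined : part.toString("utf8"));
  const groups = match.groups
    ? Object.fromEntries(
        Object.entries(match.groups).map(([name, value]) => [name, decode(value)])
      )
    : undefined;
  return {
    captures: Array.from(match, decode), // full match followed by capture groups
    byteIndex: match.index,              // still a byte offset into the input
    groups,
  };
}

// Stand-in shaped like RE2BufferExecArray, constructed manually for illustration.
const input = Buffer.from("Contact: user@example.com", "utf8");
const fakeMatch = Object.assign(
  [Buffer.from("user@example.com"), Buffer.from("user"), Buffer.from("example.com")],
  {
    index: 9,
    input,
    groups: { user: Buffer.from("user"), domain: Buffer.from("example.com") },
  }
);

const decoded = decodeMatch(fakeMatch);
console.log(decoded.captures);      // [ 'user@example.com', 'user', 'example.com' ]
console.log(decoded.groups.domain); // "example.com"
console.log(decoded.byteIndex);     // 9
```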

### Buffer Usage Examples

**Basic Buffer Operations:**

```javascript
const RE2 = require("re2");

// Create Buffer with UTF-8 text
const buffer = Buffer.from("Hello 世界! Testing 123", "utf8");
const regex = new RE2("\\d+");

// Test with Buffer
console.log(regex.test(buffer)); // true

// Find match in Buffer
const match = regex.exec(buffer);
console.log(match[0].toString()); // "123"
console.log(match.index); // 22 (byte position; the character position is 18)

// Search in Buffer
const position = regex.search(buffer);
console.log(position); // 22 (byte position)
```

**Buffer Replacement:**

```javascript
const RE2 = require("re2");

// Replace text in Buffer
const sourceBuffer = Buffer.from("test 123 and 456", "utf8");
const numberRegex = new RE2("\\d+", "g");

// Replace with string (returns Buffer)
const replaced1 = numberRegex.replace(sourceBuffer, "XXX");
console.log(replaced1.toString()); // "test XXX and XXX"

// Replace with Buffer
const replacement = Buffer.from("NUM", "utf8");
const replaced2 = numberRegex.replace(sourceBuffer, replacement);
console.log(replaced2.toString()); // "test NUM and NUM"

// Replace with function
const replacer = (match, offset, input) => {
  const num = parseInt(match.toString(), 10);
  return Buffer.from(String(num * 2), "utf8");
};
const doubled = numberRegex.replace(sourceBuffer, replacer);
console.log(doubled.toString()); // "test 246 and 912"
```

**Buffer Splitting:**

```javascript
const RE2 = require("re2");

// Split Buffer by pattern
const data = Buffer.from("apple,banana,cherry", "utf8");
const commaRegex = new RE2(",");

const parts = commaRegex.split(data);
console.log(parts.length); // 3
console.log(parts[0].toString()); // "apple"
console.log(parts[1].toString()); // "banana"
console.log(parts[2].toString()); // "cherry"

// Each part is a Buffer
console.log(Buffer.isBuffer(parts[0])); // true
```

### Named Groups with Buffers

Named capture groups also work with Buffer input; each group value is returned as a Buffer:

```javascript
const RE2 = require("re2");

// Named groups in Buffer matching
const emailRegex = new RE2("(?<user>\\w+)@(?<domain>\\w+\\.\\w+)");
const emailBuffer = Buffer.from("Contact: user@example.com", "utf8");

const match = emailRegex.exec(emailBuffer);
console.log(match.groups.user.toString()); // "user"
console.log(match.groups.domain.toString()); // "example.com"

// Groups are also Buffers
console.log(Buffer.isBuffer(match.groups.user)); // true
```

### UTF-8 Length Utilities

RE2 provides utility methods for calculating UTF-8 and UTF-16 lengths:

```javascript { .api }
/**
 * Calculate UTF-8 byte length needed for UTF-16 string
 * @param str - UTF-16 string
 * @returns Number of bytes needed for UTF-8 encoding
 */
RE2.getUtf8Length(str: string): number;

/**
 * Calculate UTF-16 character length for UTF-8 Buffer
 * @param buffer - UTF-8 encoded Buffer
 * @returns Number of characters in UTF-16, or -1 on error
 */
RE2.getUtf16Length(buffer: Buffer): number;
```

**Usage Examples:**

```javascript
const RE2 = require("re2");

// Calculate UTF-8 length for string
const text = "Hello 世界!";
const utf8Length = RE2.getUtf8Length(text);
console.log(utf8Length); // 13 (bytes needed for UTF-8)
console.log(text.length); // 9 (UTF-16 characters)

// Verify with actual Buffer
const buffer = Buffer.from(text, "utf8");
console.log(buffer.length); // 13 (matches calculated length)

// Calculate UTF-16 length for Buffer
const utf16Length = RE2.getUtf16Length(buffer);
console.log(utf16Length); // 9 (UTF-16 characters)

// Error handling
const invalidBuffer = Buffer.from([0xff, 0xfe, 0xfd]); // Invalid UTF-8
const errorResult = RE2.getUtf16Length(invalidBuffer);
console.log(errorResult); // -1 (indicates error)
```
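
For cross-checking these values in tests, the same quantities can be computed with Node built-ins. This is only a sketch of equivalent calculations, not a description of how RE2 implements them. The function names `utf8LengthOf` and `utf16LengthOf` are hypothetical, and the `-1` return value simply mirrors the error convention described above.

```javascript
// Built-in equivalents for cross-checking (not RE2's implementation).

// UTF-8 byte length of a JavaScript string.
function utf8LengthOf(str) {
  return Buffer.byteLength(str, "utf8");
}

// UTF-16 length of a UTF-8 Buffer, or -1 if the bytes are not valid UTF-8.
function utf16LengthOf(buffer) {
  try {
    // fatal: true makes decode() throw on invalid input;
    // ignoreBOM: true keeps a leading byte order mark in the decoded string.
    const decoder = new TextDecoder("utf-8", { fatal: true, ignoreBOM: true });
    return decoder.decode(buffer).length;
  } catch {
    return -1;
  }
}

const sample = "Hello 世界!";
const sampleBuffer = Buffer.from(sample, "utf8");

console.log(utf8LengthOf(sample));        // 13
console.log(utf16LengthOf(sampleBuffer)); // 9
console.log(utf16LengthOf(Buffer.from([0xff, 0xfe, 0xfd]))); // -1
```

Unlike a length-only calculation, `utf16LengthOf` decodes the whole Buffer into a temporary string, so it is better suited to tests than to hot paths.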

### Buffer Performance Considerations

**Advantages:**

- No UTF-8 ↔ UTF-16 conversion overhead
- Direct binary data processing
- Memory efficient for large text files
- Preserves exact byte boundaries

**Considerations:**

- Positions and lengths are in bytes, not characters
- Requires UTF-8 encoded input
- Results need `.toString()` for string operations
- More complex when mixing with string operations

**Best Practices:**

```javascript
const RE2 = require("re2");
const fs = require("fs");

// Efficient large file processing
async function processLogFile(filename) {
  // No encoding argument, so readFile resolves to a Buffer
  const buffer = await fs.promises.readFile(filename);
  const errorRegex = new RE2("ERROR:\\s*(.*)", "g");

  const errors = [];
  let match;
  while ((match = errorRegex.exec(buffer)) !== null) {
    errors.push({
      message: match[1].toString(),
      position: match.index, // byte offset into the file
      // Up to 50 bytes of context on each side; a cut may land inside a multi-byte character
      context: buffer.subarray(
        Math.max(0, match.index - 50),
        match.index + match[0].length + 50
      ).toString()
    });
  }

  return errors;
}

// Mixed string/Buffer operations
function processWithContext(text) {
  // Use string for simple operations
  const regex = new RE2("\\w+@\\w+\\.\\w+", "g");
  const emails = text.match(regex);

  // Use Buffer for binary operations if needed
  if (emails && emails.length > 0) {
    const buffer = Buffer.from(text, "utf8");
    const firstEmailPos = regex.search(buffer);

    return {
      emails,
      firstEmailBytePosition: firstEmailPos
    };
  }

  return { emails: [], firstEmailBytePosition: -1 };
}
```
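
Because the context window above is measured in bytes, a fixed padding can cut a multi-byte character in half, which decodes as U+FFFD replacement characters. One way to avoid that is to nudge the slice boundaries off UTF-8 continuation bytes. The helpers below are hypothetical (not part of RE2), use only Node's `Buffer`, and assume the Buffer holds valid UTF-8.

```javascript
// Hypothetical helpers (not part of RE2): keep slice boundaries on whole
// UTF-8 characters. Continuation bytes have the bit pattern 10xxxxxx.
function isContinuationByte(byte) {
  return (byte & 0xc0) === 0x80;
}

// Move pos left until it points at the first byte of a character.
function alignBackward(buffer, pos) {
  let i = Math.min(Math.max(pos, 0), buffer.length);
  while (i > 0 && i < buffer.length && isContinuationByte(buffer[i])) i--;
  return i;
}

// Move pos right until it points at the first byte of a character (or the end).
function alignForward(buffer, pos) {
  let i = Math.min(Math.max(pos, 0), buffer.length);
  while (i < buffer.length && isContinuationByte(buffer[i])) i++;
  return i;
}

// Decode [start - padding, end + padding) widened to character boundaries.
function safeContext(buffer, start, end, padding) {
  const from = alignBackward(buffer, start - padding);
  const to = alignForward(buffer, end + padding);
  return buffer.subarray(from, to).toString("utf8");
}

const log = Buffer.from("世界 ERROR: disk full 世界", "utf8");
const start = log.indexOf("ERROR");                  // 7
const end = start + Buffer.byteLength("ERROR: disk full"); // 23

const naive = log.subarray(start - 2, end + 2).toString("utf8");
console.log(naive.includes("\uFFFD"));       // true: both cuts split a character
console.log(safeContext(log, start, end, 2)); // "界 ERROR: disk full 世"
```

The aligned slice can be up to three bytes longer on each side than the requested padding, which is usually acceptable for display purposes.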

### Binary Data Patterns

RE2 can find text patterns in Buffers that mix binary and text data:

```javascript
const RE2 = require("re2");

// Create Buffer with mixed binary and text data.
// All binary bytes here are below 0x80, so the Buffer is still valid UTF-8.
const binaryData = Buffer.concat([
  Buffer.from([0x00, 0x01, 0x02]),      // Binary header
  Buffer.from("START", "utf8"),         // Text marker
  Buffer.from([0x03, 0x04]),            // More binary data
  Buffer.from("Hello World", "utf8"),   // Text content
  Buffer.from([0x05, 0x06, 0x07])       // Binary footer
]);

// Find text patterns in binary data
const textRegex = new RE2("[A-Z]+");
const textMatch = textRegex.exec(binaryData);
console.log(textMatch[0].toString()); // "START"
console.log(textMatch.index); // 3 (after binary header)

// Extract all text from binary data
const wordRegex = new RE2("[a-zA-Z]+", "g");
const words = [];
let match;
while ((match = wordRegex.exec(binaryData)) !== null) {
  words.push(match[0].toString());
}
console.log(words); // ["START", "Hello", "World"]
```