String Operations

Read and write strings as raw UTF-8, null-terminated (C-style), or length-prefixed data. ByteBuffer also provides helpers for calculating the UTF-8 byte and character counts of a string.

Capabilities

UTF-8 String Operations

Read and write UTF-8 encoded strings with various length encoding schemes.

/**
 * Write UTF-8 encoded string without length prefix
 * @param {string} str - String to write
 * @param {number} offset - Offset to write at (default: current offset)
 * @returns {ByteBuffer} This ByteBuffer for chaining
 */
writeUTF8String(str, offset);

/**
 * Alias for writeUTF8String
 */
writeString(str, offset);

/**
 * Read UTF-8 encoded string
 * @param {number} length - Number of characters (if metrics='c') or bytes (if metrics='b') to read
 * @param {string} metrics - Length interpretation: 'c' for characters, 'b' for bytes (default: 'c')
 * @param {number} offset - Offset to read from (default: current offset)
 * @returns {string|{string: string, length: number}} String value, or the string and the number of bytes read when an offset is given
 */
readUTF8String(length, metrics, offset);

/**
 * Alias for readUTF8String
 */
readString(length, metrics, offset);

Usage Examples:

const ByteBuffer = require("bytebuffer");
const bb = ByteBuffer.allocate(64);

// Write UTF-8 strings
bb.writeUTF8String("Hello World!");
bb.writeString("Café"); // UTF-8 handles accented characters
bb.writeUTF8String("🚀 Emoji"); // UTF-8 handles emoji

// Calculate string sizes for reading
const hello = "Hello World!";
const helloBytes = ByteBuffer.calculateUTF8Bytes(hello);
const helloChars = ByteBuffer.calculateUTF8Chars(hello);

console.log(`"${hello}": ${helloBytes} bytes, ${helloChars} chars`);

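// Read back using byte counts (metrics passed explicitly so the length is counted in bytes)
bb.flip();
const readHello = bb.readUTF8String(12, ByteBuffer.METRICS_BYTES); // 12 bytes
const readCafe = bb.readString(5, ByteBuffer.METRICS_BYTES);        // 5 bytes (é is 2 bytes in UTF-8)
const readEmoji = bb.readUTF8String(10, ByteBuffer.METRICS_BYTES);  // 10 bytes (🚀 is 4 bytes)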

console.log(readHello); // "Hello World!"
console.log(readCafe);  // "Café"
console.log(readEmoji); // "🚀 Emoji"

// Read using character count
bb.clear();
bb.writeUTF8String("Café");
bb.flip();
const cafeByChars = bb.readUTF8String(4, 'c'); // 4 characters
console.log(cafeByChars); // "Café"
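
Because a UTF-8 string written without a length prefix carries no length information, the reader has to know each string's byte length to split the stream, as the example above does with 12, 5, and 10. The same idea can be sketched with Node's built-in Buffer alone. This is an illustration, not the ByteBuffer API, and the helper name `splitByByteLengths` is hypothetical.

```javascript
// Sketch using Node's built-in Buffer, not the ByteBuffer API.
// Three UTF-8 strings written back to back with no length prefix.
const parts = ["Hello World!", "Café", "🚀 Emoji"];
const stream = Buffer.concat(parts.map((s) => Buffer.from(s, "utf8")));

// Byte lengths the reader must know in advance to split the stream.
const byteLengths = parts.map((s) => Buffer.byteLength(s, "utf8"));

// Hypothetical helper: slice the stream into strings by byte length.
function splitByByteLengths(buf, lengths) {
  const out = [];
  let offset = 0;
  for (const len of lengths) {
    out.push(buf.toString("utf8", offset, offset + len));
    offset += len;
  }
  return out;
}

const decoded = splitByByteLengths(stream, byteLengths);
console.log(byteLengths); // [ 12, 5, 10 ]
console.log(decoded);     // [ 'Hello World!', 'Café', '🚀 Emoji' ]
```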

C-Style Null-Terminated Strings

Read and write null-terminated strings (C-style strings).

/**
 * Write null-terminated string (C-style string)
 * @param {string} str - String to write (null terminator added automatically)
 * @param {number} offset - Offset to write at (default: current offset)
 * @returns {ByteBuffer} This ByteBuffer for chaining
 */
writeCString(str, offset);

/**
 * Read null-terminated string (C-style string)
 * @param {number} offset - Offset to read from (default: current offset)
 * @returns {string|{string: string, length: number}} String value, or the string and the number of bytes read when an offset is given
 */
readCString(offset);

Usage Examples:

const bb = ByteBuffer.allocate(64);

// Write C-style strings (null-terminated)
bb.writeCString("Hello");
bb.writeCString("World");
bb.writeCString(""); // Empty string

// Read back - automatically stops at null terminator
bb.flip();
const str1 = bb.readCString(); // "Hello"
const str2 = bb.readCString(); // "World" 
const str3 = bb.readCString(); // ""

console.log(`Read: "${str1}", "${str2}", "${str3}"`);

// C-strings include null terminator in byte count
bb.clear();
bb.writeCString("Test");
console.log(bb.offset); // 5 (4 chars + 1 null terminator)
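
The wire layout of a null-terminated string is simply the encoded bytes followed by one 0x00 byte, and reading means scanning forward for that terminator. A minimal sketch of the layout using Node's built-in Buffer follows. It is not the ByteBuffer implementation, and `encodeCString` and `decodeCString` are hypothetical helper names.

```javascript
// Sketch using Node's built-in Buffer, not the ByteBuffer API.
// Hypothetical helpers showing the null-terminated layout.
function encodeCString(str) {
  // UTF-8 bytes followed by a single 0x00 terminator.
  return Buffer.concat([Buffer.from(str, "utf8"), Buffer.from([0])]);
}

function decodeCString(buf, offset = 0) {
  const end = buf.indexOf(0, offset); // scan for the terminator
  if (end === -1) throw new RangeError("missing null terminator");
  return { string: buf.toString("utf8", offset, end), next: end + 1 };
}

const buf = Buffer.concat(["Hello", "World", ""].map(encodeCString));
console.log(buf.length); // 13 (6 + 6 + 1)

// Read strings back to back until the buffer is exhausted.
let pos = 0;
const strings = [];
while (pos < buf.length) {
  const { string, next } = decodeCString(buf, pos);
  strings.push(string);
  pos = next;
}
console.log(strings); // [ 'Hello', 'World', '' ]
```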

Length-Prefixed Strings

Read and write strings with various length-prefix encodings.

/**
 * Write string with 32-bit unsigned integer length prefix
 * @param {string} str - String to write
 * @param {number} offset - Offset to write at (default: current offset)
 * @returns {ByteBuffer} This ByteBuffer for chaining
 */
writeIString(str, offset);

/**
 * Read string with 32-bit unsigned integer length prefix
 * @param {number} offset - Offset to read from (default: current offset)
 * @returns {string|{string: string, length: number}} String value, or the string and the number of bytes read when an offset is given
 */
readIString(offset);

/**
 * Write string with varint32 length prefix
 * @param {string} str - String to write
 * @param {number} offset - Offset to write at (default: current offset)
 * @returns {ByteBuffer} This ByteBuffer for chaining
 */
writeVString(str, offset);

/**
 * Read string with varint32 length prefix
 * @param {number} offset - Offset to read from (default: current offset)
 * @returns {string|{string: string, length: number}} String value, or the string and the number of bytes read when an offset is given
 */
readVString(offset);

Usage Examples:

const bb = ByteBuffer.allocate(128);

// IString - uses 4-byte uint32 length prefix
bb.writeIString("Hello World!");
bb.writeIString("Short");
bb.writeIString("");

// VString - uses varint32 length prefix (more space-efficient)
bb.writeVString("Hello World!"); 
bb.writeVString("Short");
bb.writeVString("");

// Read back IStrings
bb.flip();
const istr1 = bb.readIString(); // "Hello World!"
const istr2 = bb.readIString(); // "Short"
const istr3 = bb.readIString(); // ""

// Read back VStrings  
const vstr1 = bb.readVString(); // "Hello World!"
const vstr2 = bb.readVString(); // "Short"
const vstr3 = bb.readVString(); // ""

console.log("IStrings:", istr1, istr2, istr3);
console.log("VStrings:", vstr1, vstr2, vstr3);

// Compare space usage
bb.clear();
bb.writeIString("Hi");  // 4 bytes (length) + 2 bytes (data) = 6 bytes
const iStringSize = bb.offset;

bb.clear();
bb.writeVString("Hi");  // 1 byte (length) + 2 bytes (data) = 3 bytes
const vStringSize = bb.offset;

console.log(`IString: ${iStringSize} bytes, VString: ${vStringSize} bytes`);
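
To make the size difference concrete, the two prefix layouts can be sketched with Node's built-in Buffer. This is an illustration rather than the ByteBuffer implementation: the helper names are hypothetical, the prefix is assumed to count UTF-8 bytes, and the uint32 prefix is assumed to be big-endian.

```javascript
// Sketch using Node's built-in Buffer, not the ByteBuffer API.
// Hypothetical helper: base-128 varint encoding of an unsigned 32-bit value.
function encodeVarint32(value) {
  const bytes = [];
  value >>>= 0;
  while (value > 0x7f) {
    bytes.push((value & 0x7f) | 0x80); // low 7 bits, continuation bit set
    value >>>= 7;
  }
  bytes.push(value); // final byte, continuation bit clear
  return Buffer.from(bytes);
}

function encodeVString(str) {
  const data = Buffer.from(str, "utf8");
  return Buffer.concat([encodeVarint32(data.length), data]);
}

function encodeIString(str) {
  const data = Buffer.from(str, "utf8");
  const prefix = Buffer.alloc(4);
  prefix.writeUInt32BE(data.length, 0); // big-endian is an assumption here
  return Buffer.concat([prefix, data]);
}

console.log(encodeIString("Hi")); // <Buffer 00 00 00 02 48 69>
console.log(encodeVString("Hi")); // <Buffer 02 48 69>
console.log(encodeVarint32(300)); // <Buffer ac 02>
```

The varint prefix grows with the length it encodes: values up to 127 fit in one byte, values up to 16383 in two.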

String Calculation Utilities

Calculate the number of bytes and characters needed for UTF-8 string encoding.

/**
 * Calculate number of bytes required to encode string as UTF-8
 * @param {string} str - String to calculate for
 * @returns {number} Number of bytes required
 */
ByteBuffer.calculateUTF8Bytes(str);

/**
 * Alias for calculateUTF8Bytes
 */
ByteBuffer.calculateString(str);

/**
 * Calculate number of characters in UTF-8 string
 * @param {string} str - String to calculate for
 * @returns {number} Number of Unicode characters
 */
ByteBuffer.calculateUTF8Chars(str);

Usage Examples:

// Test various strings
const testStrings = [
    "Hello",           // ASCII characters
    "Café",            // Latin characters with accents
    "日本語",           // CJK characters  
    "🚀🌟✨",           // Emoji
    "𝓗𝓮𝓵𝓵𝓸",           // Mathematical script characters
];

testStrings.forEach(str => {
    const bytes = ByteBuffer.calculateUTF8Bytes(str);
    const chars = ByteBuffer.calculateUTF8Chars(str);
    const jsLength = str.length; // JavaScript string length (UTF-16 code units)
    
    console.log(`"${str}":
        UTF-8 bytes: ${bytes}
        UTF-8 chars: ${chars} 
        JS length: ${jsLength}`);
});

// Output shows differences between byte count, character count, and JS length:
// "Hello": 5 bytes, 5 chars, 5 JS length
// "Café": 5 bytes, 4 chars, 4 JS length  
// "日本語": 9 bytes, 3 chars, 3 JS length
// "🚀🌟✨": 11 bytes, 3 chars, 5 JS length (🚀 and 🌟 are surrogate pairs in JS; ✨ is a single code unit)
// "𝓗𝓮𝓵𝓵𝓸": 20 bytes, 5 chars, 10 JS length (math script uses surrogate pairs)
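
The three counts can be cross-checked without ByteBuffer, using only JavaScript built-ins. The sketch below assumes that "characters" means Unicode code points; the `measure` helper is hypothetical.

```javascript
// Sketch using only JavaScript/Node built-ins, not the ByteBuffer API.
function measure(str) {
  return {
    utf8Bytes: Buffer.byteLength(str, "utf8"), // encoded size in bytes
    codePoints: [...str].length,               // the string iterator yields code points
    utf16Units: str.length,                    // JavaScript's native length
  };
}

for (const s of ["Hello", "Café", "日本語", "🚀🌟✨", "𝓗𝓮𝓵𝓵𝓸"]) {
  console.log(s, measure(s));
}
```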

String Metrics Constants

Constants for specifying how string lengths should be interpreted.

/**
 * Character-based metrics - interpret length as number of Unicode characters
 */
ByteBuffer.METRICS_CHARS = 'c';

/**
 * Byte-based metrics - interpret length as number of UTF-8 bytes
 */
ByteBuffer.METRICS_BYTES = 'b';

Usage Examples:

const bb = ByteBuffer.allocate(32);
const testString = "Café"; // 4 chars, 5 bytes in UTF-8

bb.writeUTF8String(testString);
bb.flip();

// Read by byte count
const byBytes = bb.readUTF8String(5, ByteBuffer.METRICS_BYTES);
console.log(byBytes); // "Café"

bb.offset = 0; // Reset for second read

// Read by character count  
const byChars = bb.readUTF8String(4, ByteBuffer.METRICS_CHARS);
console.log(byChars); // "Café"

// Demonstrate the difference with emoji
bb.clear();
const emojiString = "Hi🚀"; // 3 chars, 6 bytes
bb.writeUTF8String(emojiString);
bb.flip();

const emojiByBytes = bb.readUTF8String(6, 'b'); // 6 bytes

bb.offset = 0; // Reset for second read

const emojiByChars = bb.readUTF8String(3, 'c'); // 3 characters

// Both are "Hi🚀", read using different metrics
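
Reading by character count requires inspecting the data, because each code point occupies between 1 and 4 bytes in UTF-8. The sketch below shows that bookkeeping with Node's built-in Buffer. It illustrates the idea only and is not how ByteBuffer implements it; `byteLengthOfChars` is a hypothetical helper.

```javascript
// Sketch using Node's built-in Buffer, not the ByteBuffer API.
// Hypothetical helper: how many bytes do the next `chars` code points occupy?
// Assumes `buf` holds well-formed UTF-8 starting at `offset`.
function byteLengthOfChars(buf, offset, chars) {
  let pos = offset;
  for (let i = 0; i < chars; i++) {
    if (pos >= buf.length) throw new RangeError("truncated data");
    const lead = buf[pos];
    // The lead byte determines the length of the sequence.
    if (lead < 0x80) pos += 1;
    else if ((lead & 0xe0) === 0xc0) pos += 2;
    else if ((lead & 0xf0) === 0xe0) pos += 3;
    else if ((lead & 0xf8) === 0xf0) pos += 4;
    else throw new RangeError("invalid UTF-8 lead byte");
  }
  return pos - offset;
}

const bytes = Buffer.from("Hi🚀", "utf8");
console.log(bytes.length);                   // 6
console.log(byteLengthOfChars(bytes, 0, 3)); // 6
console.log(byteLengthOfChars(bytes, 0, 2)); // 2
```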

Advanced String Operations

Combining the string formats in one buffer, chaining writes, and handling large and empty strings.

Usage Examples:

const bb = ByteBuffer.allocate(128);

// Write mixed content
bb.writeUTF8String("Start: ");
bb.writeIString("Middle part");
bb.writeCString("End");

// Chain string operations (clear() resets the offset, so these writes overwrite the content above)
bb.clear()
  .writeVString("First")
  .writeVString("Second") 
  .writeVString("Third");

// Read back in order
bb.flip();
const first = bb.readVString();   // "First"
const second = bb.readVString();  // "Second"
const third = bb.readVString();   // "Third"

// Working with large strings
const largeString = "x".repeat(10000);
bb.clear();
bb.ensureCapacity(ByteBuffer.calculateUTF8Bytes(largeString) + 10);
bb.writeVString(largeString);
bb.flip();
const readLarge = bb.readVString();
console.log(readLarge.length); // 10000

// Empty string handling
bb.clear();
bb.writeCString("");    // Just null terminator
bb.writeIString("");    // 4-byte zero length + no data
bb.writeVString("");    // 1-byte zero length + no data

bb.flip();
console.log(`"${bb.readCString()}"`);  // ""
console.log(`"${bb.readIString()}"`);  // "" 
console.log(`"${bb.readVString()}"`);  // ""

Error Handling

String operations may encounter the following error conditions:

  • Error: When attempting to read beyond buffer limits
  • Error: When invalid UTF-8 sequences are encountered
  • RangeError: When string is too large for available buffer space
  • TypeError: When non-string values are provided to string write methods

Example Error Handling:

const bb = ByteBuffer.allocate(16);

try {
    // This may throw RangeError if string is too large
    const largeString = "x".repeat(1000);
    bb.writeUTF8String(largeString);
} catch (error) {
    console.error("String too large:", error.message);
}

try {
    // This will throw Error when reading beyond buffer
    bb.readUTF8String(100); // Trying to read 100 bytes from small buffer
} catch (error) {
    console.error("Read beyond buffer:", error.message);
}

try {
    // Passing a non-string may throw TypeError, depending on the write method used
    bb.writeUTF8String(123); // Number instead of string
} catch (error) {
    console.error("Invalid type:", error.message);
}

// Handle null terminator edge cases
bb.clear();
bb.writeUTF8String("Test\0with\0nulls");
bb.flip();
const withNulls = bb.readCString(); // Stops at first null
console.log(`C-string: "${withNulls}"`); // "Test" (stops at \0)
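
The first two error conditions, reading past the end of the data and meeting invalid UTF-8, can be reproduced with Node built-ins alone. The sketch below is an illustration, not ByteBuffer's behaviour: `readPrefixedString` is a hypothetical helper, the big-endian uint32 prefix is an assumption, and the error types are those of Buffer and TextDecoder.

```javascript
// Sketch using Node built-ins (Buffer, TextDecoder), not the ByteBuffer API.
const strictUtf8 = new TextDecoder("utf-8", { fatal: true });

// Hypothetical helper: read a uint32-prefixed string with explicit checks.
function readPrefixedString(buf, offset = 0) {
  if (offset + 4 > buf.length) {
    throw new RangeError("length prefix extends past end of buffer");
  }
  const length = buf.readUInt32BE(offset); // big-endian is an assumption
  const start = offset + 4;
  if (start + length > buf.length) {
    throw new RangeError(`declared length ${length} exceeds remaining bytes`);
  }
  // With fatal: true, malformed UTF-8 throws a TypeError instead of yielding U+FFFD.
  return strictUtf8.decode(buf.subarray(start, start + length));
}

const ok = Buffer.from([0, 0, 0, 2, 0x48, 0x69]);
console.log(readPrefixedString(ok)); // "Hi"

const truncated = Buffer.from([0, 0, 0, 9, 0x48, 0x69]); // claims 9 bytes, has 2
const malformed = Buffer.from([0, 0, 0, 2, 0xc3, 0x28]); // 0x28 is not a continuation byte
for (const input of [truncated, malformed]) {
  try {
    readPrefixedString(input);
  } catch (error) {
    console.error(error.name); // RangeError, then TypeError
  }
}
```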

Performance Considerations

  • UTF-8 encoding/decoding has computational overhead compared to ASCII
  • VString is more space-efficient than IString for short strings
  • CString is space-efficient but requires null-termination scanning
  • Character vs. byte metrics: byte metrics are faster because they avoid counting UTF-8 characters
  • Large strings may require buffer resizing, which involves memory allocation

Performance Comparison:

// Space efficiency comparison for short strings
const shortString = "Hi";

// Method 1: C-String (3 bytes: 'H', 'i', '\0')
// Method 2: VString (3 bytes: 1-byte length + 'H', 'i') 
// Method 3: IString (6 bytes: 4-byte length + 'H', 'i')

// For longer strings, the length prefix overhead becomes negligible
const longString = "x".repeat(1000);
// C-String: 1001 bytes
// VString: 1002 bytes (2-byte varint length + 1000 data)
// IString: 1004 bytes (4-byte length + 1000 data)
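
The arithmetic above can be generalised into a small size calculator. This is a sketch using Node's built-in Buffer only; `varint32Size` and `encodedSizes` are hypothetical helpers, and the length prefixes are assumed to count UTF-8 bytes.

```javascript
// Sketch using Node's built-in Buffer, not the ByteBuffer API.
// Hypothetical helper: bytes needed for a base-128 varint of `n` (n < 2^32).
function varint32Size(n) {
  let size = 1;
  while (n > 0x7f) {
    n = Math.floor(n / 128); // each varint byte carries 7 bits
    size++;
  }
  return size;
}

function encodedSizes(str) {
  const data = Buffer.byteLength(str, "utf8");
  return {
    cstring: data + 1,                  // data + null terminator
    vstring: varint32Size(data) + data, // varint32 prefix + data
    istring: 4 + data,                  // uint32 prefix + data
  };
}

console.log(encodedSizes("Hi"));             // { cstring: 3, vstring: 3, istring: 6 }
console.log(encodedSizes("x".repeat(127)));  // { cstring: 128, vstring: 128, istring: 131 }
console.log(encodedSizes("x".repeat(1000))); // { cstring: 1001, vstring: 1002, istring: 1004 }
```

The varint prefix moves from one byte to two once the data exceeds 127 bytes, which is why the 1000-byte VString needs 1002 bytes.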