tessl/npm-stablelib--utf8

UTF-8 encoder and decoder for robust text processing with validation

Workspace: tessl
Visibility: Public
Describes: npmpkg:npm/@stablelib/utf8@2.0.x

To install, run

npx @tessl/cli install tessl/npm-stablelib--utf8@2.0.0


@stablelib/utf8

@stablelib/utf8 provides robust UTF-8 encoding and decoding functionality implemented in TypeScript. It handles conversion between JavaScript strings and UTF-8 byte arrays with comprehensive validation of both UTF-16 surrogate pairs and UTF-8 byte sequences.

Package Information

  • Package Name: @stablelib/utf8
  • Package Type: npm
  • Language: TypeScript
  • Installation: npm install @stablelib/utf8

Core Imports

import { encode, decode, encodedLength } from "@stablelib/utf8";

For CommonJS:

const { encode, decode, encodedLength } = require("@stablelib/utf8");

Basic Usage

import { encode, decode, encodedLength } from "@stablelib/utf8";

// Encode a string to UTF-8 bytes
const text = "Hello, 世界! 🌍";
const bytes = encode(text);

// Calculate encoded length without encoding
const length = encodedLength(text);
console.log(length === bytes.length); // true

// Decode UTF-8 bytes back to string
const decoded = decode(bytes);
console.log(decoded === text); // true

// Handle validation errors
try {
  // This will throw for invalid UTF-16 input
  encode("Invalid surrogate pair: \uD800");
} catch (error) {
  console.error((error as Error).message); // "utf8: invalid string"
}

Capabilities

String Encoding

Converts JavaScript strings to UTF-8 byte arrays with validation.

/**
 * Encodes the given string into UTF-8 byte array.
 * Throws if the source string has invalid UTF-16 encoding.
 * @param s - The string to encode
 * @returns UTF-8 encoded byte array
 * @throws Error with message "utf8: invalid string" for invalid UTF-16
 */
function encode(s: string): Uint8Array;

Usage Examples:

import { encode } from "@stablelib/utf8";

// Basic ASCII
const ascii = encode("Hello");
// Result: Uint8Array([72, 101, 108, 108, 111])

// Unicode characters
const unicode = encode("こんにちは");
// Result: 15 bytes (3 per character), e.g. "こ" -> [227, 129, 147]

// Emoji with surrogate pairs
const emoji = encode("🌍");
// Result: Uint8Array([240, 159, 140, 141]), the 4-byte sequence for U+1F30D

// Error handling
try {
  encode("Invalid: \uD800"); // Lone high surrogate
} catch (error) {
  console.error((error as Error).message); // "utf8: invalid string"
}
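To make the validation rule and the byte layout concrete, here is a minimal sketch of a validating encoder. It is written from the standard UTF-8 bit patterns, not taken from the library's source; the name `sketchEncode` is hypothetical, and the only library behavior it assumes is the documented one (an unpaired surrogate throws `"utf8: invalid string"`).

```typescript
// Hypothetical sketch (not @stablelib/utf8's code): UTF-16 string -> UTF-8 bytes.
function sketchEncode(s: string): Uint8Array {
  const out: number[] = [];
  for (let i = 0; i < s.length; i++) {
    let cp = s.charCodeAt(i);
    if (cp >= 0xd800 && cp <= 0xdbff) {
      // High surrogate: it must be followed by a low surrogate (-1 means "end of string").
      const next = i + 1 < s.length ? s.charCodeAt(i + 1) : -1;
      if (next < 0xdc00 || next > 0xdfff) throw new Error("utf8: invalid string");
      cp = ((cp - 0xd800) << 10) + (next - 0xdc00) + 0x10000;
      i++; // the low surrogate has been consumed
    } else if (cp >= 0xdc00 && cp <= 0xdfff) {
      // Low surrogate with no preceding high surrogate.
      throw new Error("utf8: invalid string");
    }
    if (cp < 0x80) {
      out.push(cp); // 0xxxxxxx
    } else if (cp < 0x800) {
      out.push(0xc0 | (cp >> 6), 0x80 | (cp & 0x3f)); // 110xxxxx 10xxxxxx
    } else if (cp < 0x10000) {
      // 1110xxxx 10xxxxxx 10xxxxxx
      out.push(0xe0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f));
    } else {
      // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
      out.push(
        0xf0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3f),
        0x80 | ((cp >> 6) & 0x3f),
        0x80 | (cp & 0x3f),
      );
    }
  }
  return Uint8Array.from(out);
}
```

Under these assumptions, `sketchEncode("🌍")` combines the surrogate pair into U+1F30D and emits [240, 159, 140, 141].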

Byte Decoding

Converts UTF-8 byte arrays back to JavaScript strings with validation.

/**
 * Decodes the given byte array from UTF-8 into a string.
 * Throws if encoding is invalid.
 * @param arr - The UTF-8 byte array to decode
 * @returns Decoded string
 * @throws Error with message "utf8: invalid source encoding" for invalid UTF-8
 */
function decode(arr: Uint8Array): string;

Usage Examples:

import { decode } from "@stablelib/utf8";

// Basic decoding
const bytes = new Uint8Array([72, 101, 108, 108, 111]);
const text = decode(bytes);
// Result: "Hello"

// Unicode decoding
const unicodeBytes = new Uint8Array([227, 129, 147, 227, 130, 147, 227, 129, 171, 227, 129, 161, 227, 129, 175]);
const unicodeText = decode(unicodeBytes);
// Result: "こんにちは"

// Error handling
try {
  decode(new Uint8Array([0xFF])); // Invalid UTF-8 byte
} catch (error) {
  console.error((error as Error).message); // "utf8: invalid source encoding"
}
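A strict decoder has to reject every malformed sequence listed under Error Handling. The sketch below shows one way to do that; `sketchDecode` is a hypothetical name and this is not the library's implementation. It reads the sequence length from the lead byte, requires `10xxxxxx` continuation bytes, and rejects overlong forms, surrogate code points, and values above U+10FFFF, using the documented error message.

```typescript
// Hypothetical sketch (not @stablelib/utf8's code): strict UTF-8 bytes -> string.
const INVALID_UTF8 = "utf8: invalid source encoding";

function sketchDecode(arr: Uint8Array): string {
  let result = "";
  let i = 0;
  while (i < arr.length) {
    const b = arr[i];
    let need: number; // continuation bytes that must follow the lead byte
    let cp: number; // payload bits accumulated so far
    let min: number; // smallest code point this length may encode (rejects overlongs)
    if (b < 0x80) {
      need = 0; cp = b; min = 0;
    } else if (b >= 0xc0 && b < 0xe0) {
      need = 1; cp = b & 0x1f; min = 0x80;
    } else if (b >= 0xe0 && b < 0xf0) {
      need = 2; cp = b & 0x0f; min = 0x800;
    } else if (b >= 0xf0 && b < 0xf8) {
      need = 3; cp = b & 0x07; min = 0x10000;
    } else {
      throw new Error(INVALID_UTF8); // stray continuation byte or 0xF8..0xFF
    }
    if (i + need >= arr.length) throw new Error(INVALID_UTF8); // truncated sequence
    for (let k = 1; k <= need; k++) {
      const c = arr[i + k];
      if ((c & 0xc0) !== 0x80) throw new Error(INVALID_UTF8); // not 10xxxxxx
      cp = (cp << 6) | (c & 0x3f);
    }
    if (cp < min || cp > 0x10ffff || (cp >= 0xd800 && cp <= 0xdfff)) {
      throw new Error(INVALID_UTF8); // overlong, out of range, or a surrogate
    }
    result += String.fromCodePoint(cp); // re-splits supplementary code points into surrogates
    i += need + 1;
  }
  return result;
}
```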

Length Calculation

Calculates the number of bytes required to encode a string without performing the actual encoding.

/**
 * Returns the number of bytes required to encode the given string into UTF-8.
 * Throws if the source string has invalid UTF-16 encoding.
 * @param s - The string to measure
 * @returns Number of bytes needed for UTF-8 encoding
 * @throws Error with message "utf8: invalid string" for invalid UTF-16
 */
function encodedLength(s: string): number;

Usage Examples:

import { encodedLength, encode } from "@stablelib/utf8";

// Calculate length for memory allocation
const text = "Hello, 世界!";
const length = encodedLength(text);
console.log(length); // 14: 8 ASCII bytes + 3 bytes each for 世 and 界

// Verify length matches actual encoding
const encoded = encode(text);
console.log(length === encoded.length); // true

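// Reject oversized input before encoding
// (MAX_BUFFER_SIZE and encodeBounded are illustrative names, not library APIs)
const MAX_BUFFER_SIZE = 64 * 1024;
function encodeBounded(input: string): Uint8Array {
  if (encodedLength(input) > MAX_BUFFER_SIZE) {
    throw new Error("Text too large to encode");
  }
  return encode(input);
}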
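The byte count can be derived from each UTF-16 code unit without building the output. A minimal sketch, assuming only the UTF-8 length rules (1 byte below U+0080, 2 below U+0800, 3 for the rest of the BMP, 4 for a surrogate pair); `sketchEncodedLength` is a hypothetical name, not the library's code.

```typescript
// Hypothetical sketch (not @stablelib/utf8's code): count UTF-8 bytes for a string.
function sketchEncodedLength(s: string): number {
  let total = 0;
  for (let i = 0; i < s.length; i++) {
    const c = s.charCodeAt(i);
    if (c < 0x80) {
      total += 1;
    } else if (c < 0x800) {
      total += 2;
    } else if (c >= 0xd800 && c <= 0xdbff) {
      // High surrogate: valid only when followed by a low surrogate.
      const next = i + 1 < s.length ? s.charCodeAt(i + 1) : -1;
      if (next < 0xdc00 || next > 0xdfff) throw new Error("utf8: invalid string");
      total += 4; // the pair is one supplementary code point -> 4 bytes
      i++;
    } else if (c >= 0xdc00 && c <= 0xdfff) {
      throw new Error("utf8: invalid string"); // unpaired low surrogate
    } else {
      total += 3;
    }
  }
  return total;
}
```

Note that a surrogate pair is two UTF-16 code units but four UTF-8 bytes, so `s.length` alone cannot be used to size the buffer.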

Error Handling

Both directions are validated, and each failure is reported as an Error with one of two fixed messages:

UTF-16 Validation Errors

Thrown by encode() and encodedLength() for invalid UTF-16 input:

  • Error Message: "utf8: invalid string"
  • Common Causes:
    • Lone high surrogate (0xD800-0xDBFF) without matching low surrogate
    • Lone low surrogate (0xDC00-0xDFFF) without preceding high surrogate
    • Invalid surrogate pair sequences
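Since the error message does not say where the offending code unit is, a caller may want to locate it before calling encode(). The helper below is a hypothetical sketch (`findLoneSurrogate` is not part of @stablelib/utf8) that applies the two lone-surrogate rules listed above.

```typescript
// Hypothetical helper: index of the first unpaired surrogate in s, or -1 if none.
function findLoneSurrogate(s: string): number {
  for (let i = 0; i < s.length; i++) {
    const c = s.charCodeAt(i);
    if (c >= 0xd800 && c <= 0xdbff) {
      const next = i + 1 < s.length ? s.charCodeAt(i + 1) : -1;
      if (next >= 0xdc00 && next <= 0xdfff) {
        i++; // well-formed pair: skip its low surrogate
        continue;
      }
      return i; // high surrogate without a matching low surrogate
    }
    if (c >= 0xdc00 && c <= 0xdfff) return i; // low surrogate without a preceding high surrogate
  }
  return -1;
}
```

On runtimes that support ES2024, `String.prototype.isWellFormed()` gives the same yes/no answer, without the index.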

UTF-8 Validation Errors

Thrown by decode() for invalid UTF-8 byte sequences:

  • Error Message: "utf8: invalid source encoding"
  • Common Causes:
    • Invalid start bytes (e.g., 0xFE, 0xFF)
    • Incomplete multi-byte sequences
    • Invalid continuation bytes
    • Overlong encodings
    • Invalid code points (surrogate range, above U+10FFFF)
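When debugging rejected input, it can help to know which of the causes above applies. This hypothetical `diagnoseUtf8` helper (not a library API) walks the bytes and reports the first problem and its byte offset. As an assumption of this sketch, a stray continuation byte (0x80 through 0xBF) in lead position is also reported as an invalid start byte.

```typescript
// Hypothetical helper: names the first UTF-8 problem in bytes, or returns null if valid.
type Utf8Problem =
  | "invalid start byte"
  | "incomplete sequence"
  | "invalid continuation byte"
  | "overlong encoding"
  | "invalid code point";

function diagnoseUtf8(bytes: Uint8Array): { index: number; problem: Utf8Problem } | null {
  let i = 0;
  while (i < bytes.length) {
    const b = bytes[i];
    let need: number; // continuation bytes expected
    let cp: number; // payload bits from the lead byte
    let min: number; // smallest code point allowed for this length
    if (b < 0x80) { need = 0; cp = b; min = 0; }
    else if (b >= 0xc0 && b < 0xe0) { need = 1; cp = b & 0x1f; min = 0x80; }
    else if (b >= 0xe0 && b < 0xf0) { need = 2; cp = b & 0x0f; min = 0x800; }
    else if (b >= 0xf0 && b < 0xf8) { need = 3; cp = b & 0x07; min = 0x10000; }
    else return { index: i, problem: "invalid start byte" };
    for (let k = 1; k <= need; k++) {
      if (i + k >= bytes.length) return { index: i, problem: "incomplete sequence" };
      const c = bytes[i + k];
      if ((c & 0xc0) !== 0x80) return { index: i + k, problem: "invalid continuation byte" };
      cp = (cp << 6) | (c & 0x3f);
    }
    if (cp < min) return { index: i, problem: "overlong encoding" };
    if (cp > 0x10ffff || (cp >= 0xd800 && cp <= 0xdfff)) {
      return { index: i, problem: "invalid code point" };
    }
    i += need + 1;
  }
  return null;
}
```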

Performance Characteristics

  • Zero Dependencies: No runtime dependencies for maximum compatibility
  • Efficient Encoding: The output buffer is allocated once at its exact, pre-calculated size and then filled in a single pass over the string
  • Validation: Comprehensive validation without performance degradation
  • Memory Safe: Proper bounds checking for all array access
  • TypeScript: Full type safety with accurate type definitions