tessl/npm-stablelib--utf8

UTF-8 encoder and decoder for robust text processing with validation

Workspace: tessl
Visibility: Public
Created: 3 months ago
Last updated: 3 months ago
Describes: pkg:npm/@stablelib/utf8@2.0.x

To install, run

npx @tessl/cli install tessl/npm-stablelib--utf8@2.0.0

@stablelib/utf8

@stablelib/utf8 provides robust UTF-8 encoding and decoding functionality implemented in TypeScript. It handles conversion between JavaScript strings and UTF-8 byte arrays with comprehensive validation of both UTF-16 surrogate pairs and UTF-8 byte sequences.

Package Information

Package Name: @stablelib/utf8
Package Type: npm
Language: TypeScript
Installation: npm install @stablelib/utf8

Core Imports

import { encode, decode, encodedLength } from "@stablelib/utf8";

For CommonJS:

const { encode, decode, encodedLength } = require("@stablelib/utf8");

Basic Usage

import { encode, decode, encodedLength } from "@stablelib/utf8";

// Encode a string to UTF-8 bytes
const text = "Hello, 世界! 🌍";
const bytes = encode(text);

// Calculate encoded length without encoding
const length = encodedLength(text);
console.log(length === bytes.length); // true

// Decode UTF-8 bytes back to string
const decoded = decode(bytes);
console.log(decoded === text); // true

// Handle validation errors
try {
  // This will throw for invalid UTF-16 input
  encode("Invalid surrogate pair: \uD800");
} catch (error) {
  // Under strict TypeScript settings the catch variable is `unknown`, so narrow it first
  console.error((error as Error).message); // "utf8: invalid string"
}

Capabilities

String Encoding

Converts JavaScript strings to UTF-8 byte arrays with validation.

/**
 * Encodes the given string into UTF-8 byte array.
 * Throws if the source string has invalid UTF-16 encoding.
 * @param s - The string to encode
 * @returns UTF-8 encoded byte array
 * @throws Error with message "utf8: invalid string" for invalid UTF-16
 */
function encode(s: string): Uint8Array;

Usage Examples:

import { encode } from "@stablelib/utf8";

// Basic ASCII
const ascii = encode("Hello");
// Result: Uint8Array([72, 101, 108, 108, 111])

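// Unicode characters (three bytes per character here)
const unicode = encode("こんにちは");
// Result: Uint8Array([227, 129, 147, 227, 130, 147, 227, 129, 171, 227, 129, 161, 227, 129, 175])

// Emoji stored as a UTF-16 surrogate pair (four bytes in UTF-8)
const emoji = encode("🌍");
// Result: Uint8Array([240, 159, 140, 141])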

// Error handling
try {
  encode("Invalid: \uD800"); // Lone high surrogate
} catch (error) {
  console.error((error as Error).message); // "utf8: invalid string"
}
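
To make the emoji case concrete, the sketch below works through how one surrogate pair becomes four UTF-8 bytes. It is a hand-written illustration of the standard bit layout and does not call the library; `encodeAstral` is a hypothetical helper name, and it assumes its input is a valid high/low surrogate pair.

```typescript
// Hypothetical helper (not part of @stablelib/utf8): converts one valid
// UTF-16 surrogate pair into its four UTF-8 bytes.
function encodeAstral(pair: string): number[] {
  const hi = pair.charCodeAt(0); // high surrogate, 0xD800-0xDBFF
  const lo = pair.charCodeAt(1); // low surrogate, 0xDC00-0xDFFF
  // Recombine the two 10-bit halves into a code point above U+FFFF.
  const cp = 0x10000 + ((hi & 0x3ff) << 10) + (lo & 0x3ff);
  return [
    0xf0 | (cp >> 18),          // 11110xxx
    0x80 | ((cp >> 12) & 0x3f), // 10xxxxxx
    0x80 | ((cp >> 6) & 0x3f),  // 10xxxxxx
    0x80 | (cp & 0x3f),         // 10xxxxxx
  ];
}

const earthBytes = encodeAstral("🌍"); // U+1F30D
console.log(earthBytes.map((b) => b.toString(16)).join(" ")); // f0 9f 8c 8d
```

The four bytes are the hexadecimal form of 240, 159, 140, 141.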

Byte Decoding

Converts UTF-8 byte arrays back to JavaScript strings with validation.

/**
 * Decodes the given byte array from UTF-8 into a string.
 * Throws if encoding is invalid.
 * @param arr - The UTF-8 byte array to decode
 * @returns Decoded string
 * @throws Error with message "utf8: invalid source encoding" for invalid UTF-8
 */
function decode(arr: Uint8Array): string;

Usage Examples:

import { decode } from "@stablelib/utf8";

// Basic decoding
const bytes = new Uint8Array([72, 101, 108, 108, 111]);
const text = decode(bytes);
// Result: "Hello"

// Unicode decoding
const unicodeBytes = new Uint8Array([227, 129, 147, 227, 130, 147, 227, 129, 171, 227, 129, 161, 227, 129, 175]);
const unicodeText = decode(unicodeBytes);
// Result: "こんにちは"

// Error handling
try {
  decode(new Uint8Array([0xFF])); // Invalid UTF-8 byte
} catch (error) {
  console.error((error as Error).message); // "utf8: invalid source encoding"
}
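
As a worked example of what decoding does with a multi-byte sequence, the sketch below reassembles the first three bytes of the Japanese text above into a single code point. `decodeThreeBytes` is a hypothetical helper written for illustration only; it is not the library's code and performs none of the validation that `decode()` does.

```typescript
// Hypothetical helper (not part of @stablelib/utf8): decodes one well-formed
// three-byte UTF-8 sequence into a code point. No validation is performed.
function decodeThreeBytes(b0: number, b1: number, b2: number): number {
  // 1110xxxx 10yyyyyy 10zzzzzz -> xxxxyyyyyyzzzzzz
  return ((b0 & 0x0f) << 12) | ((b1 & 0x3f) << 6) | (b2 & 0x3f);
}

const codePoint = decodeThreeBytes(227, 129, 147);
console.log(codePoint.toString(16));         // 3053
console.log(String.fromCharCode(codePoint)); // こ
```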

Length Calculation

Calculates the number of bytes required to encode a string without performing the actual encoding.

/**
 * Returns the number of bytes required to encode the given string into UTF-8.
 * Throws if the source string has invalid UTF-16 encoding.
 * @param s - The string to measure
 * @returns Number of bytes needed for UTF-8 encoding
 * @throws Error with message "utf8: invalid string" for invalid UTF-16
 */
function encodedLength(s: string): number;

Usage Examples:

import { encodedLength, encode } from "@stablelib/utf8";

// Calculate length for memory allocation
const text = "Hello, 世界!";
const length = encodedLength(text);
console.log(length); // 14 bytes (8 ASCII bytes + 3 bytes each for 世 and 界)

// Verify length matches actual encoding
const encoded = encode(text);
console.log(length === encoded.length); // true

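// Size guard - reject oversized input before allocating the output buffer
const MAX_BUFFER_SIZE = 1024; // example limit in bytes, chosen for illustration
function encodeBounded(input: string): Uint8Array {
  if (encodedLength(input) > MAX_BUFFER_SIZE) {
    throw new Error("Text too large to encode");
  }
  return encode(input);
}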
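
The byte count follows directly from the code point ranges that UTF-8 defines. The sketch below shows that arithmetic; `utf8ByteLength` is a hypothetical name, it is not the library's implementation, and unlike `encodedLength()` it assumes well-formed input and never throws.

```typescript
// Hypothetical sketch (not the library's code): counts UTF-8 bytes per
// code point. Assumes the input contains no lone surrogates.
function utf8ByteLength(s: string): number {
  let total = 0;
  for (let i = 0; i < s.length; i++) {
    const c = s.charCodeAt(i);
    if (c < 0x80) {
      total += 1; // ASCII
    } else if (c < 0x800) {
      total += 2; // U+0080 to U+07FF
    } else if (c >= 0xd800 && c <= 0xdbff) {
      total += 4; // surrogate pair: one code point above U+FFFF
      i++;        // skip the low surrogate
    } else {
      total += 3; // rest of the Basic Multilingual Plane
    }
  }
  return total;
}

console.log(utf8ByteLength("Hello, 世界!")); // 14
console.log(utf8ByteLength("🌍"));           // 4
```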

Error Handling

The library validates its input and throws an Error with one of two fixed messages, depending on which side of the conversion failed:

UTF-16 Validation Errors

Thrown by encode() and encodedLength() for invalid UTF-16 input:

Error Message: "utf8: invalid string"
Common Causes:
- Lone high surrogate (0xD800-0xDBFF) without matching low surrogate
- Lone low surrogate (0xDC00-0xDFFF) without preceding high surrogate
- Invalid surrogate pair sequences
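
The causes listed above can be checked for ahead of time. The following is a minimal sketch of such a check, assuming you want to detect unpaired surrogates without relying on try/catch; `hasLoneSurrogate` is a hypothetical helper, not part of the library, and the exact set of strings the library rejects may differ from what this function flags.

```typescript
// Hypothetical helper (not part of @stablelib/utf8): returns true if the
// string contains a surrogate code unit that is not part of a valid pair.
function hasLoneSurrogate(s: string): boolean {
  for (let i = 0; i < s.length; i++) {
    const c = s.charCodeAt(i);
    if (c >= 0xd800 && c <= 0xdbff) {
      // High surrogate: the next unit must be a low surrogate.
      const next = i + 1 < s.length ? s.charCodeAt(i + 1) : 0;
      if (next < 0xdc00 || next > 0xdfff) return true;
      i++; // valid pair, skip the low surrogate
    } else if (c >= 0xdc00 && c <= 0xdfff) {
      return true; // low surrogate with no preceding high surrogate
    }
  }
  return false;
}

console.log(hasLoneSurrogate("🌍"));           // false
console.log(hasLoneSurrogate("\uD800"));       // true
console.log(hasLoneSurrogate("\uDC00abc"));    // true
console.log(hasLoneSurrogate("\uD800\uD800")); // true
```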

UTF-8 Validation Errors

Thrown by decode() for invalid UTF-8 byte sequences:

Error Message: "utf8: invalid source encoding"
Common Causes:
- Invalid start bytes (0xFE, 0xFF)
- Incomplete multi-byte sequences
- Invalid continuation bytes
- Overlong encodings
- Invalid code points (surrogate range, above U+10FFFF)
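
For reference, here is one concrete byte sequence for each cause listed above. The sketch uses the platform's built-in `TextDecoder` in fatal mode as a stand-in validator rather than the library's `decode()`, so it shows which sequences are malformed UTF-8 in general; `isWellFormedUtf8` is a hypothetical helper name, and the library's behavior on each input is not verified here.

```typescript
// Stand-in check using the platform's TextDecoder (not @stablelib/utf8).
// With { fatal: true }, decode() throws a TypeError on malformed input.
function isWellFormedUtf8(bytes: number[]): boolean {
  try {
    new TextDecoder("utf-8", { fatal: true }).decode(new Uint8Array(bytes));
    return true;
  } catch {
    return false;
  }
}

const malformed: Array<[string, number[]]> = [
  ["invalid start byte", [0xff]],
  ["incomplete sequence", [0xe4, 0xb8]], // 世 is E4 B8 96
  ["invalid continuation byte", [0xe4, 0x41, 0x96]],
  ["overlong encoding of '/'", [0xc0, 0xaf]],
  ["surrogate code point U+D800", [0xed, 0xa0, 0x80]],
  ["code point above U+10FFFF", [0xf4, 0x90, 0x80, 0x80]],
];

for (const [label, bytes] of malformed) {
  console.log(label, isWellFormedUtf8(bytes)); // false for every entry
}
console.log(isWellFormedUtf8([0xe4, 0xb8, 0x96])); // true
```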

Performance Characteristics

Zero Dependencies: No runtime dependencies for maximum compatibility
Efficient Encoding: The required output size is calculated up front, so encoding writes into a single pre-sized buffer
Validation: Input is validated as part of encoding and decoding, with no separate validation step to call
Memory Safe: Proper bounds checking for all array access
TypeScript: Full type safety with accurate type definitions
