
tessl/npm-stablelib--utf8

UTF-8 encoder and decoder for robust text processing with validation


@stablelib/utf8

@stablelib/utf8 provides robust UTF-8 encoding and decoding functionality implemented in TypeScript. It handles conversion between JavaScript strings and UTF-8 byte arrays with comprehensive validation of both UTF-16 surrogate pairs and UTF-8 byte sequences.

Package Information

  • Package Name: @stablelib/utf8
  • Package Type: npm
  • Language: TypeScript
  • Installation: npm install @stablelib/utf8

Core Imports

import { encode, decode, encodedLength } from "@stablelib/utf8";

For CommonJS:

const { encode, decode, encodedLength } = require("@stablelib/utf8");

Basic Usage

import { encode, decode, encodedLength } from "@stablelib/utf8";

// Encode a string to UTF-8 bytes
const text = "Hello, 世界! 🌍";
const bytes = encode(text);

// Calculate encoded length without encoding
const length = encodedLength(text);
console.log(length === bytes.length); // true

// Decode UTF-8 bytes back to string
const decoded = decode(bytes);
console.log(decoded === text); // true

// Handle validation errors
try {
  // This will throw for invalid UTF-16 input
  encode("Invalid surrogate pair: \uD800");
} catch (error) {
  console.error((error as Error).message); // "utf8: invalid string"
}

Capabilities

String Encoding

Converts JavaScript strings to UTF-8 byte arrays with validation.

/**
 * Encodes the given string into UTF-8 byte array.
 * Throws if the source string has invalid UTF-16 encoding.
 * @param s - The string to encode
 * @returns UTF-8 encoded byte array
 * @throws Error with message "utf8: invalid string" for invalid UTF-16
 */
function encode(s: string): Uint8Array;

Usage Examples:

import { encode } from "@stablelib/utf8";

// Basic ASCII
const ascii = encode("Hello");
// Result: Uint8Array([72, 101, 108, 108, 111])

// Unicode characters
const unicode = encode("こんにちは");
// Result: 15 bytes (3 per character), starting with 227, 129, 147

// Emoji with surrogate pairs
const emoji = encode("🌍");
// Result: Uint8Array([240, 159, 140, 141])

// Error handling
try {
  encode("Invalid: \uD800"); // Lone high surrogate
} catch (error) {
  console.error((error as Error).message); // "utf8: invalid string"
}

Byte Decoding

Converts UTF-8 byte arrays back to JavaScript strings with validation.

/**
 * Decodes the given byte array from UTF-8 into a string.
 * Throws if encoding is invalid.
 * @param arr - The UTF-8 byte array to decode
 * @returns Decoded string
 * @throws Error with message "utf8: invalid source encoding" for invalid UTF-8
 */
function decode(arr: Uint8Array): string;

Usage Examples:

import { decode } from "@stablelib/utf8";

// Basic decoding
const bytes = new Uint8Array([72, 101, 108, 108, 111]);
const text = decode(bytes);
// Result: "Hello"

// Unicode decoding
const unicodeBytes = new Uint8Array([227, 129, 147, 227, 130, 147, 227, 129, 171, 227, 129, 161, 227, 129, 175]);
const unicodeText = decode(unicodeBytes);
// Result: "こんにちは"

// Error handling
try {
  decode(new Uint8Array([0xFF])); // Invalid UTF-8 byte
} catch (error) {
  console.error((error as Error).message); // "utf8: invalid source encoding"
}

Length Calculation

Calculates the number of bytes required to encode a string without performing the actual encoding.

/**
 * Returns the number of bytes required to encode the given string into UTF-8.
 * Throws if the source string has invalid UTF-16 encoding.
 * @param s - The string to measure
 * @returns Number of bytes needed for UTF-8 encoding
 * @throws Error with message "utf8: invalid string" for invalid UTF-16
 */
function encodedLength(s: string): number;

Usage Examples:

import { encodedLength, encode } from "@stablelib/utf8";

// Calculate length for memory allocation
const text = "Hello, 世界!";
const length = encodedLength(text);
console.log(length); // 14 bytes (8 ASCII bytes + 3 bytes each for 世 and 界)

// Verify length matches actual encoding
const encoded = encode(text);
console.log(length === encoded.length); // true

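// Size guard: check the encoded size before encoding.
// MAX_BUFFER_SIZE and encodeBounded are illustrative names, not part of the library.
const MAX_BUFFER_SIZE = 1024;
function encodeBounded(largeText: string): Uint8Array {
  if (encodedLength(largeText) > MAX_BUFFER_SIZE) {
    throw new Error("Text too large to encode");
  }
  return encode(largeText);
}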
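
The byte count follows directly from the UTF-8 size classes: code points below U+0080 take one byte, those below U+0800 take two, the rest of the Basic Multilingual Plane takes three, and supplementary code points (stored as surrogate pairs in JavaScript strings) take four. The sketch below illustrates that arithmetic in plain TypeScript. `utf8ByteLength` is a hypothetical helper written for this page, not the library's implementation, and it assumes well-formed input, whereas `encodedLength` throws on invalid UTF-16.

```typescript
// Hypothetical helper for illustration only; assumes well-formed UTF-16.
function utf8ByteLength(s: string): number {
  let total = 0;
  for (let i = 0; i < s.length; i++) {
    const unit = s.charCodeAt(i);
    if (unit < 0x80) {
      total += 1; // ASCII
    } else if (unit < 0x800) {
      total += 2; // U+0080..U+07FF
    } else if (unit >= 0xd800 && unit <= 0xdfff) {
      total += 2; // each half of a surrogate pair; the pair totals 4 bytes
    } else {
      total += 3; // remaining BMP code points
    }
  }
  return total;
}
```

With this sketch, `utf8ByteLength("世界")` gives 6 and `utf8ByteLength("🌍")` gives 4.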

Error Handling

The library reports validation failures by throwing an Error with one of two fixed messages, depending on the direction of the conversion:

UTF-16 Validation Errors

Thrown by encode() and encodedLength() for invalid UTF-16 input:

  • Error Message: "utf8: invalid string"
  • Common Causes:
    • Lone high surrogate (0xD800-0xDBFF) without matching low surrogate
    • Lone low surrogate (0xDC00-0xDFFF) without preceding high surrogate
    • Invalid surrogate pair sequences
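
Because the error message does not say where the string is malformed, it can help to locate the offending code unit before calling encode(). The following is a minimal sketch of such a check; `firstLoneSurrogate` is a hypothetical helper written for this page and is not exported by the library.

```typescript
// Hypothetical helper: returns the index of the first unpaired surrogate
// code unit in s, or -1 if every surrogate belongs to a valid pair.
function firstLoneSurrogate(s: string): number {
  for (let i = 0; i < s.length; i++) {
    const unit = s.charCodeAt(i);
    if (unit >= 0xd800 && unit <= 0xdbff) {
      // High surrogate: the next code unit must be a low surrogate.
      const next = i + 1 < s.length ? s.charCodeAt(i + 1) : 0;
      if (next >= 0xdc00 && next <= 0xdfff) {
        i++; // valid pair, skip the low surrogate
      } else {
        return i; // high surrogate without a matching low surrogate
      }
    } else if (unit >= 0xdc00 && unit <= 0xdfff) {
      return i; // low surrogate without a preceding high surrogate
    }
  }
  return -1;
}
```

For example, `firstLoneSurrogate("Invalid: \uD800")` returns 9, the position of the unpaired high surrogate. Newer JavaScript engines also offer `String.prototype.isWellFormed()` for a yes/no answer without the position.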

UTF-8 Validation Errors

Thrown by decode() for invalid UTF-8 byte sequences:

  • Error Message: "utf8: invalid source encoding"
  • Common Causes:
    • Invalid start bytes (0xFE, 0xFF)
    • Incomplete multi-byte sequences
    • Invalid continuation bytes
    • Overlong encodings
    • Invalid code points (surrogate range, above U+10FFFF)
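
Since decode() reports every one of these causes with the same message, a separate classifier can be useful when debugging bad input. The sketch below is a standalone check written for illustration: `utf8Problem` and its return strings are hypothetical, and it does not claim to mirror the library's internal checks or their order.

```typescript
// Hypothetical classifier: returns a short description of the first
// problem found in bytes, or null if the sequence is well-formed UTF-8.
function utf8Problem(bytes: Uint8Array): string | null {
  let i = 0;
  while (i < bytes.length) {
    const b = bytes[i];
    if (b < 0x80) {
      i++; // single-byte (ASCII) sequence
      continue;
    }
    let extra: number; // number of continuation bytes expected
    let min: number; // smallest code point allowed for this length
    let cp: number; // code point accumulated so far
    if (b >= 0xc0 && b < 0xe0) {
      extra = 1; min = 0x80; cp = b & 0x1f;
    } else if (b >= 0xe0 && b < 0xf0) {
      extra = 2; min = 0x800; cp = b & 0x0f;
    } else if (b >= 0xf0 && b < 0xf8) {
      extra = 3; min = 0x10000; cp = b & 0x07;
    } else {
      return "invalid start byte"; // 0x80..0xBF or 0xF8..0xFF
    }
    if (i + extra >= bytes.length) {
      return "incomplete sequence";
    }
    for (let k = 1; k <= extra; k++) {
      const c = bytes[i + k];
      if ((c & 0xc0) !== 0x80) {
        return "invalid continuation byte";
      }
      cp = (cp << 6) | (c & 0x3f);
    }
    if (cp < min) return "overlong encoding";
    if (cp >= 0xd800 && cp <= 0xdfff) return "surrogate code point";
    if (cp > 0x10ffff) return "code point above U+10FFFF";
    i += extra + 1;
  }
  return null;
}
```

For instance, `utf8Problem(new Uint8Array([0xC0, 0x80]))` returns "overlong encoding", because that two-byte sequence encodes U+0000, which fits in one byte.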

Performance Characteristics

  • Zero Dependencies: No runtime dependencies for maximum compatibility
  • Efficient Encoding: The output buffer is allocated once at its exact size, computed up front, and then filled in a single pass
  • Validation: Checks run as part of encoding and decoding, so callers need no separate validation step
  • Memory Safe: Proper bounds checking for all array access
  • TypeScript: Full type safety with accurate type definitions
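
The "size first, then fill" approach mentioned above can be sketched as follows. This is an illustration under stated assumptions, not the library's source: `encodeTwoStep` and its error text are hypothetical names chosen for this page.

```typescript
// Illustrative sketch only; function name and error text are hypothetical.
function encodeTwoStep(s: string): Uint8Array {
  // Step 1: compute the exact output size and reject unpaired surrogates.
  let size = 0;
  for (let i = 0; i < s.length; i++) {
    const u = s.charCodeAt(i);
    if (u < 0x80) size += 1;
    else if (u < 0x800) size += 2;
    else if (u < 0xd800 || u > 0xdfff) size += 3;
    else {
      const lo = i + 1 < s.length ? s.charCodeAt(i + 1) : 0;
      if (u > 0xdbff || lo < 0xdc00 || lo > 0xdfff) {
        throw new Error("ill-formed UTF-16");
      }
      size += 4;
      i++; // skip the low surrogate
    }
  }

  // Step 2: allocate once, then fill the buffer in a single pass.
  const out = new Uint8Array(size);
  let pos = 0;
  for (let i = 0; i < s.length; i++) {
    let cp = s.charCodeAt(i);
    if (cp >= 0xd800 && cp <= 0xdbff) {
      // Step 1 guarantees that a low surrogate follows.
      cp = 0x10000 + ((cp - 0xd800) << 10) + (s.charCodeAt(++i) - 0xdc00);
    }
    if (cp < 0x80) {
      out[pos++] = cp;
    } else if (cp < 0x800) {
      out[pos++] = 0xc0 | (cp >> 6);
      out[pos++] = 0x80 | (cp & 0x3f);
    } else if (cp < 0x10000) {
      out[pos++] = 0xe0 | (cp >> 12);
      out[pos++] = 0x80 | ((cp >> 6) & 0x3f);
      out[pos++] = 0x80 | (cp & 0x3f);
    } else {
      out[pos++] = 0xf0 | (cp >> 18);
      out[pos++] = 0x80 | ((cp >> 12) & 0x3f);
      out[pos++] = 0x80 | ((cp >> 6) & 0x3f);
      out[pos++] = 0x80 | (cp & 0x3f);
    }
  }
  return out;
}
```

Sizing the buffer up front avoids growing or copying the output while encoding, at the cost of scanning the string twice.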

Describes: npmpkg:npm/@stablelib/utf8@2.0.x