UTF-8 encoder and decoder for robust text processing with validation
npx @tessl/cli install tessl/npm-stablelib--utf8@2.0.0

@stablelib/utf8 provides robust UTF-8 encoding and decoding functionality implemented in TypeScript. It handles conversion between JavaScript strings and UTF-8 byte arrays with comprehensive validation of both UTF-16 surrogate pairs and UTF-8 byte sequences.

npm install @stablelib/utf8

import { encode, decode, encodedLength } from "@stablelib/utf8";

For CommonJS:

const { encode, decode, encodedLength } = require("@stablelib/utf8");

import { encode, decode, encodedLength } from "@stablelib/utf8";
// Encode a string to UTF-8 bytes
const text = "Hello, 世界! 🌍";
const bytes = encode(text);
// Calculate encoded length without encoding
const length = encodedLength(text);
console.log(length === bytes.length); // true
// Decode UTF-8 bytes back to string
const decoded = decode(bytes);
console.log(decoded === text); // true
// Handle validation errors
try {
  // This will throw for invalid UTF-16 input
  encode("Invalid surrogate pair: \uD800");
} catch (error) {
  console.error(error.message); // "utf8: invalid string"
}

Converts JavaScript strings to UTF-8 byte arrays with validation.
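To make the validation concrete, here is a minimal sketch of how a UTF-16 string can be turned into UTF-8 bytes while rejecting unpaired surrogates. It is an illustration only and is not taken from the library's source: the name `encodeSketch` is hypothetical, and the error text simply mirrors the documented message.

```typescript
// Hypothetical sketch of UTF-16 -> UTF-8 encoding; not the library's actual code.
function encodeSketch(s: string): Uint8Array {
  const out: number[] = [];
  for (let i = 0; i < s.length; i++) {
    let c = s.charCodeAt(i);
    if (c >= 0xd800 && c <= 0xdbff) {
      // High surrogate: it must be followed by a low surrogate.
      const next = i + 1 < s.length ? s.charCodeAt(i + 1) : 0;
      if (next < 0xdc00 || next > 0xdfff) {
        throw new Error("utf8: invalid string");
      }
      // Combine the pair into a single code point above U+FFFF.
      c = 0x10000 + ((c - 0xd800) << 10) + (next - 0xdc00);
      i++;
    } else if (c >= 0xdc00 && c <= 0xdfff) {
      // Low surrogate without a preceding high surrogate.
      throw new Error("utf8: invalid string");
    }
    if (c < 0x80) {
      out.push(c); // 1 byte
    } else if (c < 0x800) {
      out.push(0xc0 | (c >> 6), 0x80 | (c & 0x3f)); // 2 bytes
    } else if (c < 0x10000) {
      out.push(0xe0 | (c >> 12), 0x80 | ((c >> 6) & 0x3f), 0x80 | (c & 0x3f)); // 3 bytes
    } else {
      out.push(
        0xf0 | (c >> 18),
        0x80 | ((c >> 12) & 0x3f),
        0x80 | ((c >> 6) & 0x3f),
        0x80 | (c & 0x3f)
      ); // 4 bytes
    }
  }
  return new Uint8Array(out);
}

const euro = encodeSketch("\u20ac"); // U+20AC, a three-byte character
const globe = encodeSketch("\ud83c\udf0d"); // U+1F30D, written as a surrogate pair
```

In real code, call the library's `encode` instead; the sketch only shows why a lone `\uD800` cannot be encoded.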
/**
 * Encodes the given string into UTF-8 byte array.
 * Throws if the source string has invalid UTF-16 encoding.
 * @param s - The string to encode
 * @returns UTF-8 encoded byte array
 * @throws Error with message "utf8: invalid string" for invalid UTF-16
 */
function encode(s: string): Uint8Array;

Usage Examples:
import { encode } from "@stablelib/utf8";
// Basic ASCII
const ascii = encode("Hello");
// Result: Uint8Array([72, 101, 108, 108, 111])
// Unicode characters
const unicode = encode("こんにちは");
// Result: UTF-8 encoded bytes for Japanese text
// Emoji with surrogate pairs
const emoji = encode("🌍");
// Result: UTF-8 encoded bytes for Earth emoji
// Error handling
try {
  encode("Invalid: \uD800"); // Lone high surrogate
} catch (error) {
  console.error(error.message); // "utf8: invalid string"
}

Converts UTF-8 byte arrays back to JavaScript strings with validation.
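As an illustration of what "invalid UTF-8" can mean, the sketch below checks a byte array against the general UTF-8 rules: valid lead bytes, the right number of continuation bytes, no overlong forms, no encoded surrogates, and nothing above U+10FFFF. Exactly which of these cases the library rejects is not stated here, so treat this as an assumption. The function name `isValidUtf8` is hypothetical and the code is not the library's implementation.

```typescript
// Hypothetical validity check based on the general UTF-8 rules;
// not the library's actual decoding code.
function isValidUtf8(bytes: Uint8Array): boolean {
  let i = 0;
  while (i < bytes.length) {
    const b = bytes[i];
    if (b < 0x80) {
      i++; // single-byte (ASCII) character
      continue;
    }
    let extra = 0; // number of continuation bytes expected
    let min = 0; // smallest code point allowed for this sequence length
    let cp = 0; // code point being assembled
    if ((b & 0xe0) === 0xc0) {
      extra = 1; min = 0x80; cp = b & 0x1f;
    } else if ((b & 0xf0) === 0xe0) {
      extra = 2; min = 0x800; cp = b & 0x0f;
    } else if ((b & 0xf8) === 0xf0) {
      extra = 3; min = 0x10000; cp = b & 0x07;
    } else {
      return false; // stray continuation byte or an invalid lead byte such as 0xFF
    }
    if (i + extra >= bytes.length) return false; // sequence is truncated
    for (let k = 1; k <= extra; k++) {
      const next = bytes[i + k];
      if ((next & 0xc0) !== 0x80) return false; // not a continuation byte
      cp = (cp << 6) | (next & 0x3f);
    }
    if (cp < min) return false; // overlong encoding
    if (cp >= 0xd800 && cp <= 0xdfff) return false; // encoded surrogate
    if (cp > 0x10ffff) return false; // beyond the Unicode range
    i += extra + 1;
  }
  return true;
}

const okBytes = isValidUtf8(new Uint8Array([227, 129, 147])); // one three-byte character
const badBytes = isValidUtf8(new Uint8Array([0xff])); // 0xFF is never valid in UTF-8
```

A check like this only reports validity; the library's `decode` both validates and produces the string, throwing on failure.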
/**
 * Decodes the given byte array from UTF-8 into a string.
 * Throws if encoding is invalid.
 * @param arr - The UTF-8 byte array to decode
 * @returns Decoded string
 * @throws Error with message "utf8: invalid source encoding" for invalid UTF-8
 */
function decode(arr: Uint8Array): string;

Usage Examples:
import { decode } from "@stablelib/utf8";
// Basic decoding
const bytes = new Uint8Array([72, 101, 108, 108, 111]);
const text = decode(bytes);
// Result: "Hello"
// Unicode decoding
const unicodeBytes = new Uint8Array([227, 129, 147, 227, 130, 147, 227, 129, 171, 227, 129, 161, 227, 129, 175]);
const unicodeText = decode(unicodeBytes);
// Result: "こんにちは"
// Error handling
try {
  decode(new Uint8Array([0xFF])); // Invalid UTF-8 byte
} catch (error) {
  console.error(error.message); // "utf8: invalid source encoding"
}

Calculates the number of bytes required to encode a string without performing the actual encoding.
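The byte count can be worked out from the UTF-16 code units alone, which is why no output buffer is needed. The following is a minimal sketch of that idea, not the library's code; `encodedLengthSketch` is a hypothetical name, and the error text mirrors the documented message.

```typescript
// Hypothetical sketch of counting UTF-8 bytes without encoding;
// not the library's actual code.
function encodedLengthSketch(s: string): number {
  let n = 0;
  for (let i = 0; i < s.length; i++) {
    const c = s.charCodeAt(i);
    if (c < 0x80) {
      n += 1; // ASCII
    } else if (c < 0x800) {
      n += 2;
    } else if (c < 0xd800 || c > 0xdfff) {
      n += 3; // rest of the Basic Multilingual Plane, excluding surrogates
    } else {
      // Surrogate: only valid as a high surrogate followed by a low surrogate.
      if (c > 0xdbff || i + 1 >= s.length) {
        throw new Error("utf8: invalid string");
      }
      const next = s.charCodeAt(i + 1);
      if (next < 0xdc00 || next > 0xdfff) {
        throw new Error("utf8: invalid string");
      }
      i++; // consume the low surrogate
      n += 4; // a surrogate pair always encodes to four bytes
    }
  }
  return n;
}

// "Hello, " is 7 bytes, each of the two CJK characters is 3, and "!" is 1.
const sketchLength = encodedLengthSketch("Hello, \u4e16\u754c!"); // 14
```

Because the count and the validation happen in a single pass, a caller can size or reject a buffer before doing any encoding work.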
/**
 * Returns the number of bytes required to encode the given string into UTF-8.
 * Throws if the source string has invalid UTF-16 encoding.
 * @param s - The string to measure
 * @returns Number of bytes needed for UTF-8 encoding
 * @throws Error with message "utf8: invalid string" for invalid UTF-16
 */
function encodedLength(s: string): number;

Usage Examples:
import { encodedLength, encode } from "@stablelib/utf8";
// Calculate length for memory allocation
const text = "Hello, 世界!";
const length = encodedLength(text);
console.log(length); // 14 bytes: 8 ASCII characters plus two 3-byte characters
// Verify length matches actual encoding
const encoded = encode(text);
console.log(length === encoded.length); // true
// Performance optimization - check size before encoding
// (largeText and MAX_BUFFER_SIZE are placeholders supplied by the caller)
if (encodedLength(largeText) > MAX_BUFFER_SIZE) {
  throw new Error("Text too large to encode");
}

The library reports validation failures by throwing an Error with one of two fixed messages:
Thrown by encode() and encodedLength() for invalid UTF-16 input:
"utf8: invalid string"

Thrown by decode() for invalid UTF-8 byte sequences:
"utf8: invalid source encoding"