String utility functions for Ethereum development, focusing on UTF-8 encoding/decoding, Bytes32 string formatting, and Unicode normalization.
npx @tessl/cli install tessl/npm-ethersproject--strings@5.8.0

The package converts safely between UTF-8 data, JavaScript strings, and Bytes32 strings, combining explicit handling of encoding errors with gas-efficient storage patterns for blockchain applications.
npm install @ethersproject/strings

import {
  toUtf8Bytes,
  toUtf8String,
  toUtf8CodePoints,
  formatBytes32String,
  parseBytes32String,
  nameprep,
  UnicodeNormalizationForm,
  Utf8ErrorFuncs,
  Utf8ErrorReason,
  type Utf8ErrorFunc
} from "@ethersproject/strings";

For CommonJS:
const {
  toUtf8Bytes,
  toUtf8String,
  formatBytes32String,
  parseBytes32String,
  nameprep
} = require("@ethersproject/strings");

import {
  toUtf8Bytes,
  toUtf8String,
  formatBytes32String,
  parseBytes32String
} from "@ethersproject/strings";

// Convert string to UTF-8 bytes
const message = "Hello, Ethereum!";
const bytes = toUtf8Bytes(message);
console.log(bytes); // Uint8Array
// Convert UTF-8 bytes back to string
const decoded = toUtf8String(bytes);
console.log(decoded); // "Hello, Ethereum!"
// Format short string for efficient on-chain storage
const bytes32 = formatBytes32String("ENS");
console.log(bytes32); // "0x454e530000000000000000000000000000000000000000000000000000000000"
// Parse bytes32 back to string
const parsed = parseBytes32String(bytes32);
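console.log(parsed); // "ENS"

The @ethersproject/strings package is organized around three core capabilities: UTF-8 encoding and decoding, Bytes32 string formatting, and Unicode normalization for names (nameprep).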
Safe conversion between JavaScript strings and UTF-8 encoded bytes with comprehensive error handling strategies.
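The conversion needs care because a JavaScript string is a sequence of UTF-16 code units, so its `length` often differs from the number of bytes it occupies once UTF-8 encoded. The sketch below illustrates this with the platform's built-in `TextEncoder` instead of this package (an assumption made so that it runs standalone); `utf8ByteLength` is a hypothetical helper name.

```typescript
// Standalone sketch: built-in TextEncoder, not @ethersproject/strings.
// utf8ByteLength is a hypothetical helper used only for illustration.
function utf8ByteLength(text: string): number {
  return new TextEncoder().encode(text).length;
}

const samples: string[] = ["Hello", "世界", "🚀"];
for (let i = 0; i < samples.length; i++) {
  const s = samples[i];
  // s.length counts UTF-16 code units; utf8ByteLength counts encoded bytes.
  console.log(JSON.stringify(s), s.length, utf8ByteLength(s));
}
```

Here "世界" has a `length` of 2 but occupies 6 bytes, and "🚀" has a `length` of 2 but occupies 4 bytes.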
/**
* Converts a JavaScript string to UTF-8 encoded bytes
* @param str - JavaScript string to encode
* @param form - Optional Unicode normalization form (default: current)
* @returns UTF-8 encoded bytes as Uint8Array
*/
function toUtf8Bytes(
  str: string,
  form?: UnicodeNormalizationForm
): Uint8Array;
/**
* Converts UTF-8 encoded bytes to a JavaScript string
* @param bytes - UTF-8 encoded bytes to decode
* @param onError - Optional error handling function
* @returns Decoded JavaScript string
*/
function toUtf8String(
  bytes: BytesLike,
  onError?: Utf8ErrorFunc
): string;
/**
* Converts a JavaScript string to an array of Unicode code points
* @param str - JavaScript string to convert
* @param form - Optional Unicode normalization form (default: current)
* @returns Array of Unicode code point values
*/
function toUtf8CodePoints(
  str: string,
  form?: UnicodeNormalizationForm
): Array<number>;
/**
* Internal function to convert bytes to escaped UTF-8 string representation
* @param bytes - Bytes to convert
* @param onError - Optional error handling function
* @returns Escaped string representation with proper JSON encoding
*/
function _toEscapedUtf8String(
  bytes: BytesLike,
  onError?: Utf8ErrorFunc
): string;

Usage Examples:
import { toUtf8Bytes, toUtf8String, toUtf8CodePoints, UnicodeNormalizationForm } from "@ethersproject/strings";
// Basic encoding/decoding
const text = "Hello 世界";
const bytes = toUtf8Bytes(text);
const decoded = toUtf8String(bytes);
// With Unicode normalization
const normalizedBytes = toUtf8Bytes("café", UnicodeNormalizationForm.NFC);
// Get code points
const codePoints = toUtf8CodePoints("🚀");
console.log(codePoints); // [128640]
// Error handling with custom function
const malformedBytes = new Uint8Array([0xff, 0xfe]);
const safeDecoded = toUtf8String(malformedBytes, (reason, offset, bytes, output) => {
  console.log(`UTF-8 error: ${reason} at offset ${offset}`);
  return 0; // Skip no additional bytes; the offending byte is already consumed
});

Efficient formatting and parsing of strings for on-chain storage using a 32-byte fixed-length format.
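The fixed-length layout can be pictured as the UTF-8 bytes of the text followed by zero padding up to 32 bytes, with the final byte always left as a zero terminator. The following is a minimal standalone sketch of that layout under those assumptions, built on the platform's `TextEncoder` and `TextDecoder`; `sketchFormatBytes32` and `sketchParseBytes32` are hypothetical names, not the package's implementation.

```typescript
// Standalone sketch of the bytes32 string layout. Both function names are
// hypothetical; use the package's formatBytes32String and parseBytes32String
// in real code.
function sketchFormatBytes32(text: string): string {
  const bytes = new TextEncoder().encode(text);
  // The last byte must stay zero as a terminator, so at most 31 bytes fit.
  if (bytes.length > 31) {
    throw new Error("text needs more than 31 UTF-8 bytes");
  }
  const padded = new Uint8Array(32); // zero-filled by default
  padded.set(bytes);
  let hex = "0x";
  for (let i = 0; i < padded.length; i++) {
    hex += (padded[i] < 16 ? "0" : "") + padded[i].toString(16);
  }
  return hex;
}

function sketchParseBytes32(hex: string): string {
  if (!/^0x[0-9a-fA-F]{64}$/.test(hex)) {
    throw new Error("expected exactly 32 bytes of hex data");
  }
  const data = new Uint8Array(32);
  for (let i = 0; i < 32; i++) {
    data[i] = parseInt(hex.slice(2 + 2 * i, 4 + 2 * i), 16);
  }
  if (data[31] !== 0) {
    throw new Error("missing null terminator");
  }
  // Drop the trailing zero padding, then decode what remains.
  let length = 31;
  while (length > 0 && data[length - 1] === 0) {
    length--;
  }
  return new TextDecoder().decode(data.subarray(0, length));
}

const encoded = sketchFormatBytes32("ENS");
console.log(encoded);
console.log(sketchParseBytes32(encoded)); // "ENS"
```

Because the limit is counted in UTF-8 bytes, a string with fewer than 32 characters can still be rejected when it contains multi-byte characters.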
/**
* Formats a string as a bytes32 hex string for efficient on-chain storage
* @param text - String to format (must be ≤31 bytes when UTF-8 encoded)
* @returns Hex-encoded bytes32 string (32 bytes, null-terminated)
* @throws Error if string is too long (>31 bytes)
*/
function formatBytes32String(text: string): string;
/**
* Parses a bytes32 hex string back to its original string value
* @param bytes - Bytes32 data to parse (must be exactly 32 bytes)
* @returns Original string value
* @throws Error if not 32 bytes or missing null terminator
*/
function parseBytes32String(bytes: BytesLike): string;

Usage Examples:
import { formatBytes32String, parseBytes32String } from "@ethersproject/strings";
// Format string for on-chain storage
const contractName = "MyToken";
const bytes32Name = formatBytes32String(contractName);
console.log(bytes32Name);
// "0x4d79546f6b656e00000000000000000000000000000000000000000000000000"
// Parse back to original string
const originalName = parseBytes32String(bytes32Name);
console.log(originalName); // "MyToken"
// Error cases
try {
  formatBytes32String("This string is way too long to fit in 32 bytes");
} catch (error) {
  console.log("String too long for bytes32 format");
}
try {
  parseBytes32String("0x1234"); // Not 32 bytes
} catch (error) {
  console.log("Invalid bytes32 - not 32 bytes long");
}

Unicode normalization functionality for internationalized domain names following RFC 3491.
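Names that look identical can differ at the code-point level, which is the kind of mismatch nameprep is meant to remove. The sketch below uses only the built-in `String.prototype.normalize` and `toLowerCase` to show two of the ingredients, compatibility normalization and case folding. It is a rough approximation rather than RFC 3491 nameprep, and `roughlyFold` is a hypothetical name.

```typescript
// Rough approximation only: real nameprep also maps and prohibits specific
// code points. roughlyFold is a hypothetical helper, not a package export.
function roughlyFold(label: string): string {
  return label.normalize("NFKC").toLowerCase();
}

const composed = "B\u00fccher";    // "ü" as a single code point
const decomposed = "Bu\u0308cher"; // "u" followed by a combining diaeresis

console.log(composed === decomposed);                           // false
console.log(roughlyFold(composed) === roughlyFold(decomposed)); // true
```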
/**
* Applies nameprep algorithm for internationalized domain names (RFC 3491)
* @param value - String to process with nameprep
* @returns Processed string with case folding and normalization applied
* @throws Error for prohibited characters or invalid format
*/
function nameprep(value: string): string;

Usage Examples:
import { nameprep } from "@ethersproject/strings";
// Basic nameprep processing
const domain = "EXAMPLE.COM";
const processed = nameprep(domain);
console.log(processed); // "example.com"
// International domain names
const idn = "Bücher.example";
const processedIdn = nameprep(idn);
console.log(processedIdn); // Normalized form
// Error handling
try {
  nameprep("-example.com"); // A leading hyphen is rejected
} catch (error) {
  console.log("Invalid hyphen pattern");
}

Unicode normalization forms for string processing.
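To make the difference between the forms concrete, the sketch below applies each one with the built-in `String.prototype.normalize` (assumed here to match the forms named in the enum) and counts the resulting code points. The sample string and the `countCodePoints` helper are illustrative choices.

```typescript
// Counts code points (not UTF-16 code units) in a string.
function countCodePoints(text: string): number {
  return Array.from(text).length;
}

// "ﬁ" (U+FB01, a ligature) followed by "é" (U+00E9).
const sample = "\ufb01\u00e9";
const formNames: Array<"NFC" | "NFD" | "NFKC" | "NFKD"> = ["NFC", "NFD", "NFKC", "NFKD"];

for (let i = 0; i < formNames.length; i++) {
  console.log(formNames[i], countCodePoints(sample.normalize(formNames[i])));
}
// NFC 2, NFD 3, NFKC 3, NFKD 4
```

The canonical forms (NFC, NFD) leave the ligature intact and only compose or decompose the accent, while the compatibility forms (NFKC, NFKD) also expand the ligature into "f" and "i".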
enum UnicodeNormalizationForm {
  /** No normalization applied */
  current = "",
  /** Canonical Composition */
  NFC = "NFC",
  /** Canonical Decomposition */
  NFD = "NFD",
  /** Compatibility Composition */
  NFKC = "NFKC",
  /** Compatibility Decomposition */
  NFKD = "NFKD"
}

Essential types used throughout the package.
/**
* Type representing data that can be interpreted as bytes
* Accepts hex strings (e.g., "0x1234") or array-like structures containing numbers (0-255)
*/
type BytesLike = ArrayLike<number> | string;

Error handling types and constants for UTF-8 operations.
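The difference between a strict and a lenient decoding strategy can be previewed with the platform's `TextDecoder`, used here purely as a stand-in (an assumption; the package has its own decoder, and its results can differ in detail). With `fatal: true`, decoding throws on invalid input; by default, invalid bytes become U+FFFD.

```typescript
// Stand-in demo using the built-in TextDecoder, not @ethersproject/strings.
const invalidInput = new Uint8Array([0x48, 0x69, 0xff]); // "Hi" + invalid byte

// Strict decoding: throws on the first invalid sequence.
let strictThrew = false;
try {
  new TextDecoder("utf-8", { fatal: true }).decode(invalidInput);
} catch (error) {
  strictThrew = true;
}

// Lenient decoding: the invalid byte is replaced with U+FFFD.
const lenientText = new TextDecoder("utf-8").decode(invalidInput);

console.log(strictThrew);                // true
console.log(lenientText === "Hi\ufffd"); // true
```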
enum Utf8ErrorReason {
  /** A continuation byte was present where there was nothing to continue */
  UNEXPECTED_CONTINUE = "unexpected continuation byte",
  /** An invalid (non-continuation) byte to start a UTF-8 codepoint was found */
  BAD_PREFIX = "bad codepoint prefix",
  /** The string is too short to process the expected codepoint */
  OVERRUN = "string overrun",
  /** An expected continuation byte was not found */
  MISSING_CONTINUE = "missing continuation byte",
  /** The computed code point is outside the valid Unicode range (above 0x10FFFF) */
  OUT_OF_RANGE = "out of UTF-8 range",
  /** UTF-8 strings may not contain UTF-16 surrogate pairs */
  UTF16_SURROGATE = "UTF-16 surrogate",
  /** The code point was encoded with more bytes than necessary */
  OVERLONG = "overlong representation"
}
/**
* Function type for handling UTF-8 decoding errors
* @param reason - The type of error that occurred
* @param offset - Byte offset where the error occurred
* @param bytes - The input byte array being processed
* @param output - The output array being built
* @param badCodepoint - The invalid codepoint (if applicable)
* @returns Number of additional bytes to skip (the byte at offset is already consumed)
*/
type Utf8ErrorFunc = (
  reason: Utf8ErrorReason,
  offset: number,
  bytes: ArrayLike<number>,
  output: Array<number>,
  badCodepoint?: number
) => number;
/**
* Predefined error handling strategies for UTF-8 decoding
*/
const Utf8ErrorFuncs: {
  /** Throws an error on invalid UTF-8 (default behavior) */
  error: Utf8ErrorFunc;
  /** Skips invalid UTF-8 sequences silently */
  ignore: Utf8ErrorFunc;
  /** Replaces invalid UTF-8 with replacement character (U+FFFD) */
  replace: Utf8ErrorFunc;
};

Usage Examples:
import { toUtf8String, Utf8ErrorFuncs, Utf8ErrorReason } from "@ethersproject/strings";
// Using predefined error handlers
const malformedBytes = new Uint8Array([0xc0, 0x80]); // Invalid sequence
// Throw error (default)
try {
  toUtf8String(malformedBytes, Utf8ErrorFuncs.error);
} catch (error) {
  console.log("UTF-8 decode error");
}
// Ignore invalid sequences
const ignoredResult = toUtf8String(malformedBytes, Utf8ErrorFuncs.ignore);
// Replace with replacement character
const replacedResult = toUtf8String(malformedBytes, Utf8ErrorFuncs.replace);
// Custom error handler
const customHandler = (reason: Utf8ErrorReason, offset: number) => {
  console.log(`Custom handler: ${reason} at ${offset}`);
  return 1; // Skip 1 additional byte
};
const customResult = toUtf8String(malformedBytes, customHandler);

The package reports errors for invalid UTF-8 sequences, strings too long for the bytes32 format, malformed bytes32 data, and names rejected by nameprep.
All errors provide descriptive messages and maintain consistency with Ethereum ecosystem error patterns.