String utility functions for Ethereum development, focusing on UTF-8 encoding/decoding, Bytes32 string formatting, and Unicode normalization.
String utility functions for Ethereum development, focusing on safe conversion between UTF-8 data, JavaScript strings, and Bytes32 strings. This package provides essential string manipulation utilities with proper encoding safety and gas-efficient storage patterns for blockchain applications.
npm install @ethersproject/strings
import {
toUtf8Bytes,
toUtf8String,
toUtf8CodePoints,
formatBytes32String,
parseBytes32String,
nameprep,
UnicodeNormalizationForm,
Utf8ErrorFuncs,
Utf8ErrorReason,
type Utf8ErrorFunc
} from "@ethersproject/strings";
For CommonJS:
const {
toUtf8Bytes,
toUtf8String,
formatBytes32String,
parseBytes32String,
nameprep
} = require("@ethersproject/strings");
import {
toUtf8Bytes,
toUtf8String,
formatBytes32String,
parseBytes32String
} from "@ethersproject/strings";
// Convert string to UTF-8 bytes
const message = "Hello, Ethereum!";
const bytes = toUtf8Bytes(message);
console.log(bytes); // Uint8Array
// Convert UTF-8 bytes back to string
const decoded = toUtf8String(bytes);
console.log(decoded); // "Hello, Ethereum!"
// Format short string for efficient on-chain storage
const bytes32 = formatBytes32String("ENS");
console.log(bytes32); // "0x454e530000000000000000000000000000000000000000000000000000000000"
// Parse bytes32 back to string
const parsed = parseBytes32String(bytes32);
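console.log(parsed); // "ENS"
The @ethersproject/strings package is organized around three core capabilities: UTF-8 encoding and decoding, Bytes32 string formatting, and Unicode normalization for domain names.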
Safe conversion between JavaScript strings and UTF-8 encoded bytes with comprehensive error handling strategies.
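Because these functions work on UTF-8, a string's JavaScript `.length` (UTF-16 code units) is not its encoded byte length, which matters for limits such as the 31-byte cap on bytes32 strings. The standalone sketch below compares the three measures using only the built-in `TextEncoder` rather than this package; the `measure` helper is hypothetical.

```typescript
// Standalone sketch: three different ways of measuring the "length" of a string.
// Uses only the built-in TextEncoder (available in browsers and Node.js).
const utf8Encoder = new TextEncoder();

interface StringMeasure {
  utf16Units: number; // what String.prototype.length reports
  codePoints: number; // Unicode code points (string iteration is per code point)
  utf8Bytes: number; // bytes after UTF-8 encoding
}

// Hypothetical helper, not part of @ethersproject/strings
function measure(text: string): StringMeasure {
  return {
    utf16Units: text.length,
    codePoints: Array.from(text).length,
    utf8Bytes: utf8Encoder.encode(text).length,
  };
}

// "caf\u00e9" uses a precomposed é; "\u4e16\u754c" is 世界; "\u{1F680}" is 🚀
for (const sample of ["ENS", "caf\u00e9", "\u4e16\u754c", "\u{1F680}"]) {
  console.log(JSON.stringify(sample), measure(sample));
}
```

For `formatBytes32String`, it is the `utf8Bytes` figure that has to stay at or below 31.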
/**
* Converts a JavaScript string to UTF-8 encoded bytes
* @param str - JavaScript string to encode
* @param form - Optional Unicode normalization form (default: current)
* @returns UTF-8 encoded bytes as Uint8Array
*/
function toUtf8Bytes(
str: string,
form?: UnicodeNormalizationForm
): Uint8Array;
/**
* Converts UTF-8 encoded bytes to a JavaScript string
* @param bytes - UTF-8 encoded bytes to decode
* @param onError - Optional error handling function
* @returns Decoded JavaScript string
*/
function toUtf8String(
bytes: BytesLike,
onError?: Utf8ErrorFunc
): string;
/**
* Converts a JavaScript string to an array of Unicode code points
* @param str - JavaScript string to convert
* @param form - Optional Unicode normalization form (default: current)
* @returns Array of Unicode code points
*/
function toUtf8CodePoints(
str: string,
form?: UnicodeNormalizationForm
): Array<number>;
/**
* Internal helper that decodes UTF-8 bytes into a double-quoted, escaped string literal
* @param bytes - UTF-8 encoded bytes to decode
* @param onError - Optional error handling function
* @returns The decoded text wrapped in double quotes, with quotes, backslashes, control characters and non-ASCII code points escaped JSON-style
*/
function _toEscapedUtf8String(
bytes: BytesLike,
onError?: Utf8ErrorFunc
): string;
Usage Examples:
import { toUtf8Bytes, toUtf8String, toUtf8CodePoints, UnicodeNormalizationForm } from "@ethersproject/strings";
// Basic encoding/decoding
const text = "Hello 世界";
const bytes = toUtf8Bytes(text);
const decoded = toUtf8String(bytes);
// With Unicode normalization
const normalizedBytes = toUtf8Bytes("café", UnicodeNormalizationForm.NFC);
// Get code points
const codePoints = toUtf8CodePoints("🚀");
console.log(codePoints); // [128640]
// Error handling with custom function
const malformedBytes = new Uint8Array([0xff, 0xfe]);
const safeDecoded = toUtf8String(malformedBytes, (reason, offset, bytes, output) => {
console.log(`UTF-8 error: ${reason} at offset ${offset}`);
return 0; // Skip no additional bytes: the offending byte is dropped and decoding continues
});
Efficient formatting and parsing of strings for on-chain storage using a 32-byte fixed-length format.
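The bytes32 layout is simple enough to sketch without the library: the UTF-8 bytes go at the start of a 32-byte buffer, the rest is zero-filled, and the final byte must stay zero so the value is always terminated. The sketch below reproduces the rules documented for `formatBytes32String` and `parseBytes32String` using built-ins only; `encodeBytes32Sketch`, `decodeBytes32Sketch`, `toHex` and `fromHex` are hypothetical helpers, not the package's implementation.

```typescript
// Standalone sketch of the documented bytes32 string layout (not the package's code):
// UTF-8 bytes, zero-padded to 32 bytes, with the last byte always zero.
const utf8Enc = new TextEncoder();
const strictUtf8Dec = new TextDecoder("utf-8", { fatal: true });

// Hypothetical helper: bytes -> 0x-prefixed lowercase hex
function toHex(bytes: Uint8Array): string {
  return "0x" + Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join("");
}

// Hypothetical helper: 0x-prefixed, even-length hex -> bytes
function fromHex(hex: string): Uint8Array {
  if (!/^0x([0-9a-fA-F]{2})*$/.test(hex)) throw new Error("invalid hex string");
  const out = new Uint8Array((hex.length - 2) / 2);
  for (let i = 0; i < out.length; i++) {
    out[i] = parseInt(hex.slice(2 + 2 * i, 4 + 2 * i), 16);
  }
  return out;
}

function encodeBytes32Sketch(text: string): string {
  const utf8 = utf8Enc.encode(text);
  // At most 31 bytes, so byte 31 is always left as the zero terminator
  if (utf8.length > 31) throw new Error("text exceeds 31 UTF-8 bytes");
  const padded = new Uint8Array(32); // zero-filled on creation
  padded.set(utf8);
  return toHex(padded);
}

function decodeBytes32Sketch(hex: string): string {
  const data = fromHex(hex);
  if (data.length !== 32) throw new Error("not 32 bytes long");
  if (data[31] !== 0) throw new Error("missing null terminator");
  let end = 31;
  while (end > 0 && data[end - 1] === 0) end--; // strip the zero padding
  return strictUtf8Dec.decode(data.subarray(0, end));
}

const packed = encodeBytes32Sketch("ENS");
console.log(packed); // 0x454e53 followed by 58 zeros
console.log(decodeBytes32Sketch(packed)); // ENS
```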
/**
* Formats a string as a bytes32 hex string for efficient on-chain storage
* @param text - String to format (must be ≤31 bytes when UTF-8 encoded)
* @returns Hex-encoded bytes32 string (32 bytes, null-terminated)
* @throws Error if string is too long (>31 bytes)
*/
function formatBytes32String(text: string): string;
/**
* Parses a bytes32 hex string back to its original string value
* @param bytes - Bytes32 data to parse (must be exactly 32 bytes)
* @returns Original string value
* @throws Error if the data is not exactly 32 bytes or its last byte is not zero (no null terminator)
*/
function parseBytes32String(bytes: BytesLike): string;
Usage Examples:
import { formatBytes32String, parseBytes32String } from "@ethersproject/strings";
// Format string for on-chain storage
const contractName = "MyToken";
const bytes32Name = formatBytes32String(contractName);
console.log(bytes32Name);
// "0x4d79546f6b656e00000000000000000000000000000000000000000000000000"
// Parse back to original string
const originalName = parseBytes32String(bytes32Name);
console.log(originalName); // "MyToken"
// Error cases
try {
formatBytes32String("This string is way too long to fit in 32 bytes");
} catch (error) {
console.log("String too long for bytes32 format");
}
try {
parseBytes32String("0x1234"); // Not 32 bytes
} catch (error) {
console.log("Invalid bytes32 - not 32 bytes long");
}
Unicode normalization functionality for internationalized domain names following RFC 3491.
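Two of the steps nameprep performs, case folding and NFKC normalization, can be roughly approximated with built-ins, along with the IDNA hyphen rule from RFC 5891, section 4.2.3.1 (no leading or trailing hyphen, no "--" in the third and fourth positions). The sketch below is only an approximation under that assumption: `toLowerCase` is not identical to the RFC 3454 case-folding table, and the map-to-nothing, prohibited-character and unassigned-code-point checks are skipped entirely. The name `approximateNameprep` is hypothetical; use `nameprep` itself for real input.

```typescript
// Rough approximation of two nameprep steps plus the IDNA hyphen rule.
// NOT equivalent to nameprep: the RFC 3454 mapping/prohibition tables are omitted.
function approximateNameprep(value: string): string {
  // Case folding (approximated by toLowerCase), then compatibility normalization
  const name = value.toLowerCase().normalize("NFKC");

  // RFC 5891 section 4.2.3.1: no leading/trailing hyphen, no "--" at positions 3-4
  if (name.startsWith("-") || name.endsWith("-") || name.slice(2, 4) === "--") {
    throw new Error("invalid hyphen");
  }
  return name;
}

// Fullwidth capital letters fold to plain ASCII under lowercase + NFKC
console.log(approximateNameprep("\uFF25\uFF38\uFF21\uFF2D\uFF30\uFF2C\uFF25.com")); // example.com
console.log(approximateNameprep("B\u00fccher.example")); // bücher.example
```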
/**
* Applies nameprep algorithm for internationalized domain names (RFC 3491)
* @param value - String to process with nameprep
* @returns Processed string with case folding and normalization applied
* @throws Error for prohibited characters or invalid format
*/
function nameprep(value: string): string;
Usage Examples:
import { nameprep } from "@ethersproject/strings";
// Basic nameprep processing
const domain = "EXAMPLE.COM";
const processed = nameprep(domain);
console.log(processed); // "example.com"
// International domain names
const idn = "Bücher.example";
const processedIdn = nameprep(idn);
console.log(processedIdn); // "bücher.example"
// Error handling
try {
nameprep("-bücher"); // a leading hyphen is rejected
} catch (error) {
console.log("Invalid hyphen placement");
}
Unicode normalization forms for string processing.
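The choice of form changes the bytes produced for the same visible text. The standalone sketch below assumes that a form other than `current` is applied with the standard `String.prototype.normalize` before encoding; it uses `TextEncoder` in place of `toUtf8Bytes`, and the names `NormalizationForm` and `utf8WithForm` are hypothetical.

```typescript
// Sketch: how the normalization form changes the UTF-8 bytes of the same text.
// "" plays the role of UnicodeNormalizationForm.current (no normalization).
type NormalizationForm = "" | "NFC" | "NFD" | "NFKC" | "NFKD";

const textEncoder = new TextEncoder();

function utf8WithForm(text: string, form: NormalizationForm = ""): Uint8Array {
  const normalized = form === "" ? text : text.normalize(form);
  return textEncoder.encode(normalized);
}

function hex(bytes: Uint8Array): string {
  return Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join(" ");
}

const cafe = "caf\u00e9"; // "é" as a single precomposed code point
console.log("NFC ", hex(utf8WithForm(cafe, "NFC"))); // 63 61 66 c3 a9
console.log("NFD ", hex(utf8WithForm(cafe, "NFD"))); // 63 61 66 65 cc 81
console.log("NFKC", hex(utf8WithForm("\ufb01", "NFKC"))); // ligature "ﬁ" becomes 66 69
```

Two strings that render identically can therefore encode to different bytes unless they are normalized to the same form first.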
enum UnicodeNormalizationForm {
/** No normalization applied */
current = "",
/** Canonical Composition */
NFC = "NFC",
/** Canonical Decomposition */
NFD = "NFD",
/** Compatibility Composition */
NFKC = "NFKC",
/** Compatibility Decomposition */
NFKD = "NFKD"
}
Essential types used throughout the package.
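The `BytesLike` type defined next accepts either a hex string or an array of byte values. A minimal sketch of normalizing such input to a `Uint8Array` is shown below; `bytesLikeToUint8Array` is hypothetical, and its validation rules (0x prefix, even number of hex digits, integers 0 to 255) are assumptions for this sketch rather than a description of how the library converts its input internally.

```typescript
// Hypothetical helper: normalize a BytesLike value to a Uint8Array.
// The validation rules below are assumptions made for this sketch.
type BytesLike = ArrayLike<number> | string;

function bytesLikeToUint8Array(value: BytesLike): Uint8Array {
  if (typeof value === "string") {
    // Expect "0x" followed by pairs of hex digits
    if (!/^0x([0-9a-fA-F]{2})*$/.test(value)) {
      throw new Error(`invalid hex string: ${value}`);
    }
    const out = new Uint8Array((value.length - 2) / 2);
    for (let i = 0; i < out.length; i++) {
      out[i] = parseInt(value.slice(2 + 2 * i, 4 + 2 * i), 16);
    }
    return out;
  }
  // Array-like input: every entry must be an integer byte value
  const out = new Uint8Array(value.length);
  for (let i = 0; i < value.length; i++) {
    const byte = value[i];
    if (!Number.isInteger(byte) || byte < 0 || byte > 255) {
      throw new Error(`invalid byte at index ${i}: ${byte}`);
    }
    out[i] = byte;
  }
  return out;
}

// Both forms describe the same two bytes, 0x12 and 0x34
console.log(bytesLikeToUint8Array("0x1234"));
console.log(bytesLikeToUint8Array([0x12, 0x34]));
```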
/**
* Type representing data that can be interpreted as bytes (defined in @ethersproject/bytes)
* Accepts hex strings (e.g., "0x1234") or array-like structures containing numbers (0-255)
*/
type BytesLike = ArrayLike<number> | string;
Error handling types and constants for UTF-8 operations.
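Each `Utf8ErrorReason` corresponds to one check a UTF-8 decoder makes while reading a single code point. The standalone sketch below decodes the sequence starting at a given offset and returns either the code point or one of the reason strings listed next; `decodeOne` is hypothetical and written for illustration, not taken from the package.

```typescript
// Sketch: decode one UTF-8 sequence and report which check failed, if any.
// The reason strings match the Utf8ErrorReason values documented below.
type DecodeResult = { codePoint: number; length: number } | { reason: string };

function decodeOne(bytes: ArrayLike<number>, offset: number): DecodeResult {
  const lead = bytes[offset];
  if (lead < 0x80) return { codePoint: lead, length: 1 }; // 0xxxxxxx

  let extra: number; // number of continuation bytes expected
  let minValue: number; // smallest code point that needs this many bytes
  if ((lead & 0xe0) === 0xc0) { extra = 1; minValue = 0x80; }
  else if ((lead & 0xf0) === 0xe0) { extra = 2; minValue = 0x800; }
  else if ((lead & 0xf8) === 0xf0) { extra = 3; minValue = 0x10000; }
  else if ((lead & 0xc0) === 0x80) return { reason: "unexpected continuation byte" };
  else return { reason: "bad codepoint prefix" };

  if (offset + extra >= bytes.length) return { reason: "string overrun" };

  let value = lead & (0x3f >> extra); // payload bits of the lead byte
  for (let j = 1; j <= extra; j++) {
    const next = bytes[offset + j];
    if ((next & 0xc0) !== 0x80) return { reason: "missing continuation byte" };
    value = (value << 6) | (next & 0x3f);
  }

  if (value > 0x10ffff) return { reason: "out of UTF-8 range" };
  if (value >= 0xd800 && value <= 0xdfff) return { reason: "UTF-16 surrogate" };
  if (value < minValue) return { reason: "overlong representation" };
  return { codePoint: value, length: extra + 1 };
}

console.log(decodeOne([0xf0, 0x9f, 0x9a, 0x80], 0)); // code point 0x1f680, 4 bytes
console.log(decodeOne([0xc0, 0x80], 0)); // overlong representation
console.log(decodeOne([0xff], 0)); // bad codepoint prefix
```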
enum Utf8ErrorReason {
/** A continuation byte was present where there was nothing to continue */
UNEXPECTED_CONTINUE = "unexpected continuation byte",
/** An invalid (non-continuation) byte to start a UTF-8 codepoint was found */
BAD_PREFIX = "bad codepoint prefix",
/** The string is too short to process the expected codepoint */
OVERRUN = "string overrun",
/** A missing continuation byte was expected but not found */
MISSING_CONTINUE = "missing continuation byte",
/** The computed code point is outside the range for UTF-8 */
OUT_OF_RANGE = "out of UTF-8 range",
/** UTF-8 may not encode UTF-16 surrogate code points (U+D800 to U+DFFF) */
UTF16_SURROGATE = "UTF-16 surrogate",
/** The string is an overlong representation */
OVERLONG = "overlong representation"
}
/**
* Function type for handling UTF-8 decoding errors
* @param reason - The type of error that occurred
* @param offset - Byte offset where the error occurred
* @param bytes - The input byte array being processed
* @param output - The output array being built
* @param badCodepoint - The invalid codepoint (if applicable)
* @returns Number of additional bytes to skip before decoding resumes
*/
type Utf8ErrorFunc = (
reason: Utf8ErrorReason,
offset: number,
bytes: ArrayLike<number>,
output: Array<number>,
badCodepoint?: number
) => number;
/**
* Predefined error handling strategies for UTF-8 decoding
*/
const Utf8ErrorFuncs: {
/** Throws an error on invalid UTF-8 (default behavior) */
error: Utf8ErrorFunc;
/** Skips invalid UTF-8 sequences silently */
ignore: Utf8ErrorFunc;
/** Inserts the replacement character (U+FFFD) for invalid UTF-8; overlong sequences are emitted as their decoded code point */
replace: Utf8ErrorFunc;
};
Usage Examples:
import { toUtf8String, Utf8ErrorFuncs, Utf8ErrorReason } from "@ethersproject/strings";
// Using predefined error handlers
const malformedBytes = new Uint8Array([0xc0, 0x80]); // Overlong (invalid) encoding of U+0000
// Throw error (default)
try {
toUtf8String(malformedBytes, Utf8ErrorFuncs.error);
} catch (error) {
console.log("UTF-8 decode error");
}
// Ignore invalid sequences
const ignoredResult = toUtf8String(malformedBytes, Utf8ErrorFuncs.ignore);
// Replace invalid sequences (for this overlong input, the decoded code point U+0000 is emitted rather than U+FFFD)
const replacedResult = toUtf8String(malformedBytes, Utf8ErrorFuncs.replace);
// Custom error handler
const customHandler = (reason: Utf8ErrorReason, offset: number) => {
console.log(`Custom handler: ${reason} at ${offset}`);
return 1; // Skip one additional byte after the invalid sequence
};
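const customResult = toUtf8String(malformedBytes, customHandler);
The package reports errors in these situations: formatBytes32String rejects text longer than 31 bytes once UTF-8 encoded; parseBytes32String rejects data that is not exactly 32 bytes or whose last byte is not zero; toUtf8String rejects malformed UTF-8 unless an ignoring, replacing, or custom error function is supplied; and nameprep rejects prohibited characters and invalid hyphen placement.
All errors are thrown with descriptive messages; UTF-8 decoding failures, for example, report the byte offset and the Utf8ErrorReason.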
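Code that needs a similar strict-versus-lenient choice outside this package can use the standard `TextDecoder`, whose `fatal` option plays a role comparable to choosing between `Utf8ErrorFuncs.error` and `Utf8ErrorFuncs.replace`. The sketch below wraps it in a hypothetical `decodeUtf8` helper. `TextDecoder` follows the WHATWG Encoding rules, so its replacement output is not guaranteed to match `Utf8ErrorFuncs.replace` character for character.

```typescript
// Sketch: strict vs. lenient UTF-8 decoding with the built-in TextDecoder.
// `decodeUtf8` and its `mode` parameter are hypothetical.
type DecodeMode = "strict" | "replace";

function decodeUtf8(bytes: Uint8Array, mode: DecodeMode = "strict"): string {
  // fatal: true makes decode() throw a TypeError on malformed input;
  // fatal: false substitutes U+FFFD for each malformed sequence.
  const decoder = new TextDecoder("utf-8", { fatal: mode === "strict" });
  return decoder.decode(bytes);
}

const malformed = new Uint8Array([0x48, 0x69, 0xff]); // "Hi" followed by an invalid byte

try {
  decodeUtf8(malformed);
} catch (error) {
  console.log("strict decoding failed:", (error as Error).message);
}

console.log(decodeUtf8(malformed, "replace")); // "Hi" followed by U+FFFD
```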