tessl/npm-ethersproject--strings

String utility functions for Ethereum development, focusing on UTF-8 encoding/decoding, Bytes32 string formatting, and Unicode normalization.

  • Workspace: tessl
  • Visibility: Public
  • Describes: npmpkg:npm/@ethersproject/strings@5.8.x

To install, run

npx @tessl/cli install tessl/npm-ethersproject--strings@5.8.0

@ethersproject/strings

String utility functions for Ethereum development, focusing on safe conversion between UTF-8 data, JavaScript strings, and Bytes32 strings. The package encodes and decodes UTF-8 with configurable error handling, packs short strings into a fixed 32-byte format for gas-efficient on-chain storage, and applies nameprep normalization.

Package Information

  • Package Name: @ethersproject/strings
  • Package Type: npm
  • Language: TypeScript
  • Installation: npm install @ethersproject/strings

Core Imports

import { 
  toUtf8Bytes, 
  toUtf8String, 
  toUtf8CodePoints,
  formatBytes32String, 
  parseBytes32String,
  nameprep,
  UnicodeNormalizationForm,
  Utf8ErrorFuncs,
  Utf8ErrorReason,
  type Utf8ErrorFunc
} from "@ethersproject/strings";

For CommonJS:

const { 
  toUtf8Bytes, 
  toUtf8String, 
  formatBytes32String, 
  parseBytes32String,
  nameprep 
} = require("@ethersproject/strings");

Basic Usage

import { 
  toUtf8Bytes, 
  toUtf8String, 
  formatBytes32String, 
  parseBytes32String 
} from "@ethersproject/strings";

// Convert string to UTF-8 bytes
const message = "Hello, Ethereum!";
const bytes = toUtf8Bytes(message);
console.log(bytes); // Uint8Array

// Convert UTF-8 bytes back to string
const decoded = toUtf8String(bytes);
console.log(decoded); // "Hello, Ethereum!"

// Format short string for efficient on-chain storage
const bytes32 = formatBytes32String("ENS");
console.log(bytes32); // "0x454e530000000000000000000000000000000000000000000000000000000000"

// Parse bytes32 back to string
const parsed = parseBytes32String(bytes32);
console.log(parsed); // "ENS"

Architecture

The @ethersproject/strings package is organized around three core capabilities:

  • UTF-8 Operations: Safe encoding and decoding between JavaScript strings and UTF-8 byte arrays with robust error handling
  • Bytes32 Strings: Efficient formatting and parsing of short strings for on-chain storage using 32-byte fixed-length format
  • Unicode Normalization: Nameprep processing for internationalized domain names with full Unicode support

Capabilities

UTF-8 Encoding and Decoding

Safe conversion between JavaScript strings and UTF-8 encoded bytes with comprehensive error handling strategies.

/**
 * Converts a JavaScript string to UTF-8 encoded bytes
 * @param str - JavaScript string to encode
 * @param form - Optional Unicode normalization form (default: current)
 * @returns UTF-8 encoded bytes as Uint8Array
 */
function toUtf8Bytes(
  str: string, 
  form?: UnicodeNormalizationForm
): Uint8Array;

/**
 * Converts UTF-8 encoded bytes to a JavaScript string
 * @param bytes - UTF-8 encoded bytes to decode
 * @param onError - Optional error handling function
 * @returns Decoded JavaScript string
 */
function toUtf8String(
  bytes: BytesLike, 
  onError?: Utf8ErrorFunc
): string;

/**
 * Converts a JavaScript string to an array of Unicode code points
 * @param str - JavaScript string to convert
 * @param form - Optional Unicode normalization form (default: current)
 * @returns Array of Unicode code points (one number per code point, not per UTF-16 unit)
 */
function toUtf8CodePoints(
  str: string, 
  form?: UnicodeNormalizationForm
): Array<number>;

/**
 * Internal function to convert bytes to escaped UTF-8 string representation
 * @param bytes - Bytes to convert
 * @param onError - Optional error handling function
 * @returns Escaped string representation with proper JSON encoding
 */
function _toEscapedUtf8String(
  bytes: BytesLike, 
  onError?: Utf8ErrorFunc
): string;

Usage Examples:

import { toUtf8Bytes, toUtf8String, toUtf8CodePoints, UnicodeNormalizationForm } from "@ethersproject/strings";

// Basic encoding/decoding
const text = "Hello 世界";
const bytes = toUtf8Bytes(text);
const decoded = toUtf8String(bytes);

// With Unicode normalization
const normalizedBytes = toUtf8Bytes("café", UnicodeNormalizationForm.NFC);

// Get code points
const codePoints = toUtf8CodePoints("🚀");
console.log(codePoints); // [128640]

// Error handling with custom function
const malformedBytes = new Uint8Array([0xff, 0xfe]);
const safeDecoded = toUtf8String(malformedBytes, (reason, offset, bytes, output) => {
  console.log(`UTF-8 error: ${reason} at offset ${offset}`);
  return 0; // Skip no additional bytes; decoding resumes after the offending byte
});
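
The single-element result for "🚀" follows from the difference between UTF-16 code units and Unicode code points. The standalone sketch below uses only built-in string methods rather than the package. `codePointsOf` is a hypothetical helper name, and it applies no normalization.

```typescript
// Hypothetical helper (not part of @ethersproject/strings): collects the
// Unicode code points of a string using built-in methods only.
function codePointsOf(text: string): number[] {
  const points: number[] = [];
  let i = 0;
  while (i < text.length) {
    const codePoint = text.codePointAt(i) as number;
    points.push(codePoint);
    // Code points above U+FFFF occupy two UTF-16 code units (a surrogate pair).
    i += codePoint > 0xffff ? 2 : 1;
  }
  return points;
}

const rocket = "🚀";
console.log(rocket.length);        // 2 (UTF-16 code units)
console.log(codePointsOf(rocket)); // [ 128640 ] (one code point, U+1F680)
```

Counting code points rather than `length` matters whenever a string contains emoji or other characters outside the Basic Multilingual Plane.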

Bytes32 String Operations

Efficient formatting and parsing of strings for on-chain storage using 32-byte fixed-length format.

/**
 * Formats a string as a bytes32 hex string for efficient on-chain storage
 * @param text - String to format (must be ≤31 bytes when UTF-8 encoded)
 * @returns Hex-encoded bytes32 string (32 bytes, null-terminated)
 * @throws Error if string is too long (>31 bytes)
 */
function formatBytes32String(text: string): string;

/**
 * Parses a bytes32 hex string back to its original string value
 * @param bytes - Bytes32 data to parse (must be exactly 32 bytes)
 * @returns Original string value
 * @throws Error if not 32 bytes or missing null terminator
 */
function parseBytes32String(bytes: BytesLike): string;

Usage Examples:

import { formatBytes32String, parseBytes32String } from "@ethersproject/strings";

// Format string for on-chain storage
const contractName = "MyToken";
const bytes32Name = formatBytes32String(contractName);
console.log(bytes32Name); 
// "0x4d79546f6b656e00000000000000000000000000000000000000000000000000"

// Parse back to original string
const originalName = parseBytes32String(bytes32Name);
console.log(originalName); // "MyToken"

// Error cases
try {
  formatBytes32String("This string is way too long to fit in 32 bytes");
} catch (error) {
  console.log("String too long for bytes32 format");
}

try {
  parseBytes32String("0x1234"); // Not 32 bytes
} catch (error) {
  console.log("Invalid bytes32 - not 32 bytes long");
}
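
To make the 31-byte limit concrete, here is a minimal sketch of the packing step, assuming the layout described above (UTF-8 text followed by zero bytes up to 32). `sketchFormatBytes32` is a hypothetical name and its error message is invented for illustration; it is not the package's implementation.

```typescript
// Hypothetical re-implementation for illustration only; the package's
// formatBytes32String may differ in details such as its error message.
function sketchFormatBytes32(text: string): string {
  const utf8 = new TextEncoder().encode(text);
  // The final byte is reserved for the null terminator, leaving 31 for text.
  if (utf8.length > 31) {
    throw new Error("string exceeds 31 bytes when UTF-8 encoded");
  }
  const padded = new Uint8Array(32); // zero-filled on creation
  padded.set(utf8);                  // text first, zero bytes to the right
  const hexPairs = Array.from(
    padded,
    (byte) => (byte < 16 ? "0" : "") + byte.toString(16)
  );
  return "0x" + hexPairs.join("");
}

console.log(sketchFormatBytes32("ENS").length); // 66: "0x" plus 64 hex digits

// The limit counts bytes, not characters: 16 copies of "é" need 32 bytes.
try {
  sketchFormatBytes32("\u00e9".repeat(16));
} catch (error) {
  console.log((error as Error).message);
}
```

Because the check runs on the encoded bytes, a string of fewer than 31 characters can still be rejected when it contains multi-byte characters.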

Unicode Normalization and Nameprep

Unicode normalization functionality for internationalized domain names following RFC 3491.

/**
 * Applies nameprep algorithm for internationalized domain names (RFC 3491)
 * @param value - String to process with nameprep
 * @returns Processed string with case folding and normalization applied
 * @throws Error for prohibited characters or invalid format
 */
function nameprep(value: string): string;

Usage Examples:

import { nameprep } from "@ethersproject/strings";

// Basic nameprep processing
const domain = "EXAMPLE.COM";
const processed = nameprep(domain);
console.log(processed); // "example.com"

// International domain names
const idn = "Bücher.example";
const processedIdn = nameprep(idn);
console.log(processedIdn); // Normalized form

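// Error handling: nameprep throws on prohibited or unassigned code points
// and on invalid hyphen placement (for example, a leading hyphen)
try {
  nameprep("-bücher");
} catch (error) {
  console.log((error as Error).message);
}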
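
The case folding and normalization that nameprep performs can be roughly previewed with built-in string methods. The sketch below is an approximation under stated assumptions, not RFC 3491: `roughNameprep` is a hypothetical name, it uses plain lowercasing in place of the RFC's mapping tables, and it rejects nothing.

```typescript
// Rough approximation only: built-in lowercasing plus NFKC normalization.
// RFC 3491 nameprep also applies its own mapping tables and rejects
// prohibited and unassigned code points, which this sketch does not do.
function roughNameprep(value: string): string {
  return value.toLowerCase().normalize("NFKC");
}

console.log(roughNameprep("EXAMPLE.COM"));          // "example.com"
// "u" plus a combining diaeresis (U+0308) is composed into "ü" (U+00FC).
console.log(roughNameprep("Bu\u0308cher.example")); // "bücher.example"
```

For real name processing, use `nameprep` itself, since the prohibited-character checks are the part that protects against confusable or invalid names.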

Types and Constants

UnicodeNormalizationForm

Unicode normalization forms for string processing.

enum UnicodeNormalizationForm {
  /** No normalization applied */
  current = "",
  /** Canonical Composition */
  NFC = "NFC",
  /** Canonical Decomposition */
  NFD = "NFD", 
  /** Compatibility Composition */
  NFKC = "NFKC",
  /** Compatibility Decomposition */
  NFKD = "NFKD"
}
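
The non-empty enum values are the same form names that the built-in `String.prototype.normalize` accepts, so the effect of each form can be previewed without the package. The sketch below is illustrative; it assumes a runtime with `TextEncoder` available (modern browsers and Node.js).

```typescript
const composed = "caf\u00e9";    // "é" as one code point (U+00E9)
const decomposed = "cafe\u0301"; // "e" followed by combining acute (U+0301)

console.log(composed === decomposed);                  // false
console.log(composed === decomposed.normalize("NFC")); // true
console.log(composed.normalize("NFD") === decomposed); // true

// Compatibility forms also fold presentation variants such as ligatures.
const ligature = "\ufb01"; // "ﬁ" (U+FB01)
console.log(ligature.normalize("NFC") === ligature); // true (unchanged)
console.log(ligature.normalize("NFKC"));             // "fi"

// Normalization changes the UTF-8 byte length of visually identical text.
const encoder = new TextEncoder();
console.log(encoder.encode(composed).length);   // 5
console.log(encoder.encode(decomposed).length); // 6
```

Two strings that render identically can therefore encode to different bytes, which is why choosing a form explicitly matters before hashing or storing text.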

Core Types

Essential types used throughout the package. BytesLike is defined in @ethersproject/bytes, so import it from there when you need the type.

/**
 * Type representing data that can be interpreted as bytes
 * Accepts hex strings (e.g., "0x1234") or array-like structures containing numbers (0-255)
 */
type BytesLike = ArrayLike<number> | string;

UTF-8 Error Handling

Error handling types and constants for UTF-8 operations.

enum Utf8ErrorReason {
  /** A continuation byte was present where there was nothing to continue */
  UNEXPECTED_CONTINUE = "unexpected continuation byte",
  /** An invalid (non-continuation) byte to start a UTF-8 codepoint was found */
  BAD_PREFIX = "bad codepoint prefix",
  /** The string is too short to process the expected codepoint */
  OVERRUN = "string overrun",
  /** A missing continuation byte was expected but not found */
  MISSING_CONTINUE = "missing continuation byte",
  /** The computed code point is outside the range for UTF-8 */
  OUT_OF_RANGE = "out of UTF-8 range",
  /** UTF-8 strings may not contain UTF-16 surrogate pairs */
  UTF16_SURROGATE = "UTF-16 surrogate",
  /** The string is an overlong representation */
  OVERLONG = "overlong representation"
}

/**
 * Function type for handling UTF-8 decoding errors
 * @param reason - The type of error that occurred
 * @param offset - Byte offset where the error occurred
 * @param bytes - The input byte array being processed
 * @param output - The output array being built
 * @param badCodepoint - The invalid codepoint (if applicable)
 * @returns Number of bytes to skip
 */
type Utf8ErrorFunc = (
  reason: Utf8ErrorReason,
  offset: number,
  bytes: ArrayLike<number>,
  output: Array<number>,
  badCodepoint?: number
) => number;

/**
 * Predefined error handling strategies for UTF-8 decoding
 */
const Utf8ErrorFuncs: {
  /** Throws an error on invalid UTF-8 (default behavior) */
  error: Utf8ErrorFunc;
  /** Skips invalid UTF-8 sequences silently */
  ignore: Utf8ErrorFunc;
  /** Replaces invalid UTF-8 with replacement character (U+FFFD) */
  replace: Utf8ErrorFunc;
};

Usage Examples:

import { toUtf8String, Utf8ErrorFuncs, Utf8ErrorReason } from "@ethersproject/strings";

// Using predefined error handlers
const malformedBytes = new Uint8Array([0xc0, 0x80]); // Overlong (invalid) encoding of U+0000

// Throw error (default)
try {
  toUtf8String(malformedBytes, Utf8ErrorFuncs.error);
} catch (error) {
  console.log("UTF-8 decode error");
}

// Ignore invalid sequences
const ignoredResult = toUtf8String(malformedBytes, Utf8ErrorFuncs.ignore);

// Replace with replacement character
const replacedResult = toUtf8String(malformedBytes, Utf8ErrorFuncs.replace);

// Custom error handler
const customHandler = (reason: Utf8ErrorReason, offset: number) => {
  console.log(`Custom handler: ${reason} at ${offset}`);
  return 1; // Skip 1 byte
};
const customResult = toUtf8String(malformedBytes, customHandler);

Error Handling

Errors fall into three groups, one per capability:

UTF-8 Decoding Errors

  • UNEXPECTED_CONTINUE: Continuation byte without proper prefix
  • BAD_PREFIX: Invalid byte sequence start
  • OVERRUN: Insufficient bytes for expected sequence
  • MISSING_CONTINUE: Expected continuation byte not found
  • OUT_OF_RANGE: Code point outside valid UTF-8 range
  • UTF16_SURROGATE: Invalid UTF-16 surrogate in UTF-8
  • OVERLONG: Unnecessarily long byte sequence
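
Several of these reasons can be read directly off the bit pattern of a byte. The sketch below is a simplified illustration rather than the package's decoder: `classifyLeadByte` and `isOverlongTwoByte` are hypothetical helpers, and the returned labels reuse the `Utf8ErrorReason` strings listed earlier.

```typescript
// Hypothetical helper for illustration; not the package's decoder.
// Returns the sequence length implied by a lead byte, or an error label
// when the byte cannot start a UTF-8 sequence.
function classifyLeadByte(byte: number): number | string {
  if ((byte & 0x80) === 0x00) return 1; // 0xxxxxxx: ASCII
  if ((byte & 0xc0) === 0x80) return "unexpected continuation byte"; // 10xxxxxx
  if ((byte & 0xe0) === 0xc0) return 2; // 110xxxxx
  if ((byte & 0xf0) === 0xe0) return 3; // 1110xxxx
  if ((byte & 0xf8) === 0xf0) return 4; // 11110xxx
  return "bad codepoint prefix";        // 11111xxx
}

// A two-byte sequence is overlong when its value would fit in one byte.
function isOverlongTwoByte(lead: number, continuation: number): boolean {
  const codePoint = ((lead & 0x1f) << 6) | (continuation & 0x3f);
  return codePoint <= 0x7f;
}

console.log(classifyLeadByte(0x48)); // 1
console.log(classifyLeadByte(0x80)); // "unexpected continuation byte"
console.log(classifyLeadByte(0xe4)); // 3
console.log(classifyLeadByte(0xff)); // "bad codepoint prefix"
console.log(isOverlongTwoByte(0xc0, 0x80)); // true  (U+0000 in two bytes)
console.log(isOverlongTwoByte(0xc3, 0xa9)); // false (U+00E9)
```

Overlong forms are rejected because they give one code point several byte spellings, which would defeat byte-level comparison and filtering.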

Bytes32 Errors

  • String length exceeding 31 bytes when UTF-8 encoded
  • Invalid bytes32 data (not exactly 32 bytes)
  • Missing null terminator in bytes32 data
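
The two parsing checks can be sketched as follows. This is a hypothetical validator, not the package's code: `sketchParseBytes32` and its error messages are invented for illustration, and trimming trailing zero bytes before decoding is an assumption about how the text boundary is found.

```typescript
// Hypothetical validator for illustration; error messages are assumptions.
function sketchParseBytes32(data: Uint8Array): string {
  if (data.length !== 32) {
    throw new Error("not exactly 32 bytes");
  }
  if (data[31] !== 0) {
    throw new Error("missing null terminator");
  }
  // Assumption: the text is everything before the trailing zero padding.
  let end = 32;
  while (end > 0 && data[end - 1] === 0) {
    end--;
  }
  return new TextDecoder().decode(data.subarray(0, end));
}

const sample = new Uint8Array(32);
sample.set([0x45, 0x4e, 0x53]); // "ENS" followed by 29 zero bytes
console.log(sketchParseBytes32(sample)); // "ENS"
```

Requiring a zero in the last position guarantees that at most 31 bytes of text are present, mirroring the limit enforced when formatting.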

Nameprep Errors

  • STRINGPREP_CONTAINS_PROHIBITED: String contains prohibited Unicode characters
  • STRINGPREP_CONTAINS_UNASSIGNED: String contains unassigned Unicode code points
  • Invalid hyphen: Improper hyphen placement in domain names

All errors carry a descriptive message that identifies the cause.