tessl/npm-ethersproject--strings

String utility functions for Ethereum development, focusing on UTF-8 encoding/decoding, Bytes32 string formatting, and Unicode normalization.

@ethersproject/strings

String utility functions for Ethereum development, focusing on safe conversion between UTF-8 data, JavaScript strings, and Bytes32 strings. Decoding validates UTF-8 input with a configurable error policy. The Bytes32 helpers pack short strings into a single 32-byte word, which is cheaper to store on-chain than a dynamic string.

Package Information

  • Package Name: @ethersproject/strings
  • Package Type: npm
  • Language: TypeScript
  • Installation: npm install @ethersproject/strings

Core Imports

import { 
  toUtf8Bytes, 
  toUtf8String, 
  toUtf8CodePoints,
  formatBytes32String, 
  parseBytes32String,
  nameprep,
  UnicodeNormalizationForm,
  Utf8ErrorFuncs,
  Utf8ErrorReason,
  type Utf8ErrorFunc
} from "@ethersproject/strings";

For CommonJS:

const { 
  toUtf8Bytes, 
  toUtf8String, 
  formatBytes32String, 
  parseBytes32String,
  nameprep 
} = require("@ethersproject/strings");

Basic Usage

import { 
  toUtf8Bytes, 
  toUtf8String, 
  formatBytes32String, 
  parseBytes32String 
} from "@ethersproject/strings";

// Convert string to UTF-8 bytes
const message = "Hello, Ethereum!";
const bytes = toUtf8Bytes(message);
console.log(bytes); // Uint8Array

// Convert UTF-8 bytes back to string
const decoded = toUtf8String(bytes);
console.log(decoded); // "Hello, Ethereum!"

// Format short string for efficient on-chain storage
const bytes32 = formatBytes32String("ENS");
console.log(bytes32); // "0x454e530000000000000000000000000000000000000000000000000000000000"

// Parse bytes32 back to string
const parsed = parseBytes32String(bytes32);
console.log(parsed); // "ENS"

Architecture

The @ethersproject/strings package is organized around three core capabilities:

  • UTF-8 Operations: Encoding and decoding between JavaScript strings and UTF-8 byte arrays, with configurable handling of malformed input
  • Bytes32 Strings: Formatting and parsing of short strings (at most 31 UTF-8 bytes) as a single 32-byte word for on-chain storage
  • Unicode Normalization: Optional NFC/NFD/NFKC/NFKD normalization before encoding, and nameprep (RFC 3491) processing for internationalized domain names

Capabilities

UTF-8 Encoding and Decoding

Conversion between JavaScript strings and UTF-8 encoded bytes. Decoding accepts an optional error handler that decides how malformed byte sequences are treated.

/**
 * Converts a JavaScript string to UTF-8 encoded bytes
 * @param str - JavaScript string to encode
 * @param form - Optional Unicode normalization form (default: UnicodeNormalizationForm.current, i.e. no normalization)
 * @returns UTF-8 encoded bytes as Uint8Array
 */
function toUtf8Bytes(
  str: string, 
  form?: UnicodeNormalizationForm
): Uint8Array;

/**
 * Converts UTF-8 encoded bytes to a JavaScript string
 * @param bytes - UTF-8 encoded bytes to decode
 * @param onError - Optional error handling function
 * @returns Decoded JavaScript string
 */
function toUtf8String(
  bytes: BytesLike, 
  onError?: Utf8ErrorFunc
): string;

/**
 * Converts a JavaScript string to an array of UTF-8 code points
 * @param str - JavaScript string to convert
 * @param form - Optional Unicode normalization form (default: UnicodeNormalizationForm.current, i.e. no normalization)
 * @returns Array of Unicode code points
 */
function toUtf8CodePoints(
  str: string, 
  form?: UnicodeNormalizationForm
): Array<number>;

/**
 * Internal (underscore-prefixed) helper that decodes bytes into a JSON-style escaped string
 * @param bytes - Bytes to convert
 * @param onError - Optional error handling function
 * @returns Double-quoted string literal in which quotes, backslashes, control characters and non-ASCII code points are escaped
 */
function _toEscapedUtf8String(
  bytes: BytesLike, 
  onError?: Utf8ErrorFunc
): string;
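The escaping applied by _toEscapedUtf8String can be sketched as follows. This is an illustration, not the package source. escapeCodePoints is a hypothetical helper, and its rule set is an assumption:

- short escapes for backspace, tab, newline, carriage return, quote and backslash
- printable ASCII kept as-is
- everything else written as \uXXXX, with surrogate pairs above U+FFFF

These rules yield a valid JSON string literal.

```typescript
// Hypothetical helper sketching JSON-style escaping of Unicode code points.
// The rule set is an assumption, not copied from @ethersproject/strings.
function escapeCodePoints(codePoints: number[]): string {
  // Format one UTF-16 code unit as a lowercase \uXXXX escape.
  const hex4 = (unit: number): string => "\\u" + ("0000" + unit.toString(16)).slice(-4);

  const body = codePoints
    .map((cp) => {
      switch (cp) {
        case 0x08: return "\\b";
        case 0x09: return "\\t";
        case 0x0a: return "\\n";
        case 0x0d: return "\\r";
        case 0x22: return '\\"';
        case 0x5c: return "\\\\";
      }
      // Printable ASCII passes through unchanged.
      if (cp >= 0x20 && cp < 0x7f) return String.fromCharCode(cp);
      // Remaining BMP code points become a single escape.
      if (cp <= 0xffff) return hex4(cp);
      // Code points above U+FFFF become an escaped UTF-16 surrogate pair.
      const v = cp - 0x10000;
      return hex4(0xd800 + (v >> 10)) + hex4(0xdc00 + (v & 0x3ff));
    })
    .join("");

  return '"' + body + '"';
}

// "H", "i", newline, U+1F680
const literal = escapeCodePoints([0x48, 0x69, 0x0a, 0x1f680]);
console.log(literal); // "Hi\n\ud83d\ude80"
```

Because the result is a JSON string literal, JSON.parse(literal) recovers the original text.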

Usage Examples:

import { toUtf8Bytes, toUtf8String, toUtf8CodePoints, UnicodeNormalizationForm } from "@ethersproject/strings";

// Basic encoding/decoding
const text = "Hello 世界";
const bytes = toUtf8Bytes(text);
const decoded = toUtf8String(bytes);

// With Unicode normalization
const normalizedBytes = toUtf8Bytes("café", UnicodeNormalizationForm.NFC);

// Get code points
const codePoints = toUtf8CodePoints("🚀");
console.log(codePoints); // [128640]

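// Error handling with a custom function
const malformedBytes = new Uint8Array([0xff, 0xfe]);
const safeDecoded = toUtf8String(malformedBytes, (reason, offset, bytes, output) => {
  console.log(`UTF-8 error: ${reason} at offset ${offset}`);
  return 0; // Skip no additional bytes; the offending byte has already been consumed
});
// Both bytes are invalid prefixes and nothing is written to output, so safeDecoded is ""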
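The following sketch shows how a JavaScript string becomes UTF-8 bytes. encodeUtf8 is a hypothetical helper, not the package's toUtf8Bytes. It applies no normalization, and rejecting unpaired surrogates is this sketch's own choice.

JavaScript strings are UTF-16, so each surrogate pair is first combined into one code point. That code point is then written with 1 to 4 bytes, depending on its size.

```typescript
// Hypothetical helper sketching UTF-8 encoding of a JavaScript (UTF-16) string.
// No normalization is applied; rejecting unpaired surrogates is this sketch's choice.
function encodeUtf8(str: string): Uint8Array {
  const out: number[] = [];
  for (let k = 0; k < str.length; k++) {
    let cp = str.charCodeAt(k);

    if (cp >= 0xd800 && cp <= 0xdbff) {
      // High surrogate: combine it with the following low surrogate.
      const low = str.charCodeAt(k + 1); // NaN past the end of the string
      if (!(low >= 0xdc00 && low <= 0xdfff)) throw new Error(`unpaired surrogate at index ${k}`);
      cp = 0x10000 + ((cp - 0xd800) << 10) + (low - 0xdc00);
      k++; // the low surrogate has been consumed
    } else if (cp >= 0xdc00 && cp <= 0xdfff) {
      throw new Error(`unpaired surrogate at index ${k}`);
    }

    if (cp < 0x80) {
      out.push(cp); // 0xxxxxxx
    } else if (cp < 0x800) {
      out.push(0xc0 | (cp >> 6), 0x80 | (cp & 0x3f)); // 110xxxxx 10xxxxxx
    } else if (cp < 0x10000) {
      // 1110xxxx 10xxxxxx 10xxxxxx
      out.push(0xe0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f));
    } else {
      // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
      out.push(0xf0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3f), 0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f));
    }
  }
  return new Uint8Array(out);
}

console.log(Array.from(encodeUtf8("é")));  // [ 195, 169 ]
console.log(Array.from(encodeUtf8("🚀"))); // [ 240, 159, 154, 128 ]
```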

Bytes32 String Operations

Efficient formatting and parsing of strings for on-chain storage using 32-byte fixed-length format.

/**
 * Formats a string as a bytes32 hex string for efficient on-chain storage
 * @param text - String to format (must be ≤31 bytes when UTF-8 encoded)
 * @returns Hex-encoded bytes32 string (32 bytes, null-terminated)
 * @throws Error if string is too long (>31 bytes)
 */
function formatBytes32String(text: string): string;

/**
 * Parses a bytes32 hex string back to its original string value
 * @param bytes - Bytes32 data to parse (must be exactly 32 bytes)
 * @returns Original string value
 * @throws Error if the data is not exactly 32 bytes or its last byte is not zero (no null terminator)
 */
function parseBytes32String(bytes: BytesLike): string;

Usage Examples:

import { formatBytes32String, parseBytes32String } from "@ethersproject/strings";

// Format string for on-chain storage
const contractName = "MyToken";
const bytes32Name = formatBytes32String(contractName);
console.log(bytes32Name); 
// "0x4d79546f6b656e00000000000000000000000000000000000000000000000000"

// Parse back to original string
const originalName = parseBytes32String(bytes32Name);
console.log(originalName); // "MyToken"

// Error cases
try {
  formatBytes32String("This string is way too long to fit in 32 bytes");
} catch (error) {
  console.log("String too long for bytes32 format");
}

try {
  parseBytes32String("0x1234"); // Not 32 bytes
} catch (error) {
  console.log("Invalid bytes32 - not 32 bytes long");
}
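The bytes32 layout itself can be sketched on raw bytes. packBytes32 and unpackBytes32 are hypothetical helpers that leave out the UTF-8 step. They assume, as the descriptions above state, that the text must leave at least one trailing zero byte, and that parsing strips the zero padding.

```typescript
// Hypothetical helpers sketching the bytes32 string layout on raw UTF-8 bytes.
function packBytes32(utf8: Uint8Array): string {
  // At most 31 bytes, so byte 31 is always a zero terminator.
  if (utf8.length > 31) throw new Error("string too long for bytes32");
  const word = new Uint8Array(32); // zero-filled
  word.set(utf8);
  let hex = "0x";
  for (let k = 0; k < word.length; k++) hex += ("0" + word[k].toString(16)).slice(-2);
  return hex;
}

function unpackBytes32(hex: string): Uint8Array {
  if (!/^0x[0-9a-fA-F]{64}$/.test(hex)) throw new Error("not 32 bytes of hex data");
  const word = new Uint8Array(32);
  for (let k = 0; k < 32; k++) word[k] = parseInt(hex.slice(2 + 2 * k, 4 + 2 * k), 16);
  if (word[31] !== 0) throw new Error("missing null terminator");
  // Strip the zero padding to recover the original bytes.
  let end = 31;
  while (end > 0 && word[end - 1] === 0) end--;
  return word.slice(0, end);
}

const ens = new Uint8Array([0x45, 0x4e, 0x53]); // "ENS" in UTF-8
const packed = packBytes32(ens);
console.log(packed.length);                     // 66 ("0x" plus 64 hex digits)
console.log(Array.from(unpackBytes32(packed))); // [ 69, 78, 83 ]
```

Padding and terminator are both zero bytes. As a result, trailing U+0000 characters in the original text cannot be told apart from padding, and they are lost when parsing.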

Unicode Normalization and Nameprep

Nameprep (RFC 3491) string preparation for internationalized domain names: the input is mapped and case-folded, normalized to NFKC, and checked for prohibited or unassigned code points.

/**
 * Applies nameprep algorithm for internationalized domain names (RFC 3491)
 * @param value - String to process with nameprep
 * @returns Processed string with case folding and normalization applied
 * @throws Error if the result contains prohibited or unassigned code points, or a hyphen in an invalid position
 */
function nameprep(value: string): string;

Usage Examples:

import { nameprep } from "@ethersproject/strings";

// Basic nameprep processing
const domain = "EXAMPLE.COM";
const processed = nameprep(domain);
console.log(processed); // "example.com"

// International domain names
const idn = "Bücher.example";
const processedIdn = nameprep(idn);
console.log(processedIdn); // "bücher.example" (case-folded and NFKC-normalized)

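// Error handling: hyphen placement is checked on the prepared string, so a leading
// hyphen is rejected. Names made only of ASCII letters, digits and hyphens (up to
// 59 characters) take a fast path that just lowercases them, so "invalid--domain"
// does not throw.
try {
  nameprep("-bücher");
} catch (error) {
  console.log("Invalid hyphen placement");
}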
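The hyphen rule behind nameprep's "invalid hyphen" error (RFC 5891, section 4.2.3.1) can be sketched as a standalone check. hasInvalidHyphen is a hypothetical helper. Applying it to the whole prepared string, rather than to each dot-separated label, is an assumption about how nameprep uses the rule.

```typescript
// Hypothetical helper: the IDNA hyphen restriction, assumed to be checked
// on the string after mapping and normalization.
function hasInvalidHyphen(name: string): boolean {
  return (
    name.startsWith("-") ||       // leading hyphen
    name.endsWith("-") ||         // trailing hyphen
    name.substring(2, 4) === "--" // "--" in the third and fourth positions
  );
}

console.log(hasInvalidHyphen("-bücher"));         // true
console.log(hasInvalidHyphen("xn--bcher-kva"));   // true
console.log(hasInvalidHyphen("invalid--domain")); // false: its "--" is at positions 8 and 9
```

The rule against "--" in positions three and four reserves that slot for prefixes such as Punycode's "xn--". That is why an already-encoded xn-- label also fails this check.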

Types and Constants

UnicodeNormalizationForm

Unicode normalization forms for string processing.

enum UnicodeNormalizationForm {
  /** No normalization applied */
  current = "",
  /** Canonical Composition */
  NFC = "NFC",
  /** Canonical Decomposition */
  NFD = "NFD", 
  /** Compatibility Composition */
  NFKC = "NFKC",
  /** Compatibility Decomposition */
  NFKD = "NFKD"
}
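The choice of form matters because visually identical strings can have different code points, and therefore different UTF-8 bytes and hashes. The sketch below assumes that toUtf8Bytes(str, form) normalizes with the built-in String.prototype.normalize before encoding. It calls normalize directly and counts bytes with a hypothetical utf8Length helper.

```typescript
// Hypothetical helper: number of bytes a well-formed string occupies in UTF-8.
function utf8Length(str: string): number {
  let bytes = 0;
  for (const ch of str) {
    // for...of walks code points, so a surrogate pair arrives as one character.
    const cp = ch.codePointAt(0) as number;
    bytes += cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
  }
  return bytes;
}

const composed: string = "caf\u00e9";    // "é" as a single code point (U+00E9)
const decomposed: string = "cafe\u0301"; // "e" plus COMBINING ACUTE ACCENT (U+0301)

console.log(composed === decomposed);                                   // false
console.log(utf8Length(composed), utf8Length(decomposed));              // 5 6
console.log(composed.normalize("NFC") === decomposed.normalize("NFC")); // true
console.log(utf8Length(decomposed.normalize("NFC")));                   // 5

// Compatibility forms also fold look-alikes such as the "ﬁ" ligature (U+FB01).
console.log("\ufb01".normalize("NFC") === "\ufb01"); // true (kept as-is)
console.log("\ufb01".normalize("NFKC"));             // fi
```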

Core Types

Essential types used throughout the package.

/**
 * Type representing data that can be interpreted as bytes
 * Accepts hex strings (e.g., "0x1234") or array-like structures containing numbers (0-255)
 */
type BytesLike = ArrayLike<number> | string;
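Functions that accept BytesLike first turn it into a byte array, as in the sketch below. toBytes is a hypothetical helper loosely modelled on arrayify from @ethersproject/bytes. Its validation rules are assumptions rather than the library's exact behavior:

- a 0x prefix
- an even number of hex digits
- integer array entries from 0 to 255

```typescript
type BytesLike = ArrayLike<number> | string;

// Hypothetical helper, loosely modelled on arrayify; the accepted formats
// below are assumptions for illustration.
function toBytes(value: BytesLike): Uint8Array {
  if (typeof value === "string") {
    // Require a 0x prefix and whole bytes (an even number of hex digits).
    if (!/^0x(?:[0-9a-fA-F]{2})*$/.test(value)) {
      throw new Error(`invalid hex string: ${value}`);
    }
    const out = new Uint8Array((value.length - 2) / 2);
    for (let k = 0; k < out.length; k++) {
      out[k] = parseInt(value.slice(2 + 2 * k, 4 + 2 * k), 16);
    }
    return out;
  }
  // Array-like input: every entry must be an integer byte value.
  const bytes = new Uint8Array(value.length);
  for (let k = 0; k < value.length; k++) {
    const b = value[k];
    if (!Number.isInteger(b) || b < 0 || b > 255) {
      throw new Error(`invalid byte at index ${k}: ${b}`);
    }
    bytes[k] = b;
  }
  return bytes;
}

console.log(Array.from(toBytes("0x1234"))); // [ 18, 52 ]
console.log(toBytes([0x48, 0x69]).length);  // 2
```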

UTF-8 Error Handling

Error handling types and constants for UTF-8 operations.

enum Utf8ErrorReason {
  /** A continuation byte was present where there was nothing to continue */
  UNEXPECTED_CONTINUE = "unexpected continuation byte",
  /** An invalid (non-continuation) byte to start a UTF-8 codepoint was found */
  BAD_PREFIX = "bad codepoint prefix",
  /** The string is too short to process the expected codepoint */
  OVERRUN = "string overrun",
  /** An expected continuation byte was not found */
  MISSING_CONTINUE = "missing continuation byte",
  /** The decoded code point is above U+10FFFF, the Unicode maximum */
  OUT_OF_RANGE = "out of UTF-8 range",
  /** The decoded code point is a UTF-16 surrogate (U+D800 to U+DFFF), which UTF-8 may not encode */
  UTF16_SURROGATE = "UTF-16 surrogate",
  /** The string is an overlong representation */
  OVERLONG = "overlong representation"
}

/**
 * Function type for handling UTF-8 decoding errors
 * @param reason - The type of error that occurred
 * @param offset - Byte offset where the error occurred
 * @param bytes - The input byte array being processed
 * @param output - The output array being built
 * @param badCodepoint - The invalid codepoint (if applicable)
 * @returns Number of additional bytes to skip beyond the decoder's current position (0 resumes at the next unread byte)
 */
type Utf8ErrorFunc = (
  reason: Utf8ErrorReason,
  offset: number,
  bytes: ArrayLike<number>,
  output: Array<number>,
  badCodepoint?: number
) => number;

/**
 * Predefined error handling strategies for UTF-8 decoding
 */
const Utf8ErrorFuncs: {
  /** Throws an error on invalid UTF-8 (default behavior) */
  error: Utf8ErrorFunc;
  /** Skips invalid UTF-8 sequences silently */
  ignore: Utf8ErrorFunc;
  /** Replaces invalid UTF-8 with the replacement character (U+FFFD); overlong sequences are instead decoded to the code point they encode */
  replace: Utf8ErrorFunc;
};

Usage Examples:

import { toUtf8String, Utf8ErrorFuncs, Utf8ErrorReason } from "@ethersproject/strings";

// Using predefined error handlers
const malformedBytes = new Uint8Array([0xc0, 0x80]); // Overlong encoding of U+0000, which is invalid UTF-8

// Throw error (default)
try {
  toUtf8String(malformedBytes, Utf8ErrorFuncs.error);
} catch (error) {
  console.log("UTF-8 decode error");
}

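// Ignore invalid sequences: the overlong sequence is dropped, so the result is ""
const ignoredResult = toUtf8String(malformedBytes, Utf8ErrorFuncs.ignore);

// Replace invalid sequences with U+FFFD; overlong sequences are the exception and
// are decoded to the code point they encode, so the result here is "\u0000"
const replacedResult = toUtf8String(malformedBytes, Utf8ErrorFuncs.replace);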

// Custom error handler
const customHandler = (reason: Utf8ErrorReason, offset: number) => {
  console.log(`Custom handler: ${reason} at ${offset}`);
  // Return 0: the decoder has already consumed the bytes it read, so returning 1
  // would also skip the next (possibly valid) byte
  return 0;
};
const customResult = toUtf8String(malformedBytes, customHandler);
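The contract between the decoder and an error handler is easiest to see in a stripped-down decoder. The sketch below is illustrative, not the package source. decodeCodePoints, ErrorFunc and the two handlers are hypothetical, and ErrorFunc is assumed to mirror Utf8ErrorFunc.

The decoder adds the handler's return value to its read position. A handler therefore chooses how many extra bytes to skip, and it may push substitute code points into output. In this sketch, replaceAll substitutes U+FFFD for every error, overlong sequences included.

```typescript
// Hypothetical stand-ins, assumed to mirror Utf8ErrorReason and Utf8ErrorFunc.
type Reason =
  | "unexpected continuation byte" | "bad codepoint prefix" | "string overrun"
  | "missing continuation byte" | "out of UTF-8 range" | "UTF-16 surrogate"
  | "overlong representation";

type ErrorFunc = (reason: Reason, offset: number, bytes: ArrayLike<number>,
                  output: number[], badCodepoint?: number) => number;

// Decode UTF-8 bytes to code points; a handler's return value advances the cursor.
function decodeCodePoints(bytes: ArrayLike<number>, onError: ErrorFunc): number[] {
  const out: number[] = [];
  let i = 0;
  while (i < bytes.length) {
    const lead = bytes[i++];
    if (lead < 0x80) { out.push(lead); continue; } // single-byte ASCII

    let extra: number;    // continuation bytes expected after the lead byte
    let minValue: number; // smallest code point that needs this many bytes
    if ((lead & 0xe0) === 0xc0) { extra = 1; minValue = 0x80; }
    else if ((lead & 0xf0) === 0xe0) { extra = 2; minValue = 0x800; }
    else if ((lead & 0xf8) === 0xf0) { extra = 3; minValue = 0x10000; }
    else {
      const reason: Reason =
        (lead & 0xc0) === 0x80 ? "unexpected continuation byte" : "bad codepoint prefix";
      i += onError(reason, i - 1, bytes, out);
      continue;
    }

    if (i - 1 + extra >= bytes.length) {
      i += onError("string overrun", i - 1, bytes, out);
      continue;
    }

    let cp = lead & ((1 << (6 - extra)) - 1); // payload bits of the lead byte
    let complete = true;
    for (let j = 0; j < extra; j++) {
      const next = bytes[i];
      if ((next & 0xc0) !== 0x80) {
        i += onError("missing continuation byte", i, bytes, out);
        complete = false;
        break;
      }
      cp = (cp << 6) | (next & 0x3f);
      i++;
    }
    if (!complete) continue;

    const start = i - 1 - extra; // offset of the lead byte
    if (cp > 0x10ffff) { i += onError("out of UTF-8 range", start, bytes, out, cp); continue; }
    if (cp >= 0xd800 && cp <= 0xdfff) { i += onError("UTF-16 surrogate", start, bytes, out, cp); continue; }
    if (cp < minValue) { i += onError("overlong representation", start, bytes, out, cp); continue; }
    out.push(cp);
  }
  return out;
}

// Strict: reject any malformed input (the idea behind Utf8ErrorFuncs.error).
const strict: ErrorFunc = (reason, offset) => {
  throw new Error(`invalid codepoint at offset ${offset}; ${reason}`);
};

// Lenient: emit U+FFFD for every error, then skip stray continuation bytes
// after a bad lead byte, or jump to the end on an overrun.
const replaceAll: ErrorFunc = (reason, offset, bytes, output) => {
  output.push(0xfffd);
  if (reason === "bad codepoint prefix" || reason === "unexpected continuation byte") {
    let skip = 0;
    while (offset + 1 + skip < bytes.length && (bytes[offset + 1 + skip] & 0xc0) === 0x80) skip++;
    return skip;
  }
  if (reason === "string overrun") return bytes.length - offset - 1;
  return 0;
};

console.log(decodeCodePoints([0x48, 0xff, 0x69], replaceAll)); // [ 72, 65533, 105 ]
```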

Error Handling

Errors fall into three groups, matching the three capabilities:

UTF-8 Decoding Errors

  • UNEXPECTED_CONTINUE: Continuation byte without proper prefix
  • BAD_PREFIX: Invalid byte sequence start
  • OVERRUN: Insufficient bytes for expected sequence
  • MISSING_CONTINUE: Expected continuation byte not found
  • OUT_OF_RANGE: Code point above U+10FFFF, the Unicode maximum
  • UTF16_SURROGATE: Code point in the UTF-16 surrogate range (U+D800 to U+DFFF)
  • OVERLONG: Code point encoded with more bytes than necessary

Bytes32 Errors

  • String length exceeding 31 bytes when UTF-8 encoded
  • Invalid bytes32 data (not exactly 32 bytes)
  • Missing null terminator in bytes32 data

Nameprep Errors

  • STRINGPREP_CONTAINS_PROHIBITED: String contains prohibited Unicode characters
  • STRINGPREP_CONTAINS_UNASSIGNED: String contains unassigned Unicode code points
  • invalid hyphen: The prepared string starts or ends with a hyphen, or has "--" in the third and fourth positions

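With the default handler (Utf8ErrorFuncs.error), UTF-8 decoding errors are thrown through the ethers Logger as INVALID_ARGUMENT errors, with a message giving the byte offset and the Utf8ErrorReason. The Bytes32 and nameprep checks throw plain Error objects with short messages such as "invalid hyphen".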
