or run

npx @tessl/cli init

Version

Tile

Overview

Evals

Files

docs

index.md

tile.json

tessl/npm-utf8

A well-tested UTF-8 encoder/decoder written in JavaScript

Workspace: tessl
Visibility: Public
Created: 3 months ago
Last updated: 3 months ago
Describes: pkg:npm/utf8@3.0.x

To install, run

npx @tessl/cli install tessl/npm-utf8@3.0.0

UTF-8 JavaScript Library

UTF-8.js is a well-tested UTF-8 encoder/decoder written in JavaScript that provides proper UTF-8 encoding and decoding according to the WHATWG Encoding Standard. It handles Unicode scalar values correctly and provides comprehensive error handling for malformed input.

Package Information

Package Name: utf8
Package Type: npm
Language: JavaScript
Installation: npm install utf8

Core Imports

const utf8 = require('utf8');

For ES modules:

import * as utf8 from 'utf8';

Browser usage:

<script src="utf8.js"></script>
<!-- Creates global utf8 object -->

Basic Usage

const utf8 = require('utf8');

// Encode JavaScript string to UTF-8 byte string
const encoded = utf8.encode('Hello, 世界!');
console.log(encoded); // Output: UTF-8 encoded byte string

// Decode UTF-8 byte string back to JavaScript string
const decoded = utf8.decode(encoded);
console.log(decoded); // Output: 'Hello, 世界!'

// Check library version
console.log(utf8.version); // Output: '3.0.0'

Capabilities

UTF-8 Encoding

Encodes JavaScript strings as UTF-8 byte strings with proper Unicode scalar value handling.

/**
 * Encodes any given JavaScript string as UTF-8
 * @param {string} string - JavaScript string to encode as UTF-8
 * @returns {string} UTF-8-encoded byte string
 * @throws {Error} When input contains non-scalar values (lone surrogates)
 */
utf8.encode(string);

Usage Examples:

// Basic ASCII encoding
utf8.encode('Hello');
// → 'Hello'

// Unicode characters
utf8.encode('\xA9'); // U+00A9 COPYRIGHT SIGN
// → '\xC2\xA9'

// Supplementary characters (surrogate pairs)
utf8.encode('\uD800\uDC01'); // U+10001 LINEAR B SYLLABLE B038 E
// → '\xF0\x90\x80\x81'

// Multi-byte Unicode
utf8.encode('世界'); // Chinese characters
// → '\xE4\xB8\x96\xE7\x95\x8C'

Error Handling:

try {
  // This will throw an error due to lone surrogate
  utf8.encode('\uD800'); // High surrogate without matching low surrogate
} catch (error) {
  console.error(error.message); // "Lone surrogate U+D800 is not a scalar value"
}

UTF-8 Decoding

Decodes UTF-8 byte strings back to JavaScript strings with malformed input detection.

/**
 * Decodes any given UTF-8-encoded string as UTF-8
 * @param {string} byteString - UTF-8 encoded byte string to decode
 * @returns {string} JavaScript string (UTF-8 decoded)
 * @throws {Error} When malformed UTF-8 is detected
 */
utf8.decode(byteString);

Usage Examples:

// Basic decoding
utf8.decode('\xC2\xA9');
// → '\xA9' (U+00A9 COPYRIGHT SIGN)

// Supplementary characters
utf8.decode('\xF0\x90\x80\x81');
// → '\uD800\uDC01' (U+10001 LINEAR B SYLLABLE B038 E)

// Multi-byte sequences
utf8.decode('\xE4\xB8\x96\xE7\x95\x8C');
// → '世界'

Error Handling:

try {
  // This will throw an error due to malformed UTF-8
  utf8.decode('\xFF\xFE'); // Invalid UTF-8 sequence
} catch (error) {
  console.error(error.message); // "Invalid UTF-8 detected"
}

try {
  // This will throw an error due to incomplete sequence
  utf8.decode('\xC2'); // Incomplete 2-byte sequence
} catch (error) {
  console.error(error.message); // "Invalid byte index"
}

Version Information

Provides the semantic version number of the library.

/**
 * Semantic version number of the utf8 library
 * @type {string}
 */
utf8.version;

Usage Example:

console.log(`Using utf8.js version ${utf8.version}`);
// Output: "Using utf8.js version 3.0.0"

Error Types

The library throws standard JavaScript Error objects with descriptive messages:

Lone Surrogate Error: Thrown when encode() encounters unpaired surrogate characters
Invalid UTF-8 Error: Thrown when decode() encounters malformed UTF-8 sequences
Invalid Byte Index Error: Thrown when decoding exceeds byte boundaries
Invalid Continuation Byte Error: Thrown when UTF-8 continuation bytes are malformed

Unicode Support

Full Unicode Range: Supports U+0000 to U+10FFFF (complete Unicode range)
Surrogate Pair Handling: Properly handles JavaScript surrogate pairs for supplementary characters
Scalar Value Validation: Enforces Unicode scalar value requirements (no lone surrogates)
Standards Compliance: Implements UTF-8 encoding per WHATWG Encoding Standard
Multi-byte Sequences: Correctly handles 1-4 byte UTF-8 sequences

Browser Compatibility

The library has been tested and works in:

Chrome 27+
Firefox 3+
Safari 4+
Opera 10+
Internet Explorer 6+
Node.js v0.10.0+
Various JavaScript engines (Rhino, Narwhal, RingoJS, PhantomJS)