tessl/npm-he

A robust HTML entities encoder/decoder with full Unicode support.

Workspace: tessl
Visibility: Public
Describes: npmpkg:npm/he@1.2.x

To install, run

npx @tessl/cli install tessl/npm-he@1.2.0


he

he (for "HTML entities") is a robust HTML entity encoder/decoder written in JavaScript. It supports all standardized named character references as per the HTML specification, handles ambiguous ampersands and other edge cases just like a browser would, and supports full Unicode, including astral symbols. It is suited to HTML parsers, content management systems, and web applications that need precise HTML entity handling.

Package Information

  • Package Name: he
  • Package Type: npm
  • Language: JavaScript
  • Installation:
    npm install he

Core Imports

const he = require('he');

For ES modules (Node.js with type: "module" or bundlers):

import * as he from 'he';
// or
import he from 'he';

For AMD (RequireJS):

require(['he'], function(he) {
  // use he
});

For browser global:

<script src="he.js"></script>
<script>
  // window.he is available
</script>

Basic Usage

const he = require('he');

// Encode text for safe HTML insertion
const encoded = he.encode('foo © bar ≠ baz 𝌆 qux');
// → 'foo &#xA9; bar &#x2260; baz &#x1D306; qux'

// Decode HTML entities back to text
const decoded = he.decode('foo &copy; bar &ne; baz &#x1D306; qux');
// → 'foo © bar ≠ baz 𝌆 qux'

// Escape unsafe characters for HTML contexts
const escaped = he.escape('<img src="x" onerror="alert(1)">');
// → '&lt;img src=&quot;x&quot; onerror=&quot;alert(1)&quot;&gt;'

// Using named references for better readability
const withNames = he.encode('foo © bar', { useNamedReferences: true });
// → 'foo &copy; bar'

Architecture

he is built around several key components:

  • UMD Module: Universal module definition supporting CommonJS, AMD, and browser globals
  • Encoding Engine: Converts Unicode characters to HTML entities with multiple encoding strategies
  • Decoding Engine: Parses HTML entities using the official HTML specification algorithm
  • Character Maps: Comprehensive lookup tables for named entities, numeric overrides, and escape sequences
  • Configuration System: Flexible options for customizing encoding/decoding behavior
  • CLI Interface: Command-line tool for batch processing and shell integration

Capabilities

Text Encoding

Converts Unicode text to HTML entities for safe insertion into HTML documents.

/**
 * Encodes a string by converting symbols to character references
 * @param {string} string - The input string to encode
 * @param {object} [options] - Optional configuration object  
 * @param {boolean} [options.useNamedReferences=false] - Use named references like &copy; instead of &#xA9;
 * @param {boolean} [options.decimal=false] - Use decimal escapes &#169; instead of hex &#xA9;
 * @param {boolean} [options.encodeEverything=false] - Encode all symbols including printable ASCII
 * @param {boolean} [options.strict=false] - Throw errors on invalid code points
 * @param {boolean} [options.allowUnsafeSymbols=false] - Don't encode unsafe HTML chars &<>"'`
 * @returns {string} Encoded string safe for HTML insertion
 */
he.encode(string, options)

Usage Examples:

// Basic encoding (hex escapes, safe characters only)
he.encode('foo © bar ≠ baz');
// → 'foo &#xA9; bar &#x2260; baz'

// Using named references
he.encode('foo © bar ≠ baz', { useNamedReferences: true });
// → 'foo &copy; bar &ne; baz'

// Using decimal escapes
he.encode('foo © bar', { decimal: true });
// → 'foo &#169; bar'

// Encoding everything including ASCII
he.encode('hello', { encodeEverything: true });
// → '&#x68;&#x65;&#x6C;&#x6C;&#x6F;'

// Strict mode throws on invalid code points
he.encode('foo\x00bar', { strict: true });
// → Throws Parse error

// Allow unsafe symbols (don't escape HTML characters)
he.encode('foo & bar', { allowUnsafeSymbols: true });
// → 'foo & bar'
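
The difference between the hex and decimal forms can be sketched in a few lines. This is only an illustration of how a numeric character reference is built from a code point; `numericReference` is a hypothetical helper, not part of he's API, and it does not reproduce he's rules for deciding which symbols to encode.

```javascript
// Hypothetical helper (not part of he): builds a numeric character
// reference for one code point, in hex (default) or decimal form.
function numericReference(codePoint, { decimal = false } = {}) {
  return decimal
    ? `&#${codePoint};`
    : `&#x${codePoint.toString(16).toUpperCase()};`;
}

// U+00A9 COPYRIGHT SIGN, written as an escape so the file encoding does not matter
const copyright = '\u00A9'.codePointAt(0);

console.log(numericReference(copyright));                    // &#xA9;
console.log(numericReference(copyright, { decimal: true })); // &#169;
```

Both forms denote the same character, so the choice between them is a matter of readability or house style.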

HTML Entity Decoding

Decodes named and numerical character references back to Unicode text using the HTML specification algorithm.

/**
 * Decodes HTML entities in a string
 * @param {string} html - HTML string containing entities to decode
 * @param {object} [options] - Optional configuration object
 * @param {boolean} [options.isAttributeValue=false] - Treat input as HTML attribute value context
 * @param {boolean} [options.strict=false] - Throw errors on malformed character references
 * @returns {string} Decoded Unicode string
 */
he.decode(html, options)

Usage Examples:

// Basic decoding
he.decode('foo &copy; bar &ne; baz &#x1D306; qux');
// → 'foo © bar ≠ baz 𝌆 qux'

// Handles ambiguous ampersands (text context)
he.decode('foo&ampbar');
// → 'foo&bar'

// Attribute value context (different parsing rules)
he.decode('foo&ampbar', { isAttributeValue: true });
// → 'foo&ampbar'

// Strict mode throws on malformed entities
he.decode('foo&ampbar', { strict: true });
// → Throws Parse error

// Mixed entity types
he.decode('&copy; &#169; &#xA9;');
// → '© © ©'
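
To make the numeric half of decoding concrete, here is a deliberately simplified sketch. `decodeNumericReferences` is a hypothetical helper, not he's implementation: it handles only well-formed, semicolon-terminated numeric references and skips named references, missing semicolons, and the other error cases that he covers.

```javascript
// Hypothetical helper (not part of he): decodes only well-formed numeric
// references such as &#169; and &#xA9;. Named references are left untouched.
function decodeNumericReferences(html) {
  return html.replace(/&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));/g, (match, hex, dec) => {
    const codePoint = hex !== undefined ? parseInt(hex, 16) : parseInt(dec, 10);
    // Values beyond the Unicode range are left as-is instead of throwing.
    return codePoint <= 0x10FFFF ? String.fromCodePoint(codePoint) : match;
  });
}

console.log(decodeNumericReferences('&#169; &#xA9;') === '\u00A9 \u00A9'); // true
console.log(decodeNumericReferences('&copy;'));                            // &copy;
```

For real input, prefer `he.decode`, which also resolves named references and follows the browser's behaviour for malformed ones.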

HTML Escaping

Escapes unsafe characters for safe use in HTML text contexts.

/**
 * Escapes unsafe HTML characters for text contexts
 * @param {string} string - The input string to escape
 * @returns {string} String with unsafe characters escaped
 */
he.escape(string)

Usage Examples:

// Escape HTML-unsafe characters
he.escape('<script>alert("xss")</script>');
// → '&lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;'

// Escape attribute content
he.escape('value="malicious"');
// → 'value=&quot;malicious&quot;'

// Handles all unsafe characters: & < > " ' `
he.escape('&<>"\'`');
// → '&amp;&lt;&gt;&quot;&#x27;&#x60;'
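
The mapping shown in these examples can be expressed as a lookup table plus a single replace pass. The sketch below is an illustrative re-implementation under that assumption; `ESCAPE_MAP` and `escapeHtml` are hypothetical names, and real code should call `he.escape` instead.

```javascript
// Hypothetical re-implementation for illustration; prefer he.escape in real code.
const ESCAPE_MAP = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#x27;',
  '`': '&#x60;'
};

function escapeHtml(string) {
  // One pass over the input, so the '&' of a freshly produced
  // reference is never escaped a second time.
  return string.replace(/[&<>"'`]/g, (symbol) => ESCAPE_MAP[symbol]);
}

console.log(escapeHtml('<a href="x">Tom & Jerry</a>'));
// &lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&lt;/a&gt;
```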

HTML Unescaping

Alias for decode function providing semantic clarity for unescaping operations.

/**
 * Unescapes HTML entities (alias for decode)
 * @param {string} html - HTML string containing entities to decode
 * @param {object} [options] - Optional configuration object (same as decode)
 * @returns {string} Decoded Unicode string
 */
he.unescape(html, options)

Usage Examples:

// Identical to he.decode()
he.unescape('&lt;script&gt;');
// → '<script>'

// Same options as decode
he.unescape('foo&ampbar', { isAttributeValue: true });
// → 'foo&ampbar'

Global Configuration

Default options can be modified globally to avoid passing options repeatedly.

/** Global default options for encode function */
he.encode.options = {
  allowUnsafeSymbols: false,      // Don't encode unsafe HTML chars &<>"'`
  encodeEverything: false,        // Only encode necessary characters  
  strict: false,                  // Don't throw on invalid code points
  useNamedReferences: false,      // Use hex escapes instead of names
  decimal: false                  // Use hex instead of decimal escapes
};

/** Global default options for decode function */
he.decode.options = {
  isAttributeValue: false,        // Treat input as HTML text context
  strict: false                   // Don't throw on malformed entities
};

Usage Examples:

// Override global encode defaults
he.encode.options.useNamedReferences = true;
he.encode('foo © bar');  // Now uses named refs by default
// → 'foo &copy; bar'

// Override global decode defaults  
he.decode.options.strict = true;
he.decode('foo&ampbar');  // Now throws on malformed entities
// → Parse error

// Read current defaults
console.log(he.encode.options.decimal);  // → false
console.log(he.decode.options.isAttributeValue);  // → false
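
Mutating the global defaults affects every later call in the process, so it helps to picture how per-call options and defaults might combine. The following is a sketch of one plausible merge rule, assuming that explicitly passed keys win and everything else falls back to the defaults; `resolveOptions` and `exampleDefaults` are hypothetical and are not taken from he's source.

```javascript
// Hypothetical defaults object, standing in for he.encode.options.
const exampleDefaults = { useNamedReferences: false, decimal: false };

// Hypothetical merge rule: per-call options override defaults,
// unknown keys are ignored, and the defaults object is not mutated.
function resolveOptions(options, defaults) {
  const resolved = {};
  for (const key of Object.keys(defaults)) {
    const overridden =
      options != null && Object.prototype.hasOwnProperty.call(options, key);
    resolved[key] = overridden ? options[key] : defaults[key];
  }
  return resolved;
}

console.log(resolveOptions({ decimal: true }, exampleDefaults).decimal); // true
console.log(exampleDefaults.decimal);                                    // false
```

Passing options per call keeps behaviour local to the call site, which is usually easier to reason about than changing shared defaults.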

Version Information

Access to the library version for compatibility checks.

/** Semantic version string of the library */
he.version  // '1.2.0'

Usage Examples:

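console.log(he.version);  // → '1.2.0'

// Version-based feature detection: compare numerically, because plain
// string comparison would sort '1.10.0' before '1.2.0'
const [major, minor] = he.version.split('.').map(Number);
if (major > 1 || (major === 1 && minor >= 2)) {
  // Use newer features
}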

Command Line Interface

he provides a command-line interface for batch processing and shell integration.

Installation

npm install -g he

Basic Commands

# Encode text
he --encode 'föo ♥ bår 𝌆 baz'
# → f&#xF6;o &#x2665; b&#xE5;r &#x1D306; baz

# Encode with named references  
he --encode --use-named-refs 'föo ♥ bår'
# → f&ouml;o &hearts; b&aring;r

# Decode entities
he --decode 'f&ouml;o &hearts; b&aring;r'
# → föo ♥ bår

# Escape HTML
he --escape '<img src="x" onerror="alert(1)">'
# → &lt;img src=&quot;x&quot; onerror=&quot;alert(1)&quot;&gt;

Encoding Options

# Use named character references
he --encode --use-named-refs 'text © symbol'

# Encode everything including ASCII  
he --encode --everything 'hello'

# Use decimal instead of hex escapes
he --encode --decimal 'text © symbol'

# Allow unsafe HTML characters
he --encode --allow-unsafe 'text & symbol'

Decoding Options

# Treat as HTML attribute value
he --decode --attribute 'foo&ampbar'

# Enable strict parsing mode
he --decode --strict 'foo&ampbar'

File Processing

# Process files with redirection
he --encode < input.txt > output.html
he --decode < input.html > output.txt

# Process remote content
curl -s "https://example.com/data.txt" | he --encode > encoded.html

Help and Version

# Show version
he --version
he -v

# Show help
he --help  
he -h

Error Handling

By default he is error-tolerant, like a browser. Enabling strict mode makes it throw a parse error on invalid input instead:

Invalid Code Points:

// In non-strict mode, no error is thrown for invalid code points
he.encode('foo\x00bar');  
// → 'foo\x00bar'

// In strict mode, throws Parse error
he.encode('foo\x00bar', { strict: true });
// → Parse error: forbidden code point

Malformed Entities:

// In non-strict mode, a named reference missing its semicolon is still decoded
he.decode('foo&ampbar');
// → 'foo&bar'

// In strict mode, throws Parse error  
he.decode('foo&ampbar', { strict: true });
// → Parse error: named character reference was not terminated by a semicolon
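
Because strict mode reports problems by throwing, callers typically wrap the call in `try`/`catch`. The sketch below shows one way to turn that into a result object. `tryStrictDecode` is a hypothetical wrapper, and `fakeDecode` is a stand-in that only mimics the error path so the example runs without he installed; in real code you would pass `he.decode` instead.

```javascript
// Hypothetical wrapper (not part of he). `decode` is expected to behave
// like he.decode, i.e. throw an Error when strict parsing fails.
function tryStrictDecode(decode, html) {
  try {
    return { ok: true, value: decode(html, { strict: true }) };
  } catch (error) {
    return { ok: false, reason: error.message };
  }
}

// Stand-in decoder: mimics only the strict-mode failure for '&amp'
// without a semicolon and performs no real decoding.
function fakeDecode(html, options) {
  if (options.strict && /&amp(?!;)/.test(html)) {
    throw new Error(
      'Parse error: named character reference was not terminated by a semicolon'
    );
  }
  return html;
}

const result = tryStrictDecode(fakeDecode, 'foo&ampbar');
console.log(result.ok); // false
console.log(result.reason);
```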

Unicode Edge Cases:

// Handles astral symbols (code points above U+FFFF) correctly
he.encode('𝌆');  // U+1D306 TETRAGRAM FOR CENTRE
// → '&#x1D306;'

he.decode('&#x1D306;');
// → '𝌆'

// Handles surrogate pairs correctly
he.encode('\uD834\uDF06');  // Same symbol as above
// → '&#x1D306;'
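
The reason `'\uD834\uDF06'` and U+1D306 are the same symbol is that JavaScript strings are UTF-16, where an astral code point is stored as a high and a low surrogate. The standard conversion can be sketched as follows; `surrogatePairToCodePoint` is a hypothetical helper name used only for this illustration.

```javascript
// Combine a UTF-16 surrogate pair into a single Unicode code point.
// Hypothetical helper name; the formula is the standard UTF-16 one.
function surrogatePairToCodePoint(high, low) {
  return (high - 0xD800) * 0x400 + (low - 0xDC00) + 0x10000;
}

const codePoint = surrogatePairToCodePoint(0xD834, 0xDF06);

console.log(codePoint.toString(16).toUpperCase());        // 1D306
console.log('\uD834\uDF06'.codePointAt(0) === codePoint); // true
```

The built-in `String.prototype.codePointAt` performs the same combination, which is why the final comparison holds.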

Types

/** Main library object */
const he = {
  version: '1.2.0',
  encode: function(string, options) { /* ... */ },
  decode: function(html, options) { /* ... */ },  
  escape: function(string) { /* ... */ },
  unescape: function(html, options) { /* ... */ }  // alias for decode
};

/** Encode options object properties */
const encodeOptions = {
  useNamedReferences: false,    // Use named refs like &copy; instead of &#xA9;
  decimal: false,               // Use decimal &#169; instead of hex &#xA9;
  encodeEverything: false,      // Encode all symbols including ASCII
  strict: false,                // Throw on invalid code points
  allowUnsafeSymbols: false     // Don't encode &<>"'` characters
};

/** Decode options object properties */
const decodeOptions = {
  isAttributeValue: false,      // Treat as HTML attribute value context
  strict: false                 // Throw on malformed character references
};

/** Error thrown in strict mode for invalid input */
throw new Error('Parse error: ' + message);