tessl/npm-he

A robust HTML entities encoder/decoder with full Unicode support.

he

he (for "HTML entities") is a robust HTML entity encoder/decoder written in JavaScript. It supports all standardized named character references as per the HTML specification, handles ambiguous ampersands and other edge cases just like a browser would, and has full Unicode support, including astral symbols. It suits HTML parsers, content management systems, and web applications that need precise HTML entity handling.

Package Information

  • Package Name: he
  • Package Type: npm
  • Language: JavaScript
  • Installation: npm install he

Core Imports

const he = require('he');

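For ES modules (Node.js with type: "module" or bundlers), he ships as a UMD/CommonJS build, so the default import is the most portable form:

import he from 'he';
// or, where the bundler or TypeScript interop maps CommonJS members onto the namespace:
import * as he from 'he';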

For AMD (RequireJS):

require(['he'], function(he) {
  // use he
});

For browser global:

<script src="he.js"></script>
<script>
  // window.he is available
</script>

Basic Usage

const he = require('he');

// Encode text for safe HTML insertion
const encoded = he.encode('foo © bar ≠ baz 𝌆 qux');
// → 'foo &#xA9; bar &#x2260; baz &#x1D306; qux'

// Decode HTML entities back to text
const decoded = he.decode('foo &copy; bar &ne; baz &#x1D306; qux');
// → 'foo © bar ≠ baz 𝌆 qux'

// Escape unsafe characters for HTML contexts
const escaped = he.escape('<img src="x" onerror="alert(1)">');
// → '&lt;img src=&quot;x&quot; onerror=&quot;alert(1)&quot;&gt;'

// Using named references for better readability
const withNames = he.encode('foo © bar', { useNamedReferences: true });
// → 'foo &copy; bar'

Architecture

he is built around several key components:

  • UMD Module: Universal module definition supporting CommonJS, AMD, and browser globals
  • Encoding Engine: Converts Unicode characters to HTML entities with multiple encoding strategies
  • Decoding Engine: Parses HTML entities using the official HTML specification algorithm
  • Character Maps: Comprehensive lookup tables for named entities, numeric overrides, and escape sequences
  • Configuration System: Flexible options for customizing encoding/decoding behavior
  • CLI: Command-line tool for batch processing and shell integration

Capabilities

Text Encoding

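Converts text to HTML character references for safe insertion into HTML documents. By default it encodes any symbol that is not printable ASCII, plus the unsafe characters &, <, >, ", ', and `.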

/**
 * Encodes a string by converting symbols to character references
 * @param {string} string - The input string to encode
 * @param {object} [options] - Optional configuration object  
 * @param {boolean} [options.useNamedReferences=false] - Use named references like &copy; instead of &#xA9;
 * @param {boolean} [options.decimal=false] - Use decimal escapes &#169; instead of hex &#xA9;
 * @param {boolean} [options.encodeEverything=false] - Encode all symbols including printable ASCII
 * @param {boolean} [options.strict=false] - Throw errors on invalid code points
 * @param {boolean} [options.allowUnsafeSymbols=false] - Don't encode unsafe HTML chars &<>"'`
 * @returns {string} Encoded string safe for HTML insertion
 */
he.encode(string, options)

Usage Examples:

// Basic encoding (hex escapes; printable ASCII other than &<>"'` is left as-is)
he.encode('foo © bar ≠ baz');
// → 'foo &#xA9; bar &#x2260; baz'

// Using named references
he.encode('foo © bar ≠ baz', { useNamedReferences: true });
// → 'foo &copy; bar &ne; baz'

// Using decimal escapes
he.encode('foo © bar', { decimal: true });
// → 'foo &#169; bar'

// Encoding everything including ASCII
he.encode('hello', { encodeEverything: true });
// → '&#x68;&#x65;&#x6C;&#x6C;&#x6F;'

// Strict mode throws on invalid code points
he.encode('foo\x00bar', { strict: true });
// → Throws Parse error

// Allow unsafe symbols (don't escape HTML characters)
he.encode('foo & bar', { allowUnsafeSymbols: true });
// → 'foo & bar'

HTML Entity Decoding

Decodes named and numerical character references back to Unicode text using the HTML specification algorithm.

/**
 * Decodes HTML entities in a string
 * @param {string} html - HTML string containing entities to decode
 * @param {object} [options] - Optional configuration object
 * @param {boolean} [options.isAttributeValue=false] - Treat input as HTML attribute value context
 * @param {boolean} [options.strict=false] - Throw errors on malformed character references
 * @returns {string} Decoded Unicode string
 */
he.decode(html, options)

Usage Examples:

// Basic decoding
he.decode('foo &copy; bar &ne; baz &#x1D306; qux');
// → 'foo © bar ≠ baz 𝌆 qux'

// Handles ambiguous ampersands (text context)
he.decode('foo&ampbar');
// → 'foo&bar'

// Attribute value context (different parsing rules)
he.decode('foo&ampbar', { isAttributeValue: true });
// → 'foo&ampbar'

// Strict mode throws on malformed entities
he.decode('foo&ampbar', { strict: true });
// → Throws Parse error

// Mixed entity types
he.decode('&copy; &#169; &#xA9;');
// → '© © ©'

HTML Escaping

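Escapes only the characters &, <, >, ", ', and ` so that a string can be used safely in HTML text contexts. All other characters, including non-ASCII symbols, are left unchanged.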

/**
 * Escapes unsafe HTML characters for text contexts
 * @param {string} string - The input string to escape
 * @returns {string} String with unsafe characters escaped
 */
he.escape(string)

Usage Examples:

// Escape HTML-unsafe characters
he.escape('<script>alert("xss")</script>');
// → '&lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;'

// Escape attribute content
he.escape('value="malicious"');
// → 'value=&quot;malicious&quot;'

// Handles all unsafe characters: & < > " ' `
he.escape('&<>"\'`');
// → '&amp;&lt;&gt;&quot;&#x27;&#x60;'
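
Conceptually, this kind of escaping is a single lookup-table replacement over the six characters listed above. The sketch below is an illustrative stand-in, not he's source: escapeHtmlSketch and ESCAPE_MAP are hypothetical names, and real code should call he.escape.

```javascript
// Hypothetical lookup table mirroring the outputs shown above.
const ESCAPE_MAP = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#x27;',
  '`': '&#x60;'
};

// Illustrative stand-in for he.escape: replace each unsafe character
// with its reference and leave everything else unchanged.
function escapeHtmlSketch(text) {
  return String(text).replace(/[&<>"'`]/g, (ch) => ESCAPE_MAP[ch]);
}

console.log(escapeHtmlSketch('&<>"\'`'));
// → '&amp;&lt;&gt;&quot;&#x27;&#x60;'
```

Because the ampersand is handled in the same single pass as the other characters, already-produced references are never escaped a second time.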

HTML Unescaping

An alias for he.decode, provided as the natural counterpart to he.escape. It accepts the same arguments and returns the same result as he.decode.

/**
 * Unescapes HTML entities (alias for decode)
 * @param {string} html - HTML string containing entities to decode
 * @param {object} [options] - Optional configuration object (same as decode)
 * @returns {string} Decoded Unicode string
 */
he.unescape(html, options)

Usage Examples:

// Identical to he.decode()
he.unescape('&lt;script&gt;');
// → '<script>'

// Same options as decode
he.unescape('foo&ampbar', { isAttributeValue: true });
// → 'foo&ampbar'

Global Configuration

Default options can be modified globally to avoid passing options repeatedly.

/** Global default options for encode function */
he.encode.options = {
  allowUnsafeSymbols: false,      // Encode unsafe HTML chars &<>"'`
  encodeEverything: false,        // Only encode necessary characters  
  strict: false,                  // Don't throw on invalid code points
  useNamedReferences: false,      // Use hex escapes instead of names
  decimal: false                  // Use hex instead of decimal escapes
};

/** Global default options for decode function */
he.decode.options = {
  isAttributeValue: false,        // Treat input as HTML text context
  strict: false                   // Don't throw on malformed entities
};

Usage Examples:

// Override global encode defaults
he.encode.options.useNamedReferences = true;
he.encode('foo © bar');  // Now uses named refs by default
// → 'foo &copy; bar'

// Override global decode defaults  
he.decode.options.strict = true;
he.decode('foo&ampbar');  // Now throws on malformed entities
// → Parse error

// Read current defaults
console.log(he.encode.options.decimal);  // → false
console.log(he.decode.options.isAttributeValue);  // → false

Version Information

Access to the library version for compatibility checks.

/** Semantic version string of the library */
he.version  // '1.2.0'

Usage Examples:

console.log(he.version);  // → '1.2.0'

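// Version-based feature detection. Compare the parts numerically:
// comparing the strings directly would sort '1.10.0' before '1.2.0'.
const [major, minor] = he.version.split('.').map(Number);
if (major > 1 || (major === 1 && minor >= 2)) {
  // Use newer features
}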

Command Line Interface

he provides a command-line interface for batch processing and shell integration.

Installation

npm install -g he

Basic Commands

# Encode text
he --encode 'föo ♥ bår 𝌆 baz'
# → f&#xF6;o &#x2665; b&#xE5;r &#x1D306; baz

# Encode with named references  
he --encode --use-named-refs 'föo ♥ bår'
# → f&ouml;o &hearts; b&aring;r

# Decode entities
he --decode 'f&ouml;o &hearts; b&aring;r'
# → föo ♥ bår

# Escape HTML
he --escape '<img src="x" onerror="alert(1)">'
# → &lt;img src=&quot;x&quot; onerror=&quot;alert(1)&quot;&gt;

Encoding Options

# Use named character references
he --encode --use-named-refs 'text © symbol'

# Encode everything including ASCII  
he --encode --everything 'hello'

# Use decimal instead of hex escapes
he --encode --decimal 'text © symbol'

# Allow unsafe HTML characters
he --encode --allow-unsafe 'text & symbol'

Decoding Options

# Treat as HTML attribute value
he --decode --attribute 'foo&ampbar'

# Enable strict parsing mode
he --decode --strict 'foo&ampbar'

File Processing

# Process files with redirection
he --encode < input.txt > output.html
he --decode < input.html > output.txt

# Process remote content
curl -s "https://example.com/data.txt" | he --encode > encoded.html

Help and Version

# Show version
he --version
he -v

# Show help
he --help  
he -h

Error Handling

By default he is error-tolerant, like a browser. Setting the strict option makes encode and decode throw on input that would otherwise be handled silently:

Invalid Code Points:

// In non-strict mode (the default), encode never throws on disallowed code points
he.encode('foo\x00bar');  
// → 'foo\x00bar'

// In strict mode, throws Parse error
he.encode('foo\x00bar', { strict: true });
// → Parse error: forbidden code point

Malformed Entities:

// In non-strict mode, malformed references are decoded tolerantly, as a browser would
he.decode('foo&ampbar');
// → 'foo&bar'

// In strict mode, throws Parse error  
he.decode('foo&ampbar', { strict: true });
// → Parse error: named character reference was not terminated by a semicolon

Unicode Edge Cases:

// Handles astral symbols (code points above U+FFFF) correctly
he.encode('𝌆');  // U+1D306 TETRAGRAM FOR CENTRE
// → '&#x1D306;'

he.decode('&#x1D306;');
// → '𝌆'

// Handles surrogate pairs correctly
he.encode('\uD834\uDF06');  // Same symbol as above
// → '&#x1D306;'
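
The two spellings encode identically because a JavaScript string stores an astral symbol as a surrogate pair, and the code point can be recovered from the pair with the standard UTF-16 formula. The sketch below works through that arithmetic in plain JavaScript; astralToHexReference is a hypothetical name used only for illustration:

```javascript
// Illustrative only: combine a UTF-16 surrogate pair into a code point
// and format it as a hexadecimal character reference.
function astralToHexReference(pair) {
  const high = pair.charCodeAt(0); // 0xD834 for the example below
  const low = pair.charCodeAt(1);  // 0xDF06 for the example below
  const codePoint = (high - 0xD800) * 0x400 + (low - 0xDC00) + 0x10000;
  return '&#x' + codePoint.toString(16).toUpperCase() + ';';
}

console.log(astralToHexReference('\uD834\uDF06'));
// → '&#x1D306;'

// The built-in codePointAt performs the same combination.
console.log('\uD834\uDF06'.codePointAt(0) === 0x1D306);
// → true
```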

Types

/** Main library object */
const he = {
  version: '1.2.0',
  encode: function(string, options) { /* ... */ },
  decode: function(html, options) { /* ... */ },  
  escape: function(string) { /* ... */ },
  unescape: function(html, options) { /* ... */ }  // alias for decode
};

/** Encode options object properties */
const encodeOptions = {
  useNamedReferences: false,    // Use named refs like &copy; instead of &#xA9;
  decimal: false,               // Use decimal &#169; instead of hex &#xA9;
  encodeEverything: false,      // Encode all symbols including ASCII
  strict: false,                // Throw on invalid code points
  allowUnsafeSymbols: false     // Don't encode &<>"'` characters
};

/** Decode options object properties */
const decodeOptions = {
  isAttributeValue: false,      // Treat as HTML attribute value context
  strict: false                 // Throw on malformed character references
};

/** Error thrown in strict mode for invalid input */
throw new Error('Parse error: ' + message);

Describes: npmpkg:npm/he@1.2.x