Entity Encoding

Flexible HTML and XML entity encoding with multiple output modes for different use cases and environments. Provides both a unified API and specialized encoding functions.

Capabilities

Universal Encode Function

Main encoding function that handles both HTML and XML entities with configurable output modes.

/**
 * Encodes a string with entities using configurable options
 * @param input - String to encode
 * @param options - Encoding configuration or EntityLevel shorthand
 * @returns Encoded string with characters converted to entities
 */
function encode(
  input: string,
  options?: EncodingOptions | EntityLevel
): string;

interface EncodingOptions {
  /** The level of entities to support (default: EntityLevel.XML) */
  level?: EntityLevel;
  /** Output encoding format (default: EncodingMode.Extensive) */
  mode?: EncodingMode;
}

enum EntityLevel {
  /** Use XML entities only (&, <, >, ", ') */
  XML = 0,
  /** Use HTML entities (includes named entities like &nbsp;, &copy;) */
  HTML = 1
}

enum EncodingMode {
  /** UTF-8 encoded output, only XML characters escaped */
  UTF8,
  /** ASCII-only output; characters that need escaping in HTML and all non-ASCII characters are escaped */
  ASCII,
  /** Encode all characters that have entities + non-ASCII characters */
  Extensive,
  /** HTML attribute escaping following WHATWG specification */
  Attribute,
  /** HTML text escaping following WHATWG specification */
  Text
}

Usage Examples:

import { encode, EntityLevel, EncodingMode } from "entities";

// Basic XML encoding (default: extensive mode)
encode('<script>alert("XSS")</script>');
// Result: "&lt;script&gt;alert(&quot;XSS&quot;)&lt;/script&gt;"

// UTF-8 optimized encoding (minimal escaping)
encode("Hello & 世界", { mode: EncodingMode.UTF8 });
// Result: "Hello &amp; 世界"

// ASCII-only output
encode("Café & 世界", { 
  level: EntityLevel.HTML, 
  mode: EncodingMode.ASCII 
});
// Result: "Caf&eacute; &amp; &#x4e16;&#x754c;"

// HTML extensive encoding with named entities
encode("© 2023 → ♥", { 
  level: EntityLevel.HTML, 
  mode: EncodingMode.Extensive 
});
// Result: "&copy; 2023 &rarr; &hearts;"

// HTML attribute context encoding
encode('Say "Hello" & goodbye', { 
  level: EntityLevel.HTML,
  mode: EncodingMode.Attribute 
});
// Result: "Say &quot;Hello&quot; &amp; goodbye"

// HTML text context encoding  
encode("2 < 3 & 4 > 1", {
  level: EntityLevel.HTML,
  mode: EncodingMode.Text
});
// Result: "2 &lt; 3 &amp; 4 &gt; 1"
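
The `options` parameter accepts either a full `EncodingOptions` object or a bare `EntityLevel`. The sketch below shows one way that shorthand could be expanded and the documented defaults applied. `resolveOptions` and the local `EntityLevel` / `EncodingMode` objects are hypothetical stand-ins written for this illustration; they are not exports of the library.

```typescript
// Local stand-ins for the library's enums, declared as plain objects so the
// sketch runs on its own. Values follow the declaration order shown above.
const EntityLevel = { XML: 0, HTML: 1 } as const;
type EntityLevel = (typeof EntityLevel)[keyof typeof EntityLevel];

const EncodingMode = { UTF8: 0, ASCII: 1, Extensive: 2, Attribute: 3, Text: 4 } as const;
type EncodingMode = (typeof EncodingMode)[keyof typeof EncodingMode];

interface EncodingOptions {
  level?: EntityLevel;
  mode?: EncodingMode;
}

// Hypothetical helper: expands the EntityLevel shorthand into an options
// object and fills in the documented defaults (XML level, Extensive mode).
function resolveOptions(
  options: EncodingOptions | EntityLevel = EntityLevel.XML
): Required<EncodingOptions> {
  const opts: EncodingOptions =
    typeof options === "number" ? { level: options } : options;
  return {
    level: opts.level ?? EntityLevel.XML,
    mode: opts.mode ?? EncodingMode.Extensive,
  };
}

resolveOptions(EntityLevel.HTML);            // { level: 1, mode: 2 }
resolveOptions({ mode: EncodingMode.Text }); // { level: 0, mode: 4 }
```

Under this reading, `encode(text, EntityLevel.HTML)` and `encode(text, { level: EntityLevel.HTML })` request the same encoding.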

HTML-Specific Encoding Functions

Specialized functions for HTML entity encoding with different coverage levels.

/**
 * Encodes every character that has an HTML entity, including ASCII punctuation
 * that is valid in HTML, plus all non-ASCII characters
 * Most comprehensive encoding - encodes characters like # and ; as entities
 * @param input - String to encode
 * @returns Extensively encoded string
 */
function encodeHTML(input: string): string;

/**
 * Encodes only non-ASCII characters and characters that must be escaped in HTML
 * More compact than encodeHTML - leaves valid ASCII characters unencoded
 * @param input - String to encode  
 * @returns Selectively encoded string
 */
function encodeNonAsciiHTML(input: string): string;

Usage Examples:

import { encodeHTML, encodeNonAsciiHTML } from "entities";

// Comprehensive HTML encoding
encodeHTML("Hello! #hashtag & café");
// Result: "Hello&excl; &num;hashtag &amp; caf&eacute;"

// Selective HTML encoding (more compact)
encodeNonAsciiHTML("Hello! #hashtag & café");  
// Result: "Hello! #hashtag &amp; caf&eacute;"

Legacy Compatibility

The library includes deprecated aliases for backward compatibility:

// Deprecated aliases - use encodeHTML instead
function encodeHTML4(input: string): string;
function encodeHTML5(input: string): string;

Encoding Mode Details

UTF8 Mode

  • Purpose: Minimal escaping for UTF-8 environments
  • Encodes: Only XML-critical characters ("&'<>)
  • Best for: Modern web applications, UTF-8 content
  • Output: Preserves Unicode characters, minimal entity usage
encode("Café & 日本語", { mode: EncodingMode.UTF8 });
// Result: "Café &amp; 日本語"
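
To make the UTF8-mode rule concrete, here is a minimal standalone sketch of the same idea: escape the five XML-critical characters and pass everything else through. `escapeXmlCharsSketch` and `XML_ESCAPES` are hypothetical names for this illustration, not the library's implementation.

```typescript
// Replacement table for the five characters listed above.
const XML_ESCAPES: Record<string, string> = {
  '"': "&quot;",
  "&": "&amp;",
  "'": "&apos;",
  "<": "&lt;",
  ">": "&gt;",
};

// Hypothetical stand-in for UTF8-mode behavior: only the five XML-critical
// characters are replaced; Unicode text is left untouched.
function escapeXmlCharsSketch(input: string): string {
  return input.replace(/["&'<>]/g, (ch) => XML_ESCAPES[ch] ?? ch);
}

escapeXmlCharsSketch("Café & 日本語"); // "Café &amp; 日本語"
```

Because the replacement runs in a single pass, the `&` inside a freshly produced `&lt;` is never escaped a second time.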

ASCII Mode

  • Purpose: ASCII-only output for legacy systems
  • Encodes: Non-ASCII characters + characters that need escaping in HTML
  • Best for: Email, legacy systems, ASCII-only environments
  • Output: ASCII only; at HTML level, non-ASCII characters use a named entity where one exists and a numeric reference otherwise
encode("Café & 日本語", { 
  level: EntityLevel.HTML,
  mode: EncodingMode.ASCII 
});
// Result: "Caf&eacute; &amp; &#x65e5;&#x672c;&#x8a9e;"
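
Numeric references such as `&#x65e5;` are the hexadecimal Unicode code point of the character. The sketch below shows one way to produce them for every non-ASCII character, combining surrogate pairs so that characters outside the Basic Multilingual Plane become a single reference. `toNumericReferences` is a hypothetical helper for illustration only; it does not use named entities and is not the library's implementation.

```typescript
// Hypothetical helper: replaces each non-ASCII character with a hexadecimal
// numeric character reference and leaves ASCII characters as they are.
function toNumericReferences(input: string): string {
  let out = "";
  for (let i = 0; i < input.length; i++) {
    const unit = input.charCodeAt(i);
    if (unit <= 0x7f) {
      out += input.charAt(i);
      continue;
    }
    let codePoint = unit;
    // Combine a high surrogate with the low surrogate that follows it.
    if (unit >= 0xd800 && unit <= 0xdbff && i + 1 < input.length) {
      const next = input.charCodeAt(i + 1);
      if (next >= 0xdc00 && next <= 0xdfff) {
        codePoint = (unit - 0xd800) * 0x400 + (next - 0xdc00) + 0x10000;
        i++; // the low surrogate has been consumed
      }
    }
    out += `&#x${codePoint.toString(16)};`;
  }
  return out;
}

toNumericReferences("日本語"); // "&#x65e5;&#x672c;&#x8a9e;"
```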

Extensive Mode (Default)

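  • Purpose: Maximum entity usage, with named entities at HTML level
  • Encodes: All characters with available entities + non-ASCII
  • Best for: Maximum compatibility, semantic markup
  • Output: Named entities when available at HTML level; at XML level, the five XML entities plus numeric references for non-ASCII characters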
encode("© 2023 → ♥", { 
  level: EntityLevel.HTML,
  mode: EncodingMode.Extensive 
});
// Result: "&copy; 2023 &rarr; &hearts;"

Attribute Mode

  • Purpose: WHATWG HTML specification compliant attribute escaping
  • Encodes: "&\u00A0 (quote, ampersand, non-breaking space)
  • Best for: Double-quoted HTML attribute values (single quotes are not escaped)
  • Output: Minimal escaping for attribute context
encode('title="Hello & world"', { mode: EncodingMode.Attribute });
// Result: "title=&quot;Hello &amp; world&quot;"

Text Mode

  • Purpose: WHATWG HTML specification compliant text escaping
  • Encodes: &<>\u00A0 (ampersand, less-than, greater-than, nbsp)
  • Best for: HTML text content
  • Output: Minimal escaping for text context
encode("2 < 3 & 4 > 1", { mode: EncodingMode.Text });
// Result: "2 &lt; 3 &amp; 4 &gt; 1"
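
The Attribute and Text modes differ only in which characters they escape. The standalone sketch below applies the two character sets listed above to the same input so the difference is visible side by side. `escapeAttributeSketch` and `escapeTextSketch` are hypothetical stand-ins written for this comparison, not the library's functions.

```typescript
// Attribute context: quote, ampersand, non-breaking space.
// The ampersand is replaced first so later replacements are not re-escaped.
function escapeAttributeSketch(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/\u00A0/g, "&nbsp;");
}

// Text context: ampersand, less-than, greater-than, non-breaking space.
function escapeTextSketch(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/\u00A0/g, "&nbsp;");
}

const sample = '<a title="x & y">';
escapeAttributeSketch(sample); // '<a title=&quot;x &amp; y&quot;>'
escapeTextSketch(sample);      // '&lt;a title="x &amp; y"&gt;'
```

Neither set is a superset of the other: the attribute set leaves `<` and `>` alone, and the text set leaves `"` alone, so output from one context should not be reused in the other.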

Advanced Usage Patterns

Conditional Encoding

import { encode, EntityLevel, EncodingMode } from "entities";

function encodeForContext(text: string, context: 'xml' | 'html-attr' | 'html-text') {
  switch (context) {
    case 'xml':
      return encode(text, { level: EntityLevel.XML, mode: EncodingMode.UTF8 });
    case 'html-attr':
      return encode(text, { level: EntityLevel.HTML, mode: EncodingMode.Attribute });
    case 'html-text':
      return encode(text, { level: EntityLevel.HTML, mode: EncodingMode.Text });
    default:
      return encode(text);
  }
}

Template Processing

import { encode, EntityLevel, EncodingMode } from "entities";

function createHTMLTemplate(data: Record<string, string>) {
  return `
    <div class="user-card">
      <h2>${encode(data.name, { level: EntityLevel.HTML, mode: EncodingMode.Text })}</h2>
      <p title="${encode(data.bio, { level: EntityLevel.HTML, mode: EncodingMode.Attribute })}">
        ${encode(data.description, { level: EntityLevel.HTML, mode: EncodingMode.Text })}
      </p>
    </div>
  `;
}
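
Calling the encoder by hand in every interpolation is easy to forget. One possible alternative, sketched below, is a tagged template that escapes each interpolated value automatically. `makeSafeTag`, `Escaper`, and `textEscaper` are hypothetical names; `textEscaper` is a simplified stand-in so the sketch runs without the library, and in real code it would be replaced by a call such as `(s) => encode(s, { level: EntityLevel.HTML, mode: EncodingMode.Text })`.

```typescript
type Escaper = (value: string) => string;

// Hypothetical helper: builds a template tag that runs every interpolated
// value through `escape` and leaves the literal markup untouched.
function makeSafeTag(escape: Escaper) {
  return (strings: TemplateStringsArray, ...values: unknown[]): string => {
    let out = strings[0] ?? "";
    values.forEach((value, i) => {
      out += escape(String(value)) + (strings[i + 1] ?? "");
    });
    return out;
  };
}

// Simplified stand-in escaper for text content (assumption, see lead-in).
const textEscaper: Escaper = (s) =>
  s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");

const html = makeSafeTag(textEscaper);
const heading = html`<h2>${"Tom & <Jerry>"}</h2>`;
// heading === "<h2>Tom &amp; &lt;Jerry&gt;</h2>"
```

A tag built this way applies one escaper to every slot, so values placed inside attributes would need a second tag built from an attribute-mode escaper.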

Batch Processing

import { encode, EntityLevel, EncodingMode } from "entities";

const userInputs = ['<script>', 'Hello & world', '"quotes"'];

const safeOutputs = userInputs.map(input => 
  encode(input, { 
    level: EntityLevel.HTML, 
    mode: EncodingMode.Text 
  })
);

Performance Considerations

  • UTF8 mode: Likely the fastest, since it matches only the five XML-critical characters
  • Extensive mode: Moderate performance; at HTML level it uses a trie-based entity lookup
  • ASCII mode: Must compute Unicode code points for non-ASCII characters that have no named entity
  • HTML vs XML level: HTML level can be slightly slower because of its larger entity set

For the best performance, choose the mode that escapes the fewest characters while still meeting your requirements.
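
Relative speed depends on the input and the runtime, so it is worth measuring with representative data. The sketch below is a minimal timing harness under that assumption; `timeEncoder` and `TimingResult` are hypothetical names, and any encoder function can be passed in.

```typescript
interface TimingResult {
  elapsedMs: number;   // wall-clock time for all rounds
  totalLength: number; // sum of output lengths, so results are actually used
}

// Hypothetical micro-benchmark helper: runs `fn` over every input for the
// given number of rounds and reports the elapsed time.
function timeEncoder(
  fn: (input: string) => string,
  inputs: string[],
  rounds = 1000
): TimingResult {
  let totalLength = 0;
  const start = Date.now();
  for (let r = 0; r < rounds; r++) {
    for (const input of inputs) {
      totalLength += fn(input).length;
    }
  }
  return { elapsedMs: Date.now() - start, totalLength };
}
```

For example, `timeEncoder((s) => encode(s, { mode: EncodingMode.UTF8 }), samples)` could be compared against the same call with a different mode; single runs are noisy, so repeated measurements are more trustworthy.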

Security Best Practices

  1. Always encode user input before inserting into HTML/XML
  2. Use appropriate context encoding (Attribute vs Text mode)
  3. When the output context is uncertain, prefer a mode that escapes more characters
  4. Validate encoding results in security-critical contexts
import { encode, EntityLevel, EncodingMode } from "entities";

// Security-focused encoding example
function secureHTMLInsertion(userInput: string, context: 'text' | 'attribute') {
  const mode = context === 'text' ? EncodingMode.Text : EncodingMode.Attribute;
  return encode(userInput, { 
    level: EntityLevel.HTML, 
    mode 
  });
}
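
For the fourth practice, one possible sanity check is to scan encoded output for characters that should no longer appear raw in the target context. The sketch below is an assumption about what such a check might look like; `findRawCharacters`, `HtmlContext`, and `FORBIDDEN` are hypothetical names, and the character sets come from the Attribute Mode and Text Mode sections.

```typescript
type HtmlContext = "text" | "attribute";

// Characters that must not survive unescaped in each context.
// Ampersands are handled separately below.
const FORBIDDEN: Record<HtmlContext, string[]> = {
  text: ["<", ">"],
  attribute: ['"'],
};

// Hypothetical check: returns the raw characters still present in `encoded`.
// An "&" counts as raw unless it starts something shaped like an entity.
function findRawCharacters(encoded: string, context: HtmlContext): string[] {
  const found = FORBIDDEN[context].filter((ch) => encoded.indexOf(ch) !== -1);
  const withoutEntities = encoded.replace(
    /&(?:#x[0-9a-fA-F]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);/g,
    ""
  );
  if (withoutEntities.indexOf("&") !== -1) found.push("&");
  return found;
}

findRawCharacters("2 &lt; 3 &amp; 4", "text"); // []
findRawCharacters("<b>", "text");              // ["<", ">"]
```

An empty result only shows that no forbidden character is left; it cannot tell whether entity-shaped text was already present in the input, so it complements correct encoding rather than replacing it.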