
tessl/npm-parse5

HTML parser and serializer that is fully compliant with the WHATWG HTML Living Standard.


HTML Serialization

parse5 serializes parsed AST nodes back to HTML strings. The serializer escapes text and attribute values, omits end tags for void elements, and handles elements in the SVG and MathML namespaces.

Capabilities

Inner Content Serialization

Serializes the inner content of a node (children only) to an HTML string.

/**
 * Serializes an AST node's inner content to an HTML string
 * @param node - Parent node whose children will be serialized
 * @param options - Optional serialization configuration
 * @returns HTML string representing the node's inner content
 */
function serialize<T extends TreeAdapterTypeMap = DefaultTreeAdapterMap>(
  node: T['parentNode'],
  options?: SerializerOptions<T>
): string;

Usage Examples:

import { parse, serialize } from "parse5";

// Parse and serialize document content
const document = parse('<!DOCTYPE html><html><head></head><body>Hi there!</body></html>');
const html = serialize(document);
console.log(html); // '<!DOCTYPE html><html><head></head><body>Hi there!</body></html>'

// Serialize element's inner content
const bodyElement = document.childNodes[1].childNodes[1]; // html > body
const bodyContent = serialize(bodyElement);
console.log(bodyContent); // 'Hi there!'

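// Serialize a tree built with a non-default tree adapter.
// The tree must come from the same adapter that is passed to serialize().
// (Export name assumed: recent parse5-htmlparser2-tree-adapter releases export `adapter`.)
import { adapter as htmlparser2Adapter } from "parse5-htmlparser2-tree-adapter";
const htmlparser2Document = parse('<p>Hi there!</p>', { treeAdapter: htmlparser2Adapter });
const customHtml = serialize(htmlparser2Document, { treeAdapter: htmlparser2Adapter });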

Outer Element Serialization

Serializes an element including the element tag itself (outerHTML equivalent).

/**
 * Serializes an element including its opening and closing tags
 * @param node - Element node to serialize completely
 * @param options - Optional serialization configuration
 * @returns HTML string including the element's outer tags
 */
function serializeOuter<T extends TreeAdapterTypeMap = DefaultTreeAdapterMap>(
  node: T['node'],
  options?: SerializerOptions<T>
): string;

Usage Examples:

import { parseFragment, serializeOuter } from "parse5";

// Parse fragment and serialize complete element
const fragment = parseFragment('<div class="container"><span>Hello</span></div>');
const divElement = fragment.childNodes[0];
const outerHTML = serializeOuter(divElement);
console.log(outerHTML); // '<div class="container"><span>Hello</span></div>'

// Serialize nested elements
const complexFragment = parseFragment(`
  <article data-id="123">
    <header><h1>Title</h1></header>
    <section><p>Content paragraph</p></section>
  </article>
`);
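// The template literal starts with whitespace, so childNodes[0] is a text node;
// pick the <article> element by name instead.
const articleElement = complexFragment.childNodes.find((node) => node.nodeName === 'article');
const fullArticle = serializeOuter(articleElement);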

Serialization Options

Control serialization behavior through configuration options.

interface SerializerOptions<T extends TreeAdapterTypeMap> {
  /**
   * Specifies input tree format. Defaults to the default tree adapter.
   */
  treeAdapter?: TreeAdapter<T>;

  /**
   * The scripting flag. If set to true, noscript element content 
   * will not be escaped. Defaults to true.
   */
  scriptingEnabled?: boolean;
}

Usage Examples:

import { parse, serialize, parseFragment, serializeOuter } from "parse5";

// Serialize with scripting disabled
const docWithNoscript = parse('<html><body><noscript>No JS content</noscript></body></html>');
const htmlWithoutScripting = serialize(docWithNoscript, {
  scriptingEnabled: false
});

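// Use a custom tree adapter ("./my-tree-adapter" is a placeholder module).
// The tree must have been built with the same adapter that serializes it.
import { customTreeAdapter } from "./my-tree-adapter";
const customDoc = parse('<p>Hi</p>', { treeAdapter: customTreeAdapter });
const customSerialized = serialize(customDoc, {
  treeAdapter: customTreeAdapter
});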

Serialization Behavior

Void Elements

The serializer writes void elements (such as br, img, and input) as a start tag only, with no end tag and no children:

import { parseFragment, serialize, serializeOuter } from "parse5";

// Void elements are serialized without closing tags
const fragment = parseFragment('<img src="image.jpg" alt="Image"><br><input type="text">');
const serialized = serialize(fragment);
console.log(serialized); // '<img src="image.jpg" alt="Image"><br><input type="text">'

// Even if child nodes are added programmatically, void elements ignore them
const brElement = parseFragment('<br>').childNodes[0];
// Adding children to void elements has no effect during serialization
const brSerialized = serializeOuter(brElement);
console.log(brSerialized); // '<br>' (children are ignored)
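
To make the relationship between inner serialization, outer serialization, and void elements concrete, here is a minimal self-contained sketch. It is not parse5's implementation: the node shapes and the `sketch*` function names are hypothetical, attributes and escaping are left out, and the void list is the one given in the HTML Standard (a real serializer may also treat a few legacy tags as void).

```typescript
// Hypothetical node shapes for this sketch; parse5's real nodes carry more fields.
type SketchText = { kind: "text"; value: string };
type SketchElement = { kind: "element"; tagName: string; children: SketchNode[] };
type SketchNode = SketchText | SketchElement;

// Void elements as listed in the HTML Standard.
const VOID_TAGS = new Set([
  "area", "base", "br", "col", "embed", "hr", "img", "input",
  "link", "meta", "source", "track", "wbr",
]);

// Inner serialization: the children only, without the parent's own tags.
function sketchSerializeInner(parent: SketchElement): string {
  return parent.children.map((child) => sketchSerializeOuter(child)).join("");
}

// Outer serialization: the node itself plus, for non-void elements,
// its children and its end tag.
function sketchSerializeOuter(node: SketchNode): string {
  if (node.kind === "text") return node.value; // escaping omitted for brevity
  const startTag = `<${node.tagName}>`;
  if (VOID_TAGS.has(node.tagName)) return startTag; // children and end tag are skipped
  return `${startTag}${sketchSerializeInner(node)}</${node.tagName}>`;
}

// A <br> that was (incorrectly) given a child, nested inside a <p>.
const strayChild: SketchText = { kind: "text", value: "ignored" };
const lineBreak: SketchElement = { kind: "element", tagName: "br", children: [strayChild] };
const paragraph: SketchElement = {
  kind: "element",
  tagName: "p",
  children: [{ kind: "text", value: "one" }, lineBreak, { kind: "text", value: "two" }],
};

console.log(sketchSerializeOuter(paragraph)); // <p>one<br>two</p>
console.log(sketchSerializeInner(paragraph)); // one<br>two
```

Outer serialization is the start tag, the inner serialization, and the end tag; for a void element only the start tag is written, which is why the stray child never reaches the output.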

Namespace Handling

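The serializer handles foreign (SVG and MathML) content. Namespaces are assigned by the HTML parser from the element context, not from xmlns declarations:

import { parseFragment, serialize } from "parse5";

// <svg> and its descendants are placed in the SVG namespace by the parser.
// The serializer always writes explicit end tags, so <circle .../> comes back as <circle ...></circle>.
const svgFragment = parseFragment(`
  <svg xmlns="http://www.w3.org/2000/svg">
    <circle cx="50" cy="50" r="40"/>
  </svg>
`);
const svgSerialized = serialize(svgFragment);

// Custom XML namespaces are not processed: xmlns:custom is kept as an ordinary attribute,
// "custom:element" is an ordinary HTML tag name, and the trailing "/" is ignored.
const xmlFragment = parseFragment('<root xmlns:custom="http://example.com/ns"><custom:element/></root>');
const xmlSerialized = serialize(xmlFragment);
// '<root xmlns:custom="http://example.com/ns"><custom:element></custom:element></root>'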

Attribute Serialization

Attribute values are always written in double quotes, with special characters escaped:

import { parseFragment, serializeOuter } from "parse5";

// Special characters in attributes are escaped
const fragment = parseFragment('<div title="Quote: &quot;Hello&quot;" data-value=\'Single "quotes"\'></div>');
const element = fragment.childNodes[0];
const serialized = serializeOuter(element);
console.log(serialized); // '<div title="Quote: &quot;Hello&quot;" data-value="Single &quot;quotes&quot;"></div>'

// Boolean attributes are written with an explicit empty value
const inputFragment = parseFragment('<input type="checkbox" checked disabled>');
const inputSerialized = serialize(inputFragment);
console.log(inputSerialized); // '<input type="checkbox" checked="" disabled="">'
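
The attribute rules above can be sketched as a small standalone function. This is an illustration under stated assumptions, not parse5 code: `sketchEscapeAttribute` and `sketchSerializeAttribute` are hypothetical names, and the sketch covers only the ampersand, the non-breaking space, and the double quote. The exact set of escaped characters may differ between parse5 versions and revisions of the HTML Standard.

```typescript
// Hypothetical helper, not a parse5 export.
// Escapes a value for use inside a double-quoted attribute.
function sketchEscapeAttribute(value: string): string {
  return value
    .replace(/&/g, "&amp;") // must run first so later entities are not re-escaped
    .replace(/\u00a0/g, "&nbsp;")
    .replace(/"/g, "&quot;");
}

// Writes one attribute the way a serializer would: leading space,
// name, and an always-present double-quoted value.
function sketchSerializeAttribute(attrName: string, value: string): string {
  return ` ${attrName}="${sketchEscapeAttribute(value)}"`;
}

console.log(sketchSerializeAttribute("title", 'Quote: "Hello"')); //  title="Quote: &quot;Hello&quot;"
console.log(sketchSerializeAttribute("checked", "")); //  checked=""
```

Because the value is always quoted, a boolean attribute with an empty value is written as `checked=""` rather than as a bare `checked`. Single quotes need no escaping inside a double-quoted value.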

Text Content Escaping

Text content is automatically escaped:

import { parseFragment, serialize } from "parse5";

// Special HTML characters are escaped in text content
const fragment = parseFragment('<p>Text with &lt;script&gt; and &amp; entities</p>');
const serialized = serialize(fragment);
// '<p>Text with &lt;script&gt; and &amp; entities</p>'

// Script and style elements are written as raw text, without escaping
const scriptFragment = parseFragment('<script>if (x < y && y > z) { /* code */ }</script>');
const scriptSerialized = serialize(scriptFragment);
// '<script>if (x < y && y > z) { /* code */ }</script>'
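
The difference between escaped text and raw text, and the role of the scriptingEnabled option, can be sketched as follows. This is a simplified illustration rather than parse5's code: the `sketch*` names are hypothetical, and the list of raw-text parents follows the HTML Standard's fragment serialization rules as I understand them.

```typescript
// Elements whose text children are written without escaping.
const RAW_TEXT_PARENTS = new Set([
  "style", "script", "xmp", "iframe", "noembed", "noframes", "plaintext",
]);

// Hypothetical helper: escapes text for use as ordinary element content.
function sketchEscapeText(value: string): string {
  return value
    .replace(/&/g, "&amp;") // must run first so later entities are not re-escaped
    .replace(/\u00a0/g, "&nbsp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Serializes a text node given the tag name of its parent element.
// <noscript> content counts as raw text only when scripting is enabled.
function sketchSerializeText(value: string, parentTag: string, scriptingEnabled = true): string {
  const isRaw =
    RAW_TEXT_PARENTS.has(parentTag) || (parentTag === "noscript" && scriptingEnabled);
  return isRaw ? value : sketchEscapeText(value);
}

console.log(sketchSerializeText("a < b && c", "p")); // a &lt; b &amp;&amp; c
console.log(sketchSerializeText("a < b && c", "script")); // a < b && c
console.log(sketchSerializeText("<b>x</b>", "noscript", false)); // &lt;b&gt;x&lt;/b&gt;
```

With scripting enabled (the default), the text inside noscript is passed through untouched; with scriptingEnabled set to false it is escaped like any other text.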

Template Element Handling

Template elements receive special handling:

import { parseFragment, serialize, serializeOuter } from "parse5";

// Template content is serialized as inner content
const templateFragment = parseFragment('<template><div>Template content</div></template>');
const templateElement = templateFragment.childNodes[0];

// serialize() on template element returns the template's inner content
const templateInner = serialize(templateElement);
console.log(templateInner); // '<div>Template content</div>'

// serializeOuter() includes the template tags
const templateOuter = serializeOuter(templateElement);
console.log(templateOuter); // '<template><div>Template content</div></template>'

Common Serialization Patterns

Round-trip Parsing and Serialization

import { parse, serialize } from "parse5";

// Parse HTML and serialize back - should be equivalent
const originalHtml = '<!DOCTYPE html><html><head><title>Test</title></head><body><div class="content">Hello World</div></body></html>';
const document = parse(originalHtml);
const serializedHtml = serialize(document);

// The serialized HTML maintains the same structure
// (though formatting may differ slightly)

Selective Content Serialization

import { parseFragment, serialize } from "parse5";

// Parse complex structure and serialize specific parts
const complexFragment = parseFragment(`
  <article>
    <header><h1>Article Title</h1></header>
    <section class="content">
      <p>First paragraph</p>
      <p>Second paragraph</p>
    </section>
    <footer>Article footer</footer>
  </article>
`);

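// Whitespace between tags becomes text nodes, so look elements up by name rather than by index
const article = complexFragment.childNodes.find((node) => node.nodeName === 'article');
const contentSection = article.childNodes.find((node) => node.nodeName === 'section');
const contentHtml = serialize(contentSection);
// Returns only the content section's inner HTML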
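
When the target element sits deeper in the tree, a small recursive lookup avoids index arithmetic altogether. The sketch below is a hypothetical helper, not part of parse5. It assumes only the `nodeName` and `childNodes` fields that nodes from the default tree adapter expose, and it is demonstrated on a hand-built tree so it runs without any dependency. It does not descend into template content, which the default adapter stores separately from `childNodes`.

```typescript
// Minimal structural view of a node: only the fields this walk touches.
type WalkableNode = { nodeName: string; childNodes?: WalkableNode[] };

// Depth-first, pre-order search for the first descendant with the given node name.
function findFirstByNodeName(root: WalkableNode, wanted: string): WalkableNode | undefined {
  for (const child of root.childNodes ?? []) {
    if (child.nodeName === wanted) return child;
    const nested = findFirstByNodeName(child, wanted);
    if (nested) return nested;
  }
  return undefined;
}

// Hand-built stand-in for a parsed fragment, including whitespace text nodes.
const sampleSection: WalkableNode = { nodeName: "section", childNodes: [{ nodeName: "p" }] };
const sampleFragment: WalkableNode = {
  nodeName: "#document-fragment",
  childNodes: [
    { nodeName: "#text" },
    {
      nodeName: "article",
      childNodes: [{ nodeName: "#text" }, { nodeName: "header", childNodes: [] }, sampleSection],
    },
  ],
};

console.log(findFirstByNodeName(sampleFragment, "section") === sampleSection); // true
console.log(findFirstByNodeName(sampleFragment, "footer")); // undefined
```

With a real parse5 tree, the node returned by such a helper could then be passed to serialize() or serializeOuter().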

HTML Cleaning and Transformation

import { parse, serialize } from "parse5";

// Parse potentially malformed HTML and serialize clean output
const messyHtml = '<div><p>Unclosed paragraph<span>Nested content<div>Misplaced div</div>';
const document = parse(messyHtml);
const cleanHtml = serialize(document);
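// The output is well-formed: the parser supplies the missing end tags and the html/head/body wrappers.
// Well-formed does not mean sanitized; scripts and event-handler attributes are kept as-is.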

Install with Tessl CLI

npx tessl i tessl/npm-parse5
