tessl/npm-node-html-parser

A very fast HTML parser, generating a simplified DOM, with basic element query support.


HTML Parsing

Core parsing API: turns an HTML string into a simplified, manipulable DOM tree. Options control tag-name case, comment handling, raw-text elements and void elements.

Capabilities

Parse Function

Main parsing function that converts HTML strings to DOM trees with optional configuration.

/**
 * Parses HTML and returns a root element containing the DOM tree
 * @param data - HTML string to parse
 * @param options - Optional parsing configuration
 * @returns Root HTMLElement: a tag-less wrapper whose child nodes are the top-level parsed nodes
 */
function parse(data: string, options?: Partial<Options>): HTMLElement;

Usage Examples:

import { parse } from "node-html-parser";

// Basic parsing
const root = parse('<div>Hello World</div>');

// With parsing options
const configured = parse('<div>Content</div>', {
  lowerCaseTagName: true,
  comment: true,
  voidTag: {
    closingSlash: true
  }
});

// Parse complex HTML
const html = `
<html>
  <head><title>Test</title></head>
  <body>
    <div class="container">
      <p>Paragraph content</p>
      <!-- This is a comment -->
    </div>
  </body>
</html>`;

const document = parse(html, { comment: true });

HTML Validation

Checks whether an HTML string is well-formed: every element opened in the markup must be closed again, so that only the parser's implicit root is left on its internal stack. Several top-level elements are allowed.

/**
 * Checks that every element opened in the HTML is closed again
 * @param data - HTML string to validate
 * @param options - Optional parsing configuration
 * @returns true if all opened elements are closed, false otherwise
 */
function valid(data: string, options?: Partial<Options>): boolean;

Usage Examples:

import { valid } from "node-html-parser";

// Well-formed: every tag is closed
console.log(valid('<div><p>Content</p></div>')); // true

// Several top-level elements are fine as long as each one is closed
console.log(valid('<div>First</div><div>Second</div>')); // true

// Unclosed element
console.log(valid('<div>Unclosed')); // false

// With options
console.log(valid('<DIV>Content</DIV>', { lowerCaseTagName: true })); // true
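To make the idea behind `valid` concrete, here is a simplified, self-contained model of a stack-based well-formedness check: push each opening tag, pop on the matching closing tag, and succeed only if nothing is left open. This is an illustration under stated assumptions, not node-html-parser's code. `isWellFormed` and `VOID_TAGS` are hypothetical names, and the model ignores comments, raw-text elements such as `<script>`, and the parser's implicit-closing rules. It is also stricter about stray closing tags, so its answers can differ from `valid`.

```typescript
// Documented HTML5 void elements: they never need a closing tag
const VOID_TAGS = new Set([
  "area", "base", "br", "col", "embed", "hr", "img",
  "input", "link", "meta", "param", "source", "track", "wbr",
]);

/**
 * Hypothetical helper: returns true when every opening tag is matched by a
 * closing tag in the right order. Void and self-closing tags are skipped.
 */
function isWellFormed(html: string): boolean {
  const stack: string[] = [];
  // Groups: 1 = leading slash (closing tag), 2 = tag name, 3 = trailing slash
  const tagPattern = /<(\/?)([a-zA-Z][\w-]*)[^>]*?(\/?)>/g;
  for (const [, closing, rawName, selfClosing] of html.matchAll(tagPattern)) {
    const name = rawName.toLowerCase();
    if (closing) {
      // A closing tag must match the most recently opened element
      if (stack.pop() !== name) return false;
    } else if (!selfClosing && !VOID_TAGS.has(name)) {
      stack.push(name);
    }
  }
  // Anything still on the stack was never closed
  return stack.length === 0;
}
```

For example, `isWellFormed('<div>First</div><div>Second</div>')` is `true`, while `isWellFormed('<div>Unclosed')` is `false`.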

Parsing Options

Configuration accepted by parse and valid. Both take Partial<Options>, so every field is optional.

interface Options {
  /** Convert tag names to lowercase (default false; can slow parsing noticeably) */
  lowerCaseTagName?: boolean;
  
  /** Keep comment nodes in the DOM tree (default false) */
  comment?: boolean;
  
  /** Fix invalid nested <a> tags (default false) */
  fixNestedATags?: boolean;
  
  /** Keep unclosed elements in the tree, closing them implicitly, instead of removing them (default false) */
  parseNoneClosedTags?: boolean;
  
  /** Elements whose content is kept as raw text rather than parsed as HTML */
  blockTextElements?: { [tag: string]: boolean };
  
  /** Void element configuration */
  voidTag?: {
    /** Void element names, case-insensitive; when given, replaces the default HTML5 list */
    tags?: string[];
    /** Serialise void elements with a trailing slash, e.g. <br/> (default false) */
    closingSlash?: boolean;
  };
}

Default Values:

// Default blockTextElements, used when the option is not specified (shown for reference)
const defaultBlockTextElements = {
  script: true,
  noscript: true,
  style: true,
  pre: true
};

// Default void elements (HTML5 standard, shown for reference)
const defaultVoidTags = ['area', 'base', 'br', 'col', 'embed', 'hr', 'img', 'input', 'link', 'meta', 'param', 'source', 'track', 'wbr'];
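A supplied `blockTextElements` object is assumed here to replace the default table rather than merge with it (worth confirming against your installed version). Under that assumption, a small helper can extend the documented defaults instead of silently dropping `script`, `noscript` and `style`. `withDefaultBlockText` and `DEFAULT_BLOCK_TEXT_ELEMENTS` are hypothetical names, not library exports; the values mirror the defaults listed above.

```typescript
type BlockTextElements = { [tag: string]: boolean };

// Mirrors the documented defaults; not imported from the library
const DEFAULT_BLOCK_TEXT_ELEMENTS: BlockTextElements = {
  script: true,
  noscript: true,
  style: true,
  pre: true,
};

// Returns a new object: documented defaults first, caller's entries added or overriding
function withDefaultBlockText(extra: BlockTextElements): BlockTextElements {
  return { ...DEFAULT_BLOCK_TEXT_ELEMENTS, ...extra };
}
```

Usage would look like `parse(html, { blockTextElements: withDefaultBlockText({ code: true }) })`, which keeps the four defaults and adds `code`. Spreading into a fresh object leaves the shared defaults unmodified between calls.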

Configuration Examples:

import { parse } from "node-html-parser";

// Preserve original case
const root = parse('<DIV>Content</DIV>', {
  lowerCaseTagName: false
});

// Include comments in parsing
const withComments = parse('<!-- comment --><div>content</div>', {
  comment: true
});

// Custom void elements with closing slashes
// (tags replaces the default list, so re-list any standard void elements you still need)
const customVoid = parse('<custom-void></custom-void>', {
  voidTag: {
    tags: ['custom-void'],
    closingSlash: true
  }
});

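// Custom block text elements
// (the object replaces the defaults, so pre is re-listed; script, noscript and style are not raw-text here)
const customBlocks = parse('<code>preserved content</code>', {
  blockTextElements: {
    code: true,
    pre: true
  }
});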
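The `closingSlash` flag only affects how void elements are written back out. The following is a simplified, self-contained model of serialising one element, to show the difference; `formatElement` and `DEFAULT_VOID_TAGS` are hypothetical names, and the spacing rule (a space before the slash when attributes are present) is an assumption that may not match the library's exact output.

```typescript
// Documented HTML5 void elements (copied from the default list)
const DEFAULT_VOID_TAGS = [
  "area", "base", "br", "col", "embed", "hr", "img",
  "input", "link", "meta", "param", "source", "track", "wbr",
];

/**
 * Hypothetical model: void elements get no closing tag and, with closingSlash,
 * end in "/>"; other elements wrap their inner HTML in open and close tags.
 */
function formatElement(
  tag: string,
  attrs: string,
  innerHTML: string,
  closingSlash: boolean,
  voidTags: string[] = DEFAULT_VOID_TAGS,
): string {
  const attrPart = attrs ? ` ${attrs}` : "";
  if (!voidTags.includes(tag.toLowerCase())) {
    return `<${tag}${attrPart}>${innerHTML}</${tag}>`;
  }
  const slash = closingSlash ? (attrs ? " /" : "/") : "";
  return `<${tag}${attrPart}${slash}>`;
}
```

In this model `formatElement("br", "", "", true)` gives `<br/>` and `formatElement("br", "", "", false)` gives `<br>`. Passing a custom list such as `["custom-void"]` makes `custom-void` void and, because the list replaces the defaults, turns `br` into an ordinary element.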

Performance Considerations

  • Favours speed over strict compliance with the HTML specification
  • Recovers from the most common errors, such as HTML4-style unclosed <li> and <td> tags
  • Intended for processing large HTML documents quickly
  • Builds a simplified DOM rather than a full standards-compliant one
  • Some malformed HTML may still be parsed incorrectly
  • Enabling lowerCaseTagName or comment adds parsing overhead
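Recovering from common errors such as omitted `</li>` or `</td>` usually comes down to implicit closing: when certain tags open, the element currently open is closed first. The sketch below models that idea in isolation by rewriting HTML so every element is closed explicitly. It is an assumption-labeled illustration, not node-html-parser's code: `closedByOpening` is a hypothetical, deliberately small rule table, and the model does not handle void elements, comments, or a bare `<` inside text.

```typescript
// Hypothetical rule table: while the key element is open, opening any of the
// listed tags closes the key element first
const closedByOpening: Record<string, string[]> = {
  li: ["li"],
  p: ["p", "div"],
  td: ["td", "th"],
  th: ["td", "th"],
};

// Rewrites HTML so that every element is closed explicitly (simplified model)
function closeImplicitTags(html: string): string {
  const stack: string[] = [];
  let out = "";
  // Either a tag (group 1 = leading slash, group 2 = name) or a run of text
  const tokenPattern = /<(\/?)([a-zA-Z][\w-]*)[^>]*>|[^<]+/g;
  for (const [token, slash, rawName] of html.matchAll(tokenPattern)) {
    if (rawName === undefined) {
      out += token; // text run, copied as-is
      continue;
    }
    const name = rawName.toLowerCase();
    if (slash) {
      const idx = stack.lastIndexOf(name);
      if (idx === -1) continue; // stray closing tag: dropped
      // Close the matching element and anything still open inside it
      while (stack.length > idx) out += `</${stack.pop()}>`;
    } else {
      const top = stack[stack.length - 1];
      if (top !== undefined && closedByOpening[top]?.includes(name)) {
        out += `</${stack.pop()}>`; // implicit close before the new tag opens
      }
      stack.push(name);
      out += token;
    }
  }
  // Close anything left open at the end of the input
  while (stack.length > 0) out += `</${stack.pop()}>`;
  return out;
}
```

For instance, `closeImplicitTags('<ul><li>a<li>b</ul>')` returns `<ul><li>a</li><li>b</li></ul>`.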

Static Properties

The parse function exposes additional utilities as static properties:

// Shape of parse, including the classes and utilities attached to it
declare const parse: {
  (data: string, options?: Partial<Options>): HTMLElement;
  HTMLElement: typeof HTMLElement;
  Node: typeof Node;
  TextNode: typeof TextNode;
  CommentNode: typeof CommentNode;
  NodeType: typeof NodeType;
  valid: typeof valid;
  parse: typeof baseParse; // Internal parsing function
};

Usage:

import { parse } from "node-html-parser";

// Create elements directly
const element = new parse.HTMLElement('div', {}, '');

// Check node types
const root = parse('<p>Hello</p>');
const node = root.childNodes[0];
if (node.nodeType === parse.NodeType.ELEMENT_NODE) {
  // Handle element node
}

// Use validation
const isValid = parse.valid('<div>content</div>');
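`parse.NodeType` is mainly useful when walking a tree. The following is a minimal recursive walker that counts nodes by kind. It is self-contained, so it declares its own `NodeType` enum with values assumed to follow the DOM standard's numbering (1, 3, 8) and a hypothetical minimal `TreeNode` interface; with the real library you would compare against `parse.NodeType` and walk `childNodes` on the nodes `parse` returns.

```typescript
// Assumed values, following the DOM standard's nodeType constants
enum NodeType {
  ELEMENT_NODE = 1,
  TEXT_NODE = 3,
  COMMENT_NODE = 8,
}

// Hypothetical minimal node shape used for this sketch
interface TreeNode {
  nodeType: NodeType;
  childNodes: TreeNode[];
}

// Depth-first walk that tallies each node kind by its enum name
function countByType(
  node: TreeNode,
  counts: Record<string, number> = {},
): Record<string, number> {
  const key = NodeType[node.nodeType]; // reverse mapping, e.g. "ELEMENT_NODE"
  counts[key] = (counts[key] ?? 0) + 1;
  for (const child of node.childNodes) countByType(child, counts);
  return counts;
}
```

Passing the shared `counts` object down the recursion keeps a single tally for the whole tree without merging per-subtree results.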
