tessl/npm-node-html-parser

A very fast HTML parser, generating a simplified DOM, with basic element query support.

  • Workspace: tessl
  • Visibility: Public
  • Describes: npmpkg:npm/node-html-parser@7.0.x

To install, run

npx @tessl/cli install tessl/npm-node-html-parser@7.0.0

Node HTML Parser

Node HTML Parser is a fast HTML parser that generates a simplified DOM tree and supports element queries through CSS selectors. It is designed for high throughput on large HTML files and provides an API for parsing HTML strings, querying elements, manipulating the DOM, and serializing back to HTML.

Package Information

  • Package Name: node-html-parser
  • Package Type: npm
  • Language: TypeScript/JavaScript
  • Installation: npm install node-html-parser

Core Imports

import { parse } from "node-html-parser";

To import the node classes and helpers as well:

import { parse, HTMLElement, TextNode, CommentNode, NodeType, valid } from "node-html-parser";

For CommonJS:

const { parse } = require("node-html-parser");

Basic Usage

import { parse } from "node-html-parser";

// Parse HTML string
const root = parse('<ul id="list"><li>Hello World</li></ul>');

// Query elements
const listItem = root.querySelector('li');
console.log(listItem.text); // "Hello World"

// Access attributes
const list = root.querySelector('#list');
console.log(list.id); // "list"

// Manipulate DOM: append a new item to the list
const newLi = parse('<li>New Item</li>');
list.appendChild(newLi);

// Convert back to HTML
console.log(root.toString());

Architecture

Node HTML Parser is built around several key components:

  • Parse Function: Main entry point for converting HTML strings to DOM trees
  • DOM Classes: HTMLElement, TextNode, and CommentNode classes providing web-standard APIs
  • Query Engine: CSS selector support via css-select integration for powerful element queries
  • Performance Focus: Optimized for speed over strict HTML specification compliance
  • Simplified DOM: Lightweight DOM structure for efficient processing of large HTML files

Capabilities

HTML Parsing

Core HTML parsing functionality that converts HTML strings into manipulable DOM trees with configurable parsing options.

function parse(data: string, options?: Partial<Options>): HTMLElement;


DOM Elements

Complete HTMLElement implementation with DOM manipulation methods, property access, and web-standard APIs for content modification.

class HTMLElement extends Node {
  // Properties
  tagName: string;
  id: string;
  classList: DOMTokenList;
  innerHTML: string;
  textContent: string;
  
  // Methods
  appendChild<T extends Node>(node: T): T;
  querySelector(selector: string): HTMLElement | null;
  getAttribute(key: string): string | undefined;
  setAttribute(key: string, value: string): HTMLElement;
}
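
As a rough illustration of how `querySelectorAll` and `classList` from the class outline above can be combined, the sketch below tags every matching element with a class. The interfaces `ClassListLike`, `StyledElement`, and `SelectableRoot` and the helper `addClassToMatches` are hypothetical names introduced here. They describe only the members the helper touches, on the assumption that the library's `HTMLElement` and `DOMTokenList` provide `add` and `contains` with these shapes.

```typescript
// Minimal structural types (assumptions for this sketch, not library exports).
interface ClassListLike {
  add(className: string): void;
  contains(className: string): boolean;
}

interface StyledElement {
  classList: ClassListLike;
}

interface SelectableRoot {
  querySelectorAll(selector: string): StyledElement[];
}

// Add `className` to every element matching `selector`.
// Returns how many elements did not already carry the class.
function addClassToMatches(
  root: SelectableRoot,
  selector: string,
  className: string
): number {
  let changed = 0;
  for (const el of root.querySelectorAll(selector)) {
    if (!el.classList.contains(className)) {
      el.classList.add(className);
      changed += 1;
    }
  }
  return changed;
}
```

With the package installed, passing the result of `parse(html)` as `root` should satisfy `SelectableRoot`, since the helper relies only on members listed in the outline above.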


Node Types

Base Node classes and node type system including TextNode and CommentNode for complete DOM tree representation.

abstract class Node {
  childNodes: Node[];
  parentNode: HTMLElement | null;
  textContent: string;
  remove(): Node;
}

class TextNode extends Node {
  text: string;
  rawText: string;
  isWhitespace: boolean;
}

class CommentNode extends Node {
  rawText: string;
}

enum NodeType {
  ELEMENT_NODE = 1,
  TEXT_NODE = 3,
  COMMENT_NODE = 8
}
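
To show how the `NodeType` values can drive a tree walk, here is a minimal sketch that tallies nodes by type. `NodeLike`, `NodeCounts`, and `countNodeTypes` are hypothetical names for this example. The sketch assumes each node exposes a numeric `nodeType` alongside `childNodes`; the outline above lists only `childNodes`, so treat `nodeType` as an assumption about the library's `Node` class.

```typescript
// Minimal structural view of a node (an assumption for this sketch).
interface NodeLike {
  nodeType: number;
  childNodes: NodeLike[];
}

// Numeric values copied from the NodeType enum above.
const ELEMENT_NODE = 1;
const TEXT_NODE = 3;
const COMMENT_NODE = 8;

interface NodeCounts {
  elements: number;
  texts: number;
  comments: number;
}

// Walk the tree iteratively and tally nodes by type.
// The node passed in as `root` is counted as well.
function countNodeTypes(root: NodeLike): NodeCounts {
  const counts: NodeCounts = { elements: 0, texts: 0, comments: 0 };
  const stack: NodeLike[] = [root];
  while (stack.length > 0) {
    const node = stack.pop() as NodeLike;
    if (node.nodeType === ELEMENT_NODE) counts.elements += 1;
    else if (node.nodeType === TEXT_NODE) counts.texts += 1;
    else if (node.nodeType === COMMENT_NODE) counts.comments += 1;
    stack.push(...node.childNodes);
  }
  return counts;
}
```

An explicit stack is used instead of recursion so that deeply nested documents do not risk exhausting the call stack.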


Query & Selection

Powerful element querying capabilities using CSS selectors, tag names, IDs, and DOM traversal methods.

// CSS selector queries
querySelector(selector: string): HTMLElement | null;
querySelectorAll(selector: string): HTMLElement[];

// Element queries  
getElementsByTagName(tagName: string): HTMLElement[];
getElementById(id: string): HTMLElement | null;
closest(selector: string): HTMLElement | null;
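
A small sketch of a typical query, collecting link targets with `querySelectorAll` and `getAttribute`. The interfaces `ElementLike` and `QueryRoot` and the helper `collectHrefs` are hypothetical. They mirror only the two signatures used, so the example does not depend on the package itself.

```typescript
// Structural types covering just the members used below (assumptions).
interface ElementLike {
  getAttribute(key: string): string | undefined;
}

interface QueryRoot {
  querySelectorAll(selector: string): ElementLike[];
}

// Return the non-empty href values of all anchors under `root`,
// in the order the query returns them.
function collectHrefs(root: QueryRoot): string[] {
  const hrefs: string[] = [];
  for (const anchor of root.querySelectorAll("a[href]")) {
    const href = anchor.getAttribute("href");
    // An attribute selector also matches href="", so filter empties here.
    if (href !== undefined && href !== "") {
      hrefs.push(href);
    }
  }
  return hrefs;
}
```

Given the signatures above, the element returned by `parse` should be usable as `root` directly.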


Attributes & Properties

Comprehensive attribute manipulation and property access with support for both raw and decoded attribute values.

// Attribute methods
getAttribute(key: string): string | undefined;
setAttribute(key: string, value: string): HTMLElement;
removeAttribute(key: string): HTMLElement;
hasAttribute(key: string): boolean;

// Property access
get attributes(): Record<string, string>;
get rawAttributes(): RawAttributes;
get classList(): DOMTokenList;
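
The attribute methods above compose naturally into small helpers. The sketch below sets an attribute only when it is missing. `AttrElement` and `setDefaultAttribute` are hypothetical names, and the interface copies just the three method shapes it needs. The return type of `setAttribute` is left as `unknown` because the helper ignores it.

```typescript
// Structural type for the attribute methods used (an assumption for this sketch).
interface AttrElement {
  getAttribute(key: string): string | undefined;
  setAttribute(key: string, value: string): unknown;
  hasAttribute(key: string): boolean;
}

// Ensure `key` is present on `el`; returns the value in effect afterwards.
function setDefaultAttribute(
  el: AttrElement,
  key: string,
  fallback: string
): string {
  if (el.hasAttribute(key)) {
    // getAttribute is typed as possibly undefined, so fall back defensively.
    return el.getAttribute(key) ?? fallback;
  }
  el.setAttribute(key, fallback);
  return fallback;
}
```

For example, `setDefaultAttribute(img, "loading", "lazy")` would leave an existing `loading` value untouched and add `loading="lazy"` otherwise.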


Types

interface Options {
  lowerCaseTagName?: boolean;
  comment?: boolean;
  fixNestedATags?: boolean;
  parseNoneClosedTags?: boolean;
  blockTextElements?: { [tag: string]: boolean };
  voidTag?: {
    tags?: string[];
    closingSlash?: boolean;
  };
}

interface Attributes {
  [key: string]: string;
}

type InsertPosition = 'beforebegin' | 'afterbegin' | 'beforeend' | 'afterend';
type NodeInsertable = Node | string;
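
As a sketch of how the `Options` interface might be filled in, the block below builds an options object. `ParseOptions` is a local copy of the interface above so the example type-checks on its own, and `scrapeOptions` is a hypothetical name. The comments reflect my reading of the option names and the package README, so treat them as assumptions and not as a specification.

```typescript
// Local copy of the Options shape shown above (for a standalone sketch).
interface ParseOptions {
  lowerCaseTagName?: boolean;
  comment?: boolean;
  fixNestedATags?: boolean;
  parseNoneClosedTags?: boolean;
  blockTextElements?: { [tag: string]: boolean };
  voidTag?: {
    tags?: string[];
    closingSlash?: boolean;
  };
}

// Hypothetical configuration for a scraping-style workload.
const scrapeOptions: ParseOptions = {
  lowerCaseTagName: true, // normalize tag names to lower case
  comment: true, // keep comment nodes in the tree
  blockTextElements: {
    // keep the content of these tags as text when parsing
    script: true,
    style: true,
    pre: true,
  },
  voidTag: {
    tags: ["br", "img", "hr"], // tags treated as void elements
    closingSlash: true, // serialize void tags with a closing slash
  },
};
```

Because `parse` accepts `Partial<Options>` as its second argument, an object of this shape could be passed as `parse(html, scrapeOptions)`. Every field is optional, so any subset of it is also valid.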