A very fast HTML parser, generating a simplified DOM, with basic element query support.
npx @tessl/cli install tessl/npm-node-html-parser@7.0.0
Node HTML Parser is a very fast HTML parser that generates a simplified DOM tree with element query support. Designed for high performance when processing large HTML files, it provides an API for parsing HTML strings, querying elements with CSS selectors, manipulating DOM structures, and serializing back to HTML.
npm install node-html-parser
import { parse } from "node-html-parser";
For named imports:
import { parse, HTMLElement, TextNode, CommentNode, NodeType, valid } from "node-html-parser";
For CommonJS:
const { parse } = require("node-html-parser");
import { parse } from "node-html-parser";
// Parse HTML string
const root = parse('<ul id="list"><li>Hello World</li></ul>');
// Query elements
const listItem = root.querySelector('li');
console.log(listItem.text); // "Hello World"
// Look up the list by id and read an attribute
const list = root.querySelector('#list');
console.log(list.id); // "list"
// Manipulate DOM: append the new item to the list rather than to the root wrapper
const newLi = parse('<li>New Item</li>');
list.appendChild(newLi);
// Convert back to HTML
console.log(root.toString());
Node HTML Parser is built around several key components:
Core HTML parsing functionality that converts HTML strings into manipulable DOM trees with configurable parsing options.
function parse(data: string, options?: Partial<Options>): HTMLElement;
Complete HTMLElement implementation with DOM manipulation methods, property access, and web-standard APIs for content modification.
class HTMLElement extends Node {
// Properties
tagName: string;
id: string;
classList: DOMTokenList;
innerHTML: string;
textContent: string;
// Methods
appendChild<T extends Node>(node: T): T;
querySelector(selector: string): HTMLElement | null;
getAttribute(key: string): string | undefined;
setAttribute(key: string, value: string): HTMLElement;
}
Base Node classes and node type system including TextNode and CommentNode for complete DOM tree representation.
abstract class Node {
childNodes: Node[];
parentNode: HTMLElement | null;
textContent: string;
remove(): Node;
}
class TextNode extends Node {
text: string;
rawText: string;
isWhitespace: boolean;
}
class CommentNode extends Node {
rawText: string;
}
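To show how the node classes above can be told apart when walking a tree, here is a minimal sketch. The helper `describeChildren` and the sample markup are hypothetical, introduced only for illustration. It assumes comments are kept only when the `comment` option is enabled.

```typescript
import { parse, HTMLElement, TextNode, CommentNode } from "node-html-parser";

// Hypothetical helper: label each direct child of `parent` by its node class.
function describeChildren(parent: HTMLElement): string[] {
  return parent.childNodes.map((node) => {
    if (node instanceof HTMLElement) {
      return `element:${node.tagName.toLowerCase()}`;
    }
    if (node instanceof CommentNode) {
      return `comment:${node.rawText}`;
    }
    if (node instanceof TextNode) {
      return node.isWhitespace ? "whitespace" : `text:${node.rawText}`;
    }
    return "unknown";
  });
}

// `comment: true` is needed here so the comment node is retained.
const sample = parse("<div>Hello<!--note--><b>World</b></div>", { comment: true });
const container = sample.querySelector("div");
const kinds = container ? describeChildren(container) : [];
console.log(kinds);
```

Checking the class with `instanceof` lets TypeScript narrow the type, so `rawText` and `isWhitespace` are only accessed on nodes that declare them.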
enum NodeType {
ELEMENT_NODE = 1,
TEXT_NODE = 3,
COMMENT_NODE = 8
}
Powerful element querying capabilities using CSS selectors, tag names, IDs, and DOM traversal methods.
// CSS selector queries
querySelector(selector: string): HTMLElement | null;
querySelectorAll(selector: string): HTMLElement[];
// Element queries
getElementsByTagName(tagName: string): HTMLElement[];
getElementById(id: string): HTMLElement | null;
closest(selector: string): HTMLElement | null;
Comprehensive attribute manipulation and property access with support for both raw and decoded attribute values.
// Attribute methods
getAttribute(key: string): string | undefined;
setAttribute(key: string, value: string): HTMLElement;
removeAttribute(key: string): HTMLElement;
hasAttribute(key: string): boolean;
// Property access
get attributes(): Record<string, string>;
get rawAttributes(): RawAttributes;
get classList(): DOMTokenList;
interface Options {
lowerCaseTagName?: boolean;
comment?: boolean;
fixNestedATags?: boolean;
parseNoneClosedTags?: boolean;
blockTextElements?: { [tag: string]: boolean };
voidTag?: {
tags?: string[];
closingSlash?: boolean;
};
}
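As a rough illustration of how these options are passed to `parse`, the sketch below parses the same markup twice, once with defaults and once with a few options set. The sample markup and variable names are hypothetical, and the exact serialized output may differ between library versions, so none is shown.

```typescript
import { parse } from "node-html-parser";

const markup = "<DIV><!-- banner --><p>Hi</p><br></DIV>";

// Default options: comments are discarded during parsing.
const withDefaults = parse(markup);

// Custom options: keep comments, lower-case tag names, and request a
// closing slash on void tags when serializing.
const withOptions = parse(markup, {
  comment: true,
  lowerCaseTagName: true,
  voidTag: { closingSlash: true },
});

const defaultHtml = withDefaults.toString();
const customHtml = withOptions.toString();
console.log(defaultHtml);
console.log(customHtml);
```

Because `parse` takes `Partial<Options>`, only the fields that differ from the defaults need to be supplied.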
interface Attributes {
[key: string]: string;
}
type InsertPosition = 'beforebegin' | 'afterbegin' | 'beforeend' | 'afterend';
type NodeInsertable = Node | string;
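The `InsertPosition` values and the attribute methods listed earlier can be combined as in the following sketch. It assumes `HTMLElement` exposes an `insertAdjacentHTML(where, html)` method that accepts an `InsertPosition`. That method is not listed in this document, so treat it as an assumption about the library. The sample markup and variable names are hypothetical.

```typescript
import { parse } from "node-html-parser";

const doc = parse('<ul id="list" class="menu"><li>One</li></ul>');
const menu = doc.querySelector("#list");
if (!menu) {
  throw new Error("expected #list in the sample markup");
}

// Insert markup relative to the element using InsertPosition values
// (assumes insertAdjacentHTML is available on HTMLElement).
menu.insertAdjacentHTML("beforeend", "<li>Two</li>"); // becomes the last child
menu.insertAdjacentHTML("afterbegin", "<li>Zero</li>"); // becomes the first child

// Read, write, and remove attributes.
const itemTexts = menu.querySelectorAll("li").map((li) => li.text);
menu.setAttribute("data-count", String(itemTexts.length));
const hadMenuClass = menu.classList.contains("menu");
menu.removeAttribute("class");

console.log(itemTexts);
console.log(menu.attributes);
```

The explicit null check before use reflects that `querySelector` is typed as returning `HTMLElement | null`.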