# Node HTML Parser

Node HTML Parser is a fast HTML parser that generates a simplified DOM tree with element query support. Designed for high performance on large HTML files, it provides an API for parsing HTML strings, querying elements with CSS selectors, manipulating the DOM tree, and serializing it back to HTML.

## Package Information

- **Package Name**: node-html-parser
- **Package Type**: npm
- **Language**: TypeScript/JavaScript
- **Installation**: `npm install node-html-parser`

## Core Imports

```typescript
import { parse } from "node-html-parser";
```

To import the node classes and helpers as well:

```typescript
import { parse, HTMLElement, TextNode, CommentNode, NodeType, valid } from "node-html-parser";
```

For CommonJS:

```javascript
const { parse } = require("node-html-parser");
```

## Basic Usage

```typescript
import { parse } from "node-html-parser";

// Parse an HTML string into a root element
const root = parse('<ul id="list"><li>Hello World</li></ul>');

// Query elements (querySelector returns null when nothing matches)
const listItem = root.querySelector('li');
console.log(listItem?.text); // "Hello World"

// Access attributes
const list = root.querySelector('#list');
console.log(list?.id); // "list"

// Manipulate the DOM: append the new item to the list rather than to the root
const newLi = parse('<li>New Item</li>');
list?.appendChild(newLi);

// Convert back to HTML
console.log(root.toString());
```

## Architecture

Node HTML Parser is built around several key components:

- **Parse Function**: Main entry point for converting HTML strings to DOM trees
- **DOM Classes**: HTMLElement, TextNode, and CommentNode classes providing web-standard APIs
- **Query Engine**: CSS selector support via css-select integration for powerful element queries
- **Performance Focus**: Optimized for speed over strict HTML specification compliance
- **Simplified DOM**: Lightweight DOM structure for efficient processing of large HTML files
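
As a rough illustration of how these components fit together, the sketch below parses a fragment, queries it with a CSS selector, and serializes the tree again. The markup and variable names are invented for the example.

```typescript
import { parse } from "node-html-parser";

// Illustrative input; any HTML fragment works.
const source = '<section><h1>Title</h1><p class="lead">Intro</p></section>';

// Parse function: HTML string -> root element of a simplified DOM tree.
const tree = parse(source);

// Query engine: CSS selectors are resolved against the parsed tree.
const lead = tree.querySelector("section > p.lead");

// DOM classes: a match is an HTMLElement with familiar properties.
const leadText = lead ? lead.text : "";

// Serialization: the tree converts back to an HTML string.
const serialized = tree.toString();

console.log(leadText);
console.log(serialized);
```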

## Capabilities

### HTML Parsing

Converts HTML strings into manipulable DOM trees, with configurable parsing options.

```typescript { .api }
function parse(data: string, options?: Partial<Options>): HTMLElement;
```
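
A minimal sketch of how an option changes the parsed tree, using the `comment` flag from the `Options` interface. The helper `countComments` and the sample markup are hypothetical names introduced only for this example.

```typescript
import { parse, NodeType } from "node-html-parser";

const html = '<div id="app"><!-- banner --><p>Hello</p></div>';

// Count comment nodes directly under #app for a given parse result.
function countComments(root: ReturnType<typeof parse>): number {
  const app = root.querySelector("#app");
  if (!app) return 0;
  return app.childNodes.filter((n) => n.nodeType === NodeType.COMMENT_NODE).length;
}

// Without options, comments are not kept in the tree.
const withoutComments = countComments(parse(html));

// With comment: true, each comment is kept as a CommentNode.
const withComments = countComments(parse(html, { comment: true }));

console.log(withoutComments, withComments);
```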

[HTML Parsing](./parsing.md)

### DOM Elements

HTMLElement implementation with DOM manipulation methods, property access, and web-standard APIs for modifying content.

```typescript { .api }
class HTMLElement extends Node {
  // Properties
  tagName: string;
  id: string;
  classList: DOMTokenList;
  innerHTML: string;
  textContent: string;

  // Methods
  appendChild<T extends Node>(node: T): T;
  querySelector(selector: string): HTMLElement | null;
  getAttribute(key: string): string | undefined;
  setAttribute(key: string, value: string): HTMLElement;
}
```
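
The members listed above can be combined roughly as follows. This is a sketch on made-up markup, not a full tour of the class; the `#card` element and its contents are assumptions for the example.

```typescript
import { parse } from "node-html-parser";

const doc = parse('<div id="card" class="box"><span>old</span></div>');
const card = doc.querySelector("#card");

if (card) {
  // classList edits are reflected in the element's class attribute.
  card.classList.add("active");

  // setAttribute adds or overwrites an attribute.
  card.setAttribute("data-state", "open");

  // Assigning textContent replaces the element's children with text.
  const label = card.querySelector("span");
  if (label) label.textContent = "new";

  // appendChild attaches a parsed fragment as the last child.
  card.appendChild(parse("<em>!</em>"));
}

const cardHtml = doc.toString();
console.log(cardHtml);
```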

[DOM Elements](./dom-elements.md)

### Node Types

Base Node classes and node type system including TextNode and CommentNode for complete DOM tree representation.

```typescript { .api }
abstract class Node {
  childNodes: Node[];
  parentNode: HTMLElement | null;
  textContent: string;
  remove(): Node;
}

class TextNode extends Node {
  text: string;
  rawText: string;
  isWhitespace: boolean;
}

class CommentNode extends Node {
  rawText: string;
}

enum NodeType {
  ELEMENT_NODE = 1,
  TEXT_NODE = 3,
  COMMENT_NODE = 8
}
```
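
One way to tell the node classes apart while walking a tree is to switch on `nodeType`. The sketch below assumes comments are enabled with `comment: true`; `labelFor` is a hypothetical helper written for this example.

```typescript
import { parse, NodeType } from "node-html-parser";

// Map a node to a readable label based on its nodeType.
function labelFor(node: { nodeType: NodeType }): string {
  switch (node.nodeType) {
    case NodeType.ELEMENT_NODE:
      return "element";
    case NodeType.TEXT_NODE:
      return "text";
    case NodeType.COMMENT_NODE:
      return "comment";
    default:
      return "unknown";
  }
}

const fragment = parse("<p>Hi<!-- aside --><b>there</b></p>", { comment: true });
const paragraph = fragment.querySelector("p");

// Label each direct child of the paragraph in document order.
const kinds = paragraph ? paragraph.childNodes.map(labelFor) : [];
console.log(kinds);
```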

[Node Types](./node-types.md)

### Query & Selection

Element queries using CSS selectors, tag names, IDs, and DOM traversal methods.

```typescript { .api }
// CSS selector queries
querySelector(selector: string): HTMLElement | null;
querySelectorAll(selector: string): HTMLElement[];

// Element queries
getElementsByTagName(tagName: string): HTMLElement[];
getElementById(id: string): HTMLElement | null;
closest(selector: string): HTMLElement | null;
```
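
A short sketch of the query methods side by side, run against a small menu invented for the example:

```typescript
import { parse } from "node-html-parser";

const page = parse(
  '<main id="page"><ul class="menu">' +
    '<li class="item active">A</li><li class="item">B</li>' +
    "</ul></main>"
);

// CSS selector queries: all matches, or the first match only.
const items = page.querySelectorAll("li.item");
const active = page.querySelector("li.active");

// Lookups by tag name and by id.
const byTag = page.getElementsByTagName("li");
const main = page.getElementById("page");

// closest walks up from the element to the nearest matching ancestor.
const menu = active ? active.closest("ul") : null;

console.log(items.length, byTag.length, active?.text);
```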

[Query & Selection](./query-selection.md)

### Attributes & Properties

Attribute manipulation and property access, with support for both raw and decoded attribute values.

```typescript { .api }
// Attribute methods
getAttribute(key: string): string | undefined;
setAttribute(key: string, value: string): HTMLElement;
removeAttribute(key: string): HTMLElement;
hasAttribute(key: string): boolean;

// Property access
get attributes(): Record<string, string>;
get rawAttributes(): RawAttributes;
get classList(): DOMTokenList;
```
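
To make the raw versus decoded distinction concrete, the sketch below reads the same attribute both ways and then edits the attribute set. The link markup is an assumption chosen so that the `href` contains an HTML entity.

```typescript
import { parse } from "node-html-parser";

const snippet = parse('<a href="/docs?a=1&amp;b=2" title="Docs">Docs</a>');
const link = snippet.querySelector("a");

// Decoded value: entities in the attribute are resolved.
const decodedHref = link?.getAttribute("href");

// Raw value: the attribute text as written in the source.
const rawHref = link?.rawAttributes["href"];

// Remove one attribute and check for it before and after.
const hadTitle = link?.hasAttribute("title");
link?.removeAttribute("title");
const hasTitleNow = link?.hasAttribute("title");

// Add another attribute and list the resulting names.
link?.setAttribute("target", "_blank");
const attributeNames = Object.keys(link?.attributes ?? {});

console.log(decodedHref, rawHref, attributeNames);
```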

[Attributes & Properties](./attributes-properties.md)

## Types

```typescript { .api }
interface Options {
  lowerCaseTagName?: boolean;
  comment?: boolean;
  fixNestedATags?: boolean;
  parseNoneClosedTags?: boolean;
  blockTextElements?: { [tag: string]: boolean };
  voidTag?: {
    tags?: string[];
    closingSlash?: boolean;
  };
}

interface Attributes {
  [key: string]: string;
}

type InsertPosition = 'beforebegin' | 'afterbegin' | 'beforeend' | 'afterend';
type NodeInsertable = Node | string;
```
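
A hedged sketch of how these types might appear in calling code: a typed options object passed to `parse`, and position strings passed to `insertAdjacentHTML`. The local `Position` alias is hypothetical and only mirrors the `InsertPosition` union above; the example also assumes a recent release in which all four positions are implemented.

```typescript
import { parse } from "node-html-parser";
import type { Options } from "node-html-parser";

// Local alias mirroring the InsertPosition union (hypothetical name).
type Position = "beforebegin" | "afterbegin" | "beforeend" | "afterend";

// A typed options object; every field is optional.
const options: Partial<Options> = {
  lowerCaseTagName: true,
  comment: false,
};

const listRoot = parse('<ul><li id="b">B</li></ul>', options);
const middle = listRoot.querySelector("#b");

// Insert siblings before and after the existing item.
const before: Position = "beforebegin";
const after: Position = "afterend";
middle?.insertAdjacentHTML(before, "<li>A</li>");
middle?.insertAdjacentHTML(after, "<li>C</li>");

// Read the item texts back in document order.
const order = listRoot
  .querySelectorAll("li")
  .map((li) => li.text)
  .join("");

console.log(order);
```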