# HTML Parsing

Core HTML parsing functionality that converts HTML strings into manipulable DOM trees with comprehensive configuration options for different parsing scenarios.

## Capabilities

### Parse Function

Main parsing function that converts HTML strings to DOM trees with optional configuration.

```typescript { .api }
/**
 * Parses HTML and returns a root element containing the DOM tree
 * @param data - HTML string to parse
 * @param options - Optional parsing configuration
 * @returns Root HTMLElement containing parsed DOM
 */
function parse(data: string, options?: Partial<Options>): HTMLElement;
```

**Usage Examples:**

```typescript
import { parse } from "node-html-parser";

// Basic parsing
const root = parse('<div>Hello World</div>');

// With parsing options (distinct name avoids redeclaring `root`)
const rootWithOptions = parse('<div>Content</div>', {
  lowerCaseTagName: true,
  comment: true,
  voidTag: {
    closingSlash: true
  }
});

// Parse complex HTML
const html = `
<html>
  <head><title>Test</title></head>
  <body>
    <div class="container">
      <p>Paragraph content</p>
      <!-- This is a comment -->
    </div>
  </body>
</html>`;

// comment: true keeps the comment node in the resulting tree
const doc = parse(html, { comment: true });
```

### HTML Validation

Checks whether an HTML string is well formed, meaning that every element it opens is also closed by the end of the input.

```typescript { .api }
/**
 * Validates HTML structure by checking that no elements are left unclosed
 * @param data - HTML string to validate
 * @param options - Optional parsing configuration
 * @returns true if every opened element is closed, false otherwise
 */
function valid(data: string, options?: Partial<Options>): boolean;
```

**Usage Examples:**

```typescript
import { valid } from "node-html-parser";

// Valid HTML (every opened element is closed)
console.log(valid('<div><p>Content</p></div>')); // true

// Invalid HTML (<div> is never closed)
console.log(valid('<div><span>Content</span>')); // false

// With options
console.log(valid('<DIV>Content</DIV>', { lowerCaseTagName: true })); // true
```

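The idea of checking that nothing is left open can be illustrated with a toy stack-based checker. This is only a sketch of the general technique, not the library's implementation: `isBalanced`, `VOID_TAGS`, and the regular expression are hypothetical, and the sketch ignores comments, attribute values containing `>`, and the recovery rules a real parser applies.

```typescript
// Hypothetical helper for illustration; not part of node-html-parser.
// Void elements never get a closing tag, so they are not pushed on the stack.
const VOID_TAGS = new Set<string>([
  "area", "base", "br", "col", "embed", "hr", "img",
  "input", "link", "meta", "param", "source", "track", "wbr",
]);

function isBalanced(html: string): boolean {
  const stack: string[] = [];
  // Captures: optional leading slash, tag name, optional trailing slash.
  const tagPattern = /<(\/?)([a-zA-Z][a-zA-Z0-9-]*)[^>]*?(\/?)>/g;
  let match: RegExpExecArray | null;

  while ((match = tagPattern.exec(html)) !== null) {
    const isClosing = match[1] === "/";
    const name = match[2].toLowerCase();
    const isSelfClosing = match[3] === "/";

    if (isClosing) {
      // A closing tag must match the most recently opened element.
      if (stack.pop() !== name) return false;
    } else if (!isSelfClosing && !VOID_TAGS.has(name)) {
      stack.push(name);
    }
  }

  // Anything still on the stack was opened but never closed.
  return stack.length === 0;
}
```

Unlike this strict sketch, a forgiving parser may recover from some mismatches instead of rejecting them, so the two can disagree on malformed input.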
### Parsing Options

Comprehensive configuration interface for customizing parsing behavior.

```typescript { .api }
interface Options {
  /** Convert all tag names to lowercase */
  lowerCaseTagName?: boolean;

  /** Parse and include comment nodes in the DOM tree */
  comment?: boolean;

  /** Fix nested anchor tags by properly closing them */
  fixNestedATags?: boolean;

  /** Parse tags that don't have closing tags */
  parseNoneClosedTags?: boolean;

  /** Define which elements should preserve their text content as-is */
  blockTextElements?: { [tag: string]: boolean };

  /** Void element configuration */
  voidTag?: {
    /** Custom list of void elements (defaults to HTML5 void elements) */
    tags?: string[];
    /** Add closing slash to void elements (e.g., <br/>) */
    closingSlash?: boolean;
  };
}
```

**Default Values:**

```typescript
// Default blockTextElements (when not specified)
{
  script: true,
  noscript: true,
  style: true,
  pre: true
}

// Default void elements (HTML5 standard)
['area', 'base', 'br', 'col', 'embed', 'hr', 'img', 'input', 'link', 'meta', 'param', 'source', 'track', 'wbr']
```

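Because `parse` accepts `Partial<Options>`, callers supply only the fields they want to change. The sketch below shows one way omitted fields could fall back to the defaults listed above. `resolveOptions`, `ResolvedOptions`, `UserOptions`, and `DEFAULTS` are hypothetical names, the `false` defaults for the boolean flags are assumptions made for illustration, and the library's actual merging logic may differ.

```typescript
// Hypothetical types and helper for illustration; not the library's internals.
interface VoidTagOptions {
  tags: string[];
  closingSlash: boolean;
}

interface ResolvedOptions {
  lowerCaseTagName: boolean;
  comment: boolean;
  blockTextElements: { [tag: string]: boolean };
  voidTag: VoidTagOptions;
}

type UserOptions = Partial<Omit<ResolvedOptions, "voidTag">> & {
  voidTag?: Partial<VoidTagOptions>;
};

const DEFAULTS: ResolvedOptions = {
  lowerCaseTagName: false, // assumed default
  comment: false, // assumed default
  blockTextElements: { script: true, noscript: true, style: true, pre: true },
  voidTag: {
    tags: ["area", "base", "br", "col", "embed", "hr", "img",
      "input", "link", "meta", "param", "source", "track", "wbr"],
    closingSlash: false, // assumed default
  },
};

function resolveOptions(user: UserOptions = {}): ResolvedOptions {
  return {
    ...DEFAULTS,
    ...user,
    // Assumption: a user-supplied map replaces the default map entirely.
    blockTextElements: user.blockTextElements ?? DEFAULTS.blockTextElements,
    // Assumption: nested voidTag fields are merged one by one.
    voidTag: { ...DEFAULTS.voidTag, ...user.voidTag },
  };
}
```

Under the replacement assumption, a custom `blockTextElements` map has to repeat any default entry it wants to keep, such as `pre: true`.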
**Configuration Examples:**

```typescript
import { parse } from "node-html-parser";

// Preserve original case
const root = parse('<DIV>Content</DIV>', {
  lowerCaseTagName: false
});

// Include comments in parsing
const withComments = parse('<!-- comment --><div>content</div>', {
  comment: true
});

// Custom void elements with closing slashes
const customVoid = parse('<custom-void></custom-void>', {
  voidTag: {
    tags: ['custom-void'],
    closingSlash: true
  }
});

// Custom block text elements
const customBlocks = parse('<code>preserved content</code>', {
  blockTextElements: {
    code: true,
    pre: true
  }
});
```

## Performance Considerations

- Designed for speed over strict HTML specification compliance
- Handles most common malformed HTML patterns
- Optimized for processing large HTML files
- Uses simplified DOM structure for better performance
- May not parse all edge cases of malformed HTML correctly

## Static Properties

The parse function exposes additional utilities as static properties:

```typescript { .api }
// Access to internal classes and utilities
parse.HTMLElement: typeof HTMLElement;
parse.Node: typeof Node;
parse.TextNode: typeof TextNode;
parse.CommentNode: typeof CommentNode;
parse.NodeType: typeof NodeType;
parse.valid: typeof valid;
parse.parse: typeof baseParse; // Internal parsing function
```

**Usage:**

```typescript
import { parse } from "node-html-parser";

// Create elements directly
const element = new parse.HTMLElement('div', {}, '');

// Check node types (take the first child of a parsed fragment as the node)
const node = parse('<div>content</div>').firstChild;
if (node && node.nodeType === parse.NodeType.ELEMENT_NODE) {
  // Handle element node
}

// Use validation
const isValid = parse.valid('<div>content</div>');
```
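
Node type checks like the one above are typically used while walking a tree. The following is a minimal sketch, assuming nodes expose `nodeType` and `childNodes` and that the `NodeType` values follow the standard DOM constants (1 for elements, 3 for text, 8 for comments). `NodeKind`, `NodeLike`, `NodeCounts`, and `countNodes` are hypothetical names, and the local enum stands in for `parse.NodeType` so the snippet runs without the library.

```typescript
// Hypothetical stand-in for parse.NodeType, using the DOM's numeric constants.
enum NodeKind {
  ELEMENT_NODE = 1,
  TEXT_NODE = 3,
  COMMENT_NODE = 8,
}

// Minimal structural shape assumed for parsed nodes.
interface NodeLike {
  nodeType: number;
  childNodes: NodeLike[];
}

interface NodeCounts {
  elements: number;
  text: number;
  comments: number;
}

// Recursively tallies nodes by type, depth first.
function countNodes(
  node: NodeLike,
  counts: NodeCounts = { elements: 0, text: 0, comments: 0 },
): NodeCounts {
  switch (node.nodeType) {
    case NodeKind.ELEMENT_NODE:
      counts.elements++;
      break;
    case NodeKind.TEXT_NODE:
      counts.text++;
      break;
    case NodeKind.COMMENT_NODE:
      counts.comments++;
      break;
  }
  for (const child of node.childNodes) {
    countNodes(child, counts);
  }
  return counts;
}
```

Comment nodes would only show up in such a tally when the tree was parsed with `comment: true`.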