# HTML Parsing

The core parsing API converts HTML strings into DOM trees that can be queried and modified, with configuration options to adapt the parser to different scenarios.

## Capabilities

### Parse Function

Main parsing function that converts HTML strings to DOM trees with optional configuration.

```typescript { .api }
/**
 * Parses HTML and returns a root element containing the DOM tree
 * @param data - HTML string to parse
 * @param options - Optional parsing configuration
 * @returns Root HTMLElement containing parsed DOM
 */
function parse(data: string, options?: Partial<Options>): HTMLElement;
```

**Usage Examples:**

```typescript
import { parse } from "node-html-parser";

// Basic parsing
const root = parse('<div>Hello World</div>');

// With parsing options (distinct name: `root` is already declared above)
const rootWithOptions = parse('<div>Content</div>', {
  lowerCaseTagName: true,
  comment: true,
  voidTag: {
    closingSlash: true
  }
});

// Parse complex HTML
const html = `
<html>
  <head><title>Test</title></head>
  <body>
    <div class="container">
      <p>Paragraph content</p>
      <!-- This is a comment -->
    </div>
  </body>
</html>`;

// `comment: true` keeps the comment node in the resulting tree
const document = parse(html, { comment: true });
```

### HTML Validation

Checks whether an HTML string parses cleanly, meaning that every opened element has been closed by the end of the input. The number of top-level elements does not affect the result.

```typescript { .api }
/**
 * Validates HTML by parsing it and checking that no elements are left unclosed
 * @param data - HTML string to validate
 * @param options - Optional parsing configuration
 * @returns true if parsing leaves no unclosed elements, false otherwise
 */
function valid(data: string, options?: Partial<Options>): boolean;
```

**Usage Examples:**

```typescript
import { valid } from "node-html-parser";

// Valid HTML (all elements closed)
console.log(valid('<div><p>Content</p></div>')); // true

// Multiple top-level elements are still valid
console.log(valid('<div>First</div><div>Second</div>')); // true

// Invalid HTML (unclosed elements)
console.log(valid('<div><span>Unclosed')); // false

// With options
console.log(valid('<DIV>Content</DIV>', { lowerCaseTagName: true })); // true
```
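
Validation and parsing are often combined so that a tree is only built for input that passes the check. The following is a minimal sketch of that pattern, assuming only the `parse` and `valid` signatures documented here. `parseIfValid`, `SketchOptions`, `ValidFn`, and `ParseFn` are hypothetical names, and the two functions are passed in as arguments so the snippet runs without the library installed.

```typescript
// Hypothetical helper: build a tree only when validation passes.
// The option type is a reduced stand-in for the documented Options interface.
type SketchOptions = { lowerCaseTagName?: boolean; comment?: boolean };
type ValidFn = (data: string, options?: SketchOptions) => boolean;
type ParseFn<T> = (data: string, options?: SketchOptions) => T;

function parseIfValid<T>(
  html: string,
  validFn: ValidFn,
  parseFn: ParseFn<T>,
  options?: SketchOptions
): T | null {
  // Pass the same options to both calls so validation and parsing agree
  if (!validFn(html, options)) {
    return null;
  }
  return parseFn(html, options);
}
```

With the library imported, the call would look like `parseIfValid(html, valid, parse)`. Returning `null` rather than throwing is a design choice for this sketch; callers that prefer exceptions can wrap it accordingly.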

### Parsing Options

Comprehensive configuration interface for customizing parsing behavior.

```typescript { .api }
interface Options {
  /** Convert all tag names to lowercase */
  lowerCaseTagName?: boolean;

  /** Parse and include comment nodes in the DOM tree */
  comment?: boolean;

  /** Fix nested anchor tags by properly closing them */
  fixNestedATags?: boolean;

  /** Parse tags that don't have closing tags */
  parseNoneClosedTags?: boolean;

  /** Define which elements should preserve their text content as-is */
  blockTextElements?: { [tag: string]: boolean };

  /** Void element configuration */
  voidTag?: {
    /** Custom list of void elements (defaults to HTML5 void elements) */
    tags?: string[];
    /** Add closing slash to void elements (e.g., <br/>) */
    closingSlash?: boolean;
  };
}
```

**Default Values:**

```typescript
// Default blockTextElements (when not specified)
{
  script: true,
  noscript: true,
  style: true,
  pre: true
}

// Default void elements (HTML5 standard)
['area', 'base', 'br', 'col', 'embed', 'hr', 'img', 'input', 'link', 'meta', 'param', 'source', 'track', 'wbr']
```
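
To make the "when not specified" wording concrete, here is a rough sketch of how such defaults could be resolved. It is not taken from the library source: `resolveOptions` and `SketchParseOptions` are hypothetical names, and the sketch assumes that a user-supplied value replaces the corresponding default outright instead of being merged into it.

```typescript
// Hypothetical default resolution; illustrative only, not the library's code.
interface SketchParseOptions {
  blockTextElements?: { [tag: string]: boolean };
  voidTag?: { tags?: string[]; closingSlash?: boolean };
}

const DEFAULT_BLOCK_TEXT_ELEMENTS: { [tag: string]: boolean } = {
  script: true,
  noscript: true,
  style: true,
  pre: true,
};

const DEFAULT_VOID_TAGS: string[] = [
  'area', 'base', 'br', 'col', 'embed', 'hr', 'img',
  'input', 'link', 'meta', 'param', 'source', 'track', 'wbr',
];

function resolveOptions(options: SketchParseOptions = {}) {
  return {
    // ASSUMPTION: a supplied map replaces the defaults; it does not extend them
    blockTextElements: options.blockTextElements ?? DEFAULT_BLOCK_TEXT_ELEMENTS,
    // ASSUMPTION: a supplied tag list likewise replaces the HTML5 list
    voidTags: options.voidTag?.tags ?? DEFAULT_VOID_TAGS,
  };
}

// Under the replacement assumption, `pre` is no longer a block text element here
const resolved = resolveOptions({ blockTextElements: { code: true } });
console.log(Object.keys(resolved.blockTextElements));
```

If the library does behave this way, any default that should stay active has to be listed again alongside the custom entries.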

**Configuration Examples:**

```typescript
import { parse } from "node-html-parser";

// Preserve original case
const root = parse('<DIV>Content</DIV>', {
  lowerCaseTagName: false
});

// Include comments in parsing
const withComments = parse('<!-- comment --><div>content</div>', {
  comment: true
});

// Custom void elements with closing slashes
const customVoid = parse('<custom-void></custom-void>', {
  voidTag: {
    tags: ['custom-void'],
    closingSlash: true
  }
});

// Custom block text elements
const customBlocks = parse('<code>preserved content</code>', {
  blockTextElements: {
    code: true,
    pre: true
  }
});
```

## Performance Considerations

- Designed for speed over strict HTML specification compliance
- Handles most common malformed HTML patterns
- Optimized for processing large HTML files
- Uses simplified DOM structure for better performance
- May not parse all edge cases of malformed HTML correctly
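
Whether the speed trade-off pays off depends on the input, so it can be worth measuring on representative documents. The following is a rough timing sketch, not a rigorous benchmark. `buildLargeHtml` and `averageParseMs` are hypothetical helpers, and the parser is passed in as a function so the snippet is self-contained.

```typescript
// Hypothetical helper: generate a synthetic document with many list items.
function buildLargeHtml(rows: number): string {
  const items: string[] = [];
  for (let i = 0; i < rows; i++) {
    items.push(`<li class="row">Item ${i}</li>`);
  }
  return `<ul>${items.join("")}</ul>`;
}

// Hypothetical helper: average wall-clock time per call, in milliseconds.
// Date.now() has millisecond resolution, so use enough runs to get a stable figure.
function averageParseMs(
  parseFn: (html: string) => unknown,
  html: string,
  runs: number
): number {
  const start = Date.now();
  for (let i = 0; i < runs; i++) {
    parseFn(html);
  }
  return (Date.now() - start) / runs;
}
```

With the library imported, a call such as `averageParseMs(parse, buildLargeHtml(50000), 10)` would give a first impression of throughput on list-heavy markup. Real pages with deeper nesting and more attributes may behave differently.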

## Static Properties

The parse function exposes additional utilities as static properties:

```typescript { .api }
// Access to internal classes and utilities
parse.HTMLElement: typeof HTMLElement;
parse.Node: typeof Node;
parse.TextNode: typeof TextNode;
parse.CommentNode: typeof CommentNode;
parse.NodeType: typeof NodeType;
parse.valid: typeof valid;
parse.parse: typeof baseParse; // Internal parsing function
```

**Usage:**

```typescript
import { parse } from "node-html-parser";

// Create elements directly
const element = new parse.HTMLElement('div', {}, '');

// Check node types (take the first child of a parsed tree as the node to inspect)
const node = parse('<div>content</div>').childNodes[0];
if (node.nodeType === parse.NodeType.ELEMENT_NODE) {
  // Handle element node
}

// Use validation
const isValid = parse.valid('<div>content</div>');
```
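
Node type checks like the one above usually appear inside a tree walk. The sketch below shows one way to count nodes by type. It uses local stand-ins (`SketchNodeType`, `SketchNode`, `countByType`, all hypothetical) so it runs without the library, and it assumes the `NodeType` values follow the standard DOM `nodeType` constants and that every node exposes a `childNodes` array.

```typescript
// Local stand-in for parse.NodeType.
// ASSUMPTION: values match the standard DOM nodeType constants.
enum SketchNodeType {
  ELEMENT_NODE = 1,
  TEXT_NODE = 3,
  COMMENT_NODE = 8,
}

// Minimal node shape needed for the walk
interface SketchNode {
  nodeType: SketchNodeType;
  childNodes: SketchNode[];
}

interface TypeCounts {
  element: number;
  text: number;
  comment: number;
}

// Iterative depth-first walk that tallies each node by its type
function countByType(root: SketchNode): TypeCounts {
  const counts: TypeCounts = { element: 0, text: 0, comment: 0 };
  const pending: SketchNode[] = [root];
  while (pending.length > 0) {
    const current = pending.pop()!;
    switch (current.nodeType) {
      case SketchNodeType.ELEMENT_NODE:
        counts.element++;
        break;
      case SketchNodeType.TEXT_NODE:
        counts.text++;
        break;
      case SketchNodeType.COMMENT_NODE:
        counts.comment++;
        break;
    }
    pending.push(...current.childNodes);
  }
  return counts;
}
```

An explicit stack is used instead of recursion so that deeply nested documents do not risk exhausting the call stack.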