
# Rule System

Turndown's rule system provides fine-grained control over how HTML elements are converted to Markdown. The system uses a flexible filter-and-replacement pattern that allows both built-in and custom conversion logic.

## Capabilities

### Rule Definition

Rules define how specific HTML elements should be converted to Markdown using a filter and replacement function.

```javascript { .api }
/**
 * Rule object structure
 */
interface Rule {
  /** Selector that determines which HTML elements this rule applies to */
  filter: string | string[] | Function;
  /** Function that converts the matched element to Markdown */
  replacement: Function;
  /** Optional function that appends content after processing (used internally) */
  append?: Function;
}

/**
 * Replacement function signature
 * @param {string} content - The inner content of the element
 * @param {HTMLElement} node - The DOM node being converted
 * @param {TurndownOptions} options - TurndownService options
 * @returns {string} Markdown representation
 */
type ReplacementFunction = (content: string, node: HTMLElement, options: TurndownOptions) => string;
```

### Adding Custom Rules

Add custom conversion rules to handle specific HTML elements or patterns.

```javascript { .api }
/**
 * Add a custom conversion rule
 * @param {string} key - Unique identifier for the rule
 * @param {Rule} rule - Rule object with filter and replacement
 * @returns {TurndownService} TurndownService instance for chaining
 */
addRule(key, rule)
```

**Usage Examples:**

```javascript
const TurndownService = require('turndown');

const turndownService = new TurndownService();

// Simple element conversion
turndownService.addRule('strikethrough', {
  filter: ['del', 's', 'strike'],
  replacement: function(content) {
    return '~~' + content + '~~';
  }
});

// Conditional rule with function filter
turndownService.addRule('customLink', {
  filter: function(node, options) {
    return (
      node.nodeName === 'A' &&
      node.getAttribute('href') &&
      node.getAttribute('data-custom')
    );
  },
  replacement: function(content, node) {
    const href = node.getAttribute('href');
    const custom = node.getAttribute('data-custom');
    return `[${content}](${href} "${custom}")`;
  }
});

// Complex content processing
turndownService.addRule('highlight', {
  filter: 'mark',
  replacement: function(content, node, options) {
    // Use options to customize output.
    // `highlightStyle` is a custom (non-built-in) option; it is undefined
    // unless supplied, e.g. new TurndownService({ highlightStyle: 'html' })
    if (options.highlightStyle === 'html') {
      return '<mark>' + content + '</mark>';
    }
    return '==' + content + '==';
  }
});
```

### Filter Types

Rules use filters to select which HTML elements they should handle.

```javascript { .api }
/**
 * Filter types for selecting HTML elements
 */
type RuleFilter = string | string[] | FilterFunction;

/**
 * Function filter signature
 */
type FilterFunction = (node: HTMLElement, options: TurndownOptions) => boolean;

// Examples:
filter: 'p' // String filter - matches <p> elements
filter: ['em', 'i'] // Array filter - matches <em> or <i> elements

// Function filter - custom logic for matching elements
filter: function(node, options) {
  return node.nodeName === 'DIV' && node.className.includes('special');
}
```
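
How a filter might be evaluated can be sketched in plain JavaScript. The `matchesFilter` helper and the stub node objects below are hypothetical and only model the three filter shapes described above; they are not Turndown's internal code. The sketch assumes string and array filters are compared against the lowercased `nodeName`.

```javascript
// Hypothetical helper: decides whether a rule's filter matches a node.
// Simplified model for illustration, not Turndown's actual implementation.
function matchesFilter(filter, node, options) {
  const tag = node.nodeName.toLowerCase();
  if (typeof filter === 'string') return filter === tag;
  if (Array.isArray(filter)) return filter.includes(tag);
  if (typeof filter === 'function') return Boolean(filter(node, options));
  throw new TypeError('filter must be a string, an array, or a function');
}

// Stub nodes standing in for real DOM elements.
const paragraph = { nodeName: 'P', className: '' };
const specialDiv = { nodeName: 'DIV', className: 'note special' };

// Function filter that matches the class as a whole word.
const isSpecialDiv = (node) =>
  node.nodeName === 'DIV' && node.className.split(/\s+/).includes('special');

console.log(matchesFilter('p', paragraph, {}));            // true
console.log(matchesFilter(['em', 'i'], paragraph, {}));    // false
console.log(matchesFilter(isSpecialDiv, specialDiv, {}));  // true
```

Splitting `className` on whitespace avoids the substring match that `className.includes('special')` would give for a class such as `specialized`.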

### Built-in Rules

Turndown includes comprehensive built-in rules for standard HTML elements.

```javascript { .api }
/**
 * Built-in CommonMark rules (partial list)
 */
const BuiltInRules = {
  paragraph: { filter: 'p' },
  lineBreak: { filter: 'br' },
  heading: { filter: ['h1', 'h2', 'h3', 'h4', 'h5', 'h6'] },
  blockquote: { filter: 'blockquote' },
  list: { filter: ['ul', 'ol'] },
  listItem: { filter: 'li' },
  indentedCodeBlock: { filter: function(node, options) { /* ... */ } },
  fencedCodeBlock: { filter: function(node, options) { /* ... */ } },
  horizontalRule: { filter: 'hr' },
  inlineLink: { filter: function(node, options) { /* ... */ } },
  referenceLink: { filter: function(node, options) { /* ... */ } },
  emphasis: { filter: ['em', 'i'] },
  strong: { filter: ['strong', 'b'] },
  code: { filter: function(node) { /* ... */ } },
  image: { filter: 'img' }
};
```

### Rule Precedence

Turndown checks rules in the following order of precedence and uses the first one whose filter matches:

1. **Blank rule** - Handles blank/empty elements
2. **Added rules** - Custom rules added via `addRule()`
3. **CommonMark rules** - Built-in HTML to Markdown conversion rules
4. **Keep rules** - Elements marked to keep as HTML via `keep()`
5. **Remove rules** - Elements marked for removal via `remove()`
6. **Default rule** - Fallback for unmatched elements
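
The first-match-wins lookup behind this ordering can be sketched as follows. `findRule`, the stub nodes, and the tiny rule sets are hypothetical simplifications for illustration (the sketch only supports string filters and marks blank nodes with an assumed `isBlank` flag); Turndown's own lookup differs in detail.

```javascript
// Hypothetical, simplified model of rule lookup by precedence.
// Groups are checked in the documented order; the first match wins.
function findRule(node, groups) {
  if (node.isBlank) return groups.blankRule;
  const tag = node.nodeName.toLowerCase();
  const ordered = [groups.added, groups.commonmark, groups.keep, groups.remove];
  for (const rules of ordered) {
    const match = rules.find((rule) => rule.filter === tag);
    if (match) return match;
  }
  return groups.defaultRule;
}

const groups = {
  blankRule: { name: 'blank' },
  added: [{ name: 'customEmphasis', filter: 'em' }],
  commonmark: [{ name: 'emphasis', filter: 'em' }, { name: 'paragraph', filter: 'p' }],
  keep: [{ name: 'keepSub', filter: 'sub' }],
  remove: [{ name: 'removeScript', filter: 'script' }],
  defaultRule: { name: 'default' },
};

console.log(findRule({ nodeName: 'EM' }, groups).name);                // customEmphasis
console.log(findRule({ nodeName: 'P' }, groups).name);                 // paragraph
console.log(findRule({ nodeName: 'SUB' }, groups).name);               // keepSub
console.log(findRule({ nodeName: 'SPAN' }, groups).name);              // default
console.log(findRule({ nodeName: 'P', isBlank: true }, groups).name);  // blank
```

In this model a custom rule for `em` shadows the built-in `emphasis` rule because added rules are consulted first, while a blank element bypasses every other rule.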

### Special Rules

Turndown uses special internal rules for edge cases and element control.

```javascript { .api }
/**
 * Special rule types used internally
 */
interface SpecialRules {
  /** Handles elements that contain only whitespace */
  blankRule: {
    replacement: (content: string, node: HTMLElement) => string;
  };

  /** Handles elements marked to keep as HTML */
  keepReplacement: (content: string, node: HTMLElement) => string;

  /** Handles unrecognized elements */
  defaultRule: {
    replacement: (content: string, node: HTMLElement) => string;
  };
}
```
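
As a rough sketch, the three special replacements could look like the functions below. The stub nodes are illustrative, and the sketch assumes each node carries an `isBlock` flag and an `outerHTML` string; the exact defaults in Turndown may differ.

```javascript
// Illustrative sketches of the three special replacements.
// Assumption: nodes expose `isBlock` (block-level flag) and `outerHTML`.

// Blank elements: keep block separation, drop inline blanks.
function blankReplacement(content, node) {
  return node.isBlock ? '\n\n' : '';
}

// Kept elements: emit the original HTML, padded when block-level.
function keepReplacement(content, node) {
  return node.isBlock ? '\n\n' + node.outerHTML + '\n\n' : node.outerHTML;
}

// Unrecognized elements: pass the converted content through.
function defaultReplacement(content, node) {
  return node.isBlock ? '\n\n' + content + '\n\n' : content;
}

// Stub nodes standing in for real DOM elements.
const sub = { isBlock: false, outerHTML: '<sub>2</sub>' };
const section = { isBlock: true, outerHTML: '<section>Text</section>' };

console.log(JSON.stringify(keepReplacement('2', sub)));            // "<sub>2</sub>"
console.log(JSON.stringify(defaultReplacement('Text', section)));  // "\n\nText\n\n"
console.log(JSON.stringify(blankReplacement('', sub)));            // ""
```

If the defaults do not fit, Turndown's README lists `blankReplacement`, `keepReplacement`, and `defaultReplacement` constructor options that accept functions of this shape.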

### Keep and Remove Rules

Control element processing with keep and remove operations.

```javascript { .api }
/**
 * Keep elements as HTML in the output
 * @param {string|string[]|Function} filter - Elements to keep
 * @returns {TurndownService} Instance for chaining
 */
keep(filter)

/**
 * Remove elements entirely from output
 * @param {string|string[]|Function} filter - Elements to remove
 * @returns {TurndownService} Instance for chaining
 */
remove(filter)
```

**Usage Examples:**

```javascript
const turndownService = new TurndownService();

// Keep specific elements as HTML
turndownService.keep(['del', 'ins', 'sub', 'sup']);
const html1 = '<p>H<sub>2</sub>O and E=mc<sup>2</sup></p>';
const result1 = turndownService.turndown(html1);
// Result: "H<sub>2</sub>O and E=mc<sup>2</sup>"

// Remove unwanted elements
turndownService.remove(['script', 'style', 'noscript']);
const html2 = '<p>Content</p><script>alert("bad")</script><style>body{}</style>';
const result2 = turndownService.turndown(html2);
// Result: "Content"

// Function-based keep/remove
turndownService.keep(function(node) {
  return node.nodeName === 'SPAN' && node.className.includes('preserve');
});

turndownService.remove(function(node) {
  return node.hasAttribute('data-remove');
});
```

### Advanced Rule Patterns

Complex rule implementations for specialized conversion needs.

**Content Transformation:**

```javascript
turndownService.addRule('codeWithLanguage', {
  filter: function(node) {
    return (
      node.nodeName === 'PRE' &&
      node.firstChild &&
      node.firstChild.nodeName === 'CODE' &&
      node.firstChild.className
    );
  },
  replacement: function(content, node, options) {
    const codeNode = node.firstChild;
    const className = codeNode.getAttribute('class') || '';
    // Extract "js" from a class such as "language-js"; empty if absent
    const language = (className.match(/language-(\S+)/) || [null, ''])[1];
    // Drop one trailing newline so the fence does not enclose a blank line
    const code = codeNode.textContent.replace(/\n$/, '');

    return '\n\n```' + language + '\n' + code + '\n```\n\n';
  }
});
```

**Attribute Processing:**

```javascript
turndownService.addRule('linkWithTitle', {
  filter: function(node) {
    return (
      node.nodeName === 'A' &&
      node.getAttribute('href') &&
      node.getAttribute('title')
    );
  },
  replacement: function(content, node) {
    const href = node.getAttribute('href');
    const title = node.getAttribute('title').replace(/"/g, '\\"');
    return `[${content}](${href} "${title}")`;
  }
});
```

**Nested Content Handling:**

```javascript
turndownService.addRule('definition', {
  filter: 'dl',
  replacement: function(content, node) {
    // Process definition list with custom formatting
    const items = [];
    const children = Array.from(node.children);

    // Assumes strictly alternating <dt>/<dd> pairs; other layouts are skipped
    for (let i = 0; i < children.length; i += 2) {
      const dt = children[i];
      const dd = children[i + 1];
      if (dt && dd && dt.nodeName === 'DT' && dd.nodeName === 'DD') {
        items.push(`**${dt.textContent}**\n: ${dd.textContent}`);
      }
    }

    return '\n\n' + items.join('\n\n') + '\n\n';
  }
});
```

## Rule Development Guidelines

### Filter Best Practices

- Use string filters for simple tag matching
- Use array filters for multiple tags with identical processing
- Use function filters for complex conditions involving attributes, content, or context
- Always check node existence and properties in function filters
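
A defensive function filter along these lines might look like the sketch below. `isPdfLink` and the `stubElement` factory are hypothetical names used only for illustration; the stubs stand in for real DOM nodes so the filter can be exercised on its own.

```javascript
// Illustrative function filter: matches <a> elements that link to PDF files.
// It checks each property before using it, so unexpected nodes never throw.
function isPdfLink(node) {
  if (!node || node.nodeName !== 'A') return false;
  if (typeof node.getAttribute !== 'function') return false;
  const href = node.getAttribute('href');
  return typeof href === 'string' && /\.pdf(?:[?#].*)?$/i.test(href);
}

// Stub element factory standing in for real DOM nodes (hypothetical).
function stubElement(nodeName, attributes = {}) {
  return {
    nodeName,
    getAttribute: (name) => (name in attributes ? attributes[name] : null),
  };
}

console.log(isPdfLink(stubElement('A', { href: '/files/report.pdf' })));      // true
console.log(isPdfLink(stubElement('A', { href: '/files/report.PDF?v=2' })));  // true
console.log(isPdfLink(stubElement('A')));                                     // false
console.log(isPdfLink(stubElement('DIV', { href: '/x.pdf' })));               // false
console.log(isPdfLink(null));                                                 // false
```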

### Replacement Function Guidelines

- Handle empty content gracefully
- Respect the options parameter for customizable behavior
- Use the node parameter to access attributes and context
- Return empty string to effectively remove elements
- Apply proper spacing for block vs inline elements
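
These guidelines can be combined in a single replacement function, sketched here for a hypothetical callout element. The `calloutReplacement` name and the `calloutPrefix` option are made up for this example and are not part of Turndown.

```javascript
// Illustrative replacement for a hypothetical <aside class="callout"> rule.
// `calloutPrefix` is a made-up custom option, not a Turndown built-in.
function calloutReplacement(content, node, options) {
  const text = content.trim();
  // Empty content: return '' so nothing is emitted for the element.
  if (!text) return '';
  const prefix = (options && options.calloutPrefix) || 'Note';
  // Block-level output: quote every line and pad with blank lines.
  const quoted = text
    .split('\n')
    .map((line) => (line ? '> ' + line : '>'))
    .join('\n');
  return '\n\n> **' + prefix + ':**\n' + quoted + '\n\n';
}

console.log(JSON.stringify(calloutReplacement('  ', {}, {})));
// ""
console.log(JSON.stringify(calloutReplacement('Save often.', {}, { calloutPrefix: 'Tip' })));
// "\n\n> **Tip:**\n> Save often.\n\n"
```

The leading and trailing blank lines keep the block separated from its neighbours; an inline replacement would return its text without that padding.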

### Performance Considerations

- Function filters are evaluated for every element, so keep them efficient
- Cache expensive computations outside the replacement function when possible
- Use built-in utility functions from Turndown where available
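
One way to apply the first two points is to build lookup structures once at module level and to run the cheapest check first. In the sketch below, `TRACKING_HOSTS`, `HOST_PATTERN`, and `isTrackingImage` are hypothetical names, and the stub nodes stand in for DOM elements.

```javascript
// Illustrative: build lookup structures once, outside the filter,
// instead of on every call. `TRACKING_HOSTS` is a made-up example list.
const TRACKING_HOSTS = new Set(['ads.example.com', 'tracker.example.net']);
const HOST_PATTERN = /^https?:\/\/([^/?#]+)/i; // compiled once

function isTrackingImage(node) {
  // Cheap check first: most elements are not images.
  if (node.nodeName !== 'IMG') return false;
  const match = HOST_PATTERN.exec(node.getAttribute('src') || '');
  return match !== null && TRACKING_HOSTS.has(match[1].toLowerCase());
}

// Stub nodes standing in for DOM elements.
const img = (src) => ({ nodeName: 'IMG', getAttribute: () => src });

console.log(isTrackingImage(img('https://ads.example.com/pixel.gif')));  // true
console.log(isTrackingImage(img('https://cdn.example.org/photo.jpg')));  // false
console.log(isTrackingImage({ nodeName: 'P' }));                         // false
```

A filter of this kind could be passed to `remove()` so that matching images are dropped from the output.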

### Testing Custom Rules

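```javascript
const TurndownService = require('turndown');

// Example rule under test (illustrative); substitute your own rule object
const myRule = {
  filter: function(node) {
    return node.nodeName === 'DIV' && node.className.includes('special');
  },
  replacement: function(content) {
    return content ? '\n\n> ' + content + '\n\n' : '';
  }
};

// Test rule with various inputs
const turndownService = new TurndownService();
turndownService.addRule('testRule', myRule);

const testCases = [
  '<div class="special">Content</div>',
  '<div>Regular content</div>',
  '<div class="special"></div>',
];

testCases.forEach(html => {
  console.log('Input:', html);
  console.log('Output:', turndownService.turndown(html));
});
```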
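
Console output needs manual inspection; assertions make the same checks repeatable. The sketch below exercises a rule's `filter` and `replacement` directly with stub nodes, without running a full conversion. The `specialDivRule` object and the stub nodes are hypothetical and exist only for this example.

```javascript
const { strictEqual } = require('assert');

// Hypothetical rule under test: converts <div class="special"> to a blockquote.
const specialDivRule = {
  filter: (node) =>
    node.nodeName === 'DIV' && node.className.split(/\s+/).includes('special'),
  replacement: (content) =>
    content.trim() ? '\n\n> ' + content.trim() + '\n\n' : '',
};

// Stub nodes standing in for parsed DOM elements.
const special = { nodeName: 'DIV', className: 'special' };
const regular = { nodeName: 'DIV', className: '' };

// The filter should accept only the special div.
strictEqual(specialDivRule.filter(special), true);
strictEqual(specialDivRule.filter(regular), false);

// The replacement should format content and drop empty elements.
strictEqual(specialDivRule.replacement('Content', special), '\n\n> Content\n\n');
strictEqual(specialDivRule.replacement('', special), '');

console.log('all rule checks passed');
```

Checks like these cover the rule's own logic; an end-to-end run through `turndown()` is still worthwhile to confirm how the rule interacts with rule precedence.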