# Rule System

Turndown's rule system gives fine-grained control over how HTML elements are converted to Markdown. Each rule pairs a filter, which selects elements, with a replacement function, which produces the Markdown output. Built-in and custom rules share this pattern.

## Capabilities

### Rule Definition

A rule defines how matching HTML elements are converted to Markdown. It consists of a filter and a replacement function.

```javascript { .api }
/**
 * Rule object structure
 */
interface Rule {
  /** Selector that determines which HTML elements this rule applies to */
  filter: string | string[] | Function;
  /** Function that converts the matched element to Markdown */
  replacement: Function;
  /** Optional function that appends content after processing (used internally) */
  append?: Function;
}

/**
 * Replacement function signature
 * @param {string} content - The inner content of the element
 * @param {HTMLElement} node - The DOM node being converted
 * @param {TurndownOptions} options - TurndownService options
 * @returns {string} Markdown representation
 */
type ReplacementFunction = (content: string, node: HTMLElement, options: TurndownOptions) => string;
```
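
Because a rule is a plain object, its replacement function can be called directly, which may help when reasoning about the signature above. The following is a minimal sketch rather than anything taken from Turndown itself: `underlineRule` and `stubNode` are hypothetical names, and the stub stands in for a real DOM node. The only Turndown-specific detail it relies on is the `emDelimiter` option.

```javascript
// Hypothetical rule: render <u> elements as emphasized text.
const underlineRule = {
  filter: 'u',
  replacement: function (content, node, options) {
    // Read the emphasis delimiter from the options object; fall back to '_'
    // so the function also works when called with empty options.
    const delimiter = (options && options.emDelimiter) || '_';
    return delimiter + content + delimiter;
  }
};

// Stub standing in for a DOM node (assumption: this replacement never reads it).
const stubNode = { nodeName: 'U' };

const defaultOutput = underlineRule.replacement('note', stubNode, {});
const starOutput = underlineRule.replacement('note', stubNode, { emDelimiter: '*' });

console.log(defaultOutput); // _note_
console.log(starOutput);    // *note*
```

Keeping the replacement free of side effects, as here, makes it straightforward to check in isolation before registering the rule.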

### Adding Custom Rules

Add custom conversion rules to handle specific HTML elements or patterns.

```javascript { .api }
/**
 * Add a custom conversion rule
 * @param {string} key - Unique identifier for the rule
 * @param {Rule} rule - Rule object with filter and replacement
 * @returns {TurndownService} TurndownService instance for chaining
 */
addRule(key, rule)
```

**Usage Examples:**

```javascript
const TurndownService = require('turndown');

const turndownService = new TurndownService();

// Simple element conversion
turndownService.addRule('strikethrough', {
  filter: ['del', 's', 'strike'],
  replacement: function(content) {
    return '~~' + content + '~~';
  }
});

// Conditional rule with function filter
turndownService.addRule('customLink', {
  filter: function(node, options) {
    // Match only links that carry both an href and a data-custom attribute
    return Boolean(
      node.nodeName === 'A' &&
      node.getAttribute('href') &&
      node.getAttribute('data-custom')
    );
  },
  replacement: function(content, node) {
    const href = node.getAttribute('href');
    const custom = node.getAttribute('data-custom');
    return `[${content}](${href} "${custom}")`;
  }
});

// Complex content processing
turndownService.addRule('highlight', {
  filter: 'mark',
  replacement: function(content, node, options) {
    // highlightStyle is a custom option, not a built-in one; this assumes
    // it was passed to the TurndownService constructor
    if (options.highlightStyle === 'html') {
      return '<mark>' + content + '</mark>';
    }
    return '==' + content + '==';
  }
});
```

### Filter Types

A rule's filter selects the elements the rule handles. A filter can be a tag name, an array of tag names, or a function that returns `true` for matching nodes.

```javascript { .api }
/**
 * Filter types for selecting HTML elements
 */
type RuleFilter = string | string[] | FilterFunction;

/**
 * Function filter signature
 */
type FilterFunction = (node: HTMLElement, options: TurndownOptions) => boolean;

// Examples:
filter: 'p'                    // String filter - matches <p> elements
filter: ['em', 'i']            // Array filter - matches <em> or <i> elements

// Function filter - custom logic for matching elements
filter: function(node, options) {
  return node.nodeName === 'DIV' && node.className.includes('special');
}
```
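
The three filter forms can be thought of as one matching routine. The sketch below is a simplified model written for illustration, not Turndown's source: `matchesFilter`, `isSpecialDiv`, and the stub nodes are hypothetical, and the stubs provide only the properties the filters read.

```javascript
// Simplified model of filter matching (hypothetical helper).
function matchesFilter(filter, node, options) {
  const tag = node.nodeName.toLowerCase();
  if (typeof filter === 'string') return filter === tag;
  if (Array.isArray(filter)) return filter.includes(tag);
  if (typeof filter === 'function') return Boolean(filter(node, options));
  throw new TypeError('`filter` must be a string, an array, or a function');
}

// Stub nodes standing in for DOM elements.
const em = { nodeName: 'EM', className: '' };
const specialDiv = { nodeName: 'DIV', className: 'note special' };

// Function filter from the example above.
const isSpecialDiv = function (node) {
  return node.nodeName === 'DIV' && node.className.includes('special');
};

console.log(matchesFilter('p', em));                  // false
console.log(matchesFilter(['em', 'i'], em));          // true
console.log(matchesFilter(isSpecialDiv, specialDiv)); // true
console.log(matchesFilter(isSpecialDiv, em));         // false
```

In this model, string and array filters compare against the lowercase tag name, while function filters read `nodeName` directly, which the examples in this document treat as uppercase.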

### Built-in Rules

Turndown ships with built-in rules for standard HTML elements. Each one uses the same filter and replacement structure as a custom rule.

```javascript { .api }
/**
 * Built-in CommonMark rules (partial list)
 */
const BuiltInRules = {
  paragraph: { filter: 'p' },
  lineBreak: { filter: 'br' },
  heading: { filter: ['h1', 'h2', 'h3', 'h4', 'h5', 'h6'] },
  blockquote: { filter: 'blockquote' },
  list: { filter: ['ul', 'ol'] },
  listItem: { filter: 'li' },
  indentedCodeBlock: { filter: function(node, options) { /* ... */ } },
  fencedCodeBlock: { filter: function(node, options) { /* ... */ } },
  horizontalRule: { filter: 'hr' },
  inlineLink: { filter: function(node, options) { /* ... */ } },
  referenceLink: { filter: function(node, options) { /* ... */ } },
  emphasis: { filter: ['em', 'i'] },
  strong: { filter: ['strong', 'b'] },
  code: { filter: function(node) { /* ... */ } },
  image: { filter: 'img' }
};
```

### Rule Precedence

Turndown checks rules in the following order and uses the first one that matches:

1. **Blank rule** - Handles blank/empty elements
2. **Added rules** - Custom rules added via `addRule()`
3. **CommonMark rules** - Built-in HTML to Markdown conversion rules
4. **Keep rules** - Elements marked to keep as HTML via `keep()`
5. **Remove rules** - Elements marked for removal via `remove()`
6. **Default rule** - Fallback for unmatched elements
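
The order above can be modeled as a first-match lookup. This is a sketch under simplifying assumptions, not Turndown's implementation: `findRule`, `byTag`, `pick`, the `rules` object, and every rule name in it are hypothetical, and nodes are reduced to a `nodeName` and an `isBlank` flag.

```javascript
// Simplified model of the lookup order (hypothetical names throughout).
function findRule(node, rules) {
  const match = (list) => list.find((rule) => rule.filter(node));
  if (node.isBlank) return rules.blankRule;
  return (
    match(rules.added) ||
    match(rules.commonmark) ||
    match(rules.keep) ||
    match(rules.remove) ||
    rules.defaultRule
  );
}

const byTag = (tag) => (node) => node.nodeName === tag;

const rules = {
  blankRule: { name: 'blank' },
  added: [{ name: 'customEm', filter: byTag('EM') }],
  commonmark: [
    { name: 'emphasis', filter: byTag('EM') },
    { name: 'paragraph', filter: byTag('P') }
  ],
  keep: [{ name: 'keepSub', filter: byTag('SUB') }],
  remove: [{ name: 'removeScript', filter: byTag('SCRIPT') }],
  defaultRule: { name: 'default' }
};

const pick = (nodeName, isBlank = false) =>
  findRule({ nodeName, isBlank }, rules).name;

console.log(pick('EM'));       // customEm (added rule wins over built-in emphasis)
console.log(pick('P'));        // paragraph
console.log(pick('SUB'));      // keepSub
console.log(pick('SCRIPT'));   // removeScript
console.log(pick('SPAN'));     // default
console.log(pick('EM', true)); // blank
```

One consequence of this order is that an element already matched by an added or CommonMark rule never reaches the keep or remove rules.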

### Special Rules

Turndown uses a few internal rules for edge cases: blank elements, elements kept as HTML, and elements that no other rule matches.

```javascript { .api }
/**
 * Special rule types used internally
 */
interface SpecialRules {
  /** Handles elements that contain only whitespace */
  blankRule: {
    replacement: (content: string, node: HTMLElement) => string;
  };

  /** Handles elements marked to keep as HTML */
  keepReplacement: (content: string, node: HTMLElement) => string;

  /** Handles unrecognized elements */
  defaultRule: {
    replacement: (content: string, node: HTMLElement) => string;
  };
}
```

### Keep and Remove Rules

Use `keep()` to leave elements as HTML in the output and `remove()` to drop elements and their content entirely.

```javascript { .api }
/**
 * Keep elements as HTML in the output
 * @param {string|string[]|Function} filter - Elements to keep
 * @returns {TurndownService} Instance for chaining
 */
keep(filter)

/**
 * Remove elements entirely from output
 * @param {string|string[]|Function} filter - Elements to remove
 * @returns {TurndownService} Instance for chaining
 */
remove(filter)
```

**Usage Examples:**

```javascript
const turndownService = new TurndownService();

// Keep specific elements as HTML
turndownService.keep(['del', 'ins', 'sub', 'sup']);
const html1 = '<p>H<sub>2</sub>O and E=mc<sup>2</sup></p>';
const result1 = turndownService.turndown(html1);
// Result: "H<sub>2</sub>O and E=mc<sup>2</sup>"

// Remove unwanted elements
turndownService.remove(['script', 'style', 'noscript']);
const html2 = '<p>Content</p><script>alert("bad")</script><style>body{}</style>';
const result2 = turndownService.turndown(html2);
// Result: "Content"

// Function-based keep/remove
turndownService.keep(function(node) {
  return node.nodeName === 'SPAN' && node.className.includes('preserve');
});

turndownService.remove(function(node) {
  return node.hasAttribute('data-remove');
});
```

### Advanced Rule Patterns

The following patterns show more involved rules for specialized conversions.

**Content Transformation:**

```javascript
turndownService.addRule('codeWithLanguage', {
  filter: function(node) {
    // Match <pre><code class="..."> blocks only
    return Boolean(
      node.nodeName === 'PRE' &&
      node.firstChild &&
      node.firstChild.nodeName === 'CODE' &&
      node.firstChild.className
    );
  },
  replacement: function(content, node, options) {
    const codeNode = node.firstChild;
    const className = codeNode.getAttribute('class') || '';
    // Extract "js" from a class such as "language-js"; empty if absent
    const language = (className.match(/language-(\S+)/) || [null, ''])[1];
    // Drop one trailing newline to avoid a blank line before the closing fence
    const code = codeNode.textContent.replace(/\n$/, '');

    return '\n\n```' + language + '\n' + code + '\n```\n\n';
  }
});
```

**Attribute Processing:**

```javascript
turndownService.addRule('linkWithTitle', {
  filter: function(node) {
    return (
      node.nodeName === 'A' &&
      node.getAttribute('href') &&
      node.getAttribute('title')
    );
  },
  replacement: function(content, node) {
    const href = node.getAttribute('href');
    const title = node.getAttribute('title').replace(/"/g, '\\"');
    return `[${content}](${href} "${title}")`;
  }
});
```

**Nested Content Handling:**

```javascript
turndownService.addRule('definition', {
  filter: 'dl',
  replacement: function(content, node) {
    // Process definition list with custom formatting
    const items = [];
    const children = Array.from(node.children);

    // Assumes strictly alternating <dt>/<dd> pairs; other layouts are skipped
    for (let i = 0; i < children.length; i += 2) {
      const dt = children[i];
      const dd = children[i + 1];
      if (dt && dd && dt.nodeName === 'DT' && dd.nodeName === 'DD') {
        items.push(`**${dt.textContent}**\n: ${dd.textContent}`);
      }
    }

    return '\n\n' + items.join('\n\n') + '\n\n';
  }
});
```

## Rule Development Guidelines

### Filter Best Practices

- Use string filters for simple tag matching
- Use array filters for multiple tags with identical processing
- Use function filters for complex conditions involving attributes, content, or context
- In function filters, check that child nodes and attributes exist before reading them

### Replacement Function Guidelines

- Handle empty content gracefully
- Respect the options parameter for customizable behavior
- Use the node parameter to access attributes and context
- Return an empty string to omit an element from the output
- Apply proper spacing for block vs inline elements
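
The guidelines on empty content and spacing can be illustrated with two small replacement functions. These are sketches with hypothetical names (`asideReplacement`, `kbdReplacement`) and an assumed output style; they are called directly with stub arguments instead of through a converter.

```javascript
// Hypothetical replacement for a block-level <aside> element.
function asideReplacement(content, node, options) {
  const text = content.trim();
  // Empty content: return an empty string so nothing is emitted.
  if (text === '') return '';
  // Block element: prefix each line with '> ' and surround with blank lines.
  const quoted = text.split('\n').map((line) => '> ' + line).join('\n');
  return '\n\n' + quoted + '\n\n';
}

// Hypothetical replacement for an inline <kbd> element: no surrounding newlines.
function kbdReplacement(content) {
  return content ? '`' + content + '`' : '';
}

const blockOutput = asideReplacement('Line one\nLine two', { nodeName: 'ASIDE' }, {});
const emptyOutput = asideReplacement('   ', { nodeName: 'ASIDE' }, {});
const inlineOutput = kbdReplacement('Ctrl');

console.log(JSON.stringify(blockOutput));  // "\n\n> Line one\n> Line two\n\n"
console.log(JSON.stringify(emptyOutput));  // ""
console.log(JSON.stringify(inlineOutput)); // "`Ctrl`"
```

The block version pads its output with blank lines so it stays separated from neighboring content, while the inline version adds no whitespace of its own.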

### Performance Considerations

- Function filters may be evaluated for every element in the document, so keep them cheap
- Cache expensive computations outside the replacement function when possible
- Use built-in utility functions from Turndown where available
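
As a rough illustration of the first two points, lookup tables and patterns can be created once at module scope and reused by both the filter and the replacement. All names below (`LANGUAGE_PATTERN`, `INLINE_TAGS`, `inlineWithLanguage`, `languageReplacement`) are hypothetical, and the nodes are stubs.

```javascript
// Built once, at module load, and shared by the filter and the replacement.
const LANGUAGE_PATTERN = /language-(\S+)/;
const INLINE_TAGS = new Set(['SPAN', 'ABBR', 'CITE']);

// Hypothetical function filter: the cheap tag check runs first, so the
// class test only happens for candidate elements.
function inlineWithLanguage(node) {
  return INLINE_TAGS.has(node.nodeName) && LANGUAGE_PATTERN.test(node.className || '');
}

// Hypothetical replacement reusing the shared pattern.
function languageReplacement(content, node) {
  const match = LANGUAGE_PATTERN.exec(node.className || '');
  return match ? '`' + content + '` (' + match[1] + ')' : content;
}

// Stub nodes standing in for DOM elements.
const tagged = { nodeName: 'SPAN', className: 'token language-js' };
const plain = { nodeName: 'DIV', className: 'language-js' };

console.log(inlineWithLanguage(tagged));           // true
console.log(inlineWithLanguage(plain));            // false
console.log(languageReplacement('x = 1', tagged)); // `x = 1` (js)
```

The pattern deliberately omits the `g` flag, so repeated `test` and `exec` calls do not carry state between nodes.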

### Testing Custom Rules

```javascript
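const TurndownService = require('turndown');

// Placeholder rule under test (hypothetical): bolds the content of div.special
const myRule = {
  filter: function(node) {
    return node.nodeName === 'DIV' && node.className.includes('special');
  },
  replacement: function(content) {
    return '**' + content + '**';
  }
};

// Test rule with various inputs
const turndownService = new TurndownService();
turndownService.addRule('testRule', myRule);

const testCases = [
  '<div class="special">Content</div>',
  '<div>Regular content</div>',
  '<div class="special"></div>',
];

testCases.forEach(html => {
  console.log('Input:', html);
  console.log('Output:', turndownService.turndown(html));
});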
```