# Document Transforms

Document transforms let you modify the parsed document before it is converted. For example, you can assign heading styles to paragraphs that were never marked up semantically.

**Note**: The API for document transforms should be considered unstable and may change between any versions. If you rely on this behavior, pin to a specific version and test carefully before upgrading.

## transforms.paragraph

Apply a transformation to each paragraph element in the document.

```javascript { .api }
function paragraph(transform: (element: any) => any): (element: any) => any;
```

### Parameters

- `transform`: Function that takes a paragraph element and returns the new (or unchanged) paragraph element

### Returns

A function that can be passed as the `transformDocument` option. It applies `transform` to every paragraph element in the document, including paragraphs nested inside other elements such as tables, and leaves all other elements as they are.

### Usage Example

```javascript
const mammoth = require("mammoth");

function transformParagraph(element) {
    // Treat center-aligned paragraphs with no explicit style as headings
    if (element.alignment === "center" && !element.styleId) {
        return {...element, styleId: "Heading2"};
    }
    return element;
}

const options = {
    transformDocument: mammoth.transforms.paragraph(transformParagraph)
};

mammoth.convertToHtml({path: "document.docx"}, options)
    .then(result => console.log(result.value));
```

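To see what `transforms.paragraph` does with the document tree, here is a minimal standalone sketch of an equivalent wrapper. `transformElementsOfType` is a hypothetical name, not part of Mammoth, and the sample document is a simplified hand-built object; real elements carry more properties. The sketch rebuilds each element's children first and then applies `transform` only to elements of the requested type, which reflects our understanding of the helper's behavior rather than its actual source.

```javascript
// Hypothetical helper (not part of Mammoth): returns a function that walks the
// tree bottom-up and applies `transform` to every element of `elementType`.
function transformElementsOfType(elementType, transform) {
    return function transformElement(element) {
        if (element.children) {
            // Rebuild children first so nested paragraphs (e.g. in tables) are reached
            element = {...element, children: element.children.map(transformElement)};
        }
        return element.type === elementType ? transform(element) : element;
    };
}

function centeredToHeading(paragraph) {
    if (paragraph.alignment === "center" && !paragraph.styleId) {
        return {...paragraph, styleId: "Heading2"};
    }
    return paragraph;
}

// Simplified, hand-built document: one centered paragraph and one left-aligned one
const sampleDocument = {
    type: "document",
    children: [
        {type: "paragraph", alignment: "center", styleId: null, children: []},
        {type: "paragraph", alignment: "left", styleId: null, children: []}
    ]
};

const transformed = transformElementsOfType("paragraph", centeredToHeading)(sampleDocument);
console.log(transformed.children.map(p => p.styleId)); // [ 'Heading2', null ]
```

Because every level is rebuilt with object spread, the input tree is left unmodified and the transformed copy is returned.
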
## transforms.run

Apply a transformation to each run element in the document. A run is a stretch of text within a paragraph that shares the same character formatting.

```javascript { .api }
function run(transform: (element: any) => any): (element: any) => any;
```

### Parameters

- `transform`: Function that takes a run element and returns the new (or unchanged) run element

### Returns

A function that can be passed as the `transformDocument` option. It applies `transform` to every run element in the document and leaves all other elements as they are.

### Usage Example

```javascript
const mammoth = require("mammoth");

function transformRun(element) {
    // Give runs set in Courier New the "Code" style.
    // `font` holds the font name as a string, not an object.
    if (element.font === "Courier New") {
        return {...element, styleId: "Code", styleName: "Code"};
    }
    return element;
}

const options = {
    transformDocument: mammoth.transforms.run(transformRun),
    // Map the new run style to an HTML element
    styleMap: ["r[style-name='Code'] => code"]
};
```

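Font names in real documents vary in case and typeface, so matching one exact name can be brittle. The sketch below is a pure function you could pass to `mammoth.transforms.run`; it checks a run's `font` against a small list of monospace fonts, ignoring case. It assumes, as Mammoth's own README examples do, that `font` holds the font name as a string. The font list, the `"Code"` style names, and the helper names are illustrative choices, not part of Mammoth.

```javascript
// Illustrative list of monospace fonts; extend it to match your documents
const MONOSPACE_FONTS = ["consolas", "courier", "courier new", "menlo", "monaco"];

// Assumes `run.font` is the font name as a string (it may be missing)
function isMonospaceRun(run) {
    return typeof run.font === "string" &&
        MONOSPACE_FONTS.includes(run.font.toLowerCase());
}

function markMonospaceRuns(run) {
    // Leave runs that already have a character style alone
    if (isMonospaceRun(run) && !run.styleId) {
        return {...run, styleId: "Code", styleName: "Code"};
    }
    return run;
}

// Plain objects standing in for Mammoth run elements
const runs = [
    {type: "run", font: "COURIER NEW", styleId: null, children: []},
    {type: "run", font: "Calibri", styleId: null, children: []},
    {type: "run", font: null, styleId: null, children: []}
];

console.log(runs.map(markMonospaceRuns).map(run => run.styleId)); // [ 'Code', null, null ]
```

With Mammoth you would pass it as `transformDocument: mammoth.transforms.run(markMonospaceRuns)`, together with a style map entry such as `r[style-name='Code'] => code`.
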
## transforms.getDescendants

Get all descendants of a document element: its children, their children, and so on. The element itself is not included.

```javascript { .api }
function getDescendants(element: any): any[];
```

### Parameters

- `element`: The document element to traverse

### Returns

An array of all descendant elements of `element`, at any depth.

### Usage Example

```javascript
const mammoth = require("mammoth");

function analyzeDocument(documentElement) {
    const allDescendants = mammoth.transforms.getDescendants(documentElement);
    console.log(`Document contains ${allDescendants.length} elements`);

    allDescendants.forEach(function(descendant) {
        console.log(`Element type: ${descendant.type}`);
    });
}

// transformDocument receives the whole document, so it can also be used
// for inspection, as long as it returns the document
mammoth.convertToHtml({path: "document.docx"}, {
    transformDocument: function(document) {
        analyzeDocument(document);
        return document;
    }
});
```

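The sketch below reproduces the behavior described above on plain objects, so you can see what ends up in the result: every element below the starting element, at any depth, but not the starting element itself. `collectDescendants` and `countByType` are hypothetical names and the tree is hand-built. The sketch lists each child before its own descendants, which may differ from the order Mammoth's helper uses.

```javascript
// Hypothetical re-implementation for illustration: collects every element
// below `element` (children, grandchildren, ...) but not `element` itself.
function collectDescendants(element) {
    const result = [];
    for (const child of element.children || []) {
        result.push(child);
        result.push(...collectDescendants(child));
    }
    return result;
}

// Tally how many elements of each type appear below `element`
function countByType(element) {
    const counts = {};
    for (const descendant of collectDescendants(element)) {
        counts[descendant.type] = (counts[descendant.type] || 0) + 1;
    }
    return counts;
}

// Simplified document: paragraph -> run -> text
const sampleDocument = {
    type: "document",
    children: [{
        type: "paragraph",
        children: [{
            type: "run",
            children: [{type: "text", value: "Hello"}]
        }]
    }]
};

console.log(countByType(sampleDocument)); // { paragraph: 1, run: 1, text: 1 }
```
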
## transforms.getDescendantsOfType

Get all descendants of a document element that have a given type.

```javascript { .api }
function getDescendantsOfType(element: any, type: string): any[];
```

### Parameters

- `element`: The document element to traverse
- `type`: The element type to match, such as `"paragraph"`, `"run"`, or `"table"`

### Returns

An array of the descendant elements whose `type` property equals `type`.

### Usage Example

```javascript
const mammoth = require("mammoth");

function countParagraphs(documentElement) {
    const paragraphs = mammoth.transforms.getDescendantsOfType(documentElement, "paragraph");
    console.log(`Document contains ${paragraphs.length} paragraphs`);
    return paragraphs;
}

function findTables(documentElement) {
    return mammoth.transforms.getDescendantsOfType(documentElement, "table");
}
```

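A common reason to filter descendants by type is reading a paragraph's text. Text elements are not direct children of a paragraph: they sit inside runs (and, for links, inside hyperlinks), so looking only at `children` misses them. The standalone sketch below shows that structure with a hypothetical `textContent` helper and a hand-built, simplified paragraph; with Mammoth, `mammoth.transforms.getDescendantsOfType(paragraph, "text")` gives you the same text elements to join.

```javascript
// Hypothetical helper: concatenates the `value` of every text element
// anywhere below `element`, in document order.
function textContent(element) {
    if (element.type === "text") {
        return element.value;
    }
    return (element.children || []).map(textContent).join("");
}

// Simplified paragraph: text lives in runs, one of which is inside a hyperlink
const paragraph = {
    type: "paragraph",
    children: [
        {type: "run", children: [{type: "text", value: "TODO: "}]},
        {type: "hyperlink", href: "https://example.com", children: [
            {type: "run", children: [{type: "text", value: "update the link"}]}
        ]}
    ]
};

// Filtering direct children finds nothing, because they are runs and hyperlinks
console.log(paragraph.children.filter(child => child.type === "text").length); // 0
console.log(textContent(paragraph)); // TODO: update the link
```
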
## Manual Element Transformation

For more complex transformations, such as handling several element types in a single pass, you can write your own recursive function and pass it directly as the `transformDocument` option. The general pattern is to transform an element's children first and then the element itself:

```javascript { .api }
function transformElement(element: any): any {
    // Transform children first so that parents see the updated children
    if (element.children) {
        const children = element.children.map(transformElement);
        element = {...element, children: children};
    }

    // Then apply type-specific transformations
    if (element.type === "paragraph") {
        return transformParagraph(element);
    } else if (element.type === "run") {
        return transformRun(element);
    }

    return element;
}
```

### Usage Example

```javascript
const mammoth = require("mammoth");

function transformElement(element) {
    // Recursively transform children first
    if (element.children) {
        const children = element.children.map(transformElement);
        element = {...element, children: children};
    }

    if (element.type === "paragraph") {
        // Convert center-aligned paragraphs with no explicit style to headings
        if (element.alignment === "center" && !element.styleId) {
            return {...element, styleId: "Heading2"};
        }

        // Text elements sit inside runs, not directly under the paragraph,
        // so collect them from all descendants
        const text = mammoth.transforms.getDescendantsOfType(element, "text")
            .map(child => child.value)
            .join("");

        if (text.startsWith("TODO:")) {
            return {...element, styleId: "TodoItem"};
        }
    }

    if (element.type === "run") {
        // Convert runs set in a monospace font to code (`font` is a string)
        if (element.font === "Courier New") {
            return {...element, styleId: "Code"};
        }
    }

    return element;
}

const options = {
    transformDocument: transformElement
};

mammoth.convertToHtml({path: "document.docx"}, options);
```

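`transformDocument` accepts a single function. If you would rather keep separate rules in separate functions than write one combined `transformElement`, you can compose several document-level transforms so that each one receives the output of the previous one. `composeTransforms` below is a hypothetical helper, not part of Mammoth, and the two stand-in transforms only look at top-level children of a simplified document so that the example runs on its own.

```javascript
// Hypothetical helper: chains document transforms left to right, so the
// output of each one becomes the input of the next.
function composeTransforms(...transforms) {
    return function(document) {
        return transforms.reduce((current, transform) => transform(current), document);
    };
}

// Stand-in transform: drops top-level paragraphs that have no children
const dropEmptyParagraphs = doc => ({
    ...doc,
    children: doc.children.filter(el => el.type !== "paragraph" || el.children.length > 0)
});

// Stand-in transform: styles centered top-level paragraphs as headings
const centeredToHeading = doc => ({
    ...doc,
    children: doc.children.map(el =>
        el.type === "paragraph" && el.alignment === "center" && !el.styleId
            ? {...el, styleId: "Heading2"}
            : el)
});

const transformDocument = composeTransforms(dropEmptyParagraphs, centeredToHeading);

const result = transformDocument({
    type: "document",
    children: [
        {type: "paragraph", alignment: "center", styleId: null, children: [{type: "run", children: []}]},
        {type: "paragraph", alignment: "left", styleId: null, children: []}
    ]
});

console.log(result.children.length, result.children[0].styleId); // 1 Heading2
```

With Mammoth, the arguments would typically be helper-built functions such as `mammoth.transforms.paragraph(transformParagraph)` and `mammoth.transforms.run(transformRun)`. Each of those walks the whole tree, so composing two of them means two passes, whereas a single recursive `transformElement` does the same work in one.
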
## Common Element Types

Element types you might encounter during transformation, as found in each element's `type` property:

- `"document"`: The root element passed to `transformDocument`
- `"paragraph"`: Paragraph elements
- `"run"`: Runs of text with shared formatting, within paragraphs
- `"text"`: Text content
- `"table"`: Table elements
- `"tableRow"`: Table row elements
- `"tableCell"`: Table cell elements
- `"hyperlink"`: Link elements
- `"image"`: Image elements
- `"break"`: Line, page, and column breaks (distinguished by `breakType`)
- `"noteReference"`: Footnote and endnote references (distinguished by `noteType`)

## Element Properties

Common properties found on document elements:

### Paragraph Elements
- `type`: `"paragraph"`
- `styleId`: Style ID from the document, such as `"Heading2"`
- `styleName`: Human-readable style name
- `alignment`: Paragraph alignment as stored in the document's `w:jc` value, for example `"left"`, `"center"`, `"right"`, or `"both"` (justified)
- `children`: Array of child elements

### Run Elements
- `type`: `"run"`
- `styleId`: Style ID of the run's character style, if any
- `styleName`: Human-readable name of the run's character style, if any
- `font`: Font name as a string (for example `"Courier New"`), if one is set
- `isBold`: Boolean indicating bold formatting
- `isItalic`: Boolean indicating italic formatting
- `isUnderline`: Boolean indicating underline formatting
- `isStrikethrough`: Boolean indicating strikethrough formatting
- `verticalAlignment`: `"superscript"`, `"subscript"`, or `"baseline"`
- `children`: Array of child elements (usually text)

### Text Elements
- `type`: `"text"`
- `value`: The text content
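The boolean formatting flags and `verticalAlignment` listed for runs can be read directly from a run element. The sketch below turns them into a list of labels, which can be handy for logging what a document contains before you write a transform. `describeRunFormatting` is a hypothetical helper, the run is a hand-built, simplified object, and the sketch assumes `font` holds the font name as a string, as in Mammoth's README examples.

```javascript
// Hypothetical helper: lists the formatting applied to a run, based on the
// run properties documented above. Missing flags are treated as false.
function describeRunFormatting(run) {
    const labels = [];
    if (run.isBold) labels.push("bold");
    if (run.isItalic) labels.push("italic");
    if (run.isUnderline) labels.push("underline");
    if (run.isStrikethrough) labels.push("strikethrough");
    if (run.verticalAlignment === "superscript" || run.verticalAlignment === "subscript") {
        labels.push(run.verticalAlignment);
    }
    // Assumes `font` is a plain font-name string
    if (run.font) labels.push(`font: ${run.font}`);
    return labels;
}

// Simplified run standing in for a Mammoth run element
const run = {
    type: "run",
    isBold: true,
    isItalic: false,
    isUnderline: false,
    isStrikethrough: false,
    verticalAlignment: "superscript",
    font: "Calibri",
    children: [{type: "text", value: "2"}]
};

console.log(describeRunFormatting(run)); // [ 'bold', 'superscript', 'font: Calibri' ]
```
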