# Document Conversion

Core functionality for converting DOCX documents to HTML and Markdown formats, with support for custom style mappings and conversion options.

## convertToHtml

Converts the source document to HTML.

```javascript { .api }
function convertToHtml(input: Input, options?: Options): Promise<Result>;
```

### Parameters

- `input`: The source document, as an object in one of the following forms:
  - `{path: string}`: Path to the .docx file (Node.js)
  - `{buffer: Buffer}`: Buffer containing the .docx file (Node.js)
  - `{arrayBuffer: ArrayBuffer}`: ArrayBuffer containing the .docx file (browser)
- `options` (optional): Conversion options
  - `styleMap`: Custom style mappings, as a string (one mapping per line) or an array of strings
  - `includeEmbeddedStyleMap`: Whether to use any style map embedded in the document (default: `true`)
  - `includeDefaultStyleMap`: Whether to combine `styleMap` with the default style map (default: `true`)
  - `convertImage`: Custom image converter function. By default, images become `<img>` elements with the image data inlined in the `src` attribute.
  - `ignoreEmptyParagraphs`: Whether to ignore empty paragraphs (default: `true`)
  - `idPrefix`: String prepended to generated IDs, such as those used for bookmarks, footnotes and endnotes (default: `""`)
  - `transformDocument`: Function applied to the document read from the .docx file before it is converted

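
`transformDocument` receives the document read from the .docx file and returns the document to convert. The sketch below is a hedged illustration. It walks the element tree recursively and assigns the style ID `Heading2` to centered paragraphs that have no style. It does not call Mammoth. The element shape (`type`, `children`, `alignment`, `styleId`) is an assumption about Mammoth's document model, and `transformElement` and `promoteCenteredParagraph` are hypothetical names.

```javascript
// Hypothetical transform for the `transformDocument` option.
// Assumes each element is a plain object with a `type` string and,
// for container elements, a `children` array.
function transformElement(element) {
    // Transform children first so every nested element is visited.
    if (element.children) {
        element = { ...element, children: element.children.map(transformElement) };
    }
    if (element.type === "paragraph") {
        element = promoteCenteredParagraph(element);
    }
    return element;
}

// Assumed paragraph fields: `alignment` and `styleId`.
function promoteCenteredParagraph(paragraph) {
    if (paragraph.alignment === "center" && !paragraph.styleId) {
        // Return a copy rather than mutating the original element.
        return { ...paragraph, styleId: "Heading2" };
    }
    return paragraph;
}

// Usage (assumed): mammoth.convertToHtml(input, {transformDocument: transformElement});
```

Because the transform returns new objects instead of mutating its input, you can test it on a hand-built tree before using it in a conversion.
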
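### Returns

A promise resolving to a Result object with the following properties:

- `value`: The generated HTML string
- `messages`: An array of warnings and errors generated during conversion. Each message has a `type` (such as `"warning"` or `"error"`) and a `message` string.
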
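
Conversion problems are reported in `messages` rather than thrown, so callers typically inspect them after each conversion. The minimal sketch below groups messages by their `type`, assuming each message is an object with `type` and `message` properties. `groupMessagesByType` is a hypothetical helper. Here it runs on a hand-written array with placeholder text rather than on real Mammoth output.

```javascript
// Hypothetical helper: group Result.messages by their `type` field.
// A Map is used so arbitrary type strings cannot clash with object keys.
function groupMessagesByType(messages) {
    const groups = new Map();
    for (const { type, message } of messages) {
        // Create the bucket for this type on first use.
        if (!groups.has(type)) {
            groups.set(type, []);
        }
        groups.get(type).push(message);
    }
    return groups;
}

// Placeholder messages standing in for result.messages.
const grouped = groupMessagesByType([
    { type: "warning", message: "example warning 1" },
    { type: "error", message: "example error" },
    { type: "warning", message: "example warning 2" }
]);
console.log(grouped.get("warning").length); // 2
```
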
### Usage Examples

#### Basic HTML Conversion

```javascript
const mammoth = require("mammoth");

mammoth.convertToHtml({path: "document.docx"})
    .then(function(result) {
        const html = result.value; // The generated HTML
        const messages = result.messages; // Any warnings or errors
        console.log(html);
        messages.forEach(function(message) {
            console.warn(message.type + ": " + message.message);
        });
    })
    .catch(function(error) {
        console.error(error);
    });
```

#### With Custom Style Mapping

```javascript
const mammoth = require("mammoth");

const options = {
    styleMap: [
        "p[style-name='Section Title'] => h1:fresh",
        "p[style-name='Subsection Title'] => h2:fresh"
    ]
};

mammoth.convertToHtml({path: "document.docx"}, options)
    .then(function(result) {
        console.log(result.value);
    });
```

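
`styleMap` can also be passed as a single string with one mapping per line. To illustrate that the two forms are equivalent, the sketch below uses a hypothetical `styleMapLines` helper to split such a string into the array form. It assumes that blank lines carry no mapping, and it is not Mammoth's own style map parser.

```javascript
// Hypothetical helper: split a multi-line style map string into the
// equivalent array of mappings, trimming whitespace and dropping blank lines.
function styleMapLines(styleMap) {
    return styleMap
        .split(/\r?\n/)
        .map(function(line) { return line.trim(); })
        .filter(function(line) { return line.length > 0; });
}

// The same two mappings as the array example, written as one string.
const styleMapString = `
p[style-name='Section Title'] => h1:fresh
p[style-name='Subsection Title'] => h2:fresh
`;

console.log(styleMapLines(styleMapString).length); // 2
```

Passing `styleMapString` as `options.styleMap` should therefore behave like the array form.
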
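#### With Custom Image Handler

```javascript
const fs = require("fs");
const mammoth = require("mammoth");

// Read the .docx file into a Buffer (Node.js)
const docxBuffer = fs.readFileSync("document.docx");

const options = {
    convertImage: mammoth.images.imgElement(function(image) {
        // Read the image data as a base64-encoded string
        return image.readAsBase64String().then(function(imageData) {
            // Inline the image as a data URI in the src attribute
            return {
                src: "data:" + image.contentType + ";base64," + imageData
            };
        });
    })
};

mammoth.convertToHtml({buffer: docxBuffer}, options);
```
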
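
The `src` built by this converter is a data URI of the form `data:<content type>;base64,<data>`. The sketch below moves that string-building step into a hypothetical `toDataUri` helper, which takes raw image bytes held in a Node.js Buffer. The helper name and its Buffer input are assumptions for illustration.

```javascript
// Hypothetical helper: build a data URI like the one the image converter
// returns, from a content type and the raw image bytes (a Node.js Buffer).
function toDataUri(contentType, bytes) {
    // Buffer#toString("base64") produces standard base64 with "=" padding.
    return "data:" + contentType + ";base64," + bytes.toString("base64");
}

// Two arbitrary bytes standing in for real image data.
console.log(toDataUri("image/png", Buffer.from("hi"))); // data:image/png;base64,aGk=
```
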
## convertToMarkdown

Converts the source document to Markdown. **Note**: Markdown support is deprecated. Converting to HTML and then using a separate library to convert the HTML to Markdown is recommended, and is likely to produce better results.

```javascript { .api }
function convertToMarkdown(input: Input, options?: Options): Promise<Result>;
```

### Parameters

Same as `convertToHtml`.

### Returns

A promise resolving to a Result object with the following properties:

- `value`: The generated Markdown string
- `messages`: An array of warnings and errors generated during conversion

### Usage Example

```javascript
const mammoth = require("mammoth");

mammoth.convertToMarkdown({path: "document.docx"})
    .then(function(result) {
        const markdown = result.value;
        console.log(markdown);
    });
```

## extractRawText

Extracts the raw text of the document, ignoring all formatting. Each paragraph is followed by two newlines.

```javascript { .api }
function extractRawText(input: Input): Promise<Result>;
```

### Parameters

- `input`: The source document, in any of the forms accepted by `convertToHtml`

### Returns

A promise resolving to a Result object with the following properties:

- `value`: The raw text string
- `messages`: An array of warnings and errors generated during extraction

### Usage Example

```javascript
const mammoth = require("mammoth");

mammoth.extractRawText({path: "document.docx"})
    .then(function(result) {
        const text = result.value; // Plain text, with a blank line after each paragraph
        console.log(text);
    });
```

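
Because each paragraph in the raw text is followed by two newlines, you can recover the paragraphs by splitting on that separator. This is a minimal sketch that assumes paragraphs never contain two consecutive newlines themselves. `splitParagraphs` is a hypothetical helper, and the input string stands in for `result.value`.

```javascript
// Hypothetical helper: split extractRawText output into paragraphs.
// Assumes every paragraph is terminated by "\n\n" and contains no "\n\n".
function splitParagraphs(rawText) {
    return rawText
        .split("\n\n")
        // Drop empty strings, e.g. the one produced by the trailing separator.
        .filter(function(paragraph) { return paragraph.length > 0; });
}

// Stand-in for result.value from extractRawText.
const rawText = "First paragraph.\n\nSecond paragraph.\n\n";
console.log(splitParagraphs(rawText)); // [ 'First paragraph.', 'Second paragraph.' ]
```

Note that the filter also discards empty paragraphs, so the count may be lower than the number of paragraphs in the source document.
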
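## Style Mapping Syntax

Style mappings control how Word styles are converted to HTML. Each mapping has two parts separated by `=>`. On the left is a matcher for a document element, such as `p` for paragraphs, `r` for runs, or `b` for bold text. On the right is the HTML to generate:

```javascript
// An array of mappings, suitable for options.styleMap
const styleMap = [
    // Basic style mapping: paragraphs styled "Heading 1" become <h1>
    "p[style-name='Heading 1'] => h1",

    // With a CSS class
    "p[style-name='Warning'] => p.warning",

    // Fresh element: always open a new <h1> instead of reusing an open one
    "p[style-name='Title'] => h1:fresh",

    // Character (run) styles
    "r[style-name='Code'] => code",

    // Bold, italic and underline (bold and italic map to <strong> and <em>
    // by default; underline is ignored unless mapped)
    "b => strong",
    "i => em",
    "u => span.underline"
];
```
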
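
To make the two-part structure concrete, the sketch below splits a single mapping at `=>` into its matcher and HTML path, and detects a trailing `:fresh`. It is a hypothetical illustration only, not Mammoth's parser. It handles just the simple single-element forms shown above (no nested paths such as `div > p`), and `splitStyleMapping` is an invented name.

```javascript
// Hypothetical illustration: split "matcher => html-path" into its parts.
// Handles only simple one-element paths such as "h1:fresh" or "p.warning".
function splitStyleMapping(mapping) {
    const arrow = mapping.indexOf("=>");
    if (arrow === -1) {
        throw new Error("Not a style mapping: " + mapping);
    }
    const matcher = mapping.slice(0, arrow).trim();
    let htmlPath = mapping.slice(arrow + 2).trim();
    // A trailing ":fresh" means a new element is always opened.
    const fresh = htmlPath.endsWith(":fresh");
    if (fresh) {
        htmlPath = htmlPath.slice(0, -":fresh".length);
    }
    return { matcher: matcher, htmlPath: htmlPath, fresh: fresh };
}

console.log(splitStyleMapping("p[style-name='Title'] => h1:fresh").htmlPath); // h1
```
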
## Supported Features

- Headings (h1-h6)
- Lists (ordered and unordered)
- Tables (structure preserved; table formatting such as borders is ignored)
- Footnotes and endnotes
- Images (with customizable handling)
- Bold, italic, strikethrough, and underline (underline is ignored unless mapped with a style mapping)
- Superscript and subscript
- Links
- Line breaks
- Text boxes
- Comments (ignored by default; enabled via a `comment-reference` style mapping)

## Security Considerations

**Mammoth performs no sanitization of the source document** and should be used extremely carefully with untrusted user input. Source documents can contain:

- Links with `javascript:` targets
- References to external files
- Other content that could lead to cross-site scripting (XSS) or file access vulnerabilities

Always sanitize the generated HTML before embedding it in a web page.
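
One part of that sanitization is rejecting dangerous link targets such as `javascript:` URLs. The sketch below checks an `href` value against an allowlist of URL schemes, using the WHATWG `URL` class that is built into Node.js. `isSafeHref`, the allowlist, and the placeholder base URL are assumptions for illustration. The helper checks a single attribute value only and is no substitute for passing the whole output through a dedicated HTML sanitizer.

```javascript
// Hypothetical helper: allow only links whose scheme is on an allowlist.
const ALLOWED_PROTOCOLS = new Set(["http:", "https:", "mailto:"]);

function isSafeHref(href) {
    let url;
    try {
        // Relative links resolve against a placeholder base, so they
        // inherit its "https:" scheme and are treated as safe.
        url = new URL(href, "https://example.invalid/");
    } catch (error) {
        // Values the URL parser rejects are treated as unsafe.
        return false;
    }
    // The parser lowercases the scheme and strips surrounding whitespace,
    // so variants like "JavaScript:" or " javascript:" are caught too.
    return ALLOWED_PROTOCOLS.has(url.protocol);
}

console.log(isSafeHref("https://example.com/page")); // true
console.log(isSafeHref("javascript:alert(1)")); // false
console.log(isSafeHref("#footnote-1")); // true
```

An allowlist is used rather than a blocklist because any scheme not explicitly permitted, including unexpected ones such as `file:`, is rejected.
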