# Mammoth

Mammoth converts .docx documents, such as those created by Microsoft Word, Google Docs and LibreOffice, to HTML and Markdown. It focuses on preserving semantic markup rather than visual formatting: document styles (such as Heading 1) are mapped to the corresponding HTML elements (such as `h1`), while font styling details are ignored.

## Package Information

- **Package Name**: mammoth
- **Package Type**: npm
- **Language**: JavaScript with TypeScript definitions
- **Installation**: `npm install mammoth`

## Core Imports

```javascript
const mammoth = require("mammoth");
```

TypeScript:

```typescript
import mammoth = require("mammoth");
// or
const mammoth = require("mammoth");
```

Browser (standalone):

```javascript
// Include mammoth.browser.js or mammoth.browser.min.js
const mammoth = window.mammoth;
```

## Basic Usage

```javascript
const mammoth = require("mammoth");

// Convert DOCX to HTML
mammoth.convertToHtml({path: "document.docx"})
  .then(function(result) {
    const html = result.value; // The generated HTML
    const messages = result.messages; // Any messages, such as warnings
  })
  .catch(function(error) {
    console.error(error);
  });

// Extract raw text
mammoth.extractRawText({path: "document.docx"})
  .then(function(result) {
    const text = result.value; // The raw text
    const messages = result.messages;
  });
```

## CLI Usage

Mammoth also provides a command-line interface:

```bash
# Convert DOCX to HTML
mammoth document.docx output.html

# Convert with style map
mammoth document.docx output.html --style-map=custom-style-map

# Convert to Markdown (deprecated)
mammoth document.docx --output-format=markdown

# Extract images to directory
mammoth document.docx --output-dir=output-dir
```

## Architecture

Mammoth is built around several key components:

- **Document Conversion**: Core DOCX to HTML/Markdown conversion with customizable style mappings
- **Image Processing**: Flexible image handling with built-in and custom converters
- **Document Transformation**: Pre-conversion document modification and element transforms
- **Style Mapping**: Custom styling rules for converting Word styles to HTML elements

## Capabilities

### Document Conversion

Core functionality for converting DOCX documents to HTML and Markdown formats, with support for custom style mappings and conversion options.

```javascript { .api }
function convertToHtml(input: Input, options?: Options): Promise<Result>;
function convertToMarkdown(input: Input, options?: Options): Promise<Result>;
function extractRawText(input: Input): Promise<Result>;
```

[Document Conversion](./conversion.md)
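
As a minimal sketch of how a custom style map plugs into `convertToHtml`, the example below builds rules in Mammoth's style map syntax and passes them through the `styleMap` option. The helper names `paragraphRule` and `convertWithStyleMap` and the Word style names are hypothetical; `mammoth` is loaded inside the function only so that the rule builder can be used on its own.

```javascript
// Build a style map rule that maps a Word paragraph style to an HTML element.
// Assumes the style name contains no single quotes.
function paragraphRule(styleName, htmlElement) {
  return "p[style-name='" + styleName + "'] => " + htmlElement + ":fresh";
}

// Hypothetical style names; replace them with the styles used in your documents.
const styleMap = [
  paragraphRule("Section Title", "h1"),
  paragraphRule("Subsection Title", "h2")
];

// Convert a .docx file, passing the custom rules through the styleMap option.
function convertWithStyleMap(docxPath) {
  const mammoth = require("mammoth");
  return mammoth.convertToHtml({path: docxPath}, {styleMap: styleMap});
}
```

Calling `convertWithStyleMap("document.docx")` would resolve to the same `Result` shape shown in Basic Usage, with `value` holding the HTML and `messages` holding any warnings.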

### Image Handling

Image conversion utilities for customizing how images in DOCX documents are processed and included in the output.

```javascript { .api }
const images: {
  dataUri: ImageConverter;
  imgElement: (func: (image: Image) => Promise<ImageAttributes>) => ImageConverter;
};
```

[Image Handling](./images.md)
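
The following sketch shows one way a custom converter could be built with `images.imgElement`, using only the `Image` members listed under Types. The function names `toDataUriAttributes` and `convertWithInlineImages` are hypothetical, and inlining as a data URI is just an illustration; a real converter might instead store the image elsewhere and return its URL as `src`.

```javascript
// Turn a Mammoth image into <img> attributes with the content inlined as a data URI.
function toDataUriAttributes(image) {
  return image.readAsBase64String().then(function(base64) {
    return {src: "data:" + image.contentType + ";base64," + base64};
  });
}

// Convert a .docx file using the custom image converter.
function convertWithInlineImages(docxPath) {
  const mammoth = require("mammoth");
  return mammoth.convertToHtml({path: docxPath}, {
    convertImage: mammoth.images.imgElement(toDataUriAttributes)
  });
}
```

Because `toDataUriAttributes` only depends on `contentType` and `readAsBase64String()`, it can be exercised with a stub object before being wired into a conversion.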

### Document Transforms

Document transformation utilities for modifying document elements before conversion, enabling custom preprocessing of document structure.

```javascript { .api }
const transforms: {
  paragraph: (transform: (element: any) => any) => (element: any) => any;
  run: (transform: (element: any) => any) => (element: any) => any;
  getDescendants: (element: any) => any[];
  getDescendantsOfType: (element: any, type: string) => any[];
};
```

[Document Transforms](./transforms.md)
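
To illustrate pre-conversion modification, here is a sketch that wraps a paragraph-level function with `transforms.paragraph` and passes it as `transformDocument`. It assumes paragraph elements expose `alignment` and `styleId` properties; the function names and the `"Heading2"` style ID are hypothetical, and whether that ID maps to a heading depends on the style map in effect.

```javascript
// Treat centre-aligned paragraphs without an explicit style as headings.
// "Heading2" is a hypothetical style ID; use one that exists in your document.
function transformParagraph(paragraph) {
  if (paragraph.alignment === "center" && !paragraph.styleId) {
    // Return a copy rather than mutating the original element.
    return Object.assign({}, paragraph, {styleId: "Heading2"});
  }
  return paragraph;
}

// Convert a .docx file, applying the transform before HTML generation.
function convertWithTransform(docxPath) {
  const mammoth = require("mammoth");
  return mammoth.convertToHtml({path: docxPath}, {
    transformDocument: mammoth.transforms.paragraph(transformParagraph)
  });
}
```

Returning a new object keeps the transform free of side effects, which makes it straightforward to test in isolation.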

### Style Utilities

Utilities for handling underline and other styling elements in document conversion.

```javascript { .api }
const underline: {
  element: (name: string) => (html: any) => any;
};
```

[Style Utilities](./styles.md)

### Style Map Management

Functions for embedding and reading custom style maps in DOCX documents.

```javascript { .api }
function embedStyleMap(input: Input, styleMap: string): Promise<{
  toArrayBuffer: () => ArrayBuffer;
  toBuffer: () => Buffer;
}>;
function readEmbeddedStyleMap(input: Input): Promise<string>;
```

[Style Map Management](./style-maps.md)
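
A possible way to use `embedStyleMap` in Node.js is sketched below: the rules are joined into a single string, embedded, and the resulting document is written to a new file. The helper names `styleMapToString` and `embedAndSave` are hypothetical, and the sketch assumes one rule per line in the style map string.

```javascript
const fs = require("fs");

// Join individual rules into one string, one rule per line.
function styleMapToString(rules) {
  return rules.join("\n");
}

// Write a copy of the source document with the style map embedded in it.
function embedAndSave(sourcePath, destinationPath, rules) {
  const mammoth = require("mammoth");
  return mammoth.embedStyleMap({path: sourcePath}, styleMapToString(rules))
    .then(function(docx) {
      return fs.promises.writeFile(destinationPath, docx.toBuffer());
    });
}
```

The embedded map could later be read back with `readEmbeddedStyleMap({path: destinationPath})`, which resolves to the style map string.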

## Types

```javascript { .api }
type Input = PathInput | BufferInput | ArrayBufferInput;

interface PathInput {
  path: string;
}

interface BufferInput {
  buffer: Buffer;
}

interface ArrayBufferInput {
  arrayBuffer: ArrayBuffer;
}

interface Options {
  styleMap?: string | string[];
  includeEmbeddedStyleMap?: boolean;
  includeDefaultStyleMap?: boolean;
  convertImage?: ImageConverter;
  ignoreEmptyParagraphs?: boolean;
  idPrefix?: string;
  transformDocument?: (element: any) => any;
}

interface Result {
  value: string;
  messages: Message[];
}

type Message = Warning | Error;

interface Warning {
  type: "warning";
  message: string;
}

interface Error {
  type: "error";
  message: string;
  error: unknown;
}

interface Image {
  contentType: string;
  readAsArrayBuffer(): Promise<ArrayBuffer>;
  readAsBase64String(): Promise<string>;
  readAsBuffer(): Promise<Buffer>;
  read(): Promise<Buffer>;
  read(encoding: string): Promise<string>;
}

interface ImageConverter {
  __mammothBrand: "ImageConverter";
}

interface ImageAttributes {
  src: string;
  [key: string]: string;
}
```
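
The `Input` union and the `Message` discriminator above can be handled with small helpers such as the ones sketched here. Both `toInput` and `summarizeMessages` are hypothetical names introduced for illustration; they are not part of Mammoth.

```javascript
// Pick the Input shape that matches the kind of source provided.
function toInput(source) {
  if (typeof source === "string") {
    return {path: source};
  }
  if (typeof Buffer !== "undefined" && Buffer.isBuffer(source)) {
    return {buffer: source};
  }
  if (source instanceof ArrayBuffer) {
    return {arrayBuffer: source};
  }
  throw new TypeError("Unsupported input: expected a path, Buffer or ArrayBuffer");
}

// Split Result.messages into warnings and errors using the "type" field.
function summarizeMessages(messages) {
  const summary = {warnings: [], errors: []};
  messages.forEach(function(message) {
    if (message.type === "error") {
      summary.errors.push(message.message);
    } else {
      summary.warnings.push(message.message);
    }
  });
  return summary;
}
```

A caller might pass `toInput(source)` as the first argument to `convertToHtml` and then run `summarizeMessages(result.messages)` to decide whether the conversion needs attention.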