# Document Conversion

Core functionality for converting DOCX documents to HTML and Markdown formats, with support for custom style mappings and conversion options.

## convertToHtml

Converts the source document to HTML.

```javascript { .api }
function convertToHtml(input: Input, options?: Options): Promise<Result>;
```

### Parameters

- `input`: Document input, given as a file path, Buffer, or ArrayBuffer:
  - `{path: string}` - Path to the .docx file (Node.js)
  - `{buffer: Buffer}` - Buffer containing the .docx file (Node.js)
  - `{arrayBuffer: ArrayBuffer}` - ArrayBuffer containing the .docx file (Browser)
- `options` (optional): Conversion options
  - `styleMap`: Custom style mappings (string or string array)
  - `includeEmbeddedStyleMap`: Include embedded style maps (default: true)
  - `includeDefaultStyleMap`: Include default style mappings (default: true)
  - `convertImage`: Custom image converter function
  - `ignoreEmptyParagraphs`: Ignore empty paragraphs (default: true)
  - `idPrefix`: Prefix for generated IDs (default: "")
  - `transformDocument`: Document transformation function
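
The three `input` shapes listed above can be chosen from the type of the source value. The following is a minimal sketch, not part of mammoth: `toMammothInput` is a hypothetical helper name, and it assumes a string always means a file path.

```javascript
// Hypothetical helper (not provided by mammoth): wrap a raw source value
// in the input object shape that convertToHtml expects.
function toMammothInput(source) {
    if (typeof source === "string") {
        // Assumption: a string is a path to a .docx file (Node.js only)
        return {path: source};
    }
    if (typeof Buffer !== "undefined" && Buffer.isBuffer(source)) {
        // Node.js Buffer holding the .docx contents
        return {buffer: source};
    }
    if (source instanceof ArrayBuffer) {
        // Browser-style ArrayBuffer holding the .docx contents
        return {arrayBuffer: source};
    }
    throw new TypeError("Expected a file path, Buffer, or ArrayBuffer");
}
```

The returned object can then be passed as the first argument, for example `mammoth.convertToHtml(toMammothInput("document.docx"))`.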

### Returns

Promise resolving to a Result object:

- `value`: The generated HTML string
- `messages`: Array of warnings/errors during conversion
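
Since `messages` can mix warnings and errors, it may help to separate them before deciding whether to use the output. The sketch below assumes each message is an object with a `type` string (such as `"warning"` or `"error"`) and a `message` string; `groupMessages` and `sampleResult` are hypothetical names, and the sample message text is illustrative only.

```javascript
// Hypothetical helper: group a Result's messages by their `type` field.
// Assumes each entry looks like {type: "warning" | "error", message: "..."}.
function groupMessages(result) {
    const grouped = {warning: [], error: [], other: []};
    for (const entry of result.messages || []) {
        const known = Object.prototype.hasOwnProperty.call(grouped, entry.type);
        grouped[known ? entry.type : "other"].push(entry.message);
    }
    return grouped;
}

// Stand-in for a real conversion result (illustrative data only)
const sampleResult = {
    value: "<p>Hello</p>",
    messages: [
        {type: "warning", message: "Example warning about an unmapped style"}
    ]
};

const grouped = groupMessages(sampleResult);
if (grouped.error.length > 0) {
    console.error("Conversion reported errors:", grouped.error);
}
```

In real code, `sampleResult` would be the value passed to the `.then` callback of `convertToHtml`.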

### Usage Examples

#### Basic HTML Conversion

```javascript
const mammoth = require("mammoth");

mammoth.convertToHtml({path: "document.docx"})
    .then(function(result) {
        const html = result.value; // The generated HTML
        const messages = result.messages; // Any warnings or errors raised during conversion
        console.log(html);
    })
    .catch(function(error) {
        console.error(error);
    });
```

#### With Custom Style Mapping

```javascript
const mammoth = require("mammoth");

// Map custom Word paragraph styles to HTML headings
const options = {
    styleMap: [
        "p[style-name='Section Title'] => h1:fresh",
        "p[style-name='Subsection Title'] => h2:fresh"
    ]
};

mammoth.convertToHtml({path: "document.docx"}, options);
```

#### With Custom Image Handler

```javascript
const mammoth = require("mammoth");

// Embed each image inline as a base64 data URI
const options = {
    convertImage: mammoth.images.imgElement(function(image) {
        return image.readAsBase64String().then(function(imageBuffer) {
            return {
                src: "data:" + image.contentType + ";base64," + imageBuffer
            };
        });
    })
};

// docxBuffer is assumed to be a Buffer holding the contents of the .docx file
mammoth.convertToHtml({buffer: docxBuffer}, options);
```

## convertToMarkdown

Converts the source document to Markdown. **Note**: Markdown support is deprecated; generating HTML and converting it to Markdown with a separate library is recommended.

```javascript { .api }
function convertToMarkdown(input: Input, options?: Options): Promise<Result>;
```

### Parameters

Takes the same `input` and `options` as `convertToHtml`; the result contains Markdown instead of HTML.

### Returns

Promise resolving to a Result object:

- `value`: The generated Markdown string
- `messages`: Array of warnings/errors during conversion

### Usage Example

```javascript
const mammoth = require("mammoth");

mammoth.convertToMarkdown({path: "document.docx"})
    .then(function(result) {
        const markdown = result.value; // The generated Markdown
        console.log(markdown);
    });
```

## extractRawText

Extracts the raw text of the document, ignoring all formatting. Each paragraph is followed by two newlines.

```javascript { .api }
function extractRawText(input: Input): Promise<Result>;
```

### Parameters

- `input`: Document input (same format as convertToHtml)

### Returns

Promise resolving to a Result object:

- `value`: The raw text string
- `messages`: Array of warnings/errors during extraction

### Usage Example

```javascript
const mammoth = require("mammoth");

mammoth.extractRawText({path: "document.docx"})
    .then(function(result) {
        const text = result.value; // The raw text
        console.log(text);
    });
```
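
Because each paragraph in the raw text is followed by two newlines, the string can be split back into paragraphs. This is a minimal sketch under that assumption; `splitParagraphs` is a hypothetical helper, not a mammoth function, and it discards empty paragraphs.

```javascript
// Hypothetical helper: split extractRawText output into paragraphs.
// Relies on each paragraph being followed by two newlines, so the final
// split segment is empty and is filtered out (as are empty paragraphs).
function splitParagraphs(rawText) {
    return rawText
        .split("\n\n")
        .filter(function(paragraph) {
            return paragraph.length > 0;
        });
}

// Illustrative input in the shape described above
const paragraphs = splitParagraphs("First paragraph\n\nSecond paragraph\n\n");
console.log(paragraphs.length);
```

In practice the argument would be `result.value` from `extractRawText`.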

## Style Mapping Syntax

Style mappings control how Word styles are converted to HTML elements:

```javascript
// Basic style mapping
"p[style-name='Heading 1'] => h1"

// With CSS classes
"p[style-name='Warning'] => p.warning"

// Fresh elements (always start a new element instead of reusing an open one)
"p[style-name='Title'] => h1:fresh"

// Character styles
"r[style-name='Code'] => code"

// Bold/italic/underline
"b => strong"
"i => em"
"u => span.underline"
```
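
When many paragraph styles need mapping, the strings above can be generated from a lookup table. The sketch below is one possible approach; `buildParagraphStyleMap` is a hypothetical helper, and it assumes style names contain no single quotes, since it performs no escaping.

```javascript
// Hypothetical helper: turn {wordStyleName: htmlTarget} into style map strings.
// Assumption: style names contain no single quotes (no escaping is done).
function buildParagraphStyleMap(mappings) {
    return Object.keys(mappings).map(function(styleName) {
        return "p[style-name='" + styleName + "'] => " + mappings[styleName];
    });
}

const styleMap = buildParagraphStyleMap({
    "Section Title": "h1:fresh",
    "Subsection Title": "h2:fresh"
});
console.log(styleMap.join("\n"));
```

The resulting array can be passed as the `styleMap` option, which accepts either a string or an array of strings.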

## Supported Features

- Headings (h1-h6)
- Lists (ordered and unordered)
- Tables (structure preserved, styling ignored)
- Footnotes and endnotes
- Images (with customizable handling)
- Bold, italic, underline, strikethrough
- Superscript and subscript
- Links
- Line breaks
- Text boxes
- Comments (when enabled via style mapping)

## Security Considerations

**Mammoth performs no sanitization of the source document** and should be used extremely carefully with untrusted user input. Source documents can contain:

- Links with `javascript:` targets
- References to external files
- Malicious content that could lead to XSS or file access vulnerabilities

Always sanitize the output HTML when embedding in web pages.
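
As an illustration of one check a sanitization step might include, the sketch below rejects link targets such as `javascript:` by allowing only a short list of protocols. It is not a substitute for a full HTML sanitizer; `isSafeHref`, the allowlist, and the placeholder base URL are all assumptions made for this example.

```javascript
// Assumed allowlist for this sketch; adjust to the needs of the application.
const ALLOWED_PROTOCOLS = ["http:", "https:", "mailto:"];

// Hypothetical helper: return true only if the link target resolves to an
// allowed protocol. Relative links resolve against a placeholder base URL.
function isSafeHref(href) {
    let parsed;
    try {
        parsed = new URL(href, "https://example.invalid/");
    } catch (error) {
        // Unparseable targets are treated as unsafe
        return false;
    }
    return ALLOWED_PROTOCOLS.includes(parsed.protocol);
}

console.log(isSafeHref("javascript:alert(1)"));
console.log(isSafeHref("https://example.com/page"));
```

Parsing with `URL` instead of matching the string prefix means targets with unusual casing or leading whitespace are normalized before the protocol is compared.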