tessl/npm-mammoth

Convert Word documents from docx to simple HTML and Markdown

- **Workspace**: tessl
- **Visibility**: Public
- **Describes**: npmpkg:npm/mammoth@1.10.x

To install, run

npx @tessl/cli install tessl/npm-mammoth@1.10.0

# Mammoth

Mammoth is designed to convert .docx documents, such as those created by Microsoft Word, Google Docs and LibreOffice, to HTML and Markdown formats. It focuses on semantic markup preservation rather than visual formatting, converting document styles (like Heading 1) to appropriate HTML elements (like h1 tags) while ignoring font styling details.

## Package Information

- **Package Name**: mammoth
- **Package Type**: npm
- **Language**: JavaScript with TypeScript definitions
- **Installation**: `npm install mammoth`

## Core Imports

```javascript
const mammoth = require("mammoth");
```

TypeScript:

```typescript
import mammoth = require("mammoth");
// or
const mammoth = require("mammoth");
```

Browser (standalone):

```javascript
// Include mammoth.browser.js or mammoth.browser.min.js
const mammoth = window.mammoth;
```

## Basic Usage

```javascript
const mammoth = require("mammoth");

// Convert DOCX to HTML
mammoth.convertToHtml({path: "document.docx"})
    .then(function(result) {
        const html = result.value; // The generated HTML
        const messages = result.messages; // Any messages, such as warnings
    })
    .catch(function(error) {
        console.error(error);
    });

// Extract raw text
mammoth.extractRawText({path: "document.docx"})
    .then(function(result) {
        const text = result.value; // The raw text
        const messages = result.messages;
    })
    .catch(function(error) {
        console.error(error);
    });
```
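
Because `result.messages` can contain both warnings and errors (see the `Message` type under Types), callers may want to separate the two before reporting them. The following is a minimal sketch using `async`/`await`; `splitMessages` and `convertFile` are hypothetical helper names chosen for illustration and are not part of mammoth.

```javascript
// Hypothetical helper: group mammoth messages by their `type` field.
function splitMessages(messages) {
    return {
        warnings: messages.filter((m) => m.type === "warning"),
        errors: messages.filter((m) => m.type === "error"),
    };
}

// Hypothetical helper: convert a file and return the HTML with grouped messages.
async function convertFile(docxPath) {
    // Loaded lazily so that splitMessages can be used on its own.
    const mammoth = require("mammoth");
    const result = await mammoth.convertToHtml({path: docxPath});
    return {html: result.value, ...splitMessages(result.messages)};
}
```

Calling `convertFile("document.docx")` resolves to an object with `html`, `warnings`, and `errors` properties.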

## CLI Usage

Mammoth also provides a command-line interface:

```bash
# Convert DOCX to HTML
mammoth document.docx output.html

# Convert with style map
mammoth document.docx output.html --style-map=custom-style-map

# Convert to Markdown (deprecated)
mammoth document.docx --output-format=markdown

# Extract images to directory
mammoth document.docx --output-dir=output-dir
```

## Architecture

Mammoth is built around several key components:

- **Document Conversion**: Core DOCX to HTML/Markdown conversion with customizable style mappings
- **Image Processing**: Flexible image handling with built-in and custom converters
- **Document Transformation**: Pre-conversion document modification and element transforms
- **Style Mapping**: Custom styling rules for converting Word styles to HTML elements

## Capabilities

### Document Conversion

Core functionality for converting DOCX documents to HTML and Markdown formats, with support for custom style mappings and conversion options.

```javascript { .api }
function convertToHtml(input: Input, options?: Options): Promise<Result>;
function convertToMarkdown(input: Input, options?: Options): Promise<Result>;
function extractRawText(input: Input): Promise<Result>;
```

[Document Conversion](./conversion.md)
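
As an illustration of the `styleMap` option, the sketch below builds an array of style-map lines that send named Word paragraph styles to HTML headings. `headingStyleMap` and `convertWithStyleMap` are hypothetical helpers, and the style names "Section Title" and "Subsection Title" are example values that a given document may not use.

```javascript
// Hypothetical helper: build style-map lines that map named Word paragraph
// styles to HTML headings. The style names passed in are examples, not defaults.
function headingStyleMap(levelsByStyleName) {
    return Object.entries(levelsByStyleName).map(
        ([styleName, level]) => `p[style-name='${styleName}'] => h${level}:fresh`
    );
}

const options = {
    styleMap: headingStyleMap({"Section Title": 1, "Subsection Title": 2}),
};

// Hypothetical helper: convert a file using the style map above.
async function convertWithStyleMap(docxPath) {
    // Loaded lazily so that the style map can be built without the package.
    const mammoth = require("mammoth");
    const result = await mammoth.convertToHtml({path: docxPath}, options);
    return result.value;
}
```

Passing an array keeps each mapping on its own entry; the `Options` type also accepts a single string.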

### Image Handling

Image conversion utilities for customizing how images in DOCX documents are processed and included in the output.

```javascript { .api }
const images: {
    dataUri: ImageConverter;
    imgElement: (func: (image: Image) => Promise<ImageAttributes>) => ImageConverter;
};
```

[Image Handling](./images.md)
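
A custom converter can be sketched as follows, assuming the `Image` and `ImageAttributes` shapes listed under Types. `toDataUri`, `inlineImage`, and `convertWithInlineImages` are hypothetical names; a real callback might instead write each image to disk and return a file path as `src`.

```javascript
// Hypothetical helper: format a data URI from a content type and base64 payload.
function toDataUri(contentType, base64) {
    return "data:" + contentType + ";base64," + base64;
}

// Callback in the shape accepted by images.imgElement: takes an Image and
// resolves to the attributes for the generated <img> element.
async function inlineImage(image) {
    const base64 = await image.readAsBase64String();
    return {src: toDataUri(image.contentType, base64)};
}

// Hypothetical helper: convert a file with the custom image converter.
async function convertWithInlineImages(docxPath) {
    // Loaded lazily so that the helpers above can be used on their own.
    const mammoth = require("mammoth");
    const options = {convertImage: mammoth.images.imgElement(inlineImage)};
    const result = await mammoth.convertToHtml({path: docxPath}, options);
    return result.value;
}
```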

### Document Transforms

Document transformation utilities for modifying document elements before conversion, enabling custom preprocessing of document structure.

```javascript { .api }
const transforms: {
    paragraph: (transform: (element: any) => any) => (element: any) => any;
    run: (transform: (element: any) => any) => (element: any) => any;
    getDescendants: (element: any) => any[];
    getDescendantsOfType: (element: any, type: string) => any[];
};
```

[Document Transforms](./transforms.md)
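
The sketch below shows one way `transforms.paragraph` might be combined with the `transformDocument` option: centered paragraphs that have no style are treated as Heading 2. The `alignment` and `styleId` element properties follow the example in mammoth's README and should be checked against the version in use; `convertWithTransform` is a hypothetical helper.

```javascript
// Per-paragraph transform: give centered, unstyled paragraphs the Heading2 style.
// Returns a new element rather than mutating the one passed in.
function transformParagraph(element) {
    if (element.alignment === "center" && !element.styleId) {
        return {...element, styleId: "Heading2"};
    }
    return element;
}

// Hypothetical helper: convert a file with the transform applied to every paragraph.
async function convertWithTransform(docxPath) {
    // Loaded lazily so that transformParagraph can be tested on its own.
    const mammoth = require("mammoth");
    const options = {
        transformDocument: mammoth.transforms.paragraph(transformParagraph),
    };
    const result = await mammoth.convertToHtml({path: docxPath}, options);
    return result.value;
}
```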

### Style Utilities

Utilities for handling underline and other styling elements in document conversion.

```javascript { .api }
const underline: {
    element: (name: string) => (html: any) => any;
};
```

[Style Utilities](./styles.md)

### Style Map Management

Functions for embedding and reading custom style maps in DOCX documents.

```javascript { .api }
function embedStyleMap(input: Input, styleMap: string): Promise<{
    toArrayBuffer: () => ArrayBuffer;
    toBuffer: () => Buffer;
}>;
function readEmbeddedStyleMap(input: Input): Promise<string>;
```

[Style Map Management](./style-maps.md)
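
A round trip through both functions could look like the sketch below: embed a style map into a copy of a document, then read it back. `STYLE_MAP` holds example mappings, and `embedAndSave` with its two path parameters is a hypothetical helper.

```javascript
const fs = require("fs");

// Example style map, one mapping per line. The style names are illustrative.
const STYLE_MAP = [
    "p[style-name='Section Title'] => h1:fresh",
    "p[style-name='Subsection Title'] => h2:fresh",
].join("\n");

// Hypothetical helper: write a copy of the document with the style map
// embedded, then read the embedded map back from that copy.
async function embedAndSave(sourcePath, destinationPath) {
    // Loaded lazily so that STYLE_MAP can be inspected without the package.
    const mammoth = require("mammoth");
    const docx = await mammoth.embedStyleMap({path: sourcePath}, STYLE_MAP);
    await fs.promises.writeFile(destinationPath, docx.toBuffer());
    return mammoth.readEmbeddedStyleMap({path: destinationPath});
}
```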

## Types

```javascript { .api }
type Input = PathInput | BufferInput | ArrayBufferInput;

interface PathInput {
    path: string;
}

interface BufferInput {
    buffer: Buffer;
}

interface ArrayBufferInput {
    arrayBuffer: ArrayBuffer;
}

interface Options {
    styleMap?: string | string[];
    includeEmbeddedStyleMap?: boolean;
    includeDefaultStyleMap?: boolean;
    convertImage?: ImageConverter;
    ignoreEmptyParagraphs?: boolean;
    idPrefix?: string;
    transformDocument?: (element: any) => any;
}

interface Result {
    value: string;
    messages: Message[];
}

type Message = Warning | Error;

interface Warning {
    type: "warning";
    message: string;
}

interface Error {
    type: "error";
    message: string;
    error: unknown;
}

interface Image {
    contentType: string;
    readAsArrayBuffer(): Promise<ArrayBuffer>;
    readAsBase64String(): Promise<string>;
    readAsBuffer(): Promise<Buffer>;
    read(): Promise<Buffer>;
    read(encoding: string): Promise<string>;
}

interface ImageConverter {
    __mammothBrand: "ImageConverter";
}

interface ImageAttributes {
    src: string;
    [key: string]: string;
}
```
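
To make the three `Input` variants concrete, here is a small sketch that wraps a source value in the matching shape. `toInput` is a hypothetical helper, not a mammoth function, and it assumes a Node.js environment where `Buffer` is available globally.

```javascript
// Hypothetical helper: wrap a source value in the matching Input shape.
function toInput(source) {
    if (typeof source === "string") {
        return {path: source}; // PathInput
    }
    if (Buffer.isBuffer(source)) {
        return {buffer: source}; // BufferInput
    }
    if (source instanceof ArrayBuffer) {
        return {arrayBuffer: source}; // ArrayBufferInput
    }
    throw new TypeError("Expected a file path, Buffer, or ArrayBuffer");
}
```

The result can be passed as the first argument to any of the conversion functions, for example `mammoth.convertToHtml(toInput(source))`.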