tessl/npm-mammoth

Convert Word documents from docx to simple HTML and Markdown

Document Conversion

Name: tessl/npm-mammoth
Author: tessl

Core functionality for converting DOCX documents to HTML and Markdown formats, with support for custom style mappings and conversion options.

convertToHtml

Converts the source document to HTML.

function convertToHtml(input: Input, options?: Options): Promise<Result>;

Parameters

input: Document input, given as an object with one of the following properties:
- {path: string} - Path to the .docx file (Node.js)
- {buffer: Buffer} - Buffer containing .docx file (Node.js)
- {arrayBuffer: ArrayBuffer} - ArrayBuffer containing .docx file (Browser)
options (optional): Conversion options
- styleMap: Custom style mappings (string or string array)
- includeEmbeddedStyleMap: Include embedded style maps (default: true)
- includeDefaultStyleMap: Include default style mappings (default: true)
- convertImage: Custom image converter function
- ignoreEmptyParagraphs: Ignore empty paragraphs (default: true)
- idPrefix: Prefix for generated IDs (default: "")
- transformDocument: Document transformation function
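
As an illustration of the transformDocument option, the sketch below defines a paragraph transform that treats centre-aligned paragraphs without a style as headings. The function name promoteCenteredParagraph and the choice of "Heading 2" are illustrative assumptions. Wrapping the function with mammoth.transforms.paragraph is shown only in a comment, and the upstream documentation describes the transform API as unstable, so check this against the installed version.

```javascript
// Hypothetical transform: give centre-aligned paragraphs that have no style
// the "Heading 2" style, so that a style mapping for that name can apply.
function promoteCenteredParagraph(element) {
    if (element.alignment === "center" && !element.styleId) {
        // Return a copy rather than mutating the element in place.
        return Object.assign({}, element, {
            styleId: "Heading2",
            styleName: "Heading 2"
        });
    }
    return element;
}

// Assumed usage, not executed here because it needs the mammoth package:
// const options = {
//     transformDocument: mammoth.transforms.paragraph(promoteCenteredParagraph)
// };
// mammoth.convertToHtml({path: "document.docx"}, options);

// Stand-in objects that mimic only the fields the transform reads.
const centered = promoteCenteredParagraph({type: "paragraph", alignment: "center"});
const styled = promoteCenteredParagraph({type: "paragraph", alignment: "center", styleId: "Quote"});
console.log(centered.styleName);
console.log(styled.styleId);
```

Keeping the transform a plain function of one element makes it easy to test with stand-in objects before running it against a real document.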

Returns

Promise resolving to a Result object:

value: The generated HTML string
messages: Array of warnings/errors during conversion
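
The messages array is easy to overlook, so the following minimal sketch groups its entries by type before reporting them. It assumes each message is an object with a type field ("warning" or "error") and a message string; the helper name groupMessages and the sample result are hypothetical stand-ins, not output from a real conversion.

```javascript
// Group conversion messages by their type field.
// Assumes each entry looks like {type: "warning" | "error", message: string}.
function groupMessages(result) {
    const grouped = {warning: [], error: [], other: []};
    for (const entry of result.messages) {
        const known = Object.prototype.hasOwnProperty.call(grouped, entry.type);
        grouped[known ? entry.type : "other"].push(entry.message);
    }
    return grouped;
}

// Stand-in Result object shaped like the one described above.
const sampleResult = {
    value: "<p>Hello</p>",
    messages: [
        {type: "warning", message: "Unrecognised paragraph style"},
        {type: "error", message: "Could not read image"}
    ]
};

const groupedMessages = groupMessages(sampleResult);
console.log(groupedMessages.warning.length + " warning(s), " +
    groupedMessages.error.length + " error(s)");
```

In real use the same helper would be called inside the .then callback with the result passed by convertToHtml.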

Usage Examples

Basic HTML Conversion

const mammoth = require("mammoth");

mammoth.convertToHtml({path: "document.docx"})
    .then(function(result){
        const html = result.value;
        const messages = result.messages;
        console.log(html);
    })
    .catch(function(error) {
        console.error(error);
    });

With Custom Style Mapping

const options = {
    styleMap: [
        "p[style-name='Section Title'] => h1:fresh",
        "p[style-name='Subsection Title'] => h2:fresh"
    ]
};

mammoth.convertToHtml({path: "document.docx"}, options);

With Custom Image Handler

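// Inline each image as a base64 data URI in the generated <img> element.
const options = {
    convertImage: mammoth.images.imgElement(function(image) {
        return image.readAsBase64String().then(function(base64Data) {
            return {
                src: "data:" + image.contentType + ";base64," + base64Data
            };
        });
    })
};

// docxBuffer is assumed to be a Buffer holding the .docx file contents.
mammoth.convertToHtml({buffer: docxBuffer}, options);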
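
Inlining images as data URIs can make the HTML very large. As an alternative, the sketch below writes each image to disk and returns a relative src instead. It assumes the image object exposes readAsBuffer() (understood to be available in Node.js builds of mammoth) alongside contentType; the factory name makeFileImageHandler, the file naming scheme, and the way the extension is derived are hypothetical.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical factory: returns a function intended to be wrapped with
// mammoth.images.imgElement, saving each image under outputDir.
function makeFileImageHandler(outputDir) {
    let counter = 0;
    return function(image) {
        return image.readAsBuffer().then(function(buffer) {
            counter += 1;
            // Simplification: use the content subtype as the extension,
            // e.g. "image/png" becomes "png".
            const extension = image.contentType.split("/")[1] || "bin";
            const fileName = "image-" + counter + "." + extension;
            fs.writeFileSync(path.join(outputDir, fileName), buffer);
            // The returned object supplies attributes for the <img> element.
            return {src: fileName};
        });
    };
}

// Assumed usage, not executed here because it needs the mammoth package:
// const options = {
//     convertImage: mammoth.images.imgElement(makeFileImageHandler("output"))
// };
```

The output directory must already exist, and the returned src is relative to wherever the generated HTML is served from.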

convertToMarkdown

Converts the source document to Markdown. Note: Markdown support is deprecated; generating HTML and converting it to Markdown with a separate library is recommended.

function convertToMarkdown(input: Input, options?: Options): Promise<Result>;

Parameters

input and options: same as for convertToHtml.

Returns

Promise resolving to a Result object:

value: The generated Markdown string
messages: Array of warnings/errors during conversion

Usage Example

mammoth.convertToMarkdown({path: "document.docx"})
    .then(function(result){
        const markdown = result.value;
        console.log(markdown);
    })
    .catch(function(error) {
        console.error(error);
    });

extractRawText

Extracts the raw text of the document, ignoring all formatting. Each paragraph is followed by two newlines.

function extractRawText(input: Input): Promise<Result>;

Parameters

input: Document input (same format as convertToHtml)

Returns

Promise resolving to a Result object:

value: The raw text string
messages: Array of warnings/errors during extraction

Usage Example

mammoth.extractRawText({path: "document.docx"})
    .then(function(result){
        const text = result.value;
        console.log(text);
    })
    .catch(function(error) {
        console.error(error);
    });
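
Because each paragraph is followed by two newlines, the raw text can be split back into paragraphs with plain string operations. The helper name splitParagraphs and the sample string below are hypothetical; the sketch only assumes the separator described above.

```javascript
// Split text shaped like extractRawText output into an array of paragraphs.
function splitParagraphs(rawText) {
    return rawText
        .split("\n\n")
        // Drop the empty entry left by the trailing separator (and any
        // empty paragraphs in the source document).
        .filter(function(paragraph) {
            return paragraph.length > 0;
        });
}

// Stand-in for result.value from extractRawText.
const sampleText = "First paragraph\n\nSecond paragraph\n\n";
const paragraphs = splitParagraphs(sampleText);
console.log(paragraphs.length); // prints 2
```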

Style Mapping Syntax

Style mappings control how Word styles are converted to HTML elements:

// Basic style mapping
"p[style-name='Heading 1'] => h1"

// With CSS classes
"p[style-name='Warning'] => p.warning"

// Fresh elements (always start a new element instead of reusing an open one)
"p[style-name='Title'] => h1:fresh"

// Character styles
"r[style-name='Code'] => code"

// Bold/italic/underline
"b => strong"
"i => em"
"u => span.underline"

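When many paragraph styles need mapping, the style map strings can be generated rather than written by hand. The sketch below builds entries in the p[style-name='...'] form shown above from a plain object. The helper name buildParagraphStyleMap is hypothetical, and style names containing a single quote are rejected because this sketch does not attempt any escaping.

```javascript
// Hypothetical helper: turn {styleName: htmlTarget} pairs into
// paragraph style mappings suitable for the styleMap option.
function buildParagraphStyleMap(mappings) {
    return Object.keys(mappings).map(function(styleName) {
        if (styleName.indexOf("'") !== -1) {
            throw new Error("Style names containing quotes are not handled by this sketch");
        }
        return "p[style-name='" + styleName + "'] => " + mappings[styleName];
    });
}

const generatedStyleMap = buildParagraphStyleMap({
    "Section Title": "h1:fresh",
    "Warning": "p.warning"
});
console.log(generatedStyleMap.join("\n"));
```

The resulting array can be passed as the styleMap option, for example {styleMap: generatedStyleMap}.
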
Supported Features

Headings (h1-h6)
Lists (ordered and unordered)
Tables (structure preserved, styling ignored)
Footnotes and endnotes
Images (with customizable handling)
Bold, italic, underline, strikethrough
Superscript and subscript
Links
Line breaks
Text boxes
Comments (when enabled via style mapping)

Security Considerations

Mammoth performs no sanitization of the source document and should be used extremely carefully with untrusted user input. Source documents can contain: