tessl/npm-mammoth

Convert Word documents from docx to simple HTML and Markdown

Document Conversion

Core functions for converting .docx documents to HTML, Markdown, or raw text, with support for custom style mappings and conversion options.

convertToHtml

Converts the source document to HTML.

function convertToHtml(input: Input, options?: Options): Promise<Result>;

Parameters

  • input: An object describing the source document, in one of the following forms:

    • {path: string} - Path to the .docx file (Node.js)
    • {buffer: Buffer} - Buffer containing .docx file (Node.js)
    • {arrayBuffer: ArrayBuffer} - ArrayBuffer containing .docx file (Browser)
  • options (optional): Conversion options

    • styleMap: Custom style mappings (string or string array)
    • includeEmbeddedStyleMap: Include embedded style maps (default: true)
    • includeDefaultStyleMap: Include default style mappings (default: true)
    • convertImage: Custom image converter function
    • ignoreEmptyParagraphs: Ignore empty paragraphs (default: true)
    • idPrefix: Prefix for generated IDs (default: "")
    • transformDocument: Document transformation function
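
The transformDocument option has no example in this section, so here is a minimal sketch of one. It assumes document elements are plain objects with type, children, alignment and styleId properties, and that the transform API may change between releases. The helper names transformParagraph and transformElement and the "Heading2" style ID are illustrative, not part of mammoth.

```javascript
// Hypothetical rule: treat centre-aligned paragraphs that have no style as "Heading2".
function transformParagraph(paragraph) {
    if (paragraph.alignment === "center" && !paragraph.styleId) {
        return { ...paragraph, styleId: "Heading2" };
    }
    return paragraph;
}

// Walk the document tree, rebuilding children first and then
// applying transformParagraph to every paragraph element.
function transformElement(element) {
    let transformed = element;
    if (element.children) {
        transformed = {
            ...element,
            children: element.children.map((child) => transformElement(child))
        };
    }
    return transformed.type === "paragraph"
        ? transformParagraph(transformed)
        : transformed;
}

// Pass the transform through the options object, e.g.
// mammoth.convertToHtml({path: "document.docx"}, options)
const options = { transformDocument: transformElement };
```

The transform returns new objects rather than mutating its input, which keeps the original document tree untouched.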

Returns

Promise resolving to a Result object:

  • value: The generated HTML string
  • messages: Array of warnings/errors during conversion
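
A conversion can succeed and still report problems in messages, so it is often worth inspecting them. The sketch below assumes each message is an object with a type ("warning" or "error") and a message string. groupMessages is a hypothetical helper, not a mammoth function.

```javascript
// Split a Result's messages into errors and warnings.
// Any message whose type is not "error" is counted as a warning here.
function groupMessages(result) {
    const grouped = { errors: [], warnings: [] };
    for (const entry of result.messages) {
        if (entry.type === "error") {
            grouped.errors.push(entry.message);
        } else {
            grouped.warnings.push(entry.message);
        }
    }
    return grouped;
}

// Typical use:
// mammoth.convertToHtml({path: "document.docx"}).then(function(result) {
//     const grouped = groupMessages(result);
//     grouped.warnings.forEach(function(text) { console.warn(text); });
// });
```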

Usage Examples

Basic HTML Conversion

const mammoth = require("mammoth");

mammoth.convertToHtml({path: "document.docx"})
    .then(function(result){
        const html = result.value;
        const messages = result.messages;
        console.log(html);
    })
    .catch(function(error) {
        console.error(error);
    });

With Custom Style Mapping

const options = {
    styleMap: [
        "p[style-name='Section Title'] => h1:fresh",
        "p[style-name='Subsection Title'] => h2:fresh"
    ]
};

mammoth.convertToHtml({path: "document.docx"}, options);

With Custom Image Handler

// Embed each image in the output as a base64 data URI.
const options = {
    convertImage: mammoth.images.imgElement(function(image) {
        return image.readAsBase64String().then(function(base64Data) {
            return {
                src: "data:" + image.contentType + ";base64," + base64Data
            };
        });
    })
};

// docxBuffer is a Buffer holding the contents of the .docx file.
mammoth.convertToHtml({buffer: docxBuffer}, options);

convertToMarkdown

Converts the source document to Markdown. Note: Markdown support is deprecated. Generating HTML and converting it to Markdown with a separate library is recommended instead.

function convertToMarkdown(input: Input, options?: Options): Promise<Result>;

Parameters

Same input and options as convertToHtml.

Returns

Promise resolving to a Result object:

  • value: The generated Markdown string
  • messages: Array of warnings/errors during conversion

Usage Example

mammoth.convertToMarkdown({path: "document.docx"})
    .then(function(result){
        const markdown = result.value;
        console.log(markdown);
    });

extractRawText

Extracts the raw text of the document, ignoring all formatting. Each paragraph is followed by two newlines.

function extractRawText(input: Input): Promise<Result>;

Parameters

  • input: Document input (same format as convertToHtml)
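
Since all three functions accept the same input forms, one small helper can pick the right form from whatever the caller has. This is a sketch only: toMammothInput is a hypothetical name, and it assumes a string always means a file path.

```javascript
// Wrap a file path, Node.js Buffer, or ArrayBuffer in the matching input object.
function toMammothInput(source) {
    if (typeof source === "string") {
        return { path: source };            // Node.js: path to the .docx file
    }
    if (typeof Buffer !== "undefined" && Buffer.isBuffer(source)) {
        return { buffer: source };          // Node.js: file contents in memory
    }
    if (source instanceof ArrayBuffer) {
        return { arrayBuffer: source };     // Browser: e.g. from FileReader or fetch
    }
    throw new TypeError("Expected a file path, Buffer, or ArrayBuffer");
}

// Typical use:
// mammoth.extractRawText(toMammothInput("document.docx"))
```

The Buffer check comes before the ArrayBuffer check only for readability. A Node.js Buffer is a Uint8Array, not an ArrayBuffer, so the two branches never overlap.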

Returns

Promise resolving to a Result object:

  • value: The raw text string
  • messages: Array of warnings/errors during extraction

Usage Example

mammoth.extractRawText({path: "document.docx"})
    .then(function(result){
        const text = result.value;
        console.log(text);
    });

Style Mapping Syntax

Style mappings control how Word styles are converted to HTML elements:

// Basic style mapping
"p[style-name='Heading 1'] => h1"

// With CSS classes
"p[style-name='Warning'] => p.warning"

// :fresh forces a new element to be created instead of reusing an open one
"p[style-name='Title'] => h1:fresh"

// Character styles
"r[style-name='Code'] => code"

// Bold/italic/underline
"b => strong"
"i => em"
"u => span.underline"

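Because styleMap accepts a single string as well as an array, the mappings above can also be written as one multi-line style map. The sketch below assumes that, in the string form, each line is one mapping and that blank lines and lines starting with # are ignored.

```javascript
// Build a multi-line style map string; "#" lines act as comments.
const styleMap = [
    "# Block-level styles",
    "p[style-name='Heading 1'] => h1:fresh",
    "p[style-name='Warning'] => p.warning",
    "",
    "# Inline styles",
    "r[style-name='Code'] => code",
    "u => span.underline"
].join("\n");

// Pass the string through the options object, e.g.
// mammoth.convertToHtml({path: "document.docx"}, options)
const options = { styleMap: styleMap };
```

The string form is convenient when the style map is kept in a separate text file. The array form avoids any line-splitting concerns when mappings are built in code.
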
Supported Features

  • Headings (h1-h6)
  • Lists (ordered and unordered)
  • Tables (structure preserved, styling ignored)
  • Footnotes and endnotes
  • Images (with customizable handling)
  • Bold, italic, underline, strikethrough
  • Superscript and subscript
  • Links
  • Line breaks
  • Text boxes
  • Comments (when enabled via style mapping)
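
Comments are the one feature in this list that needs explicit opt-in. A minimal sketch follows, assuming a style mapping for comment-reference is what enables them and that any existing styleMap is given as an array. withComments is a hypothetical helper and sup is just one possible target element.

```javascript
// Return a copy of the options with a comment-reference mapping appended.
// Assumes options.styleMap, if present, is an array of mapping strings.
function withComments(options = {}) {
    const existing = options.styleMap || [];
    return {
        ...options,
        styleMap: [...existing, "comment-reference => sup"]
    };
}

// Typical use:
// mammoth.convertToHtml({path: "document.docx"}, withComments({idPrefix: "doc-"}))
```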

Security Considerations

Mammoth performs no sanitization of the source document and should be used extremely carefully with untrusted user input. Source documents can contain:

  • Links with javascript: targets
  • References to external files
  • Malicious content that could lead to XSS or file access vulnerabilities

Always sanitize the output HTML when embedding in web pages.
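
One way to make sanitization hard to forget is to wrap conversion and sanitization in a single function, as sketched below. Both parameters are assumptions: mammoth is the required module, and sanitizeHtml stands in for any (html) => string function from a sanitizer library of your choice. Mammoth itself does not provide one.

```javascript
// Convert an untrusted .docx and sanitize the HTML before handing it back.
// `sanitizeHtml` is a hypothetical placeholder for a real HTML sanitizer.
async function convertUntrustedDocx(mammoth, sanitizeHtml, input) {
    const result = await mammoth.convertToHtml(input);
    return {
        html: sanitizeHtml(result.value),  // sanitized output only
        messages: result.messages
    };
}

// Typical use:
// const mammoth = require("mammoth");
// convertUntrustedDocx(mammoth, mySanitizer, {buffer: docxBuffer})
//     .then(function(out) { /* embed out.html */ });
```

Returning only the sanitized string means callers never see the raw result.value.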
