tessl/npm-mammoth

Convert Word documents from docx to simple HTML and Markdown

Document Transforms

Utilities for modifying document elements after the .docx file has been read and before it is converted to HTML or Markdown, so that document structure can be preprocessed.

Note: The API for document transforms should be considered unstable and may change between versions. Pin to a specific version if you rely on this behavior.

transforms.paragraph

Apply a transformation to paragraph elements in the document.

function paragraph(transform: (element: any) => any): (element: any) => any;

Parameters

  • transform: Function that takes a paragraph element and returns the modified element

Returns

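A function that takes an element (typically the whole document), applies transform to every paragraph beneath it, and returns the result. Pass it as the transformDocument option.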

Usage Example

const mammoth = require("mammoth");

function transformParagraph(element) {
    // Convert center-aligned paragraphs to headings
    if (element.alignment === "center" && !element.styleId) {
        return {...element, styleId: "Heading2"};
    }
    return element;
}

const options = {
    transformDocument: mammoth.transforms.paragraph(transformParagraph)
};

mammoth.convertToHtml({path: "document.docx"}, options);
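
To make the behavior of the returned function more concrete, the following is a minimal sketch of a type-targeted transform, written against plain objects so that it runs without mammoth. The helper name `transformElementsOfType` and the sample tree are hypothetical, and mammoth's actual implementation may differ.

```javascript
// Hypothetical helper: builds a function that walks an element tree and
// applies `transform` only to elements whose type matches `elementType`.
function transformElementsOfType(elementType, transform) {
    return function transformElement(element) {
        // Rebuild children first so the transform sees already-processed children
        if (element.children) {
            const children = element.children.map(transformElement);
            element = {...element, children: children};
        }
        return element.type === elementType ? transform(element) : element;
    };
}

// Same rule as the usage example above: centered, unstyled paragraphs become headings
function transformParagraph(element) {
    if (element.alignment === "center" && !element.styleId) {
        return {...element, styleId: "Heading2"};
    }
    return element;
}

// Hand-built sample tree (an assumption, not real mammoth output)
const sampleDocument = {
    type: "document",
    children: [
        {type: "paragraph", alignment: "center", styleId: null, children: []},
        {type: "paragraph", alignment: "left", styleId: null, children: []}
    ]
};

const transformed = transformElementsOfType("paragraph", transformParagraph)(sampleDocument);
console.log(transformed.children.map(p => p.styleId)); // [ 'Heading2', null ]
```

Because each step returns a new object instead of mutating its input, the original tree is left unchanged, which makes a transform easier to test in isolation.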

transforms.run

Apply a transformation to run elements (text runs) in the document.

function run(transform: (element: any) => any): (element: any) => any;

Parameters

  • transform: Function that takes a run element and returns the modified element

Returns

A function that takes an element (typically the whole document), applies transform to every run beneath it, and returns the result. Pass it as the transformDocument option.

Usage Example

function transformRun(element) {
    // Convert runs with a monospace font to code.
    // The font property holds the font name as a string.
    if (element.font === "Courier New") {
        return {...element, styleId: "Code"};
    }
    return element;
}

const options = {
    transformDocument: mammoth.transforms.run(transformRun)
};
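
As a further illustration, a run transform can also rewrite the boolean formatting flags. The sketch below treats underlined runs as italic, which can suit documents that use underline for emphasis. The function name `underlineToItalic` and the sample runs are hypothetical, and the objects are hand-built rather than produced by mammoth.

```javascript
// Treat underlined runs as italic instead.
// Returns a new object when the rule applies, and the input itself otherwise.
function underlineToItalic(run) {
    if (run.isUnderline) {
        return {...run, isUnderline: false, isItalic: true};
    }
    return run;
}

// Hand-built run elements (an assumption, not real mammoth output)
const underlined = {type: "run", isUnderline: true, isItalic: false, children: []};
const plain = {type: "run", isUnderline: false, isItalic: false, children: []};

const converted = underlineToItalic(underlined);
console.log(converted.isItalic, converted.isUnderline); // true false
console.log(underlineToItalic(plain) === plain);        // true
```

With mammoth installed, a function like this would be passed as `mammoth.transforms.run(underlineToItalic)` in the transformDocument option.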

transforms.getDescendants

Get all descendant elements from a document element.

function getDescendants(element: any): any[];

Parameters

  • element: The document element to traverse

Returns

Array of all descendant elements found in the element tree.

Usage Example

function analyzeDocument(documentElement) {
    const allDescendants = mammoth.transforms.getDescendants(documentElement);
    console.log(`Document contains ${allDescendants.length} elements`);
    
    allDescendants.forEach(function(descendant) {
        console.log(`Element type: ${descendant.type}`);
    });
}
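
The traversal can be pictured with a small stand-in that works on plain objects. This is only a sketch: `collectDescendants` is a hypothetical name, the element itself is not included in the result, and the order in which mammoth returns descendants may differ from the order shown here.

```javascript
// Hypothetical stand-in for getDescendants: collects every element below
// `element`, listing each child before that child's own descendants.
function collectDescendants(element) {
    const descendants = [];
    (element.children || []).forEach(function(child) {
        descendants.push(child);
        descendants.push(...collectDescendants(child));
    });
    return descendants;
}

// Hand-built sample paragraph (an assumption, not real mammoth output)
const sampleParagraph = {
    type: "paragraph",
    children: [
        {type: "run", children: [{type: "text", value: "Hello"}]},
        {type: "run", children: [{type: "text", value: " world"}]}
    ]
};

console.log(collectDescendants(sampleParagraph).map(e => e.type));
// [ 'run', 'text', 'run', 'text' ]
```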

transforms.getDescendantsOfType

Get all descendant elements of a specific type from a document element.

function getDescendantsOfType(element: any, type: string): any[];

Parameters

  • element: The document element to traverse
  • type: The element type to filter for (e.g., "paragraph", "run", "table")

Returns

Array of descendant elements matching the specified type.

Usage Example

function countParagraphs(documentElement) {
    const paragraphs = mammoth.transforms.getDescendantsOfType(documentElement, "paragraph");
    console.log(`Document contains ${paragraphs.length} paragraphs`);
    return paragraphs;
}

function findTables(documentElement) {
    const tables = mammoth.transforms.getDescendantsOfType(documentElement, "table");
    return tables;
}
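
A common use of type filtering is to recover the plain text of a paragraph, since text elements sit inside runs rather than directly under the paragraph. The sketch below uses a hypothetical stand-in, `descendantsOfType`, so that it runs without mammoth; `plainText` and the sample paragraph are likewise illustrative.

```javascript
// Hypothetical stand-in for getDescendantsOfType, so the sketch runs without mammoth
function descendantsOfType(element, type) {
    const matches = [];
    (element.children || []).forEach(function(child) {
        if (child.type === type) {
            matches.push(child);
        }
        matches.push(...descendantsOfType(child, type));
    });
    return matches;
}

// Concatenate the values of all text elements beneath an element
function plainText(element) {
    return descendantsOfType(element, "text").map(t => t.value).join("");
}

// Hand-built sample paragraph (an assumption, not real mammoth output)
const todoParagraph = {
    type: "paragraph",
    children: [
        {type: "run", isBold: true, children: [{type: "text", value: "TODO:"}]},
        {type: "run", children: [{type: "text", value: " update figures"}]}
    ]
};

console.log(plainText(todoParagraph)); // TODO: update figures
```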

Manual Element Transformation

For more complex transformations, you can write your own recursive transformation function:

function transformElement(element: any): any {
    if (element.children) {
        const children = element.children.map(transformElement);
        element = {...element, children: children};
    }
    
    // Apply specific transformations based on element type
    if (element.type === "paragraph") {
        return transformParagraph(element);
    } else if (element.type === "run") {
        return transformRun(element);
    }
    
    return element;
}

Usage Example

function transformElement(element) {
    // Recursively transform children first
    if (element.children) {
        const children = element.children.map(transformElement);
        element = {...element, children: children};
    }

    // Transform paragraphs
    if (element.type === "paragraph") {
        // Convert center-aligned paragraphs to headings
        if (element.alignment === "center" && !element.styleId) {
            return {...element, styleId: "Heading2"};
        }
        
        // Convert paragraphs whose text starts with a specific pattern.
        // Text elements sit inside runs, so collect them from all descendants
        // rather than from the paragraph's direct children.
        const text = mammoth.transforms.getDescendantsOfType(element, "text")
            .map(child => child.value)
            .join("");

        if (text.startsWith("TODO:")) {
            return {...element, styleId: "TodoItem"};
        }
    }

    // Transform runs
    if (element.type === "run") {
        // Convert monospace font runs to code.
        // The font property holds the font name as a string.
        if (element.font === "Courier New") {
            return {...element, styleId: "Code"};
        }
    }

    return element;
}

const options = {
    transformDocument: transformElement
};

mammoth.convertToHtml({path: "document.docx"}, options);
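
When the number of rules grows, one option is to keep each rule as a small function and chain them inside the recursive walker. The sketch below shows this on plain objects; `composeRules`, the two rules, and the sample tree are all hypothetical and only illustrate the pattern.

```javascript
// Hypothetical helper: chains element-level rules left to right,
// feeding each rule's result into the next.
function composeRules(...rules) {
    return element => rules.reduce((current, rule) => rule(current), element);
}

// Two small rules, each returning a new object when it applies
const centerToHeading = el =>
    el.type === "paragraph" && el.alignment === "center" && !el.styleId
        ? {...el, styleId: "Heading2"}
        : el;
const strikethroughToPlain = el =>
    el.type === "run" && el.isStrikethrough
        ? {...el, isStrikethrough: false}
        : el;

const applyRules = composeRules(centerToHeading, strikethroughToPlain);

// Recursive walker in the same style as transformElement above
function transformTree(element) {
    if (element.children) {
        element = {...element, children: element.children.map(transformTree)};
    }
    return applyRules(element);
}

// Hand-built sample tree (an assumption, not real mammoth output)
const tree = {
    type: "document",
    children: [{
        type: "paragraph",
        alignment: "center",
        styleId: null,
        children: [{type: "run", isStrikethrough: true, children: []}]
    }]
};

const result = transformTree(tree);
console.log(result.children[0].styleId);                     // Heading2
console.log(result.children[0].children[0].isStrikethrough); // false
```

Keeping rules separate means each one can be tested on a single hand-built element, and the walker stays unchanged as rules are added or removed.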

Common Element Types

Document elements you might encounter during transformation:

  • "paragraph": Paragraph elements
  • "run": Text runs within paragraphs
  • "text": Text content
  • "table": Table elements
  • "tableRow": Table row elements
  • "tableCell": Table cell elements
  • "hyperlink": Link elements
  • "image": Image elements
  • "break": Break elements, such as line breaks
  • "noteReference": Footnote and endnote references
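
To find out which element types actually occur in a given document, a transform can tally them while passing the document through unchanged. The following is a small sketch of the tallying step on a hand-built tree; `countTypes` and the sample data are hypothetical.

```javascript
// Tally how many elements of each type appear in the tree rooted at `element`
// (the root itself is counted too).
function countTypes(element, counts = {}) {
    counts[element.type] = (counts[element.type] || 0) + 1;
    (element.children || []).forEach(child => countTypes(child, counts));
    return counts;
}

// Hand-built sample tree (an assumption, not real mammoth output)
const sample = {
    type: "document",
    children: [
        {type: "paragraph", children: [
            {type: "run", children: [{type: "text", value: "Intro"}]}
        ]},
        {type: "table", children: []}
    ]
};

console.log(countTypes(sample));
// { document: 1, paragraph: 1, run: 1, text: 1, table: 1 }
```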

Element Properties

Common properties found on document elements:

Paragraph Elements

  • type: "paragraph"
  • styleId: Style identifier from the document
  • styleName: Human-readable style name
  • alignment: Text alignment value as stored in the document (for example "left", "center", "right"; justified text typically appears as "both")
  • children: Array of child elements

Run Elements

  • type: "run"
  • font: Font name as a string (for example "Courier New"), if one is set
  • isBold: Boolean indicating bold formatting
  • isItalic: Boolean indicating italic formatting
  • isUnderline: Boolean indicating underline formatting
  • isStrikethrough: Boolean indicating strikethrough formatting
  • verticalAlignment: "baseline", "superscript" or "subscript"
  • children: Array of child elements (usually text)

Text Elements

  • type: "text"
  • value: The actual text content
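
Putting these properties together, a paragraph with one bold run might look roughly like the object below. This is a hand-written illustration of the shape described above, not output captured from mammoth; real elements may carry additional properties, and the values are made up.

```javascript
// Hand-written example of the element shape (an assumption, not real mammoth output)
const paragraph = {
    type: "paragraph",
    styleId: "Heading1",
    styleName: "Heading 1",
    alignment: "center",
    children: [
        {
            type: "run",
            isBold: true,
            isItalic: false,
            isUnderline: false,
            isStrikethrough: false,
            // font and verticalAlignment omitted for brevity
            children: [{type: "text", value: "Quarterly report"}]
        }
    ]
};

// Reading properties: walk paragraph -> run -> text
const firstRun = paragraph.children[0];
const firstText = firstRun.children[0];
console.log(paragraph.styleName, firstRun.isBold, firstText.value);
// Heading 1 true Quarterly report
```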
