Convert Word documents from docx to simple HTML and Markdown
Functions for embedding and reading custom style maps in DOCX documents.
Embed a style map into a DOCX file, creating a new DOCX file with the embedded style map that will be automatically used when the document is processed by Mammoth.
function embedStyleMap(input: Input, styleMap: string): Promise<{
  toArrayBuffer: () => ArrayBuffer;
  toBuffer: () => Buffer;
}>;

- input: Document input
  - {path: string} - Path to the .docx file (Node.js)
  - {buffer: Buffer} - Buffer containing .docx file (Node.js)
  - {arrayBuffer: ArrayBuffer} - ArrayBuffer containing .docx file (Browser)
- styleMap: Style map string containing the mapping rules

Promise resolving to an object with methods to access the new document:

- toArrayBuffer(): Get the new document as an ArrayBuffer
- toBuffer(): Get the new document as a Buffer (Node.js only)

const mammoth = require("mammoth");
const fs = require("fs");

const styleMap = `
p[style-name='Section Title'] => h1:fresh
p[style-name='Subsection Title'] => h2:fresh
p[style-name='Code Block'] => pre
r[style-name='Code'] => code
`;

mammoth.embedStyleMap({path: "source.docx"}, styleMap)
  .then(function(docx) {
    return new Promise(function(resolve, reject) {
      fs.writeFile("output-with-styles.docx", docx.toBuffer(), function(err) {
        if (err) reject(err);
        else resolve();
      });
    });
  })
  .then(function() {
    console.log("Style map embedded successfully");
  });

// In browser environment
function embedStylesInBrowser(fileArrayBuffer) {
  const styleMap = "p[style-name='Title'] => h1:fresh";
  return mammoth.embedStyleMap({arrayBuffer: fileArrayBuffer}, styleMap)
    .then(function(docx) {
      const newArrayBuffer = docx.toArrayBuffer();
      // Create download link
      const blob = new Blob([newArrayBuffer], {
        type: 'application/vnd.openxmlformats-officedocument.wordprocessingml.document'
      });
      const url = URL.createObjectURL(blob);
      const a = document.createElement('a');
      a.href = url;
      a.download = 'document-with-styles.docx';
      a.click();
    });
}

Read the embedded style map from a DOCX document that was previously created with embedStyleMap.
function readEmbeddedStyleMap(input: Input): Promise<string>;

- input: Document input (same format as other mammoth functions)

Promise resolving to a string containing the embedded style map, or an empty string if no embedded style map is found.
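Because the promise resolves to a plain string, the result can be inspected with ordinary string handling. The following is a minimal sketch, assuming one rule per line and `#` for comment lines, as in the style map examples in this document. `listStyleMapRules` is a hypothetical helper for illustration, not part of Mammoth, and it does not validate selectors the way Mammoth's own parser would.

```javascript
// Hypothetical helper (not a Mammoth API): split a style map string into
// {selector, target} pairs. This is a simplified line splitter, assuming
// one rule per line and "#" as the comment marker.
function listStyleMapRules(styleMap) {
  // Treat a missing or empty style map as "no rules".
  if (!styleMap) {
    return [];
  }
  return styleMap
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line !== "" && !line.startsWith("#"))
    .map((line) => {
      const arrow = line.indexOf("=>");
      if (arrow === -1) {
        // No arrow on the line: keep the text but record no target.
        return { selector: line, target: null };
      }
      return {
        selector: line.slice(0, arrow).trim(),
        target: line.slice(arrow + 2).trim(),
      };
    });
}

// Sample input standing in for a string returned by readEmbeddedStyleMap.
const sampleStyleMap = `
# Headings
p[style-name='Title'] => h1:fresh
r[style-name='Code'] => code
`;

const parsedRules = listStyleMapRules(sampleStyleMap);
console.log(parsedRules);
```

With Mammoth installed, the same helper could be chained onto the resolved value, for example `mammoth.readEmbeddedStyleMap(input).then(listStyleMapRules)`.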
const mammoth = require("mammoth");

mammoth.readEmbeddedStyleMap({path: "document-with-styles.docx"})
  .then(function(styleMap) {
    if (styleMap) {
      console.log("Found embedded style map:");
      console.log(styleMap);
    } else {
      console.log("No embedded style map found");
    }
  });

Complete workflow showing how to create a document with embedded styles and then use it:
const mammoth = require("mammoth");
const fs = require("fs");

// Step 1: Define custom style map
const customStyleMap = `
# Custom styles for corporate documents
p[style-name='Corporate Header'] => h1.corporate-header
p[style-name='Section Header'] => h2.section-header
p[style-name='Highlight Box'] => div.highlight-box > p:fresh
r[style-name='Brand Name'] => span.brand-name
r[style-name='Important'] => strong.important
`;

// Step 2: Embed style map into document
function createStyledDocument() {
  return mammoth.embedStyleMap({path: "template.docx"}, customStyleMap)
    .then(function(docx) {
      return new Promise(function(resolve, reject) {
        fs.writeFile("styled-template.docx", docx.toBuffer(), function(err) {
          if (err) reject(err);
          else resolve("styled-template.docx");
        });
      });
    });
}

// Step 3: Use document with embedded styles
function convertStyledDocument(filePath) {
  // The embedded style map will be automatically used
  return mammoth.convertToHtml({path: filePath})
    .then(function(result) {
      console.log("Converted with embedded styles:");
      console.log(result.value);
      return result;
    });
}

// Step 4: Verify embedded styles
function verifyEmbeddedStyles(filePath) {
  return mammoth.readEmbeddedStyleMap({path: filePath})
    .then(function(styleMap) {
      console.log("Embedded style map:");
      console.log(styleMap);
      return styleMap;
    });
}

// Complete workflow
createStyledDocument()
  .then(convertStyledDocument)
  .then(() => verifyEmbeddedStyles("styled-template.docx"))
  .catch(console.error);

The style map string uses the same format as the styleMap option in conversion functions:
const styleMap = `
p[style-name='Heading 1'] => h1
p[style-name='Heading 2'] => h2
r[style-name='Code'] => code
b => strong
i => em
`;

const advancedStyleMap = `
# Comments start with #
# This is ignored
# Paragraph styles
p[style-name='Title'] => h1:fresh
p[style-name='Subtitle'] => h2.subtitle:fresh
# Character styles
r[style-name='Highlight'] => span.highlight
r[style-name='Code'] => code
# Text formatting overrides
b => strong
i => em
u => span.underline
strike => del
# Special selectors
p:unordered-list(1) => ul > li:fresh
p:ordered-list(1) => ol > li:fresh
`;

Embedded style maps work alongside conversion options:
// The embedded style map will be combined with these options
const options = {
  includeEmbeddedStyleMap: true, // Include embedded styles (default)
  includeDefaultStyleMap: true,  // Include default mappings (default)
  styleMap: [                    // Additional style mappings
    "p[style-name='Special'] => div.special"
  ]
};

mammoth.convertToHtml({path: "styled-document.docx"}, options);

Style maps are applied in this order (highest to lowest priority):
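1. options.styleMap - Style mappings passed to conversion functions
2. Embedded style map - Style mappings stored in the document (skipped when includeEmbeddedStyleMap is false)
3. Default style map - Mammoth's built-in mappings (skipped when includeDefaultStyleMap is false)

// Ignore embedded style map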
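const options = {
  includeEmbeddedStyleMap: false
};

mammoth.convertToHtml({path: "document.docx"}, options);

// Both functions return promises, so failures (such as a missing input file)
// can be handled with .catch
mammoth.embedStyleMap({path: "nonexistent.docx"}, "p => h1")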
  .catch(function(error) {
    console.error("Failed to embed style map:", error.message);
  });

mammoth.readEmbeddedStyleMap({path: "document.docx"})
  .then(function(styleMap) {
    if (!styleMap) {
      console.log("No embedded style map found");
    }
  })
  .catch(function(error) {
    console.error("Failed to read embedded style map:", error.message);
  });
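
To make the priority order between style map sources more concrete, here is a simplified model of how rules from several sources could be combined. It is a sketch under stated assumptions, not Mammoth's implementation: it assumes rules from options.styleMap are consulted first, then the embedded style map, then the defaults, and that the first matching rule wins. The `combineStyleMaps` and `resolveStyle` functions and the `{styleName, target}` rule shape are hypothetical and exist only for this illustration.

```javascript
// Simplified model of style map precedence. NOT Mammoth's implementation.
// Assumption: each rule is a hypothetical {styleName, target} pair and a
// paragraph is matched by its style name only.
function combineStyleMaps(sources, options = {}) {
  const {
    includeEmbeddedStyleMap = true,
    includeDefaultStyleMap = true,
  } = options;

  // Highest priority first: custom rules, then embedded, then defaults.
  let rules = [...(sources.custom || [])];
  if (includeEmbeddedStyleMap) {
    rules = rules.concat(sources.embedded || []);
  }
  if (includeDefaultStyleMap) {
    rules = rules.concat(sources.defaults || []);
  }
  return rules;
}

function resolveStyle(rules, styleName) {
  // The first matching rule wins, so earlier sources shadow later ones.
  const match = rules.find((rule) => rule.styleName === styleName);
  return match ? match.target : null;
}

// Made-up rules for illustration only.
const sources = {
  custom: [{ styleName: "Special", target: "div.special" }],
  embedded: [
    { styleName: "Special", target: "p.embedded-special" },
    { styleName: "Title", target: "h1" },
  ],
  defaults: [{ styleName: "Title", target: "h1.default" }],
};

const allRules = combineStyleMaps(sources);
console.log(resolveStyle(allRules, "Special")); // prints "div.special"
console.log(resolveStyle(allRules, "Title"));   // prints "h1"

// Dropping the embedded source lets the default rule apply instead.
const withoutEmbedded = combineStyleMaps(sources, {
  includeEmbeddedStyleMap: false,
});
console.log(resolveStyle(withoutEmbedded, "Title")); // prints "h1.default"
```

In this model, setting includeEmbeddedStyleMap to false removes the middle layer, which is why the default mapping for "Title" takes effect in the last call.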