Convert Word documents from docx to simple HTML and Markdown
Image conversion utilities for customizing how images in DOCX documents are processed and included in the output.
Creates an image converter that generates <img> elements for each image in the original DOCX.
function imgElement(func: (image: Image) => Promise<ImageAttributes>): ImageConverter;

Parameters:
func: Function that processes an image and returns attributes for the <img> element. It receives an Image object with image data and metadata, and the object it returns must include a src attribute.

Returns: an ImageConverter object that can be used with the convertImage option.
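Because the callback returns a promise, it can presumably also be written as an async function. The sketch below shows that shape under that assumption; toImgAttributes and fakeImage are hypothetical names, and fakeImage is only a stand-in for the Image object so the snippet runs without a DOCX file.

```javascript
// Sketch: an async callback that follows the contract described for func.
// It receives an image object and resolves to the attributes for <img>.
async function toImgAttributes(image) {
    const base64 = await image.readAsBase64String();
    return { src: `data:${image.contentType};base64,${base64}` };
}

// Hypothetical stand-in for the Image object that would be passed in.
const fakeImage = {
    contentType: "image/png",
    readAsBase64String: () => Promise.resolve("AAAA")
};

toImgAttributes(fakeImage).then((attributes) => {
    console.log(attributes.src);
});

// With mammoth installed, the callback would be wired in like this:
// const options = { convertImage: mammoth.images.imgElement(toImgAttributes) };
```

Keeping the callback as a standalone function makes it possible to exercise it with a stub object before running a real conversion.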
const mammoth = require("mammoth");

const options = {
    convertImage: mammoth.images.imgElement(function(image) {
        return image.readAsBase64String().then(function(imageBuffer) {
            return {
                src: "data:" + image.contentType + ";base64," + imageBuffer,
                alt: "Image from document"
            };
        });
    })
};

mammoth.convertToHtml({path: "document.docx"}, options);

// Save each image to ./images (the directory must already exist) and reference it by path
const fs = require("fs");
const path = require("path");

let imageIndex = 0;
const options = {
    convertImage: mammoth.images.imgElement(function(image) {
        imageIndex++;
        const extension = image.contentType.split("/")[1];
        const filename = `image-${imageIndex}.${extension}`;
        const imagePath = path.join("./images", filename);

        return image.readAsBuffer().then(function(imageBuffer) {
            return new Promise(function(resolve, reject) {
                fs.writeFile(imagePath, imageBuffer, function(err) {
                    if (err) reject(err);
                    else resolve({ src: `./images/${filename}` });
                });
            });
        });
    })
};

Default image converter that embeds images as data URIs in the HTML output.
const dataUri: ImageConverter;

This is equivalent to:

mammoth.images.imgElement(function(image) {
    return image.readAsBase64String().then(function(imageBuffer) {
        return {
            src: "data:" + image.contentType + ";base64," + imageBuffer
        };
    });
})

// This is the default behavior, so no explicit configuration needed
mammoth.convertToHtml({path: "document.docx"});

// Or explicitly specify:
const options = {
    convertImage: mammoth.images.dataUri
};
mammoth.convertToHtml({path: "document.docx"}, options);

The Image object passed to image converter functions provides access to image data and metadata.
interface Image {
    contentType: string;
    readAsArrayBuffer(): Promise<ArrayBuffer>;
    readAsBase64String(): Promise<string>;
    readAsBuffer(): Promise<Buffer>;
    read(): Promise<Buffer>;
    read(encoding: string): Promise<string>;
}

contentType: MIME type of the image (e.g., "image/png", "image/jpeg")
readAsArrayBuffer(): Read image as ArrayBuffer (browser-compatible)
readAsBase64String(): Read image as base64-encoded string
readAsBuffer(): Read image as Node.js Buffer (Node.js only)
read(): Read image as Buffer (deprecated, use readAsBuffer)
read(encoding): Read image as string with specified encoding (deprecated)

const imageConverter = mammoth.images.imgElement(function(image) {
    console.log("Image type:", image.contentType);

    // For data URIs (most common)
    return image.readAsBase64String().then(function(base64) {
        return { src: `data:${image.contentType};base64,${base64}` };
    });

    // For saving to files
    // return image.readAsBuffer().then(function(buffer) {
    //     // Save buffer to file...
    //     return { src: "/path/to/saved/image.png" };
    // });

    // For browser environments
    // return image.readAsArrayBuffer().then(function(arrayBuffer) {
    //     // Process arrayBuffer...
    //     return { src: "..." };
    // });
});

The object returned by image converter functions becomes attributes for the generated <img> element.
interface ImageAttributes {
    src: string;
    [key: string]: string;
}

const imageConverter = mammoth.images.imgElement(function(image) {
    return image.readAsBase64String().then(function(imageBuffer) {
        return {
            src: "data:" + image.contentType + ";base64," + imageBuffer,
            alt: "Document image",
            class: "document-image",
            width: "300",
            height: "200"
        };
    });
});

If alt text is found for an image in the original document, it will be automatically added to the element's attributes, even if not specified in the image converter function.
// Alt text from the document is automatically preserved
const imageConverter = mammoth.images.imgElement(function(image) {
    // Read the image data first so that imageBuffer is defined
    return image.readAsBase64String().then(function(imageBuffer) {
        return {
            src: "data:" + image.contentType + ";base64," + imageBuffer
            // Alt text will be added automatically if present in the document
        };
    });
});
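
The file-saving example earlier takes the file extension from image.contentType.split("/")[1], which gives awkward results for types such as "image/svg+xml". The following is a minimal sketch of one alternative, assuming an explicit lookup table; EXTENSIONS, extensionFor, and createFileImageHandler are hypothetical helper names, not part of mammoth, and the table is an illustrative subset rather than a complete list.

```javascript
const fs = require("fs");
const path = require("path");

// Assumed mapping from MIME type to file extension; extend as needed.
const EXTENSIONS = new Map([
    ["image/png", "png"],
    ["image/jpeg", "jpg"],
    ["image/gif", "gif"],
    ["image/svg+xml", "svg"],
    ["image/bmp", "bmp"],
    ["image/tiff", "tiff"]
]);

// Returns a file extension for a content type, falling back to "bin".
function extensionFor(contentType) {
    return EXTENSIONS.get(contentType) || "bin";
}

// Builds a callback intended for mammoth.images.imgElement. The callback
// writes each image into outputDir and points src at the written file.
function createFileImageHandler(outputDir) {
    let imageIndex = 0;
    return async function (image) {
        // The index and filename are fixed before any await, so each
        // call gets a distinct name even if calls overlap.
        imageIndex++;
        const filename = `image-${imageIndex}.${extensionFor(image.contentType)}`;
        await fs.promises.mkdir(outputDir, { recursive: true });
        const buffer = await image.readAsBuffer();
        await fs.promises.writeFile(path.join(outputDir, filename), buffer);
        return { src: `${outputDir}/${filename}` };
    };
}

// With mammoth installed, the handler would be wired in like this:
// const options = {
//     convertImage: mammoth.images.imgElement(createFileImageHandler("./images"))
// };
```

Creating the directory inside the handler avoids the failure that fs.writeFile reports when the target directory does not exist. Unknown content types fall back to a generic "bin" extension instead of producing an unexpected filename.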