
# Image Handling

Functions for converting images embedded in DOCX documents. Mammoth can embed images as base64 data URIs. You can also supply your own conversion function through the `convert_image` argument of `mammoth.convert_to_html`.

## Capabilities

### Image Element Decorator

Turns a function that returns a dictionary of attributes into an image conversion function that produces HTML `img` elements. The decorator adds the image's alt text when the document provides one.

```python { .api }
def img_element(func):
    """
    Decorator that converts image conversion functions to HTML img elements.

    Parameters:
    - func: function, takes an image object and returns attributes dict

    Returns:
    Image conversion function that returns list of HTML img elements
    """
```

Usage example:

```python
import mammoth
from uuid import uuid4

@mammoth.images.img_element
def custom_image_handler(image):
    # Image objects expose alt_text, content_type and open(); there is no
    # filename attribute, so generate a unique name here. This example only
    # sets attributes; writing the bytes to disk is shown in
    # "Save Images to Files" below.
    return {
        "src": f"/images/{uuid4()}",
        "class": "document-image",
    }

# Use with conversion
with open("document.docx", "rb") as docx_file:
    result = mammoth.convert_to_html(
        docx_file,
        convert_image=custom_image_handler,
    )
```
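The following standalone sketch makes the decorator's contract concrete. It illustrates the same pattern under stated assumptions and is not Mammoth's source:

- `sketch_img_element` and `StubImage` are hypothetical names.
- The `img` element is modelled as a plain `(tag, attributes)` tuple rather than Mammoth's internal HTML node.
- Attributes returned by the wrapped function are assumed to take precedence over the document's alt text.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical stand-in for Mammoth's image object (only what this sketch needs).
@dataclass
class StubImage:
    alt_text: Optional[str] = None
    content_type: str = "image/png"

# An element is modelled as (tag name, attributes) for illustration only.
Element = Tuple[str, Dict[str, str]]

def sketch_img_element(
    func: Callable[[StubImage], Dict[str, str]]
) -> Callable[[StubImage], List[Element]]:
    """Wrap an attributes-returning function so it yields a list of img elements."""
    def convert_image(image: StubImage) -> List[Element]:
        attributes: Dict[str, str] = {}
        # Copy the document's alt text first, if there is any...
        if image.alt_text:
            attributes["alt"] = image.alt_text
        # ...then merge in the handler's attributes (assumed to win on conflict).
        attributes.update(func(image))
        return [("img", attributes)]
    return convert_image

@sketch_img_element
def handler(image: StubImage) -> Dict[str, str]:
    return {"src": "/images/example.png"}

print(handler(StubImage(alt_text="A chart")))
# [('img', {'alt': 'A chart', 'src': '/images/example.png'})]
print(handler(StubImage()))
# [('img', {'src': '/images/example.png'})]
```

The key design point is that the handler only deals with attributes. Building the element and handling alt text are factored into the wrapper.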

### Data URI Conversion

Converts each image to a base64 data URI, embedding the image data directly in the HTML output. This is also Mammoth's behaviour when no `convert_image` function is given.

```python { .api }
def data_uri(image):
    """
    Convert images to base64 data URIs.

    Parameters:
    - image: Image object with .open() method and .content_type property

    Returns:
    List containing HTML img element with data URI src
    """
```

Usage example:

```python
import mammoth

# Use data URI conversion for embedded images
with open("document.docx", "rb") as docx_file:
    result = mammoth.convert_to_html(
        docx_file,
        convert_image=mammoth.images.data_uri,
    )

# Images will be embedded as data URIs in the HTML
```
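A data URI is simply the MIME type followed by the base64-encoded bytes. The sketch below builds one by hand, which can be useful when a custom handler needs to fall back to embedding. `to_data_uri` is a hypothetical helper and is not part of Mammoth.

```python
import base64

def to_data_uri(content_type: str, data: bytes) -> str:
    """Build a data URI of the form data:<mime>;base64,<payload>."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{content_type};base64,{encoded}"

print(to_data_uri("image/png", b"abc"))
# data:image/png;base64,YWJj
```

Inside a handler, you would call it as `with image.open() as f: src = to_data_uri(image.content_type, f.read())`. Base64 encodes every 3 bytes as 4 characters, so embedded images make the HTML roughly a third larger than the raw image data.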

### Inline Image Handler

Backwards-compatibility alias for `img_element`, retained for compatibility with version 0.3.x.

```python { .api }
inline = img_element  # Alias for backwards compatibility
```

## Image Object Properties

Image objects passed to custom handlers expose the following attributes and method:

```python { .api }
class Image:
    """Image object passed to conversion functions."""

    alt_text: Optional[str]  # Alternative text for the image, or None if absent
    content_type: str  # MIME type (e.g., "image/png", "image/jpeg")

    def open(self):
        """
        Open image data for reading.

        Returns:
        File-like object with image binary data (usable in a `with` statement)
        """
```
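A handler only uses `alt_text`, `content_type` and `open()`, so you can unit-test it without a DOCX file. This sketch assumes duck typing is enough, meaning the handler accesses nothing else on the image object. `FakeImage` and `extension_for` are hypothetical test helpers, not Mammoth classes.

```python
import io
from typing import Optional

class FakeImage:
    """Hypothetical test double exposing the same surface as Mammoth's image object."""

    def __init__(self, data: bytes, content_type: str, alt_text: Optional[str] = None):
        self._data = data
        self.content_type = content_type
        self.alt_text = alt_text

    def open(self) -> io.BytesIO:
        # Return a fresh stream per call, so a handler may open the image more than once.
        return io.BytesIO(self._data)

def extension_for(image) -> str:
    """Example handler logic under test: map the MIME type to a file extension."""
    return {"image/png": ".png", "image/jpeg": ".jpg", "image/gif": ".gif"}.get(
        image.content_type, ".bin"
    )

image = FakeImage(b"\x89PNG\r\n\x1a\n", "image/png", alt_text="Logo")
with image.open() as stream:
    print(len(stream.read()), extension_for(image))  # 8 .png
```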

## Custom Image Handling Examples

### Save Images to Files

```python
import mammoth
import os
from uuid import uuid4

@mammoth.images.img_element
def save_image_to_file(image):
    # Generate unique filename
    extension = {
        "image/png": ".png",
        "image/jpeg": ".jpg",
        "image/gif": ".gif"
    }.get(image.content_type, ".bin")

    filename = f"image_{uuid4()}{extension}"
    filepath = f"./images/{filename}"

    # Ensure directory exists
    os.makedirs("./images", exist_ok=True)

    # Save image data
    with image.open() as image_bytes:
        with open(filepath, "wb") as f:
            f.write(image_bytes.read())

    return {"src": filepath}

# Use the custom handler
with open("document.docx", "rb") as docx_file:
    result = mammoth.convert_to_html(
        docx_file,
        convert_image=save_image_to_file
    )
```
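With random names, a picture that appears twice in the document is written twice. One alternative is to name each file after a hash of its bytes, which assumes deduplication is what you want. `content_addressed_name` is a hypothetical helper.

```python
import hashlib

EXTENSIONS = {"image/png": ".png", "image/jpeg": ".jpg", "image/gif": ".gif"}

def content_addressed_name(data: bytes, content_type: str) -> str:
    """Derive a stable filename from the image bytes, so identical images share one file."""
    digest = hashlib.sha256(data).hexdigest()[:16]
    return f"image_{digest}{EXTENSIONS.get(content_type, '.bin')}"

a = content_addressed_name(b"same bytes", "image/png")
b = content_addressed_name(b"same bytes", "image/png")
c = content_addressed_name(b"other bytes", "image/png")
print(a == b, a == c)  # True False
```

To use it in `save_image_to_file`, read the bytes once and compute the name from them. You can then skip the write when `os.path.exists(filepath)` is already true.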

### Remote Image Upload

```python
import base64
import mammoth
import requests

@mammoth.images.img_element
def upload_to_server(image):
    # Pick a file extension for the upload from the MIME type
    extension = {
        "image/png": ".png",
        "image/jpeg": ".jpg",
        "image/gif": ".gif"
    }.get(image.content_type, ".bin")

    # Read the image once so the bytes can be reused for the fallback
    with image.open() as image_bytes:
        data = image_bytes.read()

    # Upload image to remote server
    files = {"image": (f"image{extension}", data, image.content_type)}
    response = requests.post("https://api.example.com/upload", files=files, timeout=30)

    if response.status_code == 200:
        image_url = response.json()["url"]
        return {"src": image_url}
    else:
        # Fallback to a data URI built from the bytes already read
        encoded = base64.b64encode(data).decode("ascii")
        return {"src": f"data:{image.content_type};base64,{encoded}"}

# Use the upload handler
with open("document.docx", "rb") as docx_file:
    result = mammoth.convert_to_html(
        docx_file,
        convert_image=upload_to_server
    )
```

### Image Processing

```python
import base64
import io

import mammoth
from PIL import Image as PILImage

@mammoth.images.img_element
def resize_and_convert(image):
    with image.open() as image_bytes:
        # Open with PIL and load the pixels while the stream is still open.
        # Formats Pillow cannot read (e.g. some EMF/WMF images) raise here.
        pil_image = PILImage.open(image_bytes)
        pil_image.load()

    # Resize if too large, keeping the aspect ratio
    max_width = 800
    if pil_image.width > max_width:
        ratio = max_width / pil_image.width
        new_height = int(pil_image.height * ratio)
        pil_image = pil_image.resize((max_width, new_height))

    # JPEG cannot store alpha or palette images, so convert those to RGB
    if pil_image.mode not in ("RGB", "L"):
        pil_image = pil_image.convert("RGB")

    # Convert to JPEG and encode as data URI
    output = io.BytesIO()
    pil_image.save(output, format="JPEG", quality=85)
    encoded = base64.b64encode(output.getvalue()).decode("ascii")

    return {"src": f"data:image/jpeg;base64,{encoded}"}

# Use the processing handler
with open("document.docx", "rb") as docx_file:
    result = mammoth.convert_to_html(
        docx_file,
        convert_image=resize_and_convert
    )
```
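The resize step preserves the aspect ratio by scaling the height by the same factor as the width. Here that arithmetic is pulled out as a pure function so it is easy to check. `scaled_size` is a hypothetical helper, not part of Mammoth or Pillow.

```python
from typing import Tuple

def scaled_size(width: int, height: int, max_width: int = 800) -> Tuple[int, int]:
    """Return (width, height) shrunk to fit max_width, preserving aspect ratio."""
    if width <= max_width:
        return width, height  # small enough: leave unchanged
    ratio = max_width / width
    return max_width, int(height * ratio)  # int() truncates toward zero

print(scaled_size(1600, 900))  # (800, 450)
print(scaled_size(640, 480))   # (640, 480)
```

Pillow's `Image.thumbnail((max_width, max_height))` performs a similar aspect-preserving shrink in place and never enlarges the image. It is a reasonable alternative when you also want to bound the height.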

## Image Alt Text

Mammoth preserves alt text from Word documents when it is available:

```python
import mammoth
from uuid import uuid4

@mammoth.images.img_element
def preserve_alt_text(image):
    attributes = {"src": f"/images/{uuid4()}.jpg"}

    # Alt text is automatically added by the img_element decorator
    # when image.alt_text is available
    return attributes
```

The `img_element` decorator adds the alt text to the generated `img` element whenever `image.alt_text` is available from the source document. Handlers therefore do not need to set `alt` themselves.
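Images without alt text in the Word document produce `img` elements with no `alt` attribute. The converted HTML can be scanned for these with the standard library's `html.parser`, as in the sketch below. `find_images_without_alt` is a hypothetical helper. It treats an empty `alt` as missing, which is an assumption, because an empty `alt` is legitimate for purely decorative images.

```python
from html.parser import HTMLParser
from typing import List

class _ImgAltAuditor(HTMLParser):
    """Collects the src of every img tag whose alt is absent or empty."""

    def __init__(self) -> None:
        super().__init__()
        self.missing: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        if not attributes.get("alt"):
            self.missing.append(attributes.get("src") or "")

def find_images_without_alt(html_text: str) -> List[str]:
    auditor = _ImgAltAuditor()
    auditor.feed(html_text)
    auditor.close()
    return auditor.missing

print(find_images_without_alt('<p><img src="a.png" alt="x" /><img src="b.png" /></p>'))
# ['b.png']
```

After a conversion, pass it `result.value` to list the images that still need alt text in the source document.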