# Image Handling

Functions for processing and converting images embedded in DOCX documents. Mammoth provides flexible image handling capabilities, including data URI conversion and support for custom image processing functions.

## Capabilities

### Image Element Decorator

Wraps a function that returns an attributes dict, producing an image conversion function that emits HTML `img` elements and carries over alt text from the source document.

```python { .api }
def img_element(func):
    """
    Decorator that converts image conversion functions to HTML img elements.

    Parameters:
    - func: function, takes an image object and returns attributes dict

    Returns:
    Image conversion function that returns list of HTML img elements
    """
```

Usage example:

```python
import mammoth
from uuid import uuid4

@mammoth.images.img_element
def custom_image_handler(image):
    # The image object exposes alt_text, content_type, and open(), but no
    # filename, so a name for the (illustrative) URL is generated here.
    return {
        "src": f"/images/{uuid4()}",
        "class": "document-image"
    }

# Use with conversion
with open("document.docx", "rb") as docx_file:
    result = mammoth.convert_to_html(
        docx_file,
        convert_image=custom_image_handler
    )
```

### Data URI Conversion

Converts images to base64 data URIs, embedding image data directly in the HTML output.

```python { .api }
def data_uri(image):
    """
    Convert images to base64 data URIs.

    Parameters:
    - image: Image object with .open() method and .content_type property

    Returns:
    List containing HTML img element with data URI src
    """
```

Usage example:

```python
import mammoth

# Use data URI conversion for embedded images
with open("document.docx", "rb") as docx_file:
    result = mammoth.convert_to_html(
        docx_file,
        convert_image=mammoth.images.data_uri
    )
    # Images will be embedded as data URIs in the HTML
```
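
To make the shape of the resulting `src` value concrete, here is a minimal standard-library sketch of building a base64 data URI from raw bytes and a MIME type. The helper name `to_data_uri` is hypothetical and not part of Mammoth; it only illustrates the `data:<type>;base64,<payload>` format.

```python
import base64


def to_data_uri(content_type, data):
    """Build a data URI from a MIME type and raw bytes (illustrative helper)."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{content_type};base64,{encoded}"


uri = to_data_uri("image/png", b"abc")
print(uri)
```

Data URIs keep the HTML self-contained, but base64 encoding inflates the image payload by roughly one third, so large images may be better served from files or a remote store.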

### Inline Image Handler

Alias for `img_element`, retained for backwards compatibility with version 0.3.x.

```python { .api }
inline = img_element  # Alias for backwards compatibility
```

## Image Object Properties

Image objects passed to custom image handlers expose the following attributes and method:

```python { .api }
class Image:
    """Image object passed to conversion functions."""
    alt_text: str      # Alternative text for the image (may be None)
    content_type: str  # MIME type (e.g., "image/png", "image/jpeg")

    def open(self):
        """
        Open image data for reading.

        Returns:
        File-like object with image binary data
        """
```
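
Because a handler relies only on the members listed above, its logic can be exercised without a DOCX file. The sketch below is an assumption for illustration: `StubImage` and `describe_image` are hypothetical names, not part of Mammoth, and the function is left undecorated so it can be called directly.

```python
import io


class StubImage:
    """Hypothetical stand-in that mimics the image object's interface."""

    def __init__(self, data, content_type, alt_text=None):
        self._data = data
        self.content_type = content_type
        self.alt_text = alt_text

    def open(self):
        # io.BytesIO is file-like and can be used as a context manager
        return io.BytesIO(self._data)


def describe_image(image):
    """Example handler body: read the bytes and report basic facts."""
    with image.open() as image_bytes:
        size = len(image_bytes.read())
    return {"content_type": image.content_type, "size": size, "alt": image.alt_text}


stub = StubImage(b"\x89PNG fake bytes", "image/png", alt_text="Logo")
print(describe_image(stub))
```

A stub like this can help check filename generation or upload logic in unit tests before wiring the handler into `convert_to_html`.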

## Custom Image Handling Examples

### Save Images to Files

```python
import mammoth
import os
from uuid import uuid4

@mammoth.images.img_element
def save_image_to_file(image):
    # Generate unique filename
    extension = {
        "image/png": ".png",
        "image/jpeg": ".jpg",
        "image/gif": ".gif"
    }.get(image.content_type, ".bin")

    filename = f"image_{uuid4()}{extension}"
    filepath = f"./images/{filename}"

    # Ensure directory exists
    os.makedirs("./images", exist_ok=True)

    # Save image data
    with image.open() as image_bytes:
        with open(filepath, "wb") as f:
            f.write(image_bytes.read())

    return {"src": filepath}

# Use the custom handler
with open("document.docx", "rb") as docx_file:
    result = mammoth.convert_to_html(
        docx_file,
        convert_image=save_image_to_file
    )
```

### Remote Image Upload

```python
import mimetypes

import mammoth
import requests

@mammoth.images.img_element
def upload_to_server(image):
    # Derive a file extension from the MIME type (falls back to ".bin")
    extension = mimetypes.guess_extension(image.content_type or "") or ".bin"

    # Upload image to remote server
    with image.open() as image_bytes:
        files = {"image": (f"image{extension}", image_bytes, image.content_type)}
        response = requests.post("https://api.example.com/upload", files=files)

    if response.status_code == 200:
        image_url = response.json()["url"]
        return {"src": image_url}
    else:
        # Fallback to data URI
        return mammoth.images.data_uri(image)[0].attributes

# Use the upload handler
with open("document.docx", "rb") as docx_file:
    result = mammoth.convert_to_html(
        docx_file,
        convert_image=upload_to_server
    )
```

### Image Processing

```python
import mammoth
from PIL import Image as PILImage
import base64
import io

@mammoth.images.img_element
def resize_and_convert(image):
    with image.open() as image_bytes:
        # Open with PIL
        pil_image = PILImage.open(image_bytes)

        # Resize if too large
        max_width = 800
        if pil_image.width > max_width:
            ratio = max_width / pil_image.width
            new_height = int(pil_image.height * ratio)
            pil_image = pil_image.resize((max_width, new_height))

        # JPEG cannot store alpha or palette modes, so normalize to RGB
        if pil_image.mode != "RGB":
            pil_image = pil_image.convert("RGB")

        # Convert to JPEG and encode as data URI
        output = io.BytesIO()
        pil_image.save(output, format="JPEG", quality=85)
        encoded = base64.b64encode(output.getvalue()).decode("ascii")

    return {"src": f"data:image/jpeg;base64,{encoded}"}

# Use the processing handler
with open("document.docx", "rb") as docx_file:
    result = mammoth.convert_to_html(
        docx_file,
        convert_image=resize_and_convert
    )
```

## Image Alt Text

Mammoth automatically preserves alt text from Word documents when available:

```python
import mammoth
from uuid import uuid4

@mammoth.images.img_element
def preserve_alt_text(image):
    attributes = {"src": f"/images/{uuid4()}.jpg"}

    # Alt text is automatically added by the img_element decorator
    # when image.alt_text is set

    return attributes
```

The `img_element` decorator adds the alt text to the generated `img` element whenever `image.alt_text` is available from the source document, so a handler only needs to return `src` and any extra attributes.
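
The alt text behaviour described above can be pictured with a simplified stand-in. This is a sketch, not Mammoth's source: `img_element_sketch` and the `SimpleNamespace` images are hypothetical, the result is a plain dict rather than an HTML element, and it assumes that attributes returned by the handler take precedence over the automatically added `alt`.

```python
from types import SimpleNamespace


def img_element_sketch(func):
    """Simplified illustration: merge alt text with the handler's attributes."""
    def convert_image(image):
        attributes = {}
        if image.alt_text:
            attributes["alt"] = image.alt_text
        # Assumption: handler-supplied attributes win on key clashes
        attributes.update(func(image))
        return attributes
    return convert_image


@img_element_sketch
def handler(image):
    return {"src": "/images/example.jpg"}


with_alt = SimpleNamespace(alt_text="A bar chart")
without_alt = SimpleNamespace(alt_text=None)

print(handler(with_alt))
print(handler(without_alt))
```

Under these assumptions, an image with alt text yields both `alt` and `src`, while an image without alt text yields only the attributes the handler returned.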