# Style System

Style mapping system for converting Word document styles to HTML elements. Mammoth's style system combines a style mapping parser, document matchers, and support for style maps embedded in DOCX files, so conversion rules can be customized.

## Capabilities

### Style Map Management

Functions for embedding style maps in DOCX files and reading them back.

```python { .api }
def embed_style_map(fileobj, style_map):
    """
    Embed a style map directly into a DOCX file.

    Parameters:
    - fileobj: DOCX file object (must be writable)
    - style_map: str, style mapping rules as text

    Note: Modifies the DOCX file to include the style map
    """

def read_embedded_style_map(fileobj):
    """
    Read the embedded style map from a DOCX file.

    Parameters:
    - fileobj: DOCX file object

    Returns:
    str, style map text or None if no embedded map exists
    """
```

### Style Mapping Parser

Parse style mapping strings into internal representations.

```python { .api }
def read_style_mapping(string):
    """
    Parse style mapping strings.

    Parameters:
    - string: str, style mapping text line

    Returns:
    Result object with parsed style mapping or warning

    Raises:
    LineParseError: When style mapping syntax is invalid
    """

class LineParseError(Exception):
    """Raised for style mapping parse errors."""

def style(document_matcher, html_path):
    """
    Create style mapping from document matcher to HTML path.

    Parameters:
    - document_matcher: DocumentMatcher, matcher for document elements
    - html_path: HtmlPath, target HTML structure

    Returns:
    Style named tuple
    """
```

### Document Matchers

Matchers for identifying specific document elements and formatting.

```python { .api }
def paragraph(style_id=None, style_name=None, numbering=None):
    """
    Create paragraph matcher.

    Parameters:
    - style_id: str, Word style ID to match
    - style_name: str, Word style name to match
    - numbering: object, numbering level to match

    Returns:
    ParagraphMatcher instance
    """

def run(style_id=None, style_name=None):
    """
    Create run matcher.

    Parameters:
    - style_id: str, Word style ID to match
    - style_name: str, Word style name to match

    Returns:
    RunMatcher instance
    """

def table(style_id=None, style_name=None):
    """
    Create table matcher.

    Parameters:
    - style_id: str, Word style ID to match
    - style_name: str, Word style name to match

    Returns:
    TableMatcher instance
    """

def highlight(color=None):
    """
    Create highlight matcher.

    Parameters:
    - color: str, highlight color to match (optional)

    Returns:
    HighlightMatcher instance
    """
```

### Formatting Matchers

Predefined matchers for common text formatting.

```python { .api }
# Formatting matcher constants
bold = BoldMatcher()                            # Matches bold formatting
italic = ItalicMatcher()                        # Matches italic formatting
underline = UnderlineMatcher()                  # Matches underline formatting
strikethrough = StrikethroughMatcher()          # Matches strikethrough formatting
all_caps = AllCapsMatcher()                     # Matches all-caps formatting
small_caps = SmallCapsMatcher()                 # Matches small-caps formatting
comment_reference = CommentReferenceMatcher()   # Matches comment references
```

### Break Matchers

Matchers for different types of document breaks.

```python { .api }
# Break matcher constants
line_break = LineBreakMatcher()      # Matches line breaks
page_break = PageBreakMatcher()      # Matches page breaks
column_break = ColumnBreakMatcher()  # Matches column breaks
```

### String Matchers

Matchers for string comparison in style names and IDs.

```python { .api }
def equal_to(value):
    """
    Create case-insensitive string equality matcher.

    Parameters:
    - value: str, string to match exactly (case-insensitive)

    Returns:
    StringMatcher instance
    """

def starts_with(value):
    """
    Create case-insensitive string prefix matcher.

    Parameters:
    - value: str, prefix to match (case-insensitive)

    Returns:
    StringMatcher instance
    """
```
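To make the matching semantics concrete, here is a minimal sketch of case-insensitive equality and prefix matching. The names `StringMatcherSketch`, `equal_to_sketch`, and `starts_with_sketch` are hypothetical and exist only for this illustration; they are not Mammoth's own classes or functions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StringMatcherSketch:
    """Hypothetical stand-in for a string matcher; not Mammoth's own class."""
    operator: str  # "equal_to" or "starts_with"
    value: str

    def matches(self, other: str) -> bool:
        # Compare case-insensitively, as the API descriptions above state.
        left, right = other.lower(), self.value.lower()
        if self.operator == "equal_to":
            return left == right
        return left.startswith(right)


def equal_to_sketch(value: str) -> StringMatcherSketch:
    return StringMatcherSketch("equal_to", value)


def starts_with_sketch(value: str) -> StringMatcherSketch:
    return StringMatcherSketch("starts_with", value)


print(equal_to_sketch("Heading 1").matches("heading 1"))    # True
print(starts_with_sketch("Heading").matches("heading 2"))   # True
print(equal_to_sketch("Heading").matches("Heading 1"))      # False
```

A prefix matcher of this kind is useful when a document uses a family of related style names, such as numbered heading styles.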

## HTML Path System

System for defining HTML output structures in style mappings.

```python { .api }
def path(elements):
    """
    Create HTML path from elements.

    Parameters:
    - elements: list, HTML path elements

    Returns:
    HtmlPath instance
    """

def element(names, attributes=None, class_names=None,
            fresh=None, separator=None):
    """
    Create HTML path elements for style mapping.

    Parameters:
    - names: str or list, HTML element name(s)
    - attributes: dict, HTML attributes
    - class_names: list, CSS class names
    - fresh: bool, whether element should be fresh (force new element)
    - separator: str, separator for multiple elements

    Returns:
    HtmlPathElement instance
    """

# Special path constants
empty = EmptyPath()    # Empty HTML path (content is kept, with no wrapping element)
ignore = IgnorePath()  # Path that ignores/removes content
```

## Style Mapping Syntax

Mammoth uses a simple text-based syntax for style mappings:

### Basic Syntax

```
<document_matcher> => <html_path>
```
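As a rough illustration of this line format, the sketch below splits style map text into matcher and path strings. `split_style_map` is a hypothetical helper written for this example: it only separates each rule at `=>` and skips blank and `#` comment lines, and it does not parse or validate either side the way Mammoth's parser does.

```python
def split_style_map(style_map_text):
    """Split style map text into (matcher, html_path) string pairs.

    Illustrative helper only: it separates each rule at "=>" and does
    not parse or validate either side.
    """
    rules = []
    for raw_line in style_map_text.split("\n"):
        line = raw_line.strip()
        # Blank lines and comment lines carry no rule.
        if not line or line.startswith("#"):
            continue
        matcher, arrow, html_path = line.partition("=>")
        if not arrow:
            raise ValueError("missing '=>' in rule: " + line)
        rules.append((matcher.strip(), html_path.strip()))
    return rules


example = """
# Headings
p.Heading1 => h1:fresh
r[style-name='Hyperlink'] =>
"""
print(split_style_map(example))
```

Because the split happens at the first `=>`, this sketch would mishandle a style name that itself contains `=>`; a real parser tokenizes the line instead.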

### Examples

```python
import mammoth

# Style mapping examples
style_map = """
# Headings
p.Heading1 => h1:fresh
p.Heading2 => h2:fresh
p[style-name='Custom Heading'] => h3.custom:fresh

# Text formatting
r.Strong => strong
r[style-name='Emphasis'] => em

# Lists
p:unordered-list(1) => ul > li:fresh
p:ordered-list(1) => ol > li:fresh

# Tables
table.CustomTable => table.custom-table

# Ignore unwanted content ("!" drops the element and its content;
# an empty path would keep the content without a wrapping element)
r[style-name='Hidden'] => !
p.Footer => !

# Lines starting with # are comments and are ignored
"""

# Use style map in conversion
with open("document.docx", "rb") as docx_file:
    result = mammoth.convert_to_html(
        docx_file,
        style_map=style_map
    )
```

### Document Matcher Syntax

- `p` - Paragraph elements
- `r` - Run elements
- `table` - Table elements
- `.StyleId` - Match by style ID (for example, `p.Heading1`)
- `[style-name='Style Name']` - Match by style name (the name may contain spaces)
- `[style-id='styleId']` - Match by style ID
- `:unordered-list(level)` - Match unordered list at level
- `:ordered-list(level)` - Match ordered list at level
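The sketch below shows how a subset of these matcher forms could be recognised with a regular expression. It is a simplified illustration, not Mammoth's parser: `parse_matcher_sketch` is a hypothetical name, only the element, `.` suffix, `[style-name='...']`, and list forms are covered, and the `.` suffix is treated as a style ID on the assumption that it corresponds to the `style_id` parameter of the matcher functions.

```python
import re

# Simplified grammar covering a subset of the matcher forms.
_MATCHER_RE = re.compile(
    r"^(?P<element>p|r|table)"
    r"(?:\.(?P<style_id>[A-Za-z0-9_-]+))?"
    r"(?:\[style-name='(?P<style_name>[^']*)'\])?"
    r"(?::(?P<list_type>unordered|ordered)-list\((?P<level>\d+)\))?$"
)


def parse_matcher_sketch(text):
    """Return a dict describing the matcher, or None if it is not recognised."""
    match = _MATCHER_RE.match(text.strip())
    if match is None:
        return None
    # Keep only the parts that were actually present in the text.
    parsed = {key: value for key, value in match.groupdict().items() if value is not None}
    if "level" in parsed:
        parsed["level"] = int(parsed["level"])
    return parsed


print(parse_matcher_sketch("p.Heading1"))
print(parse_matcher_sketch("p[style-name='Custom Heading']"))
print(parse_matcher_sketch("p:ordered-list(2)"))
print(parse_matcher_sketch("invalid syntax here"))
```

The last call prints `None`, since the text does not start with a known element name.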

### HTML Path Syntax

- `h1` - Create h1 element
- `h1.class-name` - Create h1 with CSS class
- `div.container > p` - Nested elements
- `:fresh` - Force new element creation
- `ul|ol` - Alternative elements
- Empty path (nothing after `=>`) - Keep the content without a wrapping element
- `!` - Ignore the element and its content
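To show how the pieces of a path compose, the following sketch breaks a path string into per-element descriptions. `parse_html_path_sketch` is a hypothetical helper for illustration; it handles only tag names, `|` alternatives, `.class` suffixes, `:fresh`, and `>` nesting, and it is not how Mammoth itself parses paths.

```python
def parse_html_path_sketch(text):
    """Parse an HTML path string into a list of element descriptions.

    Simplified illustration: handles tag names, `|` alternatives,
    `.class` suffixes, `:fresh`, and `>` nesting only.
    """
    text = text.strip()
    if not text:
        return []  # empty path: no wrapping elements
    elements = []
    for part in text.split(">"):
        part = part.strip()
        fresh = part.endswith(":fresh")
        if fresh:
            part = part[: -len(":fresh")]
        # Everything before the first "." is the tag name(s); the rest are classes.
        names_part, *class_names = part.split(".")
        elements.append({
            "names": names_part.split("|"),
            "class_names": class_names,
            "fresh": fresh,
        })
    return elements


for element in parse_html_path_sketch("ul|ol > li > ul > li.item:fresh"):
    print(element)
```

Each dictionary mirrors the `names`, `class_names`, and `fresh` parameters of `element()`, which is why those keys were chosen for the sketch.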

## Default Style Map

Mammoth includes extensive built-in style mappings:

```python
# Built-in mappings include:
"""
# Standard headings
p.Heading1 => h1:fresh
p.Heading2 => h2:fresh
p.Heading3 => h3:fresh
p.Heading4 => h4:fresh
p.Heading5 => h5:fresh
p.Heading6 => h6:fresh

# Alternative heading formats
p[style-name='Heading 1'] => h1:fresh
p[style-name='heading 1'] => h1:fresh

# Apple Pages
p.Heading => h1:fresh
p[style-name='Heading'] => h1:fresh

# Lists with nesting
p:unordered-list(1) => ul > li:fresh
p:unordered-list(2) => ul|ol > li > ul > li:fresh
p:ordered-list(1) => ol > li:fresh
p:ordered-list(2) => ul|ol > li > ol > li:fresh

# Text formatting
r[style-name='Strong'] => strong
r[style-name='Hyperlink'] =>

# Notes
p[style-name='footnote text'] => p:fresh
r[style-name='footnote reference'] =>
p[style-name='endnote text'] => p:fresh
r[style-name='endnote reference'] =>

# Normal paragraphs
p[style-name='Normal'] => p:fresh
p.Body => p:fresh
"""
```
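The interaction between custom and built-in mappings can be pictured as a first-match lookup over an ordered rule list. The sketch below assumes that custom rules are consulted before the defaults and that the defaults can be switched off (as the `include_default_style_map` option suggests). `first_matching_path` and the `(style_id, html_path)` rule pairs are hypothetical simplifications, not Mammoth's internal representation.

```python
def first_matching_path(style_id, custom_rules, default_rules, include_default=True):
    """Return the HTML path of the first rule whose style ID matches.

    Rules are (style_id, html_path) pairs; custom rules are checked
    before defaults, so they take preference when both match.
    """
    rules = list(custom_rules)
    if include_default:
        rules += list(default_rules)
    for rule_style_id, html_path in rules:
        if rule_style_id == style_id:
            return html_path
    return None


default_rules = [("Heading1", "h1:fresh"), ("Heading2", "h2:fresh")]
custom_rules = [("Heading1", "h1.title:fresh")]

print(first_matching_path("Heading1", custom_rules, default_rules))
print(first_matching_path("Heading2", custom_rules, default_rules))
print(first_matching_path("Heading2", custom_rules, default_rules, include_default=False))
```

Under these assumptions, the custom rule wins for `Heading1`, the default still applies to `Heading2`, and disabling the defaults leaves `Heading2` unmapped.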

## Advanced Style Mapping

### Embedded Style Maps

```python
import mammoth

# Embed style map in DOCX file
style_map = "p.CustomStyle => div.special"
with open("document.docx", "r+b") as docx_file:
    mammoth.embed_style_map(docx_file, style_map)

# Later, read embedded style map
with open("document.docx", "rb") as docx_file:
    embedded_map = mammoth.read_embedded_style_map(docx_file)
    print(embedded_map)  # "p.CustomStyle => div.special"
```
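Since a DOCX file is a ZIP archive, an embedded style map can also be inspected with the standard library. The sketch below assumes the map is stored as an archive member named `mammoth/style-map`; treat that member name as an assumption to verify against your Mammoth version. `peek_embedded_style_map` is a hypothetical helper, and the in-memory archive it is exercised on is not a valid DOCX file.

```python
import io
import zipfile

# Assumption: the archive member name used for the embedded style map.
STYLE_MAP_MEMBER = "mammoth/style-map"


def peek_embedded_style_map(fileobj):
    """Return the embedded style map text, or None if the member is absent."""
    with zipfile.ZipFile(fileobj) as archive:
        if STYLE_MAP_MEMBER not in archive.namelist():
            return None
        return archive.read(STYLE_MAP_MEMBER).decode("utf-8")


# Build a tiny in-memory archive to exercise the helper (not a valid DOCX).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr(STYLE_MAP_MEMBER, "p.CustomStyle => div.special")

print(peek_embedded_style_map(buffer))
```

For real documents, `read_embedded_style_map` remains the supported way to read the map; this helper is only meant for quick inspection.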

### Custom Style Processing

```python
import mammoth

def process_options(options):
    """Process conversion options with custom style logic."""
    result = mammoth.options.read_options(options)

    if result.messages:
        for message in result.messages:
            print(f"Style warning: {message.message}")

    return result

# Use custom options processing
options = {
    "style_map": "p.Custom => div.processed",
    "include_default_style_map": True
}

processed_options = process_options(options)
```

### Style Map Validation

```python
import mammoth.styles.parser  # imported explicitly so the submodule is loaded

def validate_style_map(style_map_text):
    """Validate style mapping syntax, returning a list of error strings."""
    lines = style_map_text.strip().split('\n')
    errors = []

    for i, line in enumerate(lines, 1):
        line = line.strip()
        # Skip blank lines and comments
        if line and not line.startswith('#'):
            try:
                result = mammoth.styles.parser.read_style_mapping(line)
                if result.messages:
                    for msg in result.messages:
                        errors.append(f"Line {i}: {msg.message}")
            except mammoth.styles.parser.LineParseError as e:
                errors.append(f"Line {i}: {e}")

    return errors

# Validate before using
style_map = """
p.Heading1 => h1:fresh
invalid syntax here
p.Heading2 => h2:fresh
"""

errors = validate_style_map(style_map)
if errors:
    for error in errors:
        print(error)
```

## Options Processing

Functions for processing and validating conversion options.

```python { .api }
def read_options(options):
    """
    Process and validate conversion options.

    Parameters:
    - options: dict, conversion options dictionary including:
      - style_map: str, custom style mapping rules
      - embedded_style_map: str, style map from DOCX file
      - include_default_style_map: bool, use built-in styles (default: True)
      - ignore_empty_paragraphs: bool, skip empty paragraphs (default: True)
      - convert_image: function, custom image conversion function
      - output_format: str, "html" or "markdown"
      - id_prefix: str, prefix for HTML element IDs

    Returns:
    Result object with processed options dictionary
    """
```

Usage example:

```python
import mammoth

# Process options with validation
options = {
    "style_map": "p.CustomHeading => h1.special",
    "ignore_empty_paragraphs": False,
    "include_default_style_map": True
}

result = mammoth.options.read_options(options)
if result.messages:
    for message in result.messages:
        print(f"Option warning: {message.message}")

processed_options = result.value
```