# Style System

Comprehensive style mapping system for converting Word document styles to HTML elements. Mammoth's style system includes parsers, matchers, and embedded style map support for complex styling rules and customization.

## Capabilities

### Style Map Management

Functions for embedding and reading style maps directly in DOCX files.

```python { .api }
def embed_style_map(fileobj, style_map):
    """
    Embed a style map directly into a DOCX file.

    Parameters:
    - fileobj: DOCX file object (must be writable)
    - style_map: str, style mapping rules as text

    Note: Modifies the DOCX file to include the style map
    """

def read_embedded_style_map(fileobj):
    """
    Read the embedded style map from a DOCX file.

    Parameters:
    - fileobj: DOCX file object

    Returns:
    str, style map text or None if no embedded map exists
    """
```

### Style Mapping Parser

Parse style mapping strings into internal representations.

```python { .api }
def read_style_mapping(string):
    """
    Parse style mapping strings.

    Parameters:
    - string: str, style mapping text line

    Returns:
    Result object with parsed style mapping or warning

    Raises:
    LineParseError: When style mapping syntax is invalid
    """

class LineParseError(Exception):
    """Raised for style mapping parse errors."""

def style(document_matcher, html_path):
    """
    Create style mapping from document matcher to HTML path.

    Parameters:
    - document_matcher: DocumentMatcher, matcher for document elements
    - html_path: HtmlPath, target HTML structure

    Returns:
    Style named tuple
    """
```

### Document Matchers

Matchers for identifying specific document elements and formatting.

```python { .api }
def paragraph(style_id=None, style_name=None, numbering=None):
    """
    Create paragraph matcher.

    Parameters:
    - style_id: str, Word style ID to match
    - style_name: str, Word style name to match
    - numbering: object, numbering level to match

    Returns:
    ParagraphMatcher instance
    """

def run(style_id=None, style_name=None):
    """
    Create run matcher.

    Parameters:
    - style_id: str, Word style ID to match
    - style_name: str, Word style name to match

    Returns:
    RunMatcher instance
    """

def table(style_id=None, style_name=None):
    """
    Create table matcher.

    Parameters:
    - style_id: str, Word style ID to match
    - style_name: str, Word style name to match

    Returns:
    TableMatcher instance
    """

def highlight(color=None):
    """
    Create highlight matcher.

    Parameters:
    - color: str, highlight color to match (optional)

    Returns:
    HighlightMatcher instance
    """
```

### Formatting Matchers

Pre-defined matchers for common text formatting.

```python { .api }
# Formatting matcher constants
bold = BoldMatcher()  # Matches bold formatting
italic = ItalicMatcher()  # Matches italic formatting
underline = UnderlineMatcher()  # Matches underline formatting
strikethrough = StrikethroughMatcher()  # Matches strikethrough formatting
all_caps = AllCapsMatcher()  # Matches all-caps formatting
small_caps = SmallCapsMatcher()  # Matches small-caps formatting
comment_reference = CommentReferenceMatcher()  # Matches comment references
```

### Break Matchers

Matchers for different types of document breaks.

```python { .api }
# Break matcher constants
line_break = LineBreakMatcher()  # Matches line breaks
page_break = PageBreakMatcher()  # Matches page breaks
column_break = ColumnBreakMatcher()  # Matches column breaks
```

### String Matchers

Matchers for string comparison in style names and IDs.

```python { .api }
def equal_to(value):
    """
    Create case-insensitive string equality matcher.

    Parameters:
    - value: str, string to match exactly (case-insensitive)

    Returns:
    StringMatcher instance
    """

def starts_with(value):
    """
    Create case-insensitive string prefix matcher.

    Parameters:
    - value: str, prefix to match (case-insensitive)

    Returns:
    StringMatcher instance
    """
```
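
To make the case-insensitive behaviour concrete, here is a minimal standalone sketch of the two comparisons. It does not call Mammoth: `StringMatcherSketch` and its `matches` method are hypothetical names used only for illustration, and the library's own `StringMatcher` may be implemented differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StringMatcherSketch:
    """Hypothetical stand-in for a string matcher (not Mammoth's class)."""
    operator: str  # either "equal_to" or "starts_with"
    value: str

    def matches(self, other: str) -> bool:
        # Normalise both sides so the comparison ignores case.
        left, right = other.casefold(), self.value.casefold()
        if self.operator == "equal_to":
            return left == right
        if self.operator == "starts_with":
            return left.startswith(right)
        raise ValueError(f"unknown operator: {self.operator}")


def equal_to(value: str) -> StringMatcherSketch:
    return StringMatcherSketch("equal_to", value)


def starts_with(value: str) -> StringMatcherSketch:
    return StringMatcherSketch("starts_with", value)


print(equal_to("Heading 1").matches("heading 1"))   # True
print(starts_with("Heading").matches("HEADING 2"))  # True
print(equal_to("Heading").matches("Heading 1"))     # False
```

The sketch keeps the operator as data rather than as two classes, which makes it easy to see that equality and prefix matching differ only in the final comparison.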

## HTML Path System

System for defining HTML output structures in style mappings.

```python { .api }
def path(elements):
    """
    Create HTML path from elements.

    Parameters:
    - elements: list, HTML path elements

    Returns:
    HtmlPath instance
    """

def element(names, attributes=None, class_names=None,
            fresh=None, separator=None):
    """
    Create HTML path elements for style mapping.

    Parameters:
    - names: str or list, HTML element name(s)
    - attributes: dict, HTML attributes
    - class_names: list, CSS class names
    - fresh: bool, whether element should be fresh (force new element)
    - separator: str, separator for multiple elements

    Returns:
    HtmlPathElement instance
    """

# Special path constants
empty = EmptyPath()  # Empty HTML path (content is emitted without a wrapping element)
ignore = IgnorePath()  # Path that ignores/removes content
```

## Style Mapping Syntax

Mammoth uses a simple text-based syntax for style mappings.

### Basic Syntax

```
<document_matcher> => <html_path>
```
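
As a rough illustration of this shape, the sketch below splits one line into its two halves. `split_mapping` is a hypothetical helper, not part of Mammoth, and it is deliberately naive: it does not validate either side and would mis-split a style name that itself contains `=>`.

```python
from typing import Optional, Tuple


def split_mapping(line: str) -> Optional[Tuple[str, str]]:
    """Split one style-map line into (document_matcher, html_path).

    Returns None for blank lines and for comment lines starting with '#'.
    Hypothetical helper for illustration only.
    """
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    matcher, arrow, html_path = line.partition("=>")
    if not arrow:
        raise ValueError(f"missing '=>' in style mapping: {line!r}")
    return matcher.strip(), html_path.strip()


print(split_mapping("p.Heading1 => h1:fresh"))  # ('p.Heading1', 'h1:fresh')
print(split_mapping("# Headings"))              # None
```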

### Examples

```python
import mammoth

# Style mapping examples
style_map = """
# Headings
p.Heading1 => h1:fresh
p.Heading2 => h2:fresh
p[style-name='Custom Heading'] => h3.custom:fresh

# Text formatting
r.Strong => strong
r[style-name='Emphasis'] => em

# Lists
p:unordered-list(1) => ul > li:fresh
p:ordered-list(1) => ol > li:fresh

# Tables
table.CustomTable => table.custom-table

# Ignore unwanted content (`!` drops the matched element)
r[style-name='Hidden'] => !
p.Footer => !

# Comments (lines starting with #)
# This is a comment and will be ignored
"""

# Use style map in conversion
with open("document.docx", "rb") as docx_file:
    result = mammoth.convert_to_html(
        docx_file,
        style_map=style_map
    )
```

### Document Matcher Syntax

- `p` - Paragraph elements
- `r` - Run elements
- `table` - Table elements
- `.StyleId` - Match by style ID (for example `p.Heading1`)
- `[style-name='Style Name']` - Match by style name, including names with spaces
- `[style-id='styleId']` - Match by style ID
- `:unordered-list(level)` - Match unordered list at level
- `:ordered-list(level)` - Match ordered list at level
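
The matcher forms listed above can be picked apart with a single regular expression. This is only a sketch for reading matcher strings: the pattern and the `describe_matcher` helper are assumptions made for illustration, cover just the forms in this list, and are unrelated to Mammoth's own tokeniser and parser.

```python
import re
from typing import Dict, Optional

# Illustrative pattern covering only the forms listed above.
_MATCHER = re.compile(
    r"^(?P<element>p|r|table)"
    r"(?:\.(?P<dot_style>[A-Za-z0-9_-]+))?"
    r"(?:\[(?P<attr>style-name|style-id)='(?P<attr_value>[^']*)'\])?"
    r"(?::(?P<list_type>unordered-list|ordered-list)\((?P<level>\d+)\))?$"
)


def describe_matcher(text: str) -> Optional[Dict[str, Optional[str]]]:
    """Return the named parts of a matcher string, or None if unrecognised."""
    match = _MATCHER.match(text.strip())
    return match.groupdict() if match else None


print(describe_matcher("p[style-name='Custom Heading']"))
print(describe_matcher("p:unordered-list(1)"))
print(describe_matcher("invalid syntax here"))  # None
```

Parts that are absent from a matcher come back as `None`, so the result shows at a glance which selectors a rule actually uses.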

### HTML Path Syntax

- `h1` - Create h1 element
- `h1.class-name` - Create h1 with CSS class
- `div.container > p` - Nested elements
- `:fresh` - Force new element creation
- `ul|ol` - Alternative elements
- `!` - Ignore content (an empty path, `=>` with nothing after it, keeps the content but adds no wrapping element)
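
To show how these pieces compose, the following sketch breaks a path string into its nested steps. `PathElementSketch` and `parse_html_path_sketch` are hypothetical names; the code handles only element names, classes, `|` alternatives and `:fresh`, and is not how Mammoth itself parses paths.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PathElementSketch:
    """Hypothetical record for one step of an HTML path."""
    names: List[str]  # alternatives such as ["ul", "ol"]
    class_names: List[str] = field(default_factory=list)
    fresh: bool = False


def parse_html_path_sketch(text: str) -> List[PathElementSketch]:
    """Parse e.g. 'div.container > p:fresh' into a list of steps."""
    elements = []
    for step in filter(None, (part.strip() for part in text.split(">"))):
        fresh = step.endswith(":fresh")
        if fresh:
            step = step[: -len(":fresh")]
        # The first dot-separated piece is the tag, the rest are CSS classes.
        tag_part, *class_names = step.split(".")
        elements.append(PathElementSketch(tag_part.split("|"), class_names, fresh))
    return elements


for element in parse_html_path_sketch("ul|ol > li > ul > li:fresh"):
    print(element)
```

An empty string yields an empty list, which mirrors the idea of a path with no elements.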

## Default Style Map

Mammoth includes extensive built-in style mappings:

```python
# Built-in mappings include:
"""
# Standard headings
p.Heading1 => h1:fresh
p.Heading2 => h2:fresh
p.Heading3 => h3:fresh
p.Heading4 => h4:fresh
p.Heading5 => h5:fresh
p.Heading6 => h6:fresh

# Alternative heading formats
p[style-name='Heading 1'] => h1:fresh
p[style-name='heading 1'] => h1:fresh

# Apple Pages
p.Heading => h1:fresh
p[style-name='Heading'] => h1:fresh

# Lists with nesting
p:unordered-list(1) => ul > li:fresh
p:unordered-list(2) => ul|ol > li > ul > li:fresh
p:ordered-list(1) => ol > li:fresh
p:ordered-list(2) => ul|ol > li > ol > li:fresh

# Text formatting
r[style-name='Strong'] => strong
r[style-name='Hyperlink'] =>

# Notes
p[style-name='footnote text'] => p:fresh
r[style-name='footnote reference'] =>
p[style-name='endnote text'] => p:fresh
r[style-name='endnote reference'] =>

# Normal paragraphs
p[style-name='Normal'] => p:fresh
p.Body => p:fresh
"""
```
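
One way to picture how a custom style map interacts with these defaults is as a single ordered list of rules. The sketch below assumes that custom rules are consulted before the built-in ones and that the first matching rule wins; `combine_style_maps` and `DEFAULT_RULES` are hypothetical names, and the real combination logic inside Mammoth may differ in detail.

```python
from typing import List

# A few default rules copied from the listing above (not the full set).
DEFAULT_RULES = [
    "p.Heading1 => h1:fresh",
    "p[style-name='Normal'] => p:fresh",
]


def combine_style_maps(custom: str, include_default: bool = True) -> List[str]:
    """Return rule lines with custom rules first, then (optionally) defaults."""
    custom_rules = [
        line.strip()
        for line in custom.splitlines()
        if line.strip() and not line.strip().startswith("#")
    ]
    return custom_rules + (DEFAULT_RULES if include_default else [])


rules = combine_style_maps("p.Heading1 => h2.title:fresh")
print(rules[0])  # the custom rule comes first
```

Under that assumption, a custom `p.Heading1` rule shadows the default one simply by appearing earlier in the list.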

## Advanced Style Mapping

### Embedded Style Maps

```python
import mammoth

# Embed style map in DOCX file
style_map = "p.CustomStyle => div.special"
with open("document.docx", "r+b") as docx_file:
    mammoth.embed_style_map(docx_file, style_map)

# Later, read embedded style map
with open("document.docx", "rb") as docx_file:
    embedded_map = mammoth.read_embedded_style_map(docx_file)
    print(embedded_map)  # "p.CustomStyle => div.special"
```

### Custom Style Processing

```python
import mammoth

def process_options(options):
    """Process conversion options with custom style logic."""
    result = mammoth.options.read_options(options)

    if result.messages:
        for message in result.messages:
            print(f"Style warning: {message.message}")

    return result

# Use custom options processing
options = {
    "style_map": "p.Custom => div.processed",
    "include_default_style_map": True
}

processed_options = process_options(options)
```

### Style Map Validation

```python
import mammoth

def validate_style_map(style_map_text):
    """Validate style mapping syntax."""
    lines = style_map_text.strip().split('\n')
    errors = []

    for i, line in enumerate(lines, 1):
        line = line.strip()
        if line and not line.startswith('#'):
            try:
                result = mammoth.styles.parser.read_style_mapping(line)
                if result.messages:
                    for msg in result.messages:
                        errors.append(f"Line {i}: {msg.message}")
            except mammoth.styles.parser.LineParseError as e:
                errors.append(f"Line {i}: {str(e)}")

    return errors

# Validate before using
style_map = """
p.Heading1 => h1:fresh
invalid syntax here
p.Heading2 => h2:fresh
"""

errors = validate_style_map(style_map)
if errors:
    for error in errors:
        print(error)
```

## Options Processing

Functions for processing and validating conversion options.

```python { .api }
def read_options(options):
    """
    Process and validate conversion options.

    Parameters:
    - options: dict, conversion options dictionary including:
      - style_map: str, custom style mapping rules
      - embedded_style_map: str, style map from DOCX file
      - include_default_style_map: bool, use built-in styles (default: True)
      - ignore_empty_paragraphs: bool, skip empty paragraphs (default: True)
      - convert_image: function, custom image conversion function
      - output_format: str, "html" or "markdown"
      - id_prefix: str, prefix for HTML element IDs

    Returns:
    Result object with processed options dictionary
    """
```
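
The two defaults stated in the parameter list can be pictured as a base dictionary that explicit options override. This is a standalone sketch, not Mammoth code: `DOCUMENTED_DEFAULTS` and `with_documented_defaults` are hypothetical names, and keys without a stated default are left out rather than guessed.

```python
from typing import Any, Dict

# Defaults taken from the parameter list above; other keys have no stated default.
DOCUMENTED_DEFAULTS: Dict[str, Any] = {
    "include_default_style_map": True,
    "ignore_empty_paragraphs": True,
}


def with_documented_defaults(options: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict in which explicit options override the defaults."""
    return {**DOCUMENTED_DEFAULTS, **options}


print(with_documented_defaults({"ignore_empty_paragraphs": False}))
```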

Usage example:

```python
import mammoth

# Process options with validation
options = {
    "style_map": "p.CustomHeading => h1.special",
    "ignore_empty_paragraphs": False,
    "include_default_style_map": True
}

result = mammoth.options.read_options(options)
if result.messages:
    for message in result.messages:
        print(f"Option warning: {message.message}")

processed_options = result.value
```