Tessl Tile for pypi/mammoth@1.10.0

# Document Conversion

Core conversion functions for transforming DOCX files to HTML and Markdown formats. These functions provide comprehensive options for customization, style mapping, and output control.

## Capabilities

### HTML Conversion

Converts DOCX documents to clean, semantic HTML with support for headings, lists, tables, images, and extensive formatting options.

```python { .api }
def convert_to_html(fileobj, **kwargs):
    """
    Convert DOCX file to HTML format.

    Parameters:
    - fileobj: File object (opened DOCX file in binary mode)
    - style_map: str, custom style mapping rules
    - convert_image: function, custom image conversion function
    - ignore_empty_paragraphs: bool, whether to skip empty paragraphs (default: True)
    - id_prefix: str, prefix for HTML element IDs
    - include_embedded_style_map: bool, use embedded style maps (default: True)
    - include_default_style_map: bool, use built-in style mappings (default: True)

    Returns:
    Result object with .value (HTML string) and .messages (list of warnings)
    """
```

Usage example:

```python
import mammoth

# Basic HTML conversion
with open("document.docx", "rb") as docx_file:
    result = mammoth.convert_to_html(docx_file)
    html = result.value

# HTML conversion with custom options
with open("document.docx", "rb") as docx_file:
    result = mammoth.convert_to_html(
        docx_file,
        style_map="p.Heading1 => h1.custom-heading",
        id_prefix="doc-",
        ignore_empty_paragraphs=False
    )
```
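
The `style_map` option takes a single string. When several rules are needed, one approach is to keep them in a list and join them with newlines, which is how multi-rule style maps are written in Mammoth's README. The helper name `build_style_map` and the second rule below are illustrative assumptions, not part of Mammoth's API.

```python
def build_style_map(rules):
    """Join individual style rules into one newline-separated string.

    Hypothetical helper (not part of mammoth): surrounding whitespace is
    trimmed from each rule and blank entries are dropped.
    """
    cleaned = (rule.strip() for rule in rules)
    return "\n".join(rule for rule in cleaned if rule)


style_map = build_style_map([
    "p.Heading1 => h1.custom-heading",
    "p[style-name='Quote'] => blockquote",  # assumed example rule
    "",  # blank entries are ignored
])
```

The resulting string could then be passed as `mammoth.convert_to_html(docx_file, style_map=style_map)`.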
48

49
### Markdown Conversion
50

51
Converts DOCX documents to clean Markdown format, preserving document structure and formatting in Markdown syntax.
52

53
```python { .api }
54
def convert_to_markdown(fileobj, **kwargs):
55
    """
56
    Convert DOCX file to Markdown format.
57
    
58
    Parameters: Same as convert_to_html()
59
    
60
    Returns:
61
    Result object with .value (Markdown string) and .messages (list of warnings)
62
    """
63
```

Usage example:

```python
import mammoth

# Basic Markdown conversion
with open("document.docx", "rb") as docx_file:
    result = mammoth.convert_to_markdown(docx_file)
    markdown = result.value

# Check for conversion warnings
if result.messages:
    for message in result.messages:
        print(f"{message.type}: {message.message}")
```
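
Because `result.value` is a plain string, saving the converted output is left to the caller. The following is a minimal sketch of one way to do that, assuming the output should sit next to the source file. The names `SUFFIXES`, `output_path_for`, and `save_converted` are hypothetical helpers, not part of Mammoth.

```python
from pathlib import Path

# Hypothetical mapping from the output formats named in this document to file suffixes.
SUFFIXES = {"html": ".html", "markdown": ".md"}


def output_path_for(docx_path, output_format):
    """Return a sibling path for the converted file, e.g. report.docx -> report.md."""
    return Path(docx_path).with_suffix(SUFFIXES[output_format])


def save_converted(value, docx_path, output_format):
    """Write converted text (such as result.value) next to the source file as UTF-8."""
    target = output_path_for(docx_path, output_format)
    target.write_text(value, encoding="utf-8")
    return target
```

For instance, `save_converted(result.value, "document.docx", "markdown")` would write `document.md` in the same directory.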

### Core Conversion Function

The underlying conversion function with full parameter control, supporting both HTML and Markdown output formats.

```python { .api }
def convert(fileobj, transform_document=None, id_prefix=None,
            include_embedded_style_map=True, **kwargs):
    """
    Core conversion function with full parameter control.

    Parameters:
    - fileobj: File object containing DOCX data
    - transform_document: function, transforms document before conversion
    - id_prefix: str, prefix for HTML element IDs
    - include_embedded_style_map: bool, whether to use embedded style maps
    - output_format: str, "html" or "markdown"
    - style_map: str, custom style mapping string
    - convert_image: function, custom image conversion function
    - ignore_empty_paragraphs: bool, skip empty paragraphs (default: True)
    - include_default_style_map: bool, use built-in styles (default: True)

    Returns:
    Result object with converted content and messages
    """
```
106

107
Usage example:
108

109
```python
110
import mammoth
111

112
def custom_transform(document):
113
    # Custom document transformation
114
    return document
115

116
with open("document.docx", "rb") as docx_file:
117
    result = mammoth.convert(
118
        docx_file,
119
        output_format="html",
120
        transform_document=custom_transform,
121
        style_map="p.CustomStyle => div.special"
122
    )
123
```
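
`transform_document` takes a single function, so applying several independent transformation steps means combining them first. A minimal sketch of such a combinator follows. `compose_transforms` is a hypothetical helper, not part of Mammoth, and it assumes each step takes a document and returns a document, like `custom_transform` above.

```python
from functools import reduce


def compose_transforms(*transforms):
    """Combine several document -> document functions into one.

    Hypothetical helper (not part of mammoth): transforms run left to right,
    each receiving the document returned by the previous one.
    """
    def combined(document):
        return reduce(lambda doc, transform: transform(doc), transforms, document)
    return combined
```

The combined function could then be passed as `transform_document=compose_transforms(step_one, step_two)`, where `step_one` and `step_two` stand for your own transformation functions. With no arguments it returns the document unchanged.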
124

125
### Text Extraction
126

127
Extracts plain text content from DOCX documents without formatting, useful for text analysis and processing.
128

129
```python { .api }
130
def extract_raw_text(fileobj):
131
    """
132
    Extract plain text from DOCX file.
133
    
134
    Parameters:
135
    - fileobj: File object (opened DOCX file in binary mode)
136
    
137
    Returns:
138
    Result object with .value (plain text string) and .messages (list)
139
    """
140
```
141

142
Usage example:
143

144
```python
145
import mammoth
146

147
with open("document.docx", "rb") as docx_file:
148
    result = mammoth.extract_raw_text(docx_file)
149
    text = result.value
150
    print(text)  # Plain text content
151
```
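
As an illustration of the text analysis mentioned above, the sketch below counts the most frequent words in an extracted string using only the standard library. `word_frequencies` is a hypothetical helper, and the sample string stands in for `result.value`.

```python
import re
from collections import Counter


def word_frequencies(text, top=5):
    """Return the `top` most common lowercase words in `text` as (word, count) pairs."""
    # Match runs of letters, optionally followed by an apostrophe and more letters.
    words = re.findall(r"[^\W\d_]+(?:'[^\W\d_]+)?", text.lower())
    return Counter(words).most_common(top)


sample = "Mammoth converts documents. Documents become HTML."
print(word_frequencies(sample, top=1))  # [('documents', 2)]
```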
152

153
## Supported Options
154

155
All conversion functions accept these common options:
156

157
- **style_map**: Custom style mapping rules as a string
158
- **embedded_style_map**: Style map extracted from the DOCX file itself
159
- **include_default_style_map**: Whether to include built-in style mappings (default: True)
160
- **ignore_empty_paragraphs**: Whether to skip empty paragraph elements (default: True)
161
- **convert_image**: Custom function for handling image conversion
162
- **output_format**: Target format ("html" or "markdown")
163
- **id_prefix**: Prefix for generated HTML element IDs
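
When options come from configuration or command-line arguments, it can help to pass only the ones that were actually set, so the library defaults stay in effect for the rest. A minimal sketch, assuming the keyword names from the docstrings above; `build_options` is a hypothetical helper, not part of Mammoth.

```python
def build_options(style_map=None, id_prefix=None,
                  ignore_empty_paragraphs=None, include_default_style_map=None):
    """Collect only the options that were explicitly set.

    Hypothetical helper (not part of mammoth): options left as None are
    omitted so that the library defaults apply.
    """
    candidates = {
        "style_map": style_map,
        "id_prefix": id_prefix,
        "ignore_empty_paragraphs": ignore_empty_paragraphs,
        "include_default_style_map": include_default_style_map,
    }
    return {name: value for name, value in candidates.items() if value is not None}


options = build_options(
    style_map="p.Heading1 => h1.custom-heading",
    ignore_empty_paragraphs=False,
)
```

The dictionary can then be unpacked into a call such as `mammoth.convert_to_html(docx_file, **options)`.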

## Error Handling

All conversion functions return Result objects that contain both the converted content and any warnings or errors encountered during processing:

```python
import mammoth

with open("document.docx", "rb") as docx_file:
    result = mammoth.convert_to_html(docx_file)

# Access the converted content
html = result.value

# Check for warnings or errors
for message in result.messages:
    if message.type == "error":
        print(f"Error: {message.message}")
    elif message.type == "warning":
        print(f"Warning: {message.message}")
```
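
For batch jobs it may be more convenient to group messages by type than to print them one by one. The sketch below assumes only that each message exposes `.type` and `.message`, as in the loop above. The `Message` namedtuple is a local stand-in used for illustration, `group_messages` is a hypothetical helper, and the sample message texts are made up.

```python
from collections import defaultdict, namedtuple

# Local stand-in for the objects in result.messages (assumed shape: .type, .message).
Message = namedtuple("Message", ["type", "message"])


def group_messages(messages):
    """Group message texts by their type, e.g. {"warning": [...]}."""
    grouped = defaultdict(list)
    for message in messages:
        grouped[message.type].append(message.message)
    return dict(grouped)


sample = [
    Message("warning", "Unrecognised paragraph style: Fancy"),  # made-up text
    Message("warning", "An unrecognised element was ignored"),  # made-up text
]
grouped = group_messages(sample)
print(len(grouped.get("warning", [])))  # 2
```

In real use, `group_messages(result.messages)` would replace the sample list.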
