# Configuration Options

Options for customizing how html2text converts HTML to text. Every option is an attribute that can be set on an `HTML2Text` instance to control output formatting, link handling, table processing, and text styling.

## Capabilities

### Link and Image Configuration

Control how links and images are processed and formatted in the output.

```python { .api }
# Link handling options
ignore_links: bool = False
"""Skip all link formatting, treating links as plain text."""

ignore_mailto_links: bool = False
"""Skip mailto: links while processing other links normally."""

inline_links: bool = True
"""Use inline [text](url) format vs reference-style [text][1] links."""

protect_links: bool = False
"""Wrap links with angle brackets <url> to prevent line breaks."""

skip_internal_links: bool = True
"""Skip internal anchor links (href="#section")."""

links_each_paragraph: bool = False
"""Place reference links after each paragraph instead of document end."""

use_automatic_links: bool = True
"""Convert URLs that match link text to automatic <url> format."""

wrap_links: bool = True
"""Allow wrapping of long links across multiple lines."""

# Image handling options
ignore_images: bool = False
"""Skip all image formatting, removing images from output."""

images_as_html: bool = False
"""Output images as raw HTML tags preserving attributes."""

images_to_alt: bool = False
"""Replace images with alt text only, discarding image references."""

images_with_size: bool = False
"""Include width/height attributes when outputting images as HTML."""

default_image_alt: str = ""
"""Default alt text for images missing alt attributes."""
```

### Text Formatting Configuration

Control text wrapping, character handling, and emphasis formatting.

```python { .api }
# Text wrapping and layout
body_width: int = 78
"""Maximum line width for text wrapping. Set to 0 for no wrapping."""

single_line_break: bool = False
"""Use single line breaks after block elements instead of double. Requires body_width = 0."""

wrap_list_items: bool = False
"""Allow wrapping of list items across multiple lines."""

# Character and emphasis handling
unicode_snob: bool = False
"""Use Unicode characters instead of ASCII replacements (e.g., → vs ->)."""

escape_snob: bool = False
"""Escape all special characters for safer but less readable output."""

ignore_emphasis: bool = False
"""Skip all emphasis formatting (bold, italic, etc.)."""

# List and emphasis markers
ul_item_mark: str = "*"
"""Character used for unordered list items. Common: "*", "-", "+"."""

emphasis_mark: str = "_"
"""Character used for italic emphasis. Common: "_", "*"."""

strong_mark: str = "**"
"""Character sequence used for bold emphasis."""

# Quote handling
open_quote: str = '"'
"""Character used to open quotes from <q> tags."""

close_quote: str = '"'
"""Character used to close quotes from <q> tags."""
```

### Table Configuration

Control table processing and formatting options.

```python { .api }
bypass_tables: bool = False
"""Format tables as raw HTML instead of Markdown table syntax."""

ignore_tables: bool = False
"""Skip table formatting entirely, treating as plain text."""

pad_tables: bool = False
"""Pad table cells to equal column width for aligned appearance."""

wrap_tables: bool = False
"""Allow wrapping of table content across multiple lines."""
```

### Code and Preformatted Text

Control handling of code blocks and preformatted content.

```python { .api }
mark_code: bool = False
"""Mark code blocks with [code]...[/code] tags instead of indentation."""

backquote_code_style: bool = False
"""Use triple-backtick ```code``` blocks instead of indentation."""

hide_strikethrough: bool = False
"""Hide strikethrough text instead of showing it in ~~text~~ format. Only relevant when google_doc is enabled."""
```

### Google Docs Specific Options

Special handling for HTML exported from Google Docs.

```python { .api }
google_doc: bool = False
"""Enable Google Docs-specific formatting and style handling."""

google_list_indent: int = 36
"""Number of pixels Google uses for nested list indentation."""
```

### Advanced Options

Additional options for specialized use cases.

```python { .api }
include_sup_sub: bool = False
"""Include superscript <sup> and subscript <sub> tags in output."""

tag_callback: Optional[Callable] = None
"""Custom callback function for handling specific HTML tags."""
```

## Configuration Examples

### Basic Configuration

```python
import html2text

# Create converter with custom settings
h = html2text.HTML2Text()

# Configure for clean, readable output
h.ignore_links = True       # Drop link markup; the link text is kept
h.ignore_images = True      # Remove all images
h.body_width = 0            # No line wrapping
h.ignore_emphasis = False   # Keep bold/italic formatting (the default)

html = """
<div>
<h1>Title</h1>
<p>Some <strong>bold</strong> text with a <a href="http://example.com">link</a>.</p>
<img src="image.jpg" alt="An image">
</div>
"""

result = h.handle(html)
print(result)
```

### Link Processing Options

```python
import html2text

html = """
<p>Check out <a href="https://example.com">our website</a> and
<a href="mailto:contact@example.com">email us</a> or see
<a href="#section1">this section</a>.</p>
"""

# Inline links (default)
h1 = html2text.HTML2Text()
h1.inline_links = True
print("Inline links:")
print(h1.handle(html))

# Reference-style links
h2 = html2text.HTML2Text()
h2.inline_links = False
print("\nReference links:")
print(h2.handle(html))

# Ignore specific link types
h3 = html2text.HTML2Text()
h3.ignore_mailto_links = True
h3.skip_internal_links = True  # Already the default; set here for clarity
print("\nFiltered links:")
print(h3.handle(html))
```

### Table Formatting Options

```python
import html2text

html = """
<table>
<tr><th>Name</th><th>Age</th><th>City</th></tr>
<tr><td>Alice</td><td>30</td><td>New York</td></tr>
<tr><td>Bob</td><td>25</td><td>London</td></tr>
</table>
"""

# Default markdown table
h1 = html2text.HTML2Text()
print("Markdown table:")
print(h1.handle(html))

# Padded table for alignment
h2 = html2text.HTML2Text()
h2.pad_tables = True
print("\nPadded table:")
print(h2.handle(html))

# Raw HTML table
h3 = html2text.HTML2Text()
h3.bypass_tables = True
print("\nHTML table:")
print(h3.handle(html))

# No table formatting
h4 = html2text.HTML2Text()
h4.ignore_tables = True
print("\nIgnored table:")
print(h4.handle(html))
```

### Code Block Formatting

```python
import html2text

html = """
<div>
<p>Here's some code:</p>
<pre><code>def hello():
    print("Hello, world!")
    return True</code></pre>
<p>And inline <code>code</code> too.</p>
</div>
"""

# Default indented code blocks
h1 = html2text.HTML2Text()
print("Indented code blocks:")
print(h1.handle(html))

# Triple-backtick code blocks
h2 = html2text.HTML2Text()
h2.backquote_code_style = True
print("\nBacktick code blocks:")
print(h2.handle(html))

# Marked code blocks
h3 = html2text.HTML2Text()
h3.mark_code = True
print("\nMarked code blocks:")
print(h3.handle(html))
```

### Text Wrapping and Formatting

```python
import html2text

html = "<p>This is a very long paragraph that will demonstrate text wrapping behavior in the html2text converter when processing HTML content.</p>"

# Default wrapping at 78 characters
h1 = html2text.HTML2Text()
print(f"Default wrapping (width={h1.body_width}):")
print(h1.handle(html))

# Custom width
h2 = html2text.HTML2Text()
h2.body_width = 40
print(f"\nNarrow wrapping (width={h2.body_width}):")
print(h2.handle(html))

# No wrapping
h3 = html2text.HTML2Text()
h3.body_width = 0
print(f"\nNo wrapping (width={h3.body_width}):")
print(h3.handle(html))
```

### Google Docs Processing

```python
import html2text

# HTML exported from Google Docs with inline styles
google_html = """
<p style="margin-left:36px"><span style="font-weight:bold">Bold item</span></p>
<p style="margin-left:72px">Nested item with <span style="font-style:italic">emphasis</span></p>
"""

h = html2text.HTML2Text()
h.google_doc = True
h.google_list_indent = 36  # Google's default indent

result = h.handle(google_html)
print(result)
```