# Configuration Options

Formatting and behavior options for customizing HTML-to-text conversion. Every option is an attribute of an HTML2Text instance. Set the attributes before calling handle() to control output formatting, link handling, table processing, and text styling.

## Capabilities

### Link and Image Configuration

Control how links and images are processed and formatted in the output.

```python { .api }
# Link handling options
ignore_links: bool = False
"""Skip all link formatting, treating links as plain text."""

ignore_mailto_links: bool = False
"""Skip mailto: links while processing other links normally."""

inline_links: bool = True
"""Use inline [text](url) links; set to False for reference-style [text][1] links."""

protect_links: bool = False
"""Wrap link URLs in angle brackets (<url>) to prevent line breaks inside them."""

skip_internal_links: bool = True
"""Skip internal anchor links (href="#section"), keeping only their text."""

links_each_paragraph: bool = False
"""Place reference links after each paragraph instead of at the end of the document."""

use_automatic_links: bool = True
"""Render a link as an automatic <url> link when its text is identical to its absolute URL."""

wrap_links: bool = True
"""Allow wrapping of long links across multiple lines."""

# Image handling options
ignore_images: bool = False
"""Skip all image formatting, removing images from output."""

images_as_html: bool = False
"""Always output images as raw HTML <img> tags, preserving attributes."""

images_to_alt: bool = False
"""Replace images with their alt text only, discarding image references."""

images_with_size: bool = False
"""Output images that have width/height attributes as raw HTML <img> tags so their dimensions are kept."""

default_image_alt: str = ""
"""Default alt text for images missing an alt attribute."""
```

### Text Formatting Configuration

Control text wrapping, character handling, and emphasis formatting.

```python { .api }
# Text wrapping and layout
body_width: int = 78
"""Maximum line width for text wrapping. Set to 0 for no wrapping."""

single_line_break: bool = False
"""Use single line breaks after block elements instead of double."""

wrap_list_items: bool = False
"""Allow wrapping of list items across multiple lines."""

# Character and emphasis handling
unicode_snob: bool = False
"""Use Unicode characters instead of ASCII replacements (e.g., → vs ->)."""

escape_snob: bool = False
"""Escape all special characters for safer but less readable output."""

ignore_emphasis: bool = False
"""Skip all emphasis formatting (bold, italic, etc.)."""

# List and emphasis markers
ul_item_mark: str = "*"
"""Character used for unordered list items. Common: "*", "-", "+"."""

emphasis_mark: str = "_"
"""Character used for italic emphasis. Common: "_", "*"."""

strong_mark: str = "**"
"""Character sequence used for bold emphasis."""

# Quote handling
open_quote: str = '"'
"""Character used to open quotes from <q> tags."""

close_quote: str = '"'
"""Character used to close quotes from <q> tags."""
```

### Table Configuration

Control table processing and formatting options.

```python { .api }
bypass_tables: bool = False
"""Format tables as raw HTML instead of Markdown table syntax."""

ignore_tables: bool = False
"""Ignore table tags and output the cell contents as plain text."""

pad_tables: bool = False
"""Pad table cells to equal column width for aligned appearance."""

wrap_tables: bool = False
"""Allow wrapping of table content across multiple lines."""
```

### Code and Preformatted Text

Control handling of code blocks and preformatted content.

```python { .api }
mark_code: bool = False
"""Mark code blocks with [code]...[/code] tags instead of indentation."""

backquote_code_style: bool = False
"""Use fenced code blocks delimited by triple backticks instead of indentation."""

hide_strikethrough: bool = False
"""Hide strikethrough (line-through) text. Only takes effect when google_doc is enabled."""
```

### Google Docs Specific Options

Special handling for HTML exported from Google Docs.

```python { .api }
google_doc: bool = False
"""Enable Google Docs-specific formatting and style handling."""

google_list_indent: int = 36
"""Number of pixels Google uses for nested list indentation."""
```

### Advanced Options

Additional options for specialized use cases.

```python { .api }
include_sup_sub: bool = False
"""Include superscript <sup> and subscript <sub> tags in output."""

tag_callback: Optional[Callable] = None
"""Custom function called as tag_callback(converter, tag, attrs, start) for each start and end tag; returning True skips the default handling of that tag."""
```

## Configuration Examples

### Basic Configuration

```python
import html2text

# Create a converter with custom settings
h = html2text.HTML2Text()

# Configure for clean, readable output
h.ignore_links = True       # Drop link markup and URLs, keep the link text
h.ignore_images = True      # Remove all images
h.body_width = 0            # No line wrapping
h.ignore_emphasis = False   # Keep bold/italic formatting

html = """
<div>
<h1>Title</h1>
<p>Some <strong>bold</strong> text with a <a href="http://example.com">link</a>.</p>
<img src="image.jpg" alt="An image">
</div>
"""

result = h.handle(html)
print(result)
```

### Link Processing Options

```python
import html2text

html = """
<p>Check out <a href="https://example.com">our website</a> and
<a href="mailto:contact@example.com">email us</a> or see
<a href="#section1">this section</a>.</p>
"""

# Inline links (the default)
h1 = html2text.HTML2Text()
h1.inline_links = True
print("Inline links:")
print(h1.handle(html))

# Reference-style links
h2 = html2text.HTML2Text()
h2.inline_links = False
print("\nReference links:")
print(h2.handle(html))

# Ignore specific link types
h3 = html2text.HTML2Text()
h3.ignore_mailto_links = True
h3.skip_internal_links = True  # already the default
print("\nFiltered links:")
print(h3.handle(html))
```

### Table Formatting Options

```python
import html2text

html = """
<table>
<tr><th>Name</th><th>Age</th><th>City</th></tr>
<tr><td>Alice</td><td>30</td><td>New York</td></tr>
<tr><td>Bob</td><td>25</td><td>London</td></tr>
</table>
"""

# Default Markdown table
h1 = html2text.HTML2Text()
print("Markdown table:")
print(h1.handle(html))

# Padded table for alignment
h2 = html2text.HTML2Text()
h2.pad_tables = True
print("\nPadded table:")
print(h2.handle(html))

# Raw HTML table
h3 = html2text.HTML2Text()
h3.bypass_tables = True
print("\nHTML table:")
print(h3.handle(html))

# No table formatting
h4 = html2text.HTML2Text()
h4.ignore_tables = True
print("\nIgnored table:")
print(h4.handle(html))
```

### Code Block Formatting

```python
import html2text

html = """
<div>
<p>Here's some code:</p>
<pre><code>def hello():
    print("Hello, world!")
    return True</code></pre>
<p>And inline <code>code</code> too.</p>
</div>
"""

# Default indented code blocks
h1 = html2text.HTML2Text()
print("Indented code blocks:")
print(h1.handle(html))

# Triple-backtick code blocks
h2 = html2text.HTML2Text()
h2.backquote_code_style = True
print("\nBacktick code blocks:")
print(h2.handle(html))

# Marked code blocks
h3 = html2text.HTML2Text()
h3.mark_code = True
print("\nMarked code blocks:")
print(h3.handle(html))
```

### Text Wrapping and Formatting

```python
import html2text

html = "<p>This is a very long paragraph that will demonstrate text wrapping behavior in the html2text converter when processing HTML content.</p>"

# Default wrapping at 78 characters
h1 = html2text.HTML2Text()
print(f"Default wrapping (width={h1.body_width}):")
print(h1.handle(html))

# Custom width
h2 = html2text.HTML2Text()
h2.body_width = 40
print(f"\nNarrow wrapping (width={h2.body_width}):")
print(h2.handle(html))

# No wrapping
h3 = html2text.HTML2Text()
h3.body_width = 0
print(f"\nNo wrapping (width={h3.body_width}):")
print(h3.handle(html))
```

### Google Docs Processing

```python
import html2text

# HTML exported from Google Docs with inline styles
google_html = """
<p style="margin-left:36px"><span style="font-weight:bold">Bold item</span></p>
<p style="margin-left:72px">Nested item with <span style="font-style:italic">emphasis</span></p>
"""

h = html2text.HTML2Text()
h.google_doc = True
h.google_list_indent = 36  # Google's default indent

result = h.handle(google_html)
print(result)
```