# HTML Utilities

Utility functions for converting between HTML and Telegraph's internal node format. These functions handle HTML parsing, validation, and conversion while respecting Telegraph's allowed tag restrictions.

## Capabilities

### HTML to Nodes Conversion

Convert HTML content to Telegraph's internal node format.

```python { .api }
def html_to_nodes(html_content: str) -> list:
    """
    Convert HTML content to Telegraph nodes format.

    Parameters:
    - html_content (str): HTML string to convert

    Returns:
    list: Telegraph nodes representation of the HTML

    Raises:
    NotAllowedTag: HTML contains tags not allowed by Telegraph
    InvalidHTML: HTML is malformed or has mismatched tags
    """
```

Usage examples:

```python
from telegraph.utils import html_to_nodes

# Simple HTML conversion
html = '<p>Hello <strong>world</strong>!</p>'
nodes = html_to_nodes(html)
print(nodes)
# Output: [{'tag': 'p', 'children': ['Hello ', {'tag': 'strong', 'children': ['world']}, '!']}]

# Complex HTML with attributes
html = '<p><a href="https://example.com">Link</a></p>'
nodes = html_to_nodes(html)
print(nodes)
# Output: [{'tag': 'p', 'children': [{'tag': 'a', 'attrs': {'href': 'https://example.com'}, 'children': ['Link']}]}]

# HTML with images
html = '<figure><img src="/file/image.jpg" alt="Photo"><figcaption>Caption</figcaption></figure>'
nodes = html_to_nodes(html)
```

### Nodes to HTML Conversion

Convert Telegraph nodes back to HTML format.

```python { .api }
def nodes_to_html(nodes: list) -> str:
    """
    Convert Telegraph nodes to HTML format.

    Parameters:
    - nodes (list): Telegraph nodes to convert

    Returns:
    str: HTML representation of the nodes
    """
```

Usage examples:

```python
from telegraph.utils import nodes_to_html

# Convert nodes to HTML
nodes = [
    {'tag': 'p', 'children': ['Hello ', {'tag': 'em', 'children': ['world']}, '!']}
]
html = nodes_to_html(nodes)
print(html)
# Output: '<p>Hello <em>world</em>!</p>'

# Complex nodes with attributes
nodes = [
    {'tag': 'p', 'children': [
        {'tag': 'a', 'attrs': {'href': 'https://example.com'}, 'children': ['Visit site']}
    ]}
]
html = nodes_to_html(nodes)
print(html)
# Output: '<p><a href="https://example.com">Visit site</a></p>'
```

### Round-trip Conversion

You can convert HTML to nodes and back to HTML:

```python
from telegraph.utils import html_to_nodes, nodes_to_html

original_html = '<p>Test <strong>content</strong> with <em>formatting</em>.</p>'
nodes = html_to_nodes(original_html)
converted_html = nodes_to_html(nodes)
print(converted_html)
# Output: '<p>Test <strong>content</strong> with <em>formatting</em>.</p>'
```

## Node Format Structure

Telegraph nodes use a specific JSON structure:

### Text Nodes

Plain strings represent text content:

```python
"Hello world"
```

### Element Nodes

Dictionaries represent HTML elements:

```python
{
    'tag': 'p',                    # Required: HTML tag name
    'attrs': {'id': 'content'},    # Optional: attributes dict
    'children': ['Text content']   # Optional: child nodes list
}
```
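
Because a node is either a plain string or a dictionary with an optional `children` list, a tree can be walked with a short recursive function. The sketch below is illustrative only: `node_text` is a hypothetical helper and is not part of the library.

```python
def node_text(node):
    """Return the concatenated text of a node (hypothetical helper)."""
    if isinstance(node, str):
        # Text nodes are plain strings.
        return node
    # Element nodes are dicts; 'children' is optional (e.g. for <img>).
    return ''.join(node_text(child) for child in node.get('children', []))


paragraph = {
    'tag': 'p',
    'attrs': {'id': 'content'},
    'children': ['Text with ', {'tag': 'strong', 'children': ['bold']}, '.'],
}
print(node_text(paragraph))  # Text with bold.
```

Using `node.get('children', [])` rather than `node['children']` keeps the helper safe for elements that carry no children.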

### Common Node Examples

```python
# Paragraph with text
{'tag': 'p', 'children': ['Simple paragraph']}

# Bold text
{'tag': 'strong', 'children': ['Bold text']}

# Link with attributes
{'tag': 'a', 'attrs': {'href': 'https://example.com'}, 'children': ['Link text']}

# Image (void element)
{'tag': 'img', 'attrs': {'src': '/file/image.jpg', 'alt': 'Description'}}

# Nested elements
{'tag': 'p', 'children': [
    'Text with ',
    {'tag': 'strong', 'children': ['bold']},
    ' and ',
    {'tag': 'em', 'children': ['italic']},
    ' formatting.'
]}
```
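
To make the mapping between these node shapes and HTML concrete, here is a minimal standard-library renderer. It is a sketch under stated assumptions, not the library's `nodes_to_html`: the name `render` and the reduced `VOID` set are made up for illustration, and only string attribute values are handled.

```python
from html import escape

# Assumed subset of void elements for this sketch (no closing tag is emitted).
VOID = {'br', 'hr', 'img'}


def render(nodes):
    """Render a list of nodes to an HTML string (illustrative only)."""
    parts = []
    for node in nodes:
        if isinstance(node, str):
            # Escape text so characters like '<' stay literal.
            parts.append(escape(node, quote=False))
            continue
        attrs = ''.join(
            ' {}="{}"'.format(name, escape(value))
            for name, value in node.get('attrs', {}).items()
        )
        parts.append('<{}{}>'.format(node['tag'], attrs))
        if node['tag'] not in VOID:
            # Non-void elements render their children, then a closing tag.
            parts.append(render(node.get('children', [])))
            parts.append('</{}>'.format(node['tag']))
    return ''.join(parts)


nodes = [{'tag': 'p', 'children': [
    'Text with ',
    {'tag': 'strong', 'children': ['bold']},
    ' and ',
    {'tag': 'a', 'attrs': {'href': 'https://example.com'}, 'children': ['a link']},
    '.',
]}]
print(render(nodes))
# <p>Text with <strong>bold</strong> and <a href="https://example.com">a link</a>.</p>
```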

## Allowed HTML Tags

Telegraph supports a restricted set of HTML tags:

- **Text formatting**: `b`, `strong`, `i`, `em`, `u`, `s`, `code`
- **Structure**: `p`, `br`, `h3`, `h4`, `hr`, `blockquote`, `pre`
- **Lists**: `ul`, `ol`, `li`
- **Media**: `img`, `video`, `iframe`, `figure`, `figcaption`
- **Links**: `a`
- **Semantic**: `aside`
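
A node tree built by hand can be checked against this list before it is sent anywhere. The following is a small sketch that assumes the tag set above; `ALLOWED` and `disallowed_tags` are hypothetical names introduced for the example.

```python
# Tag set copied from the list above.
ALLOWED = {
    'a', 'aside', 'b', 'blockquote', 'br', 'code', 'em', 'figcaption',
    'figure', 'h3', 'h4', 'hr', 'i', 'iframe', 'img', 'li', 'ol', 'p',
    'pre', 's', 'strong', 'u', 'ul', 'video',
}


def disallowed_tags(nodes):
    """Return the set of tag names in a node tree that are not in ALLOWED."""
    found = set()
    for node in nodes:
        if isinstance(node, dict):
            if node['tag'] not in ALLOWED:
                found.add(node['tag'])
            # Recurse into children; text nodes (strings) are skipped.
            found |= disallowed_tags(node.get('children', []))
    return found


nodes = [
    {'tag': 'p', 'children': ['ok ', {'tag': 'span', 'children': ['nope']}]},
    {'tag': 'h1', 'children': ['Title']},
]
print(sorted(disallowed_tags(nodes)))  # ['h1', 'span']
```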

## HTML Processing Rules

### Whitespace Handling

- Multiple whitespace characters are collapsed to single spaces
- Leading/trailing whitespace is trimmed appropriately
- Whitespace in `<pre>` tags is preserved exactly

```python
from telegraph.utils import html_to_nodes, nodes_to_html

# Multiple spaces collapsed
html = '<p>Multiple    spaces   here</p>'
nodes = html_to_nodes(html)
result = nodes_to_html(nodes)
print(result)  # '<p>Multiple spaces here</p>'

# Preformatted text preserved
html = '<pre>  Code with   spaces  </pre>'
nodes = html_to_nodes(html)
result = nodes_to_html(nodes)
print(result)  # '<pre>  Code with   spaces  </pre>'
```
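
The collapsing rule can be approximated with a regular expression. This is a sketch of the idea rather than the library's parser: `collapse_whitespace` is a hypothetical name, and trimming at element boundaries is not modeled.

```python
import re

_WHITESPACE = re.compile(r'\s+')


def collapse_whitespace(text, preformatted=False):
    """Collapse whitespace runs to one space unless text is preformatted."""
    if preformatted:
        # Mirrors the <pre> rule: leave the text untouched.
        return text
    return _WHITESPACE.sub(' ', text)


print(repr(collapse_whitespace('Multiple   spaces\n\there')))
# 'Multiple spaces here'
print(repr(collapse_whitespace('  keep   this  ', preformatted=True)))
# '  keep   this  '
```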

### Case Normalization

HTML tag names are automatically converted to lowercase:

```python
from telegraph.utils import html_to_nodes, nodes_to_html

html = '<P><STRONG>Upper case tags</STRONG></P>'
nodes = html_to_nodes(html)
result = nodes_to_html(nodes)
print(result)  # '<p><strong>Upper case tags</strong></p>'
```

## Error Handling

HTML utility functions raise specific exceptions for different error conditions:

```python
from telegraph.utils import html_to_nodes
from telegraph.exceptions import NotAllowedTag, InvalidHTML

# Handle disallowed tags
try:
    html = '<script>alert("bad")</script>'
    nodes = html_to_nodes(html)
except NotAllowedTag as e:
    print(f"Tag not allowed: {e}")

# Handle malformed HTML
try:
    html = '<p><strong>Unclosed tags</p>'
    nodes = html_to_nodes(html)
except InvalidHTML as e:
    print(f"Invalid HTML: {e}")

# Handle missing start tags
try:
    html = '</div><p>Content</p>'
    nodes = html_to_nodes(html)
except InvalidHTML as e:
    print(f"Missing start tag: {e}")
```
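
The two `InvalidHTML` cases above (a mismatched close and a close with no matching open) are what a stack of open tags detects naturally. The sketch below shows that technique with the standard library's `HTMLParser`. It is not the library's implementation: `TagMismatch`, `TagChecker`, `check`, and the reduced `VOID` set are all hypothetical.

```python
from html.parser import HTMLParser

VOID = {'br', 'hr', 'img'}  # assumed subset; these have no end tag


class TagMismatch(Exception):
    """Hypothetical stand-in for the library's InvalidHTML."""


class TagChecker(HTMLParser):
    """Track open tags on a stack and fail on the first mismatch."""

    def __init__(self):
        super().__init__()
        self.open_tags = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        if not self.open_tags:
            raise TagMismatch('"{}" missing start tag'.format(tag))
        expected = self.open_tags.pop()
        if expected != tag:
            raise TagMismatch(
                '"{}" closed instead of "{}"'.format(tag, expected))


def check(html):
    """Return None if tags are balanced, else the error message."""
    checker = TagChecker()
    try:
        checker.feed(html)
        checker.close()
    except TagMismatch as exc:
        return str(exc)
    if checker.open_tags:
        return '"{}" tag is not closed'.format(checker.open_tags[-1])
    return None


print(check('<p><strong>ok</strong></p>'))    # None
print(check('<p><strong>Unclosed tags</p>'))  # "p" closed instead of "strong"
print(check('</div><p>Content</p>'))          # "div" missing start tag
```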

## Integration with Telegraph API

Use utilities to work with different content formats:

```python
from telegraph import Telegraph
from telegraph.utils import html_to_nodes, nodes_to_html

telegraph = Telegraph(access_token='your_token')

# Create page with HTML, retrieve as nodes
html_content = '<p>Original <strong>HTML</strong> content.</p>'
response = telegraph.create_page(
    title='HTML Example',
    html_content=html_content
)

# Get page content as nodes
page = telegraph.get_page(response['path'], return_html=False)
nodes = page['content']

# Modify nodes programmatically
nodes.append({'tag': 'p', 'children': ['Added paragraph.']})

# Convert back to HTML and update page
updated_html = nodes_to_html(nodes)
telegraph.edit_page(
    response['path'],
    title='Updated HTML Example',
    html_content=updated_html
)
```

## Advanced Usage

### Custom Node Processing

```python
from telegraph.utils import html_to_nodes, nodes_to_html


def process_nodes(nodes):
    """Process nodes recursively to modify content."""
    processed = []
    for node in nodes:
        if isinstance(node, str):
            # Process text nodes
            processed.append(node.upper())
        elif isinstance(node, dict):
            # Process element nodes
            new_node = {'tag': node['tag']}
            if 'attrs' in node:
                new_node['attrs'] = node['attrs']
            if 'children' in node:
                new_node['children'] = process_nodes(node['children'])
            processed.append(new_node)
    return processed


# Apply custom processing
original_nodes = html_to_nodes('<p>Process <em>this</em> text.</p>')
modified_nodes = process_nodes(original_nodes)
result_html = nodes_to_html(modified_nodes)
print(result_html)  # '<p>PROCESS <em>THIS</em> TEXT.</p>'
```

## Additional Utilities

### JSON Serialization

Utility function for Telegraph-compatible JSON serialization.

```python { .api }
def json_dumps(*args, **kwargs) -> str:
    """
    Serialize object to JSON string with Telegraph-compatible formatting.

    Uses compact separators and ensures proper Unicode handling.
    Arguments passed through to json.dumps() with optimized defaults.

    Returns:
    str: JSON string with compact formatting
    """
```

Usage example:

```python
from telegraph.utils import json_dumps

# Serialize nodes for Telegraph API
nodes = [{'tag': 'p', 'children': ['Hello, world!']}]
json_string = json_dumps(nodes)
print(json_string)  # Compact JSON output
```
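
To show what "compact separators" and "proper Unicode handling" look like in practice, the sketch below calls the standard library's `json.dumps` directly. The exact defaults that `json_dumps` applies are an assumption here (`separators=(',', ':')` and `ensure_ascii=False`), and `compact_dumps` is a hypothetical name.

```python
import json


def compact_dumps(obj):
    """Assumed equivalent: compact separators, non-ASCII kept as-is."""
    return json.dumps(obj, separators=(',', ':'), ensure_ascii=False)


nodes = [{'tag': 'p', 'children': ['café, world!']}]
print(compact_dumps(nodes))
# [{"tag":"p","children":["café, world!"]}]
print(json.dumps(nodes))
# Default settings: spaces after separators and a \u00e9 escape for 'é'
```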

### File Handling Utility

Context manager for handling file uploads with proper resource management.

```python { .api }
class FilesOpener:
    """
    Context manager for opening and managing file objects for upload.

    Parameters:
    - paths (str|list): File path(s) or file-like object(s)
    - key_format (str): Format string for file keys, defaults to 'file{}'
    """
    def __init__(self, paths, key_format: str = 'file{}'):
        pass

    def __enter__(self) -> list:
        """
        Open files and return list of (key, (filename, file_object, mimetype)) tuples.
        """
        pass

    def __exit__(self, type, value, traceback):
        """
        Close all opened files.
        """
        pass
```

Usage example:

```python
from telegraph.utils import FilesOpener

# Handle single file
with FilesOpener('image.jpg') as files:
    print(files)  # [('file0', ('file0', <file_object>, 'image/jpeg'))]

# Handle multiple files
with FilesOpener(['img1.png', 'img2.jpg']) as files:
    for key, (filename, file_obj, mimetype) in files:
        print(f"{key}: {filename} ({mimetype})")
```
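
The resource-management pattern behind such a context manager can be sketched with the standard library alone. This is a hypothetical stand-in, not `FilesOpener` itself: `open_for_upload` accepts paths only (no file-like objects), and the fallback MIME type is an assumption.

```python
import mimetypes
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def open_for_upload(paths, key_format='file{}'):
    """Hypothetical sketch: open paths, yield upload tuples, always close."""
    if isinstance(paths, str):
        paths = [paths]
    opened = []
    try:
        entries = []
        for index, path in enumerate(paths):
            handle = open(path, 'rb')
            opened.append(handle)
            key = key_format.format(index)
            mimetype = mimetypes.guess_type(path)[0] or 'application/octet-stream'
            entries.append((key, (key, handle, mimetype)))
        yield entries
    finally:
        # Runs even if the body of the with-block raises.
        for handle in opened:
            handle.close()


# Self-contained demonstration using a throwaway file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'image.jpg')
    with open(path, 'wb') as f:
        f.write(b'placeholder bytes')
    with open_for_upload(path) as files:
        key, (name, handle, mimetype) = files[0]
        print(key, name, mimetype)  # file0 file0 image/jpeg
    print(handle.closed)  # True
```

Closing in a `finally` block is the point of the pattern: file handles are released whether or not the upload inside the `with` block succeeds.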

### Telegraph Constants

Important constants for HTML processing and validation.

```python { .api }
ALLOWED_TAGS: set = {
    'a', 'aside', 'b', 'blockquote', 'br', 'code', 'em', 'figcaption', 'figure',
    'h3', 'h4', 'hr', 'i', 'iframe', 'img', 'li', 'ol', 'p', 'pre', 's',
    'strong', 'u', 'ul', 'video'
}

VOID_ELEMENTS: set = {
    'area', 'base', 'br', 'col', 'embed', 'hr', 'img', 'input', 'keygen',
    'link', 'menuitem', 'meta', 'param', 'source', 'track', 'wbr'
}

BLOCK_ELEMENTS: set = {
    'address', 'article', 'aside', 'blockquote', 'canvas', 'dd', 'div', 'dl',
    'dt', 'fieldset', 'figcaption', 'figure', 'footer', 'form', 'h1', 'h2',
    'h3', 'h4', 'h5', 'h6', 'header', 'hgroup', 'hr', 'li', 'main', 'nav',
    'noscript', 'ol', 'output', 'p', 'pre', 'section', 'table', 'tfoot', 'ul',
    'video'
}
```

These constants can be imported and used for validation:

```python
from telegraph.utils import ALLOWED_TAGS, VOID_ELEMENTS, BLOCK_ELEMENTS

def validate_tag(tag_name):
    """Check if a tag is allowed by Telegraph."""
    return tag_name.lower() in ALLOWED_TAGS

def is_void_element(tag_name):
    """Check if a tag is a void element (self-closing)."""
    return tag_name.lower() in VOID_ELEMENTS

def is_block_element(tag_name):
    """Check if a tag is a block-level element."""
    return tag_name.lower() in BLOCK_ELEMENTS

# Usage
print(validate_tag('p'))       # True
print(validate_tag('script'))  # False
print(is_void_element('br'))   # True
print(is_block_element('p'))   # True
```

### Content Validation

```python
from telegraph import Telegraph
from telegraph.utils import html_to_nodes
from telegraph.exceptions import NotAllowedTag, InvalidHTML

telegraph = Telegraph(access_token='your_token')


def validate_content(html):
    """Validate HTML content for Telegraph compatibility."""
    try:
        # Parsing is the validation step; the resulting nodes are not needed here.
        html_to_nodes(html)
        return True, "Content is valid"
    except NotAllowedTag as e:
        return False, f"Contains disallowed tag: {e}"
    except InvalidHTML as e:
        return False, f"Invalid HTML structure: {e}"


# Validate before creating page
html = '<p>Valid content with <strong>formatting</strong>.</p>'
is_valid, message = validate_content(html)
if is_valid:
    telegraph.create_page(title='Validated Content', html_content=html)
else:
    print(f"Invalid content: {message}")
```