Tessl Tile for pypi/telegraph@2.2.0

# HTML Utilities

Utility functions for converting between HTML and Telegraph's internal node format. These functions handle HTML parsing, validation, and conversion while respecting Telegraph's allowed tag restrictions.

## Capabilities

### HTML to Nodes Conversion

Convert HTML content to Telegraph's internal node format.

```python { .api }
def html_to_nodes(html_content: str) -> list:
    """
    Convert HTML content to Telegraph nodes format.

    Parameters:
    - html_content (str): HTML string to convert

    Returns:
    list: Telegraph nodes representation of the HTML

    Raises:
    NotAllowedTag: HTML contains tags not allowed by Telegraph
    InvalidHTML: HTML is malformed or has mismatched tags
    """
```

Usage examples:

```python
from telegraph.utils import html_to_nodes

# Simple HTML conversion
html = '<p>Hello <strong>world</strong>!</p>'
nodes = html_to_nodes(html)
print(nodes)
# Output: [{'tag': 'p', 'children': ['Hello ', {'tag': 'strong', 'children': ['world']}, '!']}]

# Complex HTML with attributes
html = '<p><a href="https://example.com">Link</a></p>'
nodes = html_to_nodes(html)
print(nodes)
# Output: [{'tag': 'p', 'children': [{'tag': 'a', 'attrs': {'href': 'https://example.com'}, 'children': ['Link']}]}]

# HTML with images
html = '<figure><img src="/file/image.jpg" alt="Photo"><figcaption>Caption</figcaption></figure>'
nodes = html_to_nodes(html)
```

### Nodes to HTML Conversion

Convert Telegraph nodes back to HTML format.

```python { .api }
def nodes_to_html(nodes: list) -> str:
    """
    Convert Telegraph nodes to HTML format.

    Parameters:
    - nodes (list): Telegraph nodes to convert

    Returns:
    str: HTML representation of the nodes
    """
```

Usage examples:

```python
from telegraph.utils import nodes_to_html

# Convert nodes to HTML
nodes = [
    {'tag': 'p', 'children': ['Hello ', {'tag': 'em', 'children': ['world']}, '!']}
]
html = nodes_to_html(nodes)
print(html)
# Output: '<p>Hello <em>world</em>!</p>'

# Complex nodes with attributes
nodes = [
    {'tag': 'p', 'children': [
        {'tag': 'a', 'attrs': {'href': 'https://example.com'}, 'children': ['Visit site']}
    ]}
]
html = nodes_to_html(nodes)
print(html)
# Output: '<p><a href="https://example.com">Visit site</a></p>'
```
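
To make the mapping from nodes to markup concrete, here is a minimal recursive renderer that uses only the standard library. It is a sketch under stated assumptions, not the library's implementation: `render_nodes` and the small `VOID_TAGS` set are hypothetical names introduced for illustration, and the real `nodes_to_html` may differ in details such as how void elements are written.

```python
from html import escape

# Hypothetical subset of void elements, for illustration only.
VOID_TAGS = {'br', 'hr', 'img'}


def render_nodes(nodes):
    """Render a list of Telegraph-style nodes to an HTML string (illustrative sketch)."""
    parts = []
    for node in nodes:
        if isinstance(node, str):
            # Text nodes are escaped so that '<' and '&' stay literal text.
            parts.append(escape(node, quote=False))
            continue
        # Element node: serialize attributes, escaping quotes in values.
        attrs = ''.join(
            f' {name}="{escape(value)}"'
            for name, value in node.get('attrs', {}).items()
        )
        tag = node['tag']
        if tag in VOID_TAGS:
            # Void elements have no children and no closing tag.
            parts.append(f'<{tag}{attrs}>')
        else:
            children = render_nodes(node.get('children', []))
            parts.append(f'<{tag}{attrs}>{children}</{tag}>')
    return ''.join(parts)


example = [{'tag': 'p', 'children': ['Hello ', {'tag': 'em', 'children': ['world']}, '!']}]
rendered = render_nodes(example)
print(rendered)  # <p>Hello <em>world</em>!</p>
```

The recursion mirrors the node structure: strings become escaped text, dictionaries become an opening tag, their rendered children, and a closing tag.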

### Round-trip Conversion

You can convert HTML to nodes and back to HTML:

```python
from telegraph.utils import html_to_nodes, nodes_to_html

original_html = '<p>Test <strong>content</strong> with <em>formatting</em>.</p>'
nodes = html_to_nodes(original_html)
converted_html = nodes_to_html(nodes)
print(converted_html)
# Output: '<p>Test <strong>content</strong> with <em>formatting</em>.</p>'
```

## Node Format Structure

Telegraph nodes use a specific JSON structure:

### Text Nodes
Plain strings represent text content:
```python
"Hello world"
```

### Element Nodes
Dictionaries represent HTML elements:
```python
{
    'tag': 'p',                    # Required: HTML tag name
    'attrs': {'id': 'content'},    # Optional: attributes dict
    'children': ['Text content']   # Optional: child nodes list
}
```

The Telegraph API itself accepts only the `href` and `src` attributes; the `id` attribute above only illustrates the structure.

### Common Node Examples

```python
# Paragraph with text
{'tag': 'p', 'children': ['Simple paragraph']}

# Bold text
{'tag': 'strong', 'children': ['Bold text']}

# Link with attributes
{'tag': 'a', 'attrs': {'href': 'https://example.com'}, 'children': ['Link text']}

# Image (void element)
{'tag': 'img', 'attrs': {'src': '/file/image.jpg', 'alt': 'Description'}}

# Nested elements
{'tag': 'p', 'children': [
    'Text with ',
    {'tag': 'strong', 'children': ['bold']},
    ' and ',
    {'tag': 'em', 'children': ['italic']},
    ' formatting.'
]}
```
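
Because a node is either a string or a dictionary with optional `children`, most processing reduces to a short recursive walk. The sketch below extracts the plain text of a node tree; `nodes_to_text` is a hypothetical helper written for this page, not part of the library.

```python
def nodes_to_text(nodes):
    """Concatenate the text content of a node list, ignoring tags and attributes."""
    chunks = []
    for node in nodes:
        if isinstance(node, str):
            # Text node: keep as-is.
            chunks.append(node)
        else:
            # Element node: recurse into children, if any (void elements have none).
            chunks.append(nodes_to_text(node.get('children', [])))
    return ''.join(chunks)


nested = [
    {'tag': 'p', 'children': [
        'Text with ',
        {'tag': 'strong', 'children': ['bold']},
        ' and ',
        {'tag': 'em', 'children': ['italic']},
        ' formatting.',
    ]},
    {'tag': 'img', 'attrs': {'src': '/file/image.jpg'}},
]
text = nodes_to_text(nested)
print(text)  # Text with bold and italic formatting.
```

Using `node.get('children', [])` keeps the walk safe for elements such as `img` that carry no `children` key.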

## Allowed HTML Tags

Telegraph supports a restricted set of HTML tags:

**Text formatting**: `b`, `strong`, `i`, `em`, `u`, `s`, `code`  
**Structure**: `p`, `br`, `h3`, `h4`, `hr`, `blockquote`, `pre`  
**Lists**: `ul`, `ol`, `li`  
**Media**: `img`, `video`, `iframe`, `figure`, `figcaption`  
**Links**: `a`  
**Semantic**: `aside`  
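
When nodes are built by hand rather than parsed from HTML, it can help to check them against this list before sending them anywhere. The following is a minimal sketch: `find_disallowed_tags` is a hypothetical helper, and `ALLOWED` is a local copy of the tags listed above so the example runs without the library installed.

```python
# Local copy of the tag list above (an assumption made to keep the sketch standalone).
ALLOWED = {
    'a', 'aside', 'b', 'blockquote', 'br', 'code', 'em', 'figcaption', 'figure',
    'h3', 'h4', 'hr', 'i', 'iframe', 'img', 'li', 'ol', 'p', 'pre', 's',
    'strong', 'u', 'ul', 'video',
}


def find_disallowed_tags(nodes, allowed=ALLOWED):
    """Return a sorted list of tag names in a node tree that are not in `allowed`."""
    found = set()
    for node in nodes:
        if isinstance(node, dict):
            if node['tag'] not in allowed:
                found.add(node['tag'])
            # Keep walking so nested offenders are reported too.
            found.update(find_disallowed_tags(node.get('children', []), allowed))
    return sorted(found)


tree = [{'tag': 'div', 'children': [
    {'tag': 'p', 'children': ['ok ', {'tag': 'span', 'children': ['not ok']}]},
]}]
print(find_disallowed_tags(tree))  # ['div', 'span']
```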

## HTML Processing Rules

### Whitespace Handling
- Multiple whitespace characters are collapsed to single spaces
- Leading/trailing whitespace is trimmed appropriately
- Whitespace in `<pre>` tags is preserved exactly

```python
from telegraph.utils import html_to_nodes, nodes_to_html

# Multiple spaces collapsed
html = '<p>Multiple    spaces   here</p>'
nodes = html_to_nodes(html)
result = nodes_to_html(nodes)
print(result)  # '<p>Multiple spaces here</p>'

# Preformatted text preserved
html = '<pre>  Code with    spaces  </pre>'
nodes = html_to_nodes(html)
result = nodes_to_html(nodes)
print(result)  # '<pre>  Code with    spaces  </pre>'
```
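
The collapsing rule itself can be illustrated with a regular expression. This is only a sketch of the rule as described above, not the library's code: `collapse_whitespace` and its `preformatted` flag are hypothetical, and the trimming of leading and trailing whitespace is deliberately left out.

```python
import re

# One or more whitespace characters (spaces, tabs, newlines).
WHITESPACE_RUN = re.compile(r'\s+')


def collapse_whitespace(text, preformatted=False):
    """Collapse whitespace runs to one space unless the text is preformatted."""
    if preformatted:
        # Text inside <pre> is returned untouched.
        return text
    return WHITESPACE_RUN.sub(' ', text)


print(collapse_whitespace('Multiple    spaces   here'))  # Multiple spaces here
print(repr(collapse_whitespace('  Code with    spaces  ', preformatted=True)))
```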

### Case Normalization
HTML tag names are automatically converted to lowercase:

```python
from telegraph.utils import html_to_nodes, nodes_to_html

html = '<P><STRONG>Upper case tags</STRONG></P>'
nodes = html_to_nodes(html)
result = nodes_to_html(nodes)
print(result)  # '<p><strong>Upper case tags</strong></p>'
```
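
Python's standard `html.parser.HTMLParser` reports tag names in lowercase, which is one way to obtain this behavior. The sketch below shows it in isolation; `TagCollector` is a hypothetical class, and whether the library relies on this parser is an assumption here.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Record the name of every start tag the parser reports."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes `tag` already converted to lowercase.
        self.tags.append(tag)


collector = TagCollector()
collector.feed('<P><STRONG>Upper case tags</STRONG></P>')
collector.close()
print(collector.tags)  # ['p', 'strong']
```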

## Error Handling

HTML utility functions raise specific exceptions for different error conditions:

```python
from telegraph.utils import html_to_nodes
from telegraph.exceptions import NotAllowedTag, InvalidHTML

# Handle disallowed tags
try:
    html = '<script>alert("bad")</script>'
    nodes = html_to_nodes(html)
except NotAllowedTag as e:
    print(f"Tag not allowed: {e}")

# Handle malformed HTML
try:
    html = '<p><strong>Unclosed tags</p>'
    nodes = html_to_nodes(html)
except InvalidHTML as e:
    print(f"Invalid HTML: {e}")

# Handle missing start tags
try:
    html = '</div><p>Content</p>'
    nodes = html_to_nodes(html)
except InvalidHTML as e:
    print(f"Missing start tag: {e}")
```
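
The three malformed-HTML conditions above (a mismatched close, a close with no matching open, and a tag left open) can all be detected with a stack of open tags. The following standalone sketch shows the idea using only the standard library. `BalanceChecker`, `check_balance`, the reduced `VOID` set, and the error messages are hypothetical; it raises `ValueError` rather than the library's `InvalidHTML`.

```python
from html.parser import HTMLParser

# Hypothetical subset of void elements; these never get a closing tag.
VOID = {'br', 'hr', 'img'}


class BalanceChecker(HTMLParser):
    """Track open tags on a stack and raise ValueError on the first mismatch."""

    def __init__(self):
        super().__init__()
        self.open_tags = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        if not self.open_tags:
            raise ValueError(f'"{tag}" missing start tag')
        if self.open_tags[-1] != tag:
            raise ValueError(f'"{tag}" closed instead of "{self.open_tags[-1]}"')
        self.open_tags.pop()


def check_balance(html):
    """Raise ValueError if tags in `html` are mismatched or left open."""
    checker = BalanceChecker()
    checker.feed(html)
    checker.close()
    if checker.open_tags:
        raise ValueError(f'"{checker.open_tags[-1]}" is not closed')


for sample in ('<p><strong>Unclosed tags</p>', '</div><p>Content</p>', '<p>Fine</p>'):
    try:
        check_balance(sample)
        print('balanced:', sample)
    except ValueError as error:
        print('problem:', error)
```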

## Integration with Telegraph API

Use utilities to work with different content formats:

```python
from telegraph import Telegraph
from telegraph.utils import html_to_nodes, nodes_to_html

telegraph = Telegraph(access_token='your_token')

# Create page with HTML, retrieve as nodes
html_content = '<p>Original <strong>HTML</strong> content.</p>'
response = telegraph.create_page(
    title='HTML Example',
    html_content=html_content
)

# Get page content as nodes
page = telegraph.get_page(response['path'], return_html=False)
nodes = page['content']

# Modify nodes programmatically
nodes.append({'tag': 'p', 'children': ['Added paragraph.']})

# Convert back to HTML and update page
updated_html = nodes_to_html(nodes)
telegraph.edit_page(
    response['path'],
    title='Updated HTML Example',
    html_content=updated_html
)
```

## Advanced Usage

### Custom Node Processing

```python
from telegraph.utils import html_to_nodes, nodes_to_html

def process_nodes(nodes):
    """Process nodes recursively to modify content."""
    processed = []
    for node in nodes:
        if isinstance(node, str):
            # Process text nodes
            processed.append(node.upper())
        elif isinstance(node, dict):
            # Process element nodes
            new_node = {'tag': node['tag']}
            if 'attrs' in node:
                new_node['attrs'] = node['attrs']
            if 'children' in node:
                new_node['children'] = process_nodes(node['children'])
            processed.append(new_node)
    return processed

# Apply custom processing
original_nodes = html_to_nodes('<p>Process <em>this</em> text.</p>')
modified_nodes = process_nodes(original_nodes)
result_html = nodes_to_html(modified_nodes)
print(result_html)  # '<p>PROCESS <em>THIS</em> TEXT.</p>'
```

## Additional Utilities

### JSON Serialization

Utility function for Telegraph-compatible JSON serialization.

```python { .api }
def json_dumps(*args, **kwargs) -> str:
    """
    Serialize object to JSON string with Telegraph-compatible formatting.

    Uses compact separators and ensures proper Unicode handling.
    Arguments passed through to json.dumps() with optimized defaults.

    Returns:
    str: JSON string with compact formatting
    """
```

Usage example:

```python
from telegraph.utils import json_dumps

# Serialize nodes for Telegraph API
nodes = [{'tag': 'p', 'children': ['Hello, world!']}]
json_string = json_dumps(nodes)
print(json_string)  # Compact JSON output
```
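
To see what "compact separators" and "proper Unicode handling" mean in practice, the standard `json.dumps` can be called with `separators=(',', ':')` and `ensure_ascii=False`. That these are the exact defaults `json_dumps` applies is an assumption; the sketch only compares them with the standard library's own defaults.

```python
import json

nodes = [{'tag': 'p', 'children': ['Café']}]

# Standard defaults: spaces after separators, non-ASCII escaped as \uXXXX.
default_json = json.dumps(nodes)

# Compact separators and literal Unicode characters.
compact_json = json.dumps(nodes, separators=(',', ':'), ensure_ascii=False)

print(default_json)  # [{"tag": "p", "children": ["Caf\u00e9"]}]
print(compact_json)  # [{"tag":"p","children":["Café"]}]
```

Both strings decode to the same data; the compact form is simply shorter on the wire.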

### File Handling Utility

Context manager for handling file uploads with proper resource management.

```python { .api }
class FilesOpener:
    """
    Context manager for opening and managing file objects for upload.

    Parameters:
    - paths (str|list): File path(s) or file-like object(s)
    - key_format (str): Format string for file keys, defaults to 'file{}'
    """
    def __init__(self, paths, key_format: str = 'file{}'):
        pass

    def __enter__(self) -> list:
        """
        Open files and return list of (key, (filename, file_object, mimetype)) tuples.
        """
        pass

    def __exit__(self, type, value, traceback):
        """
        Close all opened files.
        """
        pass
```

Usage example:

```python
from telegraph.utils import FilesOpener

# Handle single file
with FilesOpener('image.jpg') as files:
    print(files)  # [('file0', ('file0', <file_object>, 'image/jpeg'))]

# Handle multiple files
with FilesOpener(['img1.png', 'img2.jpg']) as files:
    for key, (filename, file_obj, mimetype) in files:
        print(f"{key}: {filename} ({mimetype})")
```
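
The keys and MIME types in those tuples can be previewed without opening any file. The sketch below numbers the paths with `key_format` and guesses each type from the extension using the standard `mimetypes` module. `describe_uploads` is a hypothetical helper, and that `FilesOpener` derives its MIME types the same way is an assumption.

```python
import mimetypes


def describe_uploads(paths, key_format='file{}'):
    """Return (key, path, guessed MIME type) for each path without opening files."""
    if not isinstance(paths, list):
        # Accept a single path as well as a list of paths.
        paths = [paths]
    return [
        (key_format.format(index), path, mimetypes.guess_type(path)[0])
        for index, path in enumerate(paths)
    ]


for key, path, mimetype in describe_uploads(['img1.png', 'img2.jpg']):
    print(f"{key}: {path} ({mimetype})")
```

`mimetypes.guess_type` returns `None` as the type when the extension is unknown, so callers may want to handle that case before uploading.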

### Telegraph Constants

Important constants for HTML processing and validation.

```python { .api }
ALLOWED_TAGS: set = {
    'a', 'aside', 'b', 'blockquote', 'br', 'code', 'em', 'figcaption', 'figure',
    'h3', 'h4', 'hr', 'i', 'iframe', 'img', 'li', 'ol', 'p', 'pre', 's',
    'strong', 'u', 'ul', 'video'
}

VOID_ELEMENTS: set = {
    'area', 'base', 'br', 'col', 'embed', 'hr', 'img', 'input', 'keygen',
    'link', 'menuitem', 'meta', 'param', 'source', 'track', 'wbr'
}

BLOCK_ELEMENTS: set = {
    'address', 'article', 'aside', 'blockquote', 'canvas', 'dd', 'div', 'dl',
    'dt', 'fieldset', 'figcaption', 'figure', 'footer', 'form', 'h1', 'h2',
    'h3', 'h4', 'h5', 'h6', 'header', 'hgroup', 'hr', 'li', 'main', 'nav',
    'noscript', 'ol', 'output', 'p', 'pre', 'section', 'table', 'tfoot', 'ul',
    'video'
}
```

These constants can be imported and used for validation:

```python
from telegraph.utils import ALLOWED_TAGS, VOID_ELEMENTS, BLOCK_ELEMENTS

def validate_tag(tag_name):
    """Check if a tag is allowed by Telegraph."""
    return tag_name.lower() in ALLOWED_TAGS

def is_void_element(tag_name):
    """Check if a tag is a void element (self-closing)."""
    return tag_name.lower() in VOID_ELEMENTS

def is_block_element(tag_name):
    """Check if a tag is a block-level element."""
    return tag_name.lower() in BLOCK_ELEMENTS

# Usage
print(validate_tag('p'))      # True
print(validate_tag('script')) # False
print(is_void_element('br'))  # True
print(is_block_element('p'))  # True
```

### Content Validation

```python
from telegraph import Telegraph
from telegraph.utils import html_to_nodes
from telegraph.exceptions import NotAllowedTag, InvalidHTML

telegraph = Telegraph(access_token='your_token')

def validate_content(html):
    """Validate HTML content for Telegraph compatibility."""
    try:
        # Parsing raises if the HTML is malformed or uses a disallowed tag
        html_to_nodes(html)
        return True, "Content is valid"
    except NotAllowedTag as e:
        return False, f"Contains disallowed tag: {e}"
    except InvalidHTML as e:
        return False, f"Invalid HTML structure: {e}"

# Validate before creating page
html = '<p>Valid content with <strong>formatting</strong>.</p>'
is_valid, message = validate_content(html)
if is_valid:
    telegraph.create_page(title='Validated Content', html_content=html)
else:
    print(f"Invalid content: {message}")
```
