Tessl Tile for pypi/bleach@6.2.0

# Bleach

An easy safelist-based HTML-sanitizing tool that escapes or strips markup and attributes from untrusted HTML content. Bleach uses an allowlist approach to remove malicious content while preserving safe, intended HTML elements. It can also safely linkify text, applying more comprehensive filters than Django's urlize filter.

## Package Information

- **Package Name**: bleach
- **Language**: Python
- **Installation**: `pip install bleach`
- **Optional Dependencies**: `pip install bleach[css]` (for CSS sanitization with tinycss2)

## Core Imports

```python
import bleach
```

For main functions:

```python
from bleach import clean, linkify
```

For classes:

```python
from bleach.sanitizer import Cleaner, BleachSanitizerFilter, attribute_filter_factory
from bleach.linkifier import Linker, LinkifyFilter
from bleach.css_sanitizer import CSSSanitizer
```

For callbacks:

```python
from bleach.callbacks import nofollow, target_blank
```

For constants and utilities:

```python
from bleach.sanitizer import ALLOWED_TAGS, ALLOWED_ATTRIBUTES, ALLOWED_PROTOCOLS
from bleach.sanitizer import INVISIBLE_CHARACTERS, INVISIBLE_CHARACTERS_RE, INVISIBLE_REPLACEMENT_CHAR
from bleach.linkifier import DEFAULT_CALLBACKS, build_url_re, build_email_re, TLDS, URL_RE, EMAIL_RE, PROTO_RE
from bleach.css_sanitizer import ALLOWED_CSS_PROPERTIES, ALLOWED_SVG_PROPERTIES
from bleach import html5lib_shim  # For HTML_TAGS constant
from bleach import __version__, __releasedate__
```

## Basic Usage

```python
import bleach

# Basic HTML sanitization - escapes disallowed tags together with their attributes
unsafe_html = '<script>alert("XSS")</script><p onclick="evil()">Hello <b>world</b></p>'
safe_html = bleach.clean(unsafe_html)
# <script> and <p> are not in the default ALLOWED_TAGS, so both are escaped; <b> is kept
# Result: '&lt;script&gt;alert("XSS")&lt;/script&gt;&lt;p onclick="evil()"&gt;Hello <b>world</b>&lt;/p&gt;'

# Linkification - converts URLs to clickable links
text_with_urls = 'Visit https://example.com for more info!'
linked_text = bleach.linkify(text_with_urls)
# Result: 'Visit <a href="https://example.com" rel="nofollow">https://example.com</a> for more info!'

# Combined cleaning and linkifying
unsafe_text = 'Check out http://evil.com<script>alert("bad")</script>'
safe_linked = bleach.linkify(bleach.clean(unsafe_text))
```

## Capabilities

### HTML Sanitization

Cleans HTML fragments by removing or escaping malicious content using an allowlist-based approach.

```python { .api }
def clean(
    text: str,
    tags: frozenset = ALLOWED_TAGS,
    attributes: dict = ALLOWED_ATTRIBUTES,
    protocols: frozenset = ALLOWED_PROTOCOLS,
    strip: bool = False,
    strip_comments: bool = True,
    css_sanitizer: CSSSanitizer = None
) -> str:
    """
    Clean an HTML fragment of malicious content and return it.

    Parameters:
    - text: the HTML text to clean
    - tags: set of allowed tags; defaults to ALLOWED_TAGS
    - attributes: allowed attributes; can be callable, list or dict; defaults to ALLOWED_ATTRIBUTES
    - protocols: allowed list of protocols for links; defaults to ALLOWED_PROTOCOLS
    - strip: whether to strip disallowed elements instead of escaping
    - strip_comments: whether to strip HTML comments
    - css_sanitizer: instance with sanitize_css method for style attributes

    Returns:
    Cleaned text as unicode string
    """
```

### URL Linkification

Converts URL-like strings in HTML fragments to clickable links while preserving existing links and structure.

```python { .api }
def linkify(
    text: str,
    callbacks: list = DEFAULT_CALLBACKS,
    skip_tags: set = None,
    parse_email: bool = False
) -> str:
    """
    Convert URL-like strings in an HTML fragment to links.

    Parameters:
    - text: the text to linkify
    - callbacks: list of callbacks to run when adjusting tag attributes
    - skip_tags: set of tags to skip linkifying contents of
    - parse_email: whether to linkify email addresses

    Returns:
    Linkified text as unicode string
    """
```
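A minimal sketch of the `skip_tags` and `parse_email` parameters listed above; the input strings are illustrative only.

```python
from bleach import linkify

# Plain URL: wrapped in an <a> tag by the default callbacks.
linked = linkify("an http://example.com url")

# skip_tags: text inside <pre> is left untouched.
skipped = linkify("<pre>http://example.com</pre>", skip_tags={"pre"})

# parse_email=True also turns email addresses into mailto: links.
mailed = linkify("write to jdoe@example.com", parse_email=True)

print(linked)
print(skipped)
print(mailed)
```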

### Advanced HTML Cleaning

Configurable HTML cleaner for repeated use with consistent settings.

```python { .api }
class Cleaner:
    """
    Cleaner for cleaning HTML fragments of malicious content.
    Not thread-safe - create separate instances per thread.
    """

    def __init__(
        self,
        tags: frozenset = ALLOWED_TAGS,
        attributes: dict = ALLOWED_ATTRIBUTES,
        protocols: frozenset = ALLOWED_PROTOCOLS,
        strip: bool = False,
        strip_comments: bool = True,
        filters: list = None,
        css_sanitizer: CSSSanitizer = None
    ):
        """
        Initialize a Cleaner instance.

        Parameters:
        - tags: set of allowed tags
        - attributes: allowed attributes configuration
        - protocols: allowed protocols for links
        - strip: whether to strip disallowed elements
        - strip_comments: whether to strip HTML comments
        - filters: list of additional html5lib filters
        - css_sanitizer: CSS sanitizer instance
        """

    def clean(self, text: str) -> str:
        """
        Clean the specified HTML text.

        Parameters:
        - text: HTML text to clean

        Returns:
        Cleaned HTML text
        """
```
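Because a `Cleaner` is documented above as not thread-safe, one possible pattern is to keep an instance per thread in `threading.local`. This is only a sketch: `get_cleaner` and `worker` are hypothetical helpers, and the tag set is chosen for illustration.

```python
import threading

from bleach.sanitizer import Cleaner

_local = threading.local()


def get_cleaner() -> Cleaner:
    """Return a Cleaner owned by the calling thread (hypothetical helper)."""
    cleaner = getattr(_local, "cleaner", None)
    if cleaner is None:
        cleaner = Cleaner(tags={"b", "i"}, strip=True)
        _local.cleaner = cleaner
    return cleaner


results = []
cleaners = []


def worker(fragment: str) -> None:
    cleaner = get_cleaner()
    cleaners.append(cleaner)  # kept only so the two instances can be compared
    results.append(cleaner.clean(fragment))


threads = [threading.Thread(target=worker, args=("<b>x</b><u>y</u>",)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)
```

Each thread builds its `Cleaner` once and reuses it, so the configuration cost is paid per thread rather than per call.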

### Advanced URL Linkification

Configurable URL linkifier for repeated use with consistent settings.

```python { .api }
class Linker:
    """
    Convert URL-like strings in HTML fragments to links with configuration.
    """

    def __init__(
        self,
        callbacks: list = DEFAULT_CALLBACKS,
        skip_tags: set = None,
        parse_email: bool = False,
        url_re = URL_RE,
        email_re = EMAIL_RE,
        recognized_tags = html5lib_shim.HTML_TAGS
    ):
        """
        Create a Linker instance.

        Parameters:
        - callbacks: list of callbacks for adjusting tag attributes
        - skip_tags: set of tags to skip linkifying contents of
        - parse_email: whether to linkify email addresses
        - url_re: custom URL matching regex
        - email_re: custom email matching regex
        - recognized_tags: set of recognized HTML tags
        """

    def linkify(self, text: str) -> str:
        """
        Linkify the specified text.

        Parameters:
        - text: text to linkify

        Returns:
        Linkified text

        Raises:
        TypeError: if text is not a string type
        """
```

### Advanced Linkification Filter

HTML filter for linkifying during html5lib parsing, commonly used with Cleaner filters.

```python { .api }
class LinkifyFilter(html5lib_shim.Filter):
    """
    HTML filter that linkifies text during html5lib parsing.
    Can be used with Cleaner filters for combined cleaning and linkification.
    """

    def __init__(
        self,
        source,
        callbacks: list = DEFAULT_CALLBACKS,
        skip_tags: set = None,
        parse_email: bool = False,
        url_re = URL_RE,
        email_re = EMAIL_RE
    ):
        """
        Create a LinkifyFilter instance.

        Parameters:
        - source: html5lib TreeWalker stream
        - callbacks: list of callbacks for adjusting tag attributes
        - skip_tags: set of tags to skip linkifying contents of
        - parse_email: whether to linkify email addresses
        - url_re: custom URL matching regex
        - email_re: custom email matching regex
        """
```
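Since `source` is a required argument, the filter is handed to `Cleaner` as a class (or other callable) rather than as an instance, and `Cleaner` supplies the token stream itself. The sketch below shows that, plus `functools.partial` as one way to pre-bind options such as `skip_tags`; the tag set and input are illustrative.

```python
from functools import partial

from bleach.linkifier import LinkifyFilter
from bleach.sanitizer import Cleaner

# Pass the class itself: Cleaner calls it with the token stream as `source`.
linkifying = Cleaner(tags={"pre"}, filters=[LinkifyFilter])

# functools.partial pre-binds keyword arguments such as skip_tags.
skipping = Cleaner(tags={"pre"}, filters=[partial(LinkifyFilter, skip_tags={"pre"})])

fragment = "<pre>http://example.com</pre>"
linked = linkifying.clean(fragment)
untouched = skipping.clean(fragment)

print(linked)
print(untouched)
```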

### HTML Sanitization Filter

HTML filter for sanitizing content during html5lib parsing, commonly used with other filters.

```python { .api }
class BleachSanitizerFilter(html5lib_shim.SanitizerFilter):
    """
    HTML filter that sanitizes HTML during html5lib parsing.
    Can be used with other html5lib filters for custom processing.
    """

    def __init__(
        self,
        source,
        allowed_tags: frozenset = ALLOWED_TAGS,
        attributes = ALLOWED_ATTRIBUTES,
        allowed_protocols: frozenset = ALLOWED_PROTOCOLS,
        attr_val_is_uri = html5lib_shim.attr_val_is_uri,
        svg_attr_val_allows_ref = html5lib_shim.svg_attr_val_allows_ref,
        svg_allow_local_href = html5lib_shim.svg_allow_local_href,
        strip_disallowed_tags: bool = False,
        strip_html_comments: bool = True,
        css_sanitizer: CSSSanitizer = None
    ):
        """
        Create a BleachSanitizerFilter instance.

        Parameters:
        - source: html5lib TreeWalker stream
        - allowed_tags: set of allowed tags
        - attributes: allowed attributes configuration
        - allowed_protocols: allowed protocols for links
        - attr_val_is_uri: set of attributes that have URI values
        - svg_attr_val_allows_ref: set of SVG attributes that can have references
        - svg_allow_local_href: set of SVG elements that can have local hrefs
        - strip_disallowed_tags: whether to strip disallowed tags
        - strip_html_comments: whether to strip HTML comments
        - css_sanitizer: CSS sanitizer instance
        """
```

### CSS Sanitization

Sanitizes CSS declarations in `style` attribute values. Requires the optional `bleach[css]` extra (tinycss2).

```python { .api }
class CSSSanitizer:
    """
    CSS sanitizer for cleaning style attributes and style text.
    """

    def __init__(
        self,
        allowed_css_properties: frozenset = ALLOWED_CSS_PROPERTIES,
        allowed_svg_properties: frozenset = ALLOWED_SVG_PROPERTIES
    ):
        """
        Initialize CSS sanitizer.

        Parameters:
        - allowed_css_properties: set of allowed CSS properties
        - allowed_svg_properties: set of allowed SVG properties
        """

    def sanitize_css(self, style: str) -> str:
        """
        Sanitize CSS declarations.

        Parameters:
        - style: CSS declarations string

        Returns:
        Sanitized CSS string
        """
```

### Linkification Callbacks

Callback functions for customizing link attributes during linkification.

```python { .api }
def nofollow(attrs: dict, new: bool = False) -> dict:
    """
    Add rel="nofollow" to links (except mailto links).

    Parameters:
    - attrs: link attributes dictionary
    - new: whether this is a new link

    Returns:
    Modified attributes dictionary
    """

def target_blank(attrs: dict, new: bool = False) -> dict:
    """
    Add target="_blank" to links (except mailto links).

    Parameters:
    - attrs: link attributes dictionary
    - new: whether this is a new link

    Returns:
    Modified attributes dictionary
    """
```
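Custom callbacks follow the same `(attrs, new=False)` shape as the built-in ones. In Bleach's documented convention, attribute keys are `(namespace, name)` tuples, the special `"_text"` key holds the link text, and returning `None` prevents the link. The two callbacks below are hypothetical examples, exercised here on a plain dictionary so the sketch runs without Bleach installed.

```python
def set_title(attrs, new=False):
    """Add a title attribute to every link (hypothetical callback)."""
    attrs[(None, "title")] = "link in user text"
    return attrs


def skip_python_files(attrs, new=False):
    """Veto new links whose text ends in .py by returning None (hypothetical callback)."""
    if new and attrs.get("_text", "").endswith(".py"):
        return None
    return attrs


# Attribute keys are (namespace, name) tuples; "_text" holds the link text.
sample = {(None, "href"): "http://example.com", "_text": "example.com"}

titled = set_title(dict(sample))
kept = skip_python_files(dict(sample), new=True)
vetoed = skip_python_files({(None, "href"): "http://setup.py", "_text": "setup.py"}, new=True)

print(titled)
print(kept)
print(vetoed)
```

Such functions would typically be passed as `callbacks=[set_title, skip_python_files]` to `linkify` or `Linker`.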

### Attribute Filter Factory

Utility function for creating attribute filter functions from various attribute configurations.

```python { .api }
def attribute_filter_factory(attributes) -> callable:
    """
    Generate attribute filter function for the given attributes configuration.

    The attributes value can be a callable, dict, or list. This returns a filter
    function appropriate to the attributes value.

    Parameters:
    - attributes: attribute configuration (callable, dict, or list)

    Returns:
    Filter function that takes (tag, attr, value) and returns bool

    Raises:
    ValueError: if attributes is not a callable, list, or dict
    """
```
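To make the callable form concrete, here is a sketch of a filter with the `(tag, attr, value)` signature described above. The function name, the allowed attribute names, and the `example.com` host are assumptions chosen for illustration; the function is plain Python and is called directly here.

```python
from urllib.parse import urlparse


def allow_img_attr(tag, name, value):
    """Hypothetical filter: allow sizing attributes and relative or same-site src."""
    if name in ("alt", "height", "width"):
        return True
    if name == "src":
        host = urlparse(value).netloc
        return host in ("", "example.com")
    return False


checks = [
    allow_img_attr("img", "src", "/static/logo.png"),
    allow_img_attr("img", "src", "https://example.com/a.png"),
    allow_img_attr("img", "src", "https://evil.test/a.png"),
    allow_img_attr("img", "onerror", "alert(1)"),
]
print(checks)
```

A callable like this can be supplied as the `attributes` argument, or as a per-tag value such as `attributes={"img": allow_img_attr}`.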

### URL and Email Pattern Building

Functions for creating custom URL and email matching patterns.

```python { .api }
def build_url_re(
    tlds: list = TLDS,
    protocols = html5lib_shim.allowed_protocols
) -> re.Pattern:
    """
    Build URL regex with custom TLDs and protocols.

    Parameters:
    - tlds: list of top-level domains
    - protocols: set of allowed protocols

    Returns:
    Compiled regex pattern for URL matching
    """

def build_email_re(tlds: list = TLDS) -> re.Pattern:
    """
    Build email regex with custom TLDs.

    Parameters:
    - tlds: list of top-level domains

    Returns:
    Compiled regex pattern for email matching
    """
```
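A short sketch of how a custom pattern from `build_url_re` might be plugged into a `Linker` through its `url_re` parameter. The single-entry TLD list is deliberately narrow to make the effect visible.

```python
from bleach.linkifier import Linker, build_url_re

# Recognise only the "fish" top-level domain.
only_fish = build_url_re(tlds=["fish"])
linker = Linker(url_re=only_fish)

not_linked = linker.linkify("com TLD does not link https://example.com")
linked = linker.linkify("fish TLD links https://example.fish")

print(not_linked)
print(linked)
```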

## Constants

### Default Sanitization Settings

```python { .api }
# Default allowed HTML tags
ALLOWED_TAGS: frozenset = frozenset((
    "a", "abbr", "acronym", "b", "blockquote", "code",
    "em", "i", "li", "ol", "strong", "ul"
))

# Default allowed attributes by tag
ALLOWED_ATTRIBUTES: dict = {
    "a": ["href", "title"],
    "abbr": ["title"],
    "acronym": ["title"]
}

# Default allowed protocols for links
ALLOWED_PROTOCOLS: frozenset = frozenset(("http", "https", "mailto"))

# Invisible character handling (requires: from itertools import chain)
INVISIBLE_CHARACTERS: str = "".join([chr(c) for c in chain(range(0, 9), range(11, 13), range(14, 32))])
INVISIBLE_CHARACTERS_RE: re.Pattern = re.compile("[" + INVISIBLE_CHARACTERS + "]", re.UNICODE)
INVISIBLE_REPLACEMENT_CHAR: str = "?"
```
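To show which characters the invisible-character pattern covers, the sketch below rebuilds the three constants locally from the definitions above (so it has no dependencies) and applies the substitution by hand. How Bleach itself applies them during cleaning is not shown here.

```python
import re
from itertools import chain

# Rebuilt locally from the definitions shown above; not imported from bleach.
INVISIBLE_CHARACTERS = "".join(
    chr(c) for c in chain(range(0, 9), range(11, 13), range(14, 32))
)
INVISIBLE_CHARACTERS_RE = re.compile("[" + INVISIBLE_CHARACTERS + "]", re.UNICODE)
INVISIBLE_REPLACEMENT_CHAR = "?"

# NUL (0x00) and unit separator (0x1f) are covered; tab (0x09) and newline (0x0a) are not.
raw = "a\x00b\tc\x1fd\n"
visible = INVISIBLE_CHARACTERS_RE.sub(INVISIBLE_REPLACEMENT_CHAR, raw)

print(repr(visible))
```

The ranges skip code points 9, 10 and 13, so tab, line feed and carriage return pass through unchanged.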

### Default Linkification Settings

```python { .api }
# Default linkification callbacks
DEFAULT_CALLBACKS: list = [nofollow]

# Top-level domains for URL detection
TLDS: list = [
    "ac", "ad", "ae", "aero", "af", "ag", "ai", "al", "am", "an", "ao", "aq", "ar", "arpa", "as", "asia", "at", "au", "aw", "ax", "az",
    "ba", "bb", "bd", "be", "bf", "bg", "bh", "bi", "biz", "bj", "bm", "bn", "bo", "br", "bs", "bt", "bv", "bw", "by", "bz",
    "ca", "cat", "cc", "cd", "cf", "cg", "ch", "ci", "ck", "cl", "cm", "cn", "co", "com", "coop", "cr", "cu", "cv", "cx", "cy", "cz",
    "de", "dj", "dk", "dm", "do", "dz", "ec", "edu", "ee", "eg", "er", "es", "et", "eu", "fi", "fj", "fk", "fm", "fo", "fr",
    "ga", "gb", "gd", "ge", "gf", "gg", "gh", "gi", "gl", "gm", "gn", "gov", "gp", "gq", "gr", "gs", "gt", "gu", "gw", "gy",
    "hk", "hm", "hn", "hr", "ht", "hu", "id", "ie", "il", "im", "in", "info", "int", "io", "iq", "ir", "is", "it",
    "je", "jm", "jo", "jobs", "jp", "ke", "kg", "kh", "ki", "km", "kn", "kp", "kr", "kw", "ky", "kz",
    "la", "lb", "lc", "li", "lk", "lr", "ls", "lt", "lu", "lv", "ly", "ma", "mc", "md", "me", "mg", "mh", "mil", "mk", "ml", "mm", "mn", "mo", "mobi", "mp", "mq", "mr", "ms", "mt", "mu", "museum", "mv", "mw", "mx", "my", "mz",
    "na", "name", "nc", "ne", "net", "nf", "ng", "ni", "nl", "no", "np", "nr", "nu", "nz", "om", "org",
    "pa", "pe", "pf", "pg", "ph", "pk", "pl", "pm", "pn", "post", "pr", "pro", "ps", "pt", "pw", "py",
    "qa", "re", "ro", "rs", "ru", "rw", "sa", "sb", "sc", "sd", "se", "sg", "sh", "si", "sj", "sk", "sl", "sm", "sn", "so", "sr", "ss", "st", "su", "sv", "sx", "sy", "sz",
    "tc", "td", "tel", "tf", "tg", "th", "tj", "tk", "tl", "tm", "tn", "to", "tp", "tr", "travel", "tt", "tv", "tw", "tz",
    "ua", "ug", "uk", "us", "uy", "uz", "va", "vc", "ve", "vg", "vi", "vn", "vu", "wf", "ws", "xn", "xxx", "ye", "yt", "yu", "za", "zm", "zw"
]

# Default URL matching regex
URL_RE: re.Pattern = build_url_re()

# Default email matching regex
EMAIL_RE: re.Pattern = build_email_re()

# Protocol matching regex for URL detection
PROTO_RE: re.Pattern = re.compile(r"^[\w-]+:/{0,3}", re.IGNORECASE)
```
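The `PROTO_RE` pattern can be tried in isolation. The sketch below recompiles it locally from the definition above and uses a hypothetical helper, `leading_protocol`, to report the matched prefix.

```python
import re

# Recompiled locally from the definition shown above; not imported from bleach.
PROTO_RE = re.compile(r"^[\w-]+:/{0,3}", re.IGNORECASE)


def leading_protocol(url):
    """Return the protocol prefix matched at the start of url, or None (hypothetical helper)."""
    match = PROTO_RE.match(url)
    return match.group(0) if match else None


with_scheme = leading_protocol("https://example.com")
mail_scheme = leading_protocol("mailto:jdoe@example.com")
no_scheme = leading_protocol("example.com/path")

print(with_scheme, mail_scheme, no_scheme)
```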

### CSS Sanitization Settings

```python { .api }
# Allowed CSS properties
ALLOWED_CSS_PROPERTIES: frozenset = frozenset((
    "azimuth", "background-color", "border-bottom-color", "border-collapse",
    "border-color", "border-left-color", "border-right-color", "border-top-color",
    "clear", "color", "cursor", "direction", "display", "elevation", "float",
    "font", "font-family", "font-size", "font-style", "font-variant", "font-weight",
    "height", "letter-spacing", "line-height", "overflow", "pause", "pause-after",
    "pause-before", "pitch", "pitch-range", "richness", "speak", "speak-header",
    "speak-numeral", "speak-punctuation", "speech-rate", "stress", "text-align",
    "text-decoration", "text-indent", "unicode-bidi", "vertical-align",
    "voice-family", "volume", "white-space", "width"
))

# Allowed SVG properties
ALLOWED_SVG_PROPERTIES: frozenset = frozenset((
    "fill", "fill-opacity", "fill-rule", "stroke", "stroke-width",
    "stroke-linecap", "stroke-linejoin", "stroke-opacity"
))
```

### Package Version Information

```python { .api }
# Package version string
__version__: str = "6.2.0"

# Release date in YYYYMMDD format
__releasedate__: str = "20241029"
```

## Warning Classes

```python { .api }
class NoCssSanitizerWarning(UserWarning):
    """
    Warning raised when CSS sanitization is needed but no CSS sanitizer is configured.
    """
```

## Usage Examples

### Custom Sanitization Rules

```python
from bleach.sanitizer import Cleaner

# Custom allowed tags and attributes
custom_tags = {'p', 'strong', 'em', 'a', 'img'}
custom_attributes = {
    'a': ['href', 'title'],
    'img': ['src', 'alt', 'width', 'height']
}

# Create reusable cleaner
cleaner = Cleaner(
    tags=custom_tags,
    attributes=custom_attributes,
    strip=True  # Remove disallowed tags entirely
)

# Placeholder inputs standing in for untrusted user content
untrusted_html1 = '<p>Hello <span>world</span></p>'
untrusted_html2 = '<img src="/logo.png" alt="Logo" onerror="evil()">'

# Clean multiple texts with same rules
safe_text1 = cleaner.clean(untrusted_html1)
safe_text2 = cleaner.clean(untrusted_html2)
```

### CSS Sanitization

```python
import bleach
from bleach.css_sanitizer import CSSSanitizer, ALLOWED_CSS_PROPERTIES

# Create CSS sanitizer (requires the optional bleach[css] extra)
css_sanitizer = CSSSanitizer(
    allowed_css_properties=ALLOWED_CSS_PROPERTIES
)

# Clean HTML with CSS sanitization
html_with_styles = '<p style="color: red; background: javascript:alert();">Text</p>'
safe_html = bleach.clean(
    html_with_styles,
    tags={'p'},
    attributes={'p': ['style']},
    css_sanitizer=css_sanitizer
)
# "background" is not in ALLOWED_CSS_PROPERTIES, so that declaration is dropped
# Result: '<p style="color: red;">Text</p>'
```

### Custom Linkification

```python
from bleach.linkifier import Linker
from bleach.callbacks import target_blank, nofollow

# Custom linkifier with multiple callbacks
linker = Linker(
    callbacks=[nofollow, target_blank],
    skip_tags={'pre', 'code'},  # Don't linkify in code blocks
    parse_email=True
)

text = 'Email me at user@example.com or visit https://example.org'
linked = linker.linkify(text)
# The https link gets both rel="nofollow" and target="_blank";
# both callbacks leave mailto: links unchanged
```

### Combined Operations

```python
from bleach.sanitizer import Cleaner
from bleach.linkifier import LinkifyFilter

# Clean and linkify in single pass using LinkifyFilter
cleaner = Cleaner(
    tags={'p', 'a', 'strong'},
    attributes={'a': ['href', 'rel', 'target']},
    filters=[LinkifyFilter]  # Pass the class, not an instance; Cleaner supplies `source`
)

unsafe_text = '<script>alert("xss")</script><p>Visit https://example.com</p>'
result = cleaner.clean(unsafe_text)
```
