# Bleach

An easy safelist-based HTML-sanitizing tool that escapes or strips markup and attributes from untrusted HTML content. Bleach uses an allowlist approach to remove malicious content while preserving safe, intended HTML elements. It can also safely linkify text, applying more comprehensive filters than Django's urlize filter.

## Package Information

- **Package Name**: bleach
- **Language**: Python
- **Installation**: `pip install bleach`
- **Optional Dependencies**: `pip install bleach[css]` (for CSS sanitization with tinycss2)

## Core Imports

```python
import bleach
```

For main functions:

```python
from bleach import clean, linkify
```

For classes:

```python
from bleach.sanitizer import Cleaner, BleachSanitizerFilter, attribute_filter_factory
from bleach.linkifier import Linker, LinkifyFilter
from bleach.css_sanitizer import CSSSanitizer
```

For callbacks:

```python
from bleach.callbacks import nofollow, target_blank
```

For constants and utilities:

```python
from bleach.sanitizer import ALLOWED_TAGS, ALLOWED_ATTRIBUTES, ALLOWED_PROTOCOLS
from bleach.sanitizer import INVISIBLE_CHARACTERS, INVISIBLE_CHARACTERS_RE, INVISIBLE_REPLACEMENT_CHAR
from bleach.linkifier import DEFAULT_CALLBACKS, build_url_re, build_email_re, TLDS, URL_RE, EMAIL_RE, PROTO_RE
from bleach.css_sanitizer import ALLOWED_CSS_PROPERTIES, ALLOWED_SVG_PROPERTIES
from bleach import html5lib_shim  # For HTML_TAGS constant
from bleach import __version__, __releasedate__
```

## Basic Usage

```python
import bleach

# Basic HTML sanitization: escapes disallowed tags and drops disallowed attributes.
# "p" is not in the default ALLOWED_TAGS, so it is allowed explicitly here.
unsafe_html = '<script>alert("XSS")</script><p onclick="evil()">Hello <b>world</b></p>'
safe_html = bleach.clean(unsafe_html, tags={'p', 'b'})
# Result: '&lt;script&gt;alert("XSS")&lt;/script&gt;<p>Hello <b>world</b></p>'

# Linkification - converts URLs to clickable links
text_with_urls = 'Visit https://example.com for more info!'
linked_text = bleach.linkify(text_with_urls)
# Result: 'Visit <a href="https://example.com" rel="nofollow">https://example.com</a> for more info!'

# Combined cleaning and linkifying
unsafe_text = 'Check out http://evil.com<script>alert("bad")</script>'
safe_linked = bleach.linkify(bleach.clean(unsafe_text))
```

## Capabilities

### HTML Sanitization

Cleans HTML fragments by removing or escaping malicious content using an allowlist-based approach.

```python { .api }
def clean(
    text: str,
    tags: frozenset = ALLOWED_TAGS,
    attributes: dict = ALLOWED_ATTRIBUTES,
    protocols: frozenset = ALLOWED_PROTOCOLS,
    strip: bool = False,
    strip_comments: bool = True,
    css_sanitizer: CSSSanitizer = None
) -> str:
    """
    Clean an HTML fragment of malicious content and return it.

    Parameters:
    - text: the HTML text to clean
    - tags: set of allowed tags; defaults to ALLOWED_TAGS
    - attributes: allowed attributes; can be callable, list or dict; defaults to ALLOWED_ATTRIBUTES
    - protocols: allowed list of protocols for links; defaults to ALLOWED_PROTOCOLS
    - strip: whether to strip disallowed elements instead of escaping
    - strip_comments: whether to strip HTML comments
    - css_sanitizer: instance with sanitize_css method for style attributes

    Returns:
    Cleaned text as unicode string
    """
```
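
To make the `strip` and `strip_comments` parameters concrete, the following is a minimal sketch that relies only on the `clean` signature above and the default allowlists. The input strings are illustrative.

```python
import bleach

html = "<span>is not allowed</span>"

# Default behaviour: the disallowed tag is escaped, so it is rendered as text.
escaped = bleach.clean(html)

# strip=True: the disallowed tag is dropped, but its text content is kept.
stripped = bleach.clean(html, strip=True)

# Comments are removed by default; strip_comments=False keeps them.
without_comment = bleach.clean("my<!-- commented --> html")
with_comment = bleach.clean("my<!-- commented --> html", strip_comments=False)

print(escaped)          # &lt;span&gt;is not allowed&lt;/span&gt;
print(stripped)         # is not allowed
print(without_comment)  # my html
print(with_comment)     # my<!-- commented --> html
```

Escaping is the safer default when the surrounding page should show exactly what the user typed; stripping suits cases where stray markup should simply disappear.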

### URL Linkification

Converts URL-like strings in HTML fragments to clickable links while preserving existing links and structure.

```python { .api }
def linkify(
    text: str,
    callbacks: list = DEFAULT_CALLBACKS,
    skip_tags: set = None,
    parse_email: bool = False
) -> str:
    """
    Convert URL-like strings in an HTML fragment to links.

    Parameters:
    - text: the text to linkify
    - callbacks: list of callbacks to run when adjusting tag attributes
    - skip_tags: set of tags to skip linkifying contents of
    - parse_email: whether to linkify email addresses

    Returns:
    Linkified text as unicode string
    """
```
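
As a short illustration of `skip_tags` and `parse_email`, the sketch below calls `linkify` with both options set. The sample text is made up, and the exact attribute order in the generated markup is not asserted here.

```python
import bleach

text = "Write to user@example.com or see <pre>http://example.com</pre>"

# parse_email=True turns the address into a mailto: link;
# skip_tags={"pre"} leaves URL-like text inside <pre> untouched.
linked = bleach.linkify(text, skip_tags={"pre"}, parse_email=True)

print(linked)
```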

### Advanced HTML Cleaning

Configurable HTML cleaner for repeated use with consistent settings.

```python { .api }
class Cleaner:
    """
    Cleaner for cleaning HTML fragments of malicious content.
    Not thread-safe - create separate instances per thread.
    """

    def __init__(
        self,
        tags: frozenset = ALLOWED_TAGS,
        attributes: dict = ALLOWED_ATTRIBUTES,
        protocols: frozenset = ALLOWED_PROTOCOLS,
        strip: bool = False,
        strip_comments: bool = True,
        filters: list = None,
        css_sanitizer: CSSSanitizer = None
    ):
        """
        Initialize a Cleaner instance.

        Parameters:
        - tags: set of allowed tags
        - attributes: allowed attributes configuration
        - protocols: allowed protocols for links
        - strip: whether to strip disallowed elements
        - strip_comments: whether to strip HTML comments
        - filters: list of additional html5lib filters
        - css_sanitizer: CSS sanitizer instance
        """

    def clean(self, text: str) -> str:
        """
        Clean the specified HTML text.

        Parameters:
        - text: HTML text to clean

        Returns:
        Cleaned HTML text
        """
```
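
Because a `Cleaner` is not thread-safe, one way to follow the "separate instances per thread" advice is to keep an instance in thread-local storage. This is only a sketch of that pattern: `get_cleaner` and `sanitize` are hypothetical helper names, and the allowlist is arbitrary.

```python
import threading

from bleach.sanitizer import Cleaner

# Each thread sees its own attributes on this object.
_local = threading.local()


def get_cleaner():
    """Return this thread's Cleaner, creating it on first use (hypothetical helper)."""
    cleaner = getattr(_local, "cleaner", None)
    if cleaner is None:
        cleaner = Cleaner(tags={"b", "i"}, strip=True)
        _local.cleaner = cleaner
    return cleaner


def sanitize(html):
    """Clean html with the calling thread's own Cleaner instance."""
    return get_cleaner().clean(html)


print(sanitize("<b>ok</b><script>x</script>"))
```

Within one thread the same instance is reused on every call, so the configuration cost is paid once per thread rather than once per document.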

### Advanced URL Linkification

Configurable URL linkifier for repeated use with consistent settings.

```python { .api }
class Linker:
    """
    Convert URL-like strings in HTML fragments to links with configuration.
    """

    def __init__(
        self,
        callbacks: list = DEFAULT_CALLBACKS,
        skip_tags: set = None,
        parse_email: bool = False,
        url_re = URL_RE,
        email_re = EMAIL_RE,
        recognized_tags = html5lib_shim.HTML_TAGS
    ):
        """
        Create a Linker instance.

        Parameters:
        - callbacks: list of callbacks for adjusting tag attributes
        - skip_tags: set of tags to skip linkifying contents of
        - parse_email: whether to linkify email addresses
        - url_re: custom URL matching regex
        - email_re: custom email matching regex
        - recognized_tags: set of recognized HTML tags
        """

    def linkify(self, text: str) -> str:
        """
        Linkify the specified text.

        Parameters:
        - text: text to linkify

        Returns:
        Linkified text

        Raises:
        TypeError: if text is not a string type
        """
```

### Advanced Linkification Filter

HTML filter for linkifying during html5lib parsing, commonly used with Cleaner filters.

```python { .api }
class LinkifyFilter(html5lib_shim.Filter):
    """
    HTML filter that linkifies text during html5lib parsing.
    Can be used with Cleaner filters for combined cleaning and linkification.
    """

    def __init__(
        self,
        source,
        callbacks: list = DEFAULT_CALLBACKS,
        skip_tags: set = None,
        parse_email: bool = False,
        url_re = URL_RE,
        email_re = EMAIL_RE
    ):
        """
        Create a LinkifyFilter instance.

        Parameters:
        - source: html5lib TreeWalker stream
        - callbacks: list of callbacks for adjusting tag attributes
        - skip_tags: set of tags to skip linkifying contents of
        - parse_email: whether to linkify email addresses
        - url_re: custom URL matching regex
        - email_re: custom email matching regex
        """
```

### HTML Sanitization Filter

HTML filter for sanitizing content during html5lib parsing, commonly used with other filters.

```python { .api }
class BleachSanitizerFilter(html5lib_shim.SanitizerFilter):
    """
    HTML filter that sanitizes HTML during html5lib parsing.
    Can be used with other html5lib filters for custom processing.
    """

    def __init__(
        self,
        source,
        allowed_tags: frozenset = ALLOWED_TAGS,
        attributes = ALLOWED_ATTRIBUTES,
        allowed_protocols: frozenset = ALLOWED_PROTOCOLS,
        attr_val_is_uri = html5lib_shim.attr_val_is_uri,
        svg_attr_val_allows_ref = html5lib_shim.svg_attr_val_allows_ref,
        svg_allow_local_href = html5lib_shim.svg_allow_local_href,
        strip_disallowed_tags: bool = False,
        strip_html_comments: bool = True,
        css_sanitizer: CSSSanitizer = None
    ):
        """
        Create a BleachSanitizerFilter instance.

        Parameters:
        - source: html5lib TreeWalker stream
        - allowed_tags: set of allowed tags
        - attributes: allowed attributes configuration
        - allowed_protocols: allowed protocols for links
        - attr_val_is_uri: set of attributes that have URI values
        - svg_attr_val_allows_ref: set of SVG attributes that can have references
        - svg_allow_local_href: set of SVG elements that can have local hrefs
        - strip_disallowed_tags: whether to strip disallowed tags
        - strip_html_comments: whether to strip HTML comments
        - css_sanitizer: CSS sanitizer instance
        """
```

### CSS Sanitization

Sanitizes CSS declarations in style attributes and style elements.

```python { .api }
class CSSSanitizer:
    """
    CSS sanitizer for cleaning style attributes and style text.
    """

    def __init__(
        self,
        allowed_css_properties: frozenset = ALLOWED_CSS_PROPERTIES,
        allowed_svg_properties: frozenset = ALLOWED_SVG_PROPERTIES
    ):
        """
        Initialize CSS sanitizer.

        Parameters:
        - allowed_css_properties: set of allowed CSS properties
        - allowed_svg_properties: set of allowed SVG properties
        """

    def sanitize_css(self, style: str) -> str:
        """
        Sanitize CSS declarations.

        Parameters:
        - style: CSS declarations string

        Returns:
        Sanitized CSS string
        """
```

### Linkification Callbacks

Callback functions for customizing link attributes during linkification.

```python { .api }
def nofollow(attrs: dict, new: bool = False) -> dict:
    """
    Add rel="nofollow" to links (except mailto links).

    Parameters:
    - attrs: link attributes dictionary
    - new: whether this is a new link

    Returns:
    Modified attributes dictionary
    """

def target_blank(attrs: dict, new: bool = False) -> dict:
    """
    Add target="_blank" to links (except mailto links).

    Parameters:
    - attrs: link attributes dictionary
    - new: whether this is a new link

    Returns:
    Modified attributes dictionary
    """
```
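
Custom callbacks follow the same `(attrs, new=False)` shape as `nofollow` and `target_blank`. The sketch below shows two hypothetical callbacks, `set_title` and `skip_python_filenames`. It assumes the conventions Bleach uses for the `attrs` dictionary: attribute keys are `(namespace, name)` tuples, the link text is stored under `"_text"`, and returning `None` prevents the link from being created.

```python
def set_title(attrs, new=False):
    """Hypothetical callback: add a title attribute to every link."""
    attrs[(None, "title")] = "link in user text"
    return attrs


def skip_python_filenames(attrs, new=False):
    """Hypothetical callback: do not turn bare names like 'setup.py' into links."""
    if not new:
        # Leave links that already existed in the input untouched.
        return attrs
    text = attrs["_text"]
    if text.endswith(".py") and not text.startswith(("http:", "https:")):
        # Returning None tells the linkifier not to create this link.
        return None
    return attrs


# The callbacks are plain functions, so they can be exercised with a hand-built dict.
sample = {(None, "href"): "http://setup.py", "_text": "setup.py"}
print(skip_python_filenames(dict(sample), new=True))   # None
print(set_title(dict(sample), new=True)[(None, "title")])  # link in user text
```

Such functions would be passed in the `callbacks` list of `linkify`, `Linker`, or `LinkifyFilter`, alongside or instead of the built-in ones.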

### Attribute Filter Factory

Utility function for creating attribute filter functions from various attribute configurations.

```python { .api }
def attribute_filter_factory(attributes) -> callable:
    """
    Generate attribute filter function for the given attributes configuration.

    The attributes value can be a callable, dict, or list. This returns a filter
    function appropriate to the attributes value.

    Parameters:
    - attributes: attribute configuration (callable, dict, or list)

    Returns:
    Filter function that takes (tag, attr, value) and returns bool

    Raises:
    ValueError: if attributes is not a callable, list, or dict
    """
```
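
When the configuration is a callable, it plays the role of the filter function itself: it receives `(tag, attr, value)` and returns a bool. The following sketch shows what such a callable might look like. `allow_img_attrs` and `TRUSTED_HOST` are hypothetical names chosen for illustration.

```python
from urllib.parse import urlparse

TRUSTED_HOST = "example.com"  # hypothetical host whose images are acceptable


def allow_img_attrs(tag, name, value):
    """Return True to keep the attribute, False to drop it."""
    if tag != "img":
        return False
    if name in ("alt", "width", "height"):
        return True
    if name == "src":
        host = urlparse(value).netloc
        # Keep relative URLs (no host) and URLs on the trusted host only.
        return host == "" or host == TRUSTED_HOST
    return False


print(allow_img_attrs("img", "src", "/static/cat.png"))            # True
print(allow_img_attrs("img", "src", "https://evil.test/cat.png"))  # False
```

A function like this could be passed as the `attributes` argument of `clean` or `Cleaner`, together with a `tags` set that includes `img`.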

### URL and Email Pattern Building

Functions for creating custom URL and email matching patterns.

```python { .api }
def build_url_re(
    tlds: list = TLDS,
    protocols = html5lib_shim.allowed_protocols
) -> re.Pattern:
    """
    Build URL regex with custom TLDs and protocols.

    Parameters:
    - tlds: list of top-level domains
    - protocols: set of allowed protocols

    Returns:
    Compiled regex pattern for URL matching
    """

def build_email_re(tlds: list = TLDS) -> re.Pattern:
    """
    Build email regex with custom TLDs.

    Parameters:
    - tlds: list of top-level domains

    Returns:
    Compiled regex pattern for email matching
    """
```
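
A custom pattern is typically handed to a `Linker` through its `url_re` parameter. As a minimal sketch, the example below restricts linkification to a single made-up TLD, `fish`, so that ordinary `.com` URLs are left alone.

```python
from bleach.linkifier import Linker, build_url_re

# Only URLs ending in the (illustrative) "fish" TLD should be recognized.
only_fish_url_re = build_url_re(tlds=["fish"])
linker = Linker(url_re=only_fish_url_re)

unchanged = linker.linkify("com TLD does not link https://example.com")
linked = linker.linkify("fish TLD links https://example.fish")

print(unchanged)
print(linked)
```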

## Constants

### Default Sanitization Settings

```python { .api }
# Default allowed HTML tags
ALLOWED_TAGS: frozenset = frozenset((
    "a", "abbr", "acronym", "b", "blockquote", "code",
    "em", "i", "li", "ol", "strong", "ul"
))

# Default allowed attributes by tag
ALLOWED_ATTRIBUTES: dict = {
    "a": ["href", "title"],
    "abbr": ["title"],
    "acronym": ["title"]
}

# Default allowed protocols for links
ALLOWED_PROTOCOLS: frozenset = frozenset(("http", "https", "mailto"))

# Invisible character handling (requires: from itertools import chain)
INVISIBLE_CHARACTERS: str = "".join([chr(c) for c in chain(range(0, 9), range(11, 13), range(14, 32))])
INVISIBLE_CHARACTERS_RE: re.Pattern = re.compile("[" + INVISIBLE_CHARACTERS + "]", re.UNICODE)
INVISIBLE_REPLACEMENT_CHAR: str = "?"
```
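
To see what the invisible-character constants cover, the sketch below rebuilds them locally from the definitions above using only the standard library, then applies the replacement to a sample string. `replace_invisible` is a hypothetical helper; exactly where Bleach performs this substitution internally is not shown here.

```python
import re
from itertools import chain

# Local copies of the definitions listed above (control characters
# 0-8, 11-12 and 14-31; tab, line feed and carriage return are excluded).
INVISIBLE_CHARACTERS = "".join(
    chr(c) for c in chain(range(0, 9), range(11, 13), range(14, 32))
)
INVISIBLE_CHARACTERS_RE = re.compile("[" + INVISIBLE_CHARACTERS + "]", re.UNICODE)
INVISIBLE_REPLACEMENT_CHAR = "?"


def replace_invisible(text):
    """Swap each invisible control character for the replacement character."""
    return INVISIBLE_CHARACTERS_RE.sub(INVISIBLE_REPLACEMENT_CHAR, text)


cleaned = replace_invisible("a\x00b\x1fc\td\n")
print(repr(cleaned))  # 'a?b?c\td\n'
```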

### Default Linkification Settings

```python { .api }
# Default linkification callbacks
DEFAULT_CALLBACKS: list = [nofollow]

# Top-level domains for URL detection
TLDS: list = [
    "ac", "ad", "ae", "aero", "af", "ag", "ai", "al", "am", "an", "ao", "aq", "ar", "arpa", "as", "asia", "at", "au", "aw", "ax", "az",
    "ba", "bb", "bd", "be", "bf", "bg", "bh", "bi", "biz", "bj", "bm", "bn", "bo", "br", "bs", "bt", "bv", "bw", "by", "bz",
    "ca", "cat", "cc", "cd", "cf", "cg", "ch", "ci", "ck", "cl", "cm", "cn", "co", "com", "coop", "cr", "cu", "cv", "cx", "cy", "cz",
    "de", "dj", "dk", "dm", "do", "dz", "ec", "edu", "ee", "eg", "er", "es", "et", "eu", "fi", "fj", "fk", "fm", "fo", "fr",
    "ga", "gb", "gd", "ge", "gf", "gg", "gh", "gi", "gl", "gm", "gn", "gov", "gp", "gq", "gr", "gs", "gt", "gu", "gw", "gy",
    "hk", "hm", "hn", "hr", "ht", "hu", "id", "ie", "il", "im", "in", "info", "int", "io", "iq", "ir", "is", "it",
    "je", "jm", "jo", "jobs", "jp", "ke", "kg", "kh", "ki", "km", "kn", "kp", "kr", "kw", "ky", "kz",
    "la", "lb", "lc", "li", "lk", "lr", "ls", "lt", "lu", "lv", "ly", "ma", "mc", "md", "me", "mg", "mh", "mil", "mk", "ml", "mm", "mn", "mo", "mobi", "mp", "mq", "mr", "ms", "mt", "mu", "museum", "mv", "mw", "mx", "my", "mz",
    "na", "name", "nc", "ne", "net", "nf", "ng", "ni", "nl", "no", "np", "nr", "nu", "nz", "om", "org",
    "pa", "pe", "pf", "pg", "ph", "pk", "pl", "pm", "pn", "post", "pr", "pro", "ps", "pt", "pw", "py",
    "qa", "re", "ro", "rs", "ru", "rw", "sa", "sb", "sc", "sd", "se", "sg", "sh", "si", "sj", "sk", "sl", "sm", "sn", "so", "sr", "ss", "st", "su", "sv", "sx", "sy", "sz",
    "tc", "td", "tel", "tf", "tg", "th", "tj", "tk", "tl", "tm", "tn", "to", "tp", "tr", "travel", "tt", "tv", "tw", "tz",
    "ua", "ug", "uk", "us", "uy", "uz", "va", "vc", "ve", "vg", "vi", "vn", "vu", "wf", "ws", "xn", "xxx", "ye", "yt", "yu", "za", "zm", "zw"
]

# Default URL matching regex
URL_RE: re.Pattern = build_url_re()

# Default email matching regex
EMAIL_RE: re.Pattern = build_email_re()

# Protocol matching regex for URL detection
PROTO_RE: re.Pattern = re.compile(r"^[\w-]+:/{0,3}", re.IGNORECASE)
```
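
The `PROTO_RE` pattern can be tried in isolation to see which strings it treats as starting with a protocol. This sketch recompiles the same expression with the standard library. `leading_protocol` is a hypothetical helper, and how the linkifier acts on a match is not covered here.

```python
import re

# Same expression as listed above, compiled locally for experimentation.
PROTO_RE = re.compile(r"^[\w-]+:/{0,3}", re.IGNORECASE)


def leading_protocol(candidate):
    """Return the matched protocol prefix, or None if there is none."""
    match = PROTO_RE.search(candidate)
    return match.group(0) if match else None


for candidate in ("https://example.com", "mailto:user@example.com", "example.com/path"):
    print(candidate, "->", leading_protocol(candidate))
```

The first two inputs yield `https://` and `mailto:` respectively; the bare domain has no protocol prefix and yields `None`.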

### CSS Sanitization Settings

```python { .api }
# Allowed CSS properties
ALLOWED_CSS_PROPERTIES: frozenset = frozenset((
    "azimuth", "background-color", "border-bottom-color", "border-collapse",
    "border-color", "border-left-color", "border-right-color", "border-top-color",
    "clear", "color", "cursor", "direction", "display", "elevation", "float",
    "font", "font-family", "font-size", "font-style", "font-variant", "font-weight",
    "height", "letter-spacing", "line-height", "overflow", "pause", "pause-after",
    "pause-before", "pitch", "pitch-range", "richness", "speak", "speak-header",
    "speak-numeral", "speak-punctuation", "speech-rate", "stress", "text-align",
    "text-decoration", "text-indent", "unicode-bidi", "vertical-align",
    "voice-family", "volume", "white-space", "width"
))

# Allowed SVG properties
ALLOWED_SVG_PROPERTIES: frozenset = frozenset((
    "fill", "fill-opacity", "fill-rule", "stroke", "stroke-width",
    "stroke-linecap", "stroke-linejoin", "stroke-opacity"
))
```

### Package Version Information

```python { .api }
# Package version string
__version__: str = "6.2.0"

# Release date in YYYYMMDD format
__releasedate__: str = "20241029"
```

## Warning Classes

```python { .api }
class NoCssSanitizerWarning(UserWarning):
    """
    Warning raised when CSS sanitization is needed but no CSS sanitizer is configured.
    """
```

## Usage Examples

### Custom Sanitization Rules

```python
from bleach.sanitizer import Cleaner

# Custom allowed tags and attributes
custom_tags = {'p', 'strong', 'em', 'a', 'img'}
custom_attributes = {
    'a': ['href', 'title'],
    'img': ['src', 'alt', 'width', 'height']
}

# Create reusable cleaner
cleaner = Cleaner(
    tags=custom_tags,
    attributes=custom_attributes,
    strip=True  # Drop disallowed tags instead of escaping them (their text content is kept)
)

# Placeholder inputs standing in for untrusted user content
untrusted_html1 = '<p>Hello <em>world</em></p><script>alert(1)</script>'
untrusted_html2 = '<img src="https://example.com/cat.png" onerror="evil()">'

# Clean multiple texts with same rules
safe_text1 = cleaner.clean(untrusted_html1)
safe_text2 = cleaner.clean(untrusted_html2)
```

### CSS Sanitization

```python
import bleach
from bleach.css_sanitizer import ALLOWED_CSS_PROPERTIES, CSSSanitizer

# Create CSS sanitizer (needs the optional dependency: pip install bleach[css])
css_sanitizer = CSSSanitizer(
    allowed_css_properties=ALLOWED_CSS_PROPERTIES
)

# Clean HTML with CSS sanitization
html_with_styles = '<p style="color: red; background: javascript:alert();">Text</p>'
safe_html = bleach.clean(
    html_with_styles,
    tags={'p'},
    attributes={'p': ['style']},
    css_sanitizer=css_sanitizer
)
# Result: '<p style="color: red;">Text</p>'
```

### Custom Linkification

```python
from bleach.linkifier import Linker
from bleach.callbacks import target_blank, nofollow

# Custom linkifier with multiple callbacks
linker = Linker(
    callbacks=[nofollow, target_blank],
    skip_tags={'pre', 'code'},  # Don't linkify in code blocks
    parse_email=True
)

text = 'Email me at user@example.com or visit https://example.org'
linked = linker.linkify(text)
# The https://example.org link gets both rel="nofollow" and target="_blank";
# both callbacks leave the mailto: link unchanged
```

### Combined Operations

```python
from bleach.sanitizer import Cleaner
from bleach.linkifier import LinkifyFilter

# Clean and linkify in single pass using LinkifyFilter
cleaner = Cleaner(
    tags={'p', 'a', 'strong'},
    attributes={'a': ['href', 'rel', 'target']},
    # Pass the filter class itself, not an instance: the Cleaner supplies the
    # required `source` stream. Use functools.partial(LinkifyFilter, ...) to set options.
    filters=[LinkifyFilter]
)

unsafe_text = '<script>alert("xss")</script><p>Visit https://example.com</p>'
result = cleaner.clean(unsafe_text)
```