
tessl/pypi-nh3

High-performance HTML sanitization library with Python bindings to Rust ammonia crate

Workspace: tessl
Visibility: Public
Describes: pypipkg:pypi/nh3@0.3.x

To install, run

npx @tessl/cli install tessl/pypi-nh3@0.3.0

# NH3

High-performance HTML sanitization library providing Python bindings to the Rust ammonia crate. NH3 cleans HTML quickly and securely, offers comprehensive configuration options, and is approximately 20x faster than alternatives like bleach.

## Package Information

- **Package Name**: nh3
- **Language**: Python
- **Backend**: Rust (via PyO3/maturin)
- **Installation**: `pip install nh3`

## Core Imports

```python
import nh3
```

All functionality is available at the module level. For type hints:

```python
from typing import Callable, Dict, Optional, Set
```

## Basic Usage

```python
import nh3

# Basic HTML sanitization
html = '<script>alert("xss")</script><p>Safe <b>content</b></p>'
clean_html = nh3.clean(html)
print(clean_html)  # Output: '<p>Safe <b>content</b></p>'

# Text escaping (spaces are escaped too, as in the Text Escaping examples)
text = 'User input with <dangerous> characters & symbols'
escaped = nh3.clean_text(text)
print(escaped)  # Output: 'User&#32;input&#32;with&#32;&lt;dangerous&gt;&#32;characters&#32;&amp;&#32;symbols'

# Check if string contains HTML
has_html = nh3.is_html('<p>HTML content</p>')  # True
has_no_html = nh3.is_html('Plain text')  # False

# Using a reusable cleaner with custom configuration
cleaner = nh3.Cleaner(
    tags={'p', 'b', 'i', 'strong', 'em'},
    attributes={'*': {'class', 'id'}},
    strip_comments=True
)
result = cleaner.clean('<p class="text">Safe <script>evil()</script> content</p>')
# The spaces on both sides of the removed <script> element are kept
print(result)  # Output: '<p class="text">Safe  content</p>'
```

## Capabilities

### HTML Sanitization

Primary function for cleaning HTML content with extensive configuration options for allowed tags, attributes, URL schemes, and content filtering.

```python { .api }
def clean(
    html: str,
    tags: Optional[Set[str]] = None,
    clean_content_tags: Optional[Set[str]] = None,
    attributes: Optional[Dict[str, Set[str]]] = None,
    attribute_filter: Optional[Callable[[str, str, str], Optional[str]]] = None,
    strip_comments: bool = True,
    link_rel: Optional[str] = "noopener noreferrer",
    generic_attribute_prefixes: Optional[Set[str]] = None,
    tag_attribute_values: Optional[Dict[str, Dict[str, Set[str]]]] = None,
    set_tag_attribute_values: Optional[Dict[str, Dict[str, str]]] = None,
    url_schemes: Optional[Set[str]] = None,
    allowed_classes: Optional[Dict[str, Set[str]]] = None,
    filter_style_properties: Optional[Set[str]] = None
) -> str:
    """
    Sanitize an HTML fragment according to the given options.

    Parameters:
    - html: Input HTML fragment to sanitize
    - tags: Set of allowed HTML tags (defaults to ALLOWED_TAGS)
    - clean_content_tags: Tags whose contents are completely removed
    - attributes: Allowed attributes per tag ('*' key for any tag)
    - attribute_filter: Callback for custom attribute processing
    - strip_comments: Whether to remove HTML comments
    - link_rel: Rel attribute value added to links
    - generic_attribute_prefixes: Attribute prefixes allowed on any tag
    - tag_attribute_values: Allowed attribute values per tag
    - set_tag_attribute_values: Required attribute values per tag
    - url_schemes: Permitted URL schemes for href/src attributes
    - allowed_classes: Allowed CSS classes per tag
    - filter_style_properties: Allowed CSS properties in style attributes

    Returns:
    Sanitized HTML fragment as string
    """
```

**Usage Examples:**

```python
# Allow only specific tags
nh3.clean('<div><p>Text</p><script>evil()</script></div>', tags={'p'})
# Result: '<p>Text</p>'

# Remove style content completely
nh3.clean('<style>body{}</style><p>Text</p>', clean_content_tags={'style'})
# Result: '<p>Text</p>'

# Custom attribute filtering
def filter_classes(tag, attr, value):
    if tag == 'div' and attr == 'class':
        allowed = {'container', 'wrapper'}
        classes = set(value.split())
        filtered = classes.intersection(allowed)
        # sorted() keeps the output order deterministic
        return ' '.join(sorted(filtered)) if filtered else None
    return value

nh3.clean('<div class="container evil">text</div>',
          attributes={'div': {'class'}},
          attribute_filter=filter_classes)
# Result: '<div class="container">text</div>'

# Allow data attributes with prefixes
nh3.clean('<div data-id="123" onclick="evil()">text</div>',
          generic_attribute_prefixes={'data-'})
# Result: '<div data-id="123">text</div>'

# Control URL schemes (the default link_rel still adds rel to the link)
nh3.clean('<a href="javascript:alert()">link</a>', url_schemes={'https', 'http'})
# Result: '<a rel="noopener noreferrer">link</a>'

# Filter CSS properties
nh3.clean('<p style="color:red;display:none">text</p>',
          attributes={'p': {'style'}},
          filter_style_properties={'color'})
# Result: '<p style="color:red">text</p>'
```

### Text Escaping

Converts arbitrary strings to HTML-safe text by escaping special characters. Roughly equivalent to `html.escape()`, but more aggressive: characters such as spaces and slashes are escaped as well.

```python { .api }
def clean_text(html: str) -> str:
    """
    Turn an arbitrary string into unformatted HTML by escaping special characters.

    Parameters:
    - html: Input string to escape

    Returns:
    HTML-escaped string safe for display in HTML context
    """
```

**Usage Examples:**

```python
# Basic text escaping
nh3.clean_text('Price: $5 & up')
# Result: 'Price:&#32;$5&#32;&amp;&#32;up'

# JavaScript injection prevention
nh3.clean_text('"); alert("xss");//')
# Result: '&quot;);&#32;alert(&quot;xss&quot;);&#47;&#47;'

# HTML tag neutralization
nh3.clean_text('<script>alert("hello")</script>')
# Result: '&lt;script&gt;alert(&quot;hello&quot;)&lt;&#47;script&gt;'
```
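
To make "more aggressive" concrete, the following pure-Python sketch reproduces the results shown above with a translation table and compares them with `html.escape()`. The table and the name `aggressive_escape` are assumptions inferred from those example outputs; this is not nh3's implementation, and `nh3.clean_text` may escape characters that are not listed here.

```python
import html

# Hypothetical table inferred from the example outputs above; the real
# nh3.clean_text may escape more characters than these.
_ESCAPES = str.maketrans({
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
    "/": "&#47;",
    " ": "&#32;",
})


def aggressive_escape(text: str) -> str:
    """Escape markup characters plus spaces and slashes (illustrative only)."""
    # str.translate works in a single pass, so the '&' inside a replacement
    # is never escaped a second time.
    return text.translate(_ESCAPES)


sample = 'Price: $5 & up'
print(html.escape(sample))        # Price: $5 &amp; up
print(aggressive_escape(sample))  # Price:&#32;$5&#32;&amp;&#32;up
```

Escaping spaces and slashes matters when the text may end up inside an unquoted attribute value, where either character would otherwise end the value.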

### HTML Detection

Determines whether a string contains HTML syntax through full parsing, useful for conditional processing of user input.

```python { .api }
def is_html(html: str) -> bool:
    """
    Determine if a given string contains HTML syntax.

    Parameters:
    - html: Input string to analyze

    Returns:
    True if string contains HTML syntax (including invalid HTML), False otherwise
    """
```

**Usage Examples:**

```python
# Valid HTML detection
nh3.is_html('<p>Hello world</p>')  # True
nh3.is_html('<br>')  # True

# Invalid HTML still detected
nh3.is_html('<invalid-tag>')  # True
nh3.is_html('Vec::<u8>::new()')  # True (angle brackets detected)

# Plain text
nh3.is_html('Hello world')  # False
nh3.is_html('Price: $5 & up')  # False
```
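
As a sketch of the conditional processing mentioned above, the helper below sanitizes input that looks like HTML and escapes everything else. The name `render_user_input` and the choice to pass the three functions in as parameters are assumptions made for illustration, so that the helper can be exercised without nh3 installed.

```python
from typing import Callable


def render_user_input(
    text: str,
    is_html: Callable[[str], bool],
    clean: Callable[[str], str],
    clean_text: Callable[[str], str],
) -> str:
    """Sanitize input that contains HTML syntax; escape everything else."""
    return clean(text) if is_html(text) else clean_text(text)
```

With nh3 installed, this would be called as `render_user_input(text, nh3.is_html, nh3.clean, nh3.clean_text)`. Because `is_html` also returns `True` for strings such as `Vec::<u8>::new()`, some plain-text input will take the sanitizing branch.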

### Reusable Cleaner

Class-based interface for creating configured sanitizers that can be reused multiple times, providing better performance for repeated sanitization with the same settings.

```python { .api }
class Cleaner:
    def __init__(
        self,
        tags: Optional[Set[str]] = None,
        clean_content_tags: Optional[Set[str]] = None,
        attributes: Optional[Dict[str, Set[str]]] = None,
        attribute_filter: Optional[Callable[[str, str, str], Optional[str]]] = None,
        strip_comments: bool = True,
        link_rel: Optional[str] = "noopener noreferrer",
        generic_attribute_prefixes: Optional[Set[str]] = None,
        tag_attribute_values: Optional[Dict[str, Dict[str, Set[str]]]] = None,
        set_tag_attribute_values: Optional[Dict[str, Dict[str, str]]] = None,
        url_schemes: Optional[Set[str]] = None,
        allowed_classes: Optional[Dict[str, Set[str]]] = None,
        filter_style_properties: Optional[Set[str]] = None
    ) -> None:
        """
        Create a reusable sanitizer with the given configuration.

        Parameters: Same as clean() function parameters
        """

    def clean(self, html: str) -> str:
        """
        Sanitize HTML using the configured options.

        Parameters:
        - html: Input HTML fragment to sanitize

        Returns:
        Sanitized HTML fragment as string
        """
```

**Usage Examples:**

```python
# Create a cleaner for blog content
blog_cleaner = nh3.Cleaner(
    tags={'p', 'br', 'strong', 'em', 'a', 'ul', 'ol', 'li'},
    # 'class' is governed by allowed_classes below, so it is not listed here;
    # the underlying ammonia crate documents allowing it in both places as a
    # contradictory configuration
    attributes={
        'a': {'href', 'title'}
    },
    allowed_classes={
        'p': {'highlight', 'quote'},
        'a': {'external-link'}
    },
    url_schemes={'http', 'https', 'mailto'}
)

# Reuse the cleaner for multiple inputs
user_content1 = blog_cleaner.clean('<p class="highlight">Safe content</p>')
user_content2 = blog_cleaner.clean('<script>evil()</script><p>More content</p>')

# Create a strict cleaner for user comments
comment_cleaner = nh3.Cleaner(
    tags={'p', 'br'},
    attributes={},
    strip_comments=True,
    link_rel=None
)

safe_comment = comment_cleaner.clean('<p>User comment with <a>no links</a></p>')
# Result: '<p>User comment with no links</p>'
```

## Default Constants

Pre-configured sets of allowed tags, attributes, and URL schemes based on secure defaults from the ammonia library.

```python { .api }
ALLOWED_TAGS: Set[str]
# Default set of allowed HTML tags, covering common formatting and structural
# tags such as a, b, blockquote, br, code, div, em, h1-h6, i, img, li, ol, p,
# pre, span, strong, table, td, th, tr, and ul. Tags such as script and style
# are not included. Inspect nh3.ALLOWED_TAGS for the exact set in the
# installed version.

ALLOWED_ATTRIBUTES: Dict[str, Set[str]]
# Default mapping of allowed attributes per tag, includes common safe attributes
# like href for links and src for images.

ALLOWED_URL_SCHEMES: Set[str]
# Default set of allowed URL schemes, including http, https, mailto, and tel
```

**Usage Examples:**

```python
# Inspect default allowed tags
print('p' in nh3.ALLOWED_TAGS)  # True
print('script' in nh3.ALLOWED_TAGS)  # False

# Extend default attributes
from copy import deepcopy
custom_attributes = deepcopy(nh3.ALLOWED_ATTRIBUTES)
# setdefault() avoids a KeyError when the defaults have no entry for 'div'
custom_attributes.setdefault('div', set()).add('data-id')
custom_attributes['*'] = {'class', 'id'}

# Use extended configuration
result = nh3.clean('<div class="box" data-id="123">content</div>',
                   attributes=custom_attributes)

# Remove tags using set operations
restricted_tags = nh3.ALLOWED_TAGS - {'b', 'i'}
nh3.clean('<b><i>text</i></b><p>paragraph</p>', tags=restricted_tags)
# Result: 'text<p>paragraph</p>'

# Remove URL schemes using set operations
safe_schemes = nh3.ALLOWED_URL_SCHEMES - {'tel'}
nh3.clean('<a href="tel:+1">Call</a> or <a href="mailto:me">email</a>',
          url_schemes=safe_schemes)
# Result: '<a rel="noopener noreferrer">Call</a> or <a href="mailto:me" rel="noopener noreferrer">email</a>'

# Check default URL schemes
print('https' in nh3.ALLOWED_URL_SCHEMES)  # True
print('javascript' in nh3.ALLOWED_URL_SCHEMES)  # False
```

## Advanced Configuration

### Attribute Filtering

The `attribute_filter` parameter accepts a callable that receives three string parameters (tag, attribute, value) and can return a modified value or None to remove the attribute entirely.

```python
def smart_class_filter(tag, attr, value):
    """Example: Only allow specific CSS classes"""
    if attr == 'class':
        allowed_classes = {
            'p': {'intro', 'highlight', 'quote'},
            'div': {'container', 'wrapper', 'sidebar'},
            'a': {'external', 'internal'}
        }
        if tag in allowed_classes:
            classes = set(value.split())
            filtered = classes.intersection(allowed_classes[tag])
            return ' '.join(sorted(filtered)) if filtered else None
    return value

# Apply the filter
result = nh3.clean(
    '<div class="container evil"><p class="intro spam">Text</p></div>',
    attributes={'div': {'class'}, 'p': {'class'}},
    attribute_filter=smart_class_filter
)
# Result: '<div class="container"><p class="intro">Text</p></div>'
```
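
The same callback shape works for attributes other than `class`. The sketch below keeps `href` values on `<a>` only when they point at an allow-listed host. `TRUSTED_HOSTS`, `trusted_link_filter`, and `clean_with_trusted_links` are hypothetical names introduced for illustration; the only nh3 call used is `nh3.clean` with `attribute_filter`, as documented above.

```python
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical allow-list, for illustration only.
TRUSTED_HOSTS = {"example.com", "docs.example.com"}


def trusted_link_filter(tag: str, attr: str, value: str) -> Optional[str]:
    """Keep href on <a> only when it points at a host in TRUSTED_HOSTS."""
    if tag == "a" and attr == "href":
        try:
            host = urlsplit(value).hostname
        except ValueError:
            return None  # unparseable URL: remove the attribute
        # Relative URLs have no hostname and are removed as well.
        return value if host in TRUSTED_HOSTS else None
    return value  # every other attribute passes through unchanged


def clean_with_trusted_links(html: str) -> str:
    """Sanitize html with nh3, applying the host allow-list to links."""
    import nh3  # imported here so the filter itself can be tested without nh3

    return nh3.clean(html, attribute_filter=trusted_link_filter)
```

Returning `None` for anything that cannot be parsed keeps the filter from raising, so the decision about a suspicious attribute is never left to the exception path.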

### Tag Attribute Values

Control which specific values are allowed for attributes on specific tags.

```python
# Only allow specific form input types
result = nh3.clean(
    '<input type="text"><input type="password"><input type="file">',
    tags={'input'},
    tag_attribute_values={
        'input': {
            'type': {'text', 'email', 'password', 'number'}
        }
    }
)
# Result: '<input type="text"><input type="password"><input>'
```

### Set Tag Attribute Values

Automatically add or override attribute values on specific tags.

```python
# Always add target="_blank" to links
result = nh3.clean(
    '<a href="https://example.com">Link</a>',
    tags={'a'},
    attributes={'a': {'href', 'target'}},
    set_tag_attribute_values={
        'a': {'target': '_blank'}
    },
    link_rel='noopener noreferrer'
)
# Result: '<a href="https://example.com" target="_blank" rel="noopener noreferrer">Link</a>'
```

## Error Handling

NH3 follows Python conventions for error handling:

- **Invalid attribute_filter**: Raises `TypeError` if the provided callback is not callable. Exceptions raised within the callback are handled as unraisable exceptions and logged, allowing processing to continue.
- **Malformed HTML**: Processed with best-effort parsing; invalid elements are removed.
- **Invalid CSS**: When style filtering is enabled, invalid declarations and @rules are removed, leaving only syntactically valid CSS declarations, which are normalized (e.g., whitespace standardization).
- **Thread Safety**: All operations are thread-safe and release the GIL during processing.
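
Because an exception inside the callback does not stop sanitization, a filter that must never let a value through on failure can catch its own errors. The decorator below is a minimal sketch of that fail-closed idea; `fail_closed` and `strict_filter` are hypothetical names, and what nh3 does with the attribute after an unhandled callback exception is not stated here, which is the reason to decide it explicitly.

```python
import functools
import logging
from typing import Callable, Optional

AttributeFilter = Callable[[str, str, str], Optional[str]]
logger = logging.getLogger(__name__)


def fail_closed(func: AttributeFilter) -> AttributeFilter:
    """Wrap an attribute filter so that any exception removes the attribute."""

    @functools.wraps(func)
    def wrapper(tag: str, attr: str, value: str) -> Optional[str]:
        try:
            return func(tag, attr, value)
        except Exception:
            logger.exception("attribute filter failed for <%s %s>", tag, attr)
            return None  # drop the attribute instead of risking keeping it

    return wrapper


@fail_closed
def strict_filter(tag: str, attr: str, value: str) -> Optional[str]:
    """Hypothetical filter that raises on overly long title attributes."""
    if attr == "title" and len(value) > 100:
        raise ValueError("title too long")
    return value
```

The wrapped function still has the three-string signature, so it can be passed as `attribute_filter=strict_filter`.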

## Performance Characteristics

- **Speed**: Approximately 20x faster than bleach for typical HTML sanitization
- **Memory**: Efficient streaming processing with minimal memory overhead
- **Threading**: Thread-safe operations with GIL release during Rust processing
- **Scalability**: Suitable for high-throughput applications and large HTML documents
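
Given the thread safety and GIL release described above, one way to process a batch is a thread pool. The helper below is a sketch under that assumption; `sanitize_many` is a hypothetical name, and the sanitizing function is passed in so that either `nh3.clean` or the `clean` method of a configured `nh3.Cleaner` can be used. Whether threads actually improve throughput depends on document sizes and the workload, so it is worth measuring.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def sanitize_many(
    documents: Iterable[str],
    sanitize: Callable[[str], str],
    max_workers: int = 4,
) -> List[str]:
    """Sanitize documents in a thread pool, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order of the inputs.
        return list(pool.map(sanitize, documents))
```

With nh3 installed, a call would look like `sanitize_many(comments, nh3.clean)`.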

## Module Attributes

```python { .api }
__version__: str
# Package version string (e.g., "0.3.0")
```
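
A version string like this can gate code on a minimum release. The helpers below are a small sketch of that check; `version_tuple` and `supports` are hypothetical names, and which nh3 release introduced which feature is not covered here.

```python
from itertools import takewhile
from typing import Tuple


def version_tuple(version: str) -> Tuple[int, ...]:
    """Convert a dotted version such as "0.3.0" into a comparable tuple."""
    parts = []
    for piece in version.split("."):
        # Keep only the leading digits so "1rc1" contributes 1.
        digits = "".join(takewhile(str.isdigit, piece))
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def supports(version: str, minimum: Tuple[int, ...]) -> bool:
    """Return True when version is at least minimum."""
    return version_tuple(version) >= minimum
```

With nh3 installed, the check would read `supports(nh3.__version__, (0, 3))`.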