# Soupsieve

A modern CSS selector implementation for Beautiful Soup 4, providing comprehensive CSS selector support from CSS Level 1 through CSS Level 4 drafts. Soupsieve serves as the default selector engine for Beautiful Soup 4.7.0+ and can be used independently for sophisticated CSS-based element selection from HTML/XML documents.

## Package Information

- **Package Name**: soupsieve
- **Language**: Python
- **Installation**: `pip install soupsieve`

## Core Imports

```python
import soupsieve
```

Alternative import for shorter syntax:

```python
import soupsieve as sv
```

Specific functions and classes can be imported directly:

```python
from soupsieve import compile, select, match, SoupSieve, SelectorSyntaxError
```

## Basic Usage

```python
import soupsieve as sv
from bs4 import BeautifulSoup

# Create a soup object from HTML
html = """
<div class="container">
    <p id="intro">Introduction paragraph</p>
    <div class="content">
        <p class="highlight">Important content</p>
        <span>Additional info</span>
    </div>
</div>
"""
soup = BeautifulSoup(html, 'html.parser')

# Basic selection - find all paragraphs
paragraphs = sv.select('p', soup)
print(f"Found {len(paragraphs)} paragraphs")

# Select with class
highlighted = sv.select('.highlight', soup)
if highlighted:
    print(f"Highlighted text: {highlighted[0].get_text()}")

# Select first match only
first_p = sv.select_one('p', soup)
print(f"First paragraph: {first_p.get_text()}")

# Test if element matches selector
intro = soup.find(id='intro')
if sv.match('#intro', intro):
    print("Element matches #intro selector")

# Compiled selectors for reuse
compiled = sv.compile('div.content > *')
children = compiled.select(soup)
print(f"Found {len(children)} direct children of .content")
```

## Architecture

Soupsieve's architecture centers around CSS parsing and matching:

- **Parser**: Converts CSS selector strings into structured selector objects
- **Matcher**: Evaluates selectors against Beautiful Soup elements using tree traversal
- **Compiler**: Provides caching and reusable compiled selector objects
- **Types**: Immutable data structures representing selector components

The library automatically handles HTML vs XML differences and provides namespace support for XML documents.
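
As a rough illustration of how these pieces surface through the public API, the sketch below compiles a selector and inspects the resulting object. The sample HTML is invented for the example, and the parser and matcher are internal components that are exercised indirectly rather than called by name.

```python
import soupsieve as sv
from bs4 import BeautifulSoup

# Hypothetical sample document, used only for illustration
soup = BeautifulSoup(
    '<div class="content"><p>One</p><span>Two</span></div>', 'html.parser'
)

# Compiling parses the selector string and returns a reusable SoupSieve object
compiled = sv.compile('div.content > *')
print(type(compiled).__name__)
print(compiled.pattern)

# Selecting evaluates the compiled selector against the Beautiful Soup tree
children = compiled.select(soup)
print([tag.name for tag in children])
```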

## Capabilities

### CSS Selector Functions

Core functions for selecting elements using CSS selectors. These provide the primary interface for CSS-based element selection.

```python { .api }
def select(select, tag, namespaces=None, limit=0, flags=0, **kwargs):
    """
    Select all matching elements under the specified tag.

    Parameters:
    - select: str, CSS selector string
    - tag: BeautifulSoup Tag or document to search within
    - namespaces: dict, optional namespace mappings for XML
    - limit: int, maximum results to return (0 = unlimited)
    - flags: int, selection flags for advanced options
    - **kwargs: additional options including 'custom' selectors

    Returns:
    List of matching BeautifulSoup Tag objects
    """

def select_one(select, tag, namespaces=None, flags=0, **kwargs):
    """
    Select the first matching element.

    Parameters:
    - select: str, CSS selector string
    - tag: BeautifulSoup Tag or document to search within
    - namespaces: dict, optional namespace mappings for XML
    - flags: int, selection flags for advanced options
    - **kwargs: additional options including 'custom' selectors

    Returns:
    First matching BeautifulSoup Tag object or None
    """

def iselect(select, tag, namespaces=None, limit=0, flags=0, **kwargs):
    """
    Iterate over matching elements (generator).

    Parameters:
    - select: str, CSS selector string
    - tag: BeautifulSoup Tag or document to search within
    - namespaces: dict, optional namespace mappings for XML
    - limit: int, maximum results to yield (0 = unlimited)
    - flags: int, selection flags for advanced options
    - **kwargs: additional options including 'custom' selectors

    Yields:
    BeautifulSoup Tag objects that match the selector
    """
```
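
The following sketch shows how the three functions above might be used side by side. The list markup is a made-up sample; `limit=2` and the `'table'` selector are chosen only to show truncated and empty results.

```python
import soupsieve as sv
from bs4 import BeautifulSoup

# Hypothetical sample document
html = '<ul><li>a</li><li>b</li><li>c</li></ul>'
soup = BeautifulSoup(html, 'html.parser')

all_items = sv.select('li', soup)            # every matching <li>
first_two = sv.select('li', soup, limit=2)   # stop after two matches
missing = sv.select_one('table', soup)       # None when nothing matches

# iselect yields matches lazily instead of building a list up front
texts = [li.get_text() for li in sv.iselect('li', soup)]
print(len(all_items), len(first_two), missing, texts)
```

`iselect` is the natural choice when the caller may stop early, since matching work is only done as items are consumed.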

### Element Matching and Filtering

Functions for testing individual elements and filtering collections.

```python { .api }
def match(select, tag, namespaces=None, flags=0, **kwargs):
    """
    Test if a tag matches the CSS selector.

    Parameters:
    - select: str, CSS selector string
    - tag: BeautifulSoup Tag to test
    - namespaces: dict, optional namespace mappings for XML
    - flags: int, matching flags for advanced options
    - **kwargs: additional options including 'custom' selectors

    Returns:
    bool, True if tag matches selector, False otherwise
    """

def filter(select, iterable, namespaces=None, flags=0, **kwargs):
    """
    Filter a collection of tags by CSS selector.

    Parameters:
    - select: str, CSS selector string
    - iterable: collection of BeautifulSoup Tags to filter
    - namespaces: dict, optional namespace mappings for XML
    - flags: int, filtering flags for advanced options
    - **kwargs: additional options including 'custom' selectors

    Returns:
    List of Tags from iterable that match the selector
    """

def closest(select, tag, namespaces=None, flags=0, **kwargs):
    """
    Find the closest match, testing the tag itself first and then its ancestors.

    Parameters:
    - select: str, CSS selector string
    - tag: BeautifulSoup Tag to start the search from
    - namespaces: dict, optional namespace mappings for XML
    - flags: int, matching flags for advanced options
    - **kwargs: additional options including 'custom' selectors

    Returns:
    The tag itself or its closest ancestor that matches the selector, or None
    """
```
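
A minimal sketch of these three functions on a small, invented document follows. It also illustrates that `closest` considers the starting tag before walking up to its ancestors.

```python
import soupsieve as sv
from bs4 import BeautifulSoup

# Hypothetical sample document
html = '<div id="outer"><section><p class="note">x</p><p>y</p></section></div>'
soup = BeautifulSoup(html, 'html.parser')
note = soup.find('p', class_='note')

# Test a single element against a selector
is_note = sv.match('p.note', note)

# Filter an existing list of tags down to those matching the selector
paragraphs = soup.find_all('p')
notes_only = sv.filter('.note', paragraphs)

# closest() checks the tag itself first, then its ancestors
nearest_div = sv.closest('div', note)
same_tag = sv.closest('p', note)
print(is_note, len(notes_only), nearest_div['id'], same_tag is note)
```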

### Selector Compilation and Caching

Functions for compiling selectors for reuse and managing the selector cache.

```python { .api }
def compile(pattern, namespaces=None, flags=0, **kwargs):
    """
    Compile CSS selector pattern into reusable SoupSieve object.

    Parameters:
    - pattern: str or SoupSieve, CSS selector string to compile
    - namespaces: dict, optional namespace mappings for XML
    - flags: int, compilation flags for advanced options
    - **kwargs: additional options including 'custom' selectors

    Returns:
    SoupSieve compiled selector object

    Raises:
    ValueError: if flags/namespaces/custom provided with SoupSieve input
    SelectorSyntaxError: for invalid CSS selector syntax
    """

def purge():
    """
    Clear the internal compiled selector cache.

    Returns:
    None
    """
```
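
The behaviours listed under "Raises" can be sketched as below. The selector strings are arbitrary examples, and the `raised` and `syntax_error` variables exist only to record what happened.

```python
import soupsieve as sv

pattern = sv.compile('p.note')

# Passing an already compiled selector returns it unchanged
same = sv.compile(pattern)

# Supplying extra options alongside a compiled selector is rejected
try:
    sv.compile(pattern, flags=sv.DEBUG)
    raised = False
except ValueError:
    raised = True

# Invalid syntax is reported when the selector is compiled
try:
    sv.compile('div[unclosed')
    syntax_error = None
except sv.SelectorSyntaxError as exc:
    syntax_error = exc

sv.purge()  # drop any cached compiled selectors
print(same is pattern, raised, syntax_error is not None)
```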

### Utility Functions

Helper functions for CSS identifier escaping.

```python { .api }
def escape(ident):
    """
    Escape CSS identifier for safe use in selectors.

    Parameters:
    - ident: str, identifier string to escape

    Returns:
    str, CSS-escaped identifier safe for use in selectors
    """
```
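
Escaping matters when an identifier contains characters that are themselves selector syntax. The sketch below uses a made-up `id` value containing `.` and `:` to show the round trip.

```python
import soupsieve as sv
from bs4 import BeautifulSoup

# Hypothetical document whose id contains selector metacharacters
soup = BeautifulSoup('<p id="item.1:main">target</p>', 'html.parser')

raw_id = 'item.1:main'  # '.' and ':' would otherwise be parsed as selector syntax
selector = '#' + sv.escape(raw_id)
found = sv.select_one(selector, soup)
print(selector, found.get_text())
```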

### Deprecated Comment Functions

Functions for extracting comments (deprecated, will be removed in future versions).

```python { .api }
def comments(tag, limit=0, flags=0, **kwargs):
    """
    Extract comments from tag tree [DEPRECATED].

    Parameters:
    - tag: BeautifulSoup Tag to search for comments
    - limit: int, maximum comments to return (0 = unlimited)
    - flags: int, unused flags parameter
    - **kwargs: additional unused options

    Returns:
    List of comment strings

    Note: Deprecated - not related to CSS selectors, will be removed
    """

def icomments(tag, limit=0, flags=0, **kwargs):
    """
    Iterate comments from tag tree [DEPRECATED].

    Parameters:
    - tag: BeautifulSoup Tag to search for comments
    - limit: int, maximum comments to yield (0 = unlimited)
    - flags: int, unused flags parameter
    - **kwargs: additional unused options

    Yields:
    Comment strings

    Note: Deprecated - not related to CSS selectors, will be removed
    """
```

## Classes

### SoupSieve

The main compiled selector class providing reusable CSS selector functionality with caching benefits.

```python { .api }
class SoupSieve:
    """
    Compiled CSS selector object for efficient reuse.

    Attributes:
    - pattern: str, original CSS selector pattern
    - selectors: internal parsed selector structure
    - namespaces: namespace mappings used during compilation
    - custom: custom selector definitions used during compilation
    - flags: compilation flags used during compilation
    """

    def match(self, tag):
        """
        Test if tag matches this compiled selector.

        Parameters:
        - tag: BeautifulSoup Tag to test

        Returns:
        bool, True if tag matches, False otherwise
        """

    def select(self, tag, limit=0):
        """
        Select all matching elements under tag using this compiled selector.

        Parameters:
        - tag: BeautifulSoup Tag or document to search within
        - limit: int, maximum results to return (0 = unlimited)

        Returns:
        List of matching BeautifulSoup Tag objects
        """

    def select_one(self, tag):
        """
        Select first matching element using this compiled selector.

        Parameters:
        - tag: BeautifulSoup Tag or document to search within

        Returns:
        First matching BeautifulSoup Tag object or None
        """

    def iselect(self, tag, limit=0):
        """
        Iterate matching elements using this compiled selector.

        Parameters:
        - tag: BeautifulSoup Tag or document to search within
        - limit: int, maximum results to yield (0 = unlimited)

        Yields:
        BeautifulSoup Tag objects that match the selector
        """

    def filter(self, iterable):
        """
        Filter collection of tags using this compiled selector.

        Parameters:
        - iterable: collection of BeautifulSoup Tags to filter

        Returns:
        List of Tags from iterable that match this selector
        """

    def closest(self, tag):
        """
        Find the closest match using this compiled selector, testing the
        tag itself first and then its ancestors.

        Parameters:
        - tag: BeautifulSoup Tag to start the search from

        Returns:
        The tag itself or its closest ancestor that matches this selector, or None
        """

    def comments(self, tag, limit=0):
        """
        Extract comments using this selector [DEPRECATED].

        Parameters:
        - tag: BeautifulSoup Tag to search for comments
        - limit: int, maximum comments to return (0 = unlimited)

        Returns:
        List of comment strings

        Note: Deprecated - will be removed in future versions
        """

    def icomments(self, tag, limit=0):
        """
        Iterate comments using this selector [DEPRECATED].

        Parameters:
        - tag: BeautifulSoup Tag to search for comments
        - limit: int, maximum comments to yield (0 = unlimited)

        Yields:
        Comment strings

        Note: Deprecated - will be removed in future versions
        """
```
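
A compiled object exposes the same operations as the module-level functions, minus the selector argument. The sketch below exercises several of them on an invented document; the `p:not(.skip)` selector is just an example.

```python
import soupsieve as sv
from bs4 import BeautifulSoup

# Hypothetical sample document
html = '<div class="box"><p>a</p><p class="skip">b</p><p>c</p></div>'
soup = BeautifulSoup(html, 'html.parser')

wanted = sv.compile('p:not(.skip)')

selected = wanted.select(soup)                  # all matches
first = wanted.select_one(soup)                 # first match or None
limited = list(wanted.iselect(soup, limit=1))   # lazily yield at most one match
kept = wanted.filter(soup.find_all('p'))        # filter an existing list of tags
skip_matches = wanted.match(soup.find('p', class_='skip'))

print([tag.get_text() for tag in selected], first.get_text(), skip_matches)
```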

### Exception Classes

Exception types raised by soupsieve for error conditions.

```python { .api }
class SelectorSyntaxError(SyntaxError):
    """
    Exception raised for invalid CSS selector syntax.

    Attributes:
    - line: int, line number of syntax error (if available)
    - col: int, column number of syntax error (if available)
    - context: str, pattern context showing error location (if available)
    """

    def __init__(self, msg, pattern=None, index=None):
        """
        Initialize syntax error with optional location information.

        Parameters:
        - msg: str, error message
        - pattern: str, CSS pattern that caused error (optional)
        - index: int, character index of error in pattern (optional)
        """
```

### Constants

```python { .api }
DEBUG = 0x00001  # Debug flag constant for development and testing
```

## Types

### Namespace Support

```python { .api }
# Namespace dictionary for XML documents
Namespaces = dict[str, str]
# Example: {'html': 'http://www.w3.org/1999/xhtml', 'svg': 'http://www.w3.org/2000/svg'}

# Custom selector definitions (names must start with ':--')
CustomSelectors = dict[str, str]
# Example: {':--my-selector': 'div.custom-class', ':--important': '.highlight.critical'}
```

## Advanced Usage Examples

### Namespace-Aware Selection (XML)

```python
import soupsieve as sv
from bs4 import BeautifulSoup

xml_content = '''
<root xmlns:html="http://www.w3.org/1999/xhtml">
    <html:div class="content">
        <html:p>Namespaced paragraph</html:p>
    </html:div>
</root>
'''

# Beautiful Soup's 'xml' parser requires the lxml package
soup = BeautifulSoup(xml_content, 'xml')
namespaces = {'html': 'http://www.w3.org/1999/xhtml'}

# Select namespaced elements
divs = sv.select('html|div', soup, namespaces=namespaces)
paragraphs = sv.select('html|p', soup, namespaces=namespaces)
```

### Custom Selectors

```python
import soupsieve as sv
from bs4 import BeautifulSoup

html = '<div class="important highlight">Content</div><p class="note">Note</p>'
soup = BeautifulSoup(html, 'html.parser')

# Define custom selectors; each name must start with ':--'
custom = {
    ':--special': '.important.highlight',
    ':--content': 'div, p'
}

# Use custom selectors
special_divs = sv.select(':--special', soup, custom=custom)
content_elements = sv.select(':--content', soup, custom=custom)
```

### Performance with Compiled Selectors

```python
import soupsieve as sv
from bs4 import BeautifulSoup

# Placeholder inputs so the example runs; substitute your own documents and handler
document_list = [
    '<div class="container"><p>First</p><p class="excluded">Second</p><p>Third</p></div>',
]

def process_matches(matches):
    for tag in matches:
        print(tag.get_text())

# Compile once, use many times for better performance
complex_selector = sv.compile('div.container > p:nth-child(odd):not(.excluded)')

# Use compiled selector on multiple documents
for html_content in document_list:
    soup = BeautifulSoup(html_content, 'html.parser')
    matches = complex_selector.select(soup)
    process_matches(matches)

# Clear cache when done with heavy selector use
sv.purge()
```

## Error Handling

```python
import soupsieve as sv
from soupsieve import SelectorSyntaxError
from bs4 import BeautifulSoup

soup = BeautifulSoup('<div>content</div>', 'html.parser')

try:
    # This will raise SelectorSyntaxError due to invalid CSS
    results = sv.select('div[invalid-syntax', soup)
except SelectorSyntaxError as e:
    print(f"CSS selector error: {e}")
    if e.line and e.col:
        print(f"Error at line {e.line}, column {e.col}")

try:
    # This will raise TypeError for invalid tag input
    results = sv.select('div', "not a tag object")
except TypeError as e:
    print(f"Invalid input type: {e}")
```