Tessl Tile for pypi/soupsieve@1.9.0

# Soupsieve

Soupsieve is a modern CSS selector implementation for Beautiful Soup 4 that supports selectors from CSS Level 1 through the CSS Level 4 drafts. It is the default selector engine for Beautiful Soup 4.7.0+ and can also be used on its own to select elements from HTML and XML documents.

## Package Information

- **Package Name**: soupsieve
- **Language**: Python
- **Installation**: `pip install soupsieve`

## Core Imports

```python
import soupsieve
```

Alternative import for shorter syntax:

```python
import soupsieve as sv
```

Specific functions and classes can be imported directly:

```python
from soupsieve import compile, select, match, SoupSieve, SelectorSyntaxError
```

## Basic Usage

```python
import soupsieve as sv
from bs4 import BeautifulSoup

# Create a soup object from HTML
html = """
<div class="container">
    <p id="intro">Introduction paragraph</p>
    <div class="content">
        <p class="highlight">Important content</p>
        <span>Additional info</span>
    </div>
</div>
"""
soup = BeautifulSoup(html, 'html.parser')

# Basic selection - find all paragraphs
paragraphs = sv.select('p', soup)
print(f"Found {len(paragraphs)} paragraphs")

# Select with class
highlighted = sv.select('.highlight', soup)
if highlighted:
    print(f"Highlighted text: {highlighted[0].get_text()}")

# Select first match only
first_p = sv.select_one('p', soup)
print(f"First paragraph: {first_p.get_text()}")

# Test if element matches selector
intro = soup.find(id='intro')
if sv.match('#intro', intro):
    print("Element matches #intro selector")

# Compiled selectors for reuse
compiled = sv.compile('div.content > *')
children = compiled.select(soup)
print(f"Found {len(children)} direct children of .content")
```

## Architecture

Soupsieve's architecture centers around CSS parsing and matching:

- **Parser**: Converts CSS selector strings into structured selector objects
- **Matcher**: Evaluates selectors against Beautiful Soup elements using tree traversal
- **Compiler**: Provides caching and reusable compiled selector objects
- **Types**: Immutable data structures representing selector components

The library automatically handles HTML vs XML differences and provides namespace support for XML documents.
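
The split between compiling and matching described above can be observed through the public API. The following is a minimal sketch that uses only the documented `compile` and `match` calls. The HTML snippet and variable names are made up for illustration.

```python
import soupsieve as sv
from bs4 import BeautifulSoup

# Illustrative document, not taken from the library's own docs
soup = BeautifulSoup('<ul><li class="a">one</li><li>two</li></ul>', 'html.parser')

# Parsing and compilation happen once; the result is a SoupSieve object
pattern = sv.compile('li.a')
print(type(pattern).__name__)  # SoupSieve
print(pattern.pattern)         # li.a

# Matching evaluates the compiled selector against individual tags
items = soup.find_all('li')
results = [pattern.match(li) for li in items]
print(results)                 # [True, False]
```

The compiled object keeps the original selector string in `pattern`, so it can be logged or compared without re-parsing.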

## Capabilities

### CSS Selector Functions

Core functions for selecting elements using CSS selectors. These provide the primary interface for CSS-based element selection.

```python { .api }
def select(select, tag, namespaces=None, limit=0, flags=0, **kwargs):
    """
    Select all matching elements under the specified tag.

    Parameters:
    - select: str, CSS selector string
    - tag: BeautifulSoup Tag or document to search within
    - namespaces: dict, optional namespace mappings for XML
    - limit: int, maximum results to return (0 = unlimited)
    - flags: int, selection flags for advanced options
    - **kwargs: additional options including 'custom' selectors

    Returns:
    List of matching BeautifulSoup Tag objects
    """

def select_one(select, tag, namespaces=None, flags=0, **kwargs):
    """
    Select the first matching element.

    Parameters:
    - select: str, CSS selector string
    - tag: BeautifulSoup Tag or document to search within
    - namespaces: dict, optional namespace mappings for XML
    - flags: int, selection flags for advanced options
    - **kwargs: additional options including 'custom' selectors

    Returns:
    First matching BeautifulSoup Tag object or None
    """

def iselect(select, tag, namespaces=None, limit=0, flags=0, **kwargs):
    """
    Iterate over matching elements (generator).

    Parameters:
    - select: str, CSS selector string
    - tag: BeautifulSoup Tag or document to search within
    - namespaces: dict, optional namespace mappings for XML
    - limit: int, maximum results to yield (0 = unlimited)
    - flags: int, selection flags for advanced options
    - **kwargs: additional options including 'custom' selectors

    Yields:
    BeautifulSoup Tag objects that match the selector
    """
```
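
To make the difference between the three selection functions concrete, here is a short sketch that runs each of them against the same document. The HTML is an illustrative assumption; the calls and the `limit` parameter follow the signatures listed above.

```python
import soupsieve as sv
from bs4 import BeautifulSoup

html = "<ul><li>a</li><li>b</li><li>c</li></ul>"
soup = BeautifulSoup(html, "html.parser")

all_items = sv.select("li", soup)           # list with every match
first_two = sv.select("li", soup, limit=2)  # stop after two matches
first = sv.select_one("li", soup)           # a single Tag
missing = sv.select_one("table", soup)      # no match, so None

# iselect yields matches lazily instead of building a list up front
texts = [tag.get_text() for tag in sv.iselect("li", soup)]
print(texts)  # ['a', 'b', 'c']
```

`iselect` is the one to reach for when the document is large and the loop may exit early, since matches are produced one at a time.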

### Element Matching and Filtering

Functions for testing individual elements and filtering collections.

```python { .api }
def match(select, tag, namespaces=None, flags=0, **kwargs):
    """
    Test if a tag matches the CSS selector.

    Parameters:
    - select: str, CSS selector string
    - tag: BeautifulSoup Tag to test
    - namespaces: dict, optional namespace mappings for XML
    - flags: int, matching flags for advanced options
    - **kwargs: additional options including 'custom' selectors

    Returns:
    bool, True if tag matches selector, False otherwise
    """

def filter(select, iterable, namespaces=None, flags=0, **kwargs):
    """
    Filter a collection of tags by CSS selector.

    Parameters:
    - select: str, CSS selector string
    - iterable: collection of BeautifulSoup Tags to filter
    - namespaces: dict, optional namespace mappings for XML
    - flags: int, filtering flags for advanced options
    - **kwargs: additional options including 'custom' selectors

    Returns:
    List of Tags from iterable that match the selector
    """

def closest(select, tag, namespaces=None, flags=0, **kwargs):
    """
    Find the closest match among the tag itself and its ancestors.

    Parameters:
    - select: str, CSS selector string
    - tag: BeautifulSoup Tag to start the search from
    - namespaces: dict, optional namespace mappings for XML
    - flags: int, matching flags for advanced options
    - **kwargs: additional options including 'custom' selectors

    Returns:
    The tag itself or its nearest ancestor that matches the selector, or None
    """
```
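
The sketch below exercises `match`, `filter`, and `closest` on one small tree. The markup and variable names are illustrative assumptions. Note that `closest` tests the starting tag first and only then walks up through its parents.

```python
import soupsieve as sv
from bs4 import BeautifulSoup

html = '<div id="outer"><section class="box"><p id="x">hi</p><span>s</span></section></div>'
soup = BeautifulSoup(html, "html.parser")
p = soup.find(id="x")

# match: does this one tag satisfy the selector?
is_child_of_section = sv.match("section > p", p)  # True
is_child_of_div = sv.match("div > p", p)          # False, the div is a grandparent

# filter: keep only the tags from a collection that match
tags = soup.find_all(True)  # div, section, p, span in document order
kept = sv.filter("p, span", tags)
print([t.name for t in kept])  # ['p', 'span']

# closest: check the tag itself, then each ancestor in turn
box = sv.closest(".box", p)        # the enclosing <section>
itself = sv.closest("p", p)        # the starting tag matches
nothing = sv.closest("table", p)   # no match anywhere up the tree
```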

### Selector Compilation and Caching

Functions for compiling selectors for reuse and managing the selector cache.

```python { .api }
def compile(pattern, namespaces=None, flags=0, **kwargs):
    """
    Compile CSS selector pattern into reusable SoupSieve object.

    Parameters:
    - pattern: str or SoupSieve, CSS selector string to compile
    - namespaces: dict, optional namespace mappings for XML
    - flags: int, compilation flags for advanced options
    - **kwargs: additional options including 'custom' selectors

    Returns:
    SoupSieve compiled selector object

    Raises:
    ValueError: if flags/namespaces/custom provided with SoupSieve input
    SelectorSyntaxError: for invalid CSS selector syntax
    """

def purge():
    """
    Clear the internal compiled selector cache.

    Returns:
    None
    """
```
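
The `ValueError` case in the `compile` docstring is easiest to see in a small example. This sketch assumes, as the signature above states, that `compile` accepts an already compiled `SoupSieve` object and hands it back unchanged when no extra options are given. The sample HTML is illustrative.

```python
import soupsieve as sv
from bs4 import BeautifulSoup

pattern = sv.compile("p.note")

# A compiled pattern passed back to compile is returned as-is
same = sv.compile(pattern)

# Adding flags (or namespaces/custom) to a compiled pattern is rejected
try:
    sv.compile(pattern, flags=sv.DEBUG)
    error = None
except ValueError as exc:
    error = exc

# Module-level functions also accept the compiled object in place of a string
soup = BeautifulSoup('<p class="note">n</p><p>x</p>', "html.parser")
notes = sv.select(pattern, soup)

# Drop cached patterns once they are no longer needed
sv.purge()
```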

### Utility Functions

Helper functions for CSS identifier escaping.

```python { .api }
def escape(ident):
    """
    Escape CSS identifier for safe use in selectors.

    Parameters:
    - ident: str, identifier string to escape

    Returns:
    str, CSS-escaped identifier safe for use in selectors
    """
```
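
Escaping matters when an `id` or class name contains characters that have meaning in CSS, or starts with a digit. The following sketch builds ID selectors from such values; the two `id` strings are made-up examples.

```python
import soupsieve as sv
from bs4 import BeautifulSoup

soup = BeautifulSoup('<p id="user.name">Ada</p><p id="123">num</p>', "html.parser")

# A dot inside an id would otherwise be read as the start of a class selector
escaped = sv.escape("user.name")
print(escaped)  # user\.name
el = sv.select_one("#" + escaped, soup)

# An identifier may not start with a bare digit, so it has to be escaped too
num = sv.select_one("#" + sv.escape("123"), soup)
```

Escape only the identifier part and add the `#` or `.` prefix afterwards, otherwise the prefix itself would be escaped.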

### Deprecated Comment Functions

Functions for extracting comments (deprecated, will be removed in future versions).

```python { .api }
def comments(tag, limit=0, flags=0, **kwargs):
    """
    Extract comments from tag tree [DEPRECATED].

    Parameters:
    - tag: BeautifulSoup Tag to search for comments
    - limit: int, maximum comments to return (0 = unlimited)
    - flags: int, unused flags parameter
    - **kwargs: additional unused options

    Returns:
    List of comment strings

    Note: Deprecated - not related to CSS selectors, will be removed
    """

def icomments(tag, limit=0, flags=0, **kwargs):
    """
    Iterate comments from tag tree [DEPRECATED].

    Parameters:
    - tag: BeautifulSoup Tag to search for comments
    - limit: int, maximum comments to yield (0 = unlimited)
    - flags: int, unused flags parameter
    - **kwargs: additional unused options

    Yields:
    Comment strings

    Note: Deprecated - not related to CSS selectors, will be removed
    """
```

## Classes

### SoupSieve

The main compiled selector class providing reusable CSS selector functionality with caching benefits.

```python { .api }
class SoupSieve:
    """
    Compiled CSS selector object for efficient reuse.

    Attributes:
    - pattern: str, original CSS selector pattern
    - selectors: internal parsed selector structure
    - namespaces: namespace mappings used during compilation
    - custom: custom selector definitions used during compilation
    - flags: compilation flags used during compilation
    """

    def match(self, tag):
        """
        Test if tag matches this compiled selector.

        Parameters:
        - tag: BeautifulSoup Tag to test

        Returns:
        bool, True if tag matches, False otherwise
        """

    def select(self, tag, limit=0):
        """
        Select all matching elements under tag using this compiled selector.

        Parameters:
        - tag: BeautifulSoup Tag or document to search within
        - limit: int, maximum results to return (0 = unlimited)

        Returns:
        List of matching BeautifulSoup Tag objects
        """

    def select_one(self, tag):
        """
        Select first matching element using this compiled selector.

        Parameters:
        - tag: BeautifulSoup Tag or document to search within

        Returns:
        First matching BeautifulSoup Tag object or None
        """

    def iselect(self, tag, limit=0):
        """
        Iterate matching elements using this compiled selector.

        Parameters:
        - tag: BeautifulSoup Tag or document to search within
        - limit: int, maximum results to yield (0 = unlimited)

        Yields:
        BeautifulSoup Tag objects that match the selector
        """

    def filter(self, iterable):
        """
        Filter collection of tags using this compiled selector.

        Parameters:
        - iterable: collection of BeautifulSoup Tags to filter

        Returns:
        List of Tags from iterable that match this selector
        """

    def closest(self, tag):
        """
        Find the closest match among the tag itself and its ancestors
        using this compiled selector.

        Parameters:
        - tag: BeautifulSoup Tag to start the search from

        Returns:
        The tag itself or its nearest ancestor that matches this selector, or None
        """

    def comments(self, tag, limit=0):
        """
        Extract comments using this selector [DEPRECATED].

        Parameters:
        - tag: BeautifulSoup Tag to search for comments
        - limit: int, maximum comments to return (0 = unlimited)

        Returns:
        List of comment strings

        Note: Deprecated - will be removed in future versions
        """

    def icomments(self, tag, limit=0):
        """
        Iterate comments using this selector [DEPRECATED].

        Parameters:
        - tag: BeautifulSoup Tag to search for comments
        - limit: int, maximum comments to yield (0 = unlimited)

        Yields:
        Comment strings

        Note: Deprecated - will be removed in future versions
        """
```

### Exception Classes

Exception types raised by soupsieve for error conditions.

```python { .api }
class SelectorSyntaxError(SyntaxError):
    """
    Exception raised for invalid CSS selector syntax.

    Attributes:
    - line: int, line number of syntax error (if available)
    - col: int, column number of syntax error (if available)
    - context: str, pattern context showing error location (if available)
    """

    def __init__(self, msg, pattern=None, index=None):
        """
        Initialize syntax error with optional location information.

        Parameters:
        - msg: str, error message
        - pattern: str, CSS pattern that caused error (optional)
        - index: int, character index of error in pattern (optional)
        """
```

### Constants

```python { .api }
DEBUG = 0x00001  # Debug flag constant for development and testing
```

## Types

### Namespace Support

```python { .api }
# Namespace dictionary for XML documents
Namespaces = dict[str, str]
# Example: {'html': 'http://www.w3.org/1999/xhtml', 'svg': 'http://www.w3.org/2000/svg'}

# Custom selector definitions (names must start with ':--')
CustomSelectors = dict[str, str]
# Example: {':--my-selector': 'div.custom-class', ':--important': '.highlight.critical'}
```

## Advanced Usage Examples

### Namespace-Aware Selection (XML)

```python
import soupsieve as sv
from bs4 import BeautifulSoup

xml_content = '''
<root xmlns:html="http://www.w3.org/1999/xhtml">
    <html:div class="content">
        <html:p>Namespaced paragraph</html:p>
    </html:div>
</root>
'''

# The 'xml' parser requires lxml to be installed
soup = BeautifulSoup(xml_content, 'xml')
namespaces = {'html': 'http://www.w3.org/1999/xhtml'}

# Select namespaced elements
divs = sv.select('html|div', soup, namespaces=namespaces)
paragraphs = sv.select('html|p', soup, namespaces=namespaces)
```

### Custom Selectors

```python
import soupsieve as sv
from bs4 import BeautifulSoup

html = '<div class="important highlight">Content</div><p class="note">Note</p>'
soup = BeautifulSoup(html, 'html.parser')

# Define custom selectors; names must start with ':--'
custom = {
    ':--special': '.important.highlight',
    ':--content': 'div, p'
}

# Use custom selectors
special_divs = sv.select(':--special', soup, custom=custom)
content_elements = sv.select(':--content', soup, custom=custom)
```

### Performance with Compiled Selectors

```python
import soupsieve as sv
from bs4 import BeautifulSoup

# Hypothetical input documents; replace with your own HTML strings
document_list = [
    '<div class="container"><p>one</p><p>two</p><p class="excluded">three</p></div>',
    '<div class="container"><p>only</p></div>',
]

def process_matches(matches):
    """Placeholder handler: print the text of each matched tag."""
    for tag in matches:
        print(tag.get_text())

# Compile once, use many times for better performance
complex_selector = sv.compile('div.container > p:nth-child(odd):not(.excluded)')

# Use compiled selector on multiple documents
for html_content in document_list:
    soup = BeautifulSoup(html_content, 'html.parser')
    matches = complex_selector.select(soup)
    process_matches(matches)

# Clear cache when done with heavy selector use
sv.purge()
```

## Error Handling

```python
import soupsieve as sv
from soupsieve import SelectorSyntaxError
from bs4 import BeautifulSoup

soup = BeautifulSoup('<div>content</div>', 'html.parser')

try:
    # This will raise SelectorSyntaxError due to invalid CSS
    results = sv.select('div[invalid-syntax', soup)
except SelectorSyntaxError as e:
    print(f"CSS selector error: {e}")
    if e.line and e.col:
        print(f"Error at line {e.line}, column {e.col}")

try:
    # This will raise TypeError for invalid tag input
    results = sv.select('div', "not a tag object")
except TypeError as e:
    print(f"Invalid input type: {e}")
```
