cssselect parses CSS3 Selectors and translates them to XPath 1.0
npx @tessl/cli install tessl/pypi-cssselect@1.3.0
# cssselect

cssselect is a Python library that parses CSS3 selectors and translates them to XPath 1.0 expressions, so you can use CSS selector syntax to find matching elements in XML or HTML documents through an XPath engine such as lxml. It exposes a small API for converting selectors to XPath and for inspecting parsed selectors.

## Package Information

- **Package Name**: cssselect
- **Language**: Python
- **Installation**: `pip install cssselect`
- **Python Support**: >= 3.9

## Core Imports

```python
import cssselect
```

Common usage patterns:

```python
from cssselect import GenericTranslator, HTMLTranslator, parse
```

For accessing all public API components:

```python
from cssselect import (
    ExpressionError,
    FunctionalPseudoElement,
    GenericTranslator,
    HTMLTranslator,
    Selector,
    SelectorError,
    SelectorSyntaxError,
    parse,
)
```

## Basic Usage

```python
from cssselect import GenericTranslator, HTMLTranslator, parse

# Basic CSS to XPath translation
translator = GenericTranslator()
xpath = translator.css_to_xpath('div.content > p')
print(xpath)  # "descendant-or-self::div[@class and contains(concat(' ', normalize-space(@class), ' '), ' content ')]/p"

# HTML-specific translation with pseudo-class support
html_translator = HTMLTranslator()
xpath = html_translator.css_to_xpath('input:checked')
print(xpath)  # XPath expression for checked input elements

# Parse selectors for inspection
selectors = parse('div.content, #main')
for selector in selectors:
    print(f"Selector: {selector.canonical()}")
    print(f"Specificity: {selector.specificity()}")
```
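
The class test in the XPath printed above follows a common XPath 1.0 idiom for matching one whitespace-separated token. The following stdlib-only sketch rebuilds that string by hand to show how the pieces fit together. `class_predicate` is a hypothetical helper written for this illustration, not part of cssselect, and it assumes the class name contains no single quote.

```python
def class_predicate(class_name: str) -> str:
    """Build an XPath 1.0 predicate matching one whitespace-separated class token.

    Hypothetical helper for illustration; assumes class_name has no single quote.
    """
    # Padding both the attribute value and the token with spaces means
    # ' content ' cannot match inside a longer token such as 'content-wide'.
    return (
        "@class and contains("
        "concat(' ', normalize-space(@class), ' '), "
        f"' {class_name} ')"
    )


# Reassemble the expression shown for 'div.content > p'.
xpath = f"descendant-or-self::div[{class_predicate('content')}]/p"
print(xpath)
```

`normalize-space()` collapses runs of whitespace first, so attribute values with tabs or repeated spaces are handled by the same test.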

## Capabilities

### CSS Selector Parsing

Parse CSS selector strings into structured Selector objects for analysis and manipulation.

```python { .api }
def parse(css: str) -> list[Selector]:
    """
    Parse a CSS group of selectors into Selector objects.

    Parameters:
    - css (str): A group of selectors as a string

    Returns:
    list[Selector]: List of parsed Selector objects

    Raises:
    SelectorSyntaxError: On invalid selectors
    """
```

### Generic XML Translation

Translate CSS selectors to XPath expressions for generic XML documents with case-sensitive matching.

```python { .api }
class GenericTranslator:
    """
    Translator for generic XML documents.

    Everything is case-sensitive, no assumption is made on the meaning
    of element names and attribute names.
    """

    def __init__(self):
        """Initialize a GenericTranslator instance."""

    def css_to_xpath(self, css: str, prefix: str = "descendant-or-self::") -> str:
        """
        Translate a group of selectors to XPath.

        Parameters:
        - css (str): A group of selectors as a string
        - prefix (str): Prepended to XPath expression (default: "descendant-or-self::")

        Returns:
        str: The equivalent XPath 1.0 expression

        Raises:
        SelectorSyntaxError: On invalid selectors
        ExpressionError: On unknown/unsupported selectors
        """

    def selector_to_xpath(
        self,
        selector: Selector,
        prefix: str = "descendant-or-self::",
        translate_pseudo_elements: bool = False
    ) -> str:
        """
        Translate a single parsed selector to XPath.

        Parameters:
        - selector (Selector): A parsed Selector object
        - prefix (str): Prepended to XPath expression (default: "descendant-or-self::")
        - translate_pseudo_elements (bool): Whether to handle pseudo-elements

        Returns:
        str: The equivalent XPath 1.0 expression

        Raises:
        ExpressionError: On unknown/unsupported selectors
        """

    def xpath_pseudo_element(self, xpath, pseudo_element):
        """
        Handle pseudo-element in XPath translation.

        Parameters:
        - xpath: XPath expression object
        - pseudo_element (PseudoElement): Pseudo-element to handle

        Returns:
        XPath expression with pseudo-element handling
        """

    @staticmethod
    def xpath_literal(s: str) -> str:
        """
        Create properly escaped XPath literal from string.

        Parameters:
        - s (str): String to escape

        Returns:
        str: XPath-escaped string literal
        """

    # Configuration attributes
    id_attribute: str = "id"  # Attribute used for ID selectors
    lang_attribute: str = "xml:lang"  # Attribute used for :lang() pseudo-class
    lower_case_element_names: bool = False  # Case sensitivity for element names
    lower_case_attribute_names: bool = False  # Case sensitivity for attribute names
    lower_case_attribute_values: bool = False  # Case sensitivity for attribute values
```
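
XPath 1.0 string literals have no escape character, which is why a helper like `xpath_literal` is needed at all. The sketch below shows one common way to quote an arbitrary string under that constraint. It uses only the standard library, `quote_xpath_string` is a hypothetical name, and it is not claimed to be the library's own implementation.

```python
import re


def quote_xpath_string(s: str) -> str:
    """Return s as an XPath 1.0 string literal (hypothetical helper)."""
    if "'" not in s:
        return f"'{s}'"  # no single quote inside: wrap in single quotes
    if '"' not in s:
        return f'"{s}"'  # no double quote inside: wrap in double quotes
    # Both quote kinds are present: split out runs of single quotes and
    # join the pieces with concat(), quoting each piece with the other kind.
    parts = [part for part in re.split(r"('+)", s) if part]
    quoted = [f'"{part}"' if "'" in part else f"'{part}'" for part in parts]
    return "concat({})".format(",".join(quoted))


for text in ("abc", "it's", 'a"b\'c'):
    print(quote_xpath_string(text))
```

The `concat()` branch is only reached when a value contains both quote characters, so the common cases stay readable.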

### HTML-Specific Translation

Translate CSS selectors to XPath expressions optimized for HTML documents with HTML-specific pseudo-class support.

```python { .api }
class HTMLTranslator(GenericTranslator):
    """
    Translator for HTML documents.

    Has useful implementations of HTML-specific pseudo-classes and
    handles HTML case-insensitivity rules.
    """

    def __init__(self, xhtml: bool = False):
        """
        Initialize HTML translator.

        Parameters:
        - xhtml (bool): If False (default), element and attribute names are case-insensitive
        """

    # Overridden configuration attributes
    lang_attribute: str = "lang"  # Uses 'lang' instead of 'xml:lang' for HTML
```

### Selector Objects

Work with parsed CSS selectors as structured objects for analysis and manipulation.

```python { .api }
class Selector:
    """
    Represents a parsed CSS selector.
    """

    def __init__(self, tree: Tree, pseudo_element: PseudoElement | None = None):
        """
        Create a Selector object.

        Parameters:
        - tree (Tree): The parsed selector tree
        - pseudo_element (PseudoElement | None): Pseudo-element if present
        """

    def canonical(self) -> str:
        """
        Return a CSS representation for this selector.

        Returns:
        str: CSS selector string
        """

    def specificity(self) -> tuple[int, int, int]:
        """
        Return the CSS specificity of this selector.

        Returns:
        tuple[int, int, int]: Specificity as (a, b, c) tuple per CSS specification
        """

    # Attributes
    parsed_tree: Tree  # The parsed selector tree
    pseudo_element: PseudoElement | None  # Pseudo-element if present
```
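
Because specificity is returned as an `(a, b, c)` tuple, ordinary tuple comparison already gives the CSS ordering: `a` is compared first, then `b`, then `c`. The sketch below uses tuples worked out by hand from the CSS rules (ID selectors, then classes/attributes/pseudo-classes, then element names) rather than values returned by the library, so it runs without cssselect installed.

```python
# Hand-computed specificity tuples: (IDs, classes/attributes/pseudo-classes, elements).
specificities = {
    "p": (0, 0, 1),
    "div.content": (0, 1, 1),
    "#main": (1, 0, 0),
    "div.content #main": (1, 1, 1),
}

# Tuples compare element by element, so sorting ranks the selectors.
ranked = sorted(specificities, key=specificities.get, reverse=True)
print(ranked)  # ['div.content #main', '#main', 'div.content', 'p']

# One ID outranks any number of classes and element names.
print((1, 0, 0) > (0, 12, 5))  # True
```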

### Functional Pseudo-Elements

Handle functional pseudo-elements with arguments like `::name(arguments)`.

```python { .api }
class FunctionalPseudoElement:
    """
    Represents functional pseudo-elements like ::name(arguments).
    """

    def __init__(self, name: str, arguments: Sequence[Token]):
        """
        Create a functional pseudo-element.

        Parameters:
        - name (str): The pseudo-element name
        - arguments (Sequence[Token]): The argument tokens
        """

    def argument_types(self) -> list[str]:
        """
        Get the types of the pseudo-element arguments.

        Returns:
        list[str]: List of argument token types
        """

    def canonical(self) -> str:
        """
        Return CSS representation of the functional pseudo-element.

        Returns:
        str: CSS pseudo-element string
        """

    # Attributes
    name: str  # The pseudo-element name
    arguments: Sequence[Token]  # The argument tokens
```

## Exception Handling

### Exception Types

```python { .api }
class SelectorError(Exception):
    """
    Base exception for CSS selector related errors.

    Common parent for SelectorSyntaxError and ExpressionError.
    Use except SelectorError: to catch both exception types.
    """

class SelectorSyntaxError(SelectorError, SyntaxError):
    """
    Exception raised when parsing a selector that does not match the CSS grammar.
    """

class ExpressionError(SelectorError, RuntimeError):
    """
    Exception raised for unknown or unsupported selector features during XPath translation.
    """
```
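
One consequence of the base classes listed above is that handlers written for the built-in `SyntaxError` or `RuntimeError` also catch the corresponding selector exception. The sketch below demonstrates this with local stand-in classes that copy the documented names and bases. They are defined here for illustration only and are not imported from cssselect, and `classify` is a hypothetical helper.

```python
# Stand-ins with the same names and bases as the listing above (illustration only).
class SelectorError(Exception):
    pass


class SelectorSyntaxError(SelectorError, SyntaxError):
    pass


class ExpressionError(SelectorError, RuntimeError):
    pass


def classify(exc: Exception) -> str:
    """Report which handler catches exc when built-in handlers come first."""
    try:
        raise exc
    except SyntaxError:
        return "syntax"
    except RuntimeError:
        return "runtime"
    except SelectorError:
        return "selector"


print(classify(SelectorSyntaxError("bad selector")))  # syntax
print(classify(ExpressionError("unsupported")))       # runtime
print(classify(SelectorError("generic")))             # selector
```

When selector failures need separate treatment, putting the `SelectorError` handlers before any built-in ones avoids this shadowing.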

### Error Handling Examples

**Basic error handling:**

```python
from cssselect import GenericTranslator, SelectorError

translator = GenericTranslator()

try:
    xpath = translator.css_to_xpath('div.content > p')
except SelectorError as e:
    print(f"Selector error: {e}")
```

**Specific error handling:**

```python
from cssselect import ExpressionError, GenericTranslator, SelectorSyntaxError

translator = GenericTranslator()

try:
    # css_to_xpath() parses and then translates, so either error can occur
    xpath = translator.css_to_xpath('div.content > p')
except SelectorSyntaxError as e:
    print(f"Invalid CSS syntax: {e}")
except ExpressionError as e:
    print(f"Unsupported selector feature: {e}")
```

## Advanced Usage

### Selector Analysis

```python
from cssselect import parse

# Analyze selector specificity and structure
selectors = parse('div.content #main, body > nav a:hover')
for selector in selectors:
    print(f"Selector: {selector.canonical()}")
    print(f"Specificity: {selector.specificity()}")
    if selector.pseudo_element:
        print(f"Pseudo-element: {selector.pseudo_element}")
```

### Custom Translation

```python
from cssselect import GenericTranslator, parse

# Use a custom prefix for the XPath expression
translator = GenericTranslator()
xpath = translator.css_to_xpath('div > p', prefix="./")
print(xpath)  # "./div/p"

# Translate a single parsed selector. With the default
# translate_pseudo_elements=False the '::before' pseudo-element is ignored;
# passing True calls xpath_pseudo_element(), which raises ExpressionError
# unless a subclass overrides it.
selectors = parse('div::before')
xpath = translator.selector_to_xpath(selectors[0], prefix="descendant::")
print(xpath)  # "descendant::div"
```

### HTML vs Generic Translation

```python
from cssselect import GenericTranslator, HTMLTranslator

css = 'INPUT:checked'

# Generic (case-sensitive) translation
generic = GenericTranslator()
generic_xpath = generic.css_to_xpath(css)

# HTML (case-insensitive with HTML pseudo-classes) translation
html = HTMLTranslator()
html_xpath = html.css_to_xpath(css)

print(f"Generic: {generic_xpath}")
print(f"HTML: {html_xpath}")
```

## Parsed Selector Tree Components

Advanced users working with parsed selectors may encounter these tree node classes:

### Tree Node Classes

```python { .api }
class Element:
    """Represents element selectors (tag, *, namespace|tag)."""
    def canonical(self) -> str: ...
    def specificity(self) -> tuple[int, int, int]: ...

class Class:
    """Represents class selectors (.classname)."""
    def canonical(self) -> str: ...
    def specificity(self) -> tuple[int, int, int]: ...

class Hash:
    """Represents ID selectors (#id)."""
    def canonical(self) -> str: ...
    def specificity(self) -> tuple[int, int, int]: ...

class Attrib:
    """Represents attribute selectors ([attr], [attr=val], etc.)."""
    def canonical(self) -> str: ...
    def specificity(self) -> tuple[int, int, int]: ...

class Pseudo:
    """Represents pseudo-class selectors (:hover, :first-child)."""
    def canonical(self) -> str: ...
    def specificity(self) -> tuple[int, int, int]: ...

class Function:
    """Represents functional pseudo-classes (:nth-child(2n+1))."""
    def canonical(self) -> str: ...
    def specificity(self) -> tuple[int, int, int]: ...

class Negation:
    """Represents :not() pseudo-class."""
    def canonical(self) -> str: ...
    def specificity(self) -> tuple[int, int, int]: ...

class Relation:
    """Represents :has() relational pseudo-class."""
    def canonical(self) -> str: ...
    def specificity(self) -> tuple[int, int, int]: ...

class Matching:
    """Represents :is() pseudo-class."""
    def canonical(self) -> str: ...
    def specificity(self) -> tuple[int, int, int]: ...

class SpecificityAdjustment:
    """Represents :where() pseudo-class."""
    def canonical(self) -> str: ...
    def specificity(self) -> tuple[int, int, int]: ...

class CombinedSelector:
    """Represents combined selectors with combinators ('>', '+', '~', ' ')."""
    def canonical(self) -> str: ...
    def specificity(self) -> tuple[int, int, int]: ...
```

## Types

### Core Types

```python { .api }
# Type aliases for internal selector tree structure
Tree = Union[
    Element, Hash, Class, Function, Pseudo, Attrib,
    Negation, Relation, Matching, SpecificityAdjustment, CombinedSelector
]

PseudoElement = Union[FunctionalPseudoElement, str]
```

### Token Type

```python { .api }
class Token(tuple[str, Optional[str]]):
    """
    Represents a CSS token during parsing.

    Token types include: IDENT, HASH, STRING, S (whitespace), DELIM, NUMBER, EOF
    """

    def __new__(cls, type_: str, value: str | None, pos: int):
        """
        Create a new token.

        Parameters:
        - type_ (str): Token type (IDENT, HASH, STRING, S, DELIM, NUMBER, EOF)
        - value (str | None): Token value
        - pos (int): Position in source string
        """

    def is_delim(self, *values: str) -> bool:
        """
        Check if token is delimiter with specific value(s).

        Parameters:
        - *values (str): Values to check against

        Returns:
        bool: True if token is delimiter with one of the specified values
        """

    def css(self) -> str:
        """
        Return CSS representation of the token.

        Returns:
        str: CSS string representation
        """

    # Properties
    type: str  # Token type
    value: str | None  # Token value
    pos: int  # Position in source

class EOFToken(Token):
    """Special end-of-file token."""
```

## Utility Functions

Advanced parsing and string manipulation utilities:

```python { .api }
def parse_series(tokens) -> tuple[int, int]:
    """
    Parse :nth-child() style arguments like '2n+1'.

    Parameters:
    - tokens: Iterable of tokens representing the series expression

    Returns:
    tuple[int, int]: (a, b) values for an + b expression
    """

def ascii_lower(string: str) -> str:
    """
    ASCII-only lowercase conversion.

    Parameters:
    - string (str): String to convert

    Returns:
    str: Lowercase string using ASCII rules only
    """

def unescape_ident(value: str) -> str:
    """
    Unescape CSS identifier strings.

    Parameters:
    - value (str): CSS identifier with possible escape sequences

    Returns:
    str: Unescaped identifier string
    """
```
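
The `(a, b)` pair returned by `parse_series` describes the positions `a*n + b` for `n = 0, 1, 2, ...`, counted from 1. To make that concrete, the sketch below checks whether a given position belongs to such a series. `matches_series` is a hypothetical helper using only the standard library. It does not call cssselect, and the pair `(2, 1)` for `'2n+1'` is read directly off the `an + b` form.

```python
def matches_series(a: int, b: int, index: int) -> bool:
    """Return True if index == a*n + b for some integer n >= 0.

    Hypothetical helper; index is the 1-based position of an element.
    """
    if a == 0:
        return index == b  # a constant series matches exactly one position
    n, remainder = divmod(index - b, a)
    return remainder == 0 and n >= 0


# '2n+1' corresponds to (a, b) = (2, 1): the odd positions.
odd = [i for i in range(1, 8) if matches_series(2, 1, i)]
print(odd)  # [1, 3, 5, 7]

# '-n+3' corresponds to (a, b) = (-1, 3): the first three positions.
first_three = [i for i in range(1, 8) if matches_series(-1, 3, i)]
print(first_three)  # [1, 2, 3]
```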

## Package Version

```python { .api }
VERSION = "1.3.0"
__version__ = "1.3.0"
```