Tessl Tile for pypi/parsel@1.10.0

# Document Parsing and Selection

Core functionality for parsing HTML, XML, JSON, and plain-text documents through a unified selector interface that supports several query languages: XPath, CSS selectors, and JMESPath.

## Capabilities

### Selector Initialization

Create Selector instances from various input formats with configurable parsing options.

```python { .api }
class Selector:
    def __init__(
        self,
        text: Optional[str] = None,
        type: Optional[str] = None,
        body: bytes = b"",
        encoding: str = "utf-8",
        namespaces: Optional[Mapping[str, str]] = None,
        root: Optional[Any] = None,
        base_url: Optional[str] = None,
        _expr: Optional[str] = None,
        huge_tree: bool = True,
    ) -> None:
        """
        Initialize a Selector for parsing and selecting from documents.

        Parameters:
        - text (str, optional): Text content to parse
        - type (str, optional): Document type: "html", "xml", "json", or "text"; detected automatically when omitted
        - body (bytes): Raw bytes content (alternative to text)
        - encoding (str): Character encoding for body content, defaults to "utf-8"
        - namespaces (dict, optional): XML namespace prefix mappings
        - root (Any, optional): Pre-parsed root element or data
        - base_url (str, optional): Base URL for resolving relative URLs
        - _expr (str, optional): Internal; expression that created this selector
        - huge_tree (bool): Enable lxml's huge-tree support for very large documents; defaults to True (only takes effect when the installed lxml supports it)

        Raises:
        - ValueError: Invalid type, or none of text, body, or root given
        - TypeError: Invalid argument types (e.g. text that is not a str)
        """
```

**Usage Example:**

```python
from parsel import Selector

# Parse HTML text
html_selector = Selector(text="<html><body><h1>Title</h1></body></html>")

# Parse XML with explicit type
xml_selector = Selector(text="<root><item>data</item></root>", type="xml")

# Parse JSON
json_selector = Selector(text='{"name": "value", "items": [1, 2, 3]}', type="json")

# Parse from bytes with encoding
bytes_selector = Selector(body=b"<html><body>Content</body></html>", encoding="utf-8")

# Parse with XML namespaces
ns_selector = Selector(
    text="<root xmlns:ns='http://example.com'><ns:item>data</ns:item></root>",
    type="xml",
    namespaces={"ns": "http://example.com"}
)
```

### XPath Selection

Execute XPath expressions for precise element selection, with namespace support and variable binding.

```python { .api }
def xpath(
    self,
    query: str,
    namespaces: Optional[Mapping[str, str]] = None,
    **kwargs: Any,
) -> SelectorList["Selector"]:
    """
    Find nodes matching the XPath query.

    Parameters:
    - query (str): XPath expression to execute
    - namespaces (dict, optional): Additional namespace prefix mappings, applied to this call only
    - **kwargs: Values bound to XPath variables (e.g. url=... binds $url)

    Returns:
    SelectorList: Collection of matching Selector objects

    Raises:
    - ValueError: Invalid XPath expression (lxml's XPathError is re-raised as ValueError) or unsupported selector type (e.g. "json")
    """
```

**Usage Example:**

```python
from parsel import Selector

selector = Selector(text="""
<html>
    <body>
        <div class="content">
            <p>First paragraph</p>
            <p>Second paragraph</p>
        </div>
        <a href="http://example.com">Link</a>
    </body>
</html>
""")

# Select all paragraphs
paragraphs = selector.xpath('//p')

# Select text content
text_nodes = selector.xpath('//p/text()')

# Select attributes
hrefs = selector.xpath('//a/@href')

# Use XPath variables: $url is bound from the url keyword argument
links = selector.xpath('//a[@href=$url]', url="http://example.com")

# Complex XPath expressions: paragraphs after the first one inside div.content
later_paragraphs = selector.xpath('//div[@class="content"]//p[position()>1]')
```

### CSS Selection

Apply CSS selectors, including the parsel-specific `::text` and `::attr(name)` pseudo-elements. Queries are translated to XPath (via cssselect) before evaluation.

```python { .api }
def css(self, query: str) -> SelectorList["Selector"]:
    """
    Apply CSS selector and return matching elements.

    Parameters:
    - query (str): CSS selector expression

    Returns:
    SelectorList: Collection of matching Selector objects

    Raises:
    - ValueError: Unsupported selector type (e.g. "json")
    - cssselect.SelectorSyntaxError: Malformed CSS selector
    - cssselect.xpath.ExpressionError: Unsupported pseudo-class or pseudo-element
    """
```

**Usage Example:**

```python
from parsel import Selector

selector = Selector(text="""
<html>
    <body>
        <div class="container">
            <h1 id="title">Main Title</h1>
            <p class="intro">Introduction text</p>
            <ul>
                <li><a href="link1.html">Link 1</a></li>
                <li><a href="link2.html">Link 2</a></li>
            </ul>
        </div>
    </body>
</html>
""")

# Select by class
intro = selector.css('.intro')

# Select by ID
title = selector.css('#title')

# Select descendants
links = selector.css('.container a')

# ::text pseudo-element (parsel extension) selects text nodes
title_text = selector.css('h1::text')

# ::attr(name) pseudo-element (parsel extension) selects attribute values
link_urls = selector.css('a::attr(href)')

# Combinators and pseudo-classes
first_link = selector.css('ul li:first-child a')
```

### JMESPath Selection

Query JSON data using JMESPath expressions for complex data extraction.

```python { .api }
def jmespath(self, query: str, **kwargs: Any) -> SelectorList["Selector"]:
    """
    Find objects matching the JMESPath query for JSON data.

    Parameters:
    - query (str): JMESPath expression to apply
    - **kwargs: Additional options passed to jmespath.search()

    Returns:
    SelectorList: Collection of matching Selector objects with extracted data

    Note:
    - Works with JSON-type selectors or JSON content within HTML/XML elements
    - Results are wrapped in new Selector objects for chaining
    - A query that matches nothing returns an empty SelectorList
    """
```

**Usage Example:**

```python
from parsel import Selector

# JSON document
json_text = '''
{
    "users": [
        {"name": "Alice", "age": 30, "email": "alice@example.com"},
        {"name": "Bob", "age": 25, "email": "bob@example.com"}
    ],
    "metadata": {
        "total": 2,
        "page": 1
    }
}
'''

selector = Selector(text=json_text, type="json")

# Extract all user names
names = selector.jmespath('users[*].name')

# Extract specific user
first_user = selector.jmespath('users[0]')

# Filter expression: emails of users aged 30 or over
emails_30_plus = selector.jmespath('users[?age >= `30`].email')

# Nested data extraction
total = selector.jmespath('metadata.total')

# JSON within HTML
html_with_json = """
<script type="application/json">
{"config": {"theme": "dark", "version": "1.0"}}
</script>
"""
html_selector = Selector(text=html_with_json)
theme = html_selector.css('script::text').jmespath('config.theme')
```

## Document Type Detection

Parsel detects some document types automatically; others must be requested with the `type` argument:

- **HTML**: Default type; parsed with lxml's lenient HTML parser (based on libxml2, which is not an HTML5-compliant parser)
- **XML**: Parsed with lxml's XML parser, with namespace support; only used when `type="xml"` is passed
- **JSON**: Native JSON parsing with JMESPath support
- **Text**: Plain text content for regex extraction

Auto-detection works by examining the content:
- JSON: Text that is valid JSON is detected as JSON automatically
- XML: Never auto-detected; without `type="xml"`, XML markup is parsed by the HTML parser
- HTML: Default fallback for markup content
