Tessl Tile for pypi/parsel@1.10.0

# Element Modification

Methods for removing DOM elements from the parsed document tree. These operations modify the underlying lxml tree in place, so every subsequent query on the same document sees the modified structure.

## Capabilities

### Element Removal

Remove selected elements, including all of their descendants, from their parent elements in the document tree. `drop()` is available on both `Selector` and `SelectorList`. On a `SelectorList`, it calls `drop()` on each matched element in turn.

```python { .api }
def drop(self) -> None:
    """
    Drop matched nodes from the parent element.

    Removes the selected element and its descendants from its parent
    in the document tree. The removal method depends on the document type:
    - HTML: uses lxml's drop_tree(), which keeps the element's tail text
      by joining it to the previous sibling or to the parent
    - XML: uses parent.remove(), which removes the tail text together
      with the element

    Raises:
    - CannotRemoveElementWithoutRoot: the selected node is not an element
      (for example, a text or attribute value)
    - CannotDropElementWithoutParent: the element has no parent to remove
      it from (for example, the root <html> element of an HTML document)
    """

def remove(self) -> None:
    """
    Remove matched nodes from the parent element.

    Deprecated: use drop() instead. Each call emits a DeprecationWarning
    (a warning, not an exception, unless warnings are turned into errors).

    Raises:
    - CannotRemoveElementWithoutRoot: the selected node is not an element
    - CannotRemoveElementWithoutParent: the element has no parent to remove
      it from
    """
```

**Usage Example:**

```python
from parsel import Selector

html = """
<article>
    <h1>Article Title</h1>
    <div class="ads">Advertisement content</div>
    <p>First paragraph of content.</p>
    <div class="ads">Another advertisement</div>
    <p>Second paragraph of content.</p>
    <div class="sidebar">Sidebar content</div>
</article>
"""

selector = Selector(text=html)

# Remove all advertisement elements
ads = selector.css('.ads')
ads.drop()

# Verify ads are removed
remaining_content = selector.css('article').get()
print("Ads removed:", "ads" not in remaining_content)

# Remove sidebar
sidebar = selector.css('.sidebar')
sidebar.drop()

# Check final structure - only h1 and p elements remain
final_structure = selector.css('article > *')
elements = [elem.root.tag for elem in final_structure]
# Returns: ['h1', 'p', 'p']
```

### Batch Element Removal

Remove several kinds of elements in one pass: select them with a single combined query and call `drop()` on the resulting `SelectorList`.

**Usage Example:**

```python
from parsel import Selector

html_with_cleanup = """
<div class="content">
    <h2>Important Heading</h2>
    <script>tracking_code();</script>
    <p>Valuable content paragraph.</p>
    <div class="popup">Popup modal</div>
    <p>Another valuable paragraph.</p>
    <noscript>No JavaScript message</noscript>
    <footer>Footer content</footer>
</div>
"""

selector = Selector(text=html_with_cleanup)

# Remove multiple unwanted element types at once
unwanted = selector.css('script, .popup, noscript')
unwanted.drop()

# Verify cleanup
cleaned_content = selector.css('.content').get()
print("Scripts removed:", "script" not in cleaned_content)
print("Popups removed:", "popup" not in cleaned_content)
print("Noscript removed:", "noscript" not in cleaned_content)

# Extract clean content
clean_paragraphs = selector.css('p::text').getall()
# Returns: ['Valuable content paragraph.', 'Another valuable paragraph.']
```

### Conditional Element Removal

Remove elements that satisfy a condition on their content or attributes, expressed as an XPath predicate.

**Usage Example:**

```python
from parsel import Selector

html_with_conditions = """
<div class="comments">
    <div class="comment" data-score="5">Great article!</div>
    <div class="comment" data-score="1">Spam content here</div>
    <div class="comment" data-score="4">Very helpful, thanks.</div>
    <div class="comment" data-score="2">Not very useful</div>
    <div class="comment" data-score="5">Excellent explanation!</div>
</div>
"""

selector = Selector(text=html_with_conditions)

# Remove low-quality comments (score <= 2); XPath 1.0 converts the
# attribute value to a number for this comparison
low_quality = selector.xpath('//div[@class="comment"][@data-score<=2]')
low_quality.drop()

# Verify only high-quality comments remain
remaining_scores = selector.css('.comment').xpath('./@data-score').getall()
# Returns: ['5', '4', '5'] - only scores > 2

# Remove comments containing "spam" in any letter case. XPath 1.0 contains()
# is case-sensitive, so lowercase the text with translate() first. Here this
# matches nothing, because "Spam content here" was already dropped above;
# drop() on an empty SelectorList does nothing.
spam_comments = selector.xpath(
    '//div[@class="comment"]'
    '[contains(translate(text(), "ABCDEFGHIJKLMNOPQRSTUVWXYZ", '
    '"abcdefghijklmnopqrstuvwxyz"), "spam")]'
)
spam_comments.drop()
```

### Targeted Content Removal

Remove specific elements, such as tracking markup and affiliate links, while leaving the surrounding structure intact.

**Usage Example:**

```python
from parsel import Selector

html_with_mixed_content = """
<article>
    <h1>Product Review</h1>
    <div class="meta">
        <span class="author">John Doe</span>
        <span class="date">2024-01-15</span>
        <span class="tracking" data-track="view">TRACK123</span>
    </div>
    <div class="content">
        <p>This product is amazing!</p>
        <div class="affiliate-link">
            <a href="/affiliate?id=123">Buy Now - Special Offer!</a>
        </div>
        <p>I highly recommend it to everyone.</p>
    </div>
</article>
"""

selector = Selector(text=html_with_mixed_content)

# Remove tracking and affiliate elements
tracking_elements = selector.css('[data-track], .affiliate-link')
tracking_elements.drop()

# Extract clean content
article_text = selector.css('.content p::text').getall()
# Returns: ['This product is amazing!', 'I highly recommend it to everyone.']

# Verify meta information is preserved: the tracking span is gone,
# so only the author and date spans remain
meta_info = selector.css('.meta span::text').getall()
# Returns: ['John Doe', '2024-01-15']
```

## Exception Handling

`drop()` and `remove()` raise the exceptions below when the selected node cannot be removed from the document. All three classes are defined in `parsel.selector`.

### Exception Types

```python { .api }
class CannotRemoveElementWithoutRoot(Exception):
    """
    Raised when the selected node is not an element, so it has no
    position in the document tree to be removed from.

    Common causes:
    - Selecting text or attribute values (for example ::text, ::attr(),
      text() or @href) instead of the elements that contain them
    """

class CannotRemoveElementWithoutParent(Exception):
    """
    Raised when attempting to remove an element that has no parent.

    Common causes:
    - Trying to remove the root element
    - Working with already-removed elements
    """

class CannotDropElementWithoutParent(CannotRemoveElementWithoutParent):
    """
    Specific exception for drop() operations.
    Inherits from CannotRemoveElementWithoutParent.
    """
```

**Exception Handling Example:**

```python
from parsel import Selector
from parsel.selector import (
    CannotRemoveElementWithoutRoot,
    CannotDropElementWithoutParent
)

html = """
<div>
    <p>Paragraph with <em>emphasis</em> text.</p>
    <ul>
        <li>Item 1</li>
        <li>Item 2</li>
    </ul>
</div>
"""

selector = Selector(text=html)

# Safe element removal with exception handling
def safe_remove_elements(selector, css_query):
    try:
        elements = selector.css(css_query)
        elements.drop()
        return True
    except CannotRemoveElementWithoutRoot:
        print(f"Cannot remove {css_query}: elements have no root")
        return False
    except CannotDropElementWithoutParent:
        print(f"Cannot remove {css_query}: elements have no parent")
        return False

# Remove list items safely
success = safe_remove_elements(selector, 'li')
print(f"List items removed: {success}")

# Try to remove text nodes (will fail gracefully)
text_nodes = selector.xpath('//text()')
try:
    text_nodes.drop()
except CannotRemoveElementWithoutRoot as e:
    print(f"Expected error: {e}")

# Try to remove the root element (will fail). lxml's HTML parser wraps the
# markup in <html><body>, so the <div> has a parent; the root is <html>.
try:
    selector.xpath('/html').drop()
except CannotDropElementWithoutParent as e:
    print(f"Cannot remove root: {e}")
```

## Document State After Modification

Element removal permanently modifies the document structure:

- **Subsequent queries** reflect the modified document state
- **Removed elements** no longer match any query on the document, although `Selector` objects obtained before the removal still reference the detached elements
- **Parent-child relationships** are updated automatically
- **Document serialization** excludes removed elements

**State Tracking Example:**

```python
from parsel import Selector

html = """
<nav>
    <ul>
        <li><a href="/home">Home</a></li>
        <li class="active"><a href="/products">Products</a></li>
        <li><a href="/contact">Contact</a></li>
    </ul>
</nav>
"""

selector = Selector(text=html)

# Count elements before removal
initial_count = len(selector.css('li'))
print(f"Initial list items: {initial_count}")  # 3

# Remove active item
active_item = selector.css('li.active')
active_item.drop()

# Count elements after removal
final_count = len(selector.css('li'))
print(f"Remaining list items: {final_count}")  # 2

# Verify active item is gone
active_check = selector.css('li.active')
print(f"Active items found: {len(active_check)}")  # 0

# Get final HTML structure
final_html = selector.css('nav').get()
print("Active class removed:", "active" not in final_html)
```

## Performance and Memory Considerations

- **Memory usage**: lxml can free a dropped subtree only once no Python objects, such as `Selector` instances from earlier queries, still reference it
- **Query performance**: Subsequent queries have fewer nodes to traverse
- **Irreversible**: parsel offers no way to undo a removal; re-parse the original text to get the unmodified document
- **Document size**: Serialized output is smaller after element removal
