# Element Modification

Methods for removing and modifying DOM elements within the parsed document structure. These operations modify the underlying document tree and affect subsequent queries.

## Capabilities

### Element Removal

Remove selected elements from their parent containers in the document tree.

```python { .api }
def drop(self) -> None:
    """
    Drop matched nodes from the parent element.

    Removes the selected element from its parent in the DOM tree.
    Uses appropriate removal method based on document type:
    - HTML: Uses lxml's drop_tree() method
    - XML: Uses parent.remove() method

    Raises:
    - CannotRemoveElementWithoutRoot: Element has no root document
    - CannotDropElementWithoutParent: Element has no parent to remove from
    """

def remove(self) -> None:
    """
    Remove matched nodes from the parent element.

    Deprecated: Use drop() method instead.

    Raises:
    - CannotRemoveElementWithoutRoot: Element has no root document
    - CannotRemoveElementWithoutParent: Element has no parent to remove from

    Warns:
    - DeprecationWarning: Method is deprecated
    """
```

**Usage Example:**

```python
from parsel import Selector

html = """
<article>
    <h1>Article Title</h1>
    <div class="ads">Advertisement content</div>
    <p>First paragraph of content.</p>
    <div class="ads">Another advertisement</div>
    <p>Second paragraph of content.</p>
    <div class="sidebar">Sidebar content</div>
</article>
"""

selector = Selector(text=html)

# Remove all advertisement elements
ads = selector.css('.ads')
ads.drop()

# Verify ads are removed
remaining_content = selector.css('article').get()
print("Ads removed:", "ads" not in remaining_content)

# Remove sidebar
sidebar = selector.css('.sidebar')
sidebar.drop()

# Check final structure - only h1 and p elements remain
final_structure = selector.css('article > *')
elements = [elem.root.tag for elem in final_structure]
# Returns: ['h1', 'p', 'p']
```

### Batch Element Removal

Calling `drop()` on a SelectorList removes every matched element in a single call.

**Usage Example:**

```python
from parsel import Selector

html_with_cleanup = """
<div class="content">
    <h2>Important Heading</h2>
    <script>tracking_code();</script>
    <p>Valuable content paragraph.</p>
    <div class="popup">Popup modal</div>
    <p>Another valuable paragraph.</p>
    <noscript>No JavaScript message</noscript>
    <footer>Footer content</footer>
</div>
"""

selector = Selector(text=html_with_cleanup)

# Remove multiple unwanted element types at once
unwanted = selector.css('script, .popup, noscript')
unwanted.drop()

# Verify cleanup
cleaned_content = selector.css('.content').get()
print("Scripts removed:", "script" not in cleaned_content)
print("Popups removed:", "popup" not in cleaned_content)
print("Noscript removed:", "noscript" not in cleaned_content)

# Extract clean content
clean_paragraphs = selector.css('p::text').getall()
# Returns: ['Valuable content paragraph.', 'Another valuable paragraph.']
```

### Conditional Element Removal

Remove elements based on content or attribute conditions.

**Usage Example:**

```python
from parsel import Selector

html_with_conditions = """
<div class="comments">
    <div class="comment" data-score="5">Great article!</div>
    <div class="comment" data-score="1">Spam content here</div>
    <div class="comment" data-score="4">Very helpful, thanks.</div>
    <div class="comment" data-score="2">Not very useful</div>
    <div class="comment" data-score="5">Excellent explanation!</div>
</div>
"""

selector = Selector(text=html_with_conditions)

# Remove low-quality comments (score <= 2)
low_quality = selector.xpath('//div[@class="comment"][@data-score<=2]')
low_quality.drop()

# Verify only high-quality comments remain
remaining_scores = selector.css('.comment').xpath('./@data-score').getall()
# Returns: ['5', '4', '5'] - only scores > 2

# Remove comments containing specific text
# (contains() is case-sensitive; nothing matches at this point because the
# "Spam content here" comment was already dropped by the score filter)
spam_comments = selector.xpath('//div[@class="comment"][contains(text(), "Spam")]')
spam_comments.drop()
```

### Targeted Content Removal

Remove specific content while preserving structure.

**Usage Example:**

```python
from parsel import Selector

html_with_mixed_content = """
<article>
    <h1>Product Review</h1>
    <div class="meta">
        <span class="author">John Doe</span>
        <span class="date">2024-01-15</span>
        <span class="tracking" data-track="view">TRACK123</span>
    </div>
    <div class="content">
        <p>This product is amazing!</p>
        <div class="affiliate-link">
            <a href="/affiliate?id=123">Buy Now - Special Offer!</a>
        </div>
        <p>I highly recommend it to everyone.</p>
    </div>
</article>
"""

selector = Selector(text=html_with_mixed_content)

# Remove tracking and affiliate elements
tracking_elements = selector.css('[data-track], .affiliate-link')
tracking_elements.drop()

# Extract clean content
article_text = selector.css('.content p::text').getall()
# Returns: ['This product is amazing!', 'I highly recommend it to everyone.']

# Verify meta information is preserved (author, date kept)
meta_info = selector.css('.meta span:not(.tracking)::text').getall()
# Returns: ['John Doe', '2024-01-15']
```

## Exception Handling

Element modification operations can raise specific exceptions that should be handled appropriately.

### Exception Types

```python { .api }
class CannotRemoveElementWithoutRoot(Exception):
    """
    Raised when attempting to remove an element that has no root document.

    Common causes:
    - Trying to remove text nodes or pseudo-elements
    - Working with detached elements
    """

class CannotRemoveElementWithoutParent(Exception):
    """
    Raised when attempting to remove an element that has no parent.

    Common causes:
    - Trying to remove the root element
    - Working with already-removed elements
    """

class CannotDropElementWithoutParent(CannotRemoveElementWithoutParent):
    """
    Specific exception for drop() operations.
    Inherits from CannotRemoveElementWithoutParent.
    """
```

**Exception Handling Example:**

```python
from parsel import Selector
from parsel.selector import (
    CannotRemoveElementWithoutRoot,
    CannotDropElementWithoutParent
)

html = """
<div>
    <p>Paragraph with <em>emphasis</em> text.</p>
    <ul>
        <li>Item 1</li>
        <li>Item 2</li>
    </ul>
</div>
"""

selector = Selector(text=html)

# Safe element removal with exception handling
def safe_remove_elements(selector, css_query):
    try:
        elements = selector.css(css_query)
        elements.drop()
        return True
    except CannotRemoveElementWithoutRoot:
        print(f"Cannot remove {css_query}: elements have no root")
        return False
    except CannotDropElementWithoutParent:
        print(f"Cannot remove {css_query}: elements have no parent")
        return False

# Remove list items safely
success = safe_remove_elements(selector, 'li')
print(f"List items removed: {success}")

# Try to remove text nodes (raises CannotRemoveElementWithoutRoot)
text_nodes = selector.xpath('//text()')
try:
    text_nodes.drop()
except CannotRemoveElementWithoutRoot as e:
    print(f"Expected error: {e}")

# Try to remove the root element (will fail)
# Note: the HTML parser wraps the fragment in <html><body>, so the outer
# <div> has a parent and could be dropped; <html> is the actual root
try:
    root = selector.css('html')
    if root:
        root[0].drop()  # Root element has no parent
except CannotDropElementWithoutParent as e:
    print(f"Cannot remove root: {e}")
```

## Document State After Modification

Element removal permanently modifies the document structure:

- **Subsequent queries** reflect the modified document state
- **Removed elements** are no longer reachable by queries on the document, although Selector objects created before the removal still reference the detached nodes
- **Parent-child relationships** are updated automatically
- **Document serialization** excludes removed elements

**State Tracking Example:**

```python
from parsel import Selector

html = """
<nav>
    <ul>
        <li><a href="/home">Home</a></li>
        <li class="active"><a href="/products">Products</a></li>
        <li><a href="/contact">Contact</a></li>
    </ul>
</nav>
"""

selector = Selector(text=html)

# Count elements before removal
initial_count = len(selector.css('li'))
print(f"Initial list items: {initial_count}")  # 3

# Remove active item
active_item = selector.css('li.active')
active_item.drop()

# Count elements after removal
final_count = len(selector.css('li'))
print(f"Remaining list items: {final_count}")  # 2

# Verify active item is gone
active_check = selector.css('li.active')
print(f"Active items found: {len(active_check)}")  # 0

# Get final HTML structure
final_html = selector.css('nav').get()
print("Active class removed:", "active" not in final_html)
```

## Performance and Memory Considerations

- **Memory usage**: Removed elements can be freed once no Selector or other reference still points to them
- **Query performance**: A smaller tree generally means less work for subsequent queries
- **Irreversible**: Element removal cannot be undone without re-parsing
- **Document size**: Serialized output is smaller after element removal