# XML Namespace Management

Functionality for working with XML namespaces, including registration, removal, and namespace-aware queries. Essential for parsing XML documents that declare namespaces.

## Capabilities

### Namespace Registration

Register XML namespaces for use in XPath expressions.

```python { .api }
def register_namespace(self, prefix: str, uri: str) -> None:
    """
    Register namespace prefix for use in XPath expressions.

    Parameters:
    - prefix (str): Namespace prefix to register
    - uri (str): Namespace URI

    Note:
    - Registered namespaces persist for the lifetime of the Selector
    - Allows XPath expressions to use registered prefixes
    - Does not affect document structure, only query capability
    """
```

**Usage Example:**

```python
from parsel import Selector

xml_content = """
<root xmlns:books="http://example.com/books"
      xmlns:authors="http://example.com/authors">
    <books:catalog>
        <books:book id="1">
            <books:title>Python Guide</books:title>
            <authors:author>John Doe</authors:author>
        </books:book>
        <books:book id="2">
            <books:title>Web Scraping</books:title>
            <authors:author>Jane Smith</authors:author>
        </books:book>
    </books:catalog>
</root>
"""

selector = Selector(text=xml_content, type="xml")

# Register namespaces for XPath queries.
# The prefix is arbitrary; only the URI has to match the document.
selector.register_namespace('b', 'http://example.com/books')
selector.register_namespace('a', 'http://example.com/authors')

# Registered prefixes can now be used in XPath
books = selector.xpath('//b:book')
titles = selector.xpath('//b:title/text()').getall()
# Returns: ['Python Guide', 'Web Scraping']

authors = selector.xpath('//a:author/text()').getall()
# Returns: ['John Doe', 'Jane Smith']

# The id attributes are unprefixed, so they need no namespace prefix
book_ids = selector.xpath('//b:book/@id').getall()
# Returns: ['1', '2']
```

### Default Namespaces

Parsel registers two EXSLT namespaces by default, so the `re` and `set` prefixes can be used in XPath without any setup.

```python { .api }
# Built-in namespace registrations
_default_namespaces = {
    "re": "http://exslt.org/regular-expressions",
    "set": "http://exslt.org/sets",
}
```

**Usage Example:**

```python
from parsel import Selector

# Built-in 're' namespace for regex functions in XPath
xml_with_data = """
<items>
    <item>Product ABC-123</item>
    <item>Product XYZ-456</item>
    <item>Service DEF-789</item>
</items>
"""

selector = Selector(text=xml_with_data, type="xml")

# re:test() returns a boolean, which suits an XPath predicate
products_only = selector.xpath('//item[re:test(text(), "^Product")]')
product_texts = products_only.xpath('./text()').getall()
# Returns: ['Product ABC-123', 'Product XYZ-456']

# An XPath predicate only filters nodes; use .re() to extract the codes
codes = selector.xpath('//item/text()').re(r'-(\d+)')
# Returns: ['123', '456', '789']
```

### Namespace Removal

Strip all namespaces from an XML document so that it can be queried with namespace-less XPath.

```python { .api }
def remove_namespaces(self) -> None:
    """
    Remove all namespaces from the document.

    This operation:
    - Removes namespace prefixes from element and attribute names
    - Removes namespace declarations
    - Enables namespace-less XPath queries
    - Modifies the document structure permanently

    Note:
    - Irreversible operation on the current Selector
    - Useful when namespace complexity interferes with data extraction
    - Use with caution as it changes document semantics
    """
```

**Usage Example:**

```python
from parsel import Selector

xml_with_namespaces = """
<root xmlns:product="http://example.com/product"
      xmlns:meta="http://example.com/metadata">
    <product:catalog meta:version="1.0">
        <product:item product:id="123" meta:created="2024-01-01">
            <product:name>Widget</product:name>
            <product:price>19.99</product:price>
        </product:item>
    </product:catalog>
</root>
"""

selector = Selector(text=xml_with_namespaces, type="xml")

# Before namespace removal - requires namespace registration
selector.register_namespace('p', 'http://example.com/product')
selector.register_namespace('m', 'http://example.com/metadata')
names_with_ns = selector.xpath('//p:name/text()').getall()
# Returns: ['Widget']

# Remove all namespaces
selector.remove_namespaces()

# After namespace removal - simple XPath works
names_without_ns = selector.xpath('//name/text()').getall()
# Returns: ['Widget']

# Attributes also lose namespace prefixes
item_id = selector.xpath('//item/@id').get()
# Returns: '123'

# All namespace-prefixed elements become simple elements
all_items = selector.xpath('//item')
all_catalogs = selector.xpath('//catalog')
```

### Namespace-Aware Queries

Use namespaces in XPath expressions with proper prefix handling.

**Usage Example:**

```python
from parsel import Selector

# Complex XML with multiple namespaces
complex_xml = """
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
               xmlns:web="http://example.com/webservice">
    <soap:Header>
        <web:Authentication>
            <web:Token>abc123</web:Token>
        </web:Authentication>
    </soap:Header>
    <soap:Body>
        <web:GetDataResponse>
            <web:Data>
                <web:Record id="1">
                    <web:Name>Alice</web:Name>
                    <web:Age>30</web:Age>
                </web:Record>
                <web:Record id="2">
                    <web:Name>Bob</web:Name>
                    <web:Age>25</web:Age>
                </web:Record>
            </web:Data>
        </web:GetDataResponse>
    </soap:Body>
</soap:Envelope>
"""

selector = Selector(text=complex_xml, type="xml")

# Register both namespaces
selector.register_namespace('soap', 'http://schemas.xmlsoap.org/soap/envelope/')
selector.register_namespace('web', 'http://example.com/webservice')

# Extract authentication token
token = selector.xpath('//web:Token/text()').get()
# Returns: 'abc123'

# Extract all record data; nested selectors keep the registered prefixes
records = selector.xpath('//web:Record')
for record in records:
    record_id = record.xpath('./@id').get()
    name = record.xpath('.//web:Name/text()').get()
    age = record.xpath('.//web:Age/text()').get()
    print(f"Record {record_id}: {name}, age {age}")

# Extract names using registered namespaces
all_names = selector.xpath('//web:Name/text()').getall()
# Returns: ['Alice', 'Bob']
```

### Runtime Namespace Handling

Pass namespaces to individual XPath queries without permanent registration.

**Usage Example:**

```python
from parsel import Selector

xml_content = """
<root xmlns:temp="http://temp.namespace.com">
    <temp:data>
        <temp:item>Value 1</temp:item>
        <temp:item>Value 2</temp:item>
    </temp:data>
</root>
"""

selector = Selector(text=xml_content, type="xml")

# Pass namespaces directly to the xpath() call
temp_namespaces = {'temp': 'http://temp.namespace.com'}
items = selector.xpath('//temp:item/text()', namespaces=temp_namespaces).getall()
# Returns: ['Value 1', 'Value 2']

# Combine registered and runtime namespaces
selector.register_namespace('root', 'http://temp.namespace.com')
# Runtime namespaces supplement registered ones for this call only
data = selector.xpath('//root:data', namespaces={'extra': 'http://extra.com'})
```

## Best Practices

### When to Use Namespaces

- **Register namespaces** whenever a query targets elements or attributes that belong to a namespace
- **Use runtime namespaces** for temporary or one-off queries
- **Remove namespaces** only when they complicate simple data extraction tasks

### Namespace Management Strategies

- **Register early**: Set up namespaces immediately after Selector creation
- **Use meaningful prefixes**: Choose short, descriptive namespace prefixes
- **Document namespace mappings**: Comment complex namespace registrations
- **Consider removal carefully**: Removal erases the distinction between same-named elements from different namespaces

### Performance Considerations

- Namespace registration has minimal performance impact
- Namespace removal visits every node in the document, so it can be costly on large documents; it is also irreversible and may affect subsequent operations
- Passing namespaces at query time adds only a small per-call overhead compared with registered namespaces