# XML Namespace Management

Utilities for working with XML namespaces in Parsel: registering prefixes, removing namespaces, and running namespace-aware XPath queries. These are essential when parsing XML documents that declare namespaces.

## Capabilities

### Namespace Registration

Register XML namespaces for use in XPath expressions.

```python { .api }
def register_namespace(self, prefix: str, uri: str) -> None:
    """
    Register namespace prefix for use in XPath expressions.

    Parameters:
    - prefix (str): Namespace prefix to register
    - uri (str): Namespace URI

    Note:
    - Registered namespaces persist for the lifetime of the Selector
    - Allows XPath expressions to use registered prefixes
    - Does not affect document structure, only query capability
    """
```

**Usage Example:**

```python
from parsel import Selector

xml_content = """
<root xmlns:books="http://example.com/books"
      xmlns:authors="http://example.com/authors">
    <books:catalog>
        <books:book id="1">
            <books:title>Python Guide</books:title>
            <authors:author>John Doe</authors:author>
        </books:book>
        <books:book id="2">
            <books:title>Web Scraping</books:title>
            <authors:author>Jane Smith</authors:author>
        </books:book>
    </books:catalog>
</root>
"""

selector = Selector(text=xml_content, type="xml")

# Register namespaces for XPath queries
selector.register_namespace('b', 'http://example.com/books')
selector.register_namespace('a', 'http://example.com/authors')

# Now can use registered prefixes in XPath
books = selector.xpath('//b:book')
titles = selector.xpath('//b:title/text()').getall()
# Returns: ['Python Guide', 'Web Scraping']

authors = selector.xpath('//a:author/text()').getall()
# Returns: ['John Doe', 'Jane Smith']

# Use registered namespaces in attribute selection
book_ids = selector.xpath('//b:book/@id').getall()
# Returns: ['1', '2']
```

### Default Namespaces

Parsel pre-registers two EXSLT namespaces, `re` (regular expressions) and `set` (set operations), so their functions can be used in XPath without calling `register_namespace`.

```python { .api }
# Built-in namespace registrations
_default_namespaces = {
    "re": "http://exslt.org/regular-expressions",
    "set": "http://exslt.org/sets",
}
```

**Usage Example:**

```python
from parsel import Selector

# Built-in 're' namespace for regex functions in XPath
xml_with_data = """
<items>
    <item>Product ABC-123</item>
    <item>Product XYZ-456</item>
    <item>Service DEF-789</item>
</items>
"""

selector = Selector(text=xml_with_data, type="xml")

# re:test() returns a boolean, so it fits naturally in a predicate
products_only = selector.xpath('//item[re:test(text(), "^Product")]')
product_texts = products_only.xpath('.//text()').getall()
# Returns: ['Product ABC-123', 'Product XYZ-456']

# Extract the numeric codes with the SelectorList .re() method
# (a raw string keeps the \d escape intact)
codes = selector.xpath('//item/text()').re(r'-(\d+)')
# Returns: ['123', '456', '789']
```

### Namespace Removal

Remove all namespace declarations from XML documents for simplified processing.

```python { .api }
def remove_namespaces(self) -> None:
    """
    Remove all namespaces from the document.

    This operation:
    - Removes namespace prefixes from element and attribute names
    - Removes namespace declarations
    - Enables namespace-less XPath queries
    - Modifies the document structure permanently

    Note:
    - Irreversible operation on the current Selector
    - Useful when namespace complexity interferes with data extraction
    - Use with caution as it changes document semantics
    """
```

**Usage Example:**

```python
from parsel import Selector

xml_with_namespaces = """
<root xmlns:product="http://example.com/product"
      xmlns:meta="http://example.com/metadata">
    <product:catalog meta:version="1.0">
        <product:item product:id="123" meta:created="2024-01-01">
            <product:name>Widget</product:name>
            <product:price>19.99</product:price>
        </product:item>
    </product:catalog>
</root>
"""

selector = Selector(text=xml_with_namespaces, type="xml")

# Before namespace removal - requires namespace registration
selector.register_namespace('p', 'http://example.com/product')
selector.register_namespace('m', 'http://example.com/metadata')
names_with_ns = selector.xpath('//p:name/text()').getall()

# Remove all namespaces
selector.remove_namespaces()

# After namespace removal - simple XPath works
names_without_ns = selector.xpath('//name/text()').getall()
# Returns: ['Widget']

# Attributes also lose namespace prefixes
item_id = selector.xpath('//item/@id').get()
# Returns: '123'

# All namespace-prefixed elements become simple elements
all_items = selector.xpath('//item')
all_catalogs = selector.xpath('//catalog')
```

### Namespace-Aware Queries

Query namespaced documents by using registered prefixes in XPath expressions. Each prefixed step in the query must map to the namespace URI of the element it targets.

**Usage Example:**

```python
from parsel import Selector

# Complex XML with multiple namespaces
complex_xml = """
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
               xmlns:web="http://example.com/webservice">
    <soap:Header>
        <web:Authentication>
            <web:Token>abc123</web:Token>
        </web:Authentication>
    </soap:Header>
    <soap:Body>
        <web:GetDataResponse>
            <web:Data>
                <web:Record id="1">
                    <web:Name>Alice</web:Name>
                    <web:Age>30</web:Age>
                </web:Record>
                <web:Record id="2">
                    <web:Name>Bob</web:Name>
                    <web:Age>25</web:Age>
                </web:Record>
            </web:Data>
        </web:GetDataResponse>
    </soap:Body>
</soap:Envelope>
"""

selector = Selector(text=complex_xml, type="xml")

# Register both namespaces
selector.register_namespace('soap', 'http://schemas.xmlsoap.org/soap/envelope/')
selector.register_namespace('web', 'http://example.com/webservice')

# Extract authentication token
token = selector.xpath('//web:Token/text()').get()
# Returns: 'abc123'

# Extract all record data
records = selector.xpath('//web:Record')
for record in records:
    record_id = record.xpath('./@id').get()
    name = record.xpath('.//web:Name/text()').get()
    age = record.xpath('.//web:Age/text()').get()
    print(f"Record {record_id}: {name}, age {age}")

# Extract names using registered namespaces
all_names = selector.xpath('//web:Name/text()').getall()
# Returns: ['Alice', 'Bob']
```

### Runtime Namespace Handling

Pass namespaces to individual XPath queries without permanent registration.

**Usage Example:**

```python
from parsel import Selector

xml_content = """
<root xmlns:temp="http://temp.namespace.com">
    <temp:data>
        <temp:item>Value 1</temp:item>
        <temp:item>Value 2</temp:item>
    </temp:data>
</root>
"""

selector = Selector(text=xml_content, type="xml")

# Pass namespaces directly to xpath() call
temp_namespaces = {'temp': 'http://temp.namespace.com'}
items = selector.xpath('//temp:item/text()', namespaces=temp_namespaces).getall()
# Returns: ['Value 1', 'Value 2']

# Combine registered and runtime namespaces
selector.register_namespace('root', 'http://temp.namespace.com')
# Runtime namespaces supplement registered ones
data = selector.xpath('//root:data', namespaces={'extra': 'http://extra.com'})
```

## Best Practices

### When to Use Namespaces

- **Register namespaces** when the same Selector runs several queries against a document with namespace declarations
- **Use runtime namespaces** for temporary or one-off queries
- **Remove namespaces** only when they complicate simple data extraction tasks

### Namespace Management Strategies

- **Register early**: Set up namespaces immediately after Selector creation
- **Use meaningful prefixes**: Choose short, descriptive namespace prefixes
- **Document namespace mappings**: Comment complex namespace registrations
- **Consider removal carefully**: Only remove namespaces when absolutely necessary

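### Performance Considerations

- Namespace registration has minimal performance impact
- Namespace removal visits and rewrites every node, so its cost grows with document size; it is also irreversible and affects all subsequent queries on that Selector
- Runtime namespaces are merged with the registered ones on each call, so they may be slightly slower than registered namespaces, though the difference is usually small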