# ElementTree Processing

Secure ElementTree-based XML parsing with configurable security restrictions. DefusedXML provides drop-in replacements for the parsing functions of `xml.etree.ElementTree` and `xml.etree.cElementTree`, guarding against entity-expansion bombs, external entity references, and (optionally) DTD processing while keeping the parsing API compatible.

## Capabilities

### XML Parsing Functions

Core parsing functions that provide secure alternatives to the standard ElementTree parsing operations.

```python { .api }
def parse(source, parser=None, forbid_dtd=False, forbid_entities=True, forbid_external=True):
    """
    Parse XML document from file or file-like object.

    Args:
        source (str or file-like): File path or file-like object containing XML
        parser (XMLParser, optional): Custom parser instance
        forbid_dtd (bool): Forbid DTD processing (default: False)
        forbid_entities (bool): Forbid entity expansion (default: True)
        forbid_external (bool): Forbid external references (default: True)

    Returns:
        ElementTree: Parsed XML document tree

    Raises:
        ParseError: XML syntax errors
        DTDForbidden: DTD processing attempted when forbidden
        EntitiesForbidden: Entity processing attempted when forbidden
        ExternalReferenceForbidden: External reference attempted when forbidden
    """

def iterparse(source, events=None, parser=None, forbid_dtd=False, forbid_entities=True, forbid_external=True):
    """
    Parse XML document incrementally with event-based processing.

    Args:
        source (str or file-like): File path or file-like object containing XML
        events (tuple, optional): Events to report ('start', 'end', 'start-ns', 'end-ns')
        parser (XMLParser, optional): Custom parser instance
        forbid_dtd (bool): Forbid DTD processing (default: False)
        forbid_entities (bool): Forbid entity expansion (default: True)
        forbid_external (bool): Forbid external references (default: True)

    Returns:
        iterator: Iterator yielding (event, element) tuples

    Raises:
        ParseError: XML syntax errors
        DTDForbidden: DTD processing attempted when forbidden
        EntitiesForbidden: Entity processing attempted when forbidden
        ExternalReferenceForbidden: External reference attempted when forbidden
    """

def fromstring(text, forbid_dtd=False, forbid_entities=True, forbid_external=True):
    """
    Parse XML document from string.

    Args:
        text (str or bytes): XML content as string or bytes
        forbid_dtd (bool): Forbid DTD processing (default: False)
        forbid_entities (bool): Forbid entity expansion (default: True)
        forbid_external (bool): Forbid external references (default: True)

    Returns:
        Element: Root element of parsed XML document

    Raises:
        ParseError: XML syntax errors
        DTDForbidden: DTD processing attempted when forbidden
        EntitiesForbidden: Entity processing attempted when forbidden
        ExternalReferenceForbidden: External reference attempted when forbidden
    """

def XML(text, forbid_dtd=False, forbid_entities=True, forbid_external=True):
    """
    Parse XML document from string (alias for fromstring).

    Args:
        text (str or bytes): XML content as string or bytes
        forbid_dtd (bool): Forbid DTD processing (default: False)
        forbid_entities (bool): Forbid entity expansion (default: True)
        forbid_external (bool): Forbid external references (default: True)

    Returns:
        Element: Root element of parsed XML document

    Raises:
        ParseError: XML syntax errors
        DTDForbidden: DTD processing attempted when forbidden
        EntitiesForbidden: Entity processing attempted when forbidden
        ExternalReferenceForbidden: External reference attempted when forbidden
    """
```

**Usage Examples:**

```python
import defusedxml.ElementTree as ET

# Parse from file
tree = ET.parse('document.xml')
root = tree.getroot()

# Parse from string with custom security settings
xml_string = '<root><item>value</item></root>'
root = ET.fromstring(xml_string, forbid_dtd=True, forbid_entities=False)

# Incremental parsing for large documents
# (process_record is a placeholder for your own per-record handler)
for event, elem in ET.iterparse('large_document.xml', events=('start', 'end')):
    if event == 'end' and elem.tag == 'record':
        process_record(elem)
        elem.clear()  # Free memory
```

### Secure XML Parser

`DefusedXMLParser` provides the core secure parsing functionality with configurable security restrictions.

```python { .api }
class DefusedXMLParser:
    """
    Secure XML parser with configurable security restrictions.

    Inherits from xml.etree.ElementTree.XMLParser but adds security
    handlers to prevent XML bomb attacks, DTD processing attacks,
    and external entity attacks.
    """

    def __init__(self, html=None, target=None, encoding=None, forbid_dtd=False, forbid_entities=True, forbid_external=True):
        """
        Initialize DefusedXMLParser with security settings.

        Args:
            html (deprecated): HTML parsing mode (deprecated, raises TypeError if True)
            target (TreeBuilder, optional): Custom tree builder instance
            encoding (str, optional): Character encoding for parsing
            forbid_dtd (bool): Forbid DTD processing (default: False)
            forbid_entities (bool): Forbid entity expansion (default: True)
            forbid_external (bool): Forbid external references (default: True)

        Raises:
            TypeError: If html=True is specified (no longer supported)

        Warns:
            DeprecationWarning: If the html parameter is used
        """

    def defused_start_doctype_decl(self, name, sysid, pubid, has_internal_subset):
        """Handler that raises DTDForbidden when DTD processing is forbidden"""

    def defused_entity_decl(self, name, is_parameter_entity, value, base, sysid, pubid, notation_name):
        """Handler that raises EntitiesForbidden when entity processing is forbidden"""

    def defused_unparsed_entity_decl(self, name, base, sysid, pubid, notation_name):
        """Handler that raises EntitiesForbidden for unparsed entities when forbidden"""

    def defused_external_entity_ref_handler(self, context, base, sysid, pubid):
        """Handler that raises ExternalReferenceForbidden when external references are forbidden"""
```

**Usage Examples:**

```python
import defusedxml.ElementTree as ET
from xml.etree.ElementTree import TreeBuilder

# Custom parser with specific security settings
parser = ET.DefusedXMLParser(forbid_dtd=True, forbid_entities=False, forbid_external=True)
tree = ET.parse('document.xml', parser)

# Parser with custom TreeBuilder
xml_content = '<root><item>value</item></root>'
custom_builder = TreeBuilder()
parser = ET.DefusedXMLParser(target=custom_builder, forbid_dtd=False)
# fromstring() takes no parser argument (see its signature above),
# so feed the custom parser directly instead.
parser.feed(xml_content)
root = parser.close()  # Returns the target's close() result: the root Element
```
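
The `defused_*` methods listed above are callbacks hooked into the underlying expat parser. As a rough, stdlib-only sketch of that mechanism (not DefusedXML's actual implementation), the following registers comparable callbacks on a raw `xml.parsers.expat` parser. The exception classes and the `check_xml` helper are hypothetical names invented for this illustration.

```python
import xml.parsers.expat


class DTDRejected(ValueError):
    """Hypothetical stand-in for DTDForbidden."""


class EntityRejected(ValueError):
    """Hypothetical stand-in for EntitiesForbidden."""


def check_xml(data, forbid_dtd=False, forbid_entities=True):
    """Parse data with expat, raising as soon as a forbidden construct appears."""
    parser = xml.parsers.expat.ParserCreate()

    def on_doctype(name, sysid, pubid, has_internal_subset):
        # Called when expat reaches the start of a <!DOCTYPE ...> declaration
        raise DTDRejected(name)

    def on_entity_decl(name, is_parameter_entity, value, base, sysid, pubid, notation_name):
        # Called for every <!ENTITY ...> declaration
        raise EntityRejected(name)

    if forbid_dtd:
        parser.StartDoctypeDeclHandler = on_doctype
    if forbid_entities:
        parser.EntityDeclHandler = on_entity_decl

    # Exceptions raised inside handlers propagate out of Parse()
    parser.Parse(data, True)
    return True
```

Raising from inside the callback stops parsing at the declaration itself, so a malicious entity is rejected before any reference to it is expanded.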

### Parser Aliases

DefusedXML provides several aliases for backward compatibility and convenience.

```python { .api }
XMLParser = DefusedXMLParser
XMLTreeBuilder = DefusedXMLParser
XMLParse = DefusedXMLParser  # Backwards compatibility (typo in original)
```

### Utility Functions

Additional utility functions re-exported from the standard library.

```python { .api }
def tostring(element, encoding="us-ascii", method="xml"):
    """
    Convert Element to XML string representation.

    Re-exported from xml.etree.ElementTree.tostring for convenience.
    This function is safe as it only serializes existing elements.

    Args:
        element (Element): Element to serialize
        encoding (str): Character encoding (default: "us-ascii")
        method (str): Serialization method (default: "xml")

    Returns:
        str or bytes: XML representation of element
            (bytes by default; str when encoding="unicode")
    """
```
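
Since `tostring` is described above as a re-export of the standard-library function, its two return types can be shown with the standard library alone. A small sketch, with element names made up for illustration:

```python
import xml.etree.ElementTree as ET  # stdlib module that tostring is re-exported from

# Build a tiny tree by hand; no parsing is involved.
root = ET.Element("root")
item = ET.SubElement(root, "item")
item.text = "value"

as_bytes = ET.tostring(root)                     # default "us-ascii" encoding yields bytes
as_text = ET.tostring(root, encoding="unicode")  # the special value "unicode" yields str

print(as_bytes)  # b'<root><item>value</item></root>'
print(as_text)   # <root><item>value</item></root>
```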

### Exception Re-exports

```python { .api }
class ParseError(Exception):
    """
    XML parsing error exception.

    Re-exported from xml.etree.ElementTree.ParseError for convenience.
    Raised for XML syntax errors and malformed documents.
    """
```
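
`ParseError` carries more than a message: the standard-library class exposes `code` (the expat error code) and `position` (a `(line, column)` tuple). The sketch below uses the standard-library parser, which the text above names as the origin of the re-export; the helper name `describe_error` is hypothetical.

```python
import xml.etree.ElementTree as ET


def describe_error(xml_text):
    """Return None if xml_text parses, else a dict describing the syntax error."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as exc:
        line, column = exc.position
        return {"message": str(exc), "code": exc.code, "line": line, "column": column}
    return None


print(describe_error("<root><item></root>"))  # dict describing the mismatched tag on line 1
print(describe_error("<root/>"))              # None
```

Logging the position alongside the message makes it much easier to locate the fault in a large input.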

## Common Usage Patterns

### Basic Secure Parsing

```python
import defusedxml.ElementTree as ET

xml_content = '<root><item>value</item></root>'

# Parse with secure defaults
try:
    root = ET.fromstring(xml_content)
    for child in root:
        print(f"{child.tag}: {child.text}")
except ET.ParseError as e:
    print(f"XML parsing error: {e}")
```

### File Parsing with Error Handling

```python
import defusedxml.ElementTree as ET
import defusedxml

def parse_xml_file(filename):
    """Parse XML file with comprehensive error handling."""
    try:
        tree = ET.parse(filename)
        return tree.getroot()
    except FileNotFoundError:
        print(f"File not found: {filename}")
    except ET.ParseError as e:
        print(f"XML syntax error: {e}")
    except defusedxml.DefusedXmlException as e:
        print(f"XML security violation: {e}")
    return None
```

### Memory-Efficient Parsing for Large Files

```python
import defusedxml.ElementTree as ET

def process_large_xml(filename):
    """Process large XML files efficiently using iterparse."""
    context = ET.iterparse(filename, events=('start', 'end'))
    context = iter(context)
    # The first event is the 'start' of the root element; keep a reference to it
    event, root = next(context)

    for event, elem in context:
        if event == 'end' and elem.tag == 'record':
            # Process individual record (process_record is a placeholder)
            process_record(elem)

            # Clear processed elements to save memory
            elem.clear()
            root.clear()
```

### Custom Security Configuration

```python
import defusedxml.ElementTree as ET

def parse_trusted_xml(xml_content):
    """Parse XML from a trusted source, spelling out the default settings."""
    # Allow DTDs but keep entity and external reference protection
    return ET.fromstring(xml_content, forbid_dtd=False, forbid_entities=True, forbid_external=True)

def parse_internal_xml(xml_content):
    """Parse XML from internal systems with minimal restrictions."""
    # Allow DTDs and entities but block external references
    return ET.fromstring(xml_content, forbid_dtd=False, forbid_entities=False, forbid_external=True)
```

## Migration from Standard Library

DefusedXML is designed as a drop-in replacement:

```python
# Before (vulnerable)
import xml.etree.ElementTree as ET
root = ET.fromstring(untrusted_xml)

# After (secure)
import defusedxml.ElementTree as ET
root = ET.fromstring(untrusted_xml)  # Now protected by default
```

The parsing functions keep their standard-library call patterns; the only additions are the `forbid_dtd`, `forbid_entities`, and `forbid_external` keyword arguments.
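
To make the "before" case concrete, this stdlib-only sketch shows the standard parser silently expanding nested internal entities, which is the behavior `forbid_entities=True` is meant to reject. The document and entity names are invented for illustration, and the payload is deliberately tiny.

```python
import xml.etree.ElementTree as StdET  # the unprotected standard-library parser

# Each level of nesting multiplies the expanded size; real attacks nest far deeper.
doc = (
    '<!DOCTYPE root ['
    '<!ENTITY a "ha">'
    '<!ENTITY b "&a;&a;&a;">'
    ']>'
    '<root>&b;&b;</root>'
)

root = StdET.fromstring(doc)
print(root.text)  # hahahahahaha
```

Going by the signatures documented above, `defusedxml.ElementTree.fromstring` with its default `forbid_entities=True` should raise `EntitiesForbidden` for this same input rather than expand it.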