Powerful and Pythonic XML processing library combining libxml2/libxslt with the ElementTree API.
npx @tessl/cli install tessl/pypi-lxml@6.0.00
# lxml

A comprehensive Python library for processing XML and HTML documents. lxml combines the speed and feature completeness of libxml2 and libxslt with the simplicity of Python's ElementTree API, providing fast, standards-compliant XML/HTML processing with extensive validation, transformation, and manipulation capabilities.

## Package Information

- **Package Name**: lxml
- **Language**: Python
- **Installation**: `pip install lxml`
- **Documentation**: https://lxml.de/
- **Requirements**: Python 3.8+

## Core Imports

The library provides multiple APIs optimized for different use cases:

```python
# Core XML/HTML processing (ElementTree-compatible)
from lxml import etree

# Object-oriented XML API with Python data type mapping
from lxml import objectify

# HTML-specific processing with form/link handling
from lxml import html

# ISO Schematron schema validation
from lxml.isoschematron import Schematron

# CSS selector support (requires the separate cssselect package)
from lxml.cssselect import CSSSelector
```

## Basic Usage

### XML Processing

```python
from lxml import etree

# Parse XML from string
xml_data = """
<bookstore>
    <book id="1">
        <title>Python Guide</title>
        <author>Jane Smith</author>
        <price>29.99</price>
    </book>
    <book id="2">
        <title>XML Processing</title>
        <author>John Doe</author>
        <price>34.95</price>
    </book>
</bookstore>
"""

root = etree.fromstring(xml_data)

# Find elements using XPath
books = root.xpath('//book[@id="1"]')
print(books[0].find('title').text)  # "Python Guide"

# Create new elements
new_book = etree.SubElement(root, 'book', id="3")
etree.SubElement(new_book, 'title').text = "Advanced Topics"
etree.SubElement(new_book, 'author').text = "Alice Johnson"
etree.SubElement(new_book, 'price').text = "39.99"

# Serialize back to XML
print(etree.tostring(root, pretty_print=True, encoding='unicode'))
```

### HTML Processing

```python
from lxml import html

# Parse HTML
html_content = """
<html>
<head><title>Example Page</title></head>
<body>
    <form action="/submit" method="post">
        <input type="text" name="username" value="john">
        <input type="password" name="password">
        <button type="submit">Login</button>
    </form>
    <div class="content">
        <a href="https://example.com">External Link</a>
        <a href="/internal">Internal Link</a>
    </div>
</body>
</html>
"""

doc = html.fromstring(html_content)

# Find form elements
form = doc.forms[0]
print(form.fields)  # Dictionary-like view of the form's fields

# Process links
html.make_links_absolute(doc, base_url='https://mysite.com')
for element, attribute, link, pos in html.iterlinks(doc):
    print(f"{element.tag}.{attribute}: {link}")
```

### Object-Oriented API

```python
from lxml import objectify

# Parse XML into Python objects
xml_data = """
<data>
    <items>
        <item>
            <name>Widget</name>
            <price>19.99</price>
            <available>true</available>
        </item>
    </items>
</data>
"""

root = objectify.fromstring(xml_data)

# A child named <items> is shadowed by the inherited Element.items() method,
# so it is looked up by key here instead of as root.items
item = root["items"].item

# Access as Python attributes
print(item.name)       # "Widget"
print(item.price)      # 19.99 (automatically converted to float)
print(item.available)  # True (automatically converted to bool)

# Add new data
item.category = "Electronics"
print(objectify.dump(root))
```

## Architecture

lxml provides multiple complementary APIs built on a common foundation:

- **etree**: Low-level ElementTree-compatible API for precise XML/HTML control
- **objectify**: High-level Pythonic API with automatic type conversion
- **html**: Specialized HTML processing with web-specific features
- **Validation**: Multiple schema languages (DTD, RelaxNG, XML Schema, Schematron)
- **Processing**: XPath queries, XSLT transformations, canonicalization

The library's Cython implementation provides C-level performance while maintaining Python's ease of use, making it suitable for both simple scripts and high-performance applications processing large XML documents.

## Capabilities

### Core XML/HTML Processing

Low-level ElementTree-compatible API providing comprehensive XML and HTML parsing, manipulation, and serialization with full namespace support, error handling, and memory-efficient processing.

```python { .api }
# Parsing functions
def parse(source, parser=None, base_url=None): ...
def fromstring(text, parser=None, base_url=None): ...
def XML(text, parser=None, base_url=None): ...
def HTML(text, parser=None, base_url=None): ...

# Core classes
class Element: ...
class ElementTree: ...
class XMLParser: ...
class HTMLParser: ...

# Serialization
def tostring(element_or_tree, encoding=None, method='xml', pretty_print=False): ...
```

[Core XML/HTML Processing](./etree-core.md)

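As a minimal sketch of the parsing and serialization functions listed above, the following parses a namespaced document from bytes, queries it, and serializes it again. The sample document, the namespace URI, and the `c` prefix are illustrative assumptions, not part of lxml.

```python
from lxml import etree

# Hypothetical sample document with a default namespace (illustration only).
xml_bytes = b"""<?xml version="1.0" encoding="UTF-8"?>
<catalog xmlns="http://example.com/catalog">
  <item sku="A1">Lamp</item>
  <item sku="B2">Chair</item>
</catalog>"""

root = etree.fromstring(xml_bytes)

# Namespaced lookups need an explicit prefix mapping; "c" is an arbitrary prefix.
ns = {"c": "http://example.com/catalog"}
names = [item.text for item in root.findall("c:item", namespaces=ns)]
skus = root.xpath("//c:item/@sku", namespaces=ns)

# With an explicit byte encoding, tostring() returns bytes and can
# include an XML declaration.
serialized = etree.tostring(root, encoding="UTF-8", xml_declaration=True)
```

Passing bytes rather than `str` matters here because the input carries its own encoding declaration.
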
### Object-Oriented XML API

Pythonic XML processing that automatically converts XML data to Python objects with proper data types, providing intuitive attribute-based access and manipulation while maintaining full XML structure.

```python { .api }
# Parsing functions
def parse(source, parser=None, base_url=None): ...
def fromstring(text, parser=None, base_url=None): ...

# Core classes
class ObjectifiedElement: ...
class DataElement: ...
class ElementMaker: ...

# Type annotation functions
def annotate(element_or_tree, **kwargs): ...
def deannotate(element_or_tree, **kwargs): ...
```

[Object-Oriented XML API](./objectify-api.md)

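The type annotation functions can be illustrated with a short sketch. The `<order>` document and its element names are made up for this example. `annotate()` records the inferred Python type of each value as attributes on the tree, and `deannotate()` strips them again.

```python
from lxml import etree, objectify

# Hypothetical document for illustration.
root = objectify.fromstring("<order><qty>3</qty><note>rush</note></order>")

# Numeric leaf elements behave like Python numbers.
total = root.qty + 2

# annotate() writes type hints (py:pytype attributes) into the tree.
objectify.annotate(root)
annotated = etree.tostring(root, encoding="unicode")

# deannotate() removes them; cleanup_namespaces also drops the
# namespace declarations that are no longer used.
objectify.deannotate(root, cleanup_namespaces=True)
clean = etree.tostring(root, encoding="unicode")
```

Removing annotations before serializing is useful when the output must not carry lxml-specific attributes.
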
### HTML Processing

Specialized HTML document processing with web-specific features including form handling, link processing, CSS class manipulation, and HTML5 parsing support.

```python { .api }
# HTML parsing
def parse(filename_or_url, parser=None, base_url=None): ...
def fromstring(html, base_url=None, parser=None): ...
def document_fromstring(html, parser=None, ensure_head_body=False): ...

# Link processing
def make_links_absolute(element, base_url=None): ...
def iterlinks(element): ...
def rewrite_links(element, link_repl_func): ...

# Form handling
def submit_form(form, extra_values=None, open_http=None): ...
```

[HTML Processing](./html-processing.md)

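The link-processing functions can be sketched as follows, using the element-method form of `rewrite_links` and `iterlinks`. The CDN host and the `to_cdn` helper are hypothetical and exist only to show the callback shape.

```python
from lxml import html

fragment = html.fromstring(
    '<div><a href="/docs">Docs</a> <img src="/logo.png"></div>'
)

def to_cdn(link):
    # Hypothetical rule: serve site-relative URLs from an example CDN host.
    if link.startswith("/"):
        return "https://cdn.example.com" + link
    return link

# rewrite_links() calls the function once per link and stores the result.
fragment.rewrite_links(to_cdn)

# iterlinks() yields (element, attribute, link, pos) tuples.
links = [link for _element, _attribute, link, _pos in fragment.iterlinks()]
```

The callback receives every link-bearing attribute, so both the `href` and the `src` are rewritten.
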
### Schema Validation

Comprehensive XML validation support including DTD, RelaxNG, W3C XML Schema, and ISO Schematron with detailed error reporting and custom validation rules.

```python { .api }
class DTD: ...
class RelaxNG: ...
class XMLSchema: ...

# Schematron validation
class Schematron: ...
def extract_xsd(element): ...
def extract_rng(element): ...
```

[Schema Validation](./validation.md)

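A minimal validation sketch using `XMLSchema` follows. The one-element schema and the `<price>` documents are assumptions chosen to keep the example short. The same `validate()` / `assertValid()` / `error_log` pattern applies to the other validator classes.

```python
from lxml import etree

# Hypothetical schema: a single <price> element holding a decimal.
schema_doc = etree.fromstring(
    b'<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">'
    b'<xs:element name="price" type="xs:decimal"/>'
    b"</xs:schema>"
)
schema = etree.XMLSchema(schema_doc)

good = etree.fromstring(b"<price>29.99</price>")
bad = etree.fromstring(b"<price>cheap</price>")

# validate() returns a bool and fills schema.error_log on failure.
is_good = schema.validate(good)
is_bad = schema.validate(bad)
errors = [entry.message for entry in schema.error_log]

# assertValid() raises DocumentInvalid instead of returning False.
try:
    schema.assertValid(bad)
    failure = None
except etree.DocumentInvalid as exc:
    failure = str(exc)
```
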
### XPath and XSLT Processing

Advanced XML querying and transformation capabilities with XPath 1.0 evaluation, XSLT 1.0 stylesheets, extension functions, and namespace handling.

```python { .api }
class XPath: ...
class XPathEvaluator: ...
class XSLT: ...
class XSLTAccessControl: ...

# Utility functions
def canonicalize(xml_data, **options): ...
```

[XPath and XSLT Processing](./xpath-xslt.md)

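The sketch below shows a precompiled `XPath` expression with a variable and a small `XSLT` transform. The `<books>` document, the `$limit` variable name, and the stylesheet are illustrative assumptions.

```python
from lxml import etree

root = etree.fromstring(
    "<books><book price='10'>A</book><book price='30'>B</book></books>"
)

# Compile once and reuse; $limit is an XPath variable bound at call time.
cheap = etree.XPath("//book[@price < $limit]/text()")
titles = cheap(root, limit=20)

# Hypothetical XSLT 1.0 stylesheet that emits each title followed by ";".
stylesheet = etree.fromstring(
    """<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <xsl:for-each select="books/book"><xsl:value-of select="."/>;</xsl:for-each>
  </xsl:template>
</xsl:stylesheet>"""
)
transform = etree.XSLT(stylesheet)
result = str(transform(root))
```

Binding values through XPath variables avoids building expressions by string concatenation.
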
### Utility Modules

Additional functionality including SAX interface compatibility, CSS selector support, element builders, XInclude processing, and namespace management.

```python { .api }
# SAX interface
class ElementTreeContentHandler: ...
def saxify(element_or_tree, content_handler): ...

# CSS selectors
class CSSSelector: ...

# Element builders
class ElementMaker: ...

# Development utilities
def get_include(): ...
```

[Utility Modules](./utility-modules.md)

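As one example from this group, the element builder in `lxml.builder` can construct a tree declaratively. This is a small sketch; the element names mirror the bookstore sample used earlier in this document.

```python
from lxml import etree
from lxml.builder import ElementMaker

E = ElementMaker()

# Attribute access on E creates elements; positional arguments become
# children or text, keyword arguments become attributes.
doc = E.bookstore(
    E.book(
        E.title("Python Guide"),
        E.price("29.99"),
        id="1",
    ),
)

xml_text = etree.tostring(doc, encoding="unicode")
```
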
## Error Handling

lxml provides a comprehensive exception hierarchy for precise error handling:

```python { .api }
class LxmlError(Exception): ...
class XMLSyntaxError(LxmlError): ...
class DTDError(LxmlError): ...
class RelaxNGError(LxmlError): ...
class XMLSchemaError(LxmlError): ...
class XPathError(LxmlError): ...
class XSLTError(LxmlError): ...
```

Parsing and processing functions raise specific exceptions with detailed error messages and, when available, line number information. Validators report a failed check either as a `False` return value from `validate()` or, when `assertValid()` is used, as a `DocumentInvalid` exception.

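A hedged sketch of catching a parse error follows. The `parse_or_report` helper is hypothetical and not part of lxml; it only shows one way to turn an `XMLSyntaxError` into a message.

```python
from lxml import etree

def parse_or_report(text):
    """Hypothetical helper: return (root, None) or (None, message)."""
    try:
        return etree.fromstring(text), None
    except etree.XMLSyntaxError as exc:
        # lineno is inherited from SyntaxError and may be None if unknown.
        return None, f"line {exc.lineno}: {exc}"

ok_root, ok_error = parse_or_report("<a><b/></a>")
bad_root, bad_error = parse_or_report("<a><b></a>")
```

Catching `etree.LxmlError` instead would also cover the validation, XPath, and XSLT exceptions listed above.
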
## Types

### Core Types

```python { .api }
class Element:
    """XML element with tag, attributes, text content, and children."""
    tag: str
    text: str | None
    tail: str | None
    attrib: dict[str, str]

    def find(self, path: str, namespaces: dict[str, str] = None) -> Element | None: ...
    def findall(self, path: str, namespaces: dict[str, str] = None) -> list[Element]: ...
    def xpath(self, path: str, **kwargs) -> list: ...
    def get(self, key: str, default: str = None) -> str | None: ...
    def set(self, key: str, value: str) -> None: ...

class ElementTree:
    """Document tree with root element and document-level operations."""
    def getroot(self) -> Element: ...
    def write(self, file, encoding: str = None, xml_declaration: bool = None): ...
    def xpath(self, path: str, **kwargs) -> list: ...

class QName:
    """Qualified name with namespace URI and local name."""
    def __init__(self, text_or_uri_or_element, tag: str = None): ...
    localname: str
    namespace: str | None
    text: str
```

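To show how `QName` relates to `Element.tag` and the attribute accessors, here is a short sketch. The SVG namespace is a real URI used purely as sample data.

```python
from lxml import etree

SVG_NS = "http://www.w3.org/2000/svg"

# QName builds the "{namespace}localname" form that Element.tag uses.
root = etree.Element(etree.QName(SVG_NS, "svg"), nsmap={None: SVG_NS})
circle = etree.SubElement(root, etree.QName(SVG_NS, "circle"))
circle.set("r", "5")

# QName can also take an existing element and split its tag apart.
qname = etree.QName(circle)
radius = circle.get("r")
fill = circle.get("fill", "none")  # falls back to the default when absent
```
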
### Parser Types

```python { .api }
class XMLParser:
    """Configurable XML parser with validation and error handling options."""
    def __init__(self, encoding: str = None, remove_blank_text: bool = False,
                 remove_comments: bool = False, remove_pis: bool = False,
                 strip_cdata: bool = True, recover: bool = False, **kwargs): ...

class HTMLParser:
    """Lenient HTML parser with automatic error recovery."""
    def __init__(self, encoding: str = None, remove_blank_text: bool = False,
                 remove_comments: bool = False, **kwargs): ...

ParserType = XMLParser | HTMLParser
```
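
The parser options above can be combined as in this sketch. The input strings are made up, and the exact tree produced by `recover=True` depends on libxml2's recovery heuristics, so it should be treated as best effort.

```python
from lxml import etree

# Drop ignorable whitespace and comments while parsing.
tidy_parser = etree.XMLParser(remove_blank_text=True, remove_comments=True)
root = etree.fromstring(
    "<root>\n  <!-- note -->\n  <a>1</a>\n</root>", tidy_parser
)
compact = etree.tostring(root, encoding="unicode")

# recover=True asks the parser to build a tree from malformed input
# instead of raising XMLSyntaxError.
lenient_parser = etree.XMLParser(recover=True)
salvaged = etree.fromstring("<root><a>1</a><b>2</root>", lenient_parser)
```

A parser object holds its configuration, so one instance can be reused across many documents in the same thread.
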