Tessl Tile for pypi/lxml@6.0.0

tessl/pypi-lxml

Powerful and Pythonic XML processing library combining libxml2/libxslt with the ElementTree API.

Workspace: tessl
Visibility: Public
Created: 3 months ago
Last updated: 3 months ago
Describes: pkg:pypi/lxml@6.0.x

To install, run

npx @tessl/cli install tessl/pypi-lxml@6.0.0

# lxml

A comprehensive Python library for processing XML and HTML documents. lxml combines the speed and feature completeness of libxml2 and libxslt with the simplicity of Python's ElementTree API, providing fast, standards-compliant XML/HTML processing with extensive validation, transformation, and manipulation capabilities.

## Package Information

- **Package Name**: lxml
- **Language**: Python
- **Installation**: `pip install lxml`
- **Documentation**: https://lxml.de/
- **Requirements**: Python 3.8+

## Core Imports

The library provides multiple APIs optimized for different use cases:

```python
# Core XML/HTML processing (ElementTree-compatible)
from lxml import etree

# Object-oriented XML API with Python data type mapping
from lxml import objectify

# HTML-specific processing with form/link handling
from lxml import html

# ISO Schematron validation
from lxml.isoschematron import Schematron

# CSS selector support (requires the separate cssselect package)
from lxml.cssselect import CSSSelector
```

## Basic Usage

### XML Processing

```python
from lxml import etree

# Parse XML from string
xml_data = """
<bookstore>
    <book id="1">
        <title>Python Guide</title>
        <author>Jane Smith</author>
        <price>29.99</price>
    </book>
    <book id="2">
        <title>XML Processing</title>
        <author>John Doe</author>
        <price>34.95</price>
    </book>
</bookstore>
"""

root = etree.fromstring(xml_data)

# Find elements using XPath
books = root.xpath('//book[@id="1"]')
print(books[0].find('title').text)  # "Python Guide"

# Create new elements
new_book = etree.SubElement(root, 'book', id="3")
etree.SubElement(new_book, 'title').text = "Advanced Topics"
etree.SubElement(new_book, 'author').text = "Alice Johnson"
etree.SubElement(new_book, 'price').text = "39.99"

# Serialize back to XML
print(etree.tostring(root, pretty_print=True, encoding='unicode'))
```

### HTML Processing

```python
from lxml import html

# Parse HTML
html_content = """
<html>
<head><title>Example Page</title></head>
<body>
    <form action="/submit" method="post">
        <input type="text" name="username" value="john">
        <input type="password" name="password">
        <button type="submit">Login</button>
    </form>
    <div class="content">
        <a href="https://example.com">External Link</a>
        <a href="/internal">Internal Link</a>
    </div>
</body>
</html>
"""

doc = html.fromstring(html_content)

# Find form elements
form = doc.forms[0]
print(form.fields)  # Form field dictionary

# Process links (the element methods modify the tree in place)
doc.make_links_absolute('https://mysite.com')
for element, attribute, link, pos in doc.iterlinks():
    print(f"{element.tag}.{attribute}: {link}")
```

### Object-Oriented API

```python
from lxml import objectify

# Parse XML into Python objects
xml_data = """
<data>
    <items>
        <item>
            <name>Widget</name>
            <price>19.99</price>
            <available>true</available>
        </item>
    </items>
</data>
"""

root = objectify.fromstring(xml_data)

# Access children as Python attributes. The <items> child is looked up with
# root["items"] because root.items resolves to the Element.items() method.
item = root["items"].item
print(item.name)      # "Widget"
print(item.price)     # 19.99 (automatically converted to float)
print(item.available) # True (automatically converted to bool)

# Add new data
item.category = "Electronics"
print(objectify.dump(root))
```

## Architecture

lxml provides multiple complementary APIs built on a common foundation:

- **etree**: Low-level ElementTree-compatible API for precise XML/HTML control
- **objectify**: High-level Pythonic API with automatic type conversion
- **html**: Specialized HTML processing with web-specific features
- **Validation**: Multiple schema languages (DTD, RelaxNG, XML Schema, Schematron)
- **Processing**: XPath queries, XSLT transformations, canonicalization

The library's Cython implementation provides C-level performance while maintaining Python's ease of use, making it suitable for both simple scripts and high-performance applications processing large XML documents.

## Capabilities

### Core XML/HTML Processing

Low-level ElementTree-compatible API providing comprehensive XML and HTML parsing, manipulation, and serialization with full namespace support, error handling, and memory-efficient processing.

```python { .api }
# Parsing functions
def parse(source, parser=None, base_url=None): ...
def fromstring(text, parser=None, base_url=None): ...
def XML(text, parser=None, base_url=None): ...
def HTML(text, parser=None, base_url=None): ...

# Core classes
class Element: ...
class ElementTree: ...
class XMLParser: ...
class HTMLParser: ...

# Serialization
def tostring(element_or_tree, encoding=None, method='xml', pretty_print=False): ...
```

[Core XML/HTML Processing](./etree-core.md)

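As a minimal sketch of the parsing and serialization functions listed above: the sample document and its namespace URI are made up for illustration.

```python
from io import BytesIO
from lxml import etree

# Hypothetical sample document; the namespace URI is illustrative only.
xml_bytes = (
    b'<root xmlns:c="urn:example:catalog">'
    b"<c:item>one</c:item><c:item>two</c:item>"
    b"</root>"
)

# parse() accepts a filename, URL, or file-like object and returns an ElementTree.
tree = etree.parse(BytesIO(xml_bytes))
root = tree.getroot()

# Namespace-aware lookup using a prefix-to-URI mapping.
ns = {"c": "urn:example:catalog"}
items = [el.text for el in root.findall("c:item", namespaces=ns)]

# etree.HTML() uses the lenient HTML parser and wraps fragments in <html><body>.
html_root = etree.HTML("<p>Hello<br>world")

# tostring() returns bytes by default; encoding="unicode" returns a str.
serialized = etree.tostring(root, encoding="unicode")
```

Here `items` ends up as `["one", "two"]`, and `html_root` is the synthesized `<html>` element rather than the `<p>` that was passed in.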
### Object-Oriented XML API

Pythonic XML processing that automatically converts XML data to Python objects with proper data types, providing intuitive attribute-based access and manipulation while maintaining full XML structure.

```python { .api }
# Parsing functions
def parse(source, parser=None, base_url=None): ...
def fromstring(text, parser=None, base_url=None): ...

# Core classes
class ObjectifiedElement: ...
class ElementMaker: ...

# Element factory (a function, not a class)
def DataElement(_value, attrib=None, nsmap=None, **kwargs): ...

# Type annotation functions
def annotate(element_or_tree, **kwargs): ...
def deannotate(element_or_tree, **kwargs): ...
```

[Object-Oriented XML API](./objectify-api.md)

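The type mapping and the annotation functions can be sketched as follows. The `<order>` document is invented for the example; the rest uses `objectify` calls named in the listing above.

```python
from lxml import etree, objectify

root = objectify.fromstring("<order><qty>3</qty><note>rush</note></order>")

# Leaf values are exposed through typed element classes.
qty = root.qty              # an IntElement wrapping the text "3"
total = root.qty + 2        # arithmetic uses the converted value
note_text = root.note.text  # plain string content

# Assigning a Python value creates a child element; objectify records the
# Python type in a py:pytype attribute on the new element.
root.price = 9.5

# deannotate() strips those type attributes again; cleanup_namespaces also
# removes namespace declarations that are no longer used.
objectify.deannotate(root, cleanup_namespaces=True)
clean_xml = etree.tostring(root, encoding="unicode")
```

After `deannotate()`, values are still converted on access because objectify falls back to guessing the type from the text content.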
### HTML Processing

Specialized HTML document processing with web-specific features including form handling, link processing, CSS class manipulation, and optional HTML5 parsing (via `lxml.html.html5parser`, which requires html5lib).

```python { .api }
# HTML parsing
def parse(filename_or_url, parser=None, base_url=None): ...
def fromstring(html, base_url=None, parser=None): ...
def document_fromstring(html, parser=None, ensure_head_body=False): ...

# Link processing
def make_links_absolute(element, base_url=None): ...
def iterlinks(element): ...
def rewrite_links(element, link_repl_func): ...

# Form handling
def submit_form(form, extra_values=None, open_http=None): ...
```

[HTML Processing](./html-processing.md)

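A short sketch of the link-processing features, using the element methods that correspond to the functions above. The markup and URLs are placeholders. The module-level `make_links_absolute()` and `rewrite_links()` appear to operate on a copy and return it when given an element, so the methods are used here to modify the tree in place.

```python
from lxml import html

# Placeholder fragment with one relative link and one relative image source.
doc = html.fromstring('<div><a href="/docs">Docs</a> <img src="logo.png"></div>')

# Resolve relative URLs against a base URL, in place.
doc.make_links_absolute("https://example.com/base/")
links = [link for _element, _attribute, link, _pos in doc.iterlinks()]

# rewrite_links() calls the function once per link and stores its return value.
doc.rewrite_links(
    lambda url: url.replace("https://example.com", "https://cdn.example.net")
)
rewritten_href = doc.find("a").get("href")
```

`links` contains `https://example.com/docs` and `https://example.com/base/logo.png`, in document order.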
### Schema Validation

Comprehensive XML validation support including DTD, RelaxNG, W3C XML Schema, and ISO Schematron with detailed error reporting and custom validation rules.

```python { .api }
class DTD: ...
class RelaxNG: ...
class XMLSchema: ...

# ISO Schematron validation (lxml.isoschematron)
class Schematron: ...
def extract_xsd(element): ...
def extract_rng(element): ...
```

[Schema Validation](./validation.md)

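The validator classes share the same basic workflow. The sketch below uses `XMLSchema` with a small schema written for this example; `DTD` and `RelaxNG` objects are used the same way.

```python
from lxml import etree

# Illustrative schema: a <person> element with one required <name> child.
schema_doc = etree.XML(b"""
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="person">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
""")
schema = etree.XMLSchema(schema_doc)

valid_doc = etree.XML("<person><name>Ada</name></person>")
invalid_doc = etree.XML("<person><age>36</age></person>")

# validate() returns a bool instead of raising.
is_valid = schema.validate(valid_doc)
is_invalid = schema.validate(invalid_doc)

# error_log holds the entries from the most recent validation run.
messages = [(entry.line, entry.message) for entry in schema.error_log]

# assertValid() raises DocumentInvalid for a document that fails validation.
try:
    schema.assertValid(invalid_doc)
    raised = False
except etree.DocumentInvalid:
    raised = True
```

Use `validate()` when an invalid document is an expected outcome, and `assertValid()` when it should abort processing.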
### XPath and XSLT Processing

Advanced XML querying and transformation capabilities with XPath 1.0 evaluation, XSLT 1.0 stylesheets, extension functions, and namespace handling.

```python { .api }
class XPath: ...
class XPathEvaluator: ...
class XSLT: ...
class XSLTAccessControl: ...

# Utility functions
def canonicalize(xml_data, **options): ...
```

[XPath and XSLT Processing](./xpath-xslt.md)

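A minimal sketch of a compiled XPath expression with a variable, followed by a small XSLT transformation. The bookstore data and the stylesheet are invented for the example.

```python
from lxml import etree

doc = etree.XML(
    "<bookstore>"
    "<book><title>A</title><price>10</price></book>"
    "<book><title>B</title><price>30</price></book>"
    "</bookstore>"
)

# Compile once and reuse; XPath variables are passed as keyword arguments.
cheaper_than = etree.XPath("//book[price < $limit]/title/text()")
titles = cheaper_than(doc, limit=20)

# XPath number results come back as Python floats.
total = doc.xpath("sum(//price)")

# An XSLT 1.0 stylesheet that emits each title followed by a semicolon.
stylesheet = etree.XML("""
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <xsl:for-each select="//book"><xsl:value-of select="title"/>;</xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
""")
transform = etree.XSLT(stylesheet)
result = transform(doc)
text = str(result)  # str() yields the serialized transformation output
```

Compiling with `etree.XPath` avoids re-parsing the expression on every call, which matters when the same query runs against many documents.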
### Utility Modules

Additional functionality including SAX interface compatibility, CSS selector support, element builders, XInclude processing, and namespace management.

```python { .api }
# SAX interface (lxml.sax)
class ElementTreeContentHandler: ...
def saxify(element_or_tree, content_handler): ...

# CSS selectors (lxml.cssselect)
class CSSSelector: ...

# Element builders (lxml.builder)
class ElementMaker: ...

# Development utilities (lxml)
def get_include(): ...
```

[Utility Modules](./utility-modules.md)

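The element builder and the SAX bridge can be combined in a short sketch. The page content is arbitrary; `E` is just a local name for an `ElementMaker` instance.

```python
from lxml import etree
from lxml.builder import ElementMaker
from lxml.sax import ElementTreeContentHandler, saxify

# Attribute access on the maker creates elements; positional arguments become
# text or children, keyword arguments become attributes.
E = ElementMaker()
page = E.html(
    E.body(
        E.h1("Title", id="top"),
        E.p("Some ", E.b("bold"), " text"),
    )
)
markup = etree.tostring(page, encoding="unicode")

# saxify() replays the tree as SAX events; the content handler rebuilds a tree
# from those events.
handler = ElementTreeContentHandler()
saxify(page, handler)
rebuilt = handler.etree.getroot()
```

`markup` is `<html><body><h1 id="top">Title</h1><p>Some <b>bold</b> text</p></body></html>`, and `rebuilt` is an independent tree with the same structure.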
## Error Handling

lxml provides a comprehensive exception hierarchy for precise error handling:

```python { .api }
class LxmlError(Exception): ...
class XMLSyntaxError(LxmlError): ...
class DTDError(LxmlError): ...
class RelaxNGError(LxmlError): ...
class XMLSchemaError(LxmlError): ...
class XPathError(LxmlError): ...
class XSLTError(LxmlError): ...
```

Parsing and processing functions raise these exceptions with detailed messages and, when available, line number information. Validators behave differently: `validate()` returns a boolean and records details in `error_log`, while `assertValid()` raises `DocumentInvalid`.

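One way to use the hierarchy is to catch the specific exception at the call site. In the sketch below, `parse_or_report` is a hypothetical helper written for this example, not part of lxml.

```python
from lxml import etree

def parse_or_report(data: bytes):
    """Hypothetical helper: return (root, None) on success or (None, message)."""
    try:
        return etree.fromstring(data), None
    except etree.XMLSyntaxError as exc:
        # lineno is the line on which the parser reported the problem.
        return None, f"syntax error on line {exc.lineno}: {exc}"

bad_root, bad_error = parse_or_report(b"<a><b></a>")
ok_root, ok_error = parse_or_report(b"<a><b/></a>")

# Catching the XPathError base class also covers malformed expressions.
try:
    etree.XPath("//*[")
    xpath_error = None
except etree.XPathError as exc:
    xpath_error = type(exc).__name__
```

Catching `etree.LxmlError` instead would handle every exception in the listing with one `except` clause, at the cost of less specific handling.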
## Types

### Core Types

```python { .api }
class Element:
    """XML element with tag, attributes, text content, and children."""
    tag: str
    text: str | None
    tail: str | None
    attrib: dict[str, str]

    def find(self, path: str, namespaces: dict[str, str] = None) -> Element | None: ...
    def findall(self, path: str, namespaces: dict[str, str] = None) -> list[Element]: ...
    def xpath(self, path: str, **kwargs) -> list: ...
    def get(self, key: str, default: str = None) -> str | None: ...
    def set(self, key: str, value: str) -> None: ...

class ElementTree:
    """Document tree with root element and document-level operations."""
    def getroot(self) -> Element: ...
    def write(self, file, encoding: str = None, xml_declaration: bool = None): ...
    def xpath(self, path: str, **kwargs) -> list: ...

class QName:
    """Qualified name with namespace URI and local name."""
    def __init__(self, text_or_uri_or_element, tag: str = None): ...
    localname: str
    namespace: str | None
    text: str
```

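The `text`/`tail` split and the `QName` attributes are easiest to see on a small example. This is a sketch with made-up markup.

```python
from lxml import etree

root = etree.XML('<p class="lead">Hello <b>big</b> world</p>')
b = root.find("b")

# text is the content before the first child; tail is the text that follows
# an element's closing tag.
pieces = (root.text, b.text, b.tail)

# Attribute access with an optional default, and in-place modification.
css_class = root.get("class")
missing = root.get("id", "none")
root.set("id", "intro")

# QName splits a name in Clark notation ("{namespace}localname").
q = etree.QName("{http://www.w3.org/1999/xhtml}div")
parts = (q.namespace, q.localname)

# QName also accepts an element and reads its tag.
root_name = etree.QName(root).localname
```

`pieces` is `("Hello ", "big", " world")`: the trailing text belongs to `<b>` as its `tail`, not to `<p>`.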
### Parser Types

```python { .api }
class XMLParser:
    """Configurable XML parser with validation and error handling options."""
    def __init__(self, encoding: str = None, remove_blank_text: bool = False,
                 remove_comments: bool = False, remove_pis: bool = False,
                 strip_cdata: bool = True, recover: bool = False, **kwargs): ...

class HTMLParser:
    """Lenient HTML parser with automatic error recovery."""
    def __init__(self, encoding: str = None, remove_blank_text: bool = False,
                 remove_comments: bool = False, **kwargs): ...

ParserType = XMLParser | HTMLParser
```
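A brief sketch of how some of these parser options change the resulting tree. The input strings are invented, and the exact shape of a recovered tree depends on libxml2.

```python
from lxml import etree

pretty_source = b"<root>\n  <item>1</item>\n  <item>2</item>\n</root>"

# remove_blank_text drops whitespace-only text between elements.
compact_parser = etree.XMLParser(remove_blank_text=True)
compact = etree.fromstring(pretty_source, parser=compact_parser)
compact_xml = etree.tostring(compact, encoding="unicode")

# recover=True asks the parser to build a tree from malformed input
# instead of raising XMLSyntaxError.
lenient_parser = etree.XMLParser(recover=True)
recovered = etree.fromstring(b"<root><item>1</root>", parser=lenient_parser)

# HTMLParser tolerates unclosed tags; remove_comments discards comments.
html_parser = etree.HTMLParser(remove_comments=True)
html_root = etree.fromstring(b"<p>one<p>two<!-- hidden -->", parser=html_parser)
paragraphs = [p.text for p in html_root.iter("p")]
```

A parser instance can be passed to `parse()`, `fromstring()`, `XML()`, and `HTML()` through their `parser` argument, so one configured parser can be reused across calls.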