Tessl Tile for pypi/xhtml2pdf@0.2.0

tessl/pypi-xhtml2pdf

PDF generator using HTML and CSS

Workspace: tessl
Visibility: Public
Created: 2 months ago
Last updated: 2 months ago
Describes: pkg:pypi/xhtml2pdf@0.2.x

To install, run

npx @tessl/cli install tessl/pypi-xhtml2pdf@0.2.0

# xhtml2pdf

A comprehensive HTML-to-PDF converter for Python that transforms HTML and CSS content into high-quality PDF documents. Built on the ReportLab Toolkit, html5lib, and pypdf, xhtml2pdf supports HTML5 and CSS 2.1 (plus some CSS 3 features) and is written in pure Python for platform independence.

## Package Information

- **Package Name**: xhtml2pdf
- **Package Type**: pypi
- **Language**: Python
- **Python Version**: 3.8+
- **License**: Apache 2.0
- **Installation**: `pip install xhtml2pdf`
- **Optional Dependencies**:
  - `pip install xhtml2pdf[pycairo]` (recommended for better graphics)
  - `pip install xhtml2pdf[renderpm]` (legacy rendering)
- **Documentation**: https://xhtml2pdf.readthedocs.io/

## Core Imports

Basic import for main functionality:

```python
from xhtml2pdf import pisa
```

Complete document processing import:

```python
from xhtml2pdf.document import pisaDocument
```

Backward-compatibility import:

```python
from xhtml2pdf.pisa import CreatePDF  # Alias for pisaDocument
```

Advanced imports for specific features:

```python
from xhtml2pdf.context import pisaContext
from xhtml2pdf.files import getFile, pisaFileObject
from xhtml2pdf.pdf import pisaPDF
from xhtml2pdf.util import getColor, getSize, getBool
```

## Basic Usage

### Simple HTML to PDF Conversion

```python
import io

from xhtml2pdf import pisa

# HTML content
html_content = """
<html>
    <head>
        <style>
            body { font-family: Arial, sans-serif; }
            h1 { color: #333; }
        </style>
    </head>
    <body>
        <h1>Hello World</h1>
        <p>This is a simple PDF generated from HTML.</p>
    </body>
</html>
"""

# Create the PDF in memory
output = io.BytesIO()
result = pisa.pisaDocument(html_content, dest=output)

# result.err is the number of errors reported during conversion
if result.err:
    print("Error generating PDF")
else:
    # Save or otherwise use the PDF bytes
    with open("output.pdf", "wb") as f:
        f.write(output.getvalue())
```

### File-to-File Conversion

```python
from xhtml2pdf import pisa

# Convert an HTML file to a PDF file (assumes input.html is UTF-8 encoded)
with open("input.html", "r", encoding="utf-8") as source:
    with open("output.pdf", "wb") as dest:
        result = pisa.pisaDocument(source, dest)

if not result.err:
    print("PDF generated successfully")
```

## Architecture

xhtml2pdf processes documents through a multi-stage pipeline:

- **HTML Parser**: Uses html5lib for HTML5-compliant parsing
- **CSS Engine**: Implements the CSS 2.1 cascade and computes the styles applied to each element
- **Context Management**: `pisaContext` tracks fonts, resources, and conversion state
- **ReportLab Bridge**: Converts the parsed content into a ReportLab story (a list of flowables)
- **PDF Generation**: Renders the final PDF with ReportLab's PDF engine

The library provides both high-level convenience functions and low-level APIs for advanced customization, making it suitable for simple conversions as well as complex document-generation systems.

## Capabilities

### Core Document Processing

Main conversion functions for transforming HTML to PDF, including the primary `pisaDocument` function and lower-level story-creation capabilities.

```python { .api }
def pisaDocument(
    src,
    dest=None,
    dest_bytes=False,
    path="",
    link_callback=None,
    debug=0,
    default_css=None,
    xhtml=False,
    encoding=None,
    xml_output=None,
    raise_exception=True,
    capacity=100 * 1024,
    context_meta=None,
    encrypt=None,
    signature=None,
    **kwargs
):
    """
    Convert HTML to PDF.

    Args:
        src: HTML source (string, file-like object, or filename)
        dest: Output destination (file-like object or filename)
        dest_bytes: Return PDF as bytes if True
        path: Base path for relative resources
        link_callback: Function to resolve URLs and file paths
        debug: Debug level (0-2)
        default_css: Custom default CSS string
        xhtml: Force XHTML parsing
        encoding: Character encoding for source
        xml_output: XML output options
        raise_exception: Raise exceptions on errors
        capacity: Memory capacity for temp files
        context_meta: Additional context metadata
        encrypt: PDF encryption settings
        signature: PDF signature settings

    Returns:
        pisaContext: Processing context with results and errors
    """
```

[Document Processing](./document-processing.md)

### Context and Configuration Management

Advanced processing-context management for controlling fonts, CSS, resources, and conversion behavior throughout the HTML-to-PDF pipeline.

```python { .api }
class pisaContext:
    def __init__(self, path="", debug=0, capacity=-1): ...
    def addCSS(self, value): ...
    def parseCSS(self): ...
    def addFrag(self, text="", frag=None): ...
    def getFile(self, name, relative=None): ...
    def getFontName(self, names, default="helvetica"): ...
    def registerFont(self, fontname, alias=None): ...
```

[Context Management](./context-management.md)

### File and Resource Handling

Comprehensive file and resource management system supporting local files, URLs, data URIs, and various resource types with automatic MIME type detection.

```python { .api }
def getFile(*a, **kw): ...
class pisaFileObject:
    def __init__(self, uri, basepath=None, callback=None): ...
    def getFileContent(self): ...
    def getMimeType(self): ...
```

[File Handling](./file-handling.md)

### CSS Processing and Styling

Advanced CSS parsing, cascade processing, and style application system supporting CSS 2.1 and select CSS 3 features for precise document styling.

```python { .api }
class pisaCSSBuilder:
    def atFontFace(self, declarations): ...
    def atPage(self): ...
    def atFrame(self): ...

class pisaCSSParser:
    def parseExternal(self, cssResourceName): ...
```

[CSS Processing](./css-processing.md)

### Utility Functions and Helpers

Collection of utility functions for size conversion, color handling, coordinate calculation, text processing, and other common operations.

```python { .api }
def getColor(value, default=None): ...
def getSize(value, relative=0, base=None, default=0.0): ...
def getBool(s): ...
def getAlign(value, default=TA_LEFT): ...
def arabic_format(text, language): ...
```

[Utilities](./utilities.md)

### PDF Manipulation and Advanced Features

PDF document manipulation, joining, encryption, digital signatures, and watermark capabilities for advanced PDF processing.

```python { .api }
class pisaPDF:
    def __init__(self, capacity=-1): ...
    def addFromURI(self, url, basepath=None): ...
    def join(self, file=None): ...

class PDFSignature:
    @staticmethod
    def sign(): ...
```

[PDF Features](./pdf-features.md)

### Command Line Interface

Complete command-line interface for batch processing and integration with shell scripts and automated workflows.

```python { .api }
def command(): ...
def execute(): ...
def usage(): ...
def showLogging(*, debug=False): ...
```

[Command Line](./command-line.md)

### WSGI Integration

WSGI middleware components for integrating PDF generation directly into web applications with automatic HTML-to-PDF conversion.

```python { .api }
class PisaMiddleware:
    def __init__(self, app): ...
    def __call__(self, environ, start_response): ...
```

[WSGI Integration](./wsgi-integration.md)

## Error Handling

xhtml2pdf reports problems through the returned context: `err` and `warn` count the errors and warnings encountered, and `log` holds the collected messages.

```python
import io

from xhtml2pdf import pisa

html_content = "<h1>Hello World</h1>"
output = io.BytesIO()
result = pisa.pisaDocument(html_content, dest=output)

# Check for errors (err is an error count)
if result.err:
    print(f"Errors occurred during conversion: {result.log}")

# Check for warnings (warn is a warning count)
if result.warn:
    print(f"Warnings: {result.log}")
```

Common exceptions that may be raised:
- `IOError`: File access issues when reading HTML files or writing PDF output
- `FileNotFoundError`: Missing HTML files, CSS files, or image resources
- `PermissionError`: Insufficient permissions to read/write files
- `UnicodeDecodeError`: Character encoding problems in HTML/CSS content
- `ImportError`: Missing optional dependencies (pycairo, renderpm, pyHanko)
- `ValueError`: Invalid configuration parameters or malformed HTML/CSS
- `MemoryError`: Insufficient memory for large document processing
- Various ReportLab exceptions:
  - `reportlab.platypus.doctemplate.LayoutError`: Page layout issues
  - Errors raised while converting invalid color specifications
  - PDF generation and rendering errors

Network-related exceptions (for URL resources):
- `urllib.error.URLError`: Network connectivity issues
- `urllib.error.HTTPError`: HTTP errors when fetching remote resources
- `ssl.SSLError`: SSL certificate issues for HTTPS resources

## Types

```python { .api }
class pisaContext:
    """
    Main processing context for HTML-to-PDF conversion.

    Attributes:
        err (int): Error count
        warn (int): Warning count
        log (list): Processing log messages
        cssText (str): Accumulated CSS text
        cssParser: CSS parser instance
        fontList (list): Available fonts
        path (str): Base path for resources
    """

class pisaFileObject:
    """
    Unified file object for various URI types.

    Handles local files, URLs, data URIs, and byte streams
    with automatic MIME type detection and content processing.
    """

class pisaTempFile:
    """
    Temporary file handler for PDF generation.

    Manages temporary storage during the conversion process
    with automatic cleanup and memory management.
    """
```