PDF generator using HTML and CSS
npx @tessl/cli install tessl/pypi-xhtml2pdf@0.2.00
# xhtml2pdf

An HTML to PDF converter for Python that transforms HTML and CSS content into PDF documents. Built on the ReportLab Toolkit, html5lib, and pypdf, xhtml2pdf supports HTML5 and CSS 2.1 (with some CSS 3 features) and is written entirely in pure Python for platform independence.

## Package Information

- **Package Name**: xhtml2pdf
- **Package Type**: pypi
- **Language**: Python
- **Python Version**: 3.8+
- **License**: Apache 2.0
- **Installation**: `pip install xhtml2pdf`
- **Optional Dependencies**:
  - `pip install xhtml2pdf[pycairo]` (recommended for better graphics)
  - `pip install xhtml2pdf[renderpm]` (legacy rendering)
- **Documentation**: https://xhtml2pdf.readthedocs.io/

## Core Imports

Basic import for main functionality:

```python
from xhtml2pdf import pisa
```

Complete document processing import:

```python
from xhtml2pdf.document import pisaDocument
```

Backward compatibility import:

```python
from xhtml2pdf.pisa import CreatePDF  # Alias for pisaDocument
```

Advanced imports for specific features:

```python
from xhtml2pdf.context import pisaContext
from xhtml2pdf.files import getFile, pisaFileObject
from xhtml2pdf.pdf import pisaPDF
from xhtml2pdf.util import getColor, getSize, getBool
```

## Basic Usage

### Simple HTML to PDF Conversion

```python
from xhtml2pdf import pisa
import io

# HTML content
html_content = """
<html>
<head>
    <style>
        body { font-family: Arial, sans-serif; }
        h1 { color: #333; }
    </style>
</head>
<body>
    <h1>Hello World</h1>
    <p>This is a simple PDF generated from HTML.</p>
</body>
</html>
"""

# Create PDF
output = io.BytesIO()
result = pisa.pisaDocument(html_content, dest=output)

# Check for errors
if result.err:
    print("Error generating PDF")
else:
    # Save or use the PDF
    with open("output.pdf", "wb") as f:
        f.write(output.getvalue())
```

### File-to-File Conversion

```python
from xhtml2pdf import pisa

# Convert HTML file to PDF file
with open("input.html", "r") as source:
    with open("output.pdf", "wb") as dest:
        result = pisa.pisaDocument(source, dest)

if not result.err:
    print("PDF generated successfully")
```

## Architecture

xhtml2pdf operates through a multi-stage processing pipeline:

- **HTML Parser**: Uses html5lib for HTML5-compliant parsing
- **CSS Engine**: Complete CSS 2.1 cascade and processing system
- **Context Management**: pisaContext handles fonts, resources, and conversion state
- **ReportLab Bridge**: Converts parsed content to ReportLab document format
- **PDF Generation**: Creates final PDF using ReportLab's PDF engine

The library provides both high-level convenience functions and low-level APIs for advanced customization, making it suitable for simple conversions as well as complex document generation systems.

## Capabilities

### Core Document Processing

Main conversion functions for transforming HTML to PDF, including the primary pisaDocument function and lower-level story creation capabilities.

```python { .api }
def pisaDocument(
    src,
    dest=None,
    dest_bytes=False,
    path="",
    link_callback=None,
    debug=0,
    default_css=None,
    xhtml=False,
    encoding=None,
    xml_output=None,
    raise_exception=True,
    capacity=100 * 1024,
    context_meta=None,
    encrypt=None,
    signature=None,
    **kwargs
):
    """
    Convert HTML to PDF.

    Args:
        src: HTML source (string, file-like object, or filename)
        dest: Output destination (file-like object or filename)
        dest_bytes: Return PDF as bytes if True
        path: Base path for relative resources
        link_callback: Function to resolve URLs and file paths
        debug: Debug level (0-2)
        default_css: Custom default CSS string
        xhtml: Force XHTML parsing
        encoding: Character encoding for source
        xml_output: XML output options
        raise_exception: Raise exceptions on errors
        capacity: Memory capacity for temp files
        context_meta: Additional context metadata
        encrypt: PDF encryption settings
        signature: PDF signature settings

    Returns:
        pisaContext: Processing context with results and errors
    """
```

[Document Processing](./document-processing.md)
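
The `link_callback` parameter is the usual hook for resolving stylesheet and image references. The following is a minimal sketch, assuming the callback is invoked with the URI found in the HTML plus the base path it is relative to, and returns a path or URL the converter can open. `STATIC_URL`, `STATIC_ROOT`, and the paths shown are hypothetical values chosen for illustration.

```python
import os

# Hypothetical deployment settings, not part of xhtml2pdf.
STATIC_URL = "/static/"
STATIC_ROOT = "/srv/app/static"


def link_callback(uri, rel):
    """Translate a URI from the HTML into something readable from disk.

    uri: the href/src value found in the document
    rel: the base path the URI is relative to (unused in this sketch)
    """
    if uri.startswith(STATIC_URL):
        # Map the URL prefix onto the directory that holds the static files.
        return os.path.join(STATIC_ROOT, uri[len(STATIC_URL):])
    # Leave everything else (absolute URLs, data URIs) untouched.
    return uri


print(link_callback("/static/css/report.css", None))
print(link_callback("https://example.com/logo.png", None))
```

Under these assumptions, calling `pisaDocument(src, dest=output, link_callback=link_callback)` would route resource lookups through this function.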

### Context and Configuration Management

Advanced processing context management for controlling fonts, CSS, resources, and conversion behavior throughout the HTML-to-PDF pipeline.

```python { .api }
class pisaContext:
    def __init__(self, path="", debug=0, capacity=-1): ...
    def addCSS(self, value): ...
    def parseCSS(self): ...
    def addFrag(self, text="", frag=None): ...
    def getFile(self, name, relative=None): ...
    def getFontName(self, names, default="helvetica"): ...
    def registerFont(self, fontname, alias=None): ...
```

[Context Management](./context-management.md)

### File and Resource Handling

Comprehensive file and resource management system supporting local files, URLs, data URIs, and various resource types with automatic MIME type detection.

```python { .api }
def getFile(*a, **kw): ...

class pisaFileObject:
    def __init__(self, uri, basepath=None, callback=None): ...
    def getFileContent(self): ...
    def getMimeType(self): ...
```

[File Handling](./file-handling.md)
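
Because data URIs are among the supported resource types, one way to avoid external lookups is to inline small assets directly into the HTML. The sketch below uses only the standard library; `to_data_uri` is a hypothetical helper written for this example, not an xhtml2pdf function.

```python
import base64
import mimetypes


def to_data_uri(data: bytes, filename: str) -> str:
    """Encode raw bytes as a data URI, guessing the MIME type from the name."""
    mime_type, _ = mimetypes.guess_type(filename)
    if mime_type is None:
        # Fall back to a generic type when the extension is unknown.
        mime_type = "application/octet-stream"
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# The eight-byte PNG signature stands in for real image bytes here.
logo_uri = to_data_uri(b"\x89PNG\r\n\x1a\n", "logo.png")
html_fragment = f'<img src="{logo_uri}" alt="logo">'
print(html_fragment)
```

In practice the bytes would come from reading the image file in binary mode, and the resulting fragment would be placed in the HTML passed to the converter.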

### CSS Processing and Styling

Advanced CSS parsing, cascade processing, and style application system supporting CSS 2.1 and select CSS 3 features for precise document styling.

```python { .api }
class pisaCSSBuilder:
    def atFontFace(self, declarations): ...
    def atPage(self): ...
    def atFrame(self): ...

class pisaCSSParser:
    def parseExternal(self, cssResourceName): ...
```

[CSS Processing](./css-processing.md)

### Utility Functions and Helpers

Collection of utility functions for size conversion, color handling, coordinate calculation, text processing, and other common operations.

```python { .api }
def getColor(value, default=None): ...
def getSize(value, relative=0, base=None, default=0.0): ...
def getBool(s): ...
def getAlign(value, default=TA_LEFT): ...
def arabic_format(text, language): ...
```

[Utilities](./utilities.md)

### PDF Manipulation and Advanced Features

PDF document manipulation, joining, encryption, digital signatures, and watermark capabilities for advanced PDF processing.

```python { .api }
class pisaPDF:
    def __init__(self, capacity=-1): ...
    def addFromURI(self, url, basepath=None): ...
    def join(self, file=None): ...

class PDFSignature:
    @staticmethod
    def sign(): ...
```

[PDF Features](./pdf-features.md)

### Command Line Interface

Complete command-line interface for batch processing and integration with shell scripts and automated workflows.

```python { .api }
def command(): ...
def execute(): ...
def usage(): ...
def showLogging(*, debug=False): ...
```

[Command Line](./command-line.md)

### WSGI Integration

WSGI middleware components for integrating PDF generation directly into web applications with automatic HTML-to-PDF conversion.

```python { .api }
class PisaMiddleware:
    def __init__(self, app): ...
    def __call__(self, environ, start_response): ...
```

[WSGI Integration](./wsgi-integration.md)

## Error Handling

xhtml2pdf reports problems through the context object returned by the conversion call: `err` and `warn` hold the error and warning counts, and `log` holds the recorded messages.

```python
import io

from xhtml2pdf import pisa

html_content = "<p>Hello World</p>"
output = io.BytesIO()

result = pisa.pisaDocument(html_content, dest=output)

# Check for errors
if result.err:
    print(f"Errors occurred during conversion: {result.log}")

# Check for warnings
if result.warn:
    print(f"Warnings: {result.log}")
```

Common exceptions that may be raised:
- `IOError`: File access issues when reading HTML files or writing PDF output
- `FileNotFoundError`: Missing HTML files, CSS files, or image resources
- `PermissionError`: Insufficient permissions to read/write files
- `UnicodeDecodeError`: Character encoding problems in HTML/CSS content
- `ImportError`: Missing optional dependencies (pycairo, renderpm, pyHanko)
- `ValueError`: Invalid configuration parameters or malformed HTML/CSS
- `MemoryError`: Insufficient memory for large document processing
- Various ReportLab exceptions:
  - `reportlab.platypus.doctemplate.LayoutError`: Page layout issues
  - `reportlab.lib.colors.ColorError`: Invalid color specifications
  - PDF generation and rendering errors

Network-related exceptions (for URL resources):
- `urllib.error.URLError`: Network connectivity issues
- `urllib.error.HTTPError`: HTTP errors when fetching remote resources
- `ssl.SSLError`: SSL certificate issues for HTTPS resources
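
Many of the exceptions listed above share a base class. In Python 3, `IOError` is an alias of `OSError`; `FileNotFoundError`, `PermissionError`, `urllib.error.URLError`, `urllib.error.HTTPError`, and `ssl.SSLError` all derive from it; and `UnicodeDecodeError` derives from `ValueError`. A wrapper can therefore catch a short tuple and still check the context's `err` count. The sketch below is one possible shape, not the library's own helper. `html_to_pdf_bytes` and the two stand-in converters are hypothetical, and the converter is passed in as an argument so the example runs without xhtml2pdf installed.

```python
import io
from types import SimpleNamespace


def html_to_pdf_bytes(html, convert):
    """Return (pdf_bytes, messages); pdf_bytes is None when conversion fails.

    convert: any callable used as convert(src, dest=file_like) that returns
    an object with err and log attributes, e.g. pisa.pisaDocument.
    """
    buffer = io.BytesIO()
    try:
        result = convert(html, dest=buffer)
    except (OSError, ValueError) as exc:
        # Covers the file, network, SSL, and decoding failures listed above.
        return None, [f"{type(exc).__name__}: {exc}"]
    if result.err:
        # The conversion finished but recorded errors in its log.
        return None, [str(entry) for entry in result.log]
    return buffer.getvalue(), []


# Stand-in converters so the sketch runs without xhtml2pdf installed.
def fake_ok(src, dest):
    dest.write(b"%PDF-1.4")
    return SimpleNamespace(err=0, warn=0, log=[])


def fake_missing_file(src, dest):
    raise FileNotFoundError("logo.png")


print(html_to_pdf_bytes("<p>hi</p>", fake_ok))
print(html_to_pdf_bytes("<p>hi</p>", fake_missing_file))
```

With the real library, the call would be `html_to_pdf_bytes(html, pisa.pisaDocument)`. `ImportError`, `MemoryError`, and the ReportLab exceptions are deliberately left uncaught here, since they usually indicate an environment or layout problem rather than bad input.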

## Types

```python { .api }
class pisaContext:
    """
    Main processing context for HTML-to-PDF conversion.

    Attributes:
        err (int): Error count
        warn (int): Warning count
        log (list): Processing log messages
        cssText (str): Accumulated CSS text
        cssParser: CSS parser instance
        fontList (list): Available fonts
        path (str): Base path for resources
    """

class pisaFileObject:
    """
    Unified file object for various URI types.

    Handles local files, URLs, data URIs, and byte streams
    with automatic MIME type detection and content processing.
    """

class pisaTempFile:
    """
    Temporary file handler for PDF generation.

    Manages temporary storage during conversion process
    with automatic cleanup and memory management.
    """
```