# Document Processing

Core document processing functions for converting HTML and CSS content to PDF documents. These functions are the main entry points to xhtml2pdf's conversion capabilities and handle everything from simple HTML strings to complex documents with external resources.

## Capabilities

### Main Document Conversion

The primary function for converting HTML to PDF, with configuration options covering input sources, output destinations, and processing parameters.

```python { .api }
def pisaDocument(
    src,
    dest=None,
    dest_bytes=False,
    path="",
    link_callback=None,
    debug=0,
    default_css=None,
    xhtml=False,
    encoding=None,
    xml_output=None,
    raise_exception=True,
    capacity=100 * 1024,
    context_meta=None,
    encrypt=None,
    signature=None,
    **kwargs
):
    """
    Convert HTML to PDF with full control over processing options.

    Args:
        src: HTML source, one of:
            - str: HTML content as a string
            - file-like object: open file or BytesIO
            - filename: path to an HTML file
        dest: Output destination, one of:
            - file-like object: open file or BytesIO for writing
            - filename: path for the output PDF file
            - None: return PDF content in the context
        dest_bytes (bool): If True and dest is None, return bytes
        path (str): Base path for resolving relative URLs and file paths
        link_callback (callable): Custom function to resolve URLs and file paths
            Signature: callback(uri, rel) -> resolved_uri
        debug (int): Debug level 0-2; higher values produce more logging
        default_css (str): Stylesheet used in place of xhtml2pdf's built-in
            default CSS; it is applied before the document's own CSS
        xhtml (bool): Force XHTML parsing mode instead of HTML5
        encoding (str): Character encoding of the source document.
            If None, the encoding is auto-detected from HTML meta tags
        xml_output: XML output configuration options
        raise_exception (bool): Raise exceptions on conversion errors
        capacity (int): Number of bytes of output held in memory before
            the buffer is moved to a temporary file
        context_meta (dict): Additional metadata to add to the PDF context
        encrypt (dict): PDF encryption settings with keys:
            - userPassword: user password for the PDF
            - ownerPassword: owner password for the PDF
            - canPrint: allow printing (bool)
            - canModify: allow modifications (bool)
            - canCopy: allow copying content (bool)
            - canAnnotate: allow annotations (bool)
        signature (dict): PDF digital signature settings
        **kwargs: Additional processing options

    Returns:
        pisaContext: Processing context object with attributes:
            - err (int): number of errors encountered
            - warn (int): number of warnings encountered
            - log (list): list of log messages
            - dest: output destination (if dest_bytes=True, contains PDF bytes)
    """
```

#### Usage Examples

**Basic HTML string to PDF file:**

```python
from xhtml2pdf import pisa

html = "<html><body><h1>Hello World</h1></body></html>"
with open("output.pdf", "wb") as dest:
    result = pisa.pisaDocument(html, dest)

if result.err:
    print(f"Errors: {result.log}")
```

**Convert with custom CSS and base path:**

```python
from xhtml2pdf import pisa

custom_css = """
@page {
    size: A4;
    margin: 2cm;
}
body { font-family: Arial; }
"""

html = """
<html>
<body>
    <h1>Report</h1>
    <img src="chart.png" />
</body>
</html>
"""

with open("report.pdf", "wb") as dest:
    result = pisa.pisaDocument(
        html,
        dest,
        path="/path/to/resources/",  # Base path for resolving chart.png
        default_css=custom_css,
        debug=1
    )
```

**Convert with custom link callback:**

```python
from xhtml2pdf import pisa
import os


def link_callback(uri, rel):
    """
    Resolve relative URLs to absolute file paths.
    """
    if uri.startswith(('http://', 'https://')):
        return uri

    # Convert relative paths to absolute paths
    if not os.path.isabs(uri):
        return os.path.join('/path/to/assets/', uri)
    return uri


html = '<html><body><img src="images/logo.png" /></body></html>'
with open("output.pdf", "wb") as dest:
    result = pisa.pisaDocument(html, dest, link_callback=link_callback)
```
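
The callback above joins any relative URI onto a fixed directory, so a URI such as `../../etc/passwd` can escape it. A minimal sketch of a stricter variant, assuming all local assets live under one base directory: it keeps the documented `callback(uri, rel)` signature, passes remote and inline `data:` URIs through unchanged, and refuses anything that resolves outside the base. `make_link_callback` is a hypothetical name, and raising `ValueError` is one illustrative way to refuse; how xhtml2pdf reports an exception raised inside the callback is not covered here.

```python
import os
from urllib.parse import urlparse


def make_link_callback(asset_root):
    """Build a callback(uri, rel) that only resolves files inside asset_root.

    Hypothetical helper: the (uri, rel) signature follows the documented
    link_callback contract; the containment policy is an assumption.
    """
    root = os.path.realpath(asset_root)

    def link_callback(uri, rel):
        # Remote and inline resources are returned unchanged.
        if urlparse(uri).scheme in ("http", "https", "data"):
            return uri

        # Treat leading slashes as relative to the asset root, then
        # collapse any ".." segments into a canonical absolute path.
        candidate = os.path.realpath(os.path.join(root, uri.lstrip("/")))

        # Refuse paths that end up outside the asset root.
        if os.path.commonpath([root, candidate]) != root:
            raise ValueError(f"refusing to load {uri!r} outside {root}")
        return candidate

    return link_callback
```

Usage would be `pisa.pisaDocument(html, dest, link_callback=make_link_callback("/path/to/assets/"))`, replacing the hard-coded join in the example above.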

**Return PDF as bytes:**

```python
from xhtml2pdf import pisa
import io

html = "<html><body><h1>Document</h1></body></html>"
output = io.BytesIO()
result = pisa.pisaDocument(html, dest=output)

if not result.err:
    pdf_bytes = output.getvalue()
    # Use pdf_bytes as needed
```
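
Before handing `pdf_bytes` to a caller, a cheap sanity check is that a PDF file begins with a `%PDF-` header followed by a version number; output without it almost certainly did not come from a successful conversion. A minimal sketch, where `looks_like_pdf` is a hypothetical helper that inspects only the header, not the rest of the file:

```python
import re

# A PDF file starts with "%PDF-" followed by a version such as "1.4".
_PDF_HEADER = re.compile(rb"%PDF-\d\.\d")


def looks_like_pdf(data: bytes) -> bool:
    """Return True if data starts with a PDF header (header check only)."""
    return _PDF_HEADER.match(data) is not None
```

In the example above, the check would read `if not result.err and looks_like_pdf(pdf_bytes): ...`.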

### Document Story Creation

Lower-level function that builds a ReportLab story (a list of flowables) from HTML content, giving more granular control over the conversion process.

```python { .api }
def pisaStory(
    src,
    path="",
    link_callback=None,
    debug=0,
    default_css=None,
    xhtml=False,
    encoding=None,
    context=None,
    xml_output=None,
    **kwargs
):
    """
    Create a ReportLab story from HTML source without generating a PDF.

    This function provides lower-level access to the conversion process,
    allowing you to work with the ReportLab story directly before PDF generation.

    Args:
        src: HTML source (string, file-like object, or filename)
        path (str): Base path for relative resource resolution
        link_callback (callable): Custom URL/file resolution function
        debug (int): Debug level for logging (0-2)
        default_css (str): Custom default CSS stylesheet
        xhtml (bool): Use XHTML parsing mode
        encoding (str): Character encoding of the source
        context (pisaContext): Existing context to use (a new one is created if None)
        xml_output: XML output options
        **kwargs: Additional processing options

    Returns:
        pisaContext: Processing context with the story in its story attribute
    """
```

#### Usage Example

```python
from xhtml2pdf.document import pisaStory
from reportlab.lib.pagesizes import A4
from reportlab.platypus import SimpleDocTemplate

html = """
<html>
<body>
    <h1>Chapter 1</h1>
    <p>Content here...</p>
</body>
</html>
"""

# Create story from HTML
context = pisaStory(html, debug=1)

if not context.err:
    # Lay out the story with a plain ReportLab template instead of
    # xhtml2pdf's own; features that depend on xhtml2pdf's page
    # templates (such as @page frames) may not carry over.
    doc = SimpleDocTemplate("custom.pdf", pagesize=A4)
    doc.build(context.story)
```

### Error Document Generation

Utility function that generates a PDF reporting the errors collected during a failed conversion.

```python { .api }
def pisaErrorDocument(dest, c):
    """
    Generate a PDF document containing error information.

    Args:
        dest: Output destination for the error PDF
        c (pisaContext): Context containing error information

    Returns:
        pisaContext: Updated context after error document generation
    """
```

### PDF Encryption Helper

Utility function for creating PDF encryption instances from encryption configuration data.

```python { .api }
def get_encrypt_instance(data):
    """
    Create PDF encryption instance from configuration data.

    Args:
        data (dict): Encryption configuration with keys:
            - userPassword (str): User password
            - ownerPassword (str): Owner password
            - canPrint (bool): Allow printing
            - canModify (bool): Allow modifications
            - canCopy (bool): Allow copying
            - canAnnotate (bool): Allow annotations

    Returns:
        Encryption instance for PDF generation
    """
```

#### Usage Example

```python
from xhtml2pdf import pisa

html = "<html><body><h1>Confidential</h1></body></html>"

encrypt_config = {
    'userPassword': 'user123',
    'ownerPassword': 'owner456',
    'canPrint': True,
    'canModify': False,
    'canCopy': False,
    'canAnnotate': False
}

with open("secure.pdf", "wb") as dest:
    result = pisa.pisaDocument(html, dest, encrypt=encrypt_config)
```
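
Because the settings travel as a plain dict, a misspelled key such as `canprint` is easy to miss. A small sketch of a hypothetical `make_encrypt_config` helper that accepts only the permission keys documented above and fills in explicit defaults. The defaults chosen here (printing allowed, everything else denied) are an illustrative assumption, not xhtml2pdf behaviour.

```python
# Permission flags accepted in the encrypt dict, per the key list above.
PERMISSION_KEYS = ("canPrint", "canModify", "canCopy", "canAnnotate")


def make_encrypt_config(user_password, owner_password, **permissions):
    """Build an encrypt dict, rejecting unknown or misspelled permission keys.

    Hypothetical helper. Unspecified permissions fall back to illustrative
    defaults: printing allowed; modifying, copying and annotating denied.
    """
    unknown = sorted(set(permissions) - set(PERMISSION_KEYS))
    if unknown:
        raise ValueError(f"unknown encryption option(s): {unknown}")

    config = {
        "userPassword": user_password,
        "ownerPassword": owner_password,
        "canPrint": True,
        "canModify": False,
        "canCopy": False,
        "canAnnotate": False,
    }
    # Explicit permissions override the defaults; coerce to real booleans.
    config.update({key: bool(value) for key, value in permissions.items()})
    return config
```

With these defaults, `make_encrypt_config("user123", "owner456")` produces the same dict as `encrypt_config` in the example above.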

## Advanced Processing Options

### Memory Management

The `capacity` parameter sets how many bytes of output are kept in memory before xhtml2pdf moves the output buffer to a temporary file:

- **Default**: 100 KB (`100 * 1024` bytes), suitable for most documents
- **Large documents**: increase to 1 MB or more to keep the output buffer off disk
- **Memory-constrained environments**: decrease to 50 KB or less

```python
from xhtml2pdf import pisa

html = "<html><body><h1>Report</h1></body></html>"

# For large documents
with open("large.pdf", "wb") as dest:
    result = pisa.pisaDocument(html, dest, capacity=1024 * 1024)  # 1 MB

# For memory-constrained environments
with open("small.pdf", "wb") as dest:
    result = pisa.pisaDocument(html, dest, capacity=50 * 1024)  # 50 KB
```
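
To make the trade-off concrete, here is a toy model of a capacity-limited buffer: it keeps data in a `BytesIO` until a write would exceed `capacity`, then copies everything into a real temporary file. `SpillBuffer` is a hypothetical illustration, not xhtml2pdf's own buffer class; the standard library's `tempfile.SpooledTemporaryFile(max_size=...)` implements the same idea.

```python
import io
import tempfile


class SpillBuffer:
    """Toy buffer: stays in memory until more than `capacity` bytes are written."""

    def __init__(self, capacity=100 * 1024):
        self.capacity = capacity
        self._file = io.BytesIO()
        self.spilled = False

    def write(self, data: bytes) -> int:
        # Spill to disk the first time a write would exceed capacity.
        if not self.spilled and self._file.tell() + len(data) > self.capacity:
            disk = tempfile.TemporaryFile()
            disk.write(self._file.getvalue())
            self._file.close()
            self._file = disk
            self.spilled = True
        return self._file.write(data)

    def getvalue(self) -> bytes:
        # Read everything from the start, then return to the end for appends.
        self._file.seek(0)
        data = self._file.read()
        self._file.seek(0, io.SEEK_END)
        return data

    def close(self) -> None:
        self._file.close()
```

Under this model, a larger capacity means fewer conversions ever touch the disk, at the cost of holding up to `capacity` bytes in RAM per conversion.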

### Debug Levels

The `debug` level controls how much processing information is reported:

- **0**: No debug output (default)
- **1**: Basic processing information and warnings
- **2**: Detailed processing steps and CSS parsing information

```python
from xhtml2pdf import pisa

html = "<html><body><h1>Debug me</h1></body></html>"
with open("debug.pdf", "wb") as dest:
    result = pisa.pisaDocument(html, dest, debug=2)

for log_entry in result.log:
    print(log_entry)
```
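
Besides `result.log`, xhtml2pdf's modules write diagnostics through Python's standard `logging` package. Assuming their loggers sit under the `xhtml2pdf` namespace (for example `xhtml2pdf.document`), a sketch that routes that output to the console; `enable_xhtml2pdf_logging` and `HANDLER_NAME` are hypothetical names.

```python
import logging

HANDLER_NAME = "xhtml2pdf-console"


def enable_xhtml2pdf_logging(level=logging.DEBUG):
    """Route records from the 'xhtml2pdf' logger hierarchy to stderr.

    Assumes xhtml2pdf's module loggers are children of 'xhtml2pdf',
    so configuring the parent logger covers all of them.
    """
    logger = logging.getLogger("xhtml2pdf")
    logger.setLevel(level)
    # Add the console handler only once, even if called repeatedly.
    if not any(h.name == HANDLER_NAME for h in logger.handlers):
        handler = logging.StreamHandler()  # writes to sys.stderr by default
        handler.set_name(HANDLER_NAME)
        handler.setFormatter(
            logging.Formatter("%(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger
```

Call it once at start-up, before any conversion runs.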

### Context Metadata

Additional metadata can be embedded in the PDF:

```python
from xhtml2pdf import pisa

html = "<html><body><h1>My Document</h1></body></html>"
metadata = {
    'author': 'John Doe',
    'title': 'My Document',
    'subject': 'Sample PDF',
    'creator': 'My Application'
}

with open("metadata.pdf", "wb") as dest:
    result = pisa.pisaDocument(html, dest, context_meta=metadata)
```

## Return Values and Error Handling

All document processing functions return a `pisaContext` object with these key attributes:

- **`err`** (int): Number of errors encountered (0 = success)
- **`warn`** (int): Number of warnings generated
- **`log`** (list): Detailed log messages for debugging
- **`dest`**: Output destination or PDF bytes (if `dest_bytes=True`)

```python
from xhtml2pdf import pisa

html = "<html><body><h1>Hello</h1></body></html>"
with open("output.pdf", "wb") as dest:
    result = pisa.pisaDocument(html, dest)

# Check for success
if result.err:
    print(f"Conversion failed with {result.err} errors")
    for msg in result.log:
        # Match case-insensitively; log entries are not guaranteed to use "ERROR"
        if 'error' in str(msg).lower():
            print(f"Error: {msg}")
else:
    print("PDF generated successfully")

# Handle warnings
if result.warn:
    print(f"Generated with {result.warn} warnings")
```
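
In application code it is often simpler to turn a failed conversion into an exception than to check `err` at every call site. A minimal sketch that relies only on the `err`, `warn`, and `log` attributes listed above; `PDFConversionError` and `check_result` are hypothetical names.

```python
class PDFConversionError(RuntimeError):
    """Raised when a pisaContext-like result reports a failed conversion."""

    def __init__(self, err, warn, log):
        super().__init__(
            f"PDF conversion failed with {err} error(s) and {warn} warning(s)"
        )
        self.err = err
        self.warn = warn
        self.log = list(log)


def check_result(result, *, warnings_are_errors=False):
    """Return result unchanged, or raise PDFConversionError if it reports errors.

    Works with any object exposing err, warn and log attributes. With
    warnings_are_errors=True, a non-zero warn count also raises.
    """
    if result.err or (warnings_are_errors and result.warn):
        raise PDFConversionError(result.err, result.warn, result.log)
    return result
```

A call site then reads as one step: `result = check_result(pisa.pisaDocument(html, dest))`.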

## Backward Compatibility

The `CreatePDF` name, kept from earlier releases, remains available in `xhtml2pdf.pisa` as an alias for `pisaDocument`. It is not a separate implementation, and it is the name used in xhtml2pdf's own usage examples:

```python { .api }
CreatePDF = pisaDocument  # Alias defined in xhtml2pdf.pisa
```

```python
from xhtml2pdf.pisa import CreatePDF

html = "<html><body><h1>Hello</h1></body></html>"
with open("output.pdf", "wb") as dest:
    # Equivalent to pisa.pisaDocument(html, dest)
    result = CreatePDF(html, dest)
```

## Types

```python { .api }
class pisaContext:
    """
    Processing context returned by document processing functions.

    Attributes:
        err (int): Error count
        warn (int): Warning count
        log (list): Processing log messages
        dest: Output destination or PDF content
        story (list): ReportLab story elements (from pisaStory)
        cssText (str): Processed CSS content
        path (str): Base path for resources
    """
```