# PDF Manipulation and Advanced Features

PDF document joining, encryption, digital signatures, and watermarks for PDF processing and security.

## Capabilities

### PDF Document Manipulation

The `pisaPDF` class collects PDF content from URIs, file objects, strings, or document objects and joins it into a single document.
```python { .api }
class pisaPDF:
    """
    PDF document handler for joining and manipulating PDF files.

    Provides capabilities for combining multiple PDFs, managing
    document structure, and handling PDF-specific operations.
    """
    def __init__(self, capacity=-1):
        """
        Initialize PDF handler with optional capacity limit.

        Args:
            capacity (int): Memory capacity limit in bytes (-1 for unlimited)
        """

    def addFromURI(self, url, basepath=None):
        """
        Add PDF content from URI (file path or URL).

        Args:
            url (str): Path or URL to PDF file
            basepath (str, optional): Base path for relative URLs
        """

    def addFromFile(self, f):
        """
        Add PDF content from file object.

        Args:
            f: File-like object containing PDF data
        """

    def addFromString(self, data):
        """
        Add PDF content from string data.

        Args:
            data (str): PDF data as string
        """

    def addDocument(self, doc):
        """
        Add PDF document object.

        Args:
            doc: PDF document object to add
        """

    def join(self, file=None):
        """
        Join all added PDFs into single document.

        Args:
            file (optional): Output file path or file object

        Returns:
            Combined PDF document
        """
```

#### Usage Example

```python
from xhtml2pdf.pdf import pisaPDF

# Create PDF handler
pdf_handler = pisaPDF()

# Add multiple PDF sources
pdf_handler.addFromURI("report_part1.pdf")
pdf_handler.addFromURI("report_part2.pdf")
pdf_handler.addFromURI("appendix.pdf")

# Join into single PDF
with open("combined_report.pdf", "wb") as output:
    pdf_handler.join(output)
```
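
The add-then-join workflow above can be wrapped in a small helper. The sketch below is illustrative only: `join_sources` and `SupportsJoin` are hypothetical names, not part of xhtml2pdf. The helper relies solely on the `addFromURI` and `join` methods documented above, so any object exposing those two methods will work.

```python
from typing import BinaryIO, Iterable, Protocol


class SupportsJoin(Protocol):
    """Structural type for the two pisaPDF methods this sketch relies on."""

    def addFromURI(self, url: str, basepath=None) -> None: ...

    def join(self, file=None): ...


def join_sources(handler: SupportsJoin, sources: Iterable[str], output: BinaryIO) -> int:
    """Add each source to `handler` in order, then join into `output`.

    Hypothetical helper: returns the number of sources added and raises
    ValueError for an empty source list instead of writing an empty PDF.
    """
    added = 0
    for source in sources:
        handler.addFromURI(source)
        added += 1
    if added == 0:
        raise ValueError("no PDF sources were given")
    handler.join(output)
    return added
```

With xhtml2pdf installed, this could be called as `join_sources(pisaPDF(), ["report_part1.pdf", "report_part2.pdf"], output)` inside the `with open(...)` block shown above.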

### PDF Encryption and Security

PDF encryption with user and owner passwords and per-operation permissions (printing, modification, copying, annotation).

```python { .api }
def get_encrypt_instance(data):
    """
    Create PDF encryption instance from configuration data.

    Args:
        data (dict): Encryption configuration with keys:
            - userPassword (str): User password for opening PDF
            - ownerPassword (str): Owner password for full access
            - canPrint (bool): Allow printing permission
            - canModify (bool): Allow modification permission
            - canCopy (bool): Allow copying content permission
            - canAnnotate (bool): Allow annotation permission

    Returns:
        Encryption instance for PDF generation
    """
```

#### Encryption Usage Example

```python
from xhtml2pdf import pisa

html_content = """
<html>
<body>
    <h1>Confidential Document</h1>
    <p>This document contains sensitive information.</p>
</body>
</html>
"""

# Configure encryption settings
encryption_config = {
    'userPassword': 'view123',    # Password to open PDF
    'ownerPassword': 'admin456',  # Password for full access
    'canPrint': True,             # Allow printing
    'canModify': False,           # Prevent modifications
    'canCopy': False,             # Prevent copying text
    'canAnnotate': False          # Prevent annotations
}

# Generate encrypted PDF
with open("secure_document.pdf", "wb") as dest:
    result = pisa.pisaDocument(
        html_content,
        dest,
        encrypt=encryption_config
    )

if not result.err:
    print("Encrypted PDF generated successfully")
```
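
Because the configuration is a plain dictionary, a misspelled key can go unnoticed. One way to guard against that is a small builder. This is a sketch, not part of xhtml2pdf: `build_encrypt_config` is a hypothetical helper, and the only keys it knows are the six listed above. Requiring both passwords and denying every permission by default are design choices of this sketch, not documented library behavior.

```python
PERMISSION_KEYS = ("canPrint", "canModify", "canCopy", "canAnnotate")


def build_encrypt_config(user_password, owner_password, **permissions):
    """Return an encryption dict using only the keys documented above.

    Hypothetical helper: permissions that are not passed default to False.
    """
    if not user_password or not owner_password:
        raise ValueError("both userPassword and ownerPassword are required")
    unknown = set(permissions) - set(PERMISSION_KEYS)
    if unknown:
        raise ValueError(f"unknown permission keys: {sorted(unknown)}")
    config = {"userPassword": user_password, "ownerPassword": owner_password}
    for key in PERMISSION_KEYS:
        config[key] = bool(permissions.get(key, False))
    return config


# Same settings as the example above: printing allowed, everything else denied
encryption_config = build_encrypt_config("view123", "admin456", canPrint=True)
```

The resulting dictionary can be passed as `encrypt=encryption_config` exactly as in the usage example.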

### Watermarks and Backgrounds

Watermark and background processing for PDF documents, covering image overlays, opacity, and positioning. Backgrounds are declared in `@page` CSS rules, as the usage example in this section shows.

```python { .api }
class WaterMarks:
    """
    Watermark and background processing system for PDF documents.

    Provides comprehensive watermark capabilities including image overlays,
    background patterns, opacity control, and positioning for document branding.
    """
    @staticmethod
    def process_doc(context, input_doc, output_doc):
        """
        Process PDF document with watermarks and backgrounds.

        Args:
            context (pisaContext): Processing context with background settings
            input_doc (bytes): Input PDF document data
            output_doc (bytes): Output PDF document data

        Returns:
            tuple: (processed_pdf_bytes, has_background_flag)
        """

    @staticmethod
    def get_watermark(context, max_numpage):
        """
        Generate watermark iterator for multi-page documents.

        Args:
            context (pisaContext): Processing context
            max_numpage (int): Maximum number of pages

        Returns:
            Iterator: Watermark data for each page
        """

    @staticmethod
    def get_size_location(img, context, pagesize, *, is_portrait):
        """
        Calculate watermark size and position on page.

        Args:
            img: Image object for watermark
            context (dict): Watermark context with positioning data
            pagesize (tuple): Page dimensions (width, height)
            is_portrait (bool): Whether page is in portrait orientation

        Returns:
            tuple: Position and size coordinates (x, y, width, height)
        """

    @staticmethod
    def get_img_with_opacity(pisafile, context):
        """
        Apply opacity settings to watermark image.

        Args:
            pisafile (pisaFileObject): Image file object
            context (dict): Context with opacity settings

        Returns:
            BytesIO: Processed image with opacity applied
        """
```

#### Watermark Usage Example

```python
from xhtml2pdf import pisa

# HTML with background/watermark CSS
html_with_watermark = """
<html>
<head>
    <style>
        @page {
            background-image: url('watermark.png');
            background-opacity: 0.3;
            background-object-position: center center;
        }
        body { font-family: Arial; padding: 2cm; }
    </style>
</head>
<body>
    <h1>Confidential Document</h1>
    <p>This document has a watermark background.</p>
</body>
</html>
"""

with open("watermarked.pdf", "wb") as dest:
    result = pisa.pisaDocument(html_with_watermark, dest)
```
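
To make the positioning idea concrete, the sketch below computes where a centered watermark would land on a page. It is not the algorithm used by `WaterMarks.get_size_location`; the function name, the `scale` parameter, and the fit-to-width rule are assumptions chosen for illustration. All values are in PDF points.

```python
def centered_watermark_box(img_size, page_size, scale=0.5):
    """Return (x, y, width, height) for an image centered on the page.

    The image is scaled so its width is `scale` times the page width,
    keeping its aspect ratio. Illustrative only.
    """
    img_w, img_h = img_size
    page_w, page_h = page_size
    if img_w <= 0 or img_h <= 0:
        raise ValueError("image dimensions must be positive")
    width = page_w * scale
    height = width * img_h / img_w
    x = (page_w - width) / 2
    y = (page_h - height) / 2
    return x, y, width, height


# A 400x200 image on a 600x800 pt page at half the page width
print(centered_watermark_box((400, 200), (600, 800)))  # (150.0, 325.0, 300.0, 150.0)
```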

### Digital Signatures

PDF digital signature support for document authentication, integrity verification, and non-repudiation.

```python { .api }
class PDFSignature:
    """
    PDF digital signature handler for document authentication.

    Provides capabilities for applying digital signatures to PDFs
    for legal compliance and document integrity verification.
    """
    @staticmethod
    def sign(inputfile, output, config):
        """
        Apply digital signature to PDF document.

        Args:
            inputfile: Input PDF file path or file object
            output: Output PDF file path or file object
            config (dict): Signature configuration with type and parameters

        Creates cryptographic signature for document authentication
        and integrity verification purposes.
        """

    @staticmethod
    def simple_sign(inputfile, output, config):
        """
        Apply simple digital signature to PDF.

        Args:
            inputfile: Input PDF file path or file object
            output: Output PDF file path or file object
            config (dict): Simple signature configuration
        """

    @staticmethod
    def lta_sign(inputfile, output, config):
        """
        Apply Long Term Archive (LTA) signature to PDF.

        Args:
            inputfile: Input PDF file path or file object
            output: Output PDF file path or file object
            config (dict): LTA signature configuration with timestamps
        """

    @staticmethod
    def get_passphrase(config):
        """
        Extract passphrase from signature configuration.

        Args:
            config (dict): Signature configuration

        Returns:
            bytes: Passphrase for private key access
        """

    @staticmethod
    def get_signature_meta(config):
        """
        Extract signature metadata from configuration.

        Args:
            config (dict): Signature configuration

        Returns:
            dict: Signature metadata (reason, location, contact info)
        """
```

#### Digital Signature Usage

```python
from xhtml2pdf import pisa

html_content = """
<html>
<body>
    <h1>Legal Document</h1>
    <p>This document requires digital signature.</p>
</body>
</html>
"""

# Signature configuration
signature_config = {
    'certificate_path': 'path/to/certificate.p12',
    'password': 'cert_password',
    'reason': 'Document approval',
    'location': 'New York, NY',
    'contact_info': 'legal@company.com'
}

# Generate signed PDF
with open("signed_document.pdf", "wb") as dest:
    result = pisa.pisaDocument(
        html_content,
        dest,
        signature=signature_config
    )
```
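
Hard-coding the certificate password, as in the example above, is rarely desirable outside a demo. A minimal sketch of reading it from the environment instead follows. The helper name `signature_config_from_env` and the variable name `PDF_CERT_PASSWORD` are hypothetical; the dictionary keys are the ones used in the example above.

```python
import os


def signature_config_from_env(certificate_path, env=None, **meta):
    """Build a signature config whose password comes from the environment.

    Hypothetical helper: `meta` may carry reason, location, contact_info.
    """
    env = os.environ if env is None else env
    password = env.get("PDF_CERT_PASSWORD")  # hypothetical variable name
    if not password:
        raise RuntimeError("PDF_CERT_PASSWORD is not set")
    allowed = {"reason", "location", "contact_info"}
    unknown = set(meta) - allowed
    if unknown:
        raise ValueError(f"unexpected metadata keys: {sorted(unknown)}")
    return {"certificate_path": certificate_path, "password": password, **meta}
```

Passing `env` explicitly keeps the function easy to test; in normal use it falls back to `os.environ`.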

### Watermarks and Background Elements

PDF watermark processing for adding background images, text overlays, and document branding.

```python { .api }
class WaterMarks:
    """
    PDF watermark processing for background elements and overlays.

    Handles watermark positioning, sizing, and application
    to PDF documents for branding and security purposes.
    """
    @staticmethod
    def get_size_location(img, context, pagesize, *, is_portrait):
        """
        Calculate watermark size and position parameters.

        Returns:
            Size and location parameters for watermark placement
        """

    @staticmethod
    def process_doc(context, input_doc, output_doc):
        """
        Process document for watermark application.

        Applies watermark elements to PDF document pages
        with proper positioning and transparency settings.
        """
```

#### Watermark Usage Example

```python
from xhtml2pdf import pisa

# HTML with watermark CSS
html_with_watermark = """
<html>
<head>
    <style>
        @page {
            size: A4;
            margin: 1in;
            background-image: url('watermark.png');
            background-repeat: no-repeat;
            background-position: center;
            background-size: 50%;
        }

        body {
            position: relative;
            z-index: 1;
        }

        .watermark-text {
            position: fixed;
            top: 50%;
            left: 50%;
            transform: rotate(-45deg);
            font-size: 72pt;
            color: rgba(200, 200, 200, 0.3);
            z-index: 0;
        }
    </style>
</head>
<body>
    <div class="watermark-text">DRAFT</div>
    <h1>Document Title</h1>
    <p>Document content here...</p>
</body>
</html>
"""

with open("watermarked.pdf", "wb") as dest:
    result = pisa.pisaDocument(html_with_watermark, dest)
```

### Advanced PDF Metadata

PDF metadata management for document properties such as title, author, subject, and keywords, supplied at conversion time.

```python { .api }
from xhtml2pdf import pisa

# PDF metadata configuration
metadata_config = {
    'title': 'Annual Financial Report',
    'author': 'Finance Department',
    'subject': 'Q4 2023 Financial Results',
    'keywords': 'finance, report, quarterly, 2023',
    'creator': 'xhtml2pdf Financial System',
    'producer': 'Company Document Generator',
    'creation_date': '2023-12-31',
    'modification_date': '2023-12-31',
    'trapped': False,
    'pdf_version': '1.4'
}

# Apply metadata during conversion
# (html_content and dest are assumed to be defined as in the earlier examples)
result = pisa.pisaDocument(
    html_content,
    dest,
    context_meta=metadata_config
)
```
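
The date fields above are written as fixed strings. If they should track the day of generation, they can be filled in programmatically. The sketch below is an illustration under stated assumptions: `build_metadata` is a hypothetical helper, and it simply reuses the key names and the `YYYY-MM-DD` date format shown in the configuration above.

```python
from datetime import date


def build_metadata(title, author, *, keywords=(), today=None):
    """Return a metadata dict with creation and modification dates filled in."""
    stamp = (today or date.today()).isoformat()  # YYYY-MM-DD
    return {
        "title": title,
        "author": author,
        "keywords": ", ".join(keywords),
        "creation_date": stamp,
        "modification_date": stamp,
    }


# `today` is pinned here only to make the result reproducible
metadata_config = build_metadata(
    "Annual Financial Report",
    "Finance Department",
    keywords=["finance", "report"],
    today=date(2023, 12, 31),
)
```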

### PDF/A Compliance

Support for PDF/A standard compliance for long-term document archival and accessibility requirements.

```python
# PDF/A compliance configuration
pdfa_config = {
    'pdf_version': '1.4',
    'color_profile': 'sRGB',
    'embed_fonts': True,
    'compress_images': False,
    'metadata_xmp': True,
    'accessibility': True
}

# Generate PDF/A compliant document
# (pisa, html_content, and dest are assumed to be defined as in the earlier examples)
result = pisa.pisaDocument(
    html_content,
    dest,
    pdfa_compliance=pdfa_config
)
```

## Error Handling

PDF manipulation and encrypted conversion can fail with file-system, configuration, or missing-dependency errors:

```python
from xhtml2pdf.pdf import pisaPDF
from xhtml2pdf import pisa

try:
    # PDF manipulation
    pdf_handler = pisaPDF()
    pdf_handler.addFromURI("nonexistent.pdf")

except FileNotFoundError:
    print("PDF file not found")
except PermissionError:
    print("Insufficient permissions to access PDF")
except Exception as e:
    print(f"PDF processing error: {e}")

# Placeholder inputs for the encrypted conversion below
html = "<html><body><p>Confidential</p></body></html>"
encrypt_config = {'userPassword': 'view123', 'ownerPassword': 'admin456'}

try:
    # Encrypted PDF generation
    with open("secure_document.pdf", "wb") as dest:
        result = pisa.pisaDocument(html, dest, encrypt=encrypt_config)

except ValueError:
    print("Invalid encryption configuration")
except ImportError:
    print("Encryption libraries not available")
```
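
Exceptions are only one failure signal: the encryption example earlier also checks `result.err` after conversion. A sketch that folds both checks into one place follows. `run_conversion` is a hypothetical helper; it takes the conversion callable as an argument (for example `pisa.pisaDocument`), so it assumes nothing beyond the `err` attribute used in that example.

```python
def run_conversion(convert, *args, **kwargs):
    """Call `convert` and return (ok, message).

    `convert` is any callable returning an object with an `err` attribute,
    where a falsy `err` means success (as in the encryption example).
    """
    try:
        result = convert(*args, **kwargs)
    except (ValueError, ImportError, OSError) as exc:
        # OSError also covers FileNotFoundError and PermissionError
        return False, f"{type(exc).__name__}: {exc}"
    if result.err:
        return False, f"conversion reported errors (err={result.err})"
    return True, "ok"
```

A call such as `run_conversion(pisa.pisaDocument, html, dest, encrypt=encrypt_config)` would then return a single `(ok, message)` pair to log or display.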

## Types

```python { .api }
class pisaPDF:
    """
    PDF document handler for joining and manipulating PDF files.

    Attributes:
        capacity (int): Memory capacity limit
        documents (list): List of added PDF documents
    """

class PDFSignature:
    """
    PDF digital signature handler for document authentication.

    Provides static methods for applying cryptographic signatures
    to PDF documents for legal compliance and integrity verification.
    """

class WaterMarks:
    """
    PDF watermark processing for background elements and overlays.

    Handles watermark positioning, sizing, and application
    with support for image and text-based watermarks.
    """
```