# pikepdf

A Python library for reading, writing, and manipulating PDF files, built on the mature qpdf C++ library. It provides a Pythonic API for page manipulation, metadata editing, form field handling, encryption and decryption, and content transformation. Because the heavy lifting happens in C++, it is generally faster than pure-Python alternatives.

## Package Information

- **Package Name**: pikepdf
- **Language**: Python
- **Installation**: `pip install pikepdf`

## Core Imports

```python
import pikepdf
```

The `Pdf` class is the most common direct import:

```python
from pikepdf import Pdf
```

## Basic Usage

```python
import pikepdf

# Open an existing PDF
pdf = pikepdf.open('input.pdf')

# Equivalent: use the Pdf class directly
# pdf = pikepdf.Pdf.open('input.pdf')

# Create a new empty PDF
new_pdf = pikepdf.new()

# Add a blank page
new_pdf.add_blank_page(page_size=(612, 792))  # Letter size, in PDF points

# Access pages (zero-based index)
first_page = pdf.pages[0]

# Rotate a page by 90 degrees, added to its current rotation
first_page.rotate(90, relative=True)

# Copy pages between PDFs
new_pdf.pages.append(first_page)

# Save the PDFs
pdf.save('output.pdf')
new_pdf.save('new_document.pdf')

# Always close PDFs when done
pdf.close()
new_pdf.close()
```

## Architecture

pikepdf is organized in layers that offer both low-level control and high-level convenience:

- **Core Layer (`_core`)**: C++ bindings to the qpdf library that provide the fundamental PDF operations
- **Object Layer**: Python wrappers for PDF data types (Array, Dictionary, Name, String, Stream)
- **Model Layer**: High-level abstractions for complex operations (Image, Metadata, Outlines, Encryption)
- **Helper Layer**: Utility functions and convenience methods for common operations

This design lets pikepdf handle PDF versions 1.1 through 1.7, maintain compatibility with PDF/A standards, and keep performance-critical work in C++.

## Capabilities

### Core PDF Operations

Opening, creating, saving, and closing PDF documents, plus basic manipulation of their structure.

```python { .api }
class Pdf:
    @staticmethod
    def open(filename, *, password='', hex_password=False, ignore_xref_streams=False,
             suppress_warnings=True, attempt_recovery=True, inherit_page_attributes=True,
             access_mode=AccessMode.default) -> Pdf: ...

    @staticmethod
    def new() -> Pdf: ...

    def save(self, filename, *, static_id=False, preserve_pdfa=True,
             min_version=None, force_version=None, fix_metadata_version=True,
             compress_streams=True, stream_decode_level=None,
             object_stream_mode=ObjectStreamMode.preserve,
             normalize_content=False, linearize=False, qdf=False,
             progress=None, encryption=None, samefile_check=True) -> None: ...

    def close(self) -> None: ...

def open(filename, **kwargs) -> Pdf: ...  # Alias for Pdf.open()
def new() -> Pdf: ...  # Alias for Pdf.new()
```

[Core PDF Operations](./core-operations.md)

### PDF Objects and Data Types

Object types for manipulating the internal representation of PDF content: arrays, dictionaries, names, strings, and streams.

```python { .api }
class Object:
    def is_owned_by(self, possible_owner: Pdf) -> bool: ...
    def same_owner_as(self, other: Object) -> bool: ...
    def with_same_owner_as(self, other: Object) -> Object: ...
    @staticmethod
    def parse(data: str, *, pdf_context: Pdf = None) -> Object: ...
    def unparse(self, *, resolved: bool = False) -> bytes: ...

class Array(Object): ...
class Dictionary(Object): ...
class Name(Object): ...
class String(Object): ...
class Stream(Object): ...
```

[PDF Objects and Data Types](./objects.md)

### Page Operations

Page-level operations including manipulation, rotation, content parsing, overlays, and coordinate transformations.

```python { .api }
class Page(Object):
    def rotate(self, angle: int, *, relative: bool = True) -> None: ...
    def add_overlay(self, other: Page) -> None: ...
    def add_underlay(self, other: Page) -> None: ...
    def parse_contents(self) -> list[ContentStreamInstruction]: ...
    @property
    def mediabox(self) -> Rectangle: ...
    @property
    def cropbox(self) -> Rectangle: ...
```

[Page Operations](./pages.md)

### Forms and Annotations

Interactive PDF elements: form fields and annotations, with support for the common field types.

```python { .api }
class AcroForm:
    @property
    def exists(self) -> bool: ...
    @property
    def fields(self) -> list[AcroFormField]: ...
    def add_field(self, field: AcroFormField) -> None: ...
    def remove_fields(self, names: list[str]) -> None: ...

class AcroFormField:
    @property
    def field_type(self) -> str: ...
    @property
    def fully_qualified_name(self) -> str: ...
    def set_value(self, value) -> None: ...

class Annotation(Object):
    @property
    def subtype(self) -> Name: ...
    @property
    def rect(self) -> Rectangle: ...
```

[Forms and Annotations](./forms.md)

### Images and Graphics

Image extraction and inspection, with support for various formats and color spaces.

```python { .api }
class PdfImage:
    def extract_to(self, *, fileprefix: str = 'image') -> str: ...
    def as_pil_image(self) -> Any: ...  # PIL.Image
    @property
    def width(self) -> int: ...
    @property
    def height(self) -> int: ...
    @property
    def bpc(self) -> int: ...  # bits per component
    @property
    def colorspace(self) -> Name: ...

class PdfInlineImage:
    def as_pil_image(self) -> Any: ...  # PIL.Image
```

[Images and Graphics](./images.md)

### Encryption and Security

PDF encryption, decryption, password handling, and permission management.

```python { .api }
class Encryption:
    def __init__(self, *, owner: str = '', user: str = '', R: int = 6,
                 allow: Permissions = None, aes: bool = True,
                 metadata: bool = True) -> None: ...

class Permissions:
    accessibility: bool
    extract: bool
    modify_annotation: bool
    modify_assembly: bool
    modify_form: bool
    modify_other: bool
    print_lowres: bool
    print_highres: bool
```

[Encryption and Security](./encryption.md)

### Metadata and Document Properties

Document metadata, XMP data, and PDF properties such as titles, authors, creation dates, and custom metadata fields.

```python { .api }
class PdfMetadata:
    def __init__(self, pdf: Pdf, *, sync_docinfo: bool = True) -> None: ...
    @property
    def pdfa_status(self) -> str: ...
    def load_from_docinfo(self, docinfo: Dictionary, *, delete_missing: bool = False) -> None: ...
```

[Metadata and Document Properties](./metadata.md)

### Outlines and Bookmarks

Document navigation structure: bookmarks, table of contents, and outline management.

```python { .api }
class Outline:
    @property
    def root(self) -> list[OutlineItem]: ...
    def open_all(self) -> None: ...
    def close_all(self) -> None: ...

class OutlineItem:
    @property
    def title(self) -> str: ...
    @property
    def destination(self) -> PageLocation: ...
    @property
    def action(self) -> Dictionary: ...

def make_page_destination(pdf: Pdf, page_num: int, *, view_type: str = 'Fit') -> Array: ...
```

[Outlines and Bookmarks](./outlines.md)

### Content Stream Processing

Low-level content stream parsing, token filtering, and PDF operator manipulation.

```python { .api }
def parse_content_stream(page_or_stream) -> list[ContentStreamInstruction]: ...
def unparse_content_stream(instructions: list[ContentStreamInstruction]) -> bytes: ...

class ContentStreamInstruction:
    @property
    def operands(self) -> list[Object]: ...
    @property
    def operator(self) -> Operator: ...

class TokenFilter:
    def handle_token(self, token: Token) -> None: ...

class Token:
    @property
    def type_(self) -> TokenType: ...
    @property
    def raw_value(self) -> bytes: ...
    @property
    def value(self) -> Object: ...
```

[Content Stream Processing](./content-streams.md)

### File Attachments

Embedded file management: attaching files, extracting them, and reading their metadata, for portfolio PDFs and ordinary attachments.

```python { .api }
class AttachedFileSpec:
    @staticmethod
    def from_filepath(pdf: Pdf, path: str, *, description: str = '',
                      relationship: str = '/Unspecified') -> AttachedFileSpec: ...
    def get_file(self) -> AttachedFile: ...  # call .read_bytes() for the data
    def get_all_filenames(self) -> dict[str, str]: ...
    @property
    def filename(self) -> str: ...
    @property
    def description(self) -> str: ...
```

[File Attachments](./attachments.md)

### Advanced Operations

Specialized operations: matrix transformations, coordinate systems, the job interface, and tree structures.

```python { .api }
class Matrix:
    def __init__(self, *args) -> None: ...
    @staticmethod
    def identity() -> Matrix: ...
    def translated(self, dx: float, dy: float) -> Matrix: ...
    def scaled(self, sx: float, sy: float) -> Matrix: ...
    def rotated(self, angle_degrees: float) -> Matrix: ...

class Rectangle:
    def __init__(self, llx: float, lly: float, urx: float, ury: float) -> None: ...
    @property
    def width(self) -> float: ...
    @property
    def height(self) -> float: ...

class Job:
    def run(self) -> int: ...
    def check_configuration(self) -> bool: ...
    def create_pdf(self) -> Pdf: ...
```

[Advanced Operations](./advanced.md)

## Types

```python { .api }
from enum import Enum

class ObjectType(Enum):
    uninitialized = ...
    null = ...
    boolean = ...
    integer = ...
    real = ...
    string = ...
    name_ = ...
    array = ...
    dictionary = ...
    stream = ...
    operator = ...
    inlineimage = ...

class AccessMode(Enum):
    default = ...
    mmap = ...
    mmap_only = ...
    stream = ...

class StreamDecodeLevel(Enum):
    none = ...
    generalized = ...
    specialized = ...
    all = ...

class ObjectStreamMode(Enum):
    disable = ...
    preserve = ...
    generate = ...
```

## Exception Hierarchy

```python { .api }
# Core exceptions
class PdfError(Exception): ...
class PasswordError(PdfError): ...
class DataDecodingError(PdfError): ...
class JobUsageError(PdfError): ...
class ForeignObjectError(PdfError): ...
class DeletedObjectError(PdfError): ...

# Model exceptions
class DependencyError(Exception): ...
class OutlineStructureError(Exception): ...
class UnsupportedImageTypeError(Exception): ...
class InvalidPdfImageError(Exception): ...
class HifiPrintImageNotTranscodableError(Exception): ...
```

## Models Module

Higher-level PDF constructs and specialized functionality are available through the `models` submodule.

```python { .api }
import pikepdf.models

# Direct access to model classes and functions:
# pikepdf.models.PdfMetadata
# pikepdf.models.EncryptionInfo
# pikepdf.models.ContentStreamInstructions
# pikepdf.models.UnparseableContentStreamInstructions

# All model classes are also available directly from main pikepdf module
```

## Settings and Configuration

```python { .api }
def get_decimal_precision() -> int: ...
def set_decimal_precision(precision: int) -> None: ...
def set_flate_compression_level(level: int) -> None: ...
```

## Version Information

```python { .api }
__version__: str  # pikepdf package version
__libqpdf_version__: str  # Underlying qpdf library version
```