# Types and Objects

Low-level PDF object types for advanced manipulation, together with the constants, type definitions, and utility functions used throughout the PyPDF2 library. These components are the foundation for operations at the level of the PDF specification.

## Capabilities

### Generic PDF Objects

Base classes and data structures that represent PDF objects as defined by the PDF specification.

```python { .api }
class PdfObject:
    """Base class for all PDF objects."""

class NullObject(PdfObject):
    """PDF null object representation."""

class BooleanObject(PdfObject):
    """PDF boolean object (true/false)."""

class IndirectObject(PdfObject):
    """PDF indirect object reference."""

    @property
    def idnum(self) -> int:
        """Object ID number."""

    @property
    def generation(self) -> int:
        """Object generation number."""

    @property
    def pdf(self):
        """Associated PDF reader."""

class FloatObject(float, PdfObject):
    """PDF floating-point number object."""

class NumberObject(int, PdfObject):
    """PDF integer number object."""

class ByteStringObject(bytes, PdfObject):
    """PDF byte string object."""

class TextStringObject(str, PdfObject):
    """PDF text string object."""

class NameObject(str, PdfObject):
    """PDF name object (starts with /)."""
```

### Data Structure Objects

Collections and containers for PDF data structures.

```python { .api }
class ArrayObject(list, PdfObject):
    """PDF array object (list-like)."""

class DictionaryObject(dict, PdfObject):
    """PDF dictionary object (dict-like)."""

class TreeObject(DictionaryObject):
    """PDF tree structure for hierarchical data."""

class StreamObject(DictionaryObject):
    """PDF stream object: a dictionary plus binary data."""

class DecodedStreamObject(StreamObject):
    """Decoded (uncompressed) PDF stream."""

class EncodedStreamObject(StreamObject):
    """Encoded (compressed) PDF stream."""

class ContentStream(DecodedStreamObject):
    """PDF content stream with page content operations."""

class Field(TreeObject):
    """PDF form field object."""
```

### Navigation and Annotation Objects

Objects for document navigation, bookmarks, and annotations.

```python { .api }
class Destination(TreeObject):
    """PDF destination for navigation."""

    @property
    def title(self) -> Optional[str]:
        """Destination title."""

    @property
    def page(self):
        """Target page reference."""

    @property
    def typ(self) -> str:
        """Destination type (fit type)."""

class OutlineItem(Destination):
    """PDF outline item (bookmark)."""

    @property
    def title(self) -> Optional[str]:
        """Bookmark title."""

    @property
    def page(self):
        """Target page reference."""

    @property
    def parent(self):
        """Parent outline item."""

    @property
    def children(self):
        """Child outline items."""

class Bookmark(OutlineItem):
    """DEPRECATED: Use OutlineItem instead."""

class AnnotationBuilder:
    """Builder for creating PDF annotations."""

    # Methods for building various annotation types
    # Implementation depends on annotation type
```

### Utility Objects and Functions

Helper classes and functions for PDF manipulation.

```python { .api }
class PageRange:
    """Slice-like representation of page ranges."""

    def __init__(self, arg: Union[slice, "PageRange", str]):
        """
        Create a PageRange from various input types.

        Args:
            arg: Range specification (string, slice, or PageRange)
        """

    def to_slice(self) -> slice:
        """Convert to Python slice object."""

    def indices(self, n: int) -> Tuple[int, int, int]:
        """
        Get slice indices for given length.

        Args:
            n (int): Total length

        Returns:
            tuple: (start, stop, step) indices
        """

    @staticmethod
    def valid(input: Any) -> bool:
        """
        Check if input is valid for PageRange.

        Args:
            input: Input to validate

        Returns:
            bool: True if valid
        """

class PaperSize:
    """Standard paper size constants (ISO sizes rounded to whole points)."""

    A0: 'Dimensions'  # 2384 x 3370 points
    A1: 'Dimensions'  # 1684 x 2384 points
    A2: 'Dimensions'  # 1191 x 1684 points
    A3: 'Dimensions'  # 842 x 1191 points
    A4: 'Dimensions'  # 595 x 842 points
    A5: 'Dimensions'  # 420 x 595 points
    A6: 'Dimensions'  # 298 x 420 points
    A7: 'Dimensions'  # 210 x 298 points
    A8: 'Dimensions'  # 147 x 210 points
    C4: 'Dimensions'  # 649 x 918 points (envelope)

class PasswordType:
    """Enumeration for password validation results."""

    NOT_DECRYPTED: int = 0
    USER_PASSWORD: int = 1
    OWNER_PASSWORD: int = 2

# Utility functions
def create_string_object(string: Union[str, bytes], forced_encoding=None) -> Union[TextStringObject, ByteStringObject]:
    """
    Create appropriate string object based on content.

    Args:
        string (str or bytes): String content
        forced_encoding (str, optional): Force specific encoding

    Returns:
        Union[TextStringObject, ByteStringObject]: Appropriate string object
    """

def encode_pdfdocencoding(unicode_string: str) -> bytes:
    """
    Encode string using PDF document encoding.

    Args:
        unicode_string (str): Unicode string to encode

    Returns:
        bytes: Encoded bytes
    """

def decode_pdfdocencoding(byte_string: bytes) -> str:
    """
    Decode bytes using PDF document encoding.

    Args:
        byte_string (bytes): Bytes to decode

    Returns:
        str: Decoded string
    """

def hex_to_rgb(color: str) -> Tuple[float, float, float]:
    """
    Convert hex color to RGB tuple.

    Args:
        color (str): Hex color string (e.g., "#FF0000")

    Returns:
        tuple: (red, green, blue) values 0.0-1.0
    """

def read_object(stream, pdf) -> PdfObject:
    """
    Read a PDF object from stream.

    Args:
        stream: Input stream
        pdf: PDF reader reference

    Returns:
        PdfObject: Parsed PDF object
    """

def parse_filename_page_ranges(args: List[Union[str, PageRange, None]]) -> List[Tuple[str, PageRange]]:
    """
    Parse filename and page range arguments.

    Args:
        args: Command-line style arguments

    Returns:
        list: List of (filename, page_range) tuples
    """
```

### Type Definitions

Type aliases and definitions used throughout the library.

```python { .api }
# Border array for annotations
BorderArrayType = List[Union[NameObject, NumberObject, ArrayObject]]

# Outline item types
OutlineItemType = Union[OutlineItem, Destination]

# PDF fit types for destinations
FitType = Literal["/Fit", "/XYZ", "/FitH", "/FitV", "/FitR", "/FitB", "/FitBH", "/FitBV"]

# Zoom argument types
ZoomArgType = Union[NumberObject, NullObject, float]
ZoomArgsType = List[ZoomArgType]

# Complex outline structure type
OutlineType = List[Union[OutlineItemType, List]]

# Page layout types
LayoutType = Literal[
    "/SinglePage", "/OneColumn", "/TwoColumnLeft", "/TwoColumnRight",
    "/TwoPageLeft", "/TwoPageRight"
]

# Page mode types
PagemodeType = Literal[
    "/UseNone", "/UseOutlines", "/UseThumbs", "/FullScreen",
    "/UseOC", "/UseAttachments"
]

# Page range specification types
PageRangeSpec = Union[str, PageRange, Tuple[int, int], Tuple[int, int, int], List[int]]

# Dimension type for paper sizes
class Dimensions:
    """Represents paper dimensions."""

    def __init__(self, width: float, height: float):
        """
        Create dimensions.

        Args:
            width (float): Width in points
            height (float): Height in points
        """
        self.width = width
        self.height = height
```
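
The `Literal` aliases above only constrain static type checkers; nothing is enforced at runtime. As a minimal sketch of a runtime check, the block below redefines `FitType` locally so it runs without PyPDF2 installed. The helper `is_fit_type` is hypothetical and not part of the library.

```python
from typing import Literal, get_args

# Local copy of the alias listed above (assumption: kept in sync by hand).
FitType = Literal["/Fit", "/XYZ", "/FitH", "/FitV", "/FitR", "/FitB", "/FitBH", "/FitBV"]

def is_fit_type(value: str) -> bool:
    """Hypothetical helper: True if value is one of the FitType literals."""
    return value in get_args(FitType)

print(is_fit_type("/FitH"))     # True
print(is_fit_type("/FitPage"))  # False
```

The same pattern would apply to `LayoutType` and `PagemodeType`, since `get_args` returns the literal values of any `Literal` alias.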

## Usage Examples

### Working with Generic Objects

```python
from PyPDF2 import PdfReader
from PyPDF2.generic import DictionaryObject, ArrayObject, NameObject

reader = PdfReader("document.pdf")

# Access raw PDF objects
for page in reader.pages:
    # Pages are DictionaryObject instances
    if isinstance(page, DictionaryObject):
        # Access dictionary entries
        mediabox = page.get("/MediaBox")
        if isinstance(mediabox, ArrayObject):
            print(f"MediaBox: {[float(x) for x in mediabox]}")

        # Check for resources
        resources = page.get("/Resources")
        if resources:
            fonts = resources.get("/Font", {})
            print(f"Fonts: {list(fonts.keys())}")
```

### Using Page Ranges

```python
from PyPDF2 import PdfMerger, PageRange

merger = PdfMerger()

# Page ranges use Python slice syntax with zero-based page indices
merger.append("doc1.pdf", pages=PageRange("1:5"))    # Indices 1-4 (2nd to 5th page)
merger.append("doc2.pdf", pages=PageRange("::2"))    # Every other page, starting with the first
merger.append("doc3.pdf", pages=PageRange("10:"))    # Index 10 (11th page) to the end
merger.append("doc4.pdf", pages=PageRange("1:6:2"))  # Indices 1, 3, 5 (PageRange does not accept a list)

# Validate page range
if PageRange.valid("1:10"):
    print("Valid page range")

merger.write("output.pdf")
merger.close()
```
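
Because `PageRange` is described above as slice-like, the built-in `slice` type can be used to preview which zero-based page indices a range expression would select. This is a sketch using only the standard library; `selected_indices` is a hypothetical helper and the 12-page document length is an assumption.

```python
def selected_indices(spec: slice, page_count: int) -> list:
    """Hypothetical helper: zero-based page indices that a slice selects."""
    return list(range(*spec.indices(page_count)))

page_count = 12  # assumed document length

print(selected_indices(slice(1, 5), page_count))           # "1:5"   -> [1, 2, 3, 4]
print(selected_indices(slice(None, None, 2), page_count))  # "::2"   -> [0, 2, 4, 6, 8, 10]
print(selected_indices(slice(10, None), page_count))       # "10:"   -> [10, 11]
print(selected_indices(slice(1, 6, 2), page_count))        # "1:6:2" -> [1, 3, 5]
```

`slice.indices` clamps the bounds to the given length, which is the same job the `PageRange.indices(n)` signature listed earlier describes.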

### Working with Paper Sizes

```python
# PaperSize is exported from the top-level package, not from PyPDF2.generic
from PyPDF2 import PdfWriter, PaperSize

writer = PdfWriter()

# Create pages with standard sizes
a4_page = writer.add_blank_page(PaperSize.A4.width, PaperSize.A4.height)
letter_page = writer.add_blank_page(612, 792)  # US Letter
a3_page = writer.add_blank_page(PaperSize.A3.width, PaperSize.A3.height)

print(f"A4 size: {PaperSize.A4.width} x {PaperSize.A4.height} points")
print(f"A3 size: {PaperSize.A3.width} x {PaperSize.A3.height} points")

with open("standard_sizes.pdf", "wb") as output_file:
    writer.write(output_file)
```
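
Page dimensions are given in PDF points, where one point is 1/72 inch. For sizes not covered by `PaperSize`, the conversion from millimetres can be sketched as follows. `mm_to_points` is a hypothetical helper, and rounding to whole points is an assumption made here for illustration.

```python
POINTS_PER_INCH = 72
MM_PER_INCH = 25.4

def mm_to_points(mm: float) -> int:
    """Hypothetical helper: convert millimetres to whole PDF points."""
    return round(mm * POINTS_PER_INCH / MM_PER_INCH)

# ISO A4 is 210 mm x 297 mm
a4 = (mm_to_points(210), mm_to_points(297))
print(a4)  # (595, 842)
```

The resulting pair can be passed as the width and height arguments of `add_blank_page`.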

### Creating Custom PDF Objects

```python
from PyPDF2.generic import (
    DictionaryObject, ArrayObject, NameObject,
    TextStringObject, NumberObject
)

# Create a custom dictionary object
# (the PDF specification names the annotation type /Annot)
custom_dict = DictionaryObject({
    NameObject("/Type"): NameObject("/Annot"),
    NameObject("/Subtype"): NameObject("/Text"),
    NameObject("/Contents"): TextStringObject("Custom note"),
    NameObject("/Rect"): ArrayObject([
        NumberObject(100), NumberObject(100),
        NumberObject(200), NumberObject(150)
    ])
})

print(f"Custom object: {custom_dict}")
```

### String Encoding Utilities

```python
from PyPDF2.generic import (
    create_string_object, encode_pdfdocencoding,
    decode_pdfdocencoding, hex_to_rgb
)

# Create appropriate string objects
text = create_string_object("Hello, World!")
# Pass raw bytes (not an escaped str) when forcing an encoding
binary_text = create_string_object(b"\x00\xff\x42", "latin-1")

# Encoding/decoding
unicode_text = "Héllo, Wørld!"
encoded = encode_pdfdocencoding(unicode_text)
decoded = decode_pdfdocencoding(encoded)

print(f"Original: {unicode_text}")
print(f"Decoded: {decoded}")

# Color conversion
red_rgb = hex_to_rgb("#FF0000")   # (1.0, 0.0, 0.0)
blue_rgb = hex_to_rgb("#0000FF")  # (0.0, 0.0, 1.0)
print(f"Red RGB: {red_rgb}")
print(f"Blue RGB: {blue_rgb}")
```

### Working with Outlines and Destinations

```python
from PyPDF2 import PdfReader
from PyPDF2.generic import Destination

reader = PdfReader("document.pdf")

# Access document outline
outline = reader.outline
if outline:
    def print_outline(items, level=0):
        for item in items:
            if isinstance(item, list):
                # A nested list holds the children of the preceding item
                print_outline(item, level + 1)
            else:
                # Any other entry is an outline item or destination with a title
                indent = "  " * level
                print(f"{indent}{item.title}")

    print_outline(outline)

# Access named destinations
destinations = reader.named_destinations
for name, dest in destinations.items():
    if isinstance(dest, Destination):
        print(f"Destination '{name}' -> Page {dest.page}, Type: {dest.typ}")
```
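
The `OutlineType` alias describes the outline as a list whose entries are either items or nested lists. A minimal sketch of walking such a structure, assuming a nested list holds the children of the entry before it, is shown below. Plain strings stand in for real outline objects so the block runs without a PDF file, and `flatten_outline` is a hypothetical helper.

```python
def flatten_outline(items, level=0):
    """Yield (level, entry) pairs from a nested outline-style list."""
    for item in items:
        if isinstance(item, list):
            # A sub-list is treated as the children of the preceding entry.
            yield from flatten_outline(item, level + 1)
        else:
            yield level, item

# Stand-in data: plain strings instead of real outline objects.
sample = ["Chapter 1", ["Section 1.1", "Section 1.2"], "Chapter 2"]
for level, title in flatten_outline(sample):
    print("  " * level + title)
```

Returning `(level, entry)` pairs rather than printing directly keeps the traversal reusable, for example to build a table of contents.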

### Password Type Checking

```python
from PyPDF2 import PdfReader, PasswordType

reader = PdfReader("encrypted.pdf")

if reader.is_encrypted:
    # Try different password types
    result = reader.decrypt("user_password")

    if result == PasswordType.USER_PASSWORD:
        print("Opened with user password - some restrictions may apply")
    elif result == PasswordType.OWNER_PASSWORD:
        print("Opened with owner password - full access")
    elif result == PasswordType.NOT_DECRYPTED:
        print("Password incorrect or file corrupted")
```

## Constants and Enumerations

PyPDF2 collects constants from the PDF specification in its `constants` module:

### Key Constants

```python { .api }
# Core PDF constants
class Core:
    OUTLINES = "/Outlines"
    THREADS = "/Threads"
    PAGE = "/Page"
    PAGES = "/Pages"
    CATALOG = "/Catalog"

# User access permissions
class UserAccessPermissions:
    PRINT = 1 << 2
    MODIFY = 1 << 3
    COPY = 1 << 4
    ADD_OR_MODIFY = 1 << 5

# PDF filter types
class FilterTypes:
    FLATE_DECODE = "/FlateDecode"
    LZW_DECODE = "/LZWDecode"
    ASCII_HEX_DECODE = "/ASCIIHexDecode"
    DCT_DECODE = "/DCTDecode"
```

These constants mirror names and values defined in the PDF specification, so dictionary keys, filter names, and permission bits can be referenced by name instead of as string or integer literals.
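
The permission values listed above are single bits, so several permissions can be combined in one integer and tested with a bitwise AND. The sketch below uses a local `IntFlag` stand-in that copies only those four values; the class `Permissions` and the helper `allowed` are hypothetical and are not the library's own definitions.

```python
from enum import IntFlag

class Permissions(IntFlag):
    """Local stand-in mirroring the four bit values listed above."""
    PRINT = 1 << 2
    MODIFY = 1 << 3
    COPY = 1 << 4
    ADD_OR_MODIFY = 1 << 5

def allowed(flags: int) -> list:
    """Hypothetical helper: names of the permissions set in flags."""
    return [p.name for p in Permissions if flags & p]

print(allowed(0b010100))  # ['PRINT', 'COPY']
```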