Tessl Tile for pypi/pypdf2@2.12.0

# Types and Objects

Low-level PDF object types for advanced manipulation, constants, type definitions, and utility functions used throughout the PyPDF2 library. These components provide the foundation for PDF specification-level operations.

## Capabilities

### Generic PDF Objects

Base classes and data structures that represent PDF objects according to the PDF specification.

```python { .api }
class PdfObject:
    """Base class for all PDF objects."""

class NullObject(PdfObject):
    """PDF null object representation."""

class BooleanObject(PdfObject):
    """PDF boolean object (true/false)."""

class IndirectObject(PdfObject):
    """PDF indirect object reference."""

    @property
    def idnum(self) -> int:
        """Object ID number."""

    @property
    def generation(self) -> int:
        """Object generation number."""

    @property
    def pdf(self):
        """Associated PDF reader."""

class FloatObject(float, PdfObject):
    """PDF floating-point number object."""

class NumberObject(int, PdfObject):
    """PDF integer number object."""

class ByteStringObject(bytes, PdfObject):
    """PDF byte string object."""

class TextStringObject(str, PdfObject):
    """PDF text string object."""

class NameObject(str, PdfObject):
    """PDF name object (starts with /)."""
```
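
A `NameObject` appears in the file as `/` followed by the name's characters. The PDF specification requires bytes that are not "regular" characters (whitespace, delimiters, anything outside printable ASCII) and the `#` sign itself to be written as `#` plus two hex digits. The sketch below illustrates that rule with the standard library only; `encode_pdf_name` is a hypothetical helper, not a PyPDF2 API, and it assumes the name's bytes are UTF-8.

```python
# Hypothetical helper (not part of PyPDF2): escape a PDF name using the
# specification's "#hh" rule, assuming the name's bytes are UTF-8.
ALWAYS_ESCAPE = set(b"()<>[]{}/%#")  # delimiters plus the escape character itself


def encode_pdf_name(name: str) -> str:
    """Return the name as '/...' with irregular bytes written as #hh."""
    body = name[1:] if name.startswith("/") else name
    parts = []
    for byte in body.encode("utf-8"):
        # Regular characters: printable ASCII 0x21-0x7E that are not delimiters or '#'
        if 0x21 <= byte <= 0x7E and byte not in ALWAYS_ESCAPE:
            parts.append(chr(byte))
        else:
            parts.append(f"#{byte:02X}")
    return "/" + "".join(parts)


print(encode_pdf_name("Type"))         # /Type
print(encode_pdf_name("/Lime Green"))  # /Lime#20Green
```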

### Data Structure Objects

Collections and containers for PDF data structures.

```python { .api }
class ArrayObject(list, PdfObject):
    """PDF array object (list-like)."""

class DictionaryObject(dict, PdfObject):
    """PDF dictionary object (dict-like)."""

class TreeObject(DictionaryObject):
    """PDF tree structure for hierarchical data."""

class StreamObject(DictionaryObject):
    """PDF stream object: a stream dictionary plus binary data."""

class DecodedStreamObject(StreamObject):
    """Decoded (uncompressed) PDF stream."""

class EncodedStreamObject(StreamObject):
    """Encoded (compressed) PDF stream."""

class ContentStream(DecodedStreamObject):
    """PDF content stream with page content operations."""

class Field(TreeObject):
    """PDF form field object."""
```
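
The difference between an `EncodedStreamObject` and a `DecodedStreamObject` is whether the stream's `/Filter` has been undone. For the most common filter, `/FlateDecode`, the data is zlib/deflate-compressed, so the round trip can be sketched with the standard library. The content-stream bytes are a made-up example, and real streams may chain several filters or use predictors, which this sketch ignores.

```python
import zlib

# A made-up page content stream: select font F1 at 12 pt and draw "Hello"
decoded_data = b"BT /F1 12 Tf 72 720 Td (Hello) Tj ET"

# /FlateDecode data is zlib/deflate-compressed, so this is what an
# encoded stream would carry in the file
encoded_data = zlib.compress(decoded_data)

# The stream dictionary names the filter and the length of the encoded bytes
stream_dict = {"/Filter": "/FlateDecode", "/Length": len(encoded_data)}

# Decoding reverses the filter and recovers the content-stream operators
assert zlib.decompress(encoded_data) == decoded_data
print(stream_dict)
```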

### Navigation and Annotation Objects

Objects for document navigation, bookmarks (outline items), and annotations.

```python { .api }
class Destination(DictionaryObject):
    """PDF destination for navigation."""

    @property
    def title(self) -> Optional[str]:
        """Destination title."""

    @property
    def page(self):
        """Target page reference."""

    @property
    def typ(self) -> str:
        """Destination type (fit type)."""

class OutlineItem(DictionaryObject):
    """PDF outline item (bookmark)."""

    @property
    def title(self) -> Optional[str]:
        """Bookmark title."""

    @property
    def page(self):
        """Target page reference."""

    @property
    def parent(self):
        """Parent outline item."""

    @property
    def children(self):
        """Child outline items."""

class Bookmark(OutlineItem):
    """DEPRECATED: Use OutlineItem instead."""

class AnnotationBuilder:
    """Builder for creating PDF annotations."""

    # Static factory methods (for example text, free_text, line, link) each
    # build one annotation type and return a DictionaryObject that can be
    # attached to a page with PdfWriter.add_annotation().
```
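
An explicit destination is an array holding a page reference, a fit type (the `typ` of a `Destination`), and that fit type's numeric parameters, and the parameter count differs per fit type. The stdlib-only sketch below encodes those counts from the PDF specification and validates them; `FIT_PARAMS` and `make_destination` are hypothetical names, and a plain integer stands in for the page reference.

```python
from typing import Any, Dict, List, Tuple

# Parameters each fit type takes, in order (PDF specification, explicit destinations)
FIT_PARAMS: Dict[str, Tuple[str, ...]] = {
    "/Fit": (),
    "/XYZ": ("left", "top", "zoom"),
    "/FitH": ("top",),
    "/FitV": ("left",),
    "/FitR": ("left", "bottom", "right", "top"),
    "/FitB": (),
    "/FitBH": ("top",),
    "/FitBV": ("left",),
}


def make_destination(page: Any, fit: str, *params: Any) -> List[Any]:
    """Build [page, fit, *params], checking the parameter count for the fit type."""
    if fit not in FIT_PARAMS:
        raise ValueError(f"unknown fit type: {fit}")
    expected = FIT_PARAMS[fit]
    if len(params) != len(expected):
        names = ", ".join(expected) or "none"
        raise ValueError(f"{fit} expects {len(expected)} parameter(s) ({names}), got {len(params)}")
    return [page, fit, *params]


# None plays the role of PDF null ("keep the current value") for /XYZ
print(make_destination(0, "/XYZ", 72, 720, None))  # [0, '/XYZ', 72, 720, None]
print(make_destination(3, "/Fit"))                 # [3, '/Fit']
```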

### Utility Objects and Functions

Helper classes and functions for PDF manipulation.

```python { .api }
class PageRange:
    """Slice-like representation of a range of zero-based page indices."""

    def __init__(self, arg: Union[slice, "PageRange", str]):
        """
        Create a PageRange from various input types.

        Args:
            arg: Range specification (string such as "0:3", slice, or PageRange)
        """

    def to_slice(self) -> slice:
        """Convert to Python slice object."""

    def indices(self, n: int) -> Tuple[int, int, int]:
        """
        Get slice indices for a sequence of length n (like slice.indices).

        Args:
            n (int): Total length

        Returns:
            tuple: (start, stop, step) indices
        """

    @staticmethod
    def valid(input: Any) -> bool:
        """
        Check if input is valid for PageRange.

        Args:
            input: Input to validate

        Returns:
            bool: True if valid
        """

class PaperSize:
    """Standard paper sizes as (width, height) in points (1/72 inch), portrait."""

    A0: 'Dimensions'  # 2384 x 3370 points
    A1: 'Dimensions'  # 1684 x 2384 points
    A2: 'Dimensions'  # 1191 x 1684 points
    A3: 'Dimensions'  # 842 x 1191 points
    A4: 'Dimensions'  # 595 x 842 points
    A5: 'Dimensions'  # 420 x 595 points
    A6: 'Dimensions'  # 298 x 420 points
    A7: 'Dimensions'  # 210 x 298 points
    A8: 'Dimensions'  # 147 x 210 points
    C4: 'Dimensions'  # 649 x 918 points (envelope)

class PasswordType(IntEnum):
    """Result of PdfReader.decrypt(): which password, if any, matched."""

    NOT_DECRYPTED = 0
    USER_PASSWORD = 1
    OWNER_PASSWORD = 2

# Utility functions
def create_string_object(string: Union[str, bytes], forced_encoding=None) -> Union[TextStringObject, ByteStringObject]:
    """
    Create the appropriate string object for the given content.

    A str becomes a TextStringObject. Bytes are decoded to a
    TextStringObject when possible (UTF-16 with a byte-order mark,
    PDFDocEncoding, or forced_encoding); otherwise a ByteStringObject
    is returned.

    Args:
        string (str or bytes): String content
        forced_encoding (optional): Force a specific encoding

    Returns:
        Union[TextStringObject, ByteStringObject]: Appropriate string object
    """

def encode_pdfdocencoding(unicode_string: str) -> bytes:
    """
    Encode string using PDF document encoding.

    Args:
        unicode_string (str): Unicode string to encode

    Returns:
        bytes: Encoded bytes
    """

def decode_pdfdocencoding(byte_string: bytes) -> str:
    """
    Decode bytes using PDF document encoding.

    Args:
        byte_string (bytes): Bytes to decode

    Returns:
        str: Decoded string
    """

def hex_to_rgb(color: str) -> Tuple[float, float, float]:
    """
    Convert hex color to RGB tuple.

    Args:
        color (str): Hex color string (e.g., "#FF0000")

    Returns:
        tuple: (red, green, blue) values 0.0-1.0
    """

def read_object(stream, pdf) -> PdfObject:
    """
    Read a PDF object from stream.

    Args:
        stream: Input stream
        pdf: PDF reader reference

    Returns:
        PdfObject: Parsed PDF object
    """

def parse_filename_page_ranges(args: List[Union[str, PageRange, None]]) -> List[Tuple[str, PageRange]]:
    """
    Parse filename and page range arguments.

    Args:
        args: Command-line style list of filenames and page ranges; each
            page range applies to the filename before it, and a filename
            with no page range means all pages

    Returns:
        list: List of (filename, page_range) tuples
    """
```
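
`hex_to_rgb` maps each two-digit hex channel from the 0-255 range onto the 0.0-1.0 range that PDF color operators use. A stdlib-only sketch of that conversion follows; it is named `hex_to_rgb_sketch` because it is an illustration rather than PyPDF2's own code, and it assumes a six-digit `#RRGGBB` input.

```python
from typing import Tuple


def hex_to_rgb_sketch(color: str) -> Tuple[float, float, float]:
    """Convert '#RRGGBB' (leading '#' optional) to (r, g, b) floats in 0.0-1.0."""
    digits = color.lstrip("#")
    if len(digits) != 6:
        raise ValueError(f"expected #RRGGBB, got {color!r}")
    # Each channel is two hex digits (0-255), scaled down by 255
    r, g, b = (int(digits[i:i + 2], 16) / 255.0 for i in range(0, 6, 2))
    return (r, g, b)


print(hex_to_rgb_sketch("#FF0000"))  # (1.0, 0.0, 0.0)
print(hex_to_rgb_sketch("#0000FF"))  # (0.0, 0.0, 1.0)
```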

### Type Definitions

Type aliases and definitions used throughout the library.

```python { .api }
# Border array for annotations
BorderArrayType = List[Union[NameObject, NumberObject, ArrayObject]]

# Outline item types
OutlineItemType = Union[OutlineItem, Destination]

# PDF fit types for destinations
FitType = Literal["/Fit", "/XYZ", "/FitH", "/FitV", "/FitR", "/FitB", "/FitBH", "/FitBV"]

# Zoom argument types
ZoomArgType = Union[NumberObject, NullObject, float]
ZoomArgsType = List[ZoomArgType]

# Nested outline structure (a nested list holds child items)
OutlineType = List[Union[OutlineItemType, List]]

# Page layout types
LayoutType = Literal[
    "/SinglePage", "/OneColumn", "/TwoColumnLeft", "/TwoColumnRight",
    "/TwoPageLeft", "/TwoPageRight"
]

# Page mode types
PagemodeType = Literal[
    "/UseNone", "/UseOutlines", "/UseThumbs", "/FullScreen",
    "/UseOC", "/UseAttachments"
]

# Page range specification types
PageRangeSpec = Union[str, PageRange, Tuple[int, int], Tuple[int, int, int], List[int]]

# Dimension type for paper sizes (PyPDF2.papersizes)
class Dimensions(NamedTuple):
    """Paper dimensions in points (1/72 inch)."""

    width: int
    height: int
```
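
Because aliases such as `LayoutType` and `PagemodeType` are `typing.Literal` types, their allowed values can be recovered at runtime with `typing.get_args`. The minimal sketch below copies the `LayoutType` values listed above into a local alias (so it runs without PyPDF2) and validates input with a hypothetical `check_layout` helper.

```python
from typing import Literal, get_args

# Local copy of the LayoutType values listed above
LayoutType = Literal[
    "/SinglePage", "/OneColumn", "/TwoColumnLeft", "/TwoColumnRight",
    "/TwoPageLeft", "/TwoPageRight",
]


def check_layout(value: str) -> str:
    """Return value unchanged if it is an allowed layout name, else raise ValueError."""
    allowed = get_args(LayoutType)
    if value not in allowed:
        raise ValueError(f"{value!r} is not one of {allowed}")
    return value


print(check_layout("/TwoColumnLeft"))  # /TwoColumnLeft
```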

## Usage Examples

### Working with Generic Objects

```python
from PyPDF2 import PdfReader
from PyPDF2.generic import ArrayObject, DictionaryObject

reader = PdfReader("document.pdf")

# Access raw PDF objects
for page in reader.pages:
    # PageObject subclasses DictionaryObject, so dictionary access works
    if isinstance(page, DictionaryObject):
        # Indexing with [] resolves IndirectObject references to the actual object
        if "/MediaBox" in page:
            mediabox = page["/MediaBox"]
            if isinstance(mediabox, ArrayObject):
                print(f"MediaBox: {[float(x) for x in mediabox]}")

        # Check for resources and list the font resource names
        if "/Resources" in page:
            resources = page["/Resources"]
            if "/Font" in resources:
                fonts = resources["/Font"]
                print(f"Fonts: {list(fonts.keys())}")
```
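
Values read from a file are often indirect references of the form `idnum generation R`, which point into the document's object table; `get_object()` (and `DictionaryObject` indexing) follows them to the actual object. The toy model below makes that lookup explicit with the standard library only; `ObjRef`, `OBJECTS`, and `resolve` are hypothetical stand-ins, not PyPDF2 classes.

```python
from typing import Any, Dict, NamedTuple, Tuple


class ObjRef(NamedTuple):
    """Stand-in for an indirect reference such as '5 0 R'."""
    idnum: int
    generation: int


# Hypothetical object table keyed by (idnum, generation)
OBJECTS: Dict[Tuple[int, int], Any] = {
    (5, 0): {"/Font": ObjRef(7, 0)},
    (7, 0): {"/F1": ObjRef(9, 0)},
    (9, 0): {"/Type": "/Font", "/BaseFont": "/Helvetica"},
}


def resolve(value: Any) -> Any:
    """Follow references until a direct object is reached."""
    while isinstance(value, ObjRef):
        value = OBJECTS[(value.idnum, value.generation)]
    return value


resources = resolve(ObjRef(5, 0))
fonts = resolve(resources["/Font"])
print(list(fonts.keys()))  # ['/F1']
```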

### Using Page Ranges

```python
from PyPDF2 import PageRange, PdfMerger

merger = PdfMerger()

# Page indices are zero-based, as in Python slices
merger.append("doc1.pdf", pages=PageRange("1:5"))  # Indices 1-4 (2nd to 5th pages)
merger.append("doc2.pdf", pages=PageRange("::2"))  # Every other page, starting with the first
merger.append("doc3.pdf", pages=PageRange("10:"))  # Index 10 (11th page) to the end
merger.append("doc4.pdf", pages=(0, 6, 2))         # (start, stop, step) tuple: 1st, 3rd, 5th pages

# Validate a page range string
if PageRange.valid("1:10"):
    print("Valid page range")

merger.write("output.pdf")
merger.close()
```
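
`PageRange` strings use Python slice syntax over zero-based page indices. To make the mapping concrete, here is a stdlib-only sketch that parses the same `start:stop[:step]` form into a `slice` and expands it with `slice.indices()`. `parse_page_range` and `page_indices` are hypothetical helpers; the single-index handling follows PyPDF2's page-range help text, where `"22"` selects just the 23rd page and `"-1"` the last page.

```python
from typing import List


def parse_page_range(spec: str) -> slice:
    """Parse 'start:stop[:step]' or a single index into a slice."""
    if ":" not in spec:
        index = int(spec)
        # A single index selects one page; -1 needs an open-ended stop
        return slice(index, index + 1 if index != -1 else None)
    parts = [int(p) if p else None for p in spec.split(":")]
    if len(parts) > 3:
        raise ValueError(f"too many ':' in {spec!r}")
    return slice(*parts)


def page_indices(spec: str, num_pages: int) -> List[int]:
    """Expand a range spec to concrete zero-based page indices."""
    return list(range(*parse_page_range(spec).indices(num_pages)))


print(page_indices("1:5", 10))  # [1, 2, 3, 4]
print(page_indices("::2", 7))   # [0, 2, 4, 6]
print(page_indices("-1", 10))   # [9]
```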

### Working with Paper Sizes

```python
from PyPDF2 import PaperSize, PdfWriter

writer = PdfWriter()

# Create pages with standard sizes
a4_page = writer.add_blank_page(PaperSize.A4.width, PaperSize.A4.height)
letter_page = writer.add_blank_page(612, 792)  # US Letter (8.5 x 11 in)
a3_page = writer.add_blank_page(PaperSize.A3.width, PaperSize.A3.height)

print(f"A4 size: {PaperSize.A4.width} x {PaperSize.A4.height} points")
print(f"A3 size: {PaperSize.A3.width} x {PaperSize.A3.height} points")

with open("standard_sizes.pdf", "wb") as output_file:
    writer.write(output_file)
```
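
Standard paper sizes are defined in millimetres; converting to PostScript points (1 inch = 25.4 mm = 72 points) and rounding gives whole-point values like those listed for `PaperSize`. A stdlib-only sketch of the conversion using ISO 216 A-series sizes; `ISO_A_MM` and `mm_to_points` are illustrative names, not PyPDF2 APIs.

```python
# ISO 216 A-series sizes in millimetres (portrait: width, height)
ISO_A_MM = {"A0": (841, 1189), "A4": (210, 297), "A5": (148, 210), "A8": (52, 74)}


def mm_to_points(mm: float) -> int:
    """Convert millimetres to whole PostScript points (1 in = 25.4 mm = 72 pt)."""
    return round(mm / 25.4 * 72)


for name, (width_mm, height_mm) in ISO_A_MM.items():
    print(f"{name}: {mm_to_points(width_mm)} x {mm_to_points(height_mm)} points")
# A0: 2384 x 3370 points
# A4: 595 x 842 points
# A5: 420 x 595 points
# A8: 147 x 210 points
```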

### Creating Custom PDF Objects

```python
from PyPDF2.generic import (
    ArrayObject, DictionaryObject, NameObject,
    NumberObject, TextStringObject,
)

# Create a custom dictionary object describing a text annotation
custom_dict = DictionaryObject({
    NameObject("/Type"): NameObject("/Annot"),
    NameObject("/Subtype"): NameObject("/Text"),
    NameObject("/Contents"): TextStringObject("Custom note"),
    # /Rect is [x1 y1 x2 y2] in default user-space units (points)
    NameObject("/Rect"): ArrayObject([
        NumberObject(100), NumberObject(100),
        NumberObject(200), NumberObject(150)
    ])
})

print(f"Custom object: {custom_dict}")
```
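
When saved, an object like `custom_dict` is written in PDF syntax: `<< >>` for dictionaries, `[ ]` for arrays, `/Name` for names, and parenthesised literal strings. A simplified stdlib-only serializer makes that mapping visible; `to_pdf_syntax` and the `Name` marker class are hypothetical, and the sketch ignores non-ASCII text, indirect references, and streams.

```python
from typing import Any


class Name(str):
    """Marker for PDF name values, e.g. Name('/Annot')."""


def to_pdf_syntax(value: Any) -> str:
    """Render nested Python values as PDF object syntax (simplified)."""
    if isinstance(value, Name):  # check before str: Name is a str subclass
        return str(value)
    if isinstance(value, bool):  # check before int: bool is an int subclass
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    if value is None:
        return "null"
    if isinstance(value, str):
        # Literal strings must escape backslashes and parentheses
        escaped = value.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
        return f"({escaped})"
    if isinstance(value, list):
        return "[" + " ".join(to_pdf_syntax(v) for v in value) + "]"
    if isinstance(value, dict):
        items = " ".join(f"{k} {to_pdf_syntax(v)}" for k, v in value.items())
        return f"<< {items} >>"
    raise TypeError(f"unsupported type: {type(value).__name__}")


annot = {
    "/Type": Name("/Annot"),
    "/Subtype": Name("/Text"),
    "/Contents": "Custom note",
    "/Rect": [100, 100, 200, 150],
}
print(to_pdf_syntax(annot))
# << /Type /Annot /Subtype /Text /Contents (Custom note) /Rect [100 100 200 150] >>
```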

### String Encoding Utilities

```python
from PyPDF2.generic import (
    create_string_object, decode_pdfdocencoding,
    encode_pdfdocencoding, hex_to_rgb,
)

# Create appropriate string objects
text = create_string_object("Hello, World!")  # str input -> TextStringObject
raw = create_string_object(b"\x00\xff\x42")    # bytes: TextStringObject if decodable, else ByteStringObject
print(f"String object types: {type(text).__name__}, {type(raw).__name__}")

# Encoding/decoding round trip (every character must exist in PDFDocEncoding)
unicode_text = "Héllo, Wørld!"
encoded = encode_pdfdocencoding(unicode_text)
decoded = decode_pdfdocencoding(encoded)

print(f"Original: {unicode_text}")
print(f"Decoded: {decoded}")

# Color conversion
red_rgb = hex_to_rgb("#FF0000")  # (1.0, 0.0, 0.0)
blue_rgb = hex_to_rgb("#0000FF")  # (0.0, 0.0, 1.0)
print(f"Red RGB: {red_rgb}")
print(f"Blue RGB: {blue_rgb}")
```
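
PDF text strings are stored either in PDFDocEncoding or as UTF-16BE prefixed with the byte-order mark `FE FF`; the encoding helpers above deal with the first form. A stdlib-only sketch of choosing between the two follows. `encode_text_string` is hypothetical, and it approximates PDFDocEncoding with Latin-1, using the single-byte form only for printable ASCII and 0xA1-0xFF (excluding 0xAD), where the two encodings agree; everything else falls back to UTF-16BE.

```python
import codecs


def _single_byte_safe(ch: str) -> bool:
    """Characters where Latin-1 and PDFDocEncoding use the same byte."""
    code = ord(ch)
    return 0x20 <= code <= 0x7E or (0xA1 <= code <= 0xFF and code != 0xAD)


def encode_text_string(text: str) -> bytes:
    """Encode a PDF text string: single-byte when safe, else UTF-16BE with BOM."""
    if all(_single_byte_safe(ch) for ch in text):
        return text.encode("latin-1")
    return codecs.BOM_UTF16_BE + text.encode("utf-16-be")


print(encode_text_string("Héllo, Wørld!"))  # b'H\xe9llo, W\xf8rld!'
print(encode_text_string("日本").hex())      # feff65e5672c
```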

### Working with Outlines and Destinations

```python
from PyPDF2 import PdfReader
from PyPDF2.generic import Destination

reader = PdfReader("document.pdf")

# reader.outline is a nested list: a list that follows an item holds its children
def print_outline(items, level=0):
    for item in items:
        if isinstance(item, list):
            print_outline(item, level + 1)
        else:
            print(f"{'  ' * level}{item.title}")

print_outline(reader.outline)

# Access named destinations and map each one to a zero-based page index
for name, dest in reader.named_destinations.items():
    if isinstance(dest, Destination):
        page_index = reader.get_destination_page_number(dest)
        print(f"Destination '{name}' -> page index {page_index}, type: {dest.typ}")
```
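
`reader.outline` expresses hierarchy by position: a nested list immediately after an entry holds that entry's children. A stdlib-only sketch that flattens such a structure into `(depth, title)` pairs, using plain strings as hypothetical stand-ins for the outline entries:

```python
from typing import List, Tuple, Union

Outline = List[Union[str, "Outline"]]


def flatten_outline(items: Outline, level: int = 0) -> List[Tuple[int, str]]:
    """Return (depth, title) pairs in reading order."""
    result: List[Tuple[int, str]] = []
    for item in items:
        if isinstance(item, list):
            # A nested list holds the children of the previous entry
            result.extend(flatten_outline(item, level + 1))
        else:
            result.append((level, item))
    return result


outline = ["Introduction", ["Background", "Scope"], "Methods", ["Setup", ["Hardware"]]]
for depth, title in flatten_outline(outline):
    print("  " * depth + title)
```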

### Password Type Checking

```python
from PyPDF2 import PasswordType, PdfReader

reader = PdfReader("encrypted.pdf")

if reader.is_encrypted:
    # decrypt() reports which password, if any, matched
    result = reader.decrypt("user_password")

    if result == PasswordType.USER_PASSWORD:
        print("Opened with user password - some restrictions may apply")
    elif result == PasswordType.OWNER_PASSWORD:
        print("Opened with owner password - full access")
    elif result == PasswordType.NOT_DECRYPTED:
        print("Password did not match")
```

## Constants and Enumerations

PyPDF2 includes many constants from the PDF specification, organized in the `PyPDF2.constants` module:

### Key Constants

```python { .api }
# Core PDF constants
class Core:
    OUTLINES = "/Outlines"
    THREADS = "/Threads"
    PAGE = "/Page"
    PAGES = "/Pages"
    CATALOG = "/Catalog"

# User access permissions
class UserAccessPermissions:
    PRINT = 1 << 2
    MODIFY = 1 << 3
    COPY = 1 << 4
    ADD_OR_MODIFY = 1 << 5

# PDF filter types
class FilterTypes:
    FLATE_DECODE = "/FlateDecode"
    LZW_DECODE = "/LZWDecode"
    ASCII_HEX_DECODE = "/ASCIIHexDecode"
    DCT_DECODE = "/DCTDecode"
```
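
The `/P` entry of an encrypted document's encryption dictionary is a 32-bit integer bit field, so a permission is tested with a bitwise AND against its flag value. A stdlib-only sketch, with the flag values copied from the listing above into a local `IntFlag`; the `Perm` class, the `allowed` helper, and the sample `/P` value are illustrative assumptions.

```python
from enum import IntFlag


class Perm(IntFlag):
    """Local copy of the UserAccessPermissions bit values listed above."""
    PRINT = 1 << 2
    MODIFY = 1 << 3
    COPY = 1 << 4
    ADD_OR_MODIFY = 1 << 5


def allowed(p_value: int, flag: Perm) -> bool:
    """Return True if the bit for `flag` is set in the /P value."""
    return bool(p_value & int(flag))


# Sample /P value: -44 is 0xFFFFFFD4 read as a signed 32-bit integer.
# Python's & works bitwise on negative ints, so no conversion is needed.
p_value = -44
for flag in Perm:
    print(f"{flag.name}: {allowed(p_value, flag)}")
# Prints PRINT: True, MODIFY: False, COPY: True, ADD_OR_MODIFY: False (one per line)
```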

Using these constants instead of hand-typed strings and numbers keeps dictionary keys, names, and permission bits consistent with the values defined in the PDF specification.
