Tessl Tile for pypi/pypdfium2@4.30.0

# Document Management

Core PDF document operations, including loading, creating, saving, metadata handling, and document-level manipulation. The `PdfDocument` class is the primary entry point for all PDF operations.

## Capabilities

### Document Creation and Loading

Create new PDF documents or load existing ones from file paths, bytes, or file-like objects.

```python { .api }
class PdfDocument:
    def __init__(self, input, password=None, autoclose=False):
        """
        Create a PDF document from various input sources.

        Parameters:
        - input: str (file path), bytes, or file-like object
        - password: str, optional password for encrypted PDFs
        - autoclose: bool, whether a file-like input should be closed automatically when the document is finalized
        """

    @classmethod
    def new(cls) -> PdfDocument:
        """Create a new empty PDF document."""
```

Example usage:

```python
import pypdfium2 as pdfium

# Load from file path
pdf = pdfium.PdfDocument("document.pdf")

# Load with password
pdf = pdfium.PdfDocument("encrypted.pdf", password="secret")

# Load from bytes
with open("document.pdf", "rb") as f:
    pdf_bytes = f.read()
pdf = pdfium.PdfDocument(pdf_bytes)

# Create new document
new_pdf = pdfium.PdfDocument.new()
```

### Document Information

Read document-level information: page count, PDF version, file identifier, tagging, page mode, and form type.

```python { .api }
def __len__(self) -> int:
    """Get the number of pages in the document."""

def get_version(self) -> int | None:
    """Get PDF version number (e.g., 14 for PDF 1.4)."""

def get_identifier(self, type=...) -> bytes:
    """Get document file identifier."""

def is_tagged(self) -> bool:
    """Check if document is a tagged PDF for accessibility."""

def get_pagemode(self) -> int:
    """Get page mode (how document should be displayed)."""

def get_formtype(self) -> int:
    """Get form type if document contains interactive forms."""
```
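The getters above return raw integers, so a small formatting layer can make them easier to read. The following is a minimal sketch, not part of pypdfium2: `format_pdf_version`, `summarize_document`, and `PAGE_MODE_NAMES` are hypothetical names, and the page-mode values are assumed to match the `PAGEMODE_*` constants in PDFium's public headers.

```python
# Names for the integer page-mode codes. The values are assumed to match the
# PAGEMODE_* constants in PDFium's public headers; verify before relying on them.
PAGE_MODE_NAMES = {
    -1: "unknown",
    0: "none",
    1: "outlines",
    2: "thumbnails",
    3: "fullscreen",
    4: "optional content",
    5: "attachments",
}


def format_pdf_version(version):
    """Turn the integer from get_version() into a string such as '1.4'."""
    if version is None:
        return "unknown"
    major, minor = divmod(version, 10)
    return f"{major}.{minor}"


def summarize_document(pdf):
    """Collect basic document information into a plain dict (hypothetical helper)."""
    return {
        "pages": len(pdf),
        "version": format_pdf_version(pdf.get_version()),
        "tagged": pdf.is_tagged(),
        "page_mode": PAGE_MODE_NAMES.get(pdf.get_pagemode(), "unrecognized"),
    }
```

With pypdfium2 installed, `summarize_document(pdfium.PdfDocument("document.pdf"))` would be the intended call; the helper only relies on the methods listed above.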

### Metadata Management

Read PDF metadata, including title, author, subject, keywords, and creation information.

```python { .api }
def get_metadata_value(self, key: str) -> str:
    """
    Get specific metadata value.

    Parameters:
    - key: str, metadata key (Title, Author, Subject, Keywords, Creator, Producer, CreationDate, ModDate)

    Returns:
    str: Metadata value or empty string if not found
    """

def get_metadata_dict(self, skip_empty=False) -> dict:
    """
    Get all metadata as dictionary.

    Parameters:
    - skip_empty: bool, exclude empty metadata values

    Returns:
    dict: Metadata key-value pairs
    """

# Available metadata keys
METADATA_KEYS = ("Title", "Author", "Subject", "Keywords", "Creator", "Producer", "CreationDate", "ModDate")
```

Example:

```python
import pypdfium2 as pdfium

pdf = pdfium.PdfDocument("document.pdf")

# Get specific metadata
title = pdf.get_metadata_value("Title")
author = pdf.get_metadata_value("Author")

# Get all metadata
metadata = pdf.get_metadata_dict()
print(f"Title: {metadata.get('Title', 'Unknown')}")
print(f"Pages: {len(pdf)}")
print(f"PDF Version: {pdf.get_version()}")
```
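The `CreationDate` and `ModDate` values are strings. Assuming they come back in the usual PDF date format (`D:YYYYMMDDHHmmSS` followed by an optional timezone such as `+02'00'` or `Z`), they can be converted to `datetime` objects with a small parser. This is a sketch under that assumption; `parse_pdf_date` is a hypothetical helper, not a pypdfium2 function.

```python
import re
from datetime import datetime, timedelta, timezone

# Everything after the year is optional in a PDF date string.
_PDF_DATE = re.compile(
    r"^D:(?P<year>\d{4})(?P<month>\d{2})?(?P<day>\d{2})?"
    r"(?P<hour>\d{2})?(?P<minute>\d{2})?(?P<second>\d{2})?"
    r"(?P<sign>[+\-Z])?(?:(?P<tz_hour>\d{2})'?)?(?:(?P<tz_minute>\d{2})'?)?$"
)


def parse_pdf_date(value):
    """Parse a PDF date string; return None if it is empty or malformed."""
    match = _PDF_DATE.match(value.strip())
    if match is None:
        return None
    parts = match.groupdict()

    # Build the timezone, if one was given.
    tz = None
    if parts["sign"] == "Z":
        tz = timezone.utc
    elif parts["sign"] in ("+", "-"):
        offset = timedelta(
            hours=int(parts["tz_hour"] or 0),
            minutes=int(parts["tz_minute"] or 0),
        )
        tz = timezone(-offset if parts["sign"] == "-" else offset)

    try:
        return datetime(
            int(parts["year"]),
            int(parts["month"] or 1),
            int(parts["day"] or 1),
            int(parts["hour"] or 0),
            int(parts["minute"] or 0),
            int(parts["second"] or 0),
            tzinfo=tz,
        )
    except ValueError:
        # Out-of-range fields, e.g. month 13
        return None
```

A typical call would be `parse_pdf_date(pdf.get_metadata_value("CreationDate"))`. Because a missing key yields an empty string, the result is `None` in that case.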

### Page Management

Access, create, delete, and manipulate pages within the document.

```python { .api }
def __iter__(self) -> Iterator[PdfPage]:
    """Iterate over all pages in the document."""

def __getitem__(self, index: int) -> PdfPage:
    """Get page by index (0-based)."""

def __delitem__(self, index: int):
    """Delete page by index."""

def get_page(self, index: int) -> PdfPage:
    """Get page by index with explicit method."""

def new_page(self, width: float, height: float, index: int = None) -> PdfPage:
    """
    Create new page in document.

    Parameters:
    - width: float, page width in PDF units (1/72 inch)
    - height: float, page height in PDF units
    - index: int, optional insertion index (None = append)

    Returns:
    PdfPage: New page object
    """

def del_page(self, index: int):
    """Delete page by index."""

def import_pages(self, pdf: PdfDocument, pages=None, index=None):
    """
    Import pages from another PDF document.

    Parameters:
    - pdf: PdfDocument, source document
    - pages: list of int, page indices to import (None = all pages)
    - index: int, insertion point in this document (None = append)
    """

def get_page_size(self, index: int) -> tuple[float, float]:
    """Get page dimensions as (width, height) tuple."""

def get_page_label(self, index: int) -> str:
    """Get page label (may differ from index for custom numbering)."""

def page_as_xobject(self, index: int, dest_pdf: PdfDocument) -> PdfXObject:
    """Convert page to Form XObject for embedding in another document."""
```

Example usage:

```python
import pypdfium2 as pdfium

pdf = pdfium.PdfDocument("document.pdf")

# Access pages
first_page = pdf[0]
last_page = pdf[-1]

# Iterate pages
for i, page in enumerate(pdf):
    print(f"Page {i+1}: {page.get_size()}")

# Create new page
new_page = pdf.new_page(612, 792)  # US Letter size

# Import pages from another PDF
source_pdf = pdfium.PdfDocument("source.pdf")
pdf.import_pages(source_pdf, pages=[0, 2, 4])  # Import pages 1, 3, 5

# Delete a page (zero-based index 5, i.e. the sixth page)
del pdf[5]
```
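Deleting a page shifts the index of every page after it, so removing several pages by their original indices needs some care. One simple approach, sketched below, is to delete from the highest index down. `delete_pages` is a hypothetical helper that relies only on the `del pdf[index]` operation shown above.

```python
def delete_pages(pdf, indices):
    """Delete several pages, given their original zero-based indices."""
    # Working from the highest index down means each deletion leaves
    # the indices of the pages still to be deleted unchanged.
    for index in sorted(set(indices), reverse=True):
        del pdf[index]
```

For example, `delete_pages(pdf, [0, 2, 4])` would remove the first, third, and fifth pages of the document as it was before the call.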

### File Attachments

Manage embedded file attachments within the PDF document.

```python { .api }
def count_attachments(self) -> int:
    """Get number of file attachments."""

def get_attachment(self, index: int) -> PdfAttachment:
    """Get attachment by index."""

def new_attachment(self, name: str) -> PdfAttachment:
    """
    Create new file attachment.

    Parameters:
    - name: str, attachment filename

    Returns:
    PdfAttachment: New attachment object
    """

def del_attachment(self, index: int):
    """Delete attachment by index."""
```
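Since attachments are addressed by index, listing them is a loop over `count_attachments()`. The sketch below collects attachment names; `list_attachment_names` is a hypothetical helper, and it assumes that `PdfAttachment` exposes a `get_name()` method, which is not shown in this section.

```python
def list_attachment_names(pdf):
    """Return the names of all embedded files, in index order."""
    names = []
    for index in range(pdf.count_attachments()):
        attachment = pdf.get_attachment(index)
        # get_name() is assumed here; check the attachment documentation.
        names.append(attachment.get_name())
    return names
```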

### Document Outline and Bookmarks

Navigate and extract the document's table of contents structure, including nested bookmarks.

```python { .api }
def get_toc(self, max_depth=15, parent=None, level=0, seen=None) -> Iterator[PdfOutlineItem]:
    """
    Iterate through the bookmarks in the document's table of contents.

    Parameters:
    - max_depth: int, maximum recursion depth to consider (default: 15)
    - parent: internal parent bookmark (typically None for root level)
    - level: internal nesting level (typically 0 for root)
    - seen: internal set for circular reference detection

    Yields:
    PdfOutlineItem: Bookmark information objects

    Each bookmark contains title, page reference, view settings, and
    hierarchical information including nesting level and child counts.
    """
```

#### PdfOutlineItem Class

Bookmark information structure for PDF table of contents entries.

```python { .api }
class PdfOutlineItem:
    """
    Bookmark information namedtuple for PDF outline entries.

    Represents a single bookmark/outline item from a PDF's table of contents,
    containing hierarchical navigation information and target page details.

    Attributes:
    - level: int, number of parent items (nesting depth)
    - title: str, title string of the bookmark
    - is_closed: bool | None, True if children should be collapsed,
                             False if expanded, None if no children
    - n_kids: int, absolute number of child items
    - page_index: int | None, zero-based target page index (None if no target)
    - view_mode: int, view mode constant defining coordinate interpretation
    - view_pos: list[float], target position coordinates on the page
    """

    level: int
    title: str
    is_closed: bool | None
    n_kids: int
    page_index: int | None
    view_mode: int
    view_pos: list[float]
```

Example usage:

```python
import pypdfium2 as pdfium

pdf = pdfium.PdfDocument("document_with_bookmarks.pdf")

# Extract table of contents
for bookmark in pdf.get_toc():
    indent = "  " * bookmark.level  # Indent based on nesting
    print(f"{indent}{bookmark.title}")

    if bookmark.page_index is not None:
        print(f"{indent}  → Page {bookmark.page_index + 1}")
        print(f"{indent}  → Position: {bookmark.view_pos}")

    if bookmark.n_kids > 0:
        expanded = "📂" if not bookmark.is_closed else "📁"
        print(f"{indent}  {expanded} ({bookmark.n_kids} children)")

# Navigate to specific bookmark
for bookmark in pdf.get_toc():
    if "Chapter 1" in bookmark.title and bookmark.page_index is not None:
        # Load the target page
        target_page = pdf[bookmark.page_index]
        break
```
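`get_toc()` yields a flat sequence in which nesting is expressed only through `level`. When a nested structure is more convenient (for example, to serialize the outline as JSON), the flat items can be folded into a tree. The sketch below assumes the items arrive depth-first, each parent before its children; `build_outline_tree` is a hypothetical helper.

```python
def build_outline_tree(items):
    """Nest a flat, depth-first sequence of outline items by their level."""
    root = {"title": None, "page_index": None, "children": []}
    # stack[0] is the root; stack[n] is the most recent node at level n - 1.
    stack = [root]
    for item in items:
        node = {
            "title": item.title,
            "page_index": item.page_index,
            "children": [],
        }
        # Discard nodes at this level or deeper, then attach to the parent.
        del stack[item.level + 1:]
        stack[-1]["children"].append(node)
        stack.append(node)
    return root["children"]
```

The intended call is `build_outline_tree(pdf.get_toc())`. Only the `level`, `title`, and `page_index` attributes are used.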

### Interactive Forms

Initialize interactive form environment for handling PDF forms and annotations.

```python { .api }
def init_forms(self, config=None):
    """
    Initialize interactive form environment.

    Parameters:
    - config: optional form configuration

    Sets up form environment for handling interactive elements,
    annotations, and form fields.
    """
```

#### PdfFormEnv Class

Form environment helper class for managing interactive PDF forms.

```python { .api }
class PdfFormEnv:
    """
    Form environment helper class for managing interactive PDF forms.

    This class provides the form environment context needed for rendering
    and interacting with PDF forms. Created automatically when init_forms()
    is called on a document that contains forms.

    Attributes:
    - raw: FPDF_FORMHANDLE, underlying PDFium form env handle
    - config: FPDF_FORMFILLINFO, form configuration interface
    - pdf: PdfDocument, parent document this form env belongs to
    """

    def __init__(self, raw, config, pdf):
        """
        Initialize form environment.

        Parameters:
        - raw: FPDF_FORMHANDLE, PDFium form handle
        - config: FPDF_FORMFILLINFO, form configuration
        - pdf: PdfDocument, parent document

        Note: This is typically created automatically by PdfDocument.init_forms()
        rather than being instantiated directly.
        """

    def close(self):
        """Close and clean up form environment resources."""
```

Example usage:

```python
import pypdfium2 as pdfium

pdf = pdfium.PdfDocument("form.pdf")

# Initialize forms if document contains them
pdf.init_forms()

if pdf.formenv:
    print("Form environment is active")
    # Form environment will be used automatically during page rendering
    # to handle interactive form elements
```
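To report which kind of form a document contains before initializing the form environment, `get_formtype()` can be checked first. The sketch below is illustrative only: `init_forms_if_present` and `FORM_TYPE_NAMES` are hypothetical names, and the integer values are assumed to match the `FORMTYPE_*` constants in PDFium's headers.

```python
# Assumed to mirror PDFium's FORMTYPE_* constants; verify before relying on them.
FORM_TYPE_NAMES = {
    0: "none",
    1: "AcroForm",
    2: "XFA (full)",
    3: "XFA (foreground)",
}


def init_forms_if_present(pdf):
    """Initialize the form environment only if a form is reported.

    Returns a readable name for the detected form type.
    """
    form_type = pdf.get_formtype()
    if form_type != 0:  # 0 is assumed to mean "no form"
        pdf.init_forms()
    return FORM_TYPE_NAMES.get(form_type, f"unrecognized ({form_type})")
```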

### Document Saving

Save PDF documents to files or in-memory buffers, optionally setting the PDF version and save flags.

```python { .api }
def save(self, dest, version=None, flags=...):
    """
    Save document to file or buffer.

    Parameters:
    - dest: str (file path) or file-like object for output
    - version: int, optional PDF version to save as (e.g., 17 for PDF 1.7)
    - flags: int, PDFium save flags

    Saves the current state of the document including all modifications,
    new pages, and metadata changes.
    """
```

Example:

```python
import io

import pypdfium2 as pdfium

pdf = pdfium.PdfDocument("input.pdf")

# Modify document
pdf.new_page(612, 792)

# Save to new file
pdf.save("output.pdf")

# Save to buffer
buffer = io.BytesIO()
pdf.save(buffer)
pdf_bytes = buffer.getvalue()
```
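If a save is interrupted partway through, the destination file can be left incomplete. A common way to guard against that is to write to a temporary file in the same directory and then rename it over the target. The following is a sketch of that pattern, not a pypdfium2 feature; `save_atomically` is a hypothetical helper that uses the file-like `dest` form of `save()` documented above.

```python
import os
import tempfile


def save_atomically(pdf, path):
    """Write the document to a temporary file, then move it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the target so the final rename
    # stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(suffix=".pdf", dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp_file:
            pdf.save(tmp_file)
        # Replace the target in a single step.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the partial temporary file before re-raising.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

The target path only ever holds a complete file: either the previous version or the newly saved one.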

### Resource Management

Proper cleanup and resource management for PDF documents.

```python { .api }
def close(self):
    """Close document and free resources."""

def __enter__(self) -> PdfDocument:
    """Context manager entry."""

def __exit__(self, exc_type, exc_val, exc_tb):
    """Context manager exit with cleanup."""
```

Always close documents when done or use context managers:

```python
import pypdfium2 as pdfium

# Manual cleanup
pdf = pdfium.PdfDocument("document.pdf")
# ... work with PDF
pdf.close()

# Context manager (recommended)
with pdfium.PdfDocument("document.pdf") as pdf:
    # ... work with PDF
    pass  # Automatically closed
```
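When code should depend only on `close()`, the standard library's `contextlib.closing` gives the same guarantee as a `with` block: the document is closed even if an exception is raised. The sketch below is written against any object with `close()` and `__len__`; `count_pages` and its `open_document` parameter are hypothetical names.

```python
from contextlib import closing


def count_pages(open_document, path):
    """Open a document, read its page count, and always close it."""
    # `open_document` is any callable that returns an object with close(),
    # for example pypdfium2.PdfDocument.
    with closing(open_document(path)) as pdf:
        return len(pdf)
```

With pypdfium2 this would be called as `count_pages(pdfium.PdfDocument, "document.pdf")`.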

## Properties

```python { .api }
@property
def raw(self) -> FPDF_DOCUMENT:
    """Raw PDFium document handle for low-level operations."""

@property
def formenv(self) -> PdfFormEnv | None:
    """Form environment if initialized, None otherwise."""
```

## Advanced Features

### Unsupported Feature Handling

Handle notifications about PDF features not supported by the PDFium library.

#### PdfUnspHandler Class

Unsupported feature handler for managing notifications about PDF features not available in PDFium.

```python { .api }
class PdfUnspHandler:
    """
    Unsupported feature handler helper class.

    Manages callbacks for handling notifications when PDFium encounters
    PDF features that are not supported by the current build. Useful for
    logging, debugging, and informing users about document limitations.

    Attributes:
    - handlers: dict[str, callable], dictionary of named handler functions
                called with unsupported feature codes (FPDF_UNSP_*)
    """

    def __init__(self):
        """Initialize unsupported feature handler."""

    def setup(self, add_default=True):
        """
        Attach the handler to PDFium and register exit function.

        Parameters:
        - add_default: bool, if True, add default warning callback

        Sets up the handler to receive notifications from PDFium when
        unsupported features are encountered during document processing.
        """

    def __call__(self, _, type: int):
        """
        Handle unsupported feature notification.

        Parameters:
        - _: unused parameter (PDFium context)
        - type: int, unsupported feature code (FPDF_UNSP_*)

        Called automatically by PDFium when unsupported features are found.
        Executes all registered handler functions with the feature code.
        """
```

Example usage:

```python
import pypdfium2 as pdfium

# Create and setup unsupported feature handler
unsp_handler = pdfium.PdfUnspHandler()

# Add custom handler for unsupported features
def my_handler(feature_code):
    # Codes follow the FPDF_UNSP_* constants in PDFium's fpdf_ext.h
    feature_name = {
        1: "Document XFA form",
        2: "Portable collection",
        3: "Attachment",
        4: "Security",
        5: "Shared review",
        6: "Shared form (Acrobat)",
        7: "Shared form (filesystem)",
        8: "Shared form (email)",
        11: "3D annotation",
        12: "Movie annotation",
        13: "Sound annotation",
        14: "Screen media annotation",
        15: "Screen rich media annotation",
        16: "Attachment annotation",
        17: "Signature annotation",
    }.get(feature_code, f"Unknown feature {feature_code}")

    print(f"Warning: Unsupported PDF feature detected: {feature_name}")

unsp_handler.handlers["custom"] = my_handler

# Setup handler (includes default warning logger)
unsp_handler.setup(add_default=True)

# Now when processing PDFs, unsupported features will be reported
pdf = pdfium.PdfDocument("document_with_unsupported_features.pdf")
# Any unsupported features will trigger the handlers
```
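Handlers are plain callables that receive the feature code, so they can also accumulate results instead of printing. The sketch below counts how often each code is reported; `UnsupportedFeatureCounter` is a hypothetical class, and registering it through `handlers` follows the pattern of the example above.

```python
from collections import Counter


class UnsupportedFeatureCounter:
    """Callable that tallies unsupported-feature codes as they are reported."""

    def __init__(self):
        self.counts = Counter()

    def __call__(self, feature_code):
        self.counts[feature_code] += 1


counter = UnsupportedFeatureCounter()
# Registration would follow the pattern shown earlier, for example:
#   unsp_handler.handlers["counter"] = counter
```

After processing a batch of documents, `counter.counts.most_common()` lists the reported codes, most frequent first.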
