Tessl Tile for pypi/pypdfium2@4.30.0

Describes: pkg:pypi/pypdfium2@4.30.x

To install, run

npx @tessl/cli install tessl/pypi-pypdfium2@4.30.0

# pypdfium2

Python bindings to PDFium for comprehensive PDF manipulation, rendering, and processing. Built on Google's powerful PDFium library, pypdfium2 provides both high-level helper classes for common PDF operations and low-level raw bindings for advanced functionality.

## Package Information

- **Package Name**: pypdfium2
- **Language**: Python
- **Installation**: `pip install pypdfium2`
- **Python Requirements**: Python 3.6+

## Core Imports

```python
import pypdfium2 as pdfium
```

For direct access to specific classes:

```python
from pypdfium2 import PdfDocument, PdfPage, PdfBitmap
```

For version information:

```python
from pypdfium2 import PYPDFIUM_INFO, PDFIUM_INFO
```

## Basic Usage

```python
import pypdfium2 as pdfium

# Open a PDF document
pdf = pdfium.PdfDocument("document.pdf")

# Get basic information
print(f"Pages: {len(pdf)}")
print(f"Version: {pdf.get_version()}")
print(f"Metadata: {pdf.get_metadata_dict()}")

# Render first page to image
page = pdf[0]
bitmap = page.render(scale=2.0)
pil_image = bitmap.to_pil()
pil_image.save("page1.png")

# Extract text from page
textpage = page.get_textpage()
text = textpage.get_text_range()
print(f"Page text: {text}")

# Clean up
pdf.close()
```
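If any step between opening the document and `pdf.close()` raises, the explicit close at the end of the example above is skipped. One way to guard against that, sketched here under assumptions, is `contextlib.closing`, which calls `close()` on exit whether or not an exception occurred. The helper `with_document` and its parameters are hypothetical and not part of pypdfium2; it assumes only that the opened object has a `close()` method.

```python
from contextlib import closing


def with_document(open_document, source, action):
    """Open a document, run action(document), and always close it.

    Hypothetical helper: `open_document` is any callable returning an
    object with a close() method, for example pypdfium2.PdfDocument.
    """
    with closing(open_document(source)) as document:
        return action(document)


# Intended use with pypdfium2 (assumes the package is installed and
# "document.pdf" exists):
#
#   import pypdfium2 as pdfium
#   page_count = with_document(pdfium.PdfDocument, "document.pdf", len)
```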

## Architecture

pypdfium2 follows a layered architecture:

- **Helper Classes**: High-level Python API (`PdfDocument`, `PdfPage`, `PdfBitmap`, etc.) providing intuitive interfaces for common operations
- **Raw Bindings**: Direct access to PDFium C API functions through the `pypdfium2.raw` module
- **Type System**: Named tuples and data classes for structured information (`PdfBitmapInfo`, `ImageInfo`, etc.)
- **Resource Management**: Automatic cleanup with context managers and explicit `close()` methods
- **Multi-format Support**: PDF reading/writing, image rendering (PIL, NumPy), text extraction

This design enables both simple high-level operations and advanced low-level manipulation while maintaining compatibility with the broader Python ecosystem.

## Capabilities

### Document Management

Core PDF document operations including loading, creating, saving, and metadata manipulation. Supports password-protected PDFs, form handling, and file attachments.

```python { .api }
class PdfDocument:
    def __init__(self, input_data, password=None, autoclose=False): ...
    @classmethod
    def new(cls): ...
    def __len__(self) -> int: ...
    def save(self, dest, version=None, flags=...): ...
    def get_metadata_dict(self, skip_empty=False) -> dict: ...
    def is_tagged(self) -> bool: ...
```

[Document Management](./document-management.md)

### Page Manipulation

Page-level operations including rendering, rotation, dimension management, and bounding box manipulation. Supports various rendering formats and customization options.

```python { .api }
class PdfPage:
    def get_size(self) -> tuple[float, float]: ...
    def render(self, rotation=0, scale=1, ...) -> PdfBitmap: ...
    def get_rotation(self) -> int: ...
    def set_rotation(self, rotation): ...
    def get_mediabox(self, fallback_ok=True) -> tuple | None: ...
```
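The `scale` argument of `render()` multiplies the page size, which is measured in PDF points (usually 1/72 inch), so a target resolution can be converted with a little arithmetic. The sketch below assumes that convention; the helper names are hypothetical, and the rounding up to whole pixels is an estimate that may differ from the library's own result by a pixel.

```python
import math

POINTS_PER_INCH = 72  # one PDF point is usually 1/72 inch


def scale_for_dpi(dpi):
    """Return the render scale factor that corresponds to a DPI value."""
    return dpi / POINTS_PER_INCH


def estimated_pixel_size(width_pt, height_pt, dpi):
    """Estimate the bitmap size in pixels for a page size given in points."""
    # Multiply before dividing so whole-number inputs stay exact.
    width_px = math.ceil(width_pt * dpi / POINTS_PER_INCH)
    height_px = math.ceil(height_pt * dpi / POINTS_PER_INCH)
    return width_px, height_px


# US Letter measures 612 x 792 points (8.5 x 11 inches).
print(scale_for_dpi(144))                   # 2.0
print(estimated_pixel_size(612, 792, 300))  # (2550, 3300)
```

A call such as `page.render(scale=scale_for_dpi(300))` would then request roughly 300 DPI output.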

[Page Manipulation](./page-manipulation.md)

### Text Processing

Comprehensive text extraction and search capabilities with support for bounded text extraction, character-level positioning, and full-text search.

```python { .api }
class PdfTextPage:
    def get_text_range(self, index=0, count=-1, errors="ignore", force_this=False) -> str: ...
    def get_text_bounded(self, left=None, bottom=None, right=None, top=None, errors="ignore") -> str: ...
    def search(self, text, index=0, match_case=False, match_whole_word=False, consecutive=False) -> PdfTextSearcher: ...
    def get_charbox(self, index, loose=False) -> tuple: ...
```
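`get_text_bounded()` takes its box in page coordinates. Assuming the usual PDF convention (origin at the bottom-left corner, y increasing upward, units in points), a region such as the top half of a page can be computed from the page size. The helper `top_half_box` is hypothetical and not part of pypdfium2; pages with an offset or cropped page box would need extra handling.

```python
def top_half_box(width, height):
    """Return keyword arguments describing the upper half of a page.

    Assumes the page origin is at the bottom-left corner, so the upper
    half spans from height / 2 up to height.
    """
    return {"left": 0, "bottom": height / 2, "right": width, "top": height}


# A US Letter page is 612 x 792 points.
box = top_half_box(612, 792)
print(box)  # {'left': 0, 'bottom': 396.0, 'right': 612, 'top': 792}

# With the text page from the Basic Usage example, this could be passed as:
#   text = textpage.get_text_bounded(**box)
```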

[Text Processing](./text-processing.md)

### Image and Bitmap Operations

Image rendering, manipulation, and extraction with support for multiple output formats including PIL Images, NumPy arrays, and raw bitmaps.

```python { .api }
class PdfBitmap:
    @classmethod
    def from_pil(cls, pil_image, recopy=False) -> PdfBitmap: ...
    def to_numpy(self) -> numpy.ndarray: ...
    def to_pil(self) -> PIL.Image.Image: ...
    def fill_rect(self, left, top, width, height, color): ...
```

[Image and Bitmap Operations](./image-bitmap.md)

### Page Objects and Graphics

Manipulation of PDF page objects including images, text, and vector graphics. Supports object transformation, insertion, and removal.

```python { .api }
class PdfObject:
    def get_pos(self) -> tuple: ...
    def get_matrix(self) -> PdfMatrix: ...
    def transform(self, matrix): ...

class PdfImage(PdfObject):
    def get_metadata(self) -> ImageInfo: ...
    def extract(self, dest, *args, **kwargs): ...
```

[Page Objects and Graphics](./page-objects.md)

### File Attachments

Management of embedded file attachments with support for attachment metadata, data extraction, and modification.

```python { .api }
class PdfAttachment:
    def get_name(self) -> str: ...
    def get_data(self) -> ctypes.Array: ...
    def set_data(self, data): ...
    def get_str_value(self, key) -> str: ...
```

[File Attachments](./attachments.md)

### Transformation and Geometry

2D transformation matrices for coordinate system manipulation, rotation, scaling, and translation operations.

```python { .api }
class PdfMatrix:
    def __init__(self, a=1, b=0, c=0, d=1, e=0, f=0): ...
    def translate(self, x, y) -> PdfMatrix: ...
    def scale(self, x, y) -> PdfMatrix: ...
    def rotate(self, angle, ccw=False, rad=False) -> PdfMatrix: ...
    def on_point(self, x, y) -> tuple: ...
```
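The six values `a` to `f` follow the standard PDF affine-matrix convention, in which a point `(x, y)` maps to `(a*x + c*y + e, b*x + d*y + f)`. The pure-Python sketch below illustrates that arithmetic only; `apply_matrix` is a hypothetical helper, not pypdfium2's implementation of `on_point()`.

```python
def apply_matrix(matrix, x, y):
    """Apply a PDF-style affine matrix (a, b, c, d, e, f) to a point."""
    a, b, c, d, e, f = matrix
    return a * x + c * y + e, b * x + d * y + f


IDENTITY = (1, 0, 0, 1, 0, 0)   # the default values shown above
SCALE_2X = (2, 0, 0, 2, 0, 0)   # doubles both coordinates
SHIFT = (1, 0, 0, 1, 10, 20)    # moves a point by (10, 20)

print(apply_matrix(IDENTITY, 3, 4))  # (3, 4)
print(apply_matrix(SCALE_2X, 3, 4))  # (6, 8)
print(apply_matrix(SHIFT, 3, 4))     # (13, 24)
```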

[Transformation and Geometry](./transformation.md)

### Version and Library Information

Access to pypdfium2 and PDFium version information, build details, and feature flags.

```python { .api }
PYPDFIUM_INFO: _version_pypdfium2
PDFIUM_INFO: _version_pdfium

# Version properties
version: str
api_tag: tuple[int]
major: int
minor: int
patch: int
build: int  # PDFIUM_INFO only
```

[Version and Library Information](./version-info.md)

### Command Line Interface

Access to pypdfium2's comprehensive command-line tools for batch processing, text extraction, image operations, and document manipulation.

```python { .api }
def cli_main(raw_args=None) -> int:
    """Main CLI entry point for pypdfium2 command-line tools."""

def api_main(raw_args=None) -> int:
    """Alternative API entry point with same functionality as cli_main."""
```

[Command Line Interface](./cli-tools.md)

## Exception Handling

```python { .api }
class PdfiumError(RuntimeError):
    """Main exception for PDFium library errors"""

class ImageNotExtractableError(Exception):
    """Raised when image cannot be extracted from PDF"""
```

Common error scenarios include invalid PDF files, unsupported operations, memory allocation failures, and file I/O errors. Always handle exceptions when working with external PDF files or performing complex operations.

## Raw Bindings Access

For advanced use cases requiring direct PDFium API access:

```python
from pypdfium2 import raw

# Access low-level PDFium functions directly.
# The C API expects the path as bytes; password is None for unencrypted files.
file_path = "document.pdf".encode("utf-8")
password = None
doc_handle = raw.FPDF_LoadDocument(file_path, password)
page_count = raw.FPDF_GetPageCount(doc_handle)
raw.FPDF_CloseDocument(doc_handle)  # raw handles are not closed automatically
```

The raw module provides complete access to PDFium's C API with all functions, constants, and structures available for advanced manipulation.