Tessl Tile for pypi/pypdf2@2.12.0

tessl/pypi-py-pdf2

Workspace: tessl
Visibility: Public
Created: 3 months ago
Last updated: 3 months ago
Describes: pkg:pypi/pypdf2@2.12.x

To install, run

npx @tessl/cli install tessl/pypi-py-pdf2@2.12.0

# PyPDF2

A pure-Python PDF library capable of splitting, merging, cropping, and transforming PDF files. PyPDF2 can retrieve text and metadata from PDFs as well as add custom data, viewing options, and passwords to PDF files. It provides comprehensive PDF processing capabilities for developers working with PDF documents programmatically.

## Package Information

- **Package Name**: PyPDF2
- **Language**: Python
- **Installation**: `pip install PyPDF2`
- **Version**: 2.12.1

## Core Imports

```python
import PyPDF2
```

Common patterns for specific functionality:

```python
from PyPDF2 import PdfReader, PdfWriter, PdfMerger
from PyPDF2 import PageObject, Transformation
from PyPDF2 import DocumentInformation, PasswordType
from PyPDF2 import PageRange, PaperSize, parse_filename_page_ranges
```

## Basic Usage

```python
from PyPDF2 import PdfReader, PdfWriter, PdfMerger

# Reading a PDF file
reader = PdfReader("input.pdf")
print(f"Number of pages: {len(reader.pages)}")

# metadata is None when the file has no document information (/Info) dictionary
if reader.metadata is not None:
    print(f"Title: {reader.metadata.title}")

# Extract text from the first page
page = reader.pages[0]
text = page.extract_text()
print(text)

# Writing a new PDF that contains that page
writer = PdfWriter()
writer.add_page(page)
with open("output.pdf", "wb") as output_file:
    writer.write(output_file)

# Merging multiple PDFs
merger = PdfMerger()
merger.append("file1.pdf")
merger.append("file2.pdf")
merger.write("merged.pdf")
merger.close()
```

## Architecture

PyPDF2 is built around four high-level classes and a layer of generic PDF objects:

- **PdfReader**: Reads and parses PDF files, provides access to pages, metadata, and document structure
- **PdfWriter**: Creates new PDF files, manages pages, metadata, and output generation
- **PdfMerger**: Combines multiple PDF files with advanced merging options and outline management
- **PageObject**: Represents individual PDF pages with transformation, text extraction, and manipulation capabilities
- **Generic Objects**: Low-level PDF object types (DictionaryObject, ArrayObject, etc.) for advanced manipulation

The library exposes both the high-level convenience classes and the low-level generic objects they are built on, so it supports everything from simple PDF operations to manipulation at the level of the PDF specification.

## Capabilities

### PDF Reading

Read PDF files, access pages, extract metadata and text content, and open password-protected (encrypted) documents.

```python { .api }
class PdfReader:
    # stream: a filename (str or Path) or a binary file-like object such as io.BytesIO
    def __init__(self, stream: Union[str, IO, Path], strict: bool = False, password: Union[None, str, bytes] = None): ...

    @property
    def pages(self) -> List[PageObject]: ...
    @property
    def metadata(self) -> Optional[DocumentInformation]: ...  # None if the file has no /Info dictionary
    @property
    def is_encrypted(self) -> bool: ...

    def decrypt(self, password: Union[str, bytes]) -> PasswordType: ...
    def get_page(self, page_number: int) -> PageObject: ...
```

[PDF Reading](./pdf-reading.md)

### PDF Writing

Create new PDF files: add or insert pages (including blank pages), and add metadata, encryption, annotations, and JavaScript.

```python { .api }
class PdfWriter:
    def __init__(self, fileobj: Union[str, IO] = ""): ...

    def add_page(self, page: PageObject) -> None: ...
    def insert_page(self, page: PageObject, index: int = 0) -> None: ...
    # width/height default to the last page's size; required when the writer has no pages yet
    def add_blank_page(self, width: Optional[float] = None, height: Optional[float] = None) -> PageObject: ...
    def write(self, stream) -> None: ...
    # owner_password defaults to user_password when omitted
    def encrypt(self, user_password: str, owner_password: Optional[str] = None, use_128bit: bool = True, permissions_flag: int = -1) -> None: ...
```

[PDF Writing](./pdf-writing.md)

### PDF Merging

Merge multiple PDF files with control over page ranges, outline items (bookmarks), and document properties.

```python { .api }
class PdfMerger:
    def __init__(self, strict: bool = False, fileobj: Union[Path, str, IO] = ""): ...

    # page_number: index in the output at which the pages are inserted
    # pages: a PageRange or a (start, stop[, step]) tuple; None means all pages
    def merge(self, page_number: int, fileobj, outline_item: Optional[str] = None, pages = None, import_outline: bool = True) -> None: ...
    def append(self, fileobj, outline_item: Optional[str] = None, pages = None, import_outline: bool = True) -> None: ...
    def write(self, fileobj) -> None: ...
    def close(self) -> None: ...
```

[PDF Merging](./pdf-merging.md)

### Page Manipulation

Transform, scale, rotate, crop, and merge individual PDF pages with precise control over page geometry.

```python { .api }
class PageObject:
    def extract_text(self, visitor_text=None) -> str: ...
    def scale(self, sx: float, sy: float) -> None: ...
    def rotate(self, angle: int) -> 'PageObject': ...  # clockwise; angle must be a multiple of 90
    def merge_page(self, page2: 'PageObject') -> None: ...  # overlays page2's content onto this page

    @property
    def mediabox(self) -> RectangleObject: ...
    @property
    def cropbox(self) -> RectangleObject: ...  # settable; defaults to the mediabox
```

[Page Manipulation](./page-manipulation.md)

### Generic PDF Objects and Types

Low-level PDF object types for advanced manipulation, constants, and type definitions used throughout the library.

```python { .api }
# Generic objects (PyPDF2.generic)
class DictionaryObject(dict): ...
class ArrayObject(list): ...
class RectangleObject(ArrayObject): ...
class IndirectObject: ...

# Page Range Utilities
class PageRange:
    # zero-based, slice-like syntax, e.g. "0:2", "-1", "::2"
    def __init__(self, arg: Union[slice, "PageRange", str]): ...

    @staticmethod
    def valid(input: Any) -> bool: ...
    def to_slice(self) -> slice: ...
    def indices(self, n: int) -> Tuple[int, int, int]: ...

# Transformation
class Transformation:
    # ctm: current transformation matrix (a, b, c, d, e, f)
    def __init__(self, ctm: Tuple[float, float, float, float, float, float] = (1, 0, 0, 1, 0, 0)): ...

    @property
    def matrix(self) -> Tuple[Tuple[float, float, float], Tuple[float, float, float], Tuple[float, float, float]]: ...

    def scale(self, sx: Optional[float] = None, sy: Optional[float] = None) -> "Transformation": ...
    def translate(self, tx: float = 0, ty: float = 0) -> "Transformation": ...
    def rotate(self, rotation: float) -> "Transformation": ...  # rotation in degrees

# Enumerations
class PasswordType: ...  # NOT_DECRYPTED, USER_PASSWORD, OWNER_PASSWORD

# Utility functions
def parse_filename_page_ranges(args: List[Union[str, PageRange, None]]) -> List[Tuple[str, PageRange]]: ...

# Version information
__version__: str  # Current PyPDF2 version
```

[Types and Objects](./types-and-objects.md)

### Error Handling and Utilities

Exception classes for error handling (importable from `PyPDF2.errors`) and utilities such as standard paper sizes.

```python { .api }
class PyPdfError(Exception): ...
class PdfReadError(PyPdfError): ...
class WrongPasswordError(PdfReadError): ...
class FileNotDecryptedError(PdfReadError): ...

# Paper size utilities
class PaperSize:
    A0: Dimensions  # Dimensions is a (width, height) named tuple in PDF points
    A4: Dimensions
    # ... more sizes
```

[Errors and Utilities](./errors-and-utilities.md)