# PyPDF2

A pure-Python PDF library for splitting, merging, cropping, and transforming PDF files. PyPDF2 can also retrieve text and metadata from PDFs and add custom data, viewing options, and passwords to them. It is aimed at developers who need to process PDF documents programmatically.

## Package Information

- **Package Name**: PyPDF2
- **Language**: Python
- **Installation**: `pip install PyPDF2`
- **Version**: 2.12.1

## Core Imports

```python
import PyPDF2
```

Common imports for specific functionality:

```python
from PyPDF2 import PdfReader, PdfWriter, PdfMerger
from PyPDF2 import PageObject, Transformation
from PyPDF2 import DocumentInformation, PasswordType
from PyPDF2 import PageRange, PaperSize, parse_filename_page_ranges
```

## Basic Usage

```python
from PyPDF2 import PdfReader, PdfWriter, PdfMerger

# Reading a PDF file
reader = PdfReader("input.pdf")
print(f"Number of pages: {len(reader.pages)}")
# metadata is None when the file has no document information dictionary
if reader.metadata is not None:
    print(f"Title: {reader.metadata.title}")

# Extract text from the first page
page = reader.pages[0]
text = page.extract_text()
print(text)

# Writing a new PDF
writer = PdfWriter()
writer.add_page(page)
with open("output.pdf", "wb") as output_file:
    writer.write(output_file)

# Merging multiple PDFs
merger = PdfMerger()
merger.append("file1.pdf")
merger.append("file2.pdf")
merger.write("merged.pdf")
merger.close()
```

## Architecture

PyPDF2 is built around five core components:

- **PdfReader**: Reads and parses PDF files and provides access to pages, metadata, and document structure
- **PdfWriter**: Creates new PDF files and manages pages, metadata, and output generation
- **PdfMerger**: Combines multiple PDF files, with control over page ranges and outline (bookmark) handling
- **PageObject**: Represents an individual PDF page and supports transformation, text extraction, and manipulation
- **Generic Objects**: Low-level PDF object types (DictionaryObject, ArrayObject, etc.) for advanced manipulation

The library provides both high-level convenience classes and low-level generic objects, so it supports everything from simple PDF operations to manipulation at the level of the PDF specification.

## Capabilities

### PDF Reading

Read PDF files, access pages, extract metadata and text content, and open password-protected (encrypted) documents.

```python { .api }
class PdfReader:
    def __init__(self, stream: Union[str, bytes, Path], strict: bool = False, password: Union[None, str, bytes] = None): ...

    @property
    def pages(self) -> List[PageObject]: ...
    @property
    def metadata(self) -> Optional[DocumentInformation]: ...
    @property
    def is_encrypted(self) -> bool: ...

    def decrypt(self, password: Union[str, bytes]) -> PasswordType: ...
    def get_page(self, page_number: int) -> PageObject: ...
```

[PDF Reading](./pdf-reading.md)

### PDF Writing

Create new PDF files, add pages, insert blank pages, and add metadata, encryption, annotations, and JavaScript.

```python { .api }
class PdfWriter:
    def __init__(self, fileobj: Union[str, bytes] = ""): ...

    def add_page(self, page: PageObject) -> None: ...
    def insert_page(self, page: PageObject, index: int = 0) -> None: ...
    def add_blank_page(self, width: float, height: float) -> PageObject: ...
    def write(self, stream) -> None: ...
    def encrypt(self, user_password: str, owner_password: str = "", use_128bit: bool = True, permissions_flag: int = -1) -> None: ...
```

[PDF Writing](./pdf-writing.md)

### PDF Merging

Merge multiple PDF files with control over page ranges, bookmarks, and document properties.

```python { .api }
class PdfMerger:
    def __init__(self, strict: bool = False, fileobj: Union[Path, str, bytes] = ""): ...

    def merge(self, page_number: int, fileobj, outline_item: str = None, pages = None, import_outline: bool = True) -> None: ...
    def append(self, fileobj, outline_item: str = None, pages = None, import_outline: bool = True) -> None: ...
    def write(self, fileobj) -> None: ...
    def close(self) -> None: ...
```

[PDF Merging](./pdf-merging.md)

### Page Manipulation

Transform, scale, rotate, crop, and merge individual PDF pages with precise control over page geometry.

```python { .api }
class PageObject:
    def extract_text(self, visitor_text=None) -> str: ...
    def scale(self, sx: float, sy: float) -> None: ...
    def rotate(self, angle: int) -> 'PageObject': ...
    def merge_page(self, page2: 'PageObject') -> None: ...

    @property
    def mediabox(self) -> RectangleObject: ...
    @property
    def cropbox(self) -> RectangleObject: ...
```

[Page Manipulation](./page-manipulation.md)

### Generic PDF Objects and Types

Low-level PDF object types for advanced manipulation, constants, and type definitions used throughout the library.

```python { .api }
class DictionaryObject(dict): ...
class ArrayObject(list): ...
class RectangleObject(ArrayObject): ...
class IndirectObject: ...

# Page Range Utilities
class PageRange:
    def __init__(self, arg: Union[slice, "PageRange", str]): ...

    @staticmethod
    def valid(input: Any) -> bool: ...
    def to_slice(self) -> slice: ...
    def indices(self, n: int) -> Tuple[int, int, int]: ...

# Transformation
class Transformation:
    def __init__(self, ctm: Tuple[float, float, float, float, float, float] = (1, 0, 0, 1, 0, 0)): ...

    @property
    def matrix(self) -> Tuple[Tuple[float, float, float], Tuple[float, float, float], Tuple[float, float, float]]: ...

    def scale(self, sx: Optional[float] = None, sy: Optional[float] = None) -> "Transformation": ...
    def translate(self, tx: float = 0, ty: float = 0) -> "Transformation": ...
    def rotate(self, rotation: float) -> "Transformation": ...

# Enumerations
class PasswordType: ...

# Utility functions
def parse_filename_page_ranges(args: List[Union[str, PageRange, None]]) -> List[Tuple[str, PageRange]]: ...

# Version information
__version__: str  # Current PyPDF2 version
```

[Types and Objects](./types-and-objects.md)

### Error Handling and Utilities

Exception classes raised by the library, plus utility types such as standard paper sizes.

```python { .api }
class PyPdfError(Exception): ...
class PdfReadError(PyPdfError): ...
class WrongPasswordError(PdfReadError): ...
class FileNotDecryptedError(PdfReadError): ...

# Paper size utilities
class PaperSize:
    A0: Dimensions
    A4: Dimensions
    # ... more sizes
```

[Errors and Utilities](./errors-and-utilities.md)