tessl/pypi-py-pdf2

A pure-python PDF library capable of splitting, merging, cropping, and transforming PDF files.

Workspace: tessl
Visibility: Public
Describes: pypipkg:pypi/pypdf2@2.12.x

To install, run

npx @tessl/cli install tessl/pypi-py-pdf2@2.12.0

# PyPDF2

PyPDF2 is a pure-Python PDF library for splitting, merging, cropping, and transforming PDF files. It can also retrieve text and metadata from PDFs and add custom data, viewing options, and passwords to them.

## Package Information

- **Package Name**: PyPDF2
- **Language**: Python
- **Installation**: `pip install PyPDF2`
- **Version**: 2.12.1

## Core Imports

```python
import PyPDF2
```

Common patterns for specific functionality:

```python
from PyPDF2 import PdfReader, PdfWriter, PdfMerger
from PyPDF2 import PageObject, Transformation
from PyPDF2 import DocumentInformation, PasswordType
from PyPDF2 import PageRange, PaperSize, parse_filename_page_ranges
```

## Basic Usage

```python
from PyPDF2 import PdfReader, PdfWriter, PdfMerger

# Reading a PDF file
reader = PdfReader("input.pdf")
print(f"Number of pages: {len(reader.pages)}")
# metadata is None when the file has no document information dictionary
if reader.metadata is not None:
    print(f"Title: {reader.metadata.title}")

# Extract text from first page
page = reader.pages[0]
text = page.extract_text()
print(text)

# Writing a new PDF
writer = PdfWriter()
writer.add_page(page)
with open("output.pdf", "wb") as output_file:
    writer.write(output_file)

# Merging multiple PDFs
merger = PdfMerger()
merger.append("file1.pdf")
merger.append("file2.pdf")
merger.write("merged.pdf")
merger.close()
```

## Architecture

PyPDF2 is built around five core components:

- **PdfReader**: Reads and parses PDF files, provides access to pages, metadata, and document structure
- **PdfWriter**: Creates new PDF files, manages pages, metadata, and output generation
- **PdfMerger**: Combines multiple PDF files with advanced merging options and outline management
- **PageObject**: Represents individual PDF pages with transformation, text extraction, and manipulation capabilities
- **Generic Objects**: Low-level PDF object types (DictionaryObject, ArrayObject, etc.) for advanced manipulation

The library maintains both high-level convenience classes and low-level generic objects, enabling everything from simple PDF operations to advanced PDF specification-level manipulation.

## Capabilities

### PDF Reading

Read PDF files, access pages, extract metadata and text content, and handle password-protected (encrypted) documents.

```python { .api }
class PdfReader:
    def __init__(self, stream: Union[str, bytes, Path], strict: bool = False, password: Union[None, str, bytes] = None): ...

    @property
    def pages(self) -> List[PageObject]: ...
    @property
    def metadata(self) -> DocumentInformation: ...
    @property
    def is_encrypted(self) -> bool: ...

    def decrypt(self, password: Union[str, bytes]) -> PasswordType: ...
    def get_page(self, page_number: int) -> PageObject: ...
```

[PDF Reading](./pdf-reading.md)

### PDF Writing

Create new PDF files, add or insert pages (including blank ones), and add metadata, encryption, annotations, and JavaScript.

```python { .api }
class PdfWriter:
    def __init__(self, fileobj: Union[str, bytes] = ""): ...

    def add_page(self, page: PageObject) -> None: ...
    def insert_page(self, page: PageObject, index: int = 0) -> None: ...
    def add_blank_page(self, width: float, height: float) -> PageObject: ...
    def write(self, stream) -> None: ...
    def encrypt(self, user_password: str, owner_password: str = "", use_128bit: bool = True, permissions_flag: int = -1) -> None: ...
```

[PDF Writing](./pdf-writing.md)

### PDF Merging

Merge multiple PDF files with control over page ranges, bookmarks, and document properties.

```python { .api }
class PdfMerger:
    def __init__(self, strict: bool = False, fileobj: Union[Path, str, bytes] = ""): ...

    def merge(self, page_number: int, fileobj, outline_item: str = None, pages = None, import_outline: bool = True) -> None: ...
    def append(self, fileobj, outline_item: str = None, pages = None, import_outline: bool = True) -> None: ...
    def write(self, fileobj) -> None: ...
    def close(self) -> None: ...
```

[PDF Merging](./pdf-merging.md)

### Page Manipulation

Transform, scale, rotate, crop, and merge individual PDF pages with precise control over page geometry.

```python { .api }
class PageObject:
    def extract_text(self, visitor_text=None) -> str: ...
    def scale(self, sx: float, sy: float) -> None: ...
    def rotate(self, angle: int) -> 'PageObject': ...
    def merge_page(self, page2: 'PageObject') -> None: ...

    @property
    def mediabox(self) -> RectangleObject: ...
    @property
    def cropbox(self) -> RectangleObject: ...
```

[Page Manipulation](./page-manipulation.md)

### Generic PDF Objects and Types

Low-level PDF object types for advanced manipulation, constants, and type definitions used throughout the library.

```python { .api }
class DictionaryObject(dict): ...
class ArrayObject(list): ...
class RectangleObject(ArrayObject): ...
class IndirectObject: ...

# Page Range Utilities
class PageRange:
    def __init__(self, arg: Union[slice, "PageRange", str]): ...

    @staticmethod
    def valid(input: Any) -> bool: ...
    def to_slice(self) -> slice: ...
    def indices(self, n: int) -> Tuple[int, int, int]: ...

# Transformation
class Transformation:
    def __init__(self, ctm: Tuple[float, float, float, float, float, float] = (1, 0, 0, 1, 0, 0)): ...

    @property
    def matrix(self) -> Tuple[Tuple[float, float, float], Tuple[float, float, float], Tuple[float, float, float]]: ...

    def scale(self, sx: Optional[float] = None, sy: Optional[float] = None) -> "Transformation": ...
    def translate(self, tx: float = 0, ty: float = 0) -> "Transformation": ...
    def rotate(self, rotation: float) -> "Transformation": ...

# Enumerations
class PasswordType: ...

# Utility functions
def parse_filename_page_ranges(args: List[Union[str, PageRange, None]]) -> List[Tuple[str, PageRange]]: ...

# Version information
__version__: str  # Current PyPDF2 version
```

[Types and Objects](./types-and-objects.md)

### Error Handling and Utilities

Exception classes for comprehensive error handling and utility functions for specialized operations.

```python { .api }
class PyPdfError(Exception): ...
class PdfReadError(PyPdfError): ...
class WrongPasswordError(PdfReadError): ...
class FileNotDecryptedError(PdfReadError): ...

# Paper size utilities
class PaperSize:
    A0: Dimensions
    A4: Dimensions
    # ... more sizes
```

[Errors and Utilities](./errors-and-utilities.md)