tessl/pypi-pypdf

A pure-python PDF library capable of splitting, merging, cropping, and transforming PDF files

- Workspace: tessl
- Visibility: Public
- Describes: pkg:pypi/pypdf@6.0.x

To install, run

npx @tessl/cli install tessl/pypi-pypdf@6.0.0

# pypdf

A pure-python PDF library capable of splitting, merging, cropping, and transforming PDF files. pypdf can also add custom data, viewing options, and passwords to PDF files, while providing comprehensive text and metadata extraction capabilities.

## Package Information

- **Package Name**: pypdf
- **Language**: Python
- **Installation**: `pip install pypdf`
- **Optional Dependencies**: `pip install pypdf[crypto]` for AES encryption/decryption

## Core Imports

```python
from pypdf import PdfReader, PdfWriter
```

For page operations:

```python
from pypdf import PdfReader, PdfWriter, PageObject, Transformation
```

For working with metadata, page ranges, and paper sizes:

```python
from pypdf import DocumentInformation, PageRange, PaperSize
```

## Basic Usage

```python
from pypdf import PdfReader, PdfWriter

# Reading a PDF
reader = PdfReader("example.pdf")
number_of_pages = len(reader.pages)
page = reader.pages[0]
text = page.extract_text()

# Writing a PDF
writer = PdfWriter()
writer.add_page(page)
with open("output.pdf", "wb") as output_file:
    writer.write(output_file)

# Merging PDFs
reader1 = PdfReader("document1.pdf")
reader2 = PdfReader("document2.pdf")

writer = PdfWriter()
for page in reader1.pages:
    writer.add_page(page)
for page in reader2.pages:
    writer.add_page(page)

with open("merged.pdf", "wb") as output_file:
    writer.write(output_file)
```

## Architecture

pypdf is built around two core classes, `PdfReader` and `PdfWriter`, and a set of supporting components:

- **PdfReader**: Handles PDF file parsing, decryption, and provides access to pages, metadata, and document structure
- **PdfWriter**: Manages PDF creation, page manipulation, encryption, and output generation
- **PageObject**: Represents individual PDF pages with comprehensive transformation and content manipulation capabilities
- **Generic Objects**: Low-level PDF object types for advanced manipulation (DictionaryObject, ArrayObject, StreamObject, etc.)
- **Annotations**: Complete annotation system for interactive PDF elements
- **Metadata**: Document information handling and XMP metadata support

## Capabilities

### PDF Reading and Writing

Core functionality for opening, reading, creating, and saving PDF documents. Includes support for encrypted PDFs, incremental updates, and context manager usage patterns.

```python { .api }
class PdfReader:
    def __init__(self, stream, strict: bool = False, password: str | None = None): ...
    def decrypt(self, password: str) -> PasswordType: ...
    def close(self) -> None: ...

class PdfWriter:
    def __init__(self, clone_from=None, incremental: bool = False): ...
    def add_page(self, page: PageObject) -> PageObject: ...
    def write(self, stream) -> tuple[bool, IO[Any]]: ...
    def encrypt(self, user_password: str, owner_password: str | None = None, **kwargs) -> None: ...
```

[PDF Reading and Writing](./reading-writing.md)

### Page Operations

Comprehensive page manipulation including transformations (scaling, rotation, translation), page merging, cropping, and geometric operations. Support for blank page creation and advanced transformation matrices.

```python { .api }
class PageObject:
    def extract_text(self, extraction_mode: str = "plain", **kwargs) -> str: ...
    def scale(self, sx: float, sy: float) -> None: ...
    def rotate(self, angle: int) -> PageObject: ...
    def merge_page(self, page2: PageObject) -> None: ...
    def merge_transformed_page(self, page2: PageObject, ctm, expand: bool = False) -> None: ...

class Transformation:
    def __init__(self, ctm=(1, 0, 0, 1, 0, 0)): ...
    def translate(self, tx: float = 0, ty: float = 0) -> Transformation: ...
    def scale(self, sx: float = 1, sy: float | None = None) -> Transformation: ...
    def rotate(self, rotation: float) -> Transformation: ...
```

[Page Operations](./page-operations.md)

### Text Extraction

Advanced text extraction capabilities with multiple extraction modes, layout preservation, and customizable text processing options.

```python { .api }
# Method of PageObject
def extract_text(
    self,
    orientations: tuple | int = (0, 90, 180, 270),
    space_width: float = 200.0,
    visitor_operand_before=None,
    visitor_operand_after=None,
    visitor_text=None,
    extraction_mode: str = "plain"
) -> str: ...
```

[Text Extraction](./text-extraction.md)

### Metadata and Document Information

Access and manipulation of PDF metadata, document properties, XMP information, and custom document attributes.

```python { .api }
from datetime import datetime

class DocumentInformation:
    @property
    def title(self) -> str | None: ...
    @property
    def author(self) -> str | None: ...
    @property
    def subject(self) -> str | None: ...
    @property
    def creator(self) -> str | None: ...
    @property
    def producer(self) -> str | None: ...
    @property
    def creation_date(self) -> datetime | None: ...
    @property
    def modification_date(self) -> datetime | None: ...
```

[Metadata](./metadata.md)

### Annotations

Complete annotation system supporting markup annotations (highlights, text annotations, shapes) and interactive elements (links, popups) with full customization capabilities.

```python { .api }
class AnnotationDictionary: ...
class Highlight: ...
class Text: ...
class Link: ...
class FreeText: ...
```

[Annotations](./annotations.md)

### Utilities and Helpers

Supporting utilities including page ranges, standard paper sizes, constants, error handling, and type definitions for enhanced developer experience.

```python { .api }
class PageRange:
    def __init__(self, arg): ...
    def indices(self, n: int) -> tuple[int, int, int]: ...

class PaperSize:
    A4: tuple[float, float]
    A3: tuple[float, float]
    # ... other standard sizes

def parse_filename_page_ranges(fnprs: list[str]) -> list[tuple[str, PageRange]]: ...
```

[Utilities](./utilities.md)

### Form Fields and Interactive Elements

Comprehensive form field manipulation including reading field values, updating form data, setting field appearance properties, and managing interactive PDF forms.

```python { .api }
# Methods of PdfWriter
def update_page_form_field_values(
    self,
    page: PageObject | list[PageObject] | None,
    fields: dict[str, str | list[str] | tuple[str, str, float]],
    flags: int = 0,
    auto_regenerate: bool = True,
    flatten: bool = False
) -> None: ...

def set_need_appearances_writer(self, state: bool = True) -> None: ...

def reattach_fields(self, page: PageObject | None = None) -> list[DictionaryObject]: ...
```

[Form Fields](./form-fields.md)

## Types

```python { .api }
from enum import IntEnum, IntFlag

class PasswordType(IntEnum):
    NOT_DECRYPTED = 0
    USER_PASSWORD = 1
    OWNER_PASSWORD = 2

class ImageType(IntFlag):
    NONE = 0
    XOBJECT_IMAGES = 1
    INLINE_IMAGES = 2
    DRAWING_IMAGES = 4
    IMAGES = XOBJECT_IMAGES | INLINE_IMAGES
    ALL = XOBJECT_IMAGES | INLINE_IMAGES | DRAWING_IMAGES

class ObjectDeletionFlag(IntFlag):
    NONE = 0
    TEXT = 1
    LINKS = 2
    ATTACHMENTS = 4
    OBJECTS_3D = 8
    ALL_ANNOTATIONS = 16
    XOBJECT_IMAGES = 32
    INLINE_IMAGES = 64
    DRAWING_IMAGES = 128
    IMAGES = XOBJECT_IMAGES | INLINE_IMAGES | DRAWING_IMAGES
```