
tessl/pypi-pikepdf

Read and write PDFs with Python, powered by qpdf

- **Workspace**: tessl
- **Visibility**: Public
- **Describes**: pypipkg:pypi/pikepdf@9.10.x

To install, run

npx @tessl/cli install tessl/pypi-pikepdf@9.10.0

# pikepdf

A Python library for reading, writing, and manipulating PDF files, built on the mature qpdf C++ library. It provides a Pythonic API for page manipulation, metadata editing, form field handling, encryption and decryption, and content transformation. Because parsing and writing are delegated to qpdf, it is typically faster than pure-Python alternatives.

## Package Information

- **Package Name**: pikepdf
- **Language**: Python
- **Installation**: `pip install pikepdf`

## Core Imports

```python
import pikepdf
```

Commonly imported when working with PDFs:

```python
from pikepdf import Pdf
```

## Basic Usage

```python
import pikepdf

# Open an existing PDF
pdf = pikepdf.open('input.pdf')

# Equivalent: open through the Pdf class directly
# pdf = pikepdf.Pdf.open('input.pdf')

# Create a new empty PDF
new_pdf = pikepdf.new()

# Add a blank page
new_pdf.add_blank_page(page_size=(612, 792))  # Letter size

# Access pages
first_page = pdf.pages[0]

# Rotate a page
first_page.rotate(90, relative=True)

# Copy pages between PDFs
new_pdf.pages.append(first_page)

# Save the PDF
pdf.save('output.pdf')
new_pdf.save('new_document.pdf')

# Always close PDFs when done
pdf.close()
new_pdf.close()
```

## Architecture

pikepdf is built on a layered architecture that provides both low-level control and high-level convenience:

- **Core Layer (_core)**: C++ bindings to the QPDF library that provide fundamental PDF operations
- **Object Layer**: Python wrappers for PDF data types (Array, Dictionary, Name, String, Stream)
- **Model Layer**: High-level abstractions for complex operations (Image, Metadata, Outlines, Encryption)
- **Helper Layer**: Utility functions and convenience methods for common operations

This design lets pikepdf work with PDF versions 1.1 through 1.7, preserve PDF/A conformance when saving, and keep performance-sensitive work in the C++ layer.

## Capabilities

### Core PDF Operations

Fundamental PDF document operations including opening, creating, saving, and basic manipulation of PDF files and their structure.

```python { .api }
class Pdf:
    @staticmethod
    def open(filename, *, password=None, hex_password=None, ignore_xref_streams=False,
             suppress_warnings=True, attempt_recovery=True, inherit_page_attributes=True,
             access_mode=AccessMode.default) -> Pdf: ...

    @staticmethod
    def new() -> Pdf: ...

    def save(self, filename, *, static_id=False, preserve_pdfa=True,
             min_version=None, force_version=None, fix_metadata_version=True,
             compress_streams=True, stream_decode_level=None,
             object_stream_mode=ObjectStreamMode.preserve,
             normalize_content=False, linearize=False, qdf=False,
             progress=None, encryption=None, samefile_check=True) -> None: ...

    def close(self) -> None: ...

def open(filename, **kwargs) -> Pdf: ...  # Alias for Pdf.open()
def new() -> Pdf: ...  # Alias for Pdf.new()
```

[Core PDF Operations](./core-operations.md)

### PDF Objects and Data Types

PDF object types and data structures for manipulating the internal representation of PDF content, including arrays, dictionaries, names, strings, and streams.

```python { .api }
class Object:
    def is_owned_by(self, possible_owner: Pdf) -> bool: ...
    def same_owner_as(self, other: Object) -> bool: ...
    def with_same_owner_as(self, other: Object) -> Object: ...
    @staticmethod
    def parse(data: str, *, pdf_context: Pdf = None) -> Object: ...
    def unparse(self, *, resolved: bool = False) -> str: ...

class Array(Object): ...
class Dictionary(Object): ...
class Name(Object): ...
class String(Object): ...
class Stream(Object): ...
```

[PDF Objects and Data Types](./objects.md)

### Page Operations

Page-level operations including manipulation, rotation, content parsing, overlays, and coordinate transformations.

```python { .api }
class Page(Object):
    def rotate(self, angle: int, *, relative: bool = True) -> None: ...
    def add_overlay(self, other: Page) -> None: ...
    def add_underlay(self, other: Page) -> None: ...
    def parse_contents(self) -> list[ContentStreamInstruction]: ...
    @property
    def mediabox(self) -> Rectangle: ...
    @property
    def cropbox(self) -> Rectangle: ...
```

[Page Operations](./pages.md)

### Forms and Annotations

Interactive PDF elements including form fields, annotations, and user input handling with comprehensive field type support.

```python { .api }
class AcroForm:
    @property
    def exists(self) -> bool: ...
    @property
    def fields(self) -> list[AcroFormField]: ...
    def add_field(self, field: AcroFormField) -> None: ...
    def remove_fields(self, names: list[str]) -> None: ...

class AcroFormField:
    @property
    def field_type(self) -> str: ...
    @property
    def fully_qualified_name(self) -> str: ...
    def set_value(self, value) -> None: ...

class Annotation(Object):
    @property
    def subtype(self) -> Name: ...
    @property
    def rect(self) -> Rectangle: ...
```

[Forms and Annotations](./forms.md)

### Images and Graphics

Image extraction, manipulation, and graphics operations including support for various formats and color spaces.

```python { .api }
class PdfImage:
    def extract_to(self, *, fileprefix: str = 'image') -> str: ...
    def as_pil_image(self) -> Any: ...  # PIL.Image
    @property
    def width(self) -> int: ...
    @property
    def height(self) -> int: ...
    @property
    def bpc(self) -> int: ...  # bits per component
    @property
    def colorspace(self) -> Name: ...

class PdfInlineImage:
    def as_pil_image(self) -> Any: ...  # PIL.Image
```

[Images and Graphics](./images.md)

### Encryption and Security

PDF encryption, decryption, password handling, and permission management for document security.

```python { .api }
class Encryption:
    def __init__(self, *, owner: str = '', user: str = '', R: int = 6,
                 allow: Permissions = None, aes: bool = True,
                 metadata: bool = True) -> None: ...

class Permissions:
    accessibility: bool
    assemble: bool
    extract: bool
    modify_annotation: bool
    modify_assembly: bool
    modify_form: bool
    modify_other: bool
    print_lowres: bool
    print_highres: bool
```

[Encryption and Security](./encryption.md)

### Metadata and Document Properties

Document metadata, XMP data, and PDF properties including titles, authors, creation dates, and custom metadata fields.

```python { .api }
class PdfMetadata:
    def __init__(self, pdf: Pdf, *, sync_docinfo: bool = True) -> None: ...
    @property
    def pdfa_status(self) -> str: ...
    def load_from_docinfo(self, docinfo: Dictionary, *, delete_missing: bool = False) -> None: ...
```

[Metadata and Document Properties](./metadata.md)

### Outlines and Bookmarks

Document navigation structure including bookmarks, table of contents, and document outline management.

```python { .api }
class Outline:
    @property
    def root(self) -> list[OutlineItem]: ...  # top-level bookmark entries
    def open_all(self) -> None: ...
    def close_all(self) -> None: ...

class OutlineItem:
    @property
    def title(self) -> str: ...
    @property
    def destination(self) -> PageLocation: ...
    @property
    def action(self) -> Dictionary: ...

def make_page_destination(pdf: Pdf, page_num: int, *, view_type: str = 'Fit') -> Array: ...
```

[Outlines and Bookmarks](./outlines.md)

### Content Stream Processing

Low-level content stream parsing, token filtering, and PDF operator manipulation for advanced content processing.

```python { .api }
def parse_content_stream(page_or_stream) -> list[ContentStreamInstruction]: ...
def unparse_content_stream(instructions: list[ContentStreamInstruction]) -> bytes: ...

class ContentStreamInstruction:
    @property
    def operands(self) -> list[Object]: ...
    @property
    def operator(self) -> Operator: ...

class TokenFilter:
    def handle_token(self, token: Token) -> None: ...

class Token:
    @property
    def type_(self) -> TokenType: ...
    @property
    def raw_value(self) -> bytes: ...
    @property
    def value(self) -> Object: ...
```

[Content Stream Processing](./content-streams.md)

### File Attachments

Embedded file management including attachment, extraction, and metadata handling for portfolio PDFs and file attachments.

```python { .api }
class AttachedFileSpec:
    @staticmethod
    def from_filepath(pdf: Pdf, path: str, *, description: str = '',
                      relationship: str = '/Unspecified') -> AttachedFileSpec: ...
    def get_file(self) -> AttachedFile: ...  # call .read_bytes() for the contents
    def get_all_filenames(self) -> dict[str, str]: ...
    @property
    def filename(self) -> str: ...
    @property
    def description(self) -> str: ...
```

[File Attachments](./attachments.md)

### Advanced Operations

Specialized operations including matrix transformations, coordinate systems, job interface, and tree structures for advanced PDF manipulation.

```python { .api }
class Matrix:
    def __init__(self, *args) -> None: ...
    @staticmethod
    def identity() -> Matrix: ...
    def translated(self, dx: float, dy: float) -> Matrix: ...
    def scaled(self, sx: float, sy: float) -> Matrix: ...
    def rotated(self, angle_degrees: float) -> Matrix: ...

class Rectangle:
    def __init__(self, llx: float, lly: float, urx: float, ury: float) -> None: ...
    @property
    def width(self) -> float: ...
    @property
    def height(self) -> float: ...

class Job:
    def run(self) -> int: ...
    def check_configuration(self) -> bool: ...
    def create_pdf(self) -> Pdf: ...
```

[Advanced Operations](./advanced.md)

## Types

```python { .api }
from enum import Enum

class ObjectType(Enum):
    uninitialized = ...
    null = ...
    boolean = ...
    integer = ...
    real = ...
    string = ...
    name_ = ...
    array = ...
    dictionary = ...
    stream = ...
    operator = ...
    inlineimage = ...

class AccessMode(Enum):
    default = ...
    mmap = ...
    mmap_only = ...
    stream = ...

class StreamDecodeLevel(Enum):
    none = ...
    generalized = ...
    specialized = ...
    all = ...

class ObjectStreamMode(Enum):
    disable = ...
    preserve = ...
    generate = ...
```

## Exception Hierarchy

```python { .api }
# Core exceptions
class PdfError(Exception): ...
class PasswordError(PdfError): ...
class DataDecodingError(PdfError): ...
class JobUsageError(PdfError): ...
class ForeignObjectError(PdfError): ...
class DeletedObjectError(PdfError): ...

# Model exceptions
class DependencyError(Exception): ...
class OutlineStructureError(Exception): ...
class UnsupportedImageTypeError(Exception): ...
class InvalidPdfImageError(Exception): ...
class HifiPrintImageNotTranscodableError(Exception): ...
```

## Models Module

Access to higher-level PDF constructs and specialized functionality through the models submodule.

```python { .api }
import pikepdf.models

# Direct access to model classes and functions:
# pikepdf.models.PdfMetadata
# pikepdf.models.EncryptionInfo
# pikepdf.models.ContentStreamInstructions
# pikepdf.models.UnparseableContentStreamInstructions

# All model classes are also available directly from main pikepdf module
```

## Settings and Configuration

```python { .api }
# Available from the pikepdf.settings module
def get_decimal_precision() -> int: ...
def set_decimal_precision(precision: int) -> None: ...
def set_flate_compression_level(level: int) -> None: ...
```

## Version Information

```python { .api }
__version__: str  # pikepdf package version
__libqpdf_version__: str  # Underlying QPDF library version
```