# Types and Objects

Low-level PDF object types for advanced manipulation, constants, type definitions, and utility functions used throughout the PyPDF2 library. These components provide the foundation for PDF specification-level operations.

## Capabilities

### Generic PDF Objects

Base classes and data structures that represent PDF objects according to the PDF specification.

```python { .api }
class PdfObject:
    """Base class for all PDF objects."""

class NullObject(PdfObject):
    """PDF null object representation."""

class BooleanObject(PdfObject):
    """PDF boolean object (true/false)."""

class IndirectObject(PdfObject):
    """PDF indirect object reference."""

    @property
    def idnum(self) -> int:
        """Object ID number."""

    @property
    def generation(self) -> int:
        """Object generation number."""

    @property
    def pdf(self):
        """Associated PDF reader."""

class FloatObject(float, PdfObject):
    """PDF floating-point number object."""

class NumberObject(int, PdfObject):
    """PDF integer number object."""

class ByteStringObject(bytes, PdfObject):
    """PDF byte string object."""

class TextStringObject(str, PdfObject):
    """PDF text string object."""

class NameObject(str, PdfObject):
    """PDF name object (starts with /)."""
```
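
Several of the classes listed here inherit from both a Python built-in and `PdfObject` (for example `NumberObject(int, PdfObject)`), so instances can be used wherever the built-in type is expected. The following is a minimal sketch of that pattern using stand-in classes. `SketchPdfObject`, `SketchNumber`, and `SketchName` are hypothetical names for illustration only and are not part of PyPDF2.

```python
class SketchPdfObject:
    """Hypothetical stand-in for a common PDF object base class."""

    def get_object(self):
        # In this sketch a direct object simply resolves to itself.
        return self


class SketchNumber(int, SketchPdfObject):
    """Integer that is also tagged as a PDF object."""


class SketchName(str, SketchPdfObject):
    """String that is also tagged as a PDF object; names start with '/'."""


count = SketchNumber(3)
key = SketchName("/Type")

# Built-in behaviour is inherited unchanged.
total = count + 2
is_name = key.startswith("/")

# Both values can still be recognised through the shared base class.
both_tagged = all(isinstance(v, SketchPdfObject) for v in (count, key))
print(total, is_name, both_tagged)
```

The design choice this illustrates is that type checks against the shared base class and ordinary arithmetic or string operations can coexist on the same value.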

### Data Structure Objects

Collections and containers for PDF data structures.

```python { .api }
class ArrayObject(list, PdfObject):
    """PDF array object (list-like)."""

class DictionaryObject(dict, PdfObject):
    """PDF dictionary object (dict-like)."""

class TreeObject(DictionaryObject):
    """PDF tree structure for hierarchical data."""

class StreamObject(PdfObject):
    """PDF stream object containing binary data."""

class DecodedStreamObject(StreamObject):
    """Decoded (uncompressed) PDF stream."""

class EncodedStreamObject(StreamObject):
    """Encoded (compressed) PDF stream."""

class ContentStream(DecodedStreamObject):
    """PDF content stream with page content operations."""

class Field(TreeObject):
    """PDF form field object."""
```

### Navigation and Annotation Objects

Objects for document navigation, bookmarks, and annotations.

```python { .api }
class Destination(DictionaryObject):
    """PDF destination for navigation."""

    @property
    def title(self) -> Optional[str]:
        """Destination title."""

    @property
    def page(self):
        """Target page reference."""

    @property
    def typ(self) -> str:
        """Destination type (fit type)."""

class OutlineItem(DictionaryObject):
    """PDF outline item (bookmark)."""

    @property
    def title(self) -> Optional[str]:
        """Bookmark title."""

    @property
    def page(self):
        """Target page reference."""

    @property
    def parent(self):
        """Parent outline item."""

    @property
    def children(self):
        """Child outline items."""

class Bookmark(OutlineItem):
    """DEPRECATED: Use OutlineItem instead."""

class AnnotationBuilder:
    """Builder for creating PDF annotations."""

    # Methods for building various annotation types
    # Implementation depends on annotation type
```

### Utility Objects and Functions

Helper classes and functions for PDF manipulation.

```python { .api }
class PageRange:
    """Slice-like representation of page ranges."""

    def __init__(self, arg: Union[slice, "PageRange", str]):
        """
        Create a PageRange from various input types.

        Args:
            arg: Range specification (string, slice, or PageRange)
        """

    def to_slice(self) -> slice:
        """Convert to Python slice object."""

    def indices(self, n: int) -> Tuple[int, int, int]:
        """
        Get slice indices for given length.

        Args:
            n (int): Total length

        Returns:
            tuple: (start, stop, step) indices
        """

    @staticmethod
    def valid(input: Any) -> bool:
        """
        Check if input is valid for PageRange.

        Args:
            input: Input to validate

        Returns:
            bool: True if valid
        """

class PaperSize:
    """Standard paper size constants."""

    A0: 'Dimensions'  # 2384 x 3371 points
    A1: 'Dimensions'  # 1685 x 2384 points
    A2: 'Dimensions'  # 1190 x 1685 points
    A3: 'Dimensions'  # 842 x 1190 points
    A4: 'Dimensions'  # 595 x 842 points
    A5: 'Dimensions'  # 420 x 595 points
    A6: 'Dimensions'  # 298 x 420 points
    A7: 'Dimensions'  # 210 x 298 points
    A8: 'Dimensions'  # 147 x 210 points
    C4: 'Dimensions'  # 649 x 918 points (envelope)

class PasswordType:
    """Enumeration for password validation results."""

    NOT_DECRYPTED: int = 0
    USER_PASSWORD: int = 1
    OWNER_PASSWORD: int = 2

# Utility functions
def create_string_object(string: str, forced_encoding=None) -> Union[TextStringObject, ByteStringObject]:
    """
    Create appropriate string object based on content.

    Args:
        string (str): String content
        forced_encoding (str, optional): Force specific encoding

    Returns:
        Union[TextStringObject, ByteStringObject]: Appropriate string object
    """

def encode_pdfdocencoding(unicode_string: str) -> bytes:
    """
    Encode string using PDF document encoding.

    Args:
        unicode_string (str): Unicode string to encode

    Returns:
        bytes: Encoded bytes
    """

def decode_pdfdocencoding(byte_string: bytes) -> str:
    """
    Decode bytes using PDF document encoding.

    Args:
        byte_string (bytes): Bytes to decode

    Returns:
        str: Decoded string
    """

def hex_to_rgb(color: str) -> Tuple[float, float, float]:
    """
    Convert hex color to RGB tuple.

    Args:
        color (str): Hex color string (e.g., "#FF0000")

    Returns:
        tuple: (red, green, blue) values 0.0-1.0
    """

def read_object(stream, pdf) -> PdfObject:
    """
    Read a PDF object from stream.

    Args:
        stream: Input stream
        pdf: PDF reader reference

    Returns:
        PdfObject: Parsed PDF object
    """

def parse_filename_page_ranges(args: List[Union[str, PageRange, None]]) -> List[Tuple[str, PageRange]]:
    """
    Parse filename and page range arguments.

    Args:
        args: Command-line style arguments

    Returns:
        list: List of (filename, page_range) tuples
    """
```

### Type Definitions

Type aliases and definitions used throughout the library.

```python { .api }
# Border array for annotations
BorderArrayType = List[Union[NameObject, NumberObject, ArrayObject]]

# Outline item types
OutlineItemType = Union[OutlineItem, Destination]

# PDF fit types for destinations
FitType = Literal["/Fit", "/XYZ", "/FitH", "/FitV", "/FitR", "/FitB", "/FitBH", "/FitBV"]

# Zoom argument types
ZoomArgType = Union[NumberObject, NullObject, float]
ZoomArgsType = List[ZoomArgType]

# Complex outline structure type
OutlineType = List[Union[OutlineItemType, List]]

# Page layout types
LayoutType = Literal[
    "/SinglePage", "/OneColumn", "/TwoColumnLeft", "/TwoColumnRight",
    "/TwoPageLeft", "/TwoPageRight"
]

# Page mode types
PagemodeType = Literal[
    "/UseNone", "/UseOutlines", "/UseThumbs", "/FullScreen",
    "/UseOC", "/UseAttachments"
]

# Page range specification types
PageRangeSpec = Union[str, PageRange, Tuple[int, int], Tuple[int, int, int], List[int]]

# Dimension type for paper sizes
class Dimensions:
    """Represents paper dimensions."""

    def __init__(self, width: float, height: float):
        """
        Create dimensions.

        Args:
            width (float): Width in points
            height (float): Height in points
        """
        self.width = width
        self.height = height
```
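
`Literal` aliases such as `FitType` are checked by static type checkers, not at runtime. If a runtime check is wanted, one possible approach is to read the allowed values back with `typing.get_args`. The sketch below redefines the alias locally so it runs without PyPDF2 installed; `is_fit_type` is a hypothetical helper, not a library function.

```python
from typing import Literal, get_args

# Same members as the FitType alias in the listing, redefined here so the
# snippet is self-contained.
FitType = Literal["/Fit", "/XYZ", "/FitH", "/FitV", "/FitR", "/FitB", "/FitBH", "/FitBV"]


def is_fit_type(value: str) -> bool:
    """Return True if value is one of the names allowed by FitType."""
    return value in get_args(FitType)


print(is_fit_type("/FitH"))
print(is_fit_type("/Zoom"))
```

Deriving the check from the alias keeps the runtime validation and the type annotation from drifting apart.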

## Usage Examples

### Working with Generic Objects

```python
from PyPDF2 import PdfReader
from PyPDF2.generic import DictionaryObject, ArrayObject

reader = PdfReader("document.pdf")

# Access raw PDF objects
for page in reader.pages:
    # Pages are DictionaryObject instances
    if isinstance(page, DictionaryObject):
        # Access dictionary entries
        mediabox = page.get("/MediaBox")
        if isinstance(mediabox, ArrayObject):
            print(f"MediaBox: {[float(x) for x in mediabox]}")

        # Check for resources; .get() can return an indirect reference,
        # so resolve it with get_object() before using it as a dictionary
        resources = page.get("/Resources")
        if resources is not None:
            resources = resources.get_object()
            fonts = resources.get("/Font", DictionaryObject()).get_object()
            print(f"Fonts: {list(fonts.keys())}")
```

### Using Page Ranges

```python
from PyPDF2 import PdfMerger, PageRange

merger = PdfMerger()

# Various ways to specify page ranges (indices are 0-based, like Python slices)
merger.append("doc1.pdf", pages=PageRange("1:5"))  # Indices 1-4 (2nd to 5th page)
merger.append("doc2.pdf", pages=PageRange("::2"))  # Every other page
merger.append("doc3.pdf", pages=PageRange("10:"))  # Index 10 to the end
merger.append("doc4.pdf", pages=(0, 3))            # (start, stop) tuple: first three pages

# Validate page range
if PageRange.valid("1:10"):
    print("Valid page range")

merger.write("output.pdf")
merger.close()
```
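
Because `PageRange` is described as slice-like, the range strings in this example can be reasoned about with Python's built-in `slice`. The sketch below uses only built-ins (no PyPDF2 import) and assumes a hypothetical 12-page document to show which 0-based indices each range selects.

```python
# Built-in slice objects only; page_count is an assumed document length.
page_count = 12

first_block = slice(1, 5)            # same shape as the range string "1:5"
every_other = slice(None, None, 2)   # same shape as "::2"
tail = slice(10, None)               # same shape as "10:"

for label, rng in [("1:5", first_block), ("::2", every_other), ("10:", tail)]:
    # indices() clamps the slice to the document length.
    start, stop, step = rng.indices(page_count)
    selected = list(range(start, stop, step))
    print(label, (start, stop, step), selected)
```

The stop index is exclusive, so `"1:5"` covers indices 1 through 4, which are the second through fifth pages of the document.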

### Working with Paper Sizes

```python
# PaperSize is exported from the top-level package (defined in PyPDF2.papersizes)
from PyPDF2 import PdfWriter, PaperSize

writer = PdfWriter()

# Create pages with standard sizes
a4_page = writer.add_blank_page(PaperSize.A4.width, PaperSize.A4.height)
letter_page = writer.add_blank_page(612, 792)  # US Letter
a3_page = writer.add_blank_page(PaperSize.A3.width, PaperSize.A3.height)

print(f"A4 size: {PaperSize.A4.width} x {PaperSize.A4.height} points")
print(f"A3 size: {PaperSize.A3.width} x {PaperSize.A3.height} points")

with open("standard_sizes.pdf", "wb") as output_file:
    writer.write(output_file)
```
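
The paper sizes are expressed in PDF points, where one point is 1/72 of an inch. As a rough sketch of how those numbers relate to physical dimensions, the snippet below converts the A4 values quoted in the `PaperSize` listing to millimetres. `points_to_mm` is a hypothetical helper written for this example, not a PyPDF2 function.

```python
POINTS_PER_INCH = 72
MM_PER_INCH = 25.4


def points_to_mm(points: float) -> float:
    """Convert PDF points (1/72 inch) to millimetres."""
    return points * MM_PER_INCH / POINTS_PER_INCH


# A4 in points, as quoted in the PaperSize constants.
a4_width_pt, a4_height_pt = 595, 842
a4_mm = (round(points_to_mm(a4_width_pt), 1), round(points_to_mm(a4_height_pt), 1))
print(a4_mm)
```

The result is close to the nominal 210 x 297 mm of A4; the small difference comes from the point values being rounded to whole numbers.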

### Creating Custom PDF Objects

```python
from PyPDF2.generic import (
    DictionaryObject, ArrayObject, NameObject,
    TextStringObject, NumberObject
)

# Create a custom dictionary object
custom_dict = DictionaryObject({
    NameObject("/Type"): NameObject("/Annotation"),
    NameObject("/Subtype"): NameObject("/Text"),
    NameObject("/Contents"): TextStringObject("Custom note"),
    NameObject("/Rect"): ArrayObject([
        NumberObject(100), NumberObject(100),
        NumberObject(200), NumberObject(150)
    ])
})

print(f"Custom object: {custom_dict}")
```

### String Encoding Utilities

```python
from PyPDF2.generic import (
    create_string_object, encode_pdfdocencoding,
    decode_pdfdocencoding, hex_to_rgb
)

# Create appropriate string objects
text = create_string_object("Hello, World!")
binary_text = create_string_object("\\x00\\xff\\x42", "latin-1")

# Encoding/decoding
unicode_text = "Héllo, Wørld!"
encoded = encode_pdfdocencoding(unicode_text)
decoded = decode_pdfdocencoding(encoded)

print(f"Original: {unicode_text}")
print(f"Decoded: {decoded}")

# Color conversion
red_rgb = hex_to_rgb("#FF0000")   # (1.0, 0.0, 0.0)
blue_rgb = hex_to_rgb("#0000FF")  # (0.0, 0.0, 1.0)
print(f"Red RGB: {red_rgb}")
print(f"Blue RGB: {blue_rgb}")
```
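
To make the color conversion concrete, here is a minimal pure-Python sketch of how a `#RRGGBB` string can be mapped to floats in the 0.0-1.0 range. It is not PyPDF2's implementation; `hex_to_unit_rgb` is a hypothetical name, and the strict six-digit check is an assumption of this example.

```python
from typing import Tuple


def hex_to_unit_rgb(color: str) -> Tuple[float, float, float]:
    """Convert "#RRGGBB" to (red, green, blue) floats in 0.0-1.0."""
    digits = color.lstrip("#")
    if len(digits) != 6:
        raise ValueError(f"expected 6 hex digits, got {color!r}")
    # Each pair of hex digits is one 8-bit channel; divide by 255 to normalise.
    red, green, blue = (int(digits[i:i + 2], 16) / 255 for i in (0, 2, 4))
    return (red, green, blue)


print(hex_to_unit_rgb("#FF0000"))
print(hex_to_unit_rgb("#0000FF"))
```

PDF color operators take channel values between 0 and 1, which is why the 0-255 channel values are divided by 255.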

### Working with Outlines and Destinations

```python
from PyPDF2 import PdfReader
from PyPDF2.generic import OutlineItem, Destination

reader = PdfReader("document.pdf")

# Access document outline
outline = reader.outline
if outline:
    def print_outline(items, level=0):
        for item in items:
            if isinstance(item, OutlineItem):
                indent = "  " * level
                print(f"{indent}{item.title}")
                if hasattr(item, 'children') and item.children:
                    print_outline(item.children, level + 1)
            elif isinstance(item, list):
                # A nested list holds the children of the preceding item
                print_outline(item, level + 1)

    print_outline(outline)

# Access named destinations
destinations = reader.named_destinations
for name, dest in destinations.items():
    if isinstance(dest, Destination):
        print(f"Destination '{name}' -> Page {dest.page}, Type: {dest.typ}")
```
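
The `OutlineType` alias describes the outline as a list that may contain further lists. The sketch below shows one way to walk such a structure, assuming that a nested list holds the children of the item before it. Plain strings stand in for outline items so the snippet runs without a PDF; `NestedOutline` and `walk_outline` are hypothetical names.

```python
from typing import Iterator, List, Tuple, Union

# Plain strings stand in for outline items in this sketch.
NestedOutline = List[Union[str, "NestedOutline"]]


def walk_outline(items: NestedOutline, depth: int = 0) -> Iterator[Tuple[int, str]]:
    """Yield (depth, title) pairs; a nested list is one level deeper."""
    for item in items:
        if isinstance(item, list):
            yield from walk_outline(item, depth + 1)
        else:
            yield depth, item


sample = ["Chapter 1", ["Section 1.1", "Section 1.2", ["Detail"]], "Chapter 2"]
for depth, title in walk_outline(sample):
    print("  " * depth + title)
```

Yielding `(depth, title)` pairs instead of printing inside the recursion keeps the traversal reusable, for example to build a table of contents.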

### Password Type Checking

```python
from PyPDF2 import PdfReader, PasswordType

reader = PdfReader("encrypted.pdf")

if reader.is_encrypted:
    # Try different password types
    result = reader.decrypt("user_password")

    if result == PasswordType.USER_PASSWORD:
        print("Opened with user password - some restrictions may apply")
    elif result == PasswordType.OWNER_PASSWORD:
        print("Opened with owner password - full access")
    elif result == PasswordType.NOT_DECRYPTED:
        print("Password incorrect or file corrupted")
```
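
Since `NOT_DECRYPTED` is listed with the value 0, a failed attempt is falsy when the result behaves like an integer. The sketch below illustrates that with a stand-in enumeration, assuming an `IntEnum`-style type. `SketchPasswordType` and `describe` are hypothetical names and do not come from PyPDF2.

```python
from enum import IntEnum


class SketchPasswordType(IntEnum):
    """Stand-in mirroring the three values listed for PasswordType."""
    NOT_DECRYPTED = 0
    USER_PASSWORD = 1
    OWNER_PASSWORD = 2


def describe(result: SketchPasswordType) -> str:
    """Map a decryption result to a short human-readable message."""
    if not result:  # 0 is falsy, so a failed attempt takes this branch
        return "not decrypted"
    kind = result.name.split("_")[0].lower()  # "user" or "owner"
    return f"decrypted with the {kind} password"


for value in SketchPasswordType:
    print(value.name, "->", describe(value))
```

Comparing against the named members, as the example does, is still the more explicit style; the truthiness check is only a shortcut for the "did it work at all" case.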

## Constants and Enumerations

PyPDF2 includes extensive constants from the PDF specification organized in the `constants` module:

### Key Constants

```python { .api }
# Core PDF constants
class Core:
    OUTLINES = "/Outlines"
    THREADS = "/Threads"
    PAGE = "/Page"
    PAGES = "/Pages"
    CATALOG = "/Catalog"

# User access permissions
class UserAccessPermissions:
    PRINT = 1 << 2
    MODIFY = 1 << 3
    COPY = 1 << 4
    ADD_OR_MODIFY = 1 << 5

# PDF filter types
class FilterTypes:
    FLATE_DECODE = "/FlateDecode"
    LZW_DECODE = "/LZWDecode"
    ASCII_HEX_DECODE = "/ASCIIHexDecode"
    DCT_DECODE = "/DCTDecode"
```

Using these constants instead of hand-typed string literals keeps dictionary keys and values consistent with the names defined in the PDF specification.
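
The permission constants are single-bit flags, so several permissions can be combined into one integer and tested with bitwise operators. The sketch below copies the bit values from the `UserAccessPermissions` listing so it runs without PyPDF2. The `granted` value and the `allows` helper are hypothetical and exist only for illustration.

```python
# Bit values copied from the UserAccessPermissions listing.
PRINT = 1 << 2
MODIFY = 1 << 3
COPY = 1 << 4
ADD_OR_MODIFY = 1 << 5

# A hypothetical permissions value granting printing and copying only.
granted = PRINT | COPY


def allows(flags: int, permission: int) -> bool:
    """Return True if the permission bit is set in flags."""
    return (flags & permission) == permission


print(allows(granted, PRINT))
print(allows(granted, MODIFY))
```

Each flag occupies its own bit, so setting or clearing one permission never affects the others.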