tessl/maven-org-apache-pdfbox--pdfbox

The Apache PDFBox library is an open source Java tool for working with PDF documents.

- **Workspace**: tessl
- **Visibility**: Public
- **Describes**: `mavenpkg:maven/org.apache.pdfbox/pdfbox@3.0.x`

To install, run `npx @tessl/cli install tessl/maven-org-apache-pdfbox--pdfbox@3.0.0`.

# Apache PDFBox

Apache PDFBox is a comprehensive Java library for programmatic manipulation of PDF documents. It provides capabilities for creating new PDFs, parsing existing documents, extracting content, rendering pages to images, and handling advanced features like forms, encryption, and digital signatures.

## Package Information

- **Package Name**: pdfbox
- **Package Type**: maven
- **Language**: Java
- **Installation**: Add the dependency to your Maven `pom.xml`:

```xml
<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>pdfbox</artifactId>
    <version>3.0.5</version>
</dependency>
```

## Core Imports

```java
import org.apache.pdfbox.Loader;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.common.PDRectangle;
```

## Basic Usage

```java
import org.apache.pdfbox.Loader;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.common.PDRectangle;
import org.apache.pdfbox.pdmodel.font.PDType1Font;
import org.apache.pdfbox.pdmodel.font.Standard14Fonts;

import java.io.File;
import java.io.IOException;

public class BasicUsage {
    public static void main(String[] args) throws IOException {
        // Load an existing PDF; try-with-resources closes it automatically
        try (PDDocument document = Loader.loadPDF(new File("example.pdf"))) {
            System.out.println("Pages: " + document.getNumberOfPages());
        }

        // Create a new PDF with a single A4 page
        try (PDDocument newDocument = new PDDocument()) {
            PDPage page = new PDPage(PDRectangle.A4);
            newDocument.addPage(page);

            // In PDFBox 3.x the Standard 14 fonts are created from
            // Standard14Fonts.FontName instead of static constants like PDType1Font.HELVETICA
            PDType1Font helvetica = new PDType1Font(Standard14Fonts.FontName.HELVETICA);

            // Add text to the page; the content stream must be closed before saving
            try (PDPageContentStream contentStream = new PDPageContentStream(newDocument, page)) {
                contentStream.beginText();
                contentStream.setFont(helvetica, 12);
                contentStream.newLineAtOffset(100, 700);
                contentStream.showText("Hello, PDFBox!");
                contentStream.endText();
            }

            // Save; the document is closed when the try block exits
            newDocument.save("output.pdf");
        }
    }
}
```

## Architecture

PDFBox is structured into several architectural layers:

- **High-Level API (pdmodel)**: User-friendly document manipulation classes (`org.apache.pdfbox.pdmodel`)
- **Content Stream Processing**: Parsing and generating PDF content streams (`org.apache.pdfbox.contentstream`)
- **Low-Level COS Layer**: Direct PDF object manipulation through the Carousel Object System (`org.apache.pdfbox.cos`)
- **Parsing Engine**: PDF file format parsing and writing (`org.apache.pdfbox.pdfparser`, `org.apache.pdfbox.pdfwriter`)
- **Rendering Engine**: Converting PDF pages to images (`org.apache.pdfbox.rendering`)
- **Text Processing**: Extracting and analyzing text content (`org.apache.pdfbox.text`)
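The split between the pdmodel and COS layers is easiest to see on a single page: a `PDPage` is a typed wrapper around a COS dictionary, so the same MediaBox can be read through either layer. The sketch below assumes PDFBox 3.x on the classpath; the class name `LayersDemo` is illustrative and not part of PDFBox.

```java
import org.apache.pdfbox.cos.COSArray;
import org.apache.pdfbox.cos.COSDictionary;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.common.PDRectangle;

import java.io.IOException;

// Illustrative class name; not part of PDFBox.
public class LayersDemo {

    /** Returns the page width as seen through the pdmodel layer and through the COS layer. */
    public static float[] pageWidths() throws IOException {
        try (PDDocument document = new PDDocument()) {
            PDPage page = new PDPage(PDRectangle.LETTER); // 612 x 792 points
            document.addPage(page);

            // High-level pdmodel view: a typed PDRectangle
            float modelWidth = page.getMediaBox().getWidth();

            // Low-level COS view: the page is a COSDictionary whose /MediaBox
            // entry is a COSArray of the form [llx lly urx ury]
            COSDictionary pageDict = page.getCOSObject();
            COSArray mediaBox = (COSArray) pageDict.getDictionaryObject(COSName.MEDIA_BOX);
            float[] box = mediaBox.toFloatArray();
            float cosWidth = box[2] - box[0];

            return new float[] {modelWidth, cosWidth};
        }
    }

    public static void main(String[] args) throws IOException {
        float[] widths = pageWidths();
        System.out.println("pdmodel width: " + widths[0] + ", COS width: " + widths[1]);
    }
}
```

Both views report the same 612-point width; the pdmodel class simply interprets the raw COS array for you.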

## Capabilities

### Document Operations

Core document loading, creation, saving, and manipulation functionality that every other PDF operation builds on. Loading goes through the static `Loader` class; page management, saving, and closing are instance methods of `PDDocument`.

```java { .api }
// Document loading (static methods of org.apache.pdfbox.Loader)
public static PDDocument loadPDF(File file) throws IOException;
public static PDDocument loadPDF(File file, String password) throws IOException;
// PDFBox 3.x has no InputStream overload; wrap a stream in a RandomAccessReadBuffer
public static PDDocument loadPDF(RandomAccessRead randomAccessRead) throws IOException;

// Document creation and manipulation (instance methods of PDDocument)
public void addPage(PDPage page);
public void removePage(int pageIndex); // zero-based index
public int getNumberOfPages();
public void save(File file) throws IOException;
public void close() throws IOException;
```

[Document Operations](./document-operations.md)
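A minimal round trip through these methods might look like the sketch below: build a document, remove a page, save it, and reload it with `Loader`. It assumes PDFBox 3.x on the classpath and works on in-memory byte arrays so no files are needed; `DocumentOpsDemo` is an illustrative name.

```java
import org.apache.pdfbox.Loader;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;

import java.io.ByteArrayOutputStream;
import java.io.IOException;

// Illustrative class name; not part of PDFBox.
public class DocumentOpsDemo {

    /** Builds a 3-page PDF, removes the first page, and returns the saved bytes. */
    public static byte[] buildTwoPagePdf() throws IOException {
        try (PDDocument document = new PDDocument()) {
            for (int i = 0; i < 3; i++) {
                document.addPage(new PDPage()); // default page size is US Letter
            }
            document.removePage(0); // page indices are zero-based
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            document.save(out);
            return out.toByteArray();
        }
    }

    /** Reloads PDF bytes with Loader and reports the page count. */
    public static int countPages(byte[] pdfBytes) throws IOException {
        try (PDDocument reloaded = Loader.loadPDF(pdfBytes)) {
            return reloaded.getNumberOfPages();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("Pages after reload: " + countPages(buildTwoPagePdf()));
    }
}
```

With real files the same flow applies through `Loader.loadPDF(File)` and `save(File)`.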

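### Text Operations

Text extraction with support for page ranges, area-based (region) extraction, text positioning, and formatting control. Whole-document extraction is handled by `PDFTextStripper`; region-based extraction by its subclass `PDFTextStripperByArea`.

```java { .api }
// org.apache.pdfbox.text.PDFTextStripper
public String getText(PDDocument document) throws IOException;
public void setStartPage(int startPage); // 1-based, inclusive
public void setEndPage(int endPage);     // 1-based, inclusive

// org.apache.pdfbox.text.PDFTextStripperByArea
public void addRegion(String regionName, Rectangle2D rect);
public void extractRegions(PDPage page) throws IOException;
public String getTextForRegion(String regionName);
```

[Text Operations](./text-operations.md)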
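The following sketch exercises both strippers on a one-page document it generates itself, so it needs no input file. It assumes PDFBox 3.x; `TextOpsDemo`, the sample text, and the region name `"top"` are illustrative. Note that region rectangles use a top-left origin (Java2D convention), while the text is placed with PDF's bottom-left origin.

```java
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.PDType1Font;
import org.apache.pdfbox.pdmodel.font.Standard14Fonts;
import org.apache.pdfbox.text.PDFTextStripper;
import org.apache.pdfbox.text.PDFTextStripperByArea;

import java.awt.geom.Rectangle2D;
import java.io.IOException;

// Illustrative class name; not part of PDFBox.
public class TextOpsDemo {

    /** Creates a Letter-size page with one line of text near the top. Caller must close it. */
    static PDDocument createSample() throws IOException {
        PDDocument document = new PDDocument();
        PDPage page = new PDPage();
        document.addPage(page);
        try (PDPageContentStream cs = new PDPageContentStream(document, page)) {
            cs.beginText();
            cs.setFont(new PDType1Font(Standard14Fonts.FontName.HELVETICA), 12);
            cs.newLineAtOffset(72, 720); // PDF coordinates: origin at bottom-left
            cs.showText("Header text");
            cs.endText();
        }
        return document;
    }

    /** Extracts all text from page 1 only. */
    public static String extractAll() throws IOException {
        try (PDDocument document = createSample()) {
            PDFTextStripper stripper = new PDFTextStripper();
            stripper.setStartPage(1);
            stripper.setEndPage(1);
            return stripper.getText(document);
        }
    }

    /** Extracts only the text inside the top 100 points of the first page. */
    public static String extractTopRegion() throws IOException {
        try (PDDocument document = createSample()) {
            PDFTextStripperByArea areaStripper = new PDFTextStripperByArea();
            // Region rectangle uses a top-left origin: x, y, width, height in points
            areaStripper.addRegion("top", new Rectangle2D.Float(0, 0, 612, 100));
            areaStripper.extractRegions(document.getPage(0));
            return areaStripper.getTextForRegion("top");
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("All: " + extractAll().trim());
        System.out.println("Top region: " + extractTopRegion().trim());
    }
}
```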

### Rendering and Graphics

Convert PDF pages to images with `PDFRenderer`, with control over resolution (scale or DPI), output image type (`ImageType`, for example RGB or GRAY), and rendering quality.

```java { .api }
// org.apache.pdfbox.rendering.PDFRenderer; page indices are zero-based
public BufferedImage renderImage(int pageIndex) throws IOException;
public BufferedImage renderImageWithDPI(int pageIndex, float dpi) throws IOException;
public BufferedImage renderImage(int pageIndex, float scale, ImageType imageType) throws IOException;
```

[Rendering and Graphics](./rendering-graphics.md)
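The relationship between DPI and scale is that 72 DPI equals scale 1.0, one pixel per PDF point. The sketch below renders a blank US Letter page (612 x 792 points) at 144 DPI, which should produce a 1224 x 1584 pixel image. It assumes PDFBox 3.x; `RenderDemo` is an illustrative name.

```java
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.rendering.ImageType;
import org.apache.pdfbox.rendering.PDFRenderer;

import java.awt.image.BufferedImage;
import java.io.IOException;

// Illustrative class name; not part of PDFBox.
public class RenderDemo {

    /** Renders the first page of a blank Letter-size document at the given DPI. */
    public static BufferedImage renderFirstPage(float dpi) throws IOException {
        try (PDDocument document = new PDDocument()) {
            document.addPage(new PDPage()); // US Letter: 612 x 792 points
            PDFRenderer renderer = new PDFRenderer(document);
            // renderImageWithDPI converts DPI to a scale factor of dpi / 72
            return renderer.renderImageWithDPI(0, dpi, ImageType.RGB);
        }
    }

    public static void main(String[] args) throws IOException {
        BufferedImage image = renderFirstPage(144f);
        System.out.println(image.getWidth() + " x " + image.getHeight());
    }
}
```

To write the result to disk, `javax.imageio.ImageIO.write(image, "png", file)` is one standard option.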

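### Multi-PDF Operations

Utilities for merging (`PDFMergerUtility`), splitting (`Splitter`), and overlaying (`Overlay`) PDF documents, with flexible configuration options.

```java { .api }
// org.apache.pdfbox.multipdf.PDFMergerUtility
// (PDFBox 3.x takes a stream-cache factory instead of a MemoryUsageSetting)
public void mergeDocuments(StreamCacheCreateFunction streamCacheCreateFunction) throws IOException;

// org.apache.pdfbox.multipdf.Splitter
public List<PDDocument> split(PDDocument document) throws IOException;

// org.apache.pdfbox.multipdf.Overlay
public PDDocument overlay(Map<Integer, String> overlayGuide) throws IOException;
```

[Multi-PDF Operations](./multi-pdf-operations.md)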
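For in-memory documents, `PDFMergerUtility.appendDocument(destination, source)` is a simpler alternative to the file-based `mergeDocuments` workflow. The sketch below appends one blank document to another and splits a five-page document into two-page chunks. It assumes PDFBox 3.x; `MultiPdfDemo` and the page counts are illustrative.

```java
import org.apache.pdfbox.multipdf.PDFMergerUtility;
import org.apache.pdfbox.multipdf.Splitter;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;

import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Illustrative class name; not part of PDFBox.
public class MultiPdfDemo {

    /** Creates a document with the given number of blank pages. Caller must close it. */
    static PDDocument blankDocument(int pages) {
        PDDocument document = new PDDocument();
        for (int i = 0; i < pages; i++) {
            document.addPage(new PDPage());
        }
        return document;
    }

    /** Appends a 3-page source to a 2-page destination and returns the merged page count. */
    public static int mergedPageCount() throws IOException {
        // The source must stay open until the destination has been saved or used
        try (PDDocument destination = blankDocument(2);
             PDDocument source = blankDocument(3)) {
            new PDFMergerUtility().appendDocument(destination, source);
            return destination.getNumberOfPages();
        }
    }

    /** Splits a 5-page document every 2 pages and returns the page count of each part. */
    public static int[] splitSizes() throws IOException {
        try (PDDocument document = blankDocument(5)) {
            Splitter splitter = new Splitter();
            splitter.setSplitAtPage(2); // start a new output document every 2 pages
            List<PDDocument> parts = splitter.split(document);
            int[] sizes = new int[parts.size()];
            for (int i = 0; i < parts.size(); i++) {
                sizes[i] = parts.get(i).getNumberOfPages();
                parts.get(i).close(); // each part is a separate document and must be closed
            }
            return sizes;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("Merged pages: " + mergedPageCount());
        System.out.println("Split sizes: " + Arrays.toString(splitSizes()));
    }
}
```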

### Interactive Forms

Read, fill, and flatten PDF forms (AcroForm), including text fields, checkboxes, radio buttons, and other field types.

```java { .api }
// PDDocumentCatalog (via document.getDocumentCatalog())
public PDAcroForm getAcroForm();

// PDAcroForm
public List<PDField> getFields();
public PDField getField(String fullyQualifiedName);
public void flatten() throws IOException;

// PDField (getValue() is declared on concrete field types such as PDTextField;
// PDField itself offers getValueAsString())
public void setValue(String value) throws IOException;
public String getValue();
```

[Interactive Forms](./interactive-forms.md)
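Filling usually starts from an existing form, but to stay self-contained the sketch below first creates a minimal AcroForm with one text field, then fills it, reads it back, and flattens it. It assumes PDFBox 3.x; the class `FormsDemo`, the field name `name`, and the `/Helv` resource name are illustrative choices.

```java
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.common.PDRectangle;
import org.apache.pdfbox.pdmodel.font.PDType1Font;
import org.apache.pdfbox.pdmodel.font.Standard14Fonts;
import org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotationWidget;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
import org.apache.pdfbox.pdmodel.interactive.form.PDTextField;

import java.io.IOException;

// Illustrative class name; not part of PDFBox.
public class FormsDemo {

    /** Holds the value read back before flattening and the field count afterwards. */
    public static final class Result {
        public final String value;
        public final int fieldsAfterFlatten;

        Result(String value, int fieldsAfterFlatten) {
            this.value = value;
            this.fieldsAfterFlatten = fieldsAfterFlatten;
        }
    }

    public static Result fillAndFlatten() throws IOException {
        try (PDDocument document = new PDDocument()) {
            PDPage page = new PDPage();
            document.addPage(page);

            // Create the AcroForm and register a default font resource named /Helv
            PDAcroForm acroForm = new PDAcroForm(document);
            document.getDocumentCatalog().setAcroForm(acroForm);
            PDResources resources = new PDResources();
            resources.put(COSName.getPDFName("Helv"),
                    new PDType1Font(Standard14Fonts.FontName.HELVETICA));
            acroForm.setDefaultResources(resources);

            // Create a text field (hypothetical name "name") with one widget on the page
            PDTextField nameField = new PDTextField(acroForm);
            nameField.setPartialName("name");
            nameField.setDefaultAppearance("/Helv 12 Tf 0 g");
            acroForm.getFields().add(nameField);

            PDAnnotationWidget widget = nameField.getWidgets().get(0);
            widget.setRectangle(new PDRectangle(50, 700, 200, 20));
            widget.setPage(page);
            page.getAnnotations().add(widget);

            // Fill the field (this also generates its appearance) and read it back
            nameField.setValue("Ada");
            String value = acroForm.getField("name").getValueAsString();

            // Flattening draws field appearances into the page and removes the fields
            acroForm.flatten();
            return new Result(value, acroForm.getFields().size());
        }
    }

    public static void main(String[] args) throws IOException {
        Result result = fillAndFlatten();
        System.out.println(result.value + ", fields left: " + result.fieldsAfterFlatten);
    }
}
```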

### Security and Encryption

PDF encryption, decryption, access permissions, and digital signatures for document security.

```java { .api }
// PDDocument: encrypt by applying a ProtectionPolicy, typically a
// StandardProtectionPolicy(ownerPassword, userPassword, AccessPermission)
public void protect(ProtectionPolicy policy) throws IOException;
public boolean isEncrypted();
// Save without encryption (decryption)
public void setAllSecurityToBeRemoved(boolean removeAllSecurity);

// PDDocument: digital signatures
public void addSignature(PDSignature signature) throws IOException;
public List<PDSignature> getSignatureDictionaries();
```

[Security and Encryption](./security-encryption.md)
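As a sketch of the encryption flow, the code below restricts printing, encrypts with AES-256 through `StandardProtectionPolicy`, and reopens the result with the user password to check the effective permissions. It assumes PDFBox 3.x and a JDK that permits AES-256 (the default on current JDKs); `SecurityDemo` and both passwords are illustrative placeholders.

```java
import org.apache.pdfbox.Loader;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.encryption.AccessPermission;
import org.apache.pdfbox.pdmodel.encryption.StandardProtectionPolicy;

import java.io.ByteArrayOutputStream;
import java.io.IOException;

// Illustrative class name; not part of PDFBox.
public class SecurityDemo {
    // Hypothetical passwords, for illustration only
    static final String OWNER_PASSWORD = "owner-secret";
    static final String USER_PASSWORD = "user-secret";

    /** Creates a one-page PDF that forbids printing for user-password readers. */
    public static byte[] createEncryptedPdf() throws IOException {
        try (PDDocument document = new PDDocument()) {
            document.addPage(new PDPage());

            AccessPermission permissions = new AccessPermission();
            permissions.setCanPrint(false);

            StandardProtectionPolicy policy =
                    new StandardProtectionPolicy(OWNER_PASSWORD, USER_PASSWORD, permissions);
            policy.setEncryptionKeyLength(256); // AES-256
            document.protect(policy);           // encryption is applied on save

            ByteArrayOutputStream out = new ByteArrayOutputStream();
            document.save(out);
            return out.toByteArray();
        }
    }

    /** Opens the PDF with the user password and returns {isEncrypted, canPrint}. */
    public static boolean[] inspect(byte[] pdf) throws IOException {
        try (PDDocument document = Loader.loadPDF(pdf, USER_PASSWORD)) {
            return new boolean[] {
                document.isEncrypted(),
                document.getCurrentAccessPermission().canPrint()
            };
        }
    }

    public static void main(String[] args) throws IOException {
        boolean[] state = inspect(createEncryptedPdf());
        System.out.println("encrypted=" + state[0] + ", canPrint=" + state[1]);
    }
}
```

Opening the same bytes with the owner password would instead grant full permissions.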

### Content Stream Processing

Low-level content stream parsing and generation for advanced PDF content manipulation and custom rendering. `PDFStreamEngine` drives the parsing; subclasses override `processOperator` or register operator processors to react to each operator.

```java { .api }
// org.apache.pdfbox.contentstream.PDFStreamEngine
public void processPage(PDPage page) throws IOException;
protected void processOperator(Operator operator, List<COSBase> operands) throws IOException;
protected void processChildStream(PDContentStream contentStream, PDPage page) throws IOException;
```

[Content Stream Processing](./content-stream-processing.md)
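A minimal `PDFStreamEngine` subclass can observe every operator on a page by overriding `processOperator`. The sketch below counts operator names on a generated page containing one text object (`BT`, `Tf`, `Td`, `Tj`, `ET`). It assumes PDFBox 3.x; `OperatorCounter` is an illustrative name, and because no operator processors are registered, the engine tracks no graphics state here.

```java
import org.apache.pdfbox.contentstream.PDFStreamEngine;
import org.apache.pdfbox.contentstream.operator.Operator;
import org.apache.pdfbox.cos.COSBase;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.PDType1Font;
import org.apache.pdfbox.pdmodel.font.Standard14Fonts;

import java.io.IOException;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative class name; not part of PDFBox.
public class OperatorCounter extends PDFStreamEngine {
    private final Map<String, Integer> counts = new TreeMap<>();

    @Override
    protected void processOperator(Operator operator, List<COSBase> operands) throws IOException {
        counts.merge(operator.getName(), 1, Integer::sum);
        // Keep the default dispatch; with no processors registered it is effectively a no-op
        super.processOperator(operator, operands);
    }

    /** Builds a page with one text object and returns the operator counts found on it. */
    public static Map<String, Integer> countOperators() throws IOException {
        try (PDDocument document = new PDDocument()) {
            PDPage page = new PDPage();
            document.addPage(page);
            try (PDPageContentStream cs = new PDPageContentStream(document, page)) {
                cs.beginText();                                                    // BT
                cs.setFont(new PDType1Font(Standard14Fonts.FontName.HELVETICA), 12); // Tf
                cs.newLineAtOffset(100, 700);                                      // Td
                cs.showText("Hello");                                              // Tj
                cs.endText();                                                      // ET
            }
            OperatorCounter counter = new OperatorCounter();
            counter.processPage(page);
            return counter.counts;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(countOperators());
    }
}
```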

### Low-Level COS Operations

Direct manipulation of PDF objects through the COS (Carousel Object System) layer, for advanced use cases and custom PDF structure handling. pdmodel classes implement `COSObjectable` and expose their underlying COS object through `getCOSObject()`.

```java { .api }
// org.apache.pdfbox.cos.COSDictionary
public COSBase getItem(COSName key); // does not dereference indirect objects; getDictionaryObject does
public void setItem(COSName key, COSBase value);

// org.apache.pdfbox.cos.COSArray
public void add(COSBase object);
public COSBase get(int index);
```

[Low-Level COS Operations](./cos-operations.md)
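To show these primitives together, the sketch below builds a small dictionary holding an array of mixed COS values and reads them back. It assumes PDFBox 3.x; `CosDemo` and the `/Example` and `/Items` names are illustrative, not keys defined by the PDF specification.

```java
import org.apache.pdfbox.cos.COSArray;
import org.apache.pdfbox.cos.COSDictionary;
import org.apache.pdfbox.cos.COSInteger;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.cos.COSString;

// Illustrative class name; not part of PDFBox.
public class CosDemo {
    // Hypothetical key used only for this illustration
    static final COSName ITEMS = COSName.getPDFName("Items");

    /** Builds << /Type /Example /Items [1 2 (three)] >> directly from COS objects. */
    public static COSDictionary build() {
        COSArray items = new COSArray();
        items.add(COSInteger.get(1));
        items.add(COSInteger.get(2));
        items.add(new COSString("three"));

        COSDictionary dict = new COSDictionary();
        dict.setItem(COSName.TYPE, COSName.getPDFName("Example"));
        dict.setItem(ITEMS, items);
        return dict;
    }

    public static void main(String[] args) {
        COSDictionary dict = build();
        COSArray items = (COSArray) dict.getItem(ITEMS);
        int second = ((COSInteger) items.get(1)).intValue();
        System.out.println(items.size() + " items; second = " + second);
    }
}
```

The same calls work on real documents, for example on the dictionary returned by `document.getDocumentCatalog().getCOSObject()`.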