# Apache PDFBox

Apache PDFBox is a comprehensive Java library for programmatic manipulation of PDF documents. It provides capabilities for creating new PDFs, parsing existing documents, extracting content, rendering pages to images, and handling advanced features like forms, encryption, and digital signatures.

## Package Information

- **Package Name**: pdfbox
- **Package Type**: maven
- **Language**: Java
- **Installation**: Add dependency to your Maven `pom.xml`:

```xml
<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>pdfbox</artifactId>
    <version>3.0.5</version>
</dependency>
```

## Core Imports

```java
import org.apache.pdfbox.Loader;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.common.PDRectangle;
```

## Basic Usage

```java
import org.apache.pdfbox.Loader;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.common.PDRectangle;
import org.apache.pdfbox.pdmodel.font.PDType1Font;
import org.apache.pdfbox.pdmodel.font.Standard14Fonts;
import java.io.File;
import java.io.IOException;

// Load an existing PDF
PDDocument document = Loader.loadPDF(new File("example.pdf"));

// Create a new PDF
PDDocument newDocument = new PDDocument();
PDPage page = new PDPage(PDRectangle.A4);
newDocument.addPage(page);

// Add text to a page
PDPageContentStream contentStream = new PDPageContentStream(newDocument, page);
contentStream.beginText();
// PDFBox 3.x has no PDType1Font.HELVETICA constant; build the font from Standard14Fonts
contentStream.setFont(new PDType1Font(Standard14Fonts.FontName.HELVETICA), 12);
contentStream.newLineAtOffset(100, 700);
contentStream.showText("Hello, PDFBox!");
contentStream.endText();
contentStream.close();

// Save and close
newDocument.save("output.pdf");
newDocument.close();
document.close();
```

## Architecture

PDFBox is structured into several architectural layers:

- **High-Level API (pdmodel)**: User-friendly document manipulation classes
- **Content Stream Processing**: Parsing and generating PDF content streams
- **Low-Level COS Layer**: Direct PDF object manipulation (Carousel Object System)
- **Parsing Engine**: PDF file format parsing and writing
- **Rendering Engine**: Converting PDF pages to images
- **Text Processing**: Extracting and analyzing text content
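
To make the relationship between the high-level and COS layers concrete, here is a minimal sketch that reads the same page through both. The class `LayerPeek` and its helper methods are hypothetical names used only for illustration; the PDFBox calls assume the 3.0.x API.

```java
import org.apache.pdfbox.cos.COSArray;
import org.apache.pdfbox.cos.COSDictionary;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.common.PDRectangle;

// Hypothetical demo class: views one page through the pdmodel and COS layers.
class LayerPeek {
    // High-level view: the media box as a typed PDRectangle.
    static float mediaBoxWidth(PDPage page) {
        return page.getMediaBox().getWidth();
    }

    // Low-level view: the same media box as a raw COS array of numbers.
    static int mediaBoxEntryCount(PDPage page) {
        COSDictionary dict = page.getCOSObject();
        COSArray box = (COSArray) dict.getDictionaryObject(COSName.MEDIA_BOX);
        return box.size();
    }

    public static void main(String[] args) {
        PDPage page = new PDPage(PDRectangle.A4);
        System.out.println("Width in points: " + mediaBoxWidth(page));
        System.out.println("Raw MediaBox entries: " + mediaBoxEntryCount(page));
        System.out.println("Type entry: " + page.getCOSObject().getNameAsString(COSName.TYPE));
    }
}
```

The pdmodel classes are thin wrappers: every `PDPage` is backed by a `COSDictionary`, so changes made at either layer are visible from the other.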

## Capabilities

### Document Operations

Core document loading, creation, saving, and manipulation functionality. Essential for all PDF operations.

```java { .api }
// Document loading (static methods on org.apache.pdfbox.Loader)
// PDFBox 3.x has no InputStream overload; pass a File, a byte[], or a RandomAccessRead
public static PDDocument loadPDF(File file) throws IOException;
public static PDDocument loadPDF(byte[] input) throws IOException;
public static PDDocument loadPDF(File file, String password) throws IOException;

// Document creation and manipulation (PDDocument)
public void addPage(PDPage page);
public void removePage(int pageIndex);
public int getNumberOfPages();
public void save(File file) throws IOException;
public void close() throws IOException;
```
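
As a rough illustration of how these calls fit together, the sketch below builds a document in memory, saves it to a byte array, reloads it, and removes a page. `DocumentRoundTrip` and its method names are hypothetical; the example assumes PDFBox 3.0.x, where `Loader.loadPDF(byte[])` and `PDDocument.save(OutputStream)` are available.

```java
import org.apache.pdfbox.Loader;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.common.PDRectangle;

import java.io.ByteArrayOutputStream;
import java.io.IOException;

// Hypothetical demo class: create, save, reload, and edit a document in memory.
class DocumentRoundTrip {
    // Builds a PDF with the given number of blank A4 pages and returns its bytes.
    static byte[] createPdf(int pageCount) throws IOException {
        try (PDDocument doc = new PDDocument();
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            for (int i = 0; i < pageCount; i++) {
                doc.addPage(new PDPage(PDRectangle.A4));
            }
            doc.save(out);
            return out.toByteArray();
        }
    }

    // Reloads the bytes, drops the first page, and reports the remaining page count.
    static int countPagesAfterRemovingFirst(byte[] pdf) throws IOException {
        try (PDDocument doc = Loader.loadPDF(pdf)) {
            doc.removePage(0); // page indexes are zero-based
            return doc.getNumberOfPages();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] pdf = createPdf(3);
        System.out.println("Pages left: " + countPagesAfterRemovingFirst(pdf));
    }
}
```

Using try-with-resources here ensures each `PDDocument` is closed even if saving or loading throws.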

[Document Operations](./document-operations.md)

### Text Operations

Comprehensive text extraction capabilities with support for area-based extraction, text positioning, and formatting control.

```java { .api }
// PDFTextStripper
public String getText(PDDocument document) throws IOException;
public void setStartPage(int startPage);
public void setEndPage(int endPage);

// PDFTextStripperByArea
public void addRegion(String regionName, Rectangle2D rect);
public String getTextForRegion(String regionName);
```
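
The page-range setters can be sketched as follows. The example writes one line of text on each of two pages, then extracts only the requested page. `PageRangeExtraction` and its methods are hypothetical names; the calls assume PDFBox 3.0.x, and page numbers passed to `setStartPage`/`setEndPage` are 1-based.

```java
import org.apache.pdfbox.Loader;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.common.PDRectangle;
import org.apache.pdfbox.pdmodel.font.PDType1Font;
import org.apache.pdfbox.pdmodel.font.Standard14Fonts;
import org.apache.pdfbox.text.PDFTextStripper;

import java.io.ByteArrayOutputStream;
import java.io.IOException;

// Hypothetical demo class: extract the text of a single page.
class PageRangeExtraction {
    // Builds a two-page sample PDF with one short line of text per page.
    static byte[] buildSample() throws IOException {
        try (PDDocument doc = new PDDocument();
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            for (String line : new String[] {"First page", "Second page"}) {
                PDPage page = new PDPage(PDRectangle.A4);
                doc.addPage(page);
                try (PDPageContentStream cs = new PDPageContentStream(doc, page)) {
                    cs.beginText();
                    cs.setFont(new PDType1Font(Standard14Fonts.FontName.HELVETICA), 12);
                    cs.newLineAtOffset(100, 700);
                    cs.showText(line);
                    cs.endText();
                }
            }
            doc.save(out);
            return out.toByteArray();
        }
    }

    // Extracts the text of exactly one page (1-based page number).
    static String textOfPage(byte[] pdf, int pageNumber) throws IOException {
        try (PDDocument doc = Loader.loadPDF(pdf)) {
            PDFTextStripper stripper = new PDFTextStripper();
            stripper.setStartPage(pageNumber);
            stripper.setEndPage(pageNumber);
            return stripper.getText(doc).trim();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(textOfPage(buildSample(), 2));
    }
}
```

For area-based extraction, `PDFTextStripperByArea` additionally needs a call to `extractRegions(PDPage)` after `addRegion` and before `getTextForRegion`.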

[Text Operations](./text-operations.md)

### Rendering and Graphics

Convert PDF pages to images with control over resolution, color spaces, and rendering quality.

```java { .api }
// PDFRenderer
public BufferedImage renderImage(int pageIndex) throws IOException;
public BufferedImage renderImageWithDPI(int pageIndex, float dpi) throws IOException;
public BufferedImage renderImage(int pageIndex, float scale, ImageType imageType) throws IOException;
```
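
A minimal rendering sketch, assuming PDFBox 3.0.x: it renders the first page of a blank in-memory document at a chosen resolution. `RenderFirstPage` is a hypothetical class name, and the three-argument `renderImageWithDPI` overload is used so the image type is explicit.

```java
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.common.PDRectangle;
import org.apache.pdfbox.rendering.ImageType;
import org.apache.pdfbox.rendering.PDFRenderer;

import java.awt.image.BufferedImage;
import java.io.IOException;

// Hypothetical demo class: render page 0 of a blank A4 document.
class RenderFirstPage {
    // Renders the first page as an RGB image at the given resolution.
    static BufferedImage render(float dpi) throws IOException {
        try (PDDocument doc = new PDDocument()) {
            doc.addPage(new PDPage(PDRectangle.A4));
            PDFRenderer renderer = new PDFRenderer(doc);
            return renderer.renderImageWithDPI(0, dpi, ImageType.RGB);
        }
    }

    public static void main(String[] args) throws IOException {
        BufferedImage image = render(144f);
        System.out.println(image.getWidth() + " x " + image.getHeight());
    }
}
```

PDF user space is 72 units per inch, so a DPI of 72 corresponds to a scale of 1.0 in the `renderImage(int, float, ImageType)` overload.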

[Rendering and Graphics](./rendering-graphics.md)

### Multi-PDF Operations

Utilities for merging, splitting, and overlaying multiple PDF documents with flexible configuration options.
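
```java { .api }
// PDFMergerUtility (3.x takes a StreamCacheCreateFunction, e.g. IOUtils.createMemoryOnlyStreamCache())
public void mergeDocuments(StreamCacheCreateFunction streamCacheCreateFunction) throws IOException;

// Splitter
public List<PDDocument> split(PDDocument document) throws IOException;

// Overlay (returns the overlaid document)
public PDDocument overlay(Map<Integer, String> overlayGuide) throws IOException;
```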
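
The sketch below shows merging and splitting on in-memory documents. It uses `PDFMergerUtility.appendDocument`, a different entry point from the source-list-based `mergeDocuments` listed above, because it needs no files on disk. `MergeAndSplit` and its helpers are hypothetical names; the calls assume PDFBox 3.0.x.

```java
import org.apache.pdfbox.multipdf.PDFMergerUtility;
import org.apache.pdfbox.multipdf.Splitter;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;

import java.io.IOException;
import java.util.List;

// Hypothetical demo class: append one document to another, and split one apart.
class MergeAndSplit {
    // Creates an in-memory document with the given number of blank pages.
    static PDDocument blank(int pages) {
        PDDocument doc = new PDDocument();
        for (int i = 0; i < pages; i++) {
            doc.addPage(new PDPage());
        }
        return doc;
    }

    // Appends a document of b pages to one of a pages and returns the total.
    static int mergedPageCount(int a, int b) throws IOException {
        try (PDDocument destination = blank(a); PDDocument source = blank(b)) {
            new PDFMergerUtility().appendDocument(destination, source);
            return destination.getNumberOfPages();
        }
    }

    // Splits a document with the default setting (one page per part).
    static int splitPartCount(int pages) throws IOException {
        try (PDDocument source = blank(pages)) {
            List<PDDocument> parts = new Splitter().split(source);
            int count = parts.size();
            for (PDDocument part : parts) {
                part.close(); // each part is its own document and must be closed
            }
            return count;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("Merged pages: " + mergedPageCount(2, 3));
        System.out.println("Split parts: " + splitPartCount(3));
    }
}
```

The source document should stay open until the destination or the split parts have been saved or closed, since they can share underlying objects.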

[Multi-PDF Operations](./multi-pdf-operations.md)

### Interactive Forms

Handle PDF forms including text fields, checkboxes, radio buttons, and form submission with full AcroForm support.

```java { .api }
// PDDocumentCatalog
public PDAcroForm getAcroForm();

// PDAcroForm
public List<PDField> getFields();
public void flatten() throws IOException;

// PDField and subclasses such as PDTextField
public void setValue(String value) throws IOException;
public String getValue();
```
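
As a hedged illustration of the form calls, the following sketch creates a one-field AcroForm in memory, sets a value, and reads it back. It loosely follows the pattern of PDFBox's own form-creation examples; `FormFieldDemo`, the field name `SampleField`, and the widget rectangle are hypothetical choices, and the code assumes PDFBox 3.0.x.

```java
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.common.PDRectangle;
import org.apache.pdfbox.pdmodel.font.PDType1Font;
import org.apache.pdfbox.pdmodel.font.Standard14Fonts;
import org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotationWidget;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
import org.apache.pdfbox.pdmodel.interactive.form.PDTextField;

import java.io.IOException;

// Hypothetical demo class: build a text field, fill it, and read the value back.
class FormFieldDemo {
    static String fillField(String value) throws IOException {
        try (PDDocument doc = new PDDocument()) {
            PDPage page = new PDPage(PDRectangle.A4);
            doc.addPage(page);

            // The form needs a font resource and a default appearance string
            // so that PDFBox can generate the field's visual appearance.
            PDResources resources = new PDResources();
            resources.put(COSName.getPDFName("Helv"),
                    new PDType1Font(Standard14Fonts.FontName.HELVETICA));
            PDAcroForm acroForm = new PDAcroForm(doc);
            doc.getDocumentCatalog().setAcroForm(acroForm);
            acroForm.setDefaultResources(resources);
            acroForm.setDefaultAppearance("/Helv 12 Tf 0 g");

            // Create the field and place its widget on the page.
            PDTextField field = new PDTextField(acroForm);
            field.setPartialName("SampleField");
            acroForm.getFields().add(field);
            PDAnnotationWidget widget = field.getWidgets().get(0);
            widget.setRectangle(new PDRectangle(50, 750, 200, 20));
            widget.setPage(page);
            page.getAnnotations().add(widget);

            field.setValue(value);
            return doc.getDocumentCatalog().getAcroForm()
                    .getField("SampleField").getValueAsString();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(fillField("Hello, form"));
    }
}
```

With an existing form, the same `getAcroForm()`, `getField(...)`, and `setValue(...)` calls apply without the setup steps; `flatten()` can then turn the filled fields into static page content.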

[Interactive Forms](./interactive-forms.md)

### Security and Encryption

PDF encryption, decryption, access permissions, and digital signatures for document security.

```java { .api }
// PDDocument; encryption is applied through protect() with a policy
// such as StandardProtectionPolicy(ownerPassword, userPassword, AccessPermission)
public void protect(ProtectionPolicy policy) throws IOException;
public boolean isEncrypted();
public void addSignature(PDSignature signature) throws IOException;
public List<PDSignature> getSignatureDictionaries();
```
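
A minimal password-protection sketch, assuming PDFBox 3.0.x: it encrypts an in-memory document with `PDDocument.protect` and a `StandardProtectionPolicy`, then reopens it with the user password and checks the resulting permissions. `PasswordProtect`, its methods, and the passwords are hypothetical.

```java
import org.apache.pdfbox.Loader;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.encryption.AccessPermission;
import org.apache.pdfbox.pdmodel.encryption.StandardProtectionPolicy;

import java.io.ByteArrayOutputStream;
import java.io.IOException;

// Hypothetical demo class: encrypt a document and inspect it after reloading.
class PasswordProtect {
    // Builds a one-page PDF that forbids printing for the user password.
    static byte[] protectedPdf(String ownerPassword, String userPassword) throws IOException {
        try (PDDocument doc = new PDDocument();
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            doc.addPage(new PDPage());
            AccessPermission permissions = new AccessPermission();
            permissions.setCanPrint(false);
            StandardProtectionPolicy policy =
                    new StandardProtectionPolicy(ownerPassword, userPassword, permissions);
            policy.setEncryptionKeyLength(128);
            doc.protect(policy); // encryption is applied when the document is saved
            doc.save(out);
            return out.toByteArray();
        }
    }

    // Opens the PDF with a password and reports whether it is encrypted.
    static boolean isEncrypted(byte[] pdf, String password) throws IOException {
        try (PDDocument doc = Loader.loadPDF(pdf, password)) {
            return doc.isEncrypted();
        }
    }

    // Opens the PDF with a password and reports whether printing is allowed.
    static boolean canPrint(byte[] pdf, String password) throws IOException {
        try (PDDocument doc = Loader.loadPDF(pdf, password)) {
            return doc.getCurrentAccessPermission().canPrint();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] pdf = protectedPdf("owner-secret", "user-secret");
        System.out.println("Encrypted: " + isEncrypted(pdf, "user-secret"));
        System.out.println("User may print: " + canPrint(pdf, "user-secret"));
    }
}
```

Opening the same file with the owner password grants full permissions, so permission checks are only meaningful for the user password.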

[Security and Encryption](./security-encryption.md)

### Content Stream Processing

Low-level content stream parsing and generation for advanced PDF content manipulation and custom rendering.

```java { .api }
// PDFStreamEngine (extend it and override the hooks you need)
public void processPage(PDPage page) throws IOException;
protected void processOperator(Operator operator, List<COSBase> operands) throws IOException;
public void processStream(PDContentStream contentStream, PDPage page, PDResources resources) throws IOException;
```
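
One way to see the stream engine at work is to subclass `PDFStreamEngine` and record every operator it encounters. The sketch below does that for a page containing a single line of text. `OperatorCollector` is a hypothetical class; the overridden `processOperator` signature is the one listed above, and the code assumes PDFBox 3.0.x.

```java
import org.apache.pdfbox.contentstream.PDFStreamEngine;
import org.apache.pdfbox.contentstream.operator.Operator;
import org.apache.pdfbox.cos.COSBase;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.common.PDRectangle;
import org.apache.pdfbox.pdmodel.font.PDType1Font;
import org.apache.pdfbox.pdmodel.font.Standard14Fonts;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class: records the name of each operator in a page's content stream.
class OperatorCollector extends PDFStreamEngine {
    private final List<String> names = new ArrayList<>();

    @Override
    protected void processOperator(Operator operator, List<COSBase> operands) throws IOException {
        names.add(operator.getName());
        super.processOperator(operator, operands); // keep the engine's default handling
    }

    // Builds a page with one line of text and returns the operators found on it.
    static List<String> operatorsOfSamplePage() throws IOException {
        try (PDDocument doc = new PDDocument()) {
            PDPage page = new PDPage(PDRectangle.A4);
            doc.addPage(page);
            try (PDPageContentStream cs = new PDPageContentStream(doc, page)) {
                cs.beginText();
                cs.setFont(new PDType1Font(Standard14Fonts.FontName.HELVETICA), 12);
                cs.newLineAtOffset(100, 700);
                cs.showText("Hello");
                cs.endText();
            }
            OperatorCollector collector = new OperatorCollector();
            collector.processPage(page);
            return collector.names;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(operatorsOfSamplePage());
    }
}
```

No operator handlers are registered in this sketch, so the engine only tokenizes the stream; classes such as `PDFTextStripper` build on the same mechanism by registering handlers for the operators they care about.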

[Content Stream Processing](./content-stream-processing.md)

### Low-Level COS Operations

Direct manipulation of PDF objects using the Carousel Object System for advanced use cases and custom PDF structure handling.

```java { .api }
// COSDictionary
public COSBase getItem(COSName key);
public void setItem(COSName key, COSBase value);

// COSArray
public void add(COSBase object);
public COSBase get(int index);
```
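
The COS calls above can be exercised without any PDF file. The sketch below builds a small dictionary containing an integer and an array, then reads the values back. `CosBasics` and the keys `Count` and `Items` are hypothetical choices for illustration.

```java
import org.apache.pdfbox.cos.COSArray;
import org.apache.pdfbox.cos.COSBase;
import org.apache.pdfbox.cos.COSDictionary;
import org.apache.pdfbox.cos.COSInteger;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.cos.COSString;

// Hypothetical demo class: build and read raw COS objects.
class CosBasics {
    static final COSName COUNT = COSName.getPDFName("Count");
    static final COSName ITEMS = COSName.getPDFName("Items");

    // Builds a dictionary of the form << /Count 2 /Items [ 1 (two) ] >>.
    static COSDictionary build() {
        COSArray items = new COSArray();
        items.add(COSInteger.get(1));
        items.add(new COSString("two"));

        COSDictionary dict = new COSDictionary();
        dict.setItem(COUNT, COSInteger.get(items.size()));
        dict.setItem(ITEMS, items);
        return dict;
    }

    // Reads the string stored at index 1 of the /Items array.
    static String secondItem(COSDictionary dict) {
        COSBase items = dict.getItem(ITEMS);
        COSBase second = ((COSArray) items).get(1);
        return ((COSString) second).getString();
    }

    public static void main(String[] args) {
        COSDictionary dict = build();
        System.out.println("Count: " + dict.getInt(COUNT));
        System.out.println("Second item: " + secondItem(dict));
    }
}
```

In real documents, values are often indirect references; `getDictionaryObject` and `COSArray.getObject` resolve those, whereas `getItem` and `get` return the stored entry as-is.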

[Low-Level COS Operations](./cos-operations.md)