# Command Line Interface

pypdfium2 provides a command-line interface for common PDF operations. Each tool is a subcommand of the `pypdfium2` command and can also be called programmatically through the Python API.
## Core Import

```python
from pypdfium2 import cli_main
```
## Basic Usage

```bash
# Show version information
pypdfium2 --version

# Get help for all commands
pypdfium2 --help

# Get help for a specific command
pypdfium2 render --help
```
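To run these commands from a Python script in a separate process, the standard library's `subprocess` module is sufficient. The sketch below assumes the `pypdfium2` console script is on `PATH` and, as a fallback, that the package can be launched as `python -m pypdfium2`; `build_command` and `run_pypdfium2` are hypothetical helper names, not part of pypdfium2.

```python
import shutil
import subprocess
import sys


def build_command(args, use_module=False):
    """Build the argv list for a pypdfium2 CLI call (hypothetical helper).

    use_module=True assumes the package can be run as `python -m pypdfium2`;
    otherwise the `pypdfium2` console script is expected on PATH.
    """
    if use_module:
        return [sys.executable, "-m", "pypdfium2", *args]
    return ["pypdfium2", *args]


def run_pypdfium2(args, use_module=False):
    """Run a pypdfium2 subcommand; return (exit_code, stdout, stderr)."""
    # Fall back to `python -m pypdfium2` if the console script is not found.
    if not use_module and shutil.which("pypdfium2") is None:
        use_module = True
    proc = subprocess.run(
        build_command(args, use_module),
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stdout, proc.stderr


# Example:
#   code, out, err = run_pypdfium2(["render", "--help"])
```

Running out of process keeps PDFium out of the calling interpreter, at the cost of one process start-up per call.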
## Programmatic API Access

```python { .api }
def cli_main(raw_args=None):
    """
    Main CLI entry point for pypdfium2 command-line tools.

    Parameters:
    - raw_args: list[str] | None, command arguments (defaults to sys.argv[1:])

    Returns:
    int: Exit code (0 for success, non-zero for errors)

    Provides programmatic access to all CLI functionality,
    allowing integration of pypdfium2 tools in Python applications.
    """

def api_main(raw_args=None):
    """
    Alternative API entry point with the same functionality as cli_main.

    Parameters:
    - raw_args: list[str] | None, command arguments

    Returns:
    int: Exit code
    """
```
Programmatic usage example:

```python
import pypdfium2

# Convert images to PDF programmatically
exit_code = pypdfium2.cli_main([
    'imgtopdf',
    'image1.jpg', 'image2.png',
    '--output', 'combined.pdf',
])

if exit_code == 0:
    print("Successfully created PDF")
```
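Command-line tools built on `argparse` usually report invalid arguments and `--help` by raising `SystemExit` instead of returning, which would terminate the calling program. If `cli_main` behaves this way (an assumption; the signature above only documents an integer return value), a small wrapper can turn both paths into an exit code. `run_cli` is a hypothetical helper; the entry point is passed in as a parameter so the wrapper can be exercised without pypdfium2 installed.

```python
from typing import Callable, Optional, Sequence


def run_cli(entry_point: Callable[[list], Optional[int]], args: Sequence[str]) -> int:
    """Call a CLI entry point and always return an integer exit code.

    - A normal return of None is treated as success (0).
    - SystemExit (e.g. raised by argparse for bad arguments or --help)
      is caught and its code converted to an int.
    """
    try:
        result = entry_point(list(args))
    except SystemExit as exc:
        code = exc.code
        if code is None:
            return 0
        if isinstance(code, int):
            return code
        # Mirror the interpreter: a non-integer code (e.g. a message) means status 1
        return 1
    return 0 if result is None else int(result)


# Usage with the entry point documented above (requires pypdfium2):
#   import pypdfium2
#   status = run_cli(pypdfium2.cli_main, ["pdfinfo", "document.pdf"])
```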
## Available Commands

### Document Information

#### pdfinfo - Document and Page Information

Display information about a PDF document, including metadata, page details, and document properties.

```bash
pypdfium2 pdfinfo document.pdf
pypdfium2 pdfinfo --password secret encrypted.pdf
pypdfium2 pdfinfo document.pdf --pages 1-3  # Include page-level details for pages 1-3
```
### Text Operations

#### extract-text - Text Extraction

Extract the text content of PDF pages, with options for page selection, layout handling, and output destination.

```bash
pypdfium2 extract-text document.pdf
pypdfium2 extract-text document.pdf --pages 1-5
pypdfium2 extract-text document.pdf --output text.txt
pypdfium2 extract-text document.pdf --no-layout  # Disable layout preservation
```
### Image Operations

#### extract-images - Image Extraction

Extract embedded images from PDF pages to image files.

```bash
pypdfium2 extract-images document.pdf
pypdfium2 extract-images document.pdf --pages 1,3,5
pypdfium2 extract-images document.pdf --output images/
pypdfium2 extract-images document.pdf --format png
```
#### imgtopdf - Image to PDF Conversion

Convert one or more image files into a PDF document, with options for page size and layout.

```bash
pypdfium2 imgtopdf image1.jpg image2.png --output combined.pdf
pypdfium2 imgtopdf *.jpg --size letter --output photos.pdf
pypdfium2 imgtopdf image.png --width 8.5 --height 11 --output sized.pdf
```
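PDF page geometry is measured in points, with 72 points per inch. Assuming `--width` and `--height` above are given in inches (8.5 by 11 matches US Letter), the sketch below shows the conversion involved. `inches_to_points`, `named_size_points`, and `PAGE_SIZES_IN` are hypothetical names, and only Letter and A4 are included for illustration.

```python
POINTS_PER_INCH = 72  # PDF user-space unit: 1 pt = 1/72 inch
MM_PER_INCH = 25.4

# Hypothetical table of named sizes, as (width, height) in inches
PAGE_SIZES_IN = {
    "letter": (8.5, 11.0),
    "a4": (210 / MM_PER_INCH, 297 / MM_PER_INCH),
}


def inches_to_points(width_in, height_in):
    """Convert a page size in inches to PDF points."""
    return width_in * POINTS_PER_INCH, height_in * POINTS_PER_INCH


def named_size_points(name):
    """Look up a named page size and return it in points, rounded to 0.01."""
    w, h = inches_to_points(*PAGE_SIZES_IN[name.lower()])
    return round(w, 2), round(h, 2)


print(inches_to_points(8.5, 11.0))  # (612.0, 792.0)
print(named_size_points("A4"))      # (595.28, 841.89)
```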
#### render - Page Rendering

Render PDF pages to image files with customizable resolution, format, and rendering options.

```bash
pypdfium2 render document.pdf
pypdfium2 render document.pdf --scale 2.0  # High resolution
pypdfium2 render document.pdf --pages 1-10 --format png
pypdfium2 render document.pdf --width 1920 --height 1080
pypdfium2 render document.pdf --output rendered/
```
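In pypdfium2's Python rendering API, `scale` is the number of pixels per PDF canvas unit, and one canvas unit is usually 1/72 inch, so `scale = dpi / 72`. Assuming `--scale` follows the same convention, the sketch below converts a target DPI into a scale factor and computes the largest scale that fits a page into a pixel box such as 1920 by 1080 while preserving its aspect ratio. `scale_for_dpi`, `fit_scale`, and `output_size` are hypothetical helpers; the CLI's own handling of `--width`/`--height` may differ.

```python
CANVAS_UNITS_PER_INCH = 72  # 1 PDF canvas unit (point) = 1/72 inch


def scale_for_dpi(dpi):
    """Scale factor that renders a page at the given DPI."""
    return dpi / CANVAS_UNITS_PER_INCH


def fit_scale(page_w_pt, page_h_pt, box_w_px, box_h_px):
    """Largest scale at which the whole page fits inside the pixel box."""
    return min(box_w_px / page_w_pt, box_h_px / page_h_pt)


def output_size(page_w_pt, page_h_pt, scale):
    """Pixel dimensions of a page rendered at `scale`, rounded to whole pixels."""
    return round(page_w_pt * scale), round(page_h_pt * scale)


# US Letter page: 612 x 792 pt
print(scale_for_dpi(300))                     # 4.166666666666667
print(output_size(612, 792, 2.0))             # (1224, 1584)
s = fit_scale(612, 792, 1920, 1080)           # height is the limiting side
print(round(s, 4), output_size(612, 792, s))  # 1.3636 (835, 1080)
```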
### Page Operations

#### arrange - Document Arrangement

Merge PDF documents and select, reorder, or rotate their pages, writing the result to a new PDF.

```bash
pypdfium2 arrange input1.pdf input2.pdf --output merged.pdf
pypdfium2 arrange document.pdf --pages 1,3,5-10 --output selected.pdf
pypdfium2 arrange doc1.pdf doc2.pdf --rotate 90 --output rotated.pdf
```
#### tile - Page Tiling (N-up)

Place several input pages onto each output page in a grid layout (N-up printing).

```bash
pypdfium2 tile document.pdf --grid 2x2 --output tiled.pdf
pypdfium2 tile document.pdf --grid 1x2 --pages 1-20 --output booklet.pdf
pypdfium2 tile document.pdf --grid 3x3 --scale 0.8 --output nineup.pdf
```
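With a grid of R rows and C columns, each output sheet holds R * C input pages, so a document of n pages needs ceil(n / (R * C)) sheets. The sketch below works through that arithmetic and the placement of each page, assuming the `--grid` strings above mean `ROWSxCOLS` and that pages fill left to right, top to bottom. `parse_grid`, `sheets_needed`, and `place_page` are hypothetical helpers; the tool's actual ordering may differ.

```python
import math


def parse_grid(spec):
    """Parse a grid spec such as '2x2' or '1x2' into (rows, cols)."""
    rows, cols = (int(part) for part in spec.lower().split("x"))
    if rows < 1 or cols < 1:
        raise ValueError(f"invalid grid: {spec!r}")
    return rows, cols


def sheets_needed(n_pages, rows, cols):
    """Number of output pages required to hold n_pages input pages."""
    return math.ceil(n_pages / (rows * cols))


def place_page(index, rows, cols):
    """Map a 0-based input page index to (sheet, row, col), filled row-major."""
    sheet, slot = divmod(index, rows * cols)
    row, col = divmod(slot, cols)
    return sheet, row, col


rows, cols = parse_grid("3x3")
print(sheets_needed(20, rows, cols))  # 3
print(place_page(10, rows, cols))     # (1, 0, 1)
```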
### Document Structure

#### toc - Table of Contents

Display the document outline (bookmarks), optionally limited to a maximum nesting depth or written to a file.

```bash
pypdfium2 toc document.pdf
pypdfium2 toc document.pdf --max-depth 3
pypdfium2 toc document.pdf --output bookmarks.txt
```
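An outline is a tree of bookmarks, and a depth limit such as `--max-depth 3` prunes that tree during traversal. The sketch below illustrates the idea on a hand-built outline, assuming top-level entries count as depth 1; it does not read a real PDF, and `Bookmark` and `flatten_outline` are illustrative names, not pypdfium2 classes.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Bookmark:
    """Hypothetical outline node: a title, a target page, and child entries."""
    title: str
    page: int
    children: List["Bookmark"] = field(default_factory=list)


def flatten_outline(items, max_depth, depth=1):
    """Yield (depth, title, page) in document order, skipping levels past max_depth."""
    if depth > max_depth:
        return
    for item in items:
        yield depth, item.title, item.page
        yield from flatten_outline(item.children, max_depth, depth + 1)


outline = [
    Bookmark("Introduction", 1),
    Bookmark("Methods", 3, [
        Bookmark("Setup", 3, [Bookmark("Hardware", 4)]),
        Bookmark("Procedure", 5),
    ]),
]

# Indent each entry by its depth, like a printed table of contents
for depth, title, page in flatten_outline(outline, max_depth=2):
    print("    " * (depth - 1) + f"{title} ... {page}")
```

With `max_depth=2`, the depth-3 entry "Hardware" is omitted.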
#### pageobjects - Page Object Analysis

List the page objects (such as text, path, and image objects) on PDF pages, optionally filtered by type.

```bash
pypdfium2 pageobjects document.pdf
pypdfium2 pageobjects document.pdf --pages 1-5
pypdfium2 pageobjects document.pdf --type image  # Only image objects
pypdfium2 pageobjects document.pdf --verbose
```
### File Management

#### attachments - Embedded File Management

List, extract, and manage embedded file attachments within PDF documents.

```bash
pypdfium2 attachments document.pdf  # List attachments
pypdfium2 attachments document.pdf --extract  # Extract all attachments
pypdfium2 attachments document.pdf --extract --output attachments/
pypdfium2 attachments document.pdf --index 0 --extract  # Extract only the attachment at index 0
```
## Common Options

Most commands support these common options:

- `--help, -h`: Show command-specific help
- `--pages`: Specify page ranges (e.g., `1-5`, `1,3,5`, `all`)
- `--password`: Password for encrypted PDFs
- `--output, -o`: Output file or directory path
- `--verbose, -v`: Enable verbose output
- `--quiet, -q`: Suppress non-error output
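The page-range syntax combines single pages, inclusive ranges, and `all`. The parser below is a minimal sketch of how such a spec can be expanded into 0-based page indices, assuming 1-based page numbers on the command line as in the examples above; `parse_pages` is a hypothetical helper, and pypdfium2's own parser may accept other forms or report errors differently.

```python
def parse_pages(spec, page_count):
    """Expand a spec like '1-5', '1,3,5', '2,4-6' or 'all' to 0-based indices.

    Page numbers in the spec are 1-based and ranges are inclusive.
    Reversed or out-of-range ranges raise ValueError; duplicates are kept.
    """
    if spec.strip().lower() == "all":
        return list(range(page_count))

    indices = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            start_str, end_str = part.split("-", 1)
            start, end = int(start_str), int(end_str)
        else:
            start = end = int(part)
        if not 1 <= start <= end <= page_count:
            raise ValueError(f"page range {part!r} outside 1-{page_count}")
        indices.extend(range(start - 1, end))
    return indices


print(parse_pages("1,3,5-7", page_count=10))  # [0, 2, 4, 5, 6]
```

Validating the whole spec up front gives an explicit error before any page is processed.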
## Error Handling

The CLI tools return the following exit codes:

- `0`: Success
- `1`: General error
- `2`: Invalid arguments
- `3`: File not found or access error

When using the CLI programmatically, check the exit code:

```python
import pypdfium2

result = pypdfium2.cli_main(['pdfinfo', 'nonexistent.pdf'])
if result != 0:
    print(f"Command failed with exit code {result}")
```
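To turn these codes into readable messages or exceptions in a calling program, a lookup table is enough. The sketch below encodes the exit codes listed above; `CliError` and `check_exit_code` are hypothetical names, and any code not in the table is reported as unknown.

```python
# Exit codes as listed in the Error Handling section
EXIT_CODES = {
    0: "success",
    1: "general error",
    2: "invalid arguments",
    3: "file not found or access error",
}


class CliError(RuntimeError):
    """Raised when a pypdfium2 CLI call returns a non-zero exit code."""

    def __init__(self, cli_args, code):
        self.cli_args = list(cli_args)
        self.code = code
        reason = EXIT_CODES.get(code, "unknown error")
        command = " ".join(self.cli_args)
        super().__init__(f"{command!r} failed: {reason} (exit code {code})")


def check_exit_code(cli_args, code):
    """Return normally on success; raise CliError for any non-zero code."""
    if code != 0:
        raise CliError(cli_args, code)


# Usage with the entry point documented above (requires pypdfium2):
#   import pypdfium2
#   args = ['pdfinfo', 'nonexistent.pdf']
#   check_exit_code(args, pypdfium2.cli_main(args))
```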
## Integration Examples

### Batch Processing

```python
import glob

import pypdfium2

# Process all PDFs in the current directory
for pdf_file in glob.glob("*.pdf"):
    # Extract text from each PDF
    exit_code = pypdfium2.cli_main([
        'extract-text',
        pdf_file,
        '--output', f"{pdf_file}.txt",
    ])

    if exit_code == 0:
        print(f"Processed {pdf_file}")
    else:
        print(f"Failed to process {pdf_file}")
```
### Automated Report Generation

```python
import os

import pypdfium2


def process_document(pdf_path, output_dir):
    """Run several CLI operations on one PDF and report which succeeded."""
    # Make sure the output directory exists before writing into it
    os.makedirs(output_dir, exist_ok=True)

    operations = [
        # Get document info
        ['pdfinfo', pdf_path, '--output', os.path.join(output_dir, "info.txt")],

        # Extract text
        ['extract-text', pdf_path, '--output', os.path.join(output_dir, "text.txt")],

        # Render first page as thumbnail
        ['render', pdf_path, '--pages', '1', '--width', '200',
         '--output', os.path.join(output_dir, "thumbnail.png")],

        # Extract table of contents
        ['toc', pdf_path, '--output', os.path.join(output_dir, "toc.txt")],
    ]

    results = {}
    for operation in operations:
        exit_code = pypdfium2.cli_main(operation)
        operation_name = operation[0]
        results[operation_name] = exit_code == 0

    return results


# Process document
results = process_document("report.pdf", "output/")
print(f"Processing results: {results}")
```