# Document API

Core functionality for loading PDF documents from URLs, binary data, or streams. Handles document-level operations like metadata extraction, page navigation, and resource management.

## Capabilities

### Document Loading

The primary entry point for loading PDF documents with comprehensive configuration options.

```javascript { .api }
/**
 * Loads a PDF document from various sources
 * @param src - Document source (URL, binary data, or parameters object)
 * @returns Promise-based loading task for document access
 */
function getDocument(src: string | Uint8Array | ArrayBuffer | DocumentInitParameters): PDFDocumentLoadingTask;

interface DocumentInitParameters {
  /** URL to the PDF document */
  url?: string;
  /** Binary PDF data as typed array */
  data?: Uint8Array | ArrayBuffer;
  /** HTTP headers for document requests */
  httpHeaders?: Record<string, string>;
  /** Include credentials in cross-origin requests */
  withCredentials?: boolean;
  /** Password for encrypted PDFs */
  password?: string;
  /** Expected document length for optimization */
  length?: number;
  /** Custom data range transport */
  range?: PDFDataRangeTransport;
  /** Custom worker instance */
  worker?: PDFWorker;
  /** Logging verbosity level (0-5) */
  verbosity?: number;
  /** Base URL for relative links */
  docBaseUrl?: string;
  /** URL path to character mapping files */
  cMapUrl?: string;
  /** Whether character maps are binary packed */
  cMapPacked?: boolean;
  /** Custom character map reader factory */
  CMapReaderFactory?: any;
  /** Use system fonts when available */
  useSystemFonts?: boolean;
  /** URL path to standard font data */
  standardFontDataUrl?: string;
  /** Custom standard font data factory */
  StandardFontDataFactory?: any;
  /** Use worker for fetch operations */
  useWorkerFetch?: boolean;
  /** JavaScript evaluation support */
  isEvalSupported?: boolean;
  /** OffscreenCanvas support for rendering */
  isOffscreenCanvasSupported?: boolean;
  /** Maximum canvas area in bytes */
  canvasMaxAreaInBytes?: number;
  /** Disable @font-face rules */
  disableFontFace?: boolean;
  /** Include extra font properties */
  fontExtraProperties?: boolean;
  /** Enable XFA form support */
  enableXfa?: boolean;
  /** Owner document for DOM operations */
  ownerDocument?: Document;
  /** Disable byte range requests */
  disableRange?: boolean;
  /** Disable streaming */
  disableStream?: boolean;
  /** Disable auto-fetch of missing data */
  disableAutoFetch?: boolean;
  /** Enable PDF debugging features */
  pdfBug?: boolean;
}
```

**Usage Examples:**

```javascript
import { getDocument } from "pdfjs-dist";

// Load from URL
const loadingTask = getDocument("https://example.com/document.pdf");
const pdf = await loadingTask.promise;

// Load from binary data
const arrayBuffer = await fetch("document.pdf").then(r => r.arrayBuffer());
const loadingTask2 = getDocument(new Uint8Array(arrayBuffer));
const pdf2 = await loadingTask2.promise;

// Load with configuration
const loadingTask3 = getDocument({
  url: "document.pdf",
  httpHeaders: { "Authorization": "Bearer token" },
  cMapUrl: "./cmaps/",
  cMapPacked: true
});
const pdf3 = await loadingTask3.promise;
```

### Document Loading Task

Represents an ongoing document loading operation with progress tracking and cancellation support.

```javascript { .api }
interface PDFDocumentLoadingTask {
  /** Promise that resolves to the loaded PDF document */
  promise: Promise<PDFDocumentProxy>;
  /** Destroy/cancel the loading task; resolves once cleanup has finished */
  destroy(): Promise<void>;
  /** Document loading progress callback */
  onProgress?: (progressData: OnProgressParameters) => void;
  /** Password required callback for encrypted documents */
  onPassword?: (updatePassword: (password: string) => void, reason: number) => void;
}

interface OnProgressParameters {
  /** Bytes loaded so far */
  loaded: number;
  /** Total bytes to load (if known) */
  total?: number;
  /** Loading progress percentage */
  percent?: number;
}
```
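
The callbacks above can be wired up before awaiting `promise`. The following is a minimal sketch, not the library's own helper: `trackLoading`, `onUpdate`, and `askPassword` are hypothetical names introduced for illustration, and the percentage is computed locally from `loaded` and `total` rather than read from `percent`.

```javascript
// Attaches progress and password handlers to a PDFDocumentLoadingTask-like
// object and returns its promise. All helper names here are hypothetical.
function trackLoading(loadingTask, { onUpdate, askPassword }) {
  loadingTask.onProgress = ({ loaded, total }) => {
    // `total` may be absent (for example, no Content-Length header),
    // in which case no percentage can be derived.
    const percent = total
      ? Math.min(100, Math.round((loaded / total) * 100))
      : null;
    onUpdate({ loaded, total: total ?? null, percent });
  };

  loadingTask.onPassword = (updatePassword, reason) => {
    // `reason` is forwarded so the prompt can distinguish a first request
    // from a retry after a wrong password. `askPassword` may return a
    // string or a promise of one.
    Promise.resolve(askPassword(reason)).then(updatePassword);
  };

  return loadingTask.promise;
}
```

With a real task this might be called as `trackLoading(getDocument(url), { onUpdate: console.log, askPassword: () => prompt("Password") })`, where `url` and the prompt are placeholders.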

### PDF Document Proxy

Main interface for interacting with a loaded PDF document, providing access to pages, metadata, and document-level operations.

```javascript { .api }
interface PDFDocumentProxy {
  /** Number of pages in the document */
  numPages: number;
  /** Document fingerprints for caching; the second entry is only set for modified documents */
  fingerprints: [string, string | null];
  /** Loading parameters used */
  loadingParams: DocumentInitParameters;
  /** Loading task that created this document */
  loadingTask: PDFDocumentLoadingTask;

  /**
   * Get a specific page by number (1-indexed)
   * @param pageNumber - Page number (1 to numPages)
   * @returns Promise resolving to page proxy
   */
  getPage(pageNumber: number): Promise<PDFPageProxy>;

  /**
   * Get page index from page reference
   * @param ref - Page reference object
   * @returns Promise resolving to 0-based page index
   */
  getPageIndex(ref: RefProxy): Promise<number>;

  /**
   * Get named destinations in the document
   * @returns Promise resolving to destination mapping
   */
  getDestinations(): Promise<{ [name: string]: any }>;

  /**
   * Get specific destination by ID
   * @param id - Destination identifier
   * @returns Promise resolving to destination array
   */
  getDestination(id: string): Promise<any[] | null>;

  /**
   * Get document outline/bookmarks
   * @returns Promise resolving to outline tree
   */
  getOutline(): Promise<any[]>;

  /**
   * Get document permissions
   * @returns Promise resolving to permission flags, or null when the document defines none
   */
  getPermissions(): Promise<number[] | null>;

  /**
   * Get document metadata
   * @returns Promise resolving to metadata object
   */
  getMetadata(): Promise<{ info: any; metadata: Metadata | null; contentDispositionFilename?: string }>;

  /**
   * Get document data as Uint8Array
   * @returns Promise resolving to document bytes
   */
  getData(): Promise<Uint8Array>;

  /**
   * Get download info for saving
   * @returns Promise resolving to download information
   */
  getDownloadInfo(): Promise<{ length: number }>;

  /**
   * Get document statistics
   * @returns Promise resolving to stats object
   */
  getStats(): Promise<{ streamTypes: any; fontTypes: any }>;

  /**
   * Get page labels/numbering information
   * @returns Promise resolving to label array
   */
  getPageLabels(): Promise<string[] | null>;

  /**
   * Get page layout setting
   * @returns Promise resolving to layout name
   */
  getPageLayout(): Promise<string>;

  /**
   * Get page mode setting
   * @returns Promise resolving to mode name
   */
  getPageMode(): Promise<string>;

  /**
   * Get viewer preferences
   * @returns Promise resolving to preferences object
   */
  getViewerPreferences(): Promise<any>;

  /**
   * Get document attachments
   * @returns Promise resolving to attachments object
   */
  getAttachments(): Promise<{ [filename: string]: any }>;

  /**
   * Get document open action
   * @returns Promise resolving to open action
   */
  getOpenAction(): Promise<any>;

  /**
   * Get optional content configuration
   * @param params - Configuration parameters
   * @returns Promise resolving to config object
   */
  getOptionalContentConfig(params?: { intent?: string }): Promise<OptionalContentConfig>;

  /**
   * Get mark info for accessibility
   * @returns Promise resolving to mark info object
   */
  getMarkInfo(): Promise<any>;

  /**
   * Get annotations filtered by type
   * @param types - Array of annotation types to include
   * @param pageIndexesToSkip - Page indices to skip
   * @returns Promise resolving to annotations array
   */
  getAnnotationsByType(types: number[], pageIndexesToSkip?: number[]): Promise<any[]>;

  /**
   * Get JavaScript actions in document
   * @returns Promise resolving to actions object
   */
  getJSActions(): Promise<{ [name: string]: any }>;

  /**
   * Get field objects for forms
   * @returns Promise resolving to field mapping
   */
  getFieldObjects(): Promise<{ [id: string]: any }>;

  /**
   * Check if document has JavaScript actions
   * @returns Promise resolving to boolean
   */
  hasJSActions(): Promise<boolean>;

  /**
   * Get calculation order for form fields
   * @returns Promise resolving to field order array
   */
  getCalculationOrderIds(): Promise<string[]>;

  /**
   * Clean up document resources
   * @param keepLoadedFonts - Keep loaded fonts in memory
   * @returns Promise resolving once cleanup has finished
   */
  cleanup(keepLoadedFonts?: boolean): Promise<void>;

  /**
   * Destroy document and release all resources
   * @returns Promise resolving once the document has been destroyed
   */
  destroy(): Promise<void>;

  /**
   * Get structure tree for accessibility
   * @param pageIndex - Index of the page whose structure tree is requested
   * @returns Promise resolving to structure tree
   */
  getStructTree(pageIndex: number): Promise<any>;

  /**
   * Save document with annotations
   * @param annotationStorage - Annotation storage to include
   * @param filename - Filename for saved document
   * @param options - Save options
   * @returns Promise resolving to saved document bytes
   */
  saveDocument(annotationStorage?: AnnotationStorage, filename?: string, options?: any): Promise<Uint8Array>;

  /**
   * Get cached page number for reference
   * @param ref - Page reference object
   * @returns Cached page number or null
   */
  cachedPageNumber(ref: RefProxy): number | null;
}
```

**Usage Examples:**

```javascript
import { getDocument } from "pdfjs-dist";

// Load and inspect document
const pdf = await getDocument("document.pdf").promise;

console.log(`Document has ${pdf.numPages} pages`);

// Get metadata
const metadata = await pdf.getMetadata();
console.log("Title:", metadata.info.Title);
console.log("Author:", metadata.info.Author);

// Get first page
const page = await pdf.getPage(1);

// Get outline
const outline = await pdf.getOutline();
if (outline) {
  console.log("Document has bookmarks");
}

// Check permissions (the result is null when the PDF defines no permission flags)
const permissions = await pdf.getPermissions();
const canPrint = permissions === null || permissions.includes(4); // 4 = PRINT permission flag
```
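
Several of the proxy methods are independent of each other and can be awaited together. The sketch below gathers a small summary from any object shaped like `PDFDocumentProxy`; `summarizeDocument` is a hypothetical helper, and treating a `null` permissions result as "no restrictions" is an assumption made for this example.

```javascript
// Builds a plain summary object from a PDFDocumentProxy-like object.
// `summarizeDocument` is a hypothetical helper, not part of the library.
async function summarizeDocument(pdf) {
  // These three calls do not depend on each other, so run them in parallel.
  const [{ info }, labels, permissions] = await Promise.all([
    pdf.getMetadata(),
    pdf.getPageLabels(),
    pdf.getPermissions(),
  ]);

  const pages = [];
  // Page numbers are 1-indexed, while the labels array is 0-indexed.
  for (let pageNumber = 1; pageNumber <= pdf.numPages; pageNumber++) {
    pages.push({
      pageNumber,
      // Fall back to the plain page number when no label is defined.
      label: labels?.[pageNumber - 1] ?? String(pageNumber),
    });
  }

  return {
    title: info?.Title ?? null,
    pageCount: pdf.numPages,
    pages,
    // Assumption: null permissions means the document sets no restrictions.
    // 4 is the PRINT flag used in the usage example above.
    canPrint: permissions == null || permissions.includes(4),
  };
}
```

It could be called as `await summarizeDocument(pdf)` with the `pdf` object from the usage example.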

### Data Range Transport

Custom transport mechanism for handling byte-range requests, useful for streaming large documents or custom data sources.

```javascript { .api }
class PDFDataRangeTransport {
  /**
   * Constructor for custom data range transport
   * @param length - Total data length
   * @param initialData - Initial chunk of data
   * @param progressiveDone - Whether progressive loading is complete
   * @param contentDispositionFilename - Suggested filename
   */
  constructor(
    length: number,
    initialData: Uint8Array,
    progressiveDone?: boolean,
    contentDispositionFilename?: string
  );

  /**
   * Request a specific data range
   * @param begin - Start byte position
   * @param end - End byte position
   */
  requestDataRange(begin: number, end: number): void;

  /**
   * Abort all pending requests
   * @param reason - Abort reason
   */
  abort(reason?: any): void;
}
```
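
A custom transport is typically written as a subclass that overrides `requestDataRange` and hands each fetched chunk back to the library. The sketch below serves ranges from an in-memory `Uint8Array`. It rests on several assumptions: the chunk is returned through `onDataRange(begin, chunk)`, a `PDFDataRangeTransport` method that is not listed in the interface above; `end` is treated as exclusive; and `createInMemoryTransport` is a hypothetical factory that takes the base class as a parameter so the sketch has no import of its own.

```javascript
// Builds a transport that answers range requests from bytes already in
// memory. `BaseTransport` is expected to be PDFDataRangeTransport.
function createInMemoryTransport(BaseTransport, bytes, initialChunkSize = 65536) {
  class InMemoryRangeTransport extends BaseTransport {
    constructor() {
      // Pass the total length plus a first chunk, copied so the source
      // buffer stays untouched.
      super(bytes.length, bytes.slice(0, initialChunkSize));
    }

    requestDataRange(begin, end) {
      // Assumption: `end` is exclusive, matching Uint8Array.prototype.slice.
      const chunk = bytes.slice(begin, end);
      // Deliver asynchronously, as a network-backed transport would, so the
      // caller can finish registering its reader before data arrives.
      queueMicrotask(() => this.onDataRange(begin, chunk));
    }
  }
  return new InMemoryRangeTransport();
}
```

The resulting object could then be passed as `getDocument({ range: transport })`, using the `range` parameter from `DocumentInitParameters`.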

### Document Build Information

Version and build information for the PDF.js library.

```javascript { .api }
const build: {
  version: string;
  date: string;
};

const version: string;
```

**Usage Examples:**

```javascript
import { build, version } from "pdfjs-dist";

console.log(`PDF.js version: ${version}`);
console.log(`Build date: ${build.date}`);
```

## Error Handling

```javascript { .api }
class InvalidPDFException extends Error {
  constructor(msg: string);
}

class MissingPDFException extends Error {
  constructor(msg: string);
}

class PasswordException extends Error {
  constructor(msg: string, code: number);
}

class UnexpectedResponseException extends Error {
  constructor(msg: string, status: number);
}
```

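Common error scenarios (each surfaces as a rejection of the loading task's `promise`):
- Invalid or corrupted PDF files reject with `InvalidPDFException`
- Missing or network-inaccessible files reject with `MissingPDFException`
- Password-protected documents reject with `PasswordException` when no password or a wrong password is supplied and no `onPassword` callback is set
- Unexpected HTTP responses reject with `UnexpectedResponseException`, which carries the response status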
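
The scenarios above can be handled in one place by classifying the rejection. The following is a sketch under stated assumptions: `describeLoadError` is a hypothetical helper, it matches on `error.name` (assuming each exception's `name` equals its class name) so that it needs no imports, and the meaning of password code `1` (password needed) versus `2` (password incorrect) is assumed rather than taken from this document.

```javascript
// Maps a rejected loading error to a small, UI-friendly description.
// `describeLoadError` and the returned shape are hypothetical.
function describeLoadError(error) {
  switch (error?.name) {
    case "InvalidPDFException":
      return { kind: "invalid", retryable: false };
    case "MissingPDFException":
      // Could be a transient network problem, so a retry may help.
      return { kind: "missing", retryable: true };
    case "PasswordException":
      return {
        kind: "password",
        retryable: true,
        // Assumption: code 2 signals that a supplied password was wrong.
        wrongPassword: error.code === 2,
      };
    case "UnexpectedResponseException":
      return {
        kind: "http",
        status: error.status,
        // Treat only server-side (5xx) statuses as worth retrying.
        retryable: error.status >= 500,
      };
    default:
      return { kind: "unknown", retryable: false };
  }
}
```

A caller might wrap `await getDocument(src).promise` in `try`/`catch` and pass the caught value to `describeLoadError`. Checking with `instanceof` against the imported exception classes is an alternative when those classes are available.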