# Exceptions

The exception hierarchy Tika uses to report error conditions during parsing, including encrypted documents, unsupported formats, corrupted or empty files, and processing limits.

## Capabilities
### Base Exception Classes

#### TikaException

Base class for all Tika-specific exceptions. It supports cause chaining, so a caller can catch this single type and still reach the underlying error through `getCause()`.

```java { .api }
/**
 * Base exception for all Tika-related errors and exceptional conditions
 */
public class TikaException extends Exception {
    /**
     * Creates a TikaException with an error message
     * @param message Description of the error condition
     */
    public TikaException(String message);

    /**
     * Creates a TikaException with an error message and cause
     * @param message Description of the error condition
     * @param cause Underlying exception that caused this error
     */
    public TikaException(String message, Throwable cause);

    /**
     * Creates a TikaException with a cause only
     * @param cause Underlying exception that caused this error
     */
    public TikaException(Throwable cause);

    /**
     * Gets the error message, including cause information
     * @return String containing a detailed error description
     */
    @Override
    public String getMessage();
}
```
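Cause chaining means the low-level error behind a `TikaException`, for example an `IOException` from a truncated stream, can usually be recovered by walking `getCause()`. The sketch below is JDK-only and hypothetical: `StandInTikaException` merely mirrors the three documented constructors and is not Tika's class, and `rootCause` is an illustrative helper, not a Tika API.

```java
import java.util.IdentityHashMap;
import java.util.Map;

class CauseChainDemo {

    // Hypothetical stand-in mirroring the documented TikaException constructors (not Tika's class)
    static class StandInTikaException extends Exception {
        StandInTikaException(String message) {
            super(message);
        }

        StandInTikaException(String message, Throwable cause) {
            super(message, cause);
        }

        StandInTikaException(Throwable cause) {
            super(cause);
        }
    }

    // Follows getCause() to the innermost throwable; the identity map stops a looping chain
    static Throwable rootCause(Throwable t) {
        Map<Throwable, Boolean> seen = new IdentityHashMap<>();
        Throwable current = t;
        while (current.getCause() != null && seen.put(current, Boolean.TRUE) == null) {
            current = current.getCause();
        }
        return current;
    }
}
```

In the stand-in, the cause-only constructor delegates to `Exception(Throwable)`, so the JDK sets the message to `cause.toString()`; the identity-based `seen` map ends the walk if a chain ever loops back on itself.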
### Document Security Exceptions

#### EncryptedDocumentException

Exception thrown when attempting to process password-protected or encrypted documents without proper credentials.

```java { .api }
/**
 * Exception for encrypted or password-protected documents
 */
public class EncryptedDocumentException extends TikaException {
    /**
     * Creates exception for encrypted document
     * @param message Description of encryption issue
     */
    public EncryptedDocumentException(String message);

    /**
     * Creates exception with message and underlying cause
     * @param message Description of encryption issue
     * @param cause Underlying exception from encryption handling
     */
    public EncryptedDocumentException(String message, Throwable cause);

    /**
     * Creates exception from underlying encryption error
     * @param cause Underlying exception from encryption system
     */
    public EncryptedDocumentException(Throwable cause);
}
```
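An encryption failure is not always the outermost exception: a generic wrapper may carry it as its cause, in which case a `catch (EncryptedDocumentException e)` clause never fires. Whether a particular Tika parser wraps exceptions this way is an assumption to check for your version. A minimal JDK-only sketch of searching the whole cause chain for a given type, using hypothetical stand-in classes in place of Tika's:

```java
import java.util.Optional;

class FindCauseDemo {

    // Hypothetical stand-ins for a generic wrapper and an encryption failure (not Tika's classes)
    static class StandInTikaException extends Exception {
        StandInTikaException(String message) {
            super(message);
        }

        StandInTikaException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    static class StandInEncryptedException extends StandInTikaException {
        StandInEncryptedException(String message) {
            super(message);
        }
    }

    // Returns the first throwable in the chain (starting with t) that is an instance of type;
    // the depth cap guards against a chain that loops back on itself
    static <T extends Throwable> Optional<T> findCause(Throwable t, Class<T> type) {
        Throwable current = t;
        for (int depth = 0; current != null && depth < 100; depth++) {
            if (type.isInstance(current)) {
                return Optional.of(type.cast(current));
            }
            current = current.getCause();
        }
        return Optional.empty();
    }
}
```

With a helper like this, a caller can branch on `findCause(e, EncryptedDocumentException.class).isPresent()` even when the exception it caught is a plain `TikaException`.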
#### AccessPermissionException

Exception for documents whose access permissions forbid a specific operation, such as text extraction or printing.

```java { .api }
/**
 * Exception for documents with access permission restrictions
 */
public class AccessPermissionException extends TikaException {
    /**
     * Creates exception for access permission denial
     * @param message Description of permission restriction
     */
    public AccessPermissionException(String message);

    /**
     * Creates exception with message and underlying cause
     * @param message Description of permission restriction
     * @param cause Underlying exception from permission system
     */
    public AccessPermissionException(String message, Throwable cause);

    /**
     * Gets the specific permission that was denied
     * @return String describing denied permission (e.g., "text extraction", "printing")
     */
    public String getDeniedPermission();
}
```
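The documented `getDeniedPermission()` accessor lets the exception name the operation that was refused. A minimal JDK-only sketch of how a permission check might populate it; the `Permission` enum, the `require` helper, and the stand-in exception class are all hypothetical, not Tika types:

```java
import java.util.Locale;
import java.util.Set;

class PermissionCheckDemo {

    // Hypothetical permission flags a document might grant (not a Tika type)
    enum Permission { TEXT_EXTRACTION, PRINTING, MODIFICATION }

    // Stand-in mirroring the documented getDeniedPermission() accessor (not Tika's class)
    static class StandInAccessPermissionException extends Exception {
        private final String deniedPermission;

        StandInAccessPermissionException(String message, String deniedPermission) {
            super(message);
            this.deniedPermission = deniedPermission;
        }

        String getDeniedPermission() {
            return deniedPermission;
        }
    }

    // Throws unless the requested operation is among the permissions the document grants
    static void require(Set<Permission> granted, Permission requested)
            throws StandInAccessPermissionException {
        if (!granted.contains(requested)) {
            String name = requested.name().toLowerCase(Locale.ROOT).replace('_', ' ');
            throw new StandInAccessPermissionException("Document does not permit " + name, name);
        }
    }
}
```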
### Format and Corruption Exceptions

#### UnsupportedFormatException

Exception thrown when encountering document formats that are not supported by available parsers.

```java { .api }
/**
 * Exception for unsupported or unrecognized document formats
 */
public class UnsupportedFormatException extends TikaException {
    /**
     * Creates exception for unsupported format
     * @param message Description of format issue
     */
    public UnsupportedFormatException(String message);

    /**
     * Creates exception with detected media type
     * @param mediaType MediaType that was detected but not supported
     */
    public UnsupportedFormatException(MediaType mediaType);

    /**
     * Creates exception with message and underlying cause
     * @param message Description of format issue
     * @param cause Underlying exception from format handling
     */
    public UnsupportedFormatException(String message, Throwable cause);

    /**
     * Gets the media type that caused the exception
     * @return MediaType that was unsupported, or null if unknown
     */
    public MediaType getMediaType();
}
```
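"Not supported by available parsers" amounts to a failed lookup from media type to parser. The JDK-only sketch below models that dispatch with a hypothetical registry; media types are plain strings here rather than Tika's `MediaType`, and the stand-in exception keeps the unsupported type so callers can report it, as the documented `getMediaType()` does.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

class FormatRegistryDemo {

    // Stand-in mirroring the documented getMediaType(); uses a String instead of Tika's MediaType
    static class StandInUnsupportedFormatException extends Exception {
        private final String mediaType;

        StandInUnsupportedFormatException(String mediaType) {
            super("No parser available for " + (mediaType != null ? mediaType : "unknown type"));
            this.mediaType = mediaType;
        }

        String getMediaType() {
            return mediaType;
        }
    }

    // Hypothetical registry: media type -> function turning raw bytes into text
    private final Map<String, Function<byte[], String>> parsers = new HashMap<>();

    void register(String mediaType, Function<byte[], String> parser) {
        parsers.put(mediaType, parser);
    }

    // A media type with no registered parser becomes an unsupported-format error
    String parse(String mediaType, byte[] data) throws StandInUnsupportedFormatException {
        Function<byte[], String> parser = parsers.get(mediaType);
        if (parser == null) {
            throw new StandInUnsupportedFormatException(mediaType);
        }
        return parser.apply(data);
    }
}
```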
#### CorruptedFileException

Exception for documents that are corrupted, malformed, or contain invalid data structures.

```java { .api }
/**
 * Exception for corrupted, malformed, or invalid documents
 */
public class CorruptedFileException extends TikaException {
    /**
     * Creates exception for corrupted file
     * @param message Description of corruption detected
     */
    public CorruptedFileException(String message);

    /**
     * Creates exception with message and underlying cause
     * @param message Description of corruption detected
     * @param cause Underlying exception from corruption detection
     */
    public CorruptedFileException(String message, Throwable cause);

    /**
     * Creates exception from underlying parsing error indicating corruption
     * @param cause Underlying exception from parser
     */
    public CorruptedFileException(Throwable cause);

    /**
     * Gets the type of corruption detected
     * @return String describing corruption type (e.g., "invalid header", "truncated")
     */
    public String getCorruptionType();
}
```
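The two example values of `getCorruptionType()`, "invalid header" and "truncated", correspond to checks that can be made before any real parsing starts. A hypothetical JDK-only sketch that validates a file signature, relying on the fact that PDF files begin with the bytes `%PDF-`; the stand-in exception and the `checkSignature` helper are illustrative, not Tika code:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class HeaderCheckDemo {

    // Stand-in mirroring the documented getCorruptionType() accessor (not Tika's class)
    static class StandInCorruptedFileException extends Exception {
        private final String corruptionType;

        StandInCorruptedFileException(String message, String corruptionType) {
            super(message);
            this.corruptionType = corruptionType;
        }

        String getCorruptionType() {
            return corruptionType;
        }
    }

    // PDF files start with these five ASCII bytes
    static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    // "truncated" if the data is shorter than the signature,
    // "invalid header" if its leading bytes differ from the signature
    static void checkSignature(byte[] data, byte[] magic) throws StandInCorruptedFileException {
        if (data.length < magic.length) {
            throw new StandInCorruptedFileException(
                    "Only " + data.length + " of " + magic.length + " signature bytes present",
                    "truncated");
        }
        if (!Arrays.equals(Arrays.copyOf(data, magic.length), magic)) {
            throw new StandInCorruptedFileException("File signature does not match", "invalid header");
        }
    }
}
```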
#### ZeroByteFileException

Exception for empty files, that is, files containing zero bytes.

```java { .api }
/**
 * Exception for empty files with zero bytes of content
 */
public class ZeroByteFileException extends TikaException {
    /**
     * Creates exception for zero-byte file
     * @param message Description of empty file condition
     */
    public ZeroByteFileException(String message);

    /**
     * Creates exception with file path information
     * @param message Description of empty file condition
     * @param filePath Path to the empty file
     */
    public ZeroByteFileException(String message, String filePath);

    /**
     * Gets the path of the empty file
     * @return String containing file path, or null if unknown
     */
    public String getFilePath();
}
```
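An empty file can be rejected before any parser runs. A minimal JDK-only sketch, assuming the input is a file on disk; the stand-in exception mirrors the documented `(message, filePath)` constructor and is not Tika's class:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class ZeroByteCheckDemo {

    // Stand-in mirroring the documented (message, filePath) constructor (not Tika's class)
    static class StandInZeroByteFileException extends Exception {
        private final String filePath;

        StandInZeroByteFileException(String message, String filePath) {
            super(message);
            this.filePath = filePath;
        }

        String getFilePath() {
            return filePath;
        }
    }

    // Rejects an empty file before any parser is invoked
    static void requireNonEmpty(Path file) throws IOException, StandInZeroByteFileException {
        if (Files.size(file) == 0) {
            throw new StandInZeroByteFileException("File is empty", file.toString());
        }
    }
}
```

Checking `Files.size` first avoids spending a parser invocation on input that cannot produce any content, and the recorded path makes the failure easy to trace.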
### Processing Limit Exceptions

#### WriteLimitReachedException

Exception thrown when output size exceeds configured limits during content extraction or conversion.

```java { .api }
/**
 * Exception for exceeding configured write limits during processing
 */
public class WriteLimitReachedException extends TikaException {
    /**
     * Creates exception for write limit exceeded
     * @param message Description of limit violation
     */
    public WriteLimitReachedException(String message);

    /**
     * Creates exception with limit information
     * @param message Description of limit violation
     * @param limit Maximum allowed size in characters/bytes
     */
    public WriteLimitReachedException(String message, long limit);

    /**
     * Creates exception with limit and actual size
     * @param message Description of limit violation
     * @param limit Maximum allowed size
     * @param actualSize Actual size that exceeded limit
     */
    public WriteLimitReachedException(String message, long limit, long actualSize);

    /**
     * Gets the configured write limit
     * @return Maximum allowed size in characters/bytes
     */
    public long getLimit();

    /**
     * Gets the actual size that exceeded the limit
     * @return Actual size processed when limit was exceeded
     */
    public long getActualSize();
}
```
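The relationship between the documented `getLimit()` and `getActualSize()` values can be shown with a buffer that enforces a character limit. In this hypothetical JDK-only sketch, `actualSize` is read as the total length the output would have reached had the write been allowed; that reading of the documented field is an assumption. The buffer keeps everything up to the limit, so the caller can still use the truncated text after catching the exception.

```java
class WriteLimitDemo {

    // Stand-in mirroring the documented getLimit()/getActualSize() accessors (not Tika's class)
    static class StandInWriteLimitReachedException extends Exception {
        private final long limit;
        private final long actualSize;

        StandInWriteLimitReachedException(String message, long limit, long actualSize) {
            super(message);
            this.limit = limit;
            this.actualSize = actualSize;
        }

        long getLimit() {
            return limit;
        }

        long getActualSize() {
            return actualSize;
        }
    }

    // Collects text up to a character limit; on overflow it keeps what fits, then throws
    static class LimitedBuffer {
        private final StringBuilder text = new StringBuilder();
        private final int limit;

        LimitedBuffer(int limit) {
            this.limit = limit;
        }

        void append(String chunk) throws StandInWriteLimitReachedException {
            long attempted = (long) text.length() + chunk.length();
            if (attempted <= limit) {
                text.append(chunk);
                return;
            }
            text.append(chunk, 0, limit - text.length());
            throw new StandInWriteLimitReachedException(
                    "Write limit of " + limit + " characters reached", limit, attempted);
        }

        @Override
        public String toString() {
            return text.toString();
        }
    }
}
```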
### Embedded Document Exceptions

#### EmbeddedDocumentExtractorException

Exception for errors that occur while extracting or processing embedded documents, such as attachments in an email or files inside an archive.

```java { .api }
/**
 * Exception for errors in embedded document extraction and processing
 */
public class EmbeddedDocumentExtractorException extends TikaException {
    /**
     * Creates exception for embedded document error
     * @param message Description of extraction issue
     */
    public EmbeddedDocumentExtractorException(String message);

    /**
     * Creates exception with message and underlying cause
     * @param message Description of extraction issue
     * @param cause Underlying exception from embedded document processing
     */
    public EmbeddedDocumentExtractorException(String message, Throwable cause);

    /**
     * Creates exception with embedded document information
     * @param message Description of extraction issue
     * @param embeddedName Name or identifier of embedded document
     * @param cause Underlying exception
     */
    public EmbeddedDocumentExtractorException(String message, String embeddedName, Throwable cause);

    /**
     * Gets the name of the embedded document that caused the error
     * @return String containing embedded document name/identifier
     */
    public String getEmbeddedDocumentName();
}
```
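A common policy for embedded documents is to record a failure against the embedded document's name and carry on with the rest of the container instead of aborting. A minimal JDK-only sketch of that policy; the `Extractor` interface, the `extractAll` helper, and the stand-in exception (which mirrors the documented `(message, embeddedName, cause)` constructor) are hypothetical:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class EmbeddedLoopDemo {

    // Stand-in mirroring the documented (message, embeddedName, cause) constructor (not Tika's class)
    static class StandInEmbeddedException extends Exception {
        private final String embeddedName;

        StandInEmbeddedException(String message, String embeddedName, Throwable cause) {
            super(message, cause);
            this.embeddedName = embeddedName;
        }

        String getEmbeddedDocumentName() {
            return embeddedName;
        }
    }

    // Hypothetical per-document extraction step
    interface Extractor {
        String extract(String embeddedName) throws Exception;
    }

    // Extracts every embedded document; a failure is recorded under its name and the loop continues
    static Map<String, String> extractAll(List<String> names, Extractor extractor,
                                          List<StandInEmbeddedException> failures) {
        Map<String, String> results = new LinkedHashMap<>();
        for (String name : names) {
            try {
                results.put(name, extractor.extract(name));
            } catch (Exception e) {
                failures.add(new StandInEmbeddedException(
                        "Failed to extract embedded document", name, e));
            }
        }
        return results;
    }
}
```

Whether to skip or abort on a failed embedded document is an application choice; the collected exceptions keep both the document name and the original cause for later reporting.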
### Configuration Exceptions

#### TikaConfigException

Exception for configuration-related errors during Tika setup and initialization.

```java { .api }
/**
 * Exception for configuration errors during Tika setup and initialization
 */
public class TikaConfigException extends TikaException {
    /**
     * Creates exception for configuration error
     * @param message Description of configuration issue
     */
    public TikaConfigException(String message);

    /**
     * Creates exception with message and underlying cause
     * @param message Description of configuration issue
     * @param cause Underlying exception from configuration processing
     */
    public TikaConfigException(String message, Throwable cause);

    /**
     * Creates exception from underlying configuration error
     * @param cause Underlying exception from configuration system
     */
    public TikaConfigException(Throwable cause);

    /**
     * Gets the configuration parameter that caused the error
     * @return String containing parameter name, or null if not parameter-specific
     */
    public String getConfigParameter();
}
```
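`getConfigParameter()` is most useful when validation code reports exactly which setting was bad. A hypothetical JDK-only sketch that reads a required positive integer from a key/value configuration map; the stand-in exception is not Tika's class, records the offending key, and for parse errors keeps the `NumberFormatException` as its cause.

```java
import java.util.Map;

class ConfigValidationDemo {

    // Stand-in mirroring the documented getConfigParameter() accessor (not Tika's class)
    static class StandInConfigException extends Exception {
        private final String configParameter;

        StandInConfigException(String message, String configParameter, Throwable cause) {
            super(message, cause);
            this.configParameter = configParameter;
        }

        String getConfigParameter() {
            return configParameter;
        }
    }

    // Reads a required positive integer setting, naming the offending key on failure
    static int readPositiveInt(Map<String, String> config, String key) throws StandInConfigException {
        String raw = config.get(key);
        if (raw == null) {
            throw new StandInConfigException("Missing required parameter: " + key, key, null);
        }
        int value;
        try {
            value = Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            throw new StandInConfigException(key + " is not an integer: " + raw, key, e);
        }
        if (value <= 0) {
            throw new StandInConfigException(key + " must be positive, got " + value, key, null);
        }
        return value;
    }
}
```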
### Parser-Specific Exceptions

#### UnsupportedOperationException

Exception for operations that are not supported by specific parser implementations. Its simple name matches the JDK's unchecked `java.lang.UnsupportedOperationException`, so import it with a single-type import (or use its fully qualified name) to make sure an unqualified reference such as `catch (UnsupportedOperationException e)` means this class rather than the JDK one.

```java { .api }
/**
 * Exception for unsupported operations in parser implementations
 */
public class UnsupportedOperationException extends TikaException {
    /**
     * Creates exception for unsupported operation
     * @param message Description of unsupported operation
     */
    public UnsupportedOperationException(String message);

    /**
     * Creates exception with operation and parser information
     * @param operation Description of attempted operation
     * @param parserClass Class of parser that doesn't support operation
     */
    public UnsupportedOperationException(String operation, Class<?> parserClass);

    /**
     * Gets the operation that was not supported
     * @return String describing attempted operation
     */
    public String getOperation();

    /**
     * Gets the parser class that doesn't support the operation
     * @return Class of parser implementation
     */
    public Class<?> getParserClass();
}
```
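Because this class shares its simple name with the JDK's unchecked `java.lang.UnsupportedOperationException`, which type a bare `UnsupportedOperationException` refers to depends on what is in scope. The hypothetical JDK-only sketch below reproduces the situation with a nested class of the same name (not Tika's class): inside `NameShadowDemo` the simple name resolves to the nested, checked type, so the JDK exception has to be written out in full.

```java
class NameShadowDemo {

    // Hypothetical checked exception that reuses the JDK's simple name (not Tika's class)
    static class UnsupportedOperationException extends Exception {
        private final String operation;
        private final Class<?> parserClass;

        UnsupportedOperationException(String operation, Class<?> parserClass) {
            super(operation + " is not supported by " + parserClass.getSimpleName());
            this.operation = operation;
            this.parserClass = parserClass;
        }

        String getOperation() {
            return operation;
        }

        Class<?> getParserClass() {
            return parserClass;
        }
    }

    // Inside this class the simple name means the nested checked type,
    // so the JDK's unchecked exception must be written out in full
    static String classify(Throwable t) {
        if (t instanceof UnsupportedOperationException) {
            return "checked stand-in";
        }
        if (t instanceof java.lang.UnsupportedOperationException) {
            return "JDK unchecked";
        }
        return "other";
    }
}
```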
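## Exception Handling Patterns

### Basic Exception Handling

```java { .api }
import java.io.IOException;
import java.io.InputStream;

import org.apache.tika.exception.CorruptedFileException;
import org.apache.tika.exception.EncryptedDocumentException;
import org.apache.tika.exception.TikaException;
import org.apache.tika.exception.UnsupportedFormatException;
import org.apache.tika.exception.WriteLimitReachedException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.SAXException;

public class BasicExtraction {

    // Basic parsing with exception handling; specific subclasses are caught before TikaException
    public static String extract(InputStream inputStream) {
        try {
            AutoDetectParser parser = new AutoDetectParser();
            // The no-argument BodyContentHandler stops after 100,000 characters
            BodyContentHandler handler = new BodyContentHandler();
            Metadata metadata = new Metadata();

            parser.parse(inputStream, handler, metadata, new ParseContext());
            return handler.toString();

        } catch (EncryptedDocumentException e) {
            System.err.println("Document is password protected: " + e.getMessage());
        } catch (UnsupportedFormatException e) {
            System.err.println("Unsupported format: " + e.getMediaType());
        } catch (CorruptedFileException e) {
            System.err.println("Corrupted file detected: " + e.getCorruptionType());
        } catch (WriteLimitReachedException e) {
            System.err.println("Output size exceeded limit: " + e.getLimit());
        } catch (TikaException e) {
            System.err.println("General Tika error: " + e.getMessage());
        } catch (IOException e) {
            System.err.println("I/O error: " + e.getMessage());
        } catch (SAXException e) {
            System.err.println("SAX content handler error: " + e.getMessage());
        }
        return null; // Extraction failed; the reason was reported above
    }
}
```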
### Comprehensive Error Analysis

```java { .api }
import java.io.InputStream;

import org.apache.tika.exception.AccessPermissionException;
import org.apache.tika.exception.CorruptedFileException;
import org.apache.tika.exception.EncryptedDocumentException;
import org.apache.tika.exception.TikaException;
import org.apache.tika.exception.UnsupportedFormatException;
import org.apache.tika.exception.WriteLimitReachedException;
import org.apache.tika.exception.ZeroByteFileException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.metadata.TikaCoreProperties;
import org.apache.tika.mime.MediaType;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.sax.BodyContentHandler;
// EmbeddedDocumentExtractorException: import it from the package that defines it in your Tika version

public class DocumentProcessor {

    // ProcessingResult is an application-defined holder (setContent, setMetadata,
    // setSuccess, setError, addWarning), not a Tika class
    public ProcessingResult processDocument(InputStream input, String filename) {
        ProcessingResult result = new ProcessingResult();
        // Write limit of 1,000,000 characters (the limit counts characters, not bytes)
        BodyContentHandler handler = new BodyContentHandler(1_000_000);
        Metadata metadata = new Metadata();
        metadata.set(TikaCoreProperties.RESOURCE_NAME_KEY, filename);

        try {
            AutoDetectParser parser = new AutoDetectParser();
            parser.parse(input, handler, metadata, new ParseContext());

            result.setContent(handler.toString());
            result.setMetadata(metadata);
            result.setSuccess(true);

        } catch (EncryptedDocumentException e) {
            result.setError("ENCRYPTED", "Document requires password: " + e.getMessage());

        } catch (UnsupportedFormatException e) {
            MediaType type = e.getMediaType();
            result.setError("UNSUPPORTED_FORMAT",
                    "Format not supported: " + (type != null ? type.toString() : "unknown"));

        } catch (CorruptedFileException e) {
            result.setError("CORRUPTED", "File corruption detected: " + e.getCorruptionType());

        } catch (ZeroByteFileException e) {
            result.setError("EMPTY", "File is empty: " + e.getFilePath());

        } catch (AccessPermissionException e) {
            result.setError("PERMISSION_DENIED",
                    "Access denied for: " + e.getDeniedPermission());

        } catch (WriteLimitReachedException e) {
            // Reaching the limit aborts the parse, but the handler keeps the text
            // written before the limit, so truncated content is still available
            result.setContent(handler.toString());
            result.addWarning("Content truncated due to size limit");
            result.setError("SIZE_LIMIT",
                    String.format("Size limit exceeded: %d > %d", e.getActualSize(), e.getLimit()));

        } catch (EmbeddedDocumentExtractorException e) {
            result.setError("EMBEDDED_ERROR",
                    "Embedded document error: " + e.getEmbeddedDocumentName());

        } catch (TikaException e) {
            result.setError("TIKA_ERROR", "Processing error: " + e.getMessage());

        } catch (Exception e) {
            // IOException, SAXException and unchecked exceptions end up here
            result.setError("UNKNOWN", "Unexpected error: " + e.getMessage());
        }

        return result;
    }
}
```
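When a catch chain does nothing but translate exception types into error codes, as above, the same mapping can live in an ordered table. This is a design alternative, not something Tika provides, and the JDK-only sketch below uses hypothetical stand-in classes. Unlike a catch chain, where the compiler rejects a subclass clause that follows its superclass, the table relies on insertion order, so a supertype listed too early silently shadows its subtypes.

```java
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

class ErrorCodeTableDemo {

    // Hypothetical stand-ins for a base exception and one subtype (not Tika's classes)
    static class StandInTikaException extends Exception {
        StandInTikaException(String message) {
            super(message);
        }
    }

    static class StandInEncryptedException extends StandInTikaException {
        StandInEncryptedException(String message) {
            super(message);
        }
    }

    // Checked in insertion order, so subtypes must be listed before their supertypes
    private static final Map<Class<? extends Throwable>, String> CODES = new LinkedHashMap<>();

    static {
        CODES.put(StandInEncryptedException.class, "ENCRYPTED");
        CODES.put(StandInTikaException.class, "TIKA_ERROR");
        CODES.put(IOException.class, "IO_ERROR");
    }

    // Returns the code of the first matching entry, or "UNKNOWN"
    static String errorCode(Throwable t) {
        for (Map.Entry<Class<? extends Throwable>, String> entry : CODES.entrySet()) {
            if (entry.getKey().isInstance(t)) {
                return entry.getValue();
            }
        }
        return "UNKNOWN";
    }
}
```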
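### Exception Recovery Strategies

```java { .api }
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

import org.apache.tika.exception.CorruptedFileException;
import org.apache.tika.exception.EncryptedDocumentException;
import org.apache.tika.exception.TikaException;
import org.apache.tika.exception.UnsupportedFormatException;
import org.apache.tika.exception.WriteLimitReachedException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.PasswordProvider;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.SAXException;

public class RobustDocumentParser {

    private final AutoDetectParser parser = new AutoDetectParser();

    public String extractText(InputStream input, List<String> candidatePasswords)
            throws IOException, SAXException, TikaException {
        // A failed attempt leaves the stream partly consumed, so buffer it once
        // and give every attempt a fresh stream over the same bytes
        byte[] data = input.readAllBytes();
        BodyContentHandler handler = new BodyContentHandler();

        // Try with the full parser first
        try {
            return parse(data, handler, new ParseContext());

        } catch (EncryptedDocumentException e) {
            // Try passwords supplied by the caller
            for (String password : candidatePasswords) {
                ParseContext context = new ParseContext();
                context.set(PasswordProvider.class, metadata -> password);
                try {
                    return parse(data, new BodyContentHandler(), context);
                } catch (EncryptedDocumentException wrongPassword) {
                    // Continue with the next candidate
                }
            }
            throw e; // Re-throw if no password worked

        } catch (CorruptedFileException e) {
            // Retry with lenient, parser-specific settings
            return parse(data, new BodyContentHandler(), createLenientContext());

        } catch (WriteLimitReachedException e) {
            // Return the partial content collected before the limit was reached
            return handler.toString() + "\n[Content truncated]";

        } catch (UnsupportedFormatException e) {
            // Fall back to a simpler text extraction
            return tryTextExtractionFallback(data);
        }
    }

    private String parse(byte[] data, BodyContentHandler handler, ParseContext context)
            throws IOException, SAXException, TikaException {
        try (InputStream stream = new ByteArrayInputStream(data)) {
            parser.parse(stream, handler, new Metadata(), context);
        }
        return handler.toString();
    }

    // Hypothetical helper: populate the context with the lenient options of the parsers you use
    private ParseContext createLenientContext() {
        return new ParseContext();
    }

    // Hypothetical fallback: decode the raw bytes as UTF-8 text
    private String tryTextExtractionFallback(byte[] data) {
        return new String(data, StandardCharsets.UTF_8);
    }
}
```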
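Every retry strategy has to deal with the same constraint: a failed parse may already have consumed part of the input stream. A generic JDK-only sketch of the buffering pattern, independent of Tika: the input is read into memory once and each hypothetical `Attempt` receives its own stream starting at byte 0. If every attempt fails, the last failure is rethrown and carries the previous one as a suppressed exception.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.List;

class RetryFreshStreamDemo {

    // Hypothetical parse attempt that consumes a stream
    interface Attempt {
        String run(InputStream in) throws Exception;
    }

    // Reads the input once, then hands every attempt its own stream positioned at byte 0.
    // Returns the first successful result; if all fail, rethrows the last failure, which
    // carries the previous one as a suppressed exception.
    static String firstSuccessful(InputStream input, List<Attempt> attempts) throws Exception {
        byte[] data = input.readAllBytes();
        Exception last = null;
        for (Attempt attempt : attempts) {
            try (InputStream fresh = new ByteArrayInputStream(data)) {
                return attempt.run(fresh);
            } catch (Exception e) {
                if (last != null && last != e) {
                    e.addSuppressed(last);
                }
                last = e;
            }
        }
        if (last != null) {
            throw last;
        }
        throw new IllegalArgumentException("No attempts supplied");
    }
}
```

Buffering trades memory for repeatability, so for very large inputs a temporary file may be a better backing store than a byte array.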
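### Custom Exception Creation

```java { .api }
import java.io.IOException;
import java.io.InputStream;
import java.util.Collections;
import java.util.Set;

import org.apache.tika.exception.CorruptedFileException;
import org.apache.tika.exception.TikaException;
import org.apache.tika.exception.UnsupportedFormatException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.mime.MediaType;
import org.apache.tika.parser.AbstractParser;
import org.apache.tika.parser.ParseContext;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;

// Creating custom exceptions extending the Tika hierarchy (CustomFormatException.java)
public class CustomFormatException extends UnsupportedFormatException {
    private final String formatVersion;

    public CustomFormatException(String message, String formatVersion) {
        super(message);
        this.formatVersion = formatVersion;
    }

    public String getFormatVersion() {
        return formatVersion;
    }
}

// Usage in a custom parser (CustomParser.java)
public abstract class CustomParser extends AbstractParser {

    // Hypothetical media type for the custom format
    private static final Set<MediaType> SUPPORTED_TYPES =
            Collections.singleton(MediaType.application("x-custom"));

    @Override
    public Set<MediaType> getSupportedTypes(ParseContext context) {
        return SUPPORTED_TYPES;
    }

    @Override
    public void parse(InputStream stream, ContentHandler handler,
                      Metadata metadata, ParseContext context)
            throws IOException, SAXException, TikaException {

        try {
            String version = detectFormatVersion(stream);

            if (!isSupportedVersion(version)) {
                throw new CustomFormatException(
                        "Unsupported format version: " + version, version);
            }

            // Continue with parsing...

        } catch (IOException e) {
            if (isCorruptionError(e)) {
                throw new CorruptedFileException("Invalid file structure", e);
            }
            throw e; // Re-throw other I/O errors
        }
    }

    // Format-specific helpers, left for a concrete subclass to implement
    protected abstract String detectFormatVersion(InputStream stream) throws IOException;

    protected abstract boolean isSupportedVersion(String version);

    protected abstract boolean isCorruptionError(IOException e);
}
```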