# Core OCR Engine

The `TessBaseAPI` class is the primary interface for optical character recognition. It handles engine initialization, image input, text recognition, and result extraction. Methods that return C strings return a JavaCPP `BytePointer`. Read it with `getString()` and release it with `deallocate()` when you are done.

## Capabilities

### Engine Initialization

Set up the Tesseract OCR engine by loading language models and selecting an OCR engine mode.

```java { .api }
public class TessBaseAPI {
    // Constructor
    public TessBaseAPI();

    // Version information (returned as a C string)
    public static BytePointer Version();

    // Initialization methods
    public int Init(String datapath, String language, int oem);
    public int Init(String datapath, String language);
    public void InitForAnalysePage();

    // Cleanup
    public void End();
}
```
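
**Init Parameters:**
- `datapath`: Path to the `tessdata` directory, or `null` for the default location (for example, the one named by the `TESSDATA_PREFIX` environment variable)
- `language`: Language data to load, named after its `.traineddata` file and usually a three-letter ISO 639 code (e.g., "eng", "fra", "deu"). Join several languages with `+`, as in "eng+fra"
- `oem`: OCR Engine Mode (`OEM_LSTM_ONLY` recommended). The two-argument overload uses the default mode

**Return Values:**
- `0`: Success
- `-1`: Initialization failed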
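Because `Init` reports only `0` or `-1`, it can help to check for the language files before calling it. The sketch below assumes Tesseract's convention that each language in a `+`-separated list is loaded from `<language>.traineddata` inside the tessdata directory. `TessdataCheck` and `missingLanguages` are hypothetical helpers, not part of the bindings.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical pre-flight check to run before TessBaseAPI.Init. */
final class TessdataCheck {
    private TessdataCheck() {}

    /**
     * Returns the languages in a "+"-separated spec (e.g. "eng+fra") whose
     * <lang>.traineddata file is missing from the given tessdata directory.
     */
    static List<String> missingLanguages(Path tessdataDir, String languageSpec) {
        List<String> missing = new ArrayList<>();
        for (String lang : languageSpec.split("\\+")) {
            String trimmed = lang.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate stray "+" separators such as "+eng"
            }
            Path model = tessdataDir.resolve(trimmed + ".traineddata");
            if (!Files.isRegularFile(model)) {
                missing.add(trimmed);
            }
        }
        return missing;
    }
}
```

A non-empty result names the language files that would make `Init` fail, which gives a more specific error message than the bare `-1`.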

#### Usage Example

```java
TessBaseAPI api = new TessBaseAPI();

// Initialize with English language data and the LSTM engine
int result = api.Init(null, "eng", OEM_LSTM_ONLY);
if (result != 0) {
    System.err.println("Tesseract initialization failed");
    return;
}

// Use the API for OCR operations...

// Always clean up when done
api.End();
```

### Image Input Methods

Provide images to the OCR engine as Leptonica `PIX` objects or raw pixel buffers.

```java { .api }
public class TessBaseAPI {
    // Set image from a Leptonica PIX object
    public void SetImage(PIX pix);

    // Set image from a raw byte array
    public void SetImage(byte[] imagedata, int width, int height,
                         int bytes_per_pixel, int bytes_per_line);

    // Restrict recognition to a rectangular region of interest
    public void SetRectangle(int left, int top, int width, int height);

    // Input image management
    public void SetInputImage(PIX pix);
    public PIX GetInputImage();
    public void SetInputName(String name);
    public BytePointer GetInputName();

    // Output configuration
    public void SetOutputName(String name);

    // Resolution metadata
    public void SetSourceResolution(int ppi);
    public int GetSourceYResolution();
}
```
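
**Image Format Support:**
- **bytes_per_pixel**: 0 (binary, 1 bit per pixel, byte-packed), 1 (grayscale), 3 (RGB), 4 (RGBA)
- **bytes_per_line**: Row stride in bytes, including any padding at the end of each row
- **File formats**: PNG, JPEG, TIFF, BMP, and GIF files can be loaded into a `PIX` with Leptonica's `pixRead`, depending on the codecs Leptonica was built with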
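When the image comes from Java rather than Leptonica, the raw-buffer overload of `SetImage` needs consistent `bytes_per_pixel` and `bytes_per_line` values. This is a minimal sketch that assumes the source is a `java.awt.image.BufferedImage` and that you want 3-byte pixels in R, G, B order with no row padding, so `bytes_per_line` is `width * 3`. `RawImage` and `toPackedRgb` are hypothetical names.

```java
import java.awt.image.BufferedImage;

/** Hypothetical holder for the arguments to SetImage(byte[], int, int, int, int). */
final class RawImage {
    final byte[] data;
    final int width;
    final int height;
    final int bytesPerPixel;
    final int bytesPerLine;

    private RawImage(byte[] data, int width, int height, int bytesPerPixel, int bytesPerLine) {
        this.data = data;
        this.width = width;
        this.height = height;
        this.bytesPerPixel = bytesPerPixel;
        this.bytesPerLine = bytesPerLine;
    }

    /** Packs any BufferedImage into row-major R, G, B bytes with no row padding. */
    static RawImage toPackedRgb(BufferedImage img) {
        int w = img.getWidth();
        int h = img.getHeight();
        int bpp = 3;
        int stride = w * bpp; // no padding, so the stride is just width * bytes_per_pixel
        byte[] out = new byte[stride * h];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int argb = img.getRGB(x, y); // default sRGB, packed as 0xAARRGGBB
                int i = y * stride + x * bpp;
                out[i] = (byte) ((argb >> 16) & 0xFF);    // red
                out[i + 1] = (byte) ((argb >> 8) & 0xFF); // green
                out[i + 2] = (byte) (argb & 0xFF);        // blue
            }
        }
        return new RawImage(out, w, h, bpp, stride);
    }
}
```

The result maps directly onto the call: `api.SetImage(raw.data, raw.width, raw.height, raw.bytesPerPixel, raw.bytesPerLine)`.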

#### Usage Example

```java
// Method 1: Load the file with Leptonica (recommended)
PIX image = pixRead("/path/to/image.png");
api.SetImage(image);

// Method 2: Pass raw, tightly packed RGB bytes
// (loadImageBytes(), width, and height stand in for your own image source)
byte[] imageData = loadImageBytes();
api.SetImage(imageData, width, height, 3, width * 3);

// Method 3: Recognize only part of the image
api.SetImage(image);
api.SetRectangle(100, 50, 300, 200); // left, top, width, height
```
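`SetRectangle` takes left, top, width, and height in pixels of the image passed to `SetImage`. When those coordinates come from user input or another detector, a defensive sketch like the one below keeps the region inside the image before it is passed on. `RegionClamp` and `clamp` are hypothetical names, and the sketch does not describe how Tesseract itself treats out-of-range rectangles.

```java
/** Hypothetical helper that clips a region of interest to the image bounds. */
final class RegionClamp {
    private RegionClamp() {}

    /**
     * Returns {left, top, width, height} clipped to an imageWidth x imageHeight image,
     * or null if the region does not overlap the image at all.
     */
    static int[] clamp(int left, int top, int width, int height, int imageWidth, int imageHeight) {
        int x0 = Math.max(0, left);
        int y0 = Math.max(0, top);
        // Long arithmetic prevents overflow for very large width/height values.
        int x1 = (int) Math.min((long) imageWidth, (long) left + width);
        int y1 = (int) Math.min((long) imageHeight, (long) top + height);
        if (x1 <= x0 || y1 <= y0) {
            return null; // empty intersection
        }
        return new int[] {x0, y0, x1 - x0, y1 - y0};
    }
}
```

A caller would skip `SetRectangle` when the result is `null` and otherwise pass the four values through unchanged.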

### Text Recognition

Run recognition and extract the results in several text formats.

```java { .api }
public class TessBaseAPI {
    // Full recognition pass (returns 0 on success)
    public int Recognize(ETEXT_DESC monitor);

    // One-call OCR of a rectangle within a raw image buffer
    public BytePointer TesseractRect(byte[] imagedata, int bytes_per_pixel,
                                     int bytes_per_line, int left, int top,
                                     int width, int height);

    // Text extraction methods (each returns a C string)
    public BytePointer GetUTF8Text();
    public BytePointer GetHOCRText(int page_number);
    public BytePointer GetAltoText(int page_number);
    public BytePointer GetTSVText(int page_number);
    public BytePointer GetBoxText(int page_number);
    public BytePointer GetUNLVText();
}
```
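
**Output Formats:**
- **UTF8**: Plain text with line breaks
- **hOCR**: HTML with word bounding boxes and confidences
- **ALTO**: XML following the ALTO document-layout standard
- **TSV**: Tab-separated rows with layout level, bounding box, confidence, and text
- **Box**: One line per character with its bounding box, in the box-file format used for training
- **UNLV**: Latin-1 text with reject and suspect codes, in the UNLV format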
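TSV output is convenient to post-process. Its columns are `level`, `page_num`, `block_num`, `par_num`, `line_num`, `word_num`, `left`, `top`, `width`, `height`, `conf`, `text`, and word rows have `level` 5. The sketch below assumes that layout and parses a string such as the one returned by `GetTSVText(0)`. Depending on how the TSV was produced, a header row may or may not be present, so the parser skips anything that is not a word row. `TsvWords` and `parse` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical parser for Tesseract's 12-column TSV output. */
final class TsvWords {
    /** One recognized word with its bounding box and confidence. */
    static final class Word {
        final String text;
        final int left, top, width, height;
        final double conf;

        Word(String text, int left, int top, int width, int height, double conf) {
            this.text = text;
            this.left = left;
            this.top = top;
            this.width = width;
            this.height = height;
            this.conf = conf;
        }
    }

    private TsvWords() {}

    /** Returns the word-level rows (level 5) of a TSV string, in order. */
    static List<Word> parse(String tsv) {
        List<Word> words = new ArrayList<>();
        for (String rawLine : tsv.split("\n")) {
            String line = rawLine.endsWith("\r") ? rawLine.substring(0, rawLine.length() - 1) : rawLine;
            // The -1 limit keeps trailing empty fields, so rows with empty text still have 12 columns.
            String[] f = line.split("\t", -1);
            if (f.length < 12 || !f[0].equals("5")) {
                continue; // skip a header row, blank lines, and page/block/paragraph/line rows
            }
            words.add(new Word(
                    f[11],
                    Integer.parseInt(f[6]), Integer.parseInt(f[7]),
                    Integer.parseInt(f[8]), Integer.parseInt(f[9]),
                    // parsed as a double so both integer and decimal confidences are accepted
                    Double.parseDouble(f[10])));
        }
        return words;
    }
}
```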

#### Usage Example

```java
// Basic text extraction: SetImage clears earlier results, and
// GetUTF8Text runs recognition automatically if it has not run yet
api.SetImage(image);
BytePointer textPtr = api.GetUTF8Text();
System.out.println("Extracted text: " + textPtr.getString());
textPtr.deallocate();

// Recognition with a progress monitor and deadline
ETEXT_DESC monitor = new ETEXT_DESC();
monitor.set_deadline_msecs(10000); // 10-second timeout

int result = api.Recognize(monitor);
if (result == 0) {
    BytePointer utf8 = api.GetUTF8Text();
    BytePointer hocr = api.GetHOCRText(0);
    String text = utf8.getString();
    String hocrText = hocr.getString();
    utf8.deallocate();
    hocr.deallocate();
}

// One-call OCR for a rectangular region of a raw RGB buffer
BytePointer rectPtr = api.TesseractRect(imageBytes, 3, width * 3,
                                        100, 50, 300, 200);
String rectText = rectPtr.getString();
rectPtr.deallocate();
```

### Confidence and Quality Metrics

Access recognition confidence scores.

```java { .api }
public class TessBaseAPI {
    // Mean confidence over all words
    public int MeanTextConf();

    // Per-word confidences; the native array ends with -1
    public IntPointer AllWordConfidences();
}
```

**Confidence Values:**
- **Range**: 0-100 (higher values indicate higher confidence)
- **Interpretation** (rule-of-thumb bands, not thresholds defined by Tesseract):
  - 90-100: Excellent recognition
  - 70-89: Good recognition
  - 50-69: Fair recognition
  - 0-49: Poor recognition
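Because the bands above are a convention rather than part of the API, they are easy to encode in a small helper that flags words for review. This sketch assumes the per-word values have already been copied, up to the native array's `-1` terminator, into an `int[]`. `ConfidenceBands`, `band`, and `lowConfidenceIndexes` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helpers that apply the rule-of-thumb confidence bands. */
final class ConfidenceBands {
    private ConfidenceBands() {}

    /** Maps a 0-100 confidence value to its band name. */
    static String band(int conf) {
        if (conf >= 90) return "Excellent";
        if (conf >= 70) return "Good";
        if (conf >= 50) return "Fair";
        return "Poor";
    }

    /** Returns the indexes of words whose confidence is below the threshold. */
    static List<Integer> lowConfidenceIndexes(int[] confs, int threshold) {
        List<Integer> low = new ArrayList<>();
        for (int i = 0; i < confs.length; i++) {
            if (confs[i] < threshold) {
                low.add(i);
            }
        }
        return low;
    }
}
```

For example, `lowConfidenceIndexes(confs, 70)` lists the words that fall in the Fair or Poor bands.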

#### Usage Example

```java
api.SetImage(image);
BytePointer textPtr = api.GetUTF8Text();
String text = textPtr.getString();
textPtr.deallocate();

// Check overall confidence
int meanConf = api.MeanTextConf();
System.out.println("Average confidence: " + meanConf + "%");

// Per-word confidence scores: read until the -1 terminator, then free the array
IntPointer wordConfidences = api.AllWordConfidences();
if (wordConfidences != null) {
    for (int i = 0; wordConfidences.get(i) != -1; i++) {
        System.out.println("Word " + i + " confidence: " + wordConfidences.get(i) + "%");
    }
    wordConfidences.deallocate();
}
```

### Image Processing

Access the thresholded (binarized) image used for recognition, and the active data path.

```java { .api }
public class TessBaseAPI {
    // Get a copy of the thresholded binary image (caller must pixDestroy it)
    public PIX GetThresholdedImage();

    // Datapath information
    public BytePointer GetDatapath();
}
```

#### Usage Example

```java
api.SetImage(originalImage);

// Get the binary/thresholded image used for OCR
PIX thresholded = api.GetThresholdedImage();
pixWrite("/tmp/thresholded.png", thresholded, IFF_PNG);

// The caller owns the returned copy, so destroy it
pixDestroy(thresholded);
```

### Batch Processing

Recognize a single image, a multi-page TIFF, or a text file listing image paths, and send the results to one or more renderers.

```java { .api }
public class TessBaseAPI {
    // Process every page in a file through a renderer pipeline
    public boolean ProcessPages(String filename, String retry_config,
                                int timeout_millisec, TessResultRenderer renderer);

    // Process a single page with a renderer
    public boolean ProcessPage(PIX pix, int page_index, String filename,
                               String retry_config, int timeout_millisec,
                               TessResultRenderer renderer);

    // Clear previous recognition results
    public void Clear();
}
```

#### Usage Example

```java
// api must already be initialized with Init()

// Set up a renderer chain for multiple output formats
TessResultRenderer textRenderer = TessTextRendererCreate("output");
TessResultRenderer pdfRenderer = TessPDFRendererCreate("output", "/usr/share/tessdata", false);
textRenderer.insert(pdfRenderer);

// Process a multi-page TIFF (Tesseract does not accept PDF files as input)
boolean success = api.ProcessPages("document.tif", null, 60000, textRenderer);

if (success) {
    System.out.println("Document processed successfully");
    // Output files: output.txt, output.pdf
}

// Deleting the head renderer also deletes the renderers inserted after it
TessDeleteResultRenderer(textRenderer);
```
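Besides single images and multi-page TIFFs, `ProcessPages` accepts a plain text file that lists one image path per line. This is a simple way to send many separate files through one renderer pipeline. The sketch below writes such a list. `PageList` and `writeListFile` are hypothetical names, and the paths in the usage line are placeholders.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/** Hypothetical helper that prepares a list file for TessBaseAPI.ProcessPages. */
final class PageList {
    private PageList() {}

    /** Writes one absolute image path per line and returns the list file's path. */
    static Path writeListFile(Path listFile, List<Path> images) {
        StringBuilder sb = new StringBuilder();
        for (Path image : images) {
            sb.append(image.toAbsolutePath()).append('\n');
        }
        try {
            Files.write(listFile, sb.toString().getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return listFile;
    }
}
```

Usage: `api.ProcessPages(PageList.writeListFile(Paths.get("pages.txt"), images).toString(), null, 60000, textRenderer)`. Absolute paths avoid surprises if the process's working directory differs from where the list was built.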
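
## Error Handling

### Common Error Conditions

- **Initialization failure**: `Init` returns `-1`, typically because the tessdata path is wrong or the requested `.traineddata` files are missing
- **Image loading**: `pixRead` returns `null` for unsupported formats or corrupted image data
- **Memory issues**: Very large images can exhaust native memory, which is allocated outside the Java heap
- **Timeout**: Recognition stops early when it runs past the deadline set with `set_deadline_msecs`. Check `deadline_exceeded()` on the monitor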
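Image-loading failures are easier to report when the file type is known before `pixRead` is called. This minimal sketch detects the format from the standard file signatures of PNG, JPEG, TIFF, BMP, and GIF. `ImageSniffer` and `detect` are hypothetical names, and passing this check does not guarantee that Leptonica can decode the file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical pre-check that identifies an image format from its first bytes. */
final class ImageSniffer {
    private ImageSniffer() {}

    /** Returns "png", "jpeg", "tiff", "gif", "bmp", or null if the signature is not recognized. */
    static String detect(byte[] h) {
        if (startsWith(h, 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A)) return "png";
        if (startsWith(h, 0xFF, 0xD8, 0xFF)) return "jpeg";
        if (startsWith(h, 'I', 'I', 0x2A, 0x00) || startsWith(h, 'M', 'M', 0x00, 0x2A)) return "tiff";
        if (startsWith(h, 'G', 'I', 'F', '8')) return "gif";
        if (startsWith(h, 'B', 'M')) return "bmp";
        return null;
    }

    /** Reads up to 8 header bytes from a file and detects its format. */
    static String detect(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return detect(in.readNBytes(8));
        }
    }

    // Compares the leading bytes (as unsigned values) against a signature.
    private static boolean startsWith(byte[] data, int... sig) {
        if (data.length < sig.length) return false;
        for (int i = 0; i < sig.length; i++) {
            if ((data[i] & 0xFF) != sig[i]) return false;
        }
        return true;
    }
}
```

A `null` result lets the caller report "unsupported format" instead of a generic load failure.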

### Best Practices

```java
import org.bytedeco.javacpp.BytePointer;
import org.bytedeco.leptonica.PIX;
import org.bytedeco.tesseract.TessBaseAPI;

import static org.bytedeco.leptonica.global.leptonica.pixDestroy;
import static org.bytedeco.leptonica.global.leptonica.pixRead;

public class RobustOCR {
    public static String extractText(String imagePath) {
        TessBaseAPI api = new TessBaseAPI();
        PIX image = null;
        BytePointer text = null;

        try {
            // Initialize with error checking
            if (api.Init(null, "eng") != 0) {
                throw new RuntimeException("Tesseract initialization failed");
            }

            // Load image with validation
            image = pixRead(imagePath);
            if (image == null) {
                throw new RuntimeException("Failed to load image: " + imagePath);
            }

            // Set image and extract text (recognition runs on demand)
            api.SetImage(image);
            text = api.GetUTF8Text();
            return text == null ? null : text.getString();
        } finally {
            // Always release native resources, even when an exception is thrown
            if (text != null) {
                text.deallocate();
            }
            if (image != null) {
                pixDestroy(image);
            }
            api.End();
        }
    }
}
```

## Types

### Progress Monitoring

```java { .api }
public class ETEXT_DESC {
    public short progress();      // Percent complete, 0-100
    public byte more_to_come();   // Nonzero if more work is pending
    public byte ocr_alive();      // Set by the engine while it is active
    public byte err_code();       // Error code if recognition failed
    public void set_deadline_msecs(int deadline_msecs);
    public boolean deadline_exceeded();
}
```

### Version Information

```java { .api }
// Tesseract version constants
public static final int TESSERACT_MAJOR_VERSION = 5;
public static final int TESSERACT_MINOR_VERSION = 5;
public static final int TESSERACT_MICRO_VERSION = 1;
public static final String TESSERACT_VERSION_STR = "5.5.1";
```
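The constants are fixed when the bindings are built, while `TessBaseAPI.Version()` is reported by the native library at runtime. This is a sketch of a minimum-version check on that runtime string. It assumes the string starts with dotted numbers such as `5.5.1` and ignores any suffix a development build might append. `TessVersion`, `parse`, and `atLeast` are hypothetical names.

```java
/** Hypothetical helper for comparing Tesseract version strings such as "5.5.1". */
final class TessVersion {
    private TessVersion() {}

    /** Parses up to three leading numeric components; missing or non-numeric parts count as 0. */
    static int[] parse(String version) {
        int[] parts = new int[3];
        String[] tokens = version.trim().split("\\.", 3);
        for (int i = 0; i < tokens.length; i++) {
            // Keep only the leading digits, so a token like "0-dev" still yields 0.
            String digits = tokens[i].replaceFirst("^(\\d*).*$", "$1");
            parts[i] = digits.isEmpty() ? 0 : Integer.parseInt(digits);
        }
        return parts;
    }

    /** True if version is greater than or equal to major.minor.micro. */
    static boolean atLeast(String version, int major, int minor, int micro) {
        int[] v = parse(version);
        int[] min = {major, minor, micro};
        for (int i = 0; i < 3; i++) {
            if (v[i] != min[i]) {
                return v[i] > min[i];
            }
        }
        return true; // all components equal
    }
}
```

Pass the string form of `TessBaseAPI.Version()` to `atLeast`, for example to require at least 5.0.0 before relying on behavior documented for Tesseract 5.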