# Tesseract Platform

JavaCPP platform aggregator for the Tesseract OCR native libraries. It brings Tesseract's optical character recognition engine to Java applications. The package pulls in prebuilt native libraries for Tesseract 5.5.1 and its Leptonica dependency on Linux, macOS, Windows, and Android, so you can extract text from images without a separate Tesseract installation. Language data (`.traineddata` files) is not included and must be provided separately.

## Package Information

- **Package Name**: tesseract-platform
- **Package Type**: Maven
- **Group ID**: org.bytedeco
- **Language**: Java
- **Installation**: `org.bytedeco:tesseract-platform:5.5.1-1.5.12`

## Core Imports

```java
import org.bytedeco.javacpp.*;
import org.bytedeco.tesseract.*;
import org.bytedeco.leptonica.*;
import static org.bytedeco.tesseract.global.tesseract.*;
import static org.bytedeco.leptonica.global.leptonica.*;
```
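
You can check that the platform artifact resolved native binaries for the current machine by printing JavaCPP's detected platform and Tesseract's version string. This is a minimal sketch that relies only on `Loader.getPlatform()` and the static `TessBaseAPI.Version()`. The class name `NativeCheck` is illustrative. If no matching native library is available, JavaCPP throws an `UnsatisfiedLinkError` when the class first loads.

```java
import org.bytedeco.javacpp.Loader;
import org.bytedeco.tesseract.TessBaseAPI;

public class NativeCheck {
    // Platform string JavaCPP uses to pick native binaries, e.g. "linux-x86_64"
    static String platform() {
        return Loader.getPlatform();
    }

    // Calling the static Version() forces the Tesseract native library to load
    static String tesseractVersion() {
        return TessBaseAPI.Version().getString();
    }

    public static void main(String[] args) {
        System.out.println("JavaCPP platform: " + platform());
        System.out.println("Tesseract version: " + tesseractVersion());
    }
}
```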

## Basic Usage

The program below initializes the engine for English, runs OCR on an image file, and releases native resources. It assumes `eng.traineddata` is available to Tesseract. With a `null` datapath, Tesseract uses the `TESSDATA_PREFIX` environment variable or its compiled-in default.

```java
import org.bytedeco.javacpp.*;
import org.bytedeco.leptonica.*;
import org.bytedeco.tesseract.*;
import static org.bytedeco.leptonica.global.leptonica.*;
import static org.bytedeco.tesseract.global.tesseract.*;

public class BasicOCR {
    public static void main(String[] args) {
        TessBaseAPI api = new TessBaseAPI();

        // Initialize tesseract with English; Init returns 0 on success
        if (api.Init(null, "eng") != 0) {
            System.err.println("Could not initialize tesseract.");
            return;
        }

        // Load image using Leptonica (pixRead returns null if the file cannot be read)
        PIX image = pixRead("image.png");
        if (image == null) {
            System.err.println("Could not read image.png");
            api.End();
            return;
        }
        api.SetImage(image);

        // Extract text
        BytePointer outText = api.GetUTF8Text();
        System.out.println("OCR Result: " + outText.getString());

        // Cleanup: shut down the engine, free the text, destroy the Leptonica image
        api.End();
        outText.deallocate();
        pixDestroy(image);
    }
}
```
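
Images already in memory do not have to go through Leptonica. `SetImage(byte[], width, height, bytes_per_pixel, bytes_per_line)` accepts raw pixel data. The sketch below assumes a `java.awt.image.BufferedImage` source. It converts the image to tightly packed 8-bit grayscale (1 byte per pixel, no row padding) before passing it to an already initialized `TessBaseAPI`. `RawImageOCR` and `toGray8` are illustrative names, and the luma weights are one common choice rather than anything Tesseract requires.

```java
import java.awt.image.BufferedImage;
import org.bytedeco.javacpp.BytePointer;
import org.bytedeco.tesseract.TessBaseAPI;

public class RawImageOCR {
    // Converts any BufferedImage to row-major 8-bit grayscale with no padding
    static byte[] toGray8(BufferedImage img) {
        int w = img.getWidth(), h = img.getHeight();
        byte[] gray = new byte[w * h];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int rgb = img.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF, g = (rgb >> 8) & 0xFF, b = rgb & 0xFF;
                // Integer luma approximation using ITU-R BT.601 weights
                gray[y * w + x] = (byte) ((299 * r + 587 * g + 114 * b) / 1000);
            }
        }
        return gray;
    }

    // Runs OCR on the grayscale buffer using an initialized api
    static String recognize(TessBaseAPI api, BufferedImage img) {
        byte[] gray = toGray8(img);
        int w = img.getWidth();
        // bytes_per_pixel = 1 (grayscale), bytes_per_line = width (no padding)
        api.SetImage(gray, w, img.getHeight(), 1, w);
        BytePointer text = api.GetUTF8Text();
        try {
            return text == null ? "" : text.getString();
        } finally {
            if (text != null) text.deallocate();
        }
    }
}
```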

## Architecture

The Tesseract platform exposes the Tesseract C++ API to Java through JavaCPP bindings:

- **TessBaseAPI**: Main engine interface for initialization, configuration, and text extraction
- **Iterator Hierarchy**: Structured navigation through recognition results (`PageIterator` → `LTRResultIterator` → `ResultIterator`, each class extending the previous one)
- **Renderer Pipeline**: Output generators for plain text, hOCR (HTML), ALTO and PAGE (XML), TSV, searchable PDF, and training data
- **Leptonica Integration**: Images are loaded, processed, and handed to Tesseract as Leptonica `PIX` objects
- **Cross-Platform**: The platform artifact depends on native libraries for each supported OS and architecture, and JavaCPP loads the matching one at runtime

## Capabilities

### Core OCR Engine

Primary OCR functionality: initialization, image input, text recognition, and result extraction. `TessBaseAPI` is the main entry point for all OCR operations.

```java { .api }
public class TessBaseAPI {
    // Initialization (Init returns 0 on success, -1 on failure)
    public TessBaseAPI();
    public static native @Cast("const char*") BytePointer Version();
    public int Init(String datapath, String language, int oem);
    public int Init(String datapath, String language);
    public void End();

    // Image Processing
    public void SetImage(PIX pix);
    public void SetImage(byte[] imagedata, int width, int height, int bytes_per_pixel, int bytes_per_line);
    public void SetRectangle(int left, int top, int width, int height);
    public PIX GetThresholdedImage();

    // Recognition (Recognize returns 0 on success; monitor may be null)
    public int Recognize(ETEXT_DESC monitor);
    public native @Cast("char*") BytePointer TesseractRect(@Cast("const unsigned char*") byte[] imagedata, int bytes_per_pixel, int bytes_per_line,
                                                           int left, int top, int width, int height);

    // Text Output
    public native @Cast("char*") BytePointer GetUTF8Text();
    public native @Cast("char*") BytePointer GetHOCRText(int page_number);
    public native @Cast("char*") BytePointer GetTSVText(int page_number);
    public int MeanTextConf();
    public IntPointer AllWordConfidences();   // -1-terminated array, owned by the caller
}
```
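
The sketch below combines several of these calls. It restricts recognition to a rectangle with `SetRectangle` and runs `Recognize` explicitly with no progress monitor. It then reads per-word confidences from `AllWordConfidences()`, which in the C++ API returns a `-1`-terminated `int` array. It assumes `api` is already initialized and has an image set. `RegionOCR` and `readConfidences` are illustrative names. Freeing the returned array (the C++ caller owns it) is omitted for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import org.bytedeco.javacpp.IntPointer;
import org.bytedeco.tesseract.ETEXT_DESC;
import org.bytedeco.tesseract.TessBaseAPI;

public class RegionOCR {
    // Copies a -1-terminated native int array (as returned by AllWordConfidences) into a list
    static List<Integer> readConfidences(IntPointer confs) {
        List<Integer> out = new ArrayList<>();
        if (confs == null) return out;
        for (long i = 0; confs.get(i) != -1; i++) {
            out.add(confs.get(i));
        }
        return out;
    }

    // Recognizes only the given rectangle of the current image and prints confidences
    static void recognizeRegion(TessBaseAPI api, int left, int top, int width, int height) {
        api.SetRectangle(left, top, width, height);
        if (api.Recognize((ETEXT_DESC) null) != 0) {
            System.err.println("Recognition failed");
            return;
        }
        System.out.println("Mean confidence: " + api.MeanTextConf());
        System.out.println("Word confidences: " + readConfidences(api.AllWordConfidences()));
    }
}
```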

[Core OCR Engine](./core-ocr-engine.md)

### Result Navigation

Hierarchical iterators for navigating recognition results from page level down to individual characters. After recognition, `TessBaseAPI.GetIterator()` returns a `ResultIterator` that provides bounding boxes, confidence scores, text formatting, and layout information at each level.

```java { .api }
public class PageIterator {
    public void Begin();
    public boolean Next(int level);
    public boolean BoundingBox(int level, int[] left, int[] top, int[] right, int[] bottom);
    public boolean Baseline(int level, int[] x1, int[] y1, int[] x2, int[] y2);
    public PIX GetBinaryImage(int level);
    public int BlockType();
    public void Orientation(int[] orientation, int[] writing_direction,
                            int[] textline_order, float[] deskew_angle);
}

public class ResultIterator extends LTRResultIterator {
    public BytePointer GetUTF8Text(int level);   // newly allocated UTF-8 text, owned by the caller
    public float Confidence(int level);
    public boolean ParagraphIsLtr();
    // Declared in LTRResultIterator; returns the font name, or null if unavailable
    public BytePointer WordFontAttributes(boolean[] is_bold, boolean[] is_italic,
                                          boolean[] is_underlined, boolean[] is_monospace,
                                          boolean[] is_serif, boolean[] is_smallcaps,
                                          int[] pointsize, int[] font_id);
}
```
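
A typical traversal calls `Recognize` first, then walks the `ResultIterator` at a single level (here `RIL_WORD`). For each element it reads the text, confidence, and bounding box. This sketch assumes an initialized `api` with an image set. The `WordDump` class and its `format` helper are illustrative, and text pointers are released the same way as in the basic example.

```java
import java.util.Locale;
import org.bytedeco.javacpp.BytePointer;
import org.bytedeco.tesseract.ETEXT_DESC;
import org.bytedeco.tesseract.ResultIterator;
import org.bytedeco.tesseract.TessBaseAPI;
import static org.bytedeco.tesseract.global.tesseract.RIL_WORD;

public class WordDump {
    // Formats one result as "text<TAB>confidence<TAB>left,top,right,bottom"
    static String format(String text, float conf, int left, int top, int right, int bottom) {
        return String.format(Locale.ROOT, "%s\t%.1f\t%d,%d,%d,%d", text, conf, left, top, right, bottom);
    }

    static void dumpWords(TessBaseAPI api) {
        if (api.Recognize((ETEXT_DESC) null) != 0) return;
        ResultIterator ri = api.GetIterator();
        if (ri == null) return; // no results, e.g. a blank image
        int[] l = new int[1], t = new int[1], r = new int[1], b = new int[1];
        do {
            BytePointer word = ri.GetUTF8Text(RIL_WORD);
            if (word == null) continue; // in a do-while, continue still evaluates Next()
            float conf = ri.Confidence(RIL_WORD);
            ri.BoundingBox(RIL_WORD, l, t, r, b);
            System.out.println(format(word.getString(), conf, l[0], t[0], r[0], b[0]));
            word.deallocate();
        } while (ri.Next(RIL_WORD));
    }
}
```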

[Result Navigation](./result-navigation.md)

### Output Renderers

Configurable pipeline for generating output in multiple formats: plain text, structured markup (hOCR, ALTO, PAGE), TSV, searchable PDF, and training data formats. Renderers can be chained with `insert()`, so a single recognition pass can produce several formats.
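
```java { .api }
public abstract class TessResultRenderer {
    public void insert(TessResultRenderer next);   // adds 'next' to the chain, which takes ownership of it
    public boolean BeginDocument(String title);
    public boolean AddImage(TessBaseAPI api);
    public boolean EndDocument();
    public BytePointer file_extension();
}

// Concrete renderer classes (each writes <outputbase>.<extension>)
public class TessTextRenderer extends TessResultRenderer { public TessTextRenderer(String outputbase); }
public class TessHOcrRenderer extends TessResultRenderer { public TessHOcrRenderer(String outputbase); }
public class TessPDFRenderer extends TessResultRenderer { public TessPDFRenderer(String outputbase, String datadir, boolean textonly); } // datadir must contain pdf.ttf
public class TessAltoRenderer extends TessResultRenderer { public TessAltoRenderer(String outputbase); }
public class TessTsvRenderer extends TessResultRenderer { public TessTsvRenderer(String outputbase); }
```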
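
Renderers are usually driven by `TessBaseAPI.ProcessPages(filename, retry_config, timeout_millisec, renderer)`. This call OCRs every page of an image file, including multi-page TIFF, and feeds each page to the renderer chain. The sketch below chains text, hOCR, and TSV renderers behind a single head renderer. It assumes Tesseract's C++ ownership rule that a renderer deletes the renderers inserted into it. The inserted Java objects are therefore detached from JavaCPP's deallocation with `deallocate(false)` to avoid a double free. `RendererChain` and `buildChain` are illustrative names.

```java
import org.bytedeco.tesseract.TessBaseAPI;
import org.bytedeco.tesseract.TessHOcrRenderer;
import org.bytedeco.tesseract.TessResultRenderer;
import org.bytedeco.tesseract.TessTextRenderer;
import org.bytedeco.tesseract.TessTsvRenderer;

public class RendererChain {
    // Builds a chain of text, hOCR, and TSV renderers; each writes <outputbase>.<extension>
    static TessResultRenderer buildChain(String outputbase) {
        TessResultRenderer head = new TessTextRenderer(outputbase);
        TessResultRenderer[] rest = {
            new TessHOcrRenderer(outputbase),
            new TessTsvRenderer(outputbase)
        };
        for (TessResultRenderer next : rest) {
            head.insert(next);
            // The head now owns 'next' on the C++ side; stop JavaCPP from freeing it as well
            next.deallocate(false);
        }
        return head;
    }

    // OCRs all pages of 'input' with an initialized api; returns false on failure
    static boolean run(TessBaseAPI api, String input, String outputbase) {
        TessResultRenderer chain = buildChain(outputbase);
        try {
            return api.ProcessPages(input, (String) null, 0, chain);
        } finally {
            // Deletes the head, which closes its file and deletes the inserted renderers
            chain.deallocate();
        }
    }
}
```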

[Output Renderers](./output-renderers.md)

### Layout Analysis

Page structure analysis covers detection of text blocks, lines, and words, reading order, and geometric layout information. Each block is classified (flowing text, heading, table, image, and so on), which helps with documents that mix columns, tables, and figures. `AnalyseLayout()` performs this analysis without running character recognition.

```java { .api }
public class TessBaseAPI {
    public PageIterator AnalyseLayout();   // layout only, no recognition; caller owns the iterator
    // Pixa** and int** out-parameters map to PointerPointer; pass null when not needed
    public BOXA GetRegions(PointerPointer pixa);
    public BOXA GetTextlines(PointerPointer pixa, PointerPointer blockids);
    public BOXA GetWords(PointerPointer pixa);
    public BOXA GetComponentImages(int level, boolean text_only, PointerPointer pixa, PointerPointer blockids);
}

// Page segmentation modes (constants in org.bytedeco.tesseract.global.tesseract)
public static final int PSM_AUTO = 3;           // Fully automatic page segmentation, no OSD
public static final int PSM_SINGLE_COLUMN = 4;  // Single column of text
public static final int PSM_SINGLE_BLOCK = 6;   // Single uniform block of text (TessBaseAPI default)
public static final int PSM_SINGLE_LINE = 7;    // Single text line
```
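
`AnalyseLayout()` runs page segmentation only, without character recognition, and returns a `PageIterator`. The sketch below switches to `PSM_AUTO` so that more than one block can be found; the API default, `PSM_SINGLE_BLOCK`, treats the page as one block. It then walks the iterator at `RIL_BLOCK` level and prints each block's type and bounding box. It assumes an initialized `api` with an image set. `LayoutDump` and `blockTypeName` are illustrative names, and the name mapping covers only a few `PT_*` values.

```java
import org.bytedeco.tesseract.PageIterator;
import org.bytedeco.tesseract.TessBaseAPI;
import static org.bytedeco.tesseract.global.tesseract.*;

public class LayoutDump {
    // Maps a few PolyBlockType values to readable names
    static String blockTypeName(int type) {
        switch (type) {
            case PT_FLOWING_TEXT:  return "flowing text";
            case PT_HEADING_TEXT:  return "heading";
            case PT_TABLE:         return "table";
            case PT_FLOWING_IMAGE: return "image";
            case PT_NOISE:         return "noise";
            default:               return "other (" + type + ")";
        }
    }

    static void dumpBlocks(TessBaseAPI api) {
        api.SetPageSegMode(PSM_AUTO);
        PageIterator it = api.AnalyseLayout();
        if (it == null) return; // nothing found
        int[] l = new int[1], t = new int[1], r = new int[1], b = new int[1];
        it.Begin();
        do {
            if (it.BoundingBox(RIL_BLOCK, l, t, r, b)) {
                System.out.printf("%s at (%d,%d)-(%d,%d)%n",
                        blockTypeName(it.BlockType()), l[0], t[0], r[0], b[0]);
            }
        } while (it.Next(RIL_BLOCK));
    }
}
```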

[Layout Analysis](./layout-analysis.md)

### Configuration and Parameters

Tesseract exposes hundreds of named parameters controlling OCR behavior, page segmentation, character recognition, and output formatting. Most can be changed with `SetVariable()` after `Init()`. Init-only parameters must be passed at initialization time instead.

```java { .api }
public class TessBaseAPI {
    // Parameter Management (SetVariable accepts only non-init parameters)
    public boolean SetVariable(String name, String value);
    public boolean GetIntVariable(String name, int[] value);
    public boolean GetBoolVariable(String name, boolean[] value);
    public boolean GetDoubleVariable(String name, double[] value);
    public BytePointer GetStringVariable(String name);

    // Page Segmentation
    public void SetPageSegMode(int mode);
    public int GetPageSegMode();
}

// OCR Engine Mode (constants in org.bytedeco.tesseract.global.tesseract)
public static final int OEM_TESSERACT_ONLY = 0;           // Legacy engine only (needs legacy-capable traineddata)
public static final int OEM_LSTM_ONLY = 1;                // LSTM neural-network engine only
public static final int OEM_TESSERACT_LSTM_COMBINED = 2;  // Legacy and LSTM engines combined
public static final int OEM_DEFAULT = 3;                  // Chosen from config and available models
```
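
A common configuration for short numeric fields treats the image as a single text line and restricts the character set. The sketch assumes the parameter names `tessedit_char_whitelist` and `user_defined_dpi`, both standard Tesseract parameters, and reads the values back with the typed getters. `DigitsConfig` is an illustrative name. `configure` must run before `printSettings`, because it is what sets the values being read.

```java
import org.bytedeco.javacpp.BytePointer;
import org.bytedeco.tesseract.TessBaseAPI;
import static org.bytedeco.tesseract.global.tesseract.PSM_SINGLE_LINE;

public class DigitsConfig {
    // Configures the engine for a single line of digits; false if any variable was rejected
    static boolean configure(TessBaseAPI api) {
        api.SetPageSegMode(PSM_SINGLE_LINE);
        boolean ok = api.SetVariable("tessedit_char_whitelist", "0123456789");
        // Declaring a resolution avoids guesses for images without DPI metadata
        ok &= api.SetVariable("user_defined_dpi", "300");
        return ok;
    }

    // Reads the settings back through the typed getters
    static void printSettings(TessBaseAPI api) {
        int[] dpi = new int[1];
        if (api.GetIntVariable("user_defined_dpi", dpi)) {
            System.out.println("user_defined_dpi = " + dpi[0]);
        }
        BytePointer wl = api.GetStringVariable("tessedit_char_whitelist");
        System.out.println("whitelist = " + (wl == null ? "(unset)" : wl.getString()));
        System.out.println("page seg mode = " + api.GetPageSegMode());
    }
}
```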
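
[Configuration](./configuration.md)

### Language Support

Multi-language OCR: trained models exist for more than 100 languages and scripts, several languages can be combined in a single `Init()` call, and custom-trained models load the same way. Automatic detection is limited to page orientation and script (OSD); the recognition language itself is not detected.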

```java { .api }
public class TessBaseAPI {
    public BytePointer GetInitLanguagesAsString();
    public void GetLoadedLanguagesAsVector(StringVector langs);
    public void GetAvailableLanguagesAsVector(StringVector langs);
}

// Language initialization examples:
// "eng"         - English
// "fra"         - French
// "deu"         - German
// "chi_sim"     - Simplified Chinese
// "ara"         - Arabic
// "eng+fra+deu" - Multiple languages
```
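
To combine several languages, join their codes with `+`. Each listed `.traineddata` file must be available in the data directory. The sketch below assumes `eng` and `fra` models are installed and prints the language string the engine was initialized with. `MultiLang` and `langSpec` are illustrative names.

```java
import org.bytedeco.tesseract.TessBaseAPI;

public class MultiLang {
    // Joins language codes into Tesseract's "lang1+lang2" form
    static String langSpec(String... codes) {
        if (codes.length == 0) throw new IllegalArgumentException("at least one language code required");
        return String.join("+", codes);
    }

    public static void main(String[] args) {
        TessBaseAPI api = new TessBaseAPI();
        // null datapath: Tesseract uses TESSDATA_PREFIX or its compiled-in default
        if (api.Init(null, langSpec("eng", "fra")) != 0) {
            System.err.println("Could not initialize tesseract with eng+fra");
            return;
        }
        System.out.println("Initialized languages: " + api.GetInitLanguagesAsString().getString());
        api.End();
    }
}
```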

[Language Support](./language-support.md)

## Types

### Core Data Structures

```java { .api }
// Progress monitoring and cancellation
public class ETEXT_DESC {
    public short progress();         // Progress percentage (0-100)
    public byte more_to_come();      // Nonzero if more processing is pending
    public byte ocr_alive();         // Set by the OCR engine while it is active
    public byte err_code();          // Error code
    public void set_deadline_msecs(int deadline_msecs);
    public boolean deadline_exceeded();
}

// Unicode character handling
public class UNICHAR {
    public UNICHAR(String utf8_str, int len);
    public UNICHAR(int unicode);
    public int first_uni();          // First character as a UCS-4 code point
    public int utf8_len();           // Length in UTF-8 bytes
    public BytePointer utf8_str();   // Newly allocated, null-terminated UTF-8 copy (caller owns it)
    public static int[] UTF8ToUTF32(String utf8_str);
    public static String UTF32ToUTF8(int[] str32);
}

// Collection types
public class StringVector {
    public StringVector();
    public long size();
    public String get(long i);
    public StringVector put(long i, String value);
    public StringVector push_back(String value);
    public void clear();
}
```
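
`ETEXT_DESC` works both as a progress monitor and as a timeout. When you pass it to `Recognize()`, the engine updates `progress()` and stops early once the deadline has passed, which you can check afterwards with `deadline_exceeded()`. This is a minimal sketch assuming an initialized `api` with an image set. `TimedOCR` is an illustrative name, and the 5-second budget is arbitrary.

```java
import org.bytedeco.tesseract.ETEXT_DESC;
import org.bytedeco.tesseract.TessBaseAPI;

public class TimedOCR {
    // Creates a monitor whose deadline lies 'millis' milliseconds in the future
    static ETEXT_DESC monitorWithDeadline(int millis) {
        ETEXT_DESC monitor = new ETEXT_DESC();
        monitor.set_deadline_msecs(millis);
        return monitor;
    }

    static void recognizeWithTimeout(TessBaseAPI api) {
        ETEXT_DESC monitor = monitorWithDeadline(5000);
        int rc = api.Recognize(monitor);
        if (monitor.deadline_exceeded()) {
            System.err.println("OCR stopped: deadline exceeded at " + monitor.progress() + "%");
        } else if (rc != 0) {
            System.err.println("OCR failed");
        } else {
            System.out.println("Done, progress " + monitor.progress() + "%");
        }
    }
}
```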

### Iterator Level Constants

```java { .api }
// Page hierarchy levels for iteration
public static final int RIL_BLOCK = 0;     // Block level
public static final int RIL_PARA = 1;      // Paragraph level
public static final int RIL_TEXTLINE = 2;  // Text line level
public static final int RIL_WORD = 3;      // Word level
public static final int RIL_SYMBOL = 4;    // Character/symbol level
```

### Block Type Constants

```java { .api }
// Layout block types (PolyBlockType)
public static final int PT_UNKNOWN = 0;          // Unknown block type
public static final int PT_FLOWING_TEXT = 1;     // Flowing text inside a column
public static final int PT_HEADING_TEXT = 2;     // Heading text spanning columns
public static final int PT_PULLOUT_TEXT = 3;     // Pull-out text
public static final int PT_EQUATION = 4;         // Mathematical equation
public static final int PT_INLINE_EQUATION = 5;  // Inline equation
public static final int PT_TABLE = 6;            // Table
public static final int PT_VERTICAL_TEXT = 7;    // Vertical text
public static final int PT_CAPTION_TEXT = 8;     // Caption text
public static final int PT_FLOWING_IMAGE = 9;    // Image inside a column
public static final int PT_HEADING_IMAGE = 10;   // Image spanning columns
public static final int PT_PULLOUT_IMAGE = 11;   // Pull-out image
public static final int PT_HORZ_LINE = 12;       // Horizontal line
public static final int PT_VERT_LINE = 13;       // Vertical line
public static final int PT_NOISE = 14;           // Noise/artifacts outside any column
```