# Tesseract Platform

`tesseract-platform` is the JavaCPP platform aggregator for the Tesseract OCR native libraries, bringing optical character recognition to Java applications. The package bundles cross-platform native libraries for Tesseract 5.5.1, enabling text extraction from images on Linux, macOS, Windows, and Android.

## Package Information

- **Package Name**: tesseract-platform
- **Package Type**: Maven
- **Group ID**: org.bytedeco
- **Language**: Java
- **Installation**: `org.bytedeco:tesseract-platform:5.5.1-1.5.12`

## Core Imports

```java
import org.bytedeco.javacpp.*;
import org.bytedeco.tesseract.*;
import org.bytedeco.leptonica.*;
import static org.bytedeco.tesseract.global.tesseract.*;
import static org.bytedeco.leptonica.global.leptonica.*;
```

## Basic Usage

```java
import org.bytedeco.javacpp.*;
import org.bytedeco.leptonica.*;
import org.bytedeco.tesseract.*;
import static org.bytedeco.leptonica.global.leptonica.*;
import static org.bytedeco.tesseract.global.tesseract.*;

public class BasicOCR {
    public static void main(String[] args) {
        TessBaseAPI api = new TessBaseAPI();

        // Initialize Tesseract with the English language model.
        // Passing null as the datapath means no tessdata path is specified.
        if (api.Init(null, "eng") != 0) {
            System.err.println("Could not initialize tesseract.");
            return;
        }

        // Load the image using Leptonica; pixRead returns null if the file cannot be read
        PIX image = pixRead("image.png");
        if (image == null) {
            System.err.println("Could not read image.png.");
            api.End();
            return;
        }
        api.SetImage(image);

        // Extract the recognized text as UTF-8
        BytePointer outText = api.GetUTF8Text();
        System.out.println("OCR Result: " + outText.getString());

        // Cleanup: release the engine, the native text buffer, and the image
        api.End();
        outText.deallocate();
        image.close();
    }
}
```

## Architecture

The Tesseract platform provides a comprehensive OCR solution built on the JavaCPP framework:

- **TessBaseAPI**: Main OCR engine interface providing initialization, configuration, and text extraction
- **Iterator Hierarchy**: Structured navigation through recognition results (PageIterator → LTRResultIterator → ResultIterator)
- **Renderer Pipeline**: Multiple output format generators for text, HTML, PDF, XML, and training data
- **Native Integration**: Seamless integration with Leptonica image processing library
- **Cross-Platform**: Platform-specific native libraries automatically loaded at runtime

## Capabilities

### Core OCR Engine

Primary OCR functionality including initialization, image processing, text recognition, and result extraction. The TessBaseAPI class serves as the main entry point for all OCR operations.

```java { .api }
public class TessBaseAPI {
    // Initialization
    public TessBaseAPI();
    public static native @Cast("const char*") BytePointer Version();
    public int Init(String datapath, String language, int oem);
    public int Init(String datapath, String language);
    public void End();

    // Image Processing
    public void SetImage(PIX pix);
    public void SetImage(byte[] imagedata, int width, int height, int bytes_per_pixel, int bytes_per_line);
    public void SetRectangle(int left, int top, int width, int height);
    public PIX GetThresholdedImage();

    // Recognition
    public int Recognize(ETEXT_DESC monitor);
    public native @Cast("char*") BytePointer TesseractRect(@Cast("const unsigned char*") byte[] imagedata, int bytes_per_pixel, int bytes_per_line,
            int left, int top, int width, int height);

    // Text Output
    public native @Cast("char*") BytePointer GetUTF8Text();
    public native @Cast("char*") BytePointer GetHOCRText(int page_number);
    public native @Cast("char*") BytePointer GetTSVText(int page_number);
    public int MeanTextConf();
    public int[] AllWordConfidences();
}
```

[Core OCR Engine](./core-ocr-engine.md)

### Result Navigation

Hierarchical iterators for navigating recognition results from page level down to individual characters. Provides access to bounding boxes, confidence scores, text formatting, and layout information.

```java { .api }
public class PageIterator {
    public void Begin();
    public boolean Next(int level);
    public boolean BoundingBox(int level, int[] left, int[] top, int[] right, int[] bottom);
    public boolean Baseline(int level, int[] x1, int[] y1, int[] x2, int[] y2);
    public PIX GetBinaryImage(int level);
    public int BlockType();
    public void Orientation(int[] orientation, int[] writing_direction,
            int[] textline_order, float[] deskew_angle);
}

public class ResultIterator extends LTRResultIterator {
    public String GetUTF8Text(int level);
    public float Confidence(int level);
    public boolean ParagraphIsLtr();
    public String WordFontAttributes(boolean[] is_bold, boolean[] is_italic,
            boolean[] is_underlined, boolean[] is_monospace,
            boolean[] is_serif, boolean[] is_smallcaps,
            int[] pointsize, int[] font_id);
}
```

[Result Navigation](./result-navigation.md)

### Output Renderers

Configurable pipeline for generating output in multiple formats including plain text, structured markup (hOCR, ALTO, PAGE), searchable PDF, and training data formats.

```java { .api }
public abstract class TessResultRenderer {
    public void insert(TessResultRenderer next);
    public boolean BeginDocument(String title);
    public boolean AddImage(TessBaseAPI api);
    public boolean EndDocument();
    public String file_extension();
}

// Concrete renderer classes
public class TessTextRenderer extends TessResultRenderer;
public class TessHOcrRenderer extends TessResultRenderer;
public class TessPDFRenderer extends TessResultRenderer;
public class TessAltoRenderer extends TessResultRenderer;
public class TessTsvRenderer extends TessResultRenderer;
```

[Output Renderers](./output-renderers.md)

### Layout Analysis

Advanced page structure analysis including text block detection, reading order determination, and geometric layout information. Supports complex document layouts with tables, columns, and mixed content.

```java { .api }
public class TessBaseAPI {
    public PageIterator AnalyseLayout();
    public BOXA GetRegions(PIXA[] pixa);
    public BOXA GetTextlines(PIXA[] pixa, int[][] blockids);
    public BOXA GetWords(PIXA[] pixa);
    public BOXA GetComponentImages(int level, boolean text_only, PIXA[] pixa, int[][] blockids);
}

// Layout analysis constants
public static final int PSM_AUTO = 3;          // Fully automatic page segmentation
public static final int PSM_SINGLE_COLUMN = 4; // Single column of text
public static final int PSM_SINGLE_BLOCK = 6;  // Single uniform block of text
public static final int PSM_SINGLE_LINE = 7;   // Single text line
```

[Layout Analysis](./layout-analysis.md)

### Configuration and Parameters

Comprehensive configuration system with hundreds of parameters controlling OCR behavior, page segmentation, character recognition, and output formatting.

```java { .api }
public class TessBaseAPI {
    // Parameter Management
    public boolean SetVariable(String name, String value);
    public boolean GetIntVariable(String name, int[] value);
    public boolean GetBoolVariable(String name, boolean[] value);
    public boolean GetDoubleVariable(String name, double[] value);
    public String GetStringVariable(String name);

    // Page Segmentation
    public void SetPageSegMode(int mode);
    public int GetPageSegMode();

    // OCR Engine Mode
    public static final int OEM_TESSERACT_ONLY = 0;
    public static final int OEM_LSTM_ONLY = 1;
    public static final int OEM_DEFAULT = 3;
}
```

[Configuration](./configuration.md)
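
When several parameters are set at once, it can help to collect the ones that were not accepted. The following is a minimal sketch of that idea, not part of the library: `ParameterBatch`, `VariableSetter`, and `applyAll` are hypothetical names, and the sketch assumes the `boolean` returned by `SetVariable(String, String)` signals whether the parameter was accepted. It uses a stand-in setter so it runs without the native libraries.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ParameterBatch {
    /** Hypothetical callback shaped like SetVariable(String name, String value). */
    @FunctionalInterface
    public interface VariableSetter {
        boolean set(String name, String value);
    }

    /** Applies every entry in insertion order and returns the names that were rejected. */
    public static List<String> applyAll(VariableSetter setter, Map<String, String> settings) {
        List<String> rejected = new ArrayList<>();
        for (Map.Entry<String, String> entry : settings.entrySet()) {
            if (!setter.set(entry.getKey(), entry.getValue())) {
                rejected.add(entry.getKey());
            }
        }
        return rejected;
    }

    public static void main(String[] args) {
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("tessedit_char_whitelist", "0123456789");
        settings.put("not_a_real_parameter", "1");

        // Stand-in setter for this demo only: accepts names starting with "tessedit_".
        VariableSetter fake = (name, value) -> name.startsWith("tessedit_");
        System.out.println(applyAll(fake, settings)); // prints [not_a_real_parameter]
    }
}
```

With a real engine, the stand-in could presumably be replaced by a method reference such as `api::SetVariable`, since its `(String, String)` overload matches the callback shape.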

### Language Support

Multi-language OCR with support for 100+ languages, custom language models, and language detection capabilities.

```java { .api }
public class TessBaseAPI {
    public String GetInitLanguagesAsString();
    public void GetLoadedLanguagesAsVector(StringVector langs);
    public void GetAvailableLanguagesAsVector(StringVector langs);
}

// Language initialization examples:
// "eng" - English
// "fra" - French
// "deu" - German
// "chi_sim" - Simplified Chinese
// "ara" - Arabic
// "eng+fra+deu" - Multiple languages
```

[Language Support](./language-support.md)
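
The multi-language form shown above (`"eng+fra+deu"`) is a list of language codes joined with `+`. As a small illustration, the sketch below builds such a string from a list of codes; `LanguageSpec` and `join` are hypothetical helper names, not part of the library, and the validation rules are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.List;

public class LanguageSpec {
    /** Joins language codes with '+', e.g. [eng, fra] becomes "eng+fra". */
    public static String join(List<String> codes) {
        if (codes.isEmpty()) {
            throw new IllegalArgumentException("at least one language code is required");
        }
        for (String code : codes) {
            // Reject blanks and embedded separators so the result stays unambiguous.
            if (code == null || code.trim().isEmpty() || code.contains("+")) {
                throw new IllegalArgumentException("invalid language code: " + code);
            }
        }
        return String.join("+", codes);
    }

    public static void main(String[] args) {
        System.out.println(join(Arrays.asList("eng", "fra", "deu"))); // prints eng+fra+deu
    }
}
```

The resulting string is the value passed as the `language` argument of `Init`.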

## Types

### Core Data Structures

```java { .api }
// Progress monitoring and cancellation
public class ETEXT_DESC {
    public short progress();        // Progress percentage (0-100)
    public boolean more_to_come();  // More processing pending
    public boolean ocr_alive();     // OCR engine active
    public byte err_code();         // Error code
    public void set_deadline_msecs(int deadline_msecs);
    public boolean deadline_exceeded();
}

// Unicode character handling
public class UNICHAR {
    public UNICHAR(String utf8_str, int len);
    public UNICHAR(int unicode);
    public int first_uni();    // Get first character as UCS-4
    public int utf8_len();     // Get UTF-8 byte length
    public String utf8_str();  // Get UTF-8 string
    public static int[] UTF8ToUTF32(String utf8_str);
    public static String UTF32ToUTF8(int[] str32);
}

// Collection types
public class StringVector {
    public StringVector();
    public long size();
    public String get(long i);
    public StringVector put(long i, String value);
    public StringVector push_back(String value);
    public void clear();
}
```

### Iterator Level Constants

```java { .api }
// Page hierarchy levels for iteration
public static final int RIL_BLOCK = 0;    // Block level
public static final int RIL_PARA = 1;     // Paragraph level
public static final int RIL_TEXTLINE = 2; // Text line level
public static final int RIL_WORD = 3;     // Word level
public static final int RIL_SYMBOL = 4;   // Character/symbol level
```

### Block Type Constants

```java { .api }
// Layout block types
public static final int PT_UNKNOWN = 0;       // Unknown block type
public static final int PT_FLOWING_TEXT = 1;  // Flowing text
public static final int PT_HEADING_TEXT = 2;  // Heading text
public static final int PT_PULLOUT_TEXT = 3;  // Pull-out text
public static final int PT_EQUATION = 4;      // Mathematical equation
public static final int PT_TABLE = 6;         // Table
public static final int PT_VERTICAL_TEXT = 7; // Vertical text
public static final int PT_CAPTION_TEXT = 8;  // Caption text
public static final int PT_FLOWING_IMAGE = 9; // Flowing image
public static final int PT_NOISE = 14;        // Noise/artifacts
```
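
Because `PageIterator.BlockType()` returns a plain `int`, a lookup from value to constant name can make logs easier to read. The sketch below is one possible way to do that; `BlockTypeNames` and `describe` are hypothetical names, and the table mirrors only the values listed above, so any other value is reported as unlisted rather than guessed.

```java
import java.util.HashMap;
import java.util.Map;

public class BlockTypeNames {
    // Values copied from the constants listed above; other values are deliberately absent.
    private static final Map<Integer, String> NAMES = new HashMap<>();
    static {
        NAMES.put(0, "PT_UNKNOWN");
        NAMES.put(1, "PT_FLOWING_TEXT");
        NAMES.put(2, "PT_HEADING_TEXT");
        NAMES.put(3, "PT_PULLOUT_TEXT");
        NAMES.put(4, "PT_EQUATION");
        NAMES.put(6, "PT_TABLE");
        NAMES.put(7, "PT_VERTICAL_TEXT");
        NAMES.put(8, "PT_CAPTION_TEXT");
        NAMES.put(9, "PT_FLOWING_IMAGE");
        NAMES.put(14, "PT_NOISE");
    }

    /** Returns the constant name for a block type, or "unlisted(n)" if it is not in the table. */
    public static String describe(int blockType) {
        return NAMES.getOrDefault(blockType, "unlisted(" + blockType + ")");
    }

    public static void main(String[] args) {
        System.out.println(describe(6));  // prints PT_TABLE
        System.out.println(describe(42)); // prints unlisted(42)
    }
}
```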