
tessl/maven-org-bytedeco--tesseract

JavaCPP Presets for Tesseract - Java wrapper library providing JNI bindings to the native Tesseract OCR library version 5.5.1, enabling optical character recognition capabilities in Java applications


Configuration and Parameters

The Tesseract bindings expose a configuration system that gives fine-grained control over OCR behavior: page segmentation modes, OCR engine modes, runtime variables, language management, and performance tuning.

Capabilities

Page Segmentation Modes

Control how Tesseract analyzes page layout and identifies text regions.

// Page Segmentation Mode Constants
public static final int PSM_OSD_ONLY = 0;              // Orientation and script detection only
public static final int PSM_AUTO_OSD = 1;              // Automatic page segmentation with OSD
public static final int PSM_AUTO_ONLY = 2;             // Automatic page segmentation, no OSD or OCR
public static final int PSM_AUTO = 3;                  // Fully automatic page segmentation, no OSD (CLI default; the API defaults to PSM_SINGLE_BLOCK)
public static final int PSM_SINGLE_COLUMN = 4;         // Single column of text
public static final int PSM_SINGLE_BLOCK_VERT_TEXT = 5; // Single uniform block of vertical text
public static final int PSM_SINGLE_BLOCK = 6;          // Single uniform block of text
public static final int PSM_SINGLE_LINE = 7;           // Single text line
public static final int PSM_SINGLE_WORD = 8;           // Single word
public static final int PSM_CIRCLE_WORD = 9;           // Single word in a circle
public static final int PSM_SINGLE_CHAR = 10;          // Single character
public static final int PSM_SPARSE_TEXT = 11;          // Sparse text in no particular order
public static final int PSM_SPARSE_TEXT_OSD = 12;      // Sparse text with OSD
public static final int PSM_RAW_LINE = 13;             // Single text line, bypassing Tesseract-specific hacks

/**
 * Set page segmentation mode
 * @param mode PSM constant (PSM_AUTO, PSM_SINGLE_BLOCK, etc.)
 */
public void SetPageSegMode(int mode);

/**
 * Get current page segmentation mode
 * @return Current PSM mode
 */
public int GetPageSegMode();

Page Segmentation Example:

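import org.bytedeco.javacpp.BytePointer;
import org.bytedeco.leptonica.PIX;
import org.bytedeco.tesseract.TessBaseAPI;

import static org.bytedeco.leptonica.global.leptonica.*;
import static org.bytedeco.tesseract.global.tesseract.*;

TessBaseAPI api = new TessBaseAPI();
if (api.Init(null, "eng") != 0) {
    throw new IllegalStateException("Could not initialize Tesseract");
}

// Each SetPageSegMode call replaces the previous mode, so pick the one that matches the input.
// Single line of text (often faster and more accurate for simple crops)
api.SetPageSegMode(PSM_SINGLE_LINE);

// Alternatives:
//   api.SetPageSegMode(PSM_AUTO);        // Automatic layout detection (good for complex documents)
//   api.SetPageSegMode(PSM_SINGLE_WORD); // Single word (useful for form fields)

PIX image = pixRead("single-line.png");
api.SetImage(image);
BytePointer text = api.GetUTF8Text();
System.out.println("Text: " + text.getString());

// Release native resources
text.deallocate();
pixDestroy(image);
api.End();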
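
The same modes can also be supplied as numeric strings through the tessedit_pageseg_mode variable, so a small lookup can catch out-of-range values before they reach the engine. The sketch below is plain Java with no Tesseract dependency. The class PageSegModes and its methods are hypothetical helpers, not part of the presets; the names and values are copied from the constant table above.

```java
// Hypothetical helper, not part of the JavaCPP presets.
final class PageSegModes {
    // The index in this array equals the numeric PSM value from the constant table.
    private static final String[] NAMES = {
        "PSM_OSD_ONLY", "PSM_AUTO_OSD", "PSM_AUTO_ONLY", "PSM_AUTO",
        "PSM_SINGLE_COLUMN", "PSM_SINGLE_BLOCK_VERT_TEXT", "PSM_SINGLE_BLOCK",
        "PSM_SINGLE_LINE", "PSM_SINGLE_WORD", "PSM_CIRCLE_WORD", "PSM_SINGLE_CHAR",
        "PSM_SPARSE_TEXT", "PSM_SPARSE_TEXT_OSD", "PSM_RAW_LINE"
    };

    private PageSegModes() {}

    // True when the value is one of the 14 documented modes (0..13).
    static boolean isValid(int mode) {
        return mode >= 0 && mode < NAMES.length;
    }

    // Constant name for a numeric mode; rejects values outside the table.
    static String nameOf(int mode) {
        if (!isValid(mode)) {
            throw new IllegalArgumentException("Unknown page segmentation mode: " + mode);
        }
        return NAMES[mode];
    }

    // String form suitable as the value of the tessedit_pageseg_mode variable.
    static String toVariableValue(int mode) {
        nameOf(mode); // validates the range
        return Integer.toString(mode);
    }
}
```

With an initialized API object, the result could be passed on as api.SetVariable("tessedit_pageseg_mode", PageSegModes.toVariableValue(6)).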

OCR Engine Modes

Select the OCR engine and neural network configuration.

// OCR Engine Mode Constants  
public static final int OEM_TESSERACT_ONLY = 0;           // Legacy Tesseract only (deprecated)
public static final int OEM_LSTM_ONLY = 1;                // LSTM neural network only (recommended)
public static final int OEM_TESSERACT_LSTM_COMBINED = 2;  // Combined legacy + LSTM (deprecated)
public static final int OEM_DEFAULT = 3;                  // Default (engine inferred from the traineddata and config files)

/**
 * Initialize with specific OCR engine mode
 * @param datapath Path to tessdata directory
 * @param language Language code
 * @param oem OCR Engine Mode
 * @return 0 on success, -1 on failure
 */
public int Init(String datapath, String language, int oem);

Engine Mode Selection Example:

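TessBaseAPI api = new TessBaseAPI();

// Option 1: use the LSTM-only engine for best accuracy (recommended)
if (api.Init(null, "eng", OEM_LSTM_ONLY) != 0) {
    System.err.println("Could not initialize with LSTM engine");
}

// Option 2: use the default engine (inferred from the installed data).
// Use one option or the other; a second Init call re-initializes the engine.
if (api.Init(null, "eng", OEM_DEFAULT) != 0) {
    System.err.println("Could not initialize with default engine");
}

// Engine availability depends on the installed tessdata files:
// tessdata_fast and tessdata_best contain LSTM models only, so the legacy modes fail with them.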
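
Since engine availability depends on the installed data, one way to handle it is to try engine modes in order of preference and keep the first one that initializes. The sketch below is an assumption-laden illustration: EngineFallback and firstWorkingEngine are hypothetical names, and the IntPredicate stands in for a call such as api.Init(null, "eng", oem) == 0 so that the logic can be exercised without the native library.

```java
import java.util.function.IntPredicate;

// Hypothetical helper, not part of the JavaCPP presets.
final class EngineFallback {
    // Values copied from the OEM constant table above.
    static final int OEM_LSTM_ONLY = 1;
    static final int OEM_DEFAULT = 3;

    private EngineFallback() {}

    // Tries each mode in order and returns the first one for which tryInit
    // reports success, or -1 when none of them initializes.
    static int firstWorkingEngine(IntPredicate tryInit, int... preferredModes) {
        for (int oem : preferredModes) {
            if (tryInit.test(oem)) {
                return oem;
            }
        }
        return -1;
    }
}
```

A call might look like EngineFallback.firstWorkingEngine(oem -> api.Init(null, "eng", oem) == 0, OEM_LSTM_ONLY, OEM_DEFAULT), with a result of -1 treated as a fatal configuration error.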

Page Iterator Level Constants

Control iterator navigation granularity for result analysis.

// Page Iterator Level Constants
public static final int RIL_BLOCK = 0;           // Block of text/image/separator line
public static final int RIL_PARA = 1;            // Paragraph within a block
public static final int RIL_TEXTLINE = 2;        // Line within a paragraph
public static final int RIL_WORD = 3;            // Word within a textline
public static final int RIL_SYMBOL = 4;          // Symbol/character within a word

Block Type Constants

Identify different types of page layout elements during analysis.

// Block Type Constants (PolyBlockType)
public static final int PT_UNKNOWN = 0;               // Type is not yet known
public static final int PT_FLOWING_TEXT = 1;          // Text that lives inside a column
public static final int PT_HEADING_TEXT = 2;          // Text that spans more than one column
public static final int PT_PULLOUT_TEXT = 3;          // Text in a cross-column pull-out region
public static final int PT_EQUATION = 4;              // Partition belonging to an equation region
public static final int PT_INLINE_EQUATION = 5;       // Partition has inline equation
public static final int PT_TABLE = 6;                 // Partition belonging to a table region
public static final int PT_VERTICAL_TEXT = 7;         // Text-line runs vertically
public static final int PT_CAPTION_TEXT = 8;          // Text that belongs to an image
public static final int PT_FLOWING_IMAGE = 9;         // Image that lives inside a column
public static final int PT_HEADING_IMAGE = 10;        // Image that spans more than one column
public static final int PT_PULLOUT_IMAGE = 11;        // Image in a cross-column pull-out region
public static final int PT_HORZ_LINE = 12;            // Horizontal Line
public static final int PT_VERT_LINE = 13;            // Vertical Line
public static final int PT_NOISE = 14;                // Lies outside of any column
public static final int PT_COUNT = 15;                // Total number of block types

Orientation and Direction Constants

Document orientation, writing direction, and text line ordering.

// Page Orientation Constants
public static final int ORIENTATION_PAGE_UP = 0;      // Normal upright page
public static final int ORIENTATION_PAGE_RIGHT = 1;   // Page rotated 90° clockwise
public static final int ORIENTATION_PAGE_DOWN = 2;    // Page rotated 180°
public static final int ORIENTATION_PAGE_LEFT = 3;    // Page rotated 90° counter-clockwise

// Writing Direction Constants  
public static final int WRITING_DIRECTION_LEFT_TO_RIGHT = 0;  // Left-to-right text (Latin, etc.)
public static final int WRITING_DIRECTION_RIGHT_TO_LEFT = 1;  // Right-to-left text (Arabic, Hebrew)
public static final int WRITING_DIRECTION_TOP_TO_BOTTOM = 2;  // Top-to-bottom text (Chinese, etc.)

// Text Line Order Constants
public static final int TEXTLINE_ORDER_LEFT_TO_RIGHT = 0;     // Lines ordered left-to-right
public static final int TEXTLINE_ORDER_RIGHT_TO_LEFT = 1;     // Lines ordered right-to-left  
public static final int TEXTLINE_ORDER_TOP_TO_BOTTOM = 2;     // Lines ordered top-to-bottom

// Text Justification Constants
public static final int JUSTIFICATION_UNKNOWN = 0;            // Justification not determined
public static final int JUSTIFICATION_LEFT = 1;               // Left-justified text
public static final int JUSTIFICATION_CENTER = 2;             // Center-justified text
public static final int JUSTIFICATION_RIGHT = 3;              // Right-justified text

// Script Direction Constants
public static final int DIR_NEUTRAL = 0;                      // Text contains only neutral characters
public static final int DIR_LEFT_TO_RIGHT = 1;                // No right-to-left characters
public static final int DIR_RIGHT_TO_LEFT = 2;                // No left-to-right characters  
public static final int DIR_MIX = 3;                          // Mixed left-to-right and right-to-left
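
To act on a detected page orientation, the constant usually has to be turned into a rotation angle. The sketch below takes the comments on the page orientation constants at face value (0, 90, 180, and 270 degrees clockwise); PageOrientation is a hypothetical helper, and the rotation direction is worth confirming on a sample image before relying on it.

```java
// Hypothetical helper, not part of the JavaCPP presets.
final class PageOrientation {
    private PageOrientation() {}

    // Clockwise rotation of the page, following the constant comments above.
    static int clockwiseDegrees(int orientation) {
        switch (orientation) {
            case 0: return 0;   // ORIENTATION_PAGE_UP
            case 1: return 90;  // ORIENTATION_PAGE_RIGHT
            case 2: return 180; // ORIENTATION_PAGE_DOWN
            case 3: return 270; // ORIENTATION_PAGE_LEFT (90 degrees counter-clockwise)
            default:
                throw new IllegalArgumentException("Unknown orientation: " + orientation);
        }
    }

    // Clockwise rotation that would undo the detected rotation.
    static int correctionDegrees(int orientation) {
        return (360 - clockwiseDegrees(orientation)) % 360;
    }
}
```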

Variable Configuration

Fine-tune OCR behavior using Tesseract's extensive variable system.

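/**
 * Set configuration variable. Call after Init(); init-time parameters
 * (for example load_system_dawg) are rejected here and must be passed to Init().
 * @param name Variable name
 * @param value Variable value as string, written as it would appear in a config file
 * @return true if the variable was set, false if the name lookup failed or the parameter cannot be set at this point
 */
public boolean SetVariable(String name, String value);

/**
 * Same as SetVariable, but restricted to debug parameters
 * @param name Debug variable name
 * @param value Variable value as string
 * @return true if variable was set successfully
 */
public boolean SetDebugVariable(String name, String value);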

/**
 * Get integer variable value
 * @param name Variable name
 * @param value Output: variable value
 * @return true if variable exists
 */
public boolean GetIntVariable(String name, IntPointer value);

/**
 * Get boolean variable value
 * @param name Variable name
 * @param value Output: variable value
 * @return true if variable exists
 */
public boolean GetBoolVariable(String name, BoolPointer value);

/**
 * Get double variable value
 * @param name Variable name  
 * @param value Output: variable value
 * @return true if variable exists
 */
public boolean GetDoubleVariable(String name, DoublePointer value);

/**
 * Get string variable value
 * @param name Variable name
 * @return Variable value or null if not found
 */
public String GetStringVariable(String name);

Variable Configuration Examples:

TessBaseAPI api = new TessBaseAPI();
if (api.Init(null, "eng") != 0) {
    throw new IllegalStateException("Could not initialize Tesseract");
}

// Character blacklist (never output these characters)
api.SetVariable("tessedit_char_blacklist", "xyz");

// Character whitelist (only recognize these characters);
// with a digits-only whitelist the blacklist above is redundant
api.SetVariable("tessedit_char_whitelist", "0123456789");

// Numeric-only mode
api.SetVariable("classify_bln_numeric_mode", "1");

// Select the word rejection algorithm (an integer selector, not a confidence threshold)
api.SetVariable("tessedit_reject_mode", "2");

// Preserve spaces in output
api.SetVariable("preserve_interword_spaces", "1");

// Dictionary loading. These are init-time parameters: the dictionaries are loaded
// during Init(), so SetVariable() returns false for them here. To disable the
// dictionaries, supply these values to Init() instead.
api.SetVariable("load_system_dawg", "0");     // System dictionary
api.SetVariable("load_freq_dawg", "0");       // Frequency dictionary
api.SetVariable("load_unambig_dawg", "0");    // Unambiguous dictionary

// Performance tuning
api.SetVariable("tessedit_pageseg_mode", "6"); // Alternative to SetPageSegMode()
api.SetVariable("textord_min_linesize", "2.5"); // Initial line-size factor

// Debug output
api.SetVariable("tessedit_write_images", "1");       // Save debug images (not a debug-class parameter, so use SetVariable)
api.SetDebugVariable("textord_debug_tabfind", "1");  // Debug tab-stop finding

// Process image with custom configuration
PIX image = pixRead("numbers-only.png");
api.SetImage(image);
BytePointer text = api.GetUTF8Text();
System.out.println("Numbers: " + text.getString());

// Read back a value. GetIntVariable only finds integer parameters;
// boolean ones such as classify_bln_numeric_mode need GetBoolVariable.
IntPointer pageSegMode = new IntPointer(1);
if (api.GetIntVariable("tessedit_pageseg_mode", pageSegMode)) {
    System.out.println("Page segmentation mode: " + pageSegMode.get());
}

// Release native resources
text.deallocate();
pixDestroy(image);
api.End();
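
SetVariable reports failure through its boolean return value, which is easy to overlook when many variables are set in a row. A minimal sketch of one way to surface rejected names follows. VariableApplier is a hypothetical helper, and the BiPredicate stands in for a method reference such as api::SetVariable so the logic can run without the native library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

// Hypothetical helper, not part of the JavaCPP presets.
final class VariableApplier {
    private VariableApplier() {}

    // Applies every name/value pair through the setter and returns the names
    // the setter rejected (unknown names, or parameters that cannot be set now).
    static List<String> apply(Map<String, String> settings,
                              BiPredicate<String, String> setter) {
        List<String> rejected = new ArrayList<>();
        for (Map.Entry<String, String> entry : settings.entrySet()) {
            if (!setter.test(entry.getKey(), entry.getValue())) {
                rejected.add(entry.getKey());
            }
        }
        return rejected;
    }
}
```

Passing a LinkedHashMap keeps the variables in insertion order, and a non-empty result is a cue to log the names or fail fast rather than run OCR with a partly applied configuration.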

Language Management

Multi-language support and language detection configuration.

/**
 * Get the language string used in the last successful Init() call
 * @return The languages as passed to Init(), for example "eng+fra+deu"
 */
public BytePointer GetInitLanguagesAsString();

/**
 * Get loaded languages into vector
 * @param langs Output vector to populate with language codes
 */
public void GetLoadedLanguagesAsVector(StringVector langs);

/**
 * Get available languages into vector  
 * @param langs Output vector to populate with available language codes
 */
public void GetAvailableLanguagesAsVector(StringVector langs);

Multi-language Examples:

// Initialize with multiple languages
TessBaseAPI api = new TessBaseAPI();
if (api.Init(null, "eng+fra+deu") != 0) {  // English + French + German
    System.err.println("Could not initialize multi-language");
}

// Check which languages were requested at initialization
String loadedLangs = api.GetInitLanguagesAsString().getString();
System.out.println("Loaded languages: " + loadedLangs);

// Get available languages
StringVector availableLangs = new StringVector();
api.GetAvailableLanguagesAsVector(availableLangs);
System.out.println("Available languages:");
for (int i = 0; i < availableLangs.size(); i++) {
    System.out.println("  " + availableLangs.get(i).getString());
}

// Process multilingual document
PIX image = pixRead("multilingual-doc.png");
api.SetImage(image);
BytePointer text = api.GetUTF8Text();
System.out.println("Multilingual text: " + text.getString());

text.deallocate();
pixDestroy(image);
api.End();
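
The language argument is a list of codes joined with "+", and Init fails if a requested language has no traineddata file. The sketch below builds that string and reports requested codes that are missing from a list of available ones. LanguageSpec is a hypothetical helper; in practice the available list would come from GetAvailableLanguagesAsVector.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical helper, not part of the JavaCPP presets.
final class LanguageSpec {
    private LanguageSpec() {}

    // Joins language codes into the "eng+fra+deu" form expected by Init().
    static String join(List<String> languages) {
        if (languages.isEmpty()) {
            throw new IllegalArgumentException("At least one language is required");
        }
        return String.join("+", languages);
    }

    // Requested codes that do not appear among the available ones, in request order.
    static List<String> missing(List<String> requested, Collection<String> available) {
        List<String> result = new ArrayList<>();
        for (String language : requested) {
            if (!available.contains(language)) {
                result.add(language);
            }
        }
        return result;
    }
}
```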

Testing and Utility Functions

Helper functions for testing configuration modes and engine capabilities.

// PSM Testing Functions
public static boolean PSM_OSD_ENABLED(int pageseg_mode);        // Test if OSD enabled
public static boolean PSM_ORIENTATION_ENABLED(int mode);        // Test if orientation detection enabled
public static boolean PSM_COL_FIND_ENABLED(int mode);          // Test if column finding enabled  
public static boolean PSM_SPARSE(int mode);                    // Test if sparse mode
public static boolean PSM_BLOCK_FIND_ENABLED(int mode);        // Test if block finding enabled
public static boolean PSM_LINE_FIND_ENABLED(int mode);         // Test if line finding enabled
public static boolean PSM_WORD_FIND_ENABLED(int mode);         // Test if word finding enabled

// PolyBlock Type Testing Functions
public static boolean PTIsLineType(int type);                  // Test if PolyBlockType is line
public static boolean PTIsImageType(int type);                 // Test if PolyBlockType is image
public static boolean PTIsTextType(int type);                  // Test if PolyBlockType is text
public static boolean PTIsPulloutType(int type);               // Test if PolyBlockType is pullout

Configuration Testing Examples:

import static org.bytedeco.tesseract.global.tesseract.*;

// Test if a PSM mode supports specific features
int psm = PSM_AUTO;

if (PSM_OSD_ENABLED(psm)) {
    System.out.println("Orientation and script detection enabled");
}

if (PSM_WORD_FIND_ENABLED(psm)) {
    System.out.println("Word-level analysis enabled");
}

if (PSM_SPARSE(psm)) {
    System.out.println("Sparse text mode enabled");
}

// Test block types during iteration
// (assumes `api` is an initialized TessBaseAPI with an image already set)
PageIterator pi = api.AnalyseLayout();
if (pi != null) {  // null when layout analysis fails or the page is empty
    pi.Begin();
    do {
        int blockType = pi.BlockType();

        if (PTIsTextType(blockType)) {
            System.out.println("Found text block");
        } else if (PTIsImageType(blockType)) {
            System.out.println("Found image block");
        } else if (PTIsLineType(blockType)) {
            System.out.println("Found line block");
        }
    } while (pi.Next(RIL_BLOCK));
}
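
For intuition about what two of the PSM predicates report, the sketch below re-derives them in plain Java from the constant descriptions alone: the modes whose description mentions OSD, and the two sparse-text modes. PsmTraits is a hypothetical stand-in written for illustration; the native PSM_OSD_ENABLED and PSM_SPARSE functions remain the authoritative definitions.

```java
// Hypothetical stand-ins; in real code call the native PSM_* predicates instead.
final class PsmTraits {
    // Values copied from the page segmentation constant table.
    static final int PSM_OSD_ONLY = 0;
    static final int PSM_AUTO_OSD = 1;
    static final int PSM_SPARSE_TEXT = 11;
    static final int PSM_SPARSE_TEXT_OSD = 12;

    private PsmTraits() {}

    // Modes whose description includes orientation and script detection.
    static boolean usesOsd(int mode) {
        return mode == PSM_OSD_ONLY || mode == PSM_AUTO_OSD || mode == PSM_SPARSE_TEXT_OSD;
    }

    // The two sparse-text modes.
    static boolean isSparse(int mode) {
        return mode == PSM_SPARSE_TEXT || mode == PSM_SPARSE_TEXT_OSD;
    }
}
```

Under this reading, PSM_AUTO (3) is neither an OSD mode nor a sparse mode, so the first and third checks in the example above would print nothing for it.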

Common Configuration Patterns

Forms and Structured Documents

// Optimize for form processing
api.SetPageSegMode(PSM_SINGLE_BLOCK);
api.SetVariable("tessedit_char_whitelist", "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789");
api.SetVariable("preserve_interword_spaces", "1");

Numbers and Codes

// Optimize for numeric content
api.SetPageSegMode(PSM_SINGLE_LINE);
api.SetVariable("classify_bln_numeric_mode", "1");
api.SetVariable("tessedit_char_whitelist", "0123456789.-");
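
A whitelist constrains recognition, but it can still be worth checking the returned text before parsing it as a number or code. The sketch below lists any non-whitespace characters that fall outside the whitelist. WhitelistCheck is a hypothetical helper and operates on a plain String, such as the value obtained from GetUTF8Text().getString().

```java
// Hypothetical helper, not part of the JavaCPP presets.
final class WhitelistCheck {
    private WhitelistCheck() {}

    // Returns each distinct character of text that is neither whitespace nor in
    // the whitelist, in order of first appearance. An empty result means the
    // text conforms to the whitelist.
    static String unexpectedChars(String text, String whitelist) {
        StringBuilder unexpected = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean allowed = Character.isWhitespace(c) || whitelist.indexOf(c) >= 0;
            boolean alreadyListed = unexpected.indexOf(String.valueOf(c)) >= 0;
            if (!allowed && !alreadyListed) {
                unexpected.append(c);
            }
        }
        return unexpected.toString();
    }
}
```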

Poor Quality Images

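// Settings for low-quality scans (starting points; tune against sample images)
api.SetVariable("tessedit_reject_mode", "0");  // Rejection algorithm 0 (selects an algorithm, not a confidence threshold)
api.SetVariable("textord_min_linesize", "1.0"); // Lower the initial line-size factor to accept smaller text
api.SetVariable("edges_max_children_per_outline", "50"); // Allow more child outlines inside a character outline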

High Accuracy Mode

// Aim for higher accuracy (slower processing)
api.SetPageSegMode(PSM_AUTO_OSD);  // Full layout analysis with orientation and script detection (requires osd.traineddata)
api.SetVariable("tessedit_enable_dict_correction", "1");  // Dictionary correction
// The two classify_* settings below tune the legacy engine's adaptive classifier
api.SetVariable("classify_enable_learning", "1");         // Enable learning
api.SetVariable("classify_enable_adaptive_matcher", "1"); // Adaptive matching

Performance Optimization

// Faster processing (lower accuracy)
api.SetPageSegMode(PSM_SINGLE_BLOCK);
// load_*_dawg are init-time parameters: they skip dictionary loading only when
// supplied to Init(); SetVariable() returns false for them once Init() has run
api.SetVariable("load_system_dawg", "0");
api.SetVariable("load_freq_dawg", "0");
api.SetVariable("tessedit_enable_dict_correction", "0");
api.SetVariable("classify_enable_learning", "0");

Advanced Configuration Variables

Text Detection and Layout

  • textord_min_linesize - Initial line-size factor, as a multiple of blob height
  • textord_max_noise_size - Maximum noise blob size
  • edges_max_children_per_outline - Maximum number of child outlines inside a character outline
  • textord_debug_tabfind - Tab-stop finding debug output

Character Recognition

  • classify_bln_numeric_mode - Numeric-only recognition mode
  • classify_enable_learning - Enable adaptive learning
  • classify_enable_adaptive_matcher - Use adaptive matching
  • tessedit_enable_dict_correction - Dictionary-based correction

Output Control

  • tessedit_char_blacklist - Characters that are never output
  • tessedit_char_whitelist - Only recognize these characters
  • preserve_interword_spaces - Maintain spacing in output
  • tessedit_reject_mode - Word rejection algorithm selector

Debug and Development

  • tessedit_write_images - Save intermediate processing images
  • tessedit_dump_pageseg_images - Save page segmentation debug images
  • classify_debug_level - Character classification debug level

Install with Tessl CLI

npx tessl i tessl/maven-org-bytedeco--tesseract
