
tessl/maven-org-bytedeco--tesseract-platform

JavaCPP platform aggregator for Tesseract OCR native libraries providing cross-platform OCR capabilities in Java applications

Core OCR Engine

The TessBaseAPI class is the primary interface for optical character recognition. It covers engine initialization, image input, text recognition, and result extraction, along with the configuration options for each.

Capabilities

Engine Initialization

Set up the Tesseract OCR engine with language models and configuration parameters.

public class TessBaseAPI {
    // Constructor
    public TessBaseAPI();
    
    // Version Information
    public static BytePointer Version();   // call getString() to read the value
    
    // Initialization Methods
    public int Init(String datapath, String language, int oem);
    public int Init(String datapath, String language);
    public void InitForAnalysePage();
    
    // Cleanup
    public void End();
}

Init Parameters:

  • datapath: Path to the tessdata directory (null for the default location)
  • language: Name of a traineddata file, usually an ISO 639-3 code (e.g., "eng", "fra", "deu"); join several with "+" (e.g., "eng+fra")
  • oem: OCR Engine Mode (OEM_LSTM_ONLY recommended)

Return Values:

  • 0: Success
  • -1: Initialization failed
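
Because Init reports every failure as the same -1, a pre-flight check on the tessdata directory can produce a more useful message. The following is a minimal sketch, assuming the usual Tesseract layout in which each language lives in a file named `<language>.traineddata` directly inside the tessdata directory. `TessdataCheck` and `missingLanguages` are hypothetical names introduced for illustration and are not part of the library.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of JavaCPP or Tesseract).
final class TessdataCheck {
    /**
     * Splits a "+"-joined language string and returns the entries that have
     * no matching ".traineddata" file in the given tessdata directory.
     * Assumption: model files sit directly in tessdataDir.
     */
    static List<String> missingLanguages(Path tessdataDir, String languages) {
        List<String> missing = new ArrayList<>();
        for (String part : languages.split("\\+")) {
            String code = part.trim();
            if (code.isEmpty()) {
                continue; // tolerate stray "+" separators
            }
            Path model = tessdataDir.resolve(code + ".traineddata");
            if (!Files.isRegularFile(model)) {
                missing.add(code);
            }
        }
        return missing;
    }

    public static void main(String[] args) throws Exception {
        // Demo with a temporary directory that only contains an English model file.
        Path dir = Files.createTempDirectory("tessdata");
        Files.createFile(dir.resolve("eng.traineddata"));
        System.out.println("Missing: " + missingLanguages(dir, "eng+fra"));
    }
}
```

This check only applies when an explicit datapath is passed. With a null datapath the engine resolves its own default location, which this sketch does not attempt to reproduce.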

Usage Example

TessBaseAPI api = new TessBaseAPI();

// Initialize with English language and LSTM engine
int result = api.Init(null, "eng", OEM_LSTM_ONLY);
if (result != 0) {
    System.err.println("Tesseract initialization failed");
    return;
}

// Use API for OCR operations...

// Always cleanup when done
api.End();

Image Input Methods

Provide images to the OCR engine from various sources and formats.

public class TessBaseAPI {
    // Set image from Leptonica PIX object
    public void SetImage(PIX pix);
    
    // Set image from raw byte array
    public void SetImage(byte[] imagedata, int width, int height, 
                        int bytes_per_pixel, int bytes_per_line);
    
    // Set rectangular region of interest  
    public void SetRectangle(int left, int top, int width, int height);
    
    // Input image management
    public void SetInputImage(PIX pix);
    public PIX GetInputImage();
    public void SetInputName(String name);
    public BytePointer GetInputName();
    
    // Output configuration
    public void SetOutputName(String name);
    
    // Resolution metadata
    public void SetSourceResolution(int ppi);
    public int GetSourceYResolution();
}

Image Format Support:

  • bytes_per_pixel: 1 (grayscale), 3 (RGB), 4 (RGBA)
  • bytes_per_line: Row stride including padding
  • Supported formats: PNG, JPEG, TIFF, BMP, GIF (via Leptonica)
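
To make the bytes_per_pixel and bytes_per_line parameters concrete, here is a sketch that packs a standard `java.awt.image.BufferedImage` into a tightly packed RGB byte array, so that the stride is simply width times 3. `RgbPacker` is a hypothetical helper, and the R, G, B byte order is an assumption based on the "3 (RGB)" entry above.

```java
import java.awt.image.BufferedImage;

// Hypothetical helper (not part of JavaCPP or Tesseract).
final class RgbPacker {
    static final int BYTES_PER_PIXEL = 3;

    /** Row stride for tightly packed rows, i.e. with no padding bytes. */
    static int bytesPerLine(int width) {
        return width * BYTES_PER_PIXEL;
    }

    /** Copies the image row by row into an R,G,B byte array without padding. */
    static byte[] toPackedRgb(BufferedImage image) {
        int width = image.getWidth();
        int height = image.getHeight();
        int stride = bytesPerLine(width);
        byte[] out = new byte[stride * height];
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                int argb = image.getRGB(x, y);          // 0xAARRGGBB
                int i = y * stride + x * BYTES_PER_PIXEL;
                out[i] = (byte) (argb >> 16);           // red
                out[i + 1] = (byte) (argb >> 8);        // green
                out[i + 2] = (byte) argb;               // blue
            }
        }
        return out;
    }
}
```

The result could then be handed to the raw-buffer overload listed above, for example `api.SetImage(data, width, height, RgbPacker.BYTES_PER_PIXEL, RgbPacker.bytesPerLine(width))`.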

Usage Example

// Method 1: Using Leptonica (recommended)
PIX image = pixRead("/path/to/image.png");
api.SetImage(image);

// Method 2: Using raw byte data
byte[] imageData = loadImageBytes();
api.SetImage(imageData, width, height, 3, width * 3);

// Method 3: Process only part of the image
api.SetImage(image);
api.SetRectangle(100, 50, 300, 200); // x, y, width, height
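
When the region of interest comes from user input or an upstream detector, it may extend past the image edges. The sketch below clamps a rectangle to the image bounds before it would be passed to SetRectangle. `RegionOfInterest` is a hypothetical helper, and whether the engine itself tolerates out-of-range rectangles is not covered here.

```java
// Hypothetical helper (not part of JavaCPP or Tesseract).
final class RegionOfInterest {
    final int left;
    final int top;
    final int width;
    final int height;

    RegionOfInterest(int left, int top, int width, int height) {
        this.left = left;
        this.top = top;
        this.width = width;
        this.height = height;
    }

    /** Intersects the requested rectangle with the image rectangle. */
    static RegionOfInterest clamp(int left, int top, int width, int height,
                                  int imageWidth, int imageHeight) {
        int x0 = Math.max(0, Math.min(left, imageWidth));
        int y0 = Math.max(0, Math.min(top, imageHeight));
        int x1 = Math.max(x0, Math.min(left + width, imageWidth));
        int y1 = Math.max(y0, Math.min(top + height, imageHeight));
        return new RegionOfInterest(x0, y0, x1 - x0, y1 - y0);
    }

    /** True when nothing of the requested rectangle lies inside the image. */
    boolean isEmpty() {
        return width == 0 || height == 0;
    }
}
```

A caller might skip recognition when `isEmpty()` is true and otherwise pass `roi.left, roi.top, roi.width, roi.height` to SetRectangle.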

Text Recognition

Perform OCR recognition and extract text results in various formats.

public class TessBaseAPI {
    // Full recognition process; returns 0 on success
    public int Recognize(ETEXT_DESC monitor);
    
    // Simple rectangle OCR
    public BytePointer TesseractRect(byte[] imagedata, int bytes_per_pixel, 
                                     int bytes_per_line, int left, int top, 
                                     int width, int height);
    
    // Text extraction methods: each returns a native string as a BytePointer;
    // call getString() to convert it and deallocate() when finished
    public BytePointer GetUTF8Text();
    public BytePointer GetHOCRText(int page_number);
    public BytePointer GetAltoText(int page_number);
    public BytePointer GetTSVText(int page_number);
    public BytePointer GetBoxText(int page_number);
    public BytePointer GetUNLVText();
}

Output Formats:

  • UTF8: Plain text with line breaks
  • hOCR: HTML with word coordinates and confidence
  • ALTO: XML document structure standard
  • TSV: Tab-separated values with coordinates
  • Box: Character coordinates for training

Usage Example

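// Basic text extraction
api.SetImage(image);
BytePointer textPtr = api.GetUTF8Text();
String text = textPtr.getString();
textPtr.deallocate();
System.out.println("Extracted text: " + text);

// Advanced recognition with monitoring
ETEXT_DESC monitor = new ETEXT_DESC();
monitor.set_deadline_msecs(10000); // 10 second timeout

int result = api.Recognize(monitor);
if (result == 0) {
    BytePointer recognizedPtr = api.GetUTF8Text();
    String recognized = recognizedPtr.getString();
    recognizedPtr.deallocate();

    BytePointer hocrPtr = api.GetHOCRText(0);
    String hocr = hocrPtr.getString();
    hocrPtr.deallocate();
}

// Simple one-call OCR for rectangular region
BytePointer rectPtr = api.TesseractRect(imageBytes, 3, width * 3, 
                                        100, 50, 300, 200);
String rectText = rectPtr.getString();
rectPtr.deallocate();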

Confidence and Quality Metrics

Access recognition confidence scores and quality metrics.

public class TessBaseAPI {
    // Overall confidence
    public int MeanTextConf();
    
    // Word-level confidence scores (native array terminated by -1)
    public IntPointer AllWordConfidences();
}

Confidence Values:

  • Range: 0-100 (higher values indicate higher confidence)
  • Interpretation (a rough rule of thumb, not thresholds defined by Tesseract):
    • 90-100: Excellent recognition
    • 70-89: Good recognition
    • 50-69: Fair recognition
    • 0-49: Poor recognition
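
The bands listed above can be encoded as a small helper, for example to flag pages that need manual review. This is a sketch only: `ConfidenceBands` and its methods are hypothetical names, and the thresholds are simply the ones from the list.

```java
// Hypothetical helper (not part of JavaCPP or Tesseract).
final class ConfidenceBands {
    enum Band { EXCELLENT, GOOD, FAIR, POOR }

    /** Maps a 0-100 confidence value to the band it falls in. */
    static Band classify(int confidence) {
        if (confidence < 0 || confidence > 100) {
            throw new IllegalArgumentException("Confidence out of range: " + confidence);
        }
        if (confidence >= 90) return Band.EXCELLENT;
        if (confidence >= 70) return Band.GOOD;
        if (confidence >= 50) return Band.FAIR;
        return Band.POOR;
    }

    /** Counts how many word confidences fall below the given threshold. */
    static int countBelow(int[] confidences, int threshold) {
        int count = 0;
        for (int c : confidences) {
            if (c < threshold) {
                count++;
            }
        }
        return count;
    }
}
```

A caller might, for instance, route a page to review when `classify(meanConf)` is FAIR or POOR, or when `countBelow` reports more than a few weak words.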

Usage Example

api.SetImage(image);
BytePointer textPtr = api.GetUTF8Text();
String text = textPtr.getString();
textPtr.deallocate();

// Check overall confidence
int meanConf = api.MeanTextConf();
System.out.println("Average confidence: " + meanConf + "%");

// Get per-word confidence scores (native array terminated by -1)
IntPointer wordConfidences = api.AllWordConfidences();
if (wordConfidences != null) {
    for (int i = 0; wordConfidences.get(i) != -1; i++) {
        System.out.println("Word " + i + " confidence: " + wordConfidences.get(i) + "%");
    }
    wordConfidences.deallocate();
}

Image Processing

Access processed images and thresholding results.

public class TessBaseAPI {
    // Get processed binary image
    public PIX GetThresholdedImage();
    
    // Datapath information
    public BytePointer GetDatapath();
}

Usage Example

api.SetImage(originalImage);

// Get the binary/thresholded image used for OCR
PIX thresholded = api.GetThresholdedImage();
pixWrite("/tmp/thresholded.png", thresholded, IFF_PNG);

// Cleanup
pixDestroy(thresholded);

Batch Processing

Process multiple pages or documents efficiently.

public class TessBaseAPI {
    // Process multiple pages with renderer pipeline
    public boolean ProcessPages(String filename, String retry_config, 
                               int timeout_millisec, TessResultRenderer renderer);
    
    // Process single page with renderer
    public boolean ProcessPage(PIX pix, int page_index, String filename, 
                              String retry_config, int timeout_millisec, 
                              TessResultRenderer renderer);
    
    // Clear previous results
    public void Clear();
}

Usage Example

// Setup renderer chain for multiple output formats
TessResultRenderer textRenderer = TessTextRendererCreate("output");
TessResultRenderer pdfRenderer = TessPDFRendererCreate("output", "/usr/share/tessdata", false);
textRenderer.insert(pdfRenderer);

// Process a multi-page image such as a TIFF (ProcessPages reads image files, not PDFs)
boolean success = api.ProcessPages("document.tif", null, 60000, textRenderer);

if (success) {
    System.out.println("Document processed successfully");
    // Output files: output.txt, output.pdf
}

// Cleanup renderers
TessDeleteResultRenderer(textRenderer);

Error Handling

Common Error Conditions

  • Initialization Failure: Invalid tessdata path or missing language files
  • Image Loading: Unsupported format or corrupted image data
  • Memory Issues: Large images or insufficient system memory
  • Timeout: Recognition takes longer than specified deadline

Best Practices

public class RobustOCR {
    public static String extractText(String imagePath) {
        TessBaseAPI api = new TessBaseAPI();
        PIX image = null;
        String result = null;
        
        try {
            // Initialize with error checking
            if (api.Init(null, "eng") != 0) {
                throw new RuntimeException("Tesseract initialization failed");
            }
            
            // Load image with validation
            image = pixRead(imagePath);
            if (image == null) {
                throw new RuntimeException("Failed to load image: " + imagePath);
            }
            
            // Set image and extract text
            api.SetImage(image);
            BytePointer textPtr = api.GetUTF8Text();
            if (textPtr != null) {
                result = textPtr.getString();
                textPtr.deallocate();
            }
            
        } finally {
            // Always cleanup resources
            if (image != null) {
                pixDestroy(image);
            }
            api.End();
        }
        
        return result;
    }
}
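
The try/finally cleanup above can also be expressed with try-with-resources by pairing each native handle with its release action. The sketch below shows that pattern using only the standard library. `NativeGuard` is a hypothetical wrapper, and the strings in the demo stand in for the engine and image handles.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical wrapper (not part of JavaCPP or Tesseract).
final class NativeGuard<T> implements AutoCloseable {
    private final T resource;
    private final Consumer<T> release;
    private boolean closed;

    NativeGuard(T resource, Consumer<T> release) {
        this.resource = resource;
        this.release = release;
    }

    /** Returns the wrapped handle, which may be null if acquisition failed. */
    T get() {
        return resource;
    }

    /** Runs the release action at most once and skips it for null handles. */
    @Override
    public void close() {
        if (!closed) {
            closed = true;
            if (resource != null) {
                release.accept(resource);
            }
        }
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        // Stand-ins for the engine and the image; resources close in reverse order.
        try (NativeGuard<String> api = new NativeGuard<>("api", log::add);
             NativeGuard<String> image = new NativeGuard<>("image", log::add)) {
            log.add("recognize");
        }
        System.out.println(log);
    }
}
```

With the classes from this page, the guards might be created as `new NativeGuard<>(api, TessBaseAPI::End)` and `new NativeGuard<>(pixRead(imagePath), p -> pixDestroy(p))`. Declaring the engine first means the image is released before End() runs, matching the order in the finally block above.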

Types

Progress Monitoring

public class ETEXT_DESC {
    public short progress();           // Progress 0-100
    public boolean more_to_come();     // More work pending
    public boolean ocr_alive();        // Engine is active
    public byte err_code();            // Error code if failed
    public void set_deadline_msecs(int deadline_msecs);
    public boolean deadline_exceeded();
}

Version Information

// Tesseract version constants
public static final int TESSERACT_MAJOR_VERSION = 5;
public static final int TESSERACT_MINOR_VERSION = 5;
public static final int TESSERACT_MICRO_VERSION = 1;
public static final String TESSERACT_VERSION_STR = "5.5.1";
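
Code that relies on behavior from a particular release might compare a version string such as the one above against a minimum. The sketch below parses the leading "major.minor.micro" part and ignores any suffix. `VersionCheck` is a hypothetical helper, and the suffix handling is an assumption about how development builds may be labeled.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of JavaCPP or Tesseract).
final class VersionCheck {
    // Leading "major.minor" with an optional ".micro"; anything after is ignored.
    private static final Pattern LEADING =
            Pattern.compile("^(\\d+)\\.(\\d+)(?:\\.(\\d+))?");

    /** Returns {major, minor, micro}; micro defaults to 0 when absent. */
    static int[] parse(String version) {
        Matcher m = LEADING.matcher(version.trim());
        if (!m.find()) {
            throw new IllegalArgumentException("Unrecognized version: " + version);
        }
        int micro = m.group(3) == null ? 0 : Integer.parseInt(m.group(3));
        return new int[] {
            Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)), micro
        };
    }

    /** True when the version is at least major.minor. */
    static boolean atLeast(String version, int major, int minor) {
        int[] v = parse(version);
        return v[0] > major || (v[0] == major && v[1] >= minor);
    }

    public static void main(String[] args) {
        System.out.println(atLeast("5.5.1", 5, 3));
    }
}
```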

Install with Tessl CLI

npx tessl i tessl/maven-org-bytedeco--tesseract-platform
