# Output Renderers

Configurable pipeline system for generating OCR results in multiple output formats including plain text, structured markup (hOCR, ALTO, PAGE), searchable PDF, and specialized training data formats. Renderers can be chained together to produce multiple outputs simultaneously.

## Capabilities

### Renderer Base Class

All output renderers inherit from the `TessResultRenderer` base class, which provides common functionality for document processing and output generation.

```java { .api }
public abstract class TessResultRenderer {
    // Renderer chaining
    public void insert(TessResultRenderer next);
    public TessResultRenderer next();

    // Document processing lifecycle
    public boolean BeginDocument(String title);
    public boolean AddImage(TessBaseAPI api);
    public boolean EndDocument();

    // Renderer properties
    public String file_extension();
    public String title();
    public boolean happy();  // Check if renderer is in good state
    public int imagenum();   // Get current image number
}
```

#### Usage Pattern

```java
// Create renderer chain
TessResultRenderer textRenderer = new TessTextRenderer("output");
TessResultRenderer pdfRenderer = new TessPDFRenderer("output", "/usr/share/tessdata", false);
textRenderer.insert(pdfRenderer);

// Process document
if (textRenderer.BeginDocument("Document Title")) {
    textRenderer.AddImage(api); // Add each page/image
    textRenderer.EndDocument();
}
```

### Text Output Renderers

Generate plain text and formatted text outputs from OCR results.

```java { .api }
// Plain text output
public class TessTextRenderer extends TessResultRenderer {
    public TessTextRenderer(String outputbase);
}

// Tab-separated values with coordinates and confidence
public class TessTsvRenderer extends TessResultRenderer {
    public TessTsvRenderer(String outputbase);
}

// UNLV format for research and evaluation
public class TessUnlvRenderer extends TessResultRenderer {
    public TessUnlvRenderer(String outputbase);
}
```

**Output Formats:**

- **Text (.txt)**: Plain UTF-8 text with line breaks
- **TSV (.tsv)**: Tab-separated values with the columns `level`, `page_num`, `block_num`, `par_num`, `line_num`, `word_num`, `left`, `top`, `width`, `height`, `conf`, `text`
- **UNLV (.unlv)**: University of Nevada Las Vegas evaluation format
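
As a rough illustration of consuming the TSV output, the sketch below parses one data row into typed fields. It assumes the column order of Tesseract's TSV header (`level`, `page_num`, `block_num`, `par_num`, `line_num`, `word_num`, `left`, `top`, `width`, `height`, `conf`, `text`). `TsvRowSketch` is a hypothetical helper written for this example and is not part of the bindings; the sample row is made up.

```java
/** Hypothetical helper: one data row of a Tesseract-style TSV file. */
final class TsvRowSketch {
    final int level, left, top, width, height;
    final double conf;
    final String text;

    private TsvRowSketch(int level, int left, int top, int width, int height,
                         double conf, String text) {
        this.level = level;
        this.left = left;
        this.top = top;
        this.width = width;
        this.height = height;
        this.conf = conf;
        this.text = text;
    }

    /** Parses one tab-separated data row (not the header line). */
    static TsvRowSketch parse(String line) {
        // The -1 limit keeps a trailing empty text column (rows above word level).
        String[] f = line.split("\t", -1);
        if (f.length != 12) {
            throw new IllegalArgumentException("expected 12 columns, got " + f.length);
        }
        return new TsvRowSketch(
            Integer.parseInt(f[0]),                           // level
            Integer.parseInt(f[6]), Integer.parseInt(f[7]),   // left, top
            Integer.parseInt(f[8]), Integer.parseInt(f[9]),   // width, height
            Double.parseDouble(f[10]),                        // conf
            f[11]);                                           // text
    }

    public static void main(String[] args) {
        TsvRowSketch word = parse("5\t1\t1\t1\t1\t1\t50\t50\t50\t30\t95.5\tHello");
        System.out.println(word.text + " at (" + word.left + "," + word.top
            + ") conf=" + word.conf);
    }
}
```

Rows above word level carry an empty `text` column, which is why the split keeps trailing empty fields.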

#### Usage Example

```java
// Generate plain text output
TessTextRenderer textRenderer = new TessTextRenderer("document");

api.SetImage(image);
if (textRenderer.BeginDocument("My Document")) {
    textRenderer.AddImage(api);
    textRenderer.EndDocument();
}
// Creates: document.txt

// Generate TSV with coordinates
TessTsvRenderer tsvRenderer = new TessTsvRenderer("analysis");
// Run the same BeginDocument/AddImage/EndDocument sequence on tsvRenderer
// Creates: analysis.tsv with detailed position data
```

### Structured Markup Renderers

Generate XML and HTML outputs with detailed structure and metadata.

```java { .api }
// hOCR HTML format with word coordinates
public class TessHOcrRenderer extends TessResultRenderer {
    public TessHOcrRenderer(String outputbase);
    public TessHOcrRenderer(String outputbase, boolean font_info);
}

// ALTO XML format (Library of Congress standard)
public class TessAltoRenderer extends TessResultRenderer {
    public TessAltoRenderer(String outputbase);
}

// PAGE XML format (European digitization standard)
public class TessPAGERenderer extends TessResultRenderer {
    public TessPAGERenderer(String outputbase);
}
```

**Format Descriptions:**

- **hOCR (.hocr)**: HTML with microformat annotations, word bounding boxes, confidence scores
- **ALTO (.xml)**: Library standard with detailed layout structure, fonts, styles
- **PAGE (.xml)**: European standard for digitized documents with regions, baselines, reading order

#### Usage Example

```java
// Generate hOCR with font information
TessHOcrRenderer hocrRenderer = new TessHOcrRenderer("webpage", true);

api.SetImage(image);
if (hocrRenderer.BeginDocument("Web Page Content")) {
    hocrRenderer.AddImage(api);
    hocrRenderer.EndDocument();
}
// Creates: webpage.hocr

// Sample hOCR structure:
// <div class='ocr_page' id='page_1' title='bbox 0 0 800 600'>
//   <div class='ocr_par' id='par_1_1' title='bbox 50 50 750 100'>
//     <span class='ocr_line' id='line_1_1' title='bbox 50 50 750 80'>
//       <span class='ocrx_word' id='word_1_1' title='bbox 50 50 100 80; x_size 20; x_conf 95'>
//         Hello
//       </span>
//     </span>
//   </div>
// </div>
```
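The `title` attributes in the sample above pack several properties into one string, separated by semicolons. A minimal sketch of pulling them apart is shown here; `HocrTitleSketch` is a hypothetical helper for illustration, and it only handles the `name value...; name value...` shape seen in the sample, not every property the hOCR format allows.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: splits an hOCR title attribute into its properties. */
final class HocrTitleSketch {

    /** Maps each property name (e.g. "bbox") to the raw text that follows it. */
    static Map<String, String> properties(String title) {
        Map<String, String> props = new LinkedHashMap<>();
        for (String part : title.split(";")) {
            String p = part.trim();
            if (p.isEmpty()) {
                continue;
            }
            int space = p.indexOf(' ');
            if (space < 0) {
                props.put(p, "");
            } else {
                props.put(p.substring(0, space), p.substring(space + 1).trim());
            }
        }
        return props;
    }

    /** Returns the four bbox integers in the order they appear in the title. */
    static int[] bbox(String title) {
        String value = properties(title).get("bbox");
        if (value == null) {
            throw new IllegalArgumentException("no bbox in: " + title);
        }
        String[] n = value.split("\\s+");
        if (n.length != 4) {
            throw new IllegalArgumentException("bbox needs 4 numbers: " + value);
        }
        int[] box = new int[4];
        for (int i = 0; i < 4; i++) {
            box[i] = Integer.parseInt(n[i]);
        }
        return box;
    }

    public static void main(String[] args) {
        String title = "bbox 50 50 100 80; x_size 20; x_conf 95";
        int[] b = bbox(title);
        System.out.println("word box: " + b[0] + "," + b[1] + " to " + b[2] + "," + b[3]);
        System.out.println("properties: " + properties(title));
    }
}
```

Extracting the `title` strings from the HTML itself is left to whatever HTML parser the application already uses.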

### PDF Renderer

Generate searchable PDF documents with selectable text overlay.

```java { .api }
public class TessPDFRenderer extends TessResultRenderer {
    public TessPDFRenderer(String outputbase, String datadir);
    public TessPDFRenderer(String outputbase, String datadir, boolean textonly);
}
```

**Parameters:**

- `outputbase`: Output filename prefix
- `datadir`: Path to tessdata directory for font information
- `textonly`: If true, create text-only PDF without original image

#### Usage Example

```java
// Generate searchable PDF with original image
TessPDFRenderer pdfRenderer = new TessPDFRenderer("document",
                                                  "/usr/share/tessdata",
                                                  false);

api.SetImage(image);
if (pdfRenderer.BeginDocument("Scanned Document")) {
    pdfRenderer.AddImage(api);
    pdfRenderer.EndDocument();
}
// Creates: document.pdf (searchable, with image background)

// Generate text-only PDF
TessPDFRenderer textPdf = new TessPDFRenderer("textonly",
                                              "/usr/share/tessdata",
                                              true);
// Creates PDF with only text, no background image
```

### Training Data Renderers

Generate specialized formats for training and improving OCR models.

```java { .api }
// Box files for training (character coordinates)
public class TessBoxTextRenderer extends TessResultRenderer {
    public TessBoxTextRenderer(String outputbase);
}

// LSTM box files for neural network training
public class TessLSTMBoxRenderer extends TessResultRenderer {
    public TessLSTMBoxRenderer(String outputbase);
}

// Word string box files
public class TessWordStrBoxRenderer extends TessResultRenderer {
    public TessWordStrBoxRenderer(String outputbase);
}

// Orientation and script detection output
public class TessOsdRenderer extends TessResultRenderer {
    public TessOsdRenderer(String outputbase);
}
```

**Training Formats:**

- **Box (.box)**: Character-level coordinates for traditional training
- **LSTM Box (.box)**: Box format intended for LSTM neural network training
- **Word String Box (.box)**: Word-level training data
- **OSD (.osd)**: Orientation and script detection results

#### Usage Example

```java
// Generate training data
TessBoxTextRenderer boxRenderer = new TessBoxTextRenderer("training");

api.SetImage(trainingImage);
if (boxRenderer.BeginDocument("Training Data")) {
    boxRenderer.AddImage(api);
    boxRenderer.EndDocument();
}
// Creates: training.box

// Sample box format:
// H 50 750 75 780 0
// e 75 750 90 780 0
// l 90 750 95 780 0
// l 95 750 100 780 0
// o 100 750 120 780 0
// (char left bottom right top page; origin at the bottom-left corner of the image)
```
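Box coordinates are easy to misread because they are measured from the bottom-left corner of the image, while most image APIs count rows from the top. The sketch below parses one box line and converts its top edge to a top-left origin. `BoxLineSketch` is a hypothetical helper, and the 800-pixel image height used in `main` is an assumed value for illustration.

```java
/** Hypothetical helper: one line of a box file (symbol left bottom right top page). */
final class BoxLineSketch {
    final String symbol;
    final int left, bottom, right, top, page;

    private BoxLineSketch(String symbol, int left, int bottom, int right, int top, int page) {
        this.symbol = symbol;
        this.left = left;
        this.bottom = bottom;
        this.right = right;
        this.top = top;
        this.page = page;
    }

    static BoxLineSketch parse(String line) {
        String[] t = line.trim().split("\\s+");
        if (t.length < 6) {
            throw new IllegalArgumentException("expected symbol plus 5 numbers: " + line);
        }
        int n = t.length;
        // Everything before the last five numeric fields belongs to the symbol.
        StringBuilder symbol = new StringBuilder(t[0]);
        for (int i = 1; i < n - 5; i++) {
            symbol.append(' ').append(t[i]);
        }
        return new BoxLineSketch(symbol.toString(),
            Integer.parseInt(t[n - 5]), Integer.parseInt(t[n - 4]),
            Integer.parseInt(t[n - 3]), Integer.parseInt(t[n - 2]),
            Integer.parseInt(t[n - 1]));
    }

    /** Height of the box in pixels. */
    int height() {
        return top - bottom;
    }

    /** Distance of the box's top edge from the top of the image (y grows downward). */
    int topFromImageTop(int imageHeight) {
        return imageHeight - top;
    }

    public static void main(String[] args) {
        BoxLineSketch box = parse("H 50 750 75 780 0");
        int imageHeight = 800; // assumed image height for this example
        System.out.println(box.symbol + ": x=" + box.left
            + " y=" + box.topFromImageTop(imageHeight)
            + " w=" + (box.right - box.left) + " h=" + box.height());
    }
}
```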

### Renderer Chaining

Combine multiple renderers to generate multiple output formats simultaneously.

#### Usage Example

```java
// Create comprehensive output pipeline
TessTextRenderer textOut = new TessTextRenderer("document");
TessHOcrRenderer hocrOut = new TessHOcrRenderer("document", true);
TessPDFRenderer pdfOut = new TessPDFRenderer("document", "/usr/share/tessdata", false);
TessTsvRenderer tsvOut = new TessTsvRenderer("document");

// Chain renderers
textOut.insert(hocrOut);
hocrOut.insert(pdfOut);
pdfOut.insert(tsvOut);

// Process multiple images with all renderers
if (textOut.BeginDocument("Multi-page Document")) {
    for (String imagePath : imageFiles) {
        PIX image = pixRead(imagePath);
        api.SetImage(image);
        textOut.AddImage(api);
        pixDestroy(image);
    }
    textOut.EndDocument();
}

// Creates:
// - document.txt (plain text)
// - document.hocr (HTML with coordinates)
// - document.pdf (searchable PDF)
// - document.tsv (tab-separated analysis)
```
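To make the chaining behaviour concrete, here is a toy model in plain Java that does not touch the native library. `ToyRenderer` is a hypothetical class; it assumes that each lifecycle call on the head of the chain is forwarded to the next renderer, and that `insert` splices the new renderer in directly after the current one. Treat both as assumptions of this sketch rather than a statement of the library's internals.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a renderer: records calls instead of writing files. */
final class ToyRenderer {
    private final String extension;
    private final List<String> log;
    private ToyRenderer next;

    ToyRenderer(String extension, List<String> log) {
        this.extension = extension;
        this.log = log;
    }

    /** Splices newNext in right after this renderer, keeping any existing tail. */
    void insert(ToyRenderer newNext) {
        if (newNext == null) {
            return;
        }
        ToyRenderer remainder = next;
        next = newNext;
        if (remainder != null) {
            ToyRenderer tail = newNext;
            while (tail.next != null) {
                tail = tail.next;
            }
            tail.next = remainder;
        }
    }

    ToyRenderer next() {
        return next;
    }

    boolean beginDocument(String title) {
        log.add(extension + ":begin");
        return next == null || next.beginDocument(title);
    }

    boolean addImage(String page) {
        log.add(extension + ":" + page);
        return next == null || next.addImage(page);
    }

    boolean endDocument() {
        log.add(extension + ":end");
        return next == null || next.endDocument();
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        ToyRenderer txt = new ToyRenderer("txt", log);
        ToyRenderer hocr = new ToyRenderer("hocr", log);
        txt.insert(hocr);
        hocr.insert(new ToyRenderer("pdf", log));

        // Only the head of the chain is driven; the rest follow.
        if (txt.beginDocument("Demo")) {
            txt.addImage("page1");
            txt.endDocument();
        }
        System.out.println(log);
    }
}
```

The design point is that callers hold a single reference to the head renderer and never loop over the outputs themselves.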

### Batch Processing Integration

Combine renderers with TessBaseAPI batch processing methods.

```java { .api }
public class TessBaseAPI {
    // Process multiple pages from file
    public boolean ProcessPages(String filename, String retry_config,
                                int timeout_millisec, TessResultRenderer renderer);

    // Process single page
    public boolean ProcessPage(PIX pix, int page_index, String filename,
                               String retry_config, int timeout_millisec,
                               TessResultRenderer renderer);
}
```

#### Usage Example

```java
// Setup renderer chain
TessResultRenderer mainRenderer = new TessTextRenderer("output");
mainRenderer.insert(new TessPDFRenderer("output", "/usr/share/tessdata", false));

// Process a multi-page TIFF (a text file listing image paths also works;
// PDF files are not accepted as input)
boolean success = api.ProcessPages("document.tiff",
                                   null,   // no retry config
                                   60000,  // 60 second timeout
                                   mainRenderer);

if (success) {
    System.out.println("Successfully processed document");
    // Output: output.txt, output.pdf
} else {
    System.err.println("Processing failed");
}
```

### Error Handling and Validation

Check renderer status and handle processing errors.

```java { .api }
public abstract class TessResultRenderer {
    public boolean happy();  // Check if renderer is functional
    public String title();   // Get document title
    public int imagenum();   // Get current image number
}
```

#### Usage Example

```java
TessResultRenderer renderer = new TessPDFRenderer("output", "/usr/share/tessdata", false);

if (!renderer.happy()) {
    System.err.println("Renderer initialization failed");
    return;
}

if (renderer.BeginDocument("Test Document")) {
    api.SetImage(image);

    if (!renderer.AddImage(api)) {
        System.err.println("Failed to add image " + renderer.imagenum());
    }

    if (!renderer.EndDocument()) {
        System.err.println("Failed to finalize document");
    }
} else {
    System.err.println("Failed to begin document");
}
```

## C API Functions

Direct C API access for renderer management (an alternative to the object-oriented approach).

```java { .api }
// Renderer creation functions
public static TessResultRenderer TessTextRendererCreate(String outputbase);
public static TessResultRenderer TessHOcrRendererCreate(String outputbase);
public static TessResultRenderer TessAltoRendererCreate(String outputbase);
public static TessResultRenderer TessPAGERendererCreate(String outputbase);
public static TessResultRenderer TessTsvRendererCreate(String outputbase);
public static TessResultRenderer TessPDFRendererCreate(String outputbase, String datadir, boolean textonly);
public static TessResultRenderer TessUnlvRendererCreate(String outputbase);
public static TessResultRenderer TessBoxTextRendererCreate(String outputbase);
public static TessResultRenderer TessLSTMBoxRendererCreate(String outputbase);
public static TessResultRenderer TessWordStrBoxRendererCreate(String outputbase);
public static TessResultRenderer TessOsdRendererCreate(String outputbase);

// Renderer management
public static void TessDeleteResultRenderer(TessResultRenderer renderer);
public static boolean TessResultRendererBeginDocument(TessResultRenderer renderer, String title);
public static boolean TessResultRendererAddImage(TessResultRenderer renderer, TessBaseAPI api);
public static boolean TessResultRendererEndDocument(TessResultRenderer renderer);
```

#### Usage Example

```java
// C API style renderer usage
TessResultRenderer renderer = TessHOcrRendererCreate("webpage");

if (TessResultRendererBeginDocument(renderer, "Web Content")) {
    api.SetImage(image);
    TessResultRendererAddImage(renderer, api);
    TessResultRendererEndDocument(renderer);
}

TessDeleteResultRenderer(renderer);
```

## Types

### Renderer Chain Structure

```java { .api }
// Base renderer interface
public abstract class TessResultRenderer {
    // Internal renderer chain linkage (opaque)
    // Each renderer maintains reference to next renderer in chain
}
```

### Output File Extensions

```java { .api }
// File extensions for different renderer types
// TessTextRenderer: ".txt"
// TessHOcrRenderer: ".hocr"
// TessAltoRenderer: ".xml"
// TessPAGERenderer: ".xml"
// TessTsvRenderer: ".tsv"
// TessPDFRenderer: ".pdf"
// TessUnlvRenderer: ".unlv"
// TessBoxTextRenderer: ".box"
// TessLSTMBoxRenderer: ".box"
// TessWordStrBoxRenderer: ".box"
// TessOsdRenderer: ".osd"
```
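Because several renderers in the table above share an extension, chaining them with the same `outputbase` could make them target the same file. The sketch below checks for that before a chain is built. It assumes the output path is simply `outputbase + "." + extension`, which matches the `document.txt` and `document.pdf` names in the earlier examples; `OutputNameSketch` is a hypothetical helper, not part of the bindings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper: predicts output paths and spots clashes between renderers. */
final class OutputNameSketch {

    /** Assumed naming rule: outputbase, a dot, then the renderer's extension. */
    static String outputPath(String outputbase, String extension) {
        return outputbase + "." + extension;
    }

    /** Returns each path that more than one of the given extensions would produce. */
    static List<String> collisions(String outputbase, List<String> extensions) {
        Set<String> seen = new HashSet<>();
        List<String> clashes = new ArrayList<>();
        for (String ext : extensions) {
            String path = outputPath(outputbase, ext);
            // Set.add returns false when the path was already claimed.
            if (!seen.add(path) && !clashes.contains(path)) {
                clashes.add(path);
            }
        }
        return clashes;
    }

    public static void main(String[] args) {
        // Extensions of a planned chain: text, ALTO, PDF, PAGE (per the table above).
        List<String> planned = Arrays.asList("txt", "xml", "pdf", "xml");
        System.out.println("clashing paths: " + collisions("document", planned));
    }
}
```

Giving each clashing renderer its own `outputbase` is one way to avoid the overlap.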