# OCR and Text Recognition

Extract text from images using either the synchronous OCR operation for printed text or the asynchronous Read API, which handles both printed and handwritten text. The service supports multiple languages and returns detailed text layout information.

## Capabilities

### Synchronous OCR (Printed Text)

Immediate text extraction from images containing printed text with language detection and orientation analysis.

```python { .api }
def recognize_printed_text(detect_orientation, url, language=None, custom_headers=None, raw=False, **operation_config):
    """
    Perform OCR on printed text in images.

    Args:
        detect_orientation (bool): Whether to detect and correct text orientation
        url (str): Publicly reachable URL of an image
        language (str, optional): OCR language code. If not specified, auto-detect is used.
            Supported languages include: en, zh-Hans, zh-Hant, cs, da, nl, fi, fr, de,
            el, hu, it, ja, ko, nb, pl, pt, ru, es, sv, tr, ar, ro, sr-Cyrl, sr-Latn, sk
        custom_headers (dict, optional): Custom HTTP headers
        raw (bool, optional): Return raw response. Default: False

    Returns:
        OcrResult: OCR results with text regions, lines, and words

    Raises:
        ComputerVisionOcrErrorException: OCR operation error
    """

def recognize_printed_text_in_stream(detect_orientation, image, language=None, custom_headers=None, raw=False, **operation_config):
    """
    Perform OCR on printed text from binary stream.

    Args:
        detect_orientation (bool): Whether to detect text orientation
        image (Generator): Binary image data stream
        language (str, optional): OCR language code

    Returns:
        OcrResult: OCR results with text layout information
    """
```
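
To illustrate the `language` parameter, the following sketch checks a code against the list quoted in the docstring above before a request is made. The names `LISTED_OCR_LANGUAGES` and `is_listed_ocr_language` are hypothetical and not part of the SDK. Because the docstring says the supported languages "include" these codes, the set may not be exhaustive.

```python
# Codes copied from the recognize_printed_text docstring above.
# Assumption: this set may be incomplete; the service is the authority.
LISTED_OCR_LANGUAGES = frozenset({
    "en", "zh-Hans", "zh-Hant", "cs", "da", "nl", "fi", "fr", "de",
    "el", "hu", "it", "ja", "ko", "nb", "pl", "pt", "ru", "es", "sv",
    "tr", "ar", "ro", "sr-Cyrl", "sr-Latn", "sk",
})


def is_listed_ocr_language(code):
    """Return True for None (auto-detect) or a code in the listed set."""
    return code is None or code in LISTED_OCR_LANGUAGES


print(is_listed_ocr_language("sr-Latn"))
print(is_listed_ocr_language("xx"))
```

A check like this only catches obvious typos early; the service response remains the final word on whether a language is accepted.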

### Asynchronous Text Reading

Advanced text recognition supporting both printed and handwritten text. This is a two-step process: start the read operation, then poll until its result is available.

```python { .api }
def read(url, language=None, pages=None, model_version="latest", reading_order=None, custom_headers=None, raw=False, **operation_config):
    """
    Start asynchronous text reading operation.

    Args:
        url (str): Publicly reachable URL of an image or PDF
        language (str, optional): BCP-47 language code for text recognition.
            An extensive list of languages is supported.
        pages (list[int], optional): Page numbers to process (for multi-page documents)
        model_version (str, optional): Model version. Default: "latest"
        reading_order (str, optional): Reading order algorithm ('basic' or 'natural')

    Returns:
        None, or ClientRawResponse if raw=True. The operation location URL
        used for polling is in the "Operation-Location" response header.

    Note:
        This starts an asynchronous operation. Use get_read_result() to retrieve results.
    """

def read_in_stream(image, language=None, pages=None, model_version="latest", reading_order=None, custom_headers=None, raw=False, **operation_config):
    """
    Start text reading from binary stream.

    Args:
        image (Generator): Binary image data stream
        language (str, optional): Text language for recognition
        pages (list[int], optional): Page numbers to process
        model_version (str, optional): Model version. Default: "latest"
        reading_order (str, optional): Reading order algorithm ('basic' or 'natural')

    Returns:
        None, or ClientRawResponse if raw=True. The operation location URL
        used for polling is in the "Operation-Location" response header.
    """

def get_read_result(operation_id, custom_headers=None, raw=False, **operation_config):
    """
    Get result of asynchronous read operation.

    Args:
        operation_id (str): Operation ID extracted from read operation location URL

    Returns:
        ReadOperationResult: Text recognition results with status

    Note:
        Poll this endpoint until status is 'succeeded' or 'failed'.
        Status values: notStarted, running, succeeded, failed
    """
```
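
Since `get_read_result()` takes an operation ID rather than the full operation location URL, a small helper can isolate the ID. The sketch below assumes the ID is the final path segment of the URL. The helper name `operation_id_from_location` and the sample URL are hypothetical.

```python
from urllib.parse import urlparse


def operation_id_from_location(operation_location):
    """Return the final path segment of an Operation-Location URL."""
    path = urlparse(operation_location).path
    # Tolerate a trailing slash before taking the last segment.
    operation_id = path.rstrip("/").rsplit("/", 1)[-1]
    if not operation_id:
        raise ValueError(f"No operation ID found in {operation_location!r}")
    return operation_id


# Hypothetical URL, shaped like a typical Operation-Location header value.
location = (
    "https://your-endpoint.cognitiveservices.azure.com"
    "/vision/v3.2/read/analyzeResults/49a36324-fc4b-4387-aa06-090cfbf0064f"
)
print(operation_id_from_location(location))
```

Parsing the URL first means any query string or trailing slash does not end up in the ID, which a plain `split("/")[-1]` would not guard against.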

## Usage Examples

### Basic OCR (Printed Text)

```python
from azure.cognitiveservices.vision.computervision import ComputerVisionClient
from msrest.authentication import CognitiveServicesCredentials

# Initialize client
credentials = CognitiveServicesCredentials("your-api-key")
client = ComputerVisionClient("https://your-endpoint.cognitiveservices.azure.com/", credentials)

# Perform OCR on printed text
image_url = "https://example.com/document.jpg"
ocr_result = client.recognize_printed_text(detect_orientation=True, url=image_url)

print(f"Language: {ocr_result.language}")
print(f"Text angle: {ocr_result.text_angle}")
print(f"Orientation: {ocr_result.orientation}")

# Extract text by regions, lines, and words
for region in ocr_result.regions:
    for line in region.lines:
        line_text = " ".join([word.text for word in line.words])
        print(f"Line: {line_text}")

        # Individual word details
        for word in line.words:
            print(f" Word: '{word.text}' at {word.bounding_box}")
```

### Advanced Text Reading (Async)

```python
import time
from azure.cognitiveservices.vision.computervision.models import OperationStatusCodes

# Start read operation (reuses `client` from the Basic OCR example)
image_url = "https://example.com/handwritten-note.jpg"
read_response = client.read(image_url, raw=True)

# Extract operation ID from location header
operation_location = read_response.headers["Operation-Location"]
operation_id = operation_location.split("/")[-1]

# Poll for completion
while True:
    read_result = client.get_read_result(operation_id)

    if read_result.status == OperationStatusCodes.succeeded:
        break
    elif read_result.status == OperationStatusCodes.failed:
        print("Text recognition failed")
        break

    time.sleep(1)

# Process results (analyze_result is only available when the operation succeeded)
if read_result.status == OperationStatusCodes.succeeded:
    for page in read_result.analyze_result.read_results:
        print(f"Page {page.page}:")

        for line in page.lines:
            print(f" Line: '{line.text}'")
            print(f" Bounding box: {line.bounding_box}")

            # Check for handwriting
            if line.appearance and line.appearance.style:
                if line.appearance.style.name == "handwriting":
                    print(f" Style: Handwriting (confidence: {line.appearance.style.confidence})")

            # Individual words
            for word in line.words:
                print(f" Word: '{word.text}' (confidence: {word.confidence})")
```
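
The polling loop above runs until the service reports a terminal status. A bounded variant might look like the sketch below, which adds a timeout. The helper `poll_until_done` and its parameters are hypothetical and not part of the SDK. It is exercised here with a stand-in object rather than a real client.

```python
import time
from types import SimpleNamespace

# Terminal status values as listed in the get_read_result docstring.
TERMINAL_STATUSES = ("succeeded", "failed")


def poll_until_done(fetch_result, timeout_seconds=60.0, interval_seconds=1.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call fetch_result() until its status is terminal or the timeout passes."""
    deadline = clock() + timeout_seconds
    while True:
        result = fetch_result()
        if result.status in TERMINAL_STATUSES:
            return result
        if clock() >= deadline:
            raise TimeoutError(f"Operation still '{result.status}' after {timeout_seconds}s")
        sleep(interval_seconds)


# Stand-in for client.get_read_result: yields three statuses in order.
statuses = iter(["notStarted", "running", "succeeded"])
final = poll_until_done(
    lambda: SimpleNamespace(status=next(statuses)),
    interval_seconds=0,
    sleep=lambda seconds: None,  # skip real waiting in this demonstration
)
print(final.status)
```

With the SDK client, `fetch_result` could be `lambda: client.get_read_result(operation_id)`. The caller still needs to check whether the returned status is `succeeded` before reading `analyze_result`.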

### Local File OCR

```python
# OCR from local file
with open("local_document.jpg", "rb") as image_stream:
    ocr_result = client.recognize_printed_text_in_stream(
        detect_orientation=True,
        image=image_stream,
        language="en"
    )

# Extract all text
all_text = []
for region in ocr_result.regions:
    for line in region.lines:
        line_text = " ".join([word.text for word in line.words])
        all_text.append(line_text)

print("\n".join(all_text))
```

### Multi-page Document Processing

```python
# Process specific pages of a multi-page document
pdf_url = "https://example.com/multi-page-document.pdf"
pages_to_process = [1, 3, 5]  # Process pages 1, 3, and 5

read_response = client.read(pdf_url, pages=pages_to_process, raw=True)
operation_id = read_response.headers["Operation-Location"].split("/")[-1]

# Poll and get results (same as above)
# ... polling code ...

# Results will contain only the specified pages
for page in read_result.analyze_result.read_results:
    print(f"Processing page {page.page}")
    # ... process page content ...
```

## Response Data Types

### OcrResult

```python { .api }
class OcrResult:
    """
    OCR operation result for printed text.

    Attributes:
        language (str): Detected or specified language code
        text_angle (float): Text angle in degrees (-180 to 180)
        orientation (str): Text orientation (Up, Down, Left, Right)
        regions (list[OcrRegion]): Text regions in the image
    """
```

### OcrRegion

```python { .api }
class OcrRegion:
    """
    OCR text region containing multiple lines.

    Attributes:
        bounding_box (str): Comma-separated bounding box coordinates (left,top,width,height)
        lines (list[OcrLine]): Text lines within the region
    """
```
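
Because `bounding_box` on the OCR types is a comma-separated string rather than a list, it usually needs parsing before any geometry can be done. The sketch below assumes the four values are integers in the order `left,top,width,height` given above. `Box` and `parse_bounding_box` are hypothetical names, and the sample string is made up.

```python
from typing import NamedTuple


class Box(NamedTuple):
    left: int
    top: int
    width: int
    height: int


def parse_bounding_box(bounding_box):
    """Parse a 'left,top,width,height' string into a Box."""
    parts = [int(part) for part in bounding_box.split(",")]
    if len(parts) != 4:
        raise ValueError(f"Expected 4 values, got {len(parts)}: {bounding_box!r}")
    return Box(*parts)


box = parse_bounding_box("21,16,304,451")
# Right and bottom edges follow from left + width and top + height.
print(box.left + box.width, box.top + box.height)
```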

### OcrLine

```python { .api }
class OcrLine:
    """
    OCR text line containing multiple words.

    Attributes:
        bounding_box (str): Comma-separated bounding box coordinates
        words (list[OcrWord]): Words within the line
    """
```

### OcrWord

```python { .api }
class OcrWord:
    """
    Individual OCR word result.

    Attributes:
        bounding_box (str): Comma-separated bounding box coordinates
        text (str): Recognized word text
    """
```

### ReadOperationResult

```python { .api }
class ReadOperationResult:
    """
    Result of asynchronous read operation.

    Attributes:
        status (OperationStatusCodes): Operation status (notStarted, running, succeeded, failed)
        created_date_time (datetime): Operation creation timestamp
        last_updated_date_time (datetime): Last update timestamp
        analyze_result (AnalyzeResults): Text analysis results (when succeeded)
    """
```

### AnalyzeResults

```python { .api }
class AnalyzeResults:
    """
    Text analysis results from read operation.

    Attributes:
        version (str): Schema version
        model_version (str): OCR model version used
        read_results (list[ReadResult]): Text extraction results per page
    """
```

### ReadResult

```python { .api }
class ReadResult:
    """
    Text reading result for a single page.

    Attributes:
        page (int): Page number (1-indexed)
        language (str): Detected language
        angle (float): Text angle in degrees
        width (float): Page width
        height (float): Page height
        unit (TextRecognitionResultDimensionUnit): Dimension unit (pixel, inch)
        lines (list[Line]): Extracted text lines
    """
```

### Line

```python { .api }
class Line:
    """
    Text line with layout and style information.

    Attributes:
        language (str): Line language
        bounding_box (list[float]): Bounding box coordinates [x1,y1,x2,y2,x3,y3,x4,y4]
        appearance (Appearance): Style information (handwriting detection)
        text (str): Combined text of all words in the line
        words (list[Word]): Individual words within the line
    """
```
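
Unlike the OCR types, `Line.bounding_box` is a flat list of eight numbers describing four corner points. The sketch below pairs them into `(x, y)` tuples and derives an axis-aligned enclosing rectangle. The helper names and the sample coordinates are hypothetical.

```python
def polygon_points(bounding_box):
    """Pair a flat [x1, y1, ..., x4, y4] list into four (x, y) tuples."""
    if len(bounding_box) != 8:
        raise ValueError(f"Expected 8 values, got {len(bounding_box)}")
    return list(zip(bounding_box[0::2], bounding_box[1::2]))


def enclosing_rectangle(bounding_box):
    """Return (min_x, min_y, max_x, max_y) covering all four corners."""
    points = polygon_points(bounding_box)
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


# Made-up coordinates for a slightly rotated line of text.
sample = [10, 20, 110, 25, 108, 60, 8, 55]
print(polygon_points(sample))
print(enclosing_rectangle(sample))
```

The enclosing rectangle is useful for cropping, but for rotated text it is larger than the polygon itself.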

### Word

```python { .api }
class Word:
    """
    Individual word with position and confidence.

    Attributes:
        bounding_box (list[float]): Word bounding box coordinates
        text (str): Recognized word text
        confidence (float): Recognition confidence score (0.0 to 1.0)
    """
```
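
One common use of `confidence` is to drop words the service is unsure about. The sketch below assumes word objects expose `text` and `confidence` as documented above. The helper `confident_words`, the 0.8 threshold, and the stand-in words are hypothetical.

```python
from types import SimpleNamespace


def confident_words(words, min_confidence=0.8):
    """Keep only words whose confidence meets the threshold."""
    return [word for word in words if word.confidence >= min_confidence]


# Stand-ins for Word objects; real ones come from Line.words.
words = [
    SimpleNamespace(text="Invoice", confidence=0.98),
    SimpleNamespace(text="t0tal", confidence=0.41),
    SimpleNamespace(text="due", confidence=0.80),
]
print(" ".join(word.text for word in confident_words(words)))
```

A suitable threshold depends on the documents being processed, so the default here is only a placeholder.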

### Appearance

```python { .api }
class Appearance:
    """
    Text appearance and style information.

    Attributes:
        style (Style): Text style classification
    """
```

### Style

```python { .api }
class Style:
    """
    Text style classification.

    Attributes:
        name (TextStyle): Style type (other, handwriting)
        confidence (float): Style detection confidence (0.0 to 1.0)
    """
```
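
The handwriting check in the usage examples can be wrapped in a helper that tolerates a missing `appearance` or `style`. The sketch below uses stand-in objects. `is_handwritten` and the 0.5 threshold are hypothetical choices, not SDK behavior.

```python
from types import SimpleNamespace


def is_handwritten(line, min_confidence=0.5):
    """Return True if the line's style is 'handwriting' with enough confidence."""
    appearance = getattr(line, "appearance", None)
    style = getattr(appearance, "style", None)
    if style is None:
        return False
    # TextStyle is a str-based enum, so comparing with a plain string works.
    return style.name == "handwriting" and style.confidence >= min_confidence


# Stand-ins for Line objects returned by the Read API.
note = SimpleNamespace(
    appearance=SimpleNamespace(style=SimpleNamespace(name="handwriting", confidence=0.93))
)
printed = SimpleNamespace(
    appearance=SimpleNamespace(style=SimpleNamespace(name="other", confidence=0.99))
)
unknown = SimpleNamespace(appearance=None)

print(is_handwritten(note), is_handwritten(printed), is_handwritten(unknown))
```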

## Enumerations

### OperationStatusCodes

```python { .api }
class OperationStatusCodes(str, Enum):
    """Asynchronous operation status codes."""

    not_started = "notStarted"
    running = "running"
    failed = "failed"
    succeeded = "succeeded"
```
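
Because the enum derives from `str`, its members compare equal to their plain string values. The sketch below redefines a local copy of the enum shown above so that it runs without the SDK. The helper `is_terminal` is a hypothetical addition.

```python
from enum import Enum


class OperationStatusCodes(str, Enum):
    """Local copy of the enum above, for illustration only."""

    not_started = "notStarted"
    running = "running"
    failed = "failed"
    succeeded = "succeeded"


def is_terminal(status):
    """Return True once polling can stop (succeeded or failed)."""
    # Accepts either an enum member or its raw string value.
    return OperationStatusCodes(status) in (
        OperationStatusCodes.succeeded,
        OperationStatusCodes.failed,
    )


print(OperationStatusCodes.succeeded == "succeeded")
print(is_terminal("running"), is_terminal(OperationStatusCodes.failed))
```

Note that the member name `not_started` differs from its value `"notStarted"`, so string comparisons should use the value.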

### TextStyle

```python { .api }
class TextStyle(str, Enum):
    """Text style classification values."""

    other = "other"
    handwriting = "handwriting"
```

### TextRecognitionResultDimensionUnit

```python { .api }
class TextRecognitionResultDimensionUnit(str, Enum):
    """Dimension units for text recognition results."""

    pixel = "pixel"
    inch = "inch"
```