# OCR and Text Recognition

Extract text from images with either synchronous OCR, which handles printed text, or the asynchronous Read API, which recognizes both printed and handwritten text. Both support multiple languages and return layout information: regions or pages, lines, words, and bounding boxes.

## Capabilities

### Synchronous OCR (Printed Text)

Extracts printed text from an image in a single call, with optional language detection and orientation detection.

```python { .api }
def recognize_printed_text(detect_orientation, url, language=None, custom_headers=None, raw=False, **operation_config):
    """
    Perform OCR on printed text in images.

    Args:
        detect_orientation (bool): Whether to detect and correct text orientation
        url (str): Publicly reachable URL of an image
        language (str, optional): OCR language code. If not specified, auto-detect is used.
            Supported languages include: en, zh-Hans, zh-Hant, cs, da, nl, fi, fr, de,
            el, hu, it, ja, ko, nb, pl, pt, ru, es, sv, tr, ar, ro, sr-Cyrl, sr-Latn, sk
        custom_headers (dict, optional): Custom HTTP headers
        raw (bool, optional): Return raw response. Default: False

    Returns:
        OcrResult: OCR results with text regions, lines, and words

    Raises:
        ComputerVisionOcrErrorException: OCR operation error
    """

def recognize_printed_text_in_stream(detect_orientation, image, language=None, custom_headers=None, raw=False, **operation_config):
    """
    Perform OCR on printed text from binary stream.

    Args:
        detect_orientation (bool): Whether to detect text orientation
        image (Generator): Binary image data stream
        language (str, optional): OCR language code

    Returns:
        OcrResult: OCR results with text layout information
    """
```
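
The language codes listed in the docstring above can be checked on the client before a request is sent. The following is a minimal sketch, assuming that list is complete for your SDK version; `normalize_ocr_language` and `SUPPORTED_OCR_LANGUAGES` are hypothetical names, not part of the SDK.

```python
# Language codes copied from the recognize_printed_text docstring above.
SUPPORTED_OCR_LANGUAGES = frozenset({
    "en", "zh-Hans", "zh-Hant", "cs", "da", "nl", "fi", "fr", "de",
    "el", "hu", "it", "ja", "ko", "nb", "pl", "pt", "ru", "es", "sv",
    "tr", "ar", "ro", "sr-Cyrl", "sr-Latn", "sk",
})


def normalize_ocr_language(language=None):
    """Return a validated language code, or None to request auto-detection.

    Hypothetical helper, not part of the SDK.
    """
    if language is None:
        return None
    if language not in SUPPORTED_OCR_LANGUAGES:
        raise ValueError(f"Unsupported OCR language code: {language!r}")
    return language
```

The returned value could then be passed as the `language` argument, for example `client.recognize_printed_text(detect_orientation=True, url=image_url, language=normalize_ocr_language("en"))`.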

### Asynchronous Text Reading

Recognizes both printed and handwritten text. This is a two-step process: start the operation with `read()` or `read_in_stream()`, then poll `get_read_result()` until the operation finishes.

```python { .api }
def read(url, language=None, pages=None, model_version="latest", reading_order=None, custom_headers=None, raw=False, **operation_config):
    """
    Start asynchronous text reading operation.

    Args:
        url (str): Publicly reachable URL of an image or PDF
        language (str, optional): BCP-47 language code for text recognition.
            The Read API supports an extensive list of languages.
        pages (list[int], optional): Page numbers to process (for multi-page documents)
        model_version (str, optional): Model version. Default: "latest"
        reading_order (str, optional): Reading order algorithm ('basic' or 'natural')

    Returns:
        str: Operation location URL for polling status

    Note:
        This starts an asynchronous operation. Use get_read_result() to retrieve results.
        The usage examples pass raw=True and read the operation location from the
        "Operation-Location" response header.
    """

def read_in_stream(image, language=None, pages=None, model_version="latest", reading_order=None, custom_headers=None, raw=False, **operation_config):
    """
    Start text reading from binary stream.

    Args:
        image (Generator): Binary image data stream
        language (str, optional): Text language for recognition
        pages (list[int], optional): Page numbers to process
        model_version (str, optional): Model version. Default: "latest"
        reading_order (str, optional): Reading order algorithm ('basic' or 'natural')

    Returns:
        str: Operation location URL for polling
    """

def get_read_result(operation_id, custom_headers=None, raw=False, **operation_config):
    """
    Get result of asynchronous read operation.

    Args:
        operation_id (str): Operation ID extracted from read operation location URL

    Returns:
        ReadOperationResult: Text recognition results with status

    Note:
        Poll this endpoint until status is 'succeeded' or 'failed'.
        Status values: notStarted, running, succeeded, failed
    """
```
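
Since `get_read_result()` takes an operation ID rather than the full operation location URL, the ID has to be extracted from that URL. The following is a minimal sketch, assuming the ID is the last path segment of the URL; `operation_id_from_location` is a hypothetical helper and the example URL is illustrative only.

```python
from urllib.parse import urlparse


def operation_id_from_location(operation_location):
    """Return the last path segment of an operation location URL."""
    # Parsing the URL first ignores any query string or fragment.
    path = urlparse(operation_location).path.rstrip("/")
    operation_id = path.rsplit("/", 1)[-1]
    if not operation_id:
        raise ValueError(f"No operation ID in {operation_location!r}")
    return operation_id


# Illustrative URL only; the real value comes from the response headers.
example = "https://your-endpoint.cognitiveservices.azure.com/vision/v3.2/read/analyzeResults/0123abcd"
print(operation_id_from_location(example))
```

Compared with a plain `split("/")[-1]`, this version tolerates a trailing slash and raises a clear error when the URL has no path.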

## Usage Examples

### Basic OCR (Printed Text)

```python
from azure.cognitiveservices.vision.computervision import ComputerVisionClient
from msrest.authentication import CognitiveServicesCredentials

# Initialize client
credentials = CognitiveServicesCredentials("your-api-key")
client = ComputerVisionClient("https://your-endpoint.cognitiveservices.azure.com/", credentials)

# Perform OCR on printed text
image_url = "https://example.com/document.jpg"
ocr_result = client.recognize_printed_text(detect_orientation=True, url=image_url)

print(f"Language: {ocr_result.language}")
print(f"Text angle: {ocr_result.text_angle}")
print(f"Orientation: {ocr_result.orientation}")

# Extract text by regions, lines, and words
for region in ocr_result.regions:
    for line in region.lines:
        line_text = " ".join([word.text for word in line.words])
        print(f"Line: {line_text}")

        # Individual word details
        for word in line.words:
            print(f"  Word: '{word.text}' at {word.bounding_box}")
```
130
131
### Advanced Text Reading (Async)
132
133
```python
134
import time
135
from azure.cognitiveservices.vision.computervision.models import OperationStatusCodes
136
137
# Start read operation
138
image_url = "https://example.com/handwritten-note.jpg"
139
read_response = client.read(image_url, raw=True)
140
141
# Extract operation ID from location header
142
operation_location = read_response.headers["Operation-Location"]
143
operation_id = operation_location.split("/")[-1]
144
145
# Poll for completion
146
while True:
147
read_result = client.get_read_result(operation_id)
148
149
if read_result.status == OperationStatusCodes.succeeded:
150
break
151
elif read_result.status == OperationStatusCodes.failed:
152
print("Text recognition failed")
153
break
154
155
time.sleep(1)
156
157
# Process results
158
for page in read_result.analyze_result.read_results:
159
print(f"Page {page.page}:")
160
161
for line in page.lines:
162
print(f" Line: '{line.text}'")
163
print(f" Bounding box: {line.bounding_box}")
164
165
# Check for handwriting
166
if line.appearance and line.appearance.style:
167
if line.appearance.style.name == "handwriting":
168
print(f" Style: Handwriting (confidence: {line.appearance.style.confidence})")
169
170
# Individual words
171
for word in line.words:
172
print(f" Word: '{word.text}' (confidence: {word.confidence})")
173
```
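
A polling loop without an upper bound can hang if the operation never reaches a terminal status. The following is a minimal sketch of a bounded wait; `wait_for_read` is a hypothetical helper, and the 60-second timeout and 1-second interval are arbitrary assumptions rather than service recommendations.

```python
import time


def wait_for_read(get_result, operation_id, timeout=60.0, interval=1.0):
    """Poll get_result(operation_id) until a terminal status or timeout."""
    deadline = time.monotonic() + timeout
    while True:
        result = get_result(operation_id)
        # OperationStatusCodes is a str-based enum, so comparing against
        # the plain status strings also works for SDK results.
        if result.status in ("succeeded", "failed"):
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"Read operation {operation_id} did not finish within {timeout}s"
            )
        time.sleep(interval)
```

With the client from the examples above, this could be called as `read_result = wait_for_read(client.get_read_result, operation_id)`. The caller still needs to check `read_result.status` before reading `analyze_result`.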
174
175
### Local File OCR
176
177
```python
178
# OCR from local file
179
with open("local_document.jpg", "rb") as image_stream:
180
ocr_result = client.recognize_printed_text_in_stream(
181
detect_orientation=True,
182
image=image_stream,
183
language="en"
184
)
185
186
# Extract all text
187
all_text = []
188
for region in ocr_result.regions:
189
for line in region.lines:
190
line_text = " ".join([word.text for word in line.words])
191
all_text.append(line_text)
192
193
print("\n".join(all_text))
194
```
195
196
### Multi-page Document Processing
197
198
```python
199
# Process specific pages of a multi-page document
200
pdf_url = "https://example.com/multi-page-document.pdf"
201
pages_to_process = [1, 3, 5] # Process pages 1, 3, and 5
202
203
read_response = client.read(pdf_url, pages=pages_to_process, raw=True)
204
operation_id = read_response.headers["Operation-Location"].split("/")[-1]
205
206
# Poll and get results (same as above)
207
# ... polling code ...
208
209
# Results will contain only the specified pages
210
for page in read_result.analyze_result.read_results:
211
print(f"Processing page {page.page}")
212
# ... process page content ...
213
```

## Response Data Types

### OcrResult

```python { .api }
class OcrResult:
    """
    OCR operation result for printed text.

    Attributes:
        language (str): Detected or specified language code
        text_angle (float): Text angle in degrees (-180 to 180)
        orientation (str): Text orientation (Up, Down, Left, Right)
        regions (list[OcrRegion]): Text regions in the image
    """
```

### OcrRegion

```python { .api }
class OcrRegion:
    """
    OCR text region containing multiple lines.

    Attributes:
        bounding_box (str): Comma-separated bounding box coordinates (left,top,width,height)
        lines (list[OcrLine]): Text lines within the region
    """
```

### OcrLine

```python { .api }
class OcrLine:
    """
    OCR text line containing multiple words.

    Attributes:
        bounding_box (str): Comma-separated bounding box coordinates
        words (list[OcrWord]): Words within the line
    """
```

### OcrWord

```python { .api }
class OcrWord:
    """
    Individual OCR word result.

    Attributes:
        bounding_box (str): Comma-separated bounding box coordinates
        text (str): Recognized word text
    """
```
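
The synchronous OCR types return `bounding_box` as a comma-separated string, so it needs parsing before it can be used for cropping or drawing. The following is a minimal sketch, assuming the `left,top,width,height` order documented for `OcrRegion` and integer pixel values; `Box` and `parse_bounding_box` are hypothetical names, not part of the SDK.

```python
from typing import NamedTuple


class Box(NamedTuple):
    left: int
    top: int
    width: int
    height: int


def parse_bounding_box(bounding_box):
    """Parse a 'left,top,width,height' string into a Box of integers."""
    parts = bounding_box.split(",")
    if len(parts) != 4:
        raise ValueError(f"Expected 4 comma-separated values, got {bounding_box!r}")
    return Box(*(int(part) for part in parts))


box = parse_bounding_box("21,16,304,451")
print(box.left, box.top, box.width, box.height)
```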

### ReadOperationResult

```python { .api }
class ReadOperationResult:
    """
    Result of asynchronous read operation.

    Attributes:
        status (OperationStatusCodes): Operation status (notStarted, running, succeeded, failed)
        created_date_time (datetime): Operation creation timestamp
        last_updated_date_time (datetime): Last update timestamp
        analyze_result (AnalyzeResults): Text analysis results (when succeeded)
    """
```

### AnalyzeResults

```python { .api }
class AnalyzeResults:
    """
    Text analysis results from read operation.

    Attributes:
        version (str): Schema version
        model_version (str): OCR model version used
        read_results (list[ReadResult]): Text extraction results per page
    """
```

### ReadResult

```python { .api }
class ReadResult:
    """
    Text reading result for a single page.

    Attributes:
        page (int): Page number (1-indexed)
        language (str): Detected language
        angle (float): Text angle in degrees
        width (float): Page width
        height (float): Page height
        unit (TextRecognitionResultDimensionUnit): Dimension unit (pixel, inch)
        lines (list[Line]): Extracted text lines
    """
```

### Line

```python { .api }
class Line:
    """
    Text line with layout and style information.

    Attributes:
        language (str): Line language
        bounding_box (list[float]): Bounding box coordinates [x1,y1,x2,y2,x3,y3,x4,y4]
        appearance (Appearance): Style information (handwriting detection)
        text (str): Combined text of all words in the line
        words (list[Word]): Individual words within the line
    """
```
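
Unlike the synchronous OCR types, `Line.bounding_box` is a list of eight numbers describing four corner points, which may form a rotated quadrilateral. The following is a minimal sketch that reduces it to an axis-aligned rectangle; `polygon_to_rect` is a hypothetical helper, and the result is the enclosing rectangle, which is larger than the text area when the line is rotated.

```python
def polygon_to_rect(bounding_box):
    """Convert [x1, y1, ..., x4, y4] into (left, top, width, height)."""
    if len(bounding_box) != 8:
        raise ValueError("Expected 8 values: four (x, y) corner points")
    xs = bounding_box[0::2]  # x1, x2, x3, x4
    ys = bounding_box[1::2]  # y1, y2, y3, y4
    left, top = min(xs), min(ys)
    return left, top, max(xs) - left, max(ys) - top


print(polygon_to_rect([10, 20, 110, 20, 110, 50, 10, 50]))
```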

### Word

```python { .api }
class Word:
    """
    Individual word with position and confidence.

    Attributes:
        bounding_box (list[float]): Word bounding box coordinates
        text (str): Recognized word text
        confidence (float): Recognition confidence score (0.0 to 1.0)
    """
```
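
The per-word `confidence` score can be used to flag text that may need manual review. The following is a minimal sketch using stand-in objects that mimic the `Line` and `Word` attributes documented above; `low_confidence_words` is a hypothetical helper, and the 0.8 threshold is an arbitrary assumption.

```python
from types import SimpleNamespace


def low_confidence_words(lines, threshold=0.8):
    """Return (text, confidence) pairs for words scoring below threshold."""
    return [
        (word.text, word.confidence)
        for line in lines
        for word in line.words
        if word.confidence is not None and word.confidence < threshold
    ]


# Stand-in objects, not SDK instances; real lines come from ReadResult.lines.
sample_lines = [
    SimpleNamespace(words=[
        SimpleNamespace(text="Invoice", confidence=0.98),
        SimpleNamespace(text="t0tal", confidence=0.41),
    ])
]
print(low_confidence_words(sample_lines))
```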

### Appearance

```python { .api }
class Appearance:
    """
    Text appearance and style information.

    Attributes:
        style (Style): Text style classification
    """
```

### Style

```python { .api }
class Style:
    """
    Text style classification.

    Attributes:
        name (TextStyle): Style type (other, handwriting)
        confidence (float): Style detection confidence (0.0 to 1.0)
    """
```
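
Handwriting detection combines the `appearance`, `style`, `name`, and `confidence` attributes, any of which may be missing on a given line. The following is a minimal sketch of a null-safe check; `is_handwritten` is a hypothetical helper, and the 0.5 confidence cutoff is an arbitrary assumption.

```python
from types import SimpleNamespace


def is_handwritten(line, min_confidence=0.5):
    """Return True if the line is classified as handwriting with enough confidence."""
    appearance = getattr(line, "appearance", None)
    style = getattr(appearance, "style", None)
    if style is None:
        return False
    # TextStyle is a str-based enum, so comparing to the plain string works.
    confidence = style.confidence or 0.0
    return style.name == "handwriting" and confidence >= min_confidence


# Stand-in object, not an SDK instance.
note = SimpleNamespace(
    appearance=SimpleNamespace(style=SimpleNamespace(name="handwriting", confidence=0.9))
)
print(is_handwritten(note))
```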

## Enumerations

### OperationStatusCodes

```python { .api }
class OperationStatusCodes(str, Enum):
    """Asynchronous operation status codes."""

    not_started = "notStarted"
    running = "running"
    failed = "failed"
    succeeded = "succeeded"
```

### TextStyle

```python { .api }
class TextStyle(str, Enum):
    """Text style classification values."""

    other = "other"
    handwriting = "handwriting"
```

### TextRecognitionResultDimensionUnit

```python { .api }
class TextRecognitionResultDimensionUnit(str, Enum):
    """Dimension units for text recognition results."""

    pixel = "pixel"
    inch = "inch"
```