Tessl Tile for pypi/azure-cognitiveservices-vision-computervision@0.9.0

# OCR and Text Recognition

Extract text from images using either the synchronous OCR operation for printed text or the asynchronous Read API, which recognizes both printed and handwritten text. The service supports multiple languages and returns detailed text layout information.

## Capabilities

### Synchronous OCR (Printed Text)

Immediate text extraction from images containing printed text, with language detection and orientation analysis.

```python { .api }
def recognize_printed_text(detect_orientation, url, language=None, custom_headers=None, raw=False, **operation_config):
    """
    Perform OCR on printed text in images.

    Args:
        detect_orientation (bool): Whether to detect and correct text orientation
        url (str): Publicly reachable URL of an image
        language (str, optional): OCR language code. If not specified, auto-detect is used.
            Supported languages include: en, zh-Hans, zh-Hant, cs, da, nl, fi, fr, de,
            el, hu, it, ja, ko, nb, pl, pt, ru, es, sv, tr, ar, ro, sr-Cyrl, sr-Latn, sk
        custom_headers (dict, optional): Custom HTTP headers
        raw (bool, optional): Return raw response. Default: False

    Returns:
        OcrResult: OCR results with text regions, lines, and words

    Raises:
        ComputerVisionOcrErrorException: OCR operation error
    """

def recognize_printed_text_in_stream(detect_orientation, image, language=None, custom_headers=None, raw=False, **operation_config):
    """
    Perform OCR on printed text from binary stream.

    Args:
        detect_orientation (bool): Whether to detect text orientation
        image (Generator): Binary image data stream
        language (str, optional): OCR language code

    Returns:
        OcrResult: OCR results with text layout information
    """
```

### Asynchronous Text Reading

Advanced text recognition for both printed and handwritten text. This is a two-step process: start the read operation, then poll for its result.

```python { .api }
def read(url, language=None, pages=None, model_version="latest", reading_order=None, custom_headers=None, raw=False, **operation_config):
    """
    Start asynchronous text reading operation.

    Args:
        url (str): Publicly reachable URL of an image or PDF
        language (str, optional): BCP-47 language code for text recognition.
            The Read API supports an extensive list of languages.
        pages (list[int], optional): Page numbers to process (for multi-page documents)
        model_version (str, optional): Model version. Default: "latest"
        reading_order (str, optional): Reading order algorithm ('basic' or 'natural')

    Returns:
        str: Operation location URL for polling status

    Note:
        This starts an asynchronous operation. Use get_read_result() to retrieve results.
        The usage examples in this document pass raw=True and read the
        Operation-Location response header to obtain the operation ID.
    """

def read_in_stream(image, language=None, pages=None, model_version="latest", reading_order=None, custom_headers=None, raw=False, **operation_config):
    """
    Start text reading from binary stream.

    Args:
        image (Generator): Binary image data stream
        language (str, optional): Text language for recognition
        pages (list[int], optional): Page numbers to process
        model_version (str, optional): Model version. Default: "latest"
        reading_order (str, optional): Reading order algorithm ('basic' or 'natural')

    Returns:
        str: Operation location URL for polling
    """

def get_read_result(operation_id, custom_headers=None, raw=False, **operation_config):
    """
    Get result of asynchronous read operation.

    Args:
        operation_id (str): Operation ID extracted from read operation location URL

    Returns:
        ReadOperationResult: Text recognition results with status

    Note:
        Poll this endpoint until status is 'succeeded' or 'failed'.
        Status values: notStarted, running, succeeded, failed
    """
```
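
Since `get_read_result()` takes an operation ID rather than the full operation location URL, the ID has to be pulled out of that URL first. The following is a minimal sketch using only the standard library; `extract_operation_id` is a hypothetical helper (not part of the SDK) and the example URL is illustrative. It assumes the ID is the last non-empty path segment of the URL.

```python
from urllib.parse import urlparse


def extract_operation_id(operation_location: str) -> str:
    """Return the last non-empty path segment of an operation location URL.

    Hypothetical helper: assumes the operation ID is the final path segment.
    Parsing the URL first keeps a trailing slash or query string from
    ending up in the ID.
    """
    path = urlparse(operation_location).path
    segments = [segment for segment in path.split("/") if segment]
    if not segments:
        raise ValueError(f"No operation ID found in {operation_location!r}")
    return segments[-1]


# Illustrative URL only; the real value comes from the service response.
example_location = (
    "https://your-endpoint.cognitiveservices.azure.com/"
    "vision/v3.2/read/analyzeResults/0123abcd-0000-0000-0000-000000000000"
)
print(extract_operation_id(example_location))
```

Compared with `split("/")[-1]`, this variant also tolerates a trailing slash, at the cost of a few extra lines.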

## Usage Examples

### Basic OCR (Printed Text)

```python
from azure.cognitiveservices.vision.computervision import ComputerVisionClient
from msrest.authentication import CognitiveServicesCredentials

# Initialize client
credentials = CognitiveServicesCredentials("your-api-key")
client = ComputerVisionClient("https://your-endpoint.cognitiveservices.azure.com/", credentials)

# Perform OCR on printed text
image_url = "https://example.com/document.jpg"
ocr_result = client.recognize_printed_text(detect_orientation=True, url=image_url)

print(f"Language: {ocr_result.language}")
print(f"Text angle: {ocr_result.text_angle}")
print(f"Orientation: {ocr_result.orientation}")

# Extract text by regions, lines, and words
for region in ocr_result.regions:
    for line in region.lines:
        line_text = " ".join([word.text for word in line.words])
        print(f"Line: {line_text}")

        # Individual word details
        for word in line.words:
            print(f"  Word: '{word.text}' at {word.bounding_box}")
```

### Advanced Text Reading (Async)

```python
import time
from azure.cognitiveservices.vision.computervision.models import OperationStatusCodes

# Start read operation (reuses the client from the Basic OCR example)
image_url = "https://example.com/handwritten-note.jpg"
read_response = client.read(image_url, raw=True)

# Extract operation ID from location header
operation_location = read_response.headers["Operation-Location"]
operation_id = operation_location.split("/")[-1]

# Poll until the operation reaches a terminal state
while True:
    read_result = client.get_read_result(operation_id)
    if read_result.status in (OperationStatusCodes.succeeded, OperationStatusCodes.failed):
        break
    time.sleep(1)

# analyze_result is only available when the operation succeeded
if read_result.status == OperationStatusCodes.failed:
    print("Text recognition failed")
else:
    for page in read_result.analyze_result.read_results:
        print(f"Page {page.page}:")

        for line in page.lines:
            print(f"  Line: '{line.text}'")
            print(f"    Bounding box: {line.bounding_box}")

            # Check for handwriting
            if line.appearance and line.appearance.style:
                if line.appearance.style.name == "handwriting":
                    print(f"    Style: Handwriting (confidence: {line.appearance.style.confidence})")

            # Individual words
            for word in line.words:
                print(f"    Word: '{word.text}' (confidence: {word.confidence})")
```
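
The polling loop above runs until the service reports a terminal status, so a stalled operation would block forever. One way to bound the wait is sketched below. `wait_for_read_result` is a hypothetical helper, not an SDK function; it only assumes a client object exposing `get_read_result(operation_id)` and a result with a `status` attribute, and the `timeout` and `interval` defaults are arbitrary.

```python
import time

# Status values documented for get_read_result() that end the polling loop.
TERMINAL_STATUSES = {"succeeded", "failed"}


def wait_for_read_result(client, operation_id, timeout=30.0, interval=1.0,
                         sleep=time.sleep, clock=time.monotonic):
    """Poll client.get_read_result() until a terminal status or a timeout.

    Hypothetical helper. Returns the last result when the status is
    'succeeded' or 'failed'; raises TimeoutError once `timeout` seconds
    have passed without reaching a terminal status. `sleep` and `clock`
    are injectable so the function can be exercised without real delays.
    """
    deadline = clock() + timeout
    while True:
        result = client.get_read_result(operation_id)
        # Accept either an enum member (use its .value) or a plain string.
        status = getattr(result.status, "value", result.status)
        if status in TERMINAL_STATUSES:
            return result
        if clock() >= deadline:
            raise TimeoutError(
                f"Read operation {operation_id} still '{status}' after {timeout} seconds"
            )
        sleep(interval)


# Intended usage with the client from the examples above:
# read_result = wait_for_read_result(client, operation_id, timeout=60.0)
```

The caller still has to check whether the returned status is `succeeded` before reading `analyze_result`.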

### Local File OCR

```python
# OCR from local file
with open("local_document.jpg", "rb") as image_stream:
    ocr_result = client.recognize_printed_text_in_stream(
        detect_orientation=True,
        image=image_stream,
        language="en"
    )

    # Extract all text
    all_text = []
    for region in ocr_result.regions:
        for line in region.lines:
            line_text = " ".join([word.text for word in line.words])
            all_text.append(line_text)

    print("\n".join(all_text))
```

### Multi-page Document Processing

```python
# Process specific pages of a multi-page document
pdf_url = "https://example.com/multi-page-document.pdf"
pages_to_process = [1, 3, 5]  # Process pages 1, 3, and 5

read_response = client.read(pdf_url, pages=pages_to_process, raw=True)
operation_id = read_response.headers["Operation-Location"].split("/")[-1]

# Poll and get results (same as above)
# ... polling code ...

# Results will contain only the specified pages
for page in read_result.analyze_result.read_results:
    print(f"Processing page {page.page}")
    # ... process page content ...
```
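
When only some pages are requested, it can help to key the extracted text by page number rather than by list position. A minimal sketch follows; `text_by_page` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins shaped like the `ReadResult` and `Line` types documented in this file (only `page`, `lines`, and `text` are used).

```python
from types import SimpleNamespace


def text_by_page(read_results):
    """Map each page number to that page's line texts joined by newlines."""
    return {
        page.page: "\n".join(line.text for line in page.lines)
        for page in read_results
    }


# Stand-in data; real input would be read_result.analyze_result.read_results.
sample_pages = [
    SimpleNamespace(page=1, lines=[SimpleNamespace(text="Title"),
                                   SimpleNamespace(text="First paragraph")]),
    SimpleNamespace(page=3, lines=[SimpleNamespace(text="Appendix")]),
]

for number, text in text_by_page(sample_pages).items():
    print(f"--- page {number} ---")
    print(text)
```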

## Response Data Types

### OcrResult

```python { .api }
class OcrResult:
    """
    OCR operation result for printed text.

    Attributes:
        language (str): Detected or specified language code
        text_angle (float): Text angle in degrees (-180 to 180)
        orientation (str): Text orientation (Up, Down, Left, Right)
        regions (list[OcrRegion]): Text regions in the image
    """
```

### OcrRegion

```python { .api }
class OcrRegion:
    """
    OCR text region containing multiple lines.

    Attributes:
        bounding_box (str): Comma-separated bounding box coordinates (left,top,width,height)
        lines (list[OcrLine]): Text lines within the region
    """
```

### OcrLine

```python { .api }
class OcrLine:
    """
    OCR text line containing multiple words.

    Attributes:
        bounding_box (str): Comma-separated bounding box coordinates
        words (list[OcrWord]): Words within the line
    """
```

### OcrWord

```python { .api }
class OcrWord:
    """
    Individual OCR word result.

    Attributes:
        bounding_box (str): Comma-separated bounding box coordinates
        text (str): Recognized word text
    """
```
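
The OCR types above expose `bounding_box` as a comma-separated string, which usually needs to be parsed before it can be used for drawing or cropping. The sketch below shows one way to do that. `Box` and `parse_ocr_bounding_box` are hypothetical names, and the code assumes lines and words use the same `left,top,width,height` order documented for `OcrRegion` with integer values; the sample string is made up.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned box in the left,top,width,height order of OcrRegion."""
    left: int
    top: int
    width: int
    height: int

    @property
    def right(self) -> int:
        return self.left + self.width

    @property
    def bottom(self) -> int:
        return self.top + self.height


def parse_ocr_bounding_box(value: str) -> Box:
    """Parse a 'left,top,width,height' string into a Box.

    Raises ValueError if the string does not contain exactly four
    integer fields.
    """
    parts = [part.strip() for part in value.split(",")]
    if len(parts) != 4:
        raise ValueError(f"Expected 4 comma-separated values, got {value!r}")
    return Box(*(int(part) for part in parts))


box = parse_ocr_bounding_box("21,16,304,451")
print(box, box.right, box.bottom)
```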

### ReadOperationResult

```python { .api }
class ReadOperationResult:
    """
    Result of asynchronous read operation.

    Attributes:
        status (OperationStatusCodes): Operation status (notStarted, running, succeeded, failed)
        created_date_time (datetime): Operation creation timestamp
        last_updated_date_time (datetime): Last update timestamp
        analyze_result (AnalyzeResults): Text analysis results (when succeeded)
    """
```

### AnalyzeResults

```python { .api }
class AnalyzeResults:
    """
    Text analysis results from read operation.

    Attributes:
        version (str): Schema version
        model_version (str): OCR model version used
        read_results (list[ReadResult]): Text extraction results per page
    """
```

### ReadResult

```python { .api }
class ReadResult:
    """
    Text reading result for a single page.

    Attributes:
        page (int): Page number (1-indexed)
        language (str): Detected language
        angle (float): Text angle in degrees
        width (float): Page width
        height (float): Page height
        unit (TextRecognitionResultDimensionUnit): Dimension unit (pixel, inch)
        lines (list[Line]): Extracted text lines
    """
```

### Line

```python { .api }
class Line:
    """
    Text line with layout and style information.

    Attributes:
        language (str): Line language
        bounding_box (list[float]): Bounding box coordinates [x1,y1,x2,y2,x3,y3,x4,y4]
        appearance (Appearance): Style information (handwriting detection)
        text (str): Combined text of all words in the line
        words (list[Word]): Individual words within the line
    """
```
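
Unlike the OCR types, `Line.bounding_box` is a list of eight numbers describing four corner points, so rotated text yields a quadrilateral rather than a rectangle. When a simple rectangle is enough (for example, to crop the line), the enclosing axis-aligned box can be computed as in this sketch. `polygon_to_rect` is a hypothetical helper and the sample coordinates are made up.

```python
def polygon_to_rect(bounding_box):
    """Return (left, top, right, bottom) enclosing an 8-value polygon.

    Expects coordinates in the [x1,y1,x2,y2,x3,y3,x4,y4] layout documented
    for Line.bounding_box.
    """
    if len(bounding_box) != 8:
        raise ValueError(f"Expected 8 coordinates, got {len(bounding_box)}")
    xs = bounding_box[0::2]  # x1, x2, x3, x4
    ys = bounding_box[1::2]  # y1, y2, y3, y4
    return (min(xs), min(ys), max(xs), max(ys))


# A slightly rotated line; values are illustrative.
rotated_line = [10.0, 20.0, 110.0, 25.0, 108.0, 60.0, 8.0, 55.0]
print(polygon_to_rect(rotated_line))  # (8.0, 20.0, 110.0, 60.0)
```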

### Word

```python { .api }
class Word:
    """
    Individual word with position and confidence.

    Attributes:
        bounding_box (list[float]): Word bounding box coordinates
        text (str): Recognized word text
        confidence (float): Recognition confidence score (0.0 to 1.0)
    """
```
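
The per-word `confidence` score makes it possible to flag words that may need manual review. A minimal sketch is shown below; `low_confidence_words` is a hypothetical helper, the 0.8 threshold is an arbitrary choice rather than a recommended value, and the `SimpleNamespace` objects stand in for `Line` and `Word`.

```python
from types import SimpleNamespace


def low_confidence_words(lines, threshold=0.8):
    """Return (text, confidence) pairs for words scoring below threshold."""
    return [
        (word.text, word.confidence)
        for line in lines
        for word in line.words
        if word.confidence < threshold
    ]


# Stand-in data shaped like Line and Word; values are made up.
sample_lines = [
    SimpleNamespace(words=[
        SimpleNamespace(text="Invoice", confidence=0.99),
        SimpleNamespace(text="N0.", confidence=0.42),
    ]),
    SimpleNamespace(words=[SimpleNamespace(text="Total", confidence=0.75)]),
]

for text, confidence in low_confidence_words(sample_lines):
    print(f"Review '{text}' (confidence: {confidence})")
```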

### Appearance

```python { .api }
class Appearance:
    """
    Text appearance and style information.

    Attributes:
        style (Style): Text style classification
    """
```

### Style

```python { .api }
class Style:
    """
    Text style classification.

    Attributes:
        name (TextStyle): Style type (other, handwriting)
        confidence (float): Style detection confidence (0.0 to 1.0)
    """
```
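
Because `appearance` and `style` may be absent on a line, handwriting checks are easier to reuse when wrapped in a small predicate. The sketch below is one possible shape. `is_handwritten` is a hypothetical helper, the 0.5 default for `min_confidence` is an arbitrary assumption, and the `SimpleNamespace` objects stand in for `Line`, `Appearance`, and `Style`.

```python
from types import SimpleNamespace


def is_handwritten(line, min_confidence=0.5):
    """Return True if the line is classified as handwriting.

    Missing appearance or style information is treated as not handwritten.
    The style name may be a TextStyle member or a plain string.
    """
    appearance = getattr(line, "appearance", None)
    style = getattr(appearance, "style", None)
    if style is None:
        return False
    name = getattr(style.name, "value", style.name)
    return name == "handwriting" and style.confidence >= min_confidence


# Stand-in lines; values are made up.
note = SimpleNamespace(appearance=SimpleNamespace(
    style=SimpleNamespace(name="handwriting", confidence=0.9)))
label = SimpleNamespace(appearance=SimpleNamespace(
    style=SimpleNamespace(name="other", confidence=0.95)))
bare = SimpleNamespace(appearance=None)

print(is_handwritten(note), is_handwritten(label), is_handwritten(bare))
```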

## Enumerations

### OperationStatusCodes

```python { .api }
class OperationStatusCodes(str, Enum):
    """Asynchronous operation status codes."""

    not_started = "notStarted"
    running = "running"
    failed = "failed"
    succeeded = "succeeded"
```

### TextStyle

```python { .api }
class TextStyle(str, Enum):
    """Text style classification values."""

    other = "other"
    handwriting = "handwriting"
```

### TextRecognitionResultDimensionUnit

```python { .api }
class TextRecognitionResultDimensionUnit(str, Enum):
    """Dimension units for text recognition results."""

    pixel = "pixel"
    inch = "inch"
```
