# Form Recognition (Legacy API)

Traditional form processing capabilities using the legacy Form Recognizer API (v2.0, v2.1). This API provides prebuilt models for common document types and basic custom form training functionality. While still supported, the modern Document Analysis API is recommended for new applications.

## Capabilities

### Receipt Recognition

Extracts key information from receipts including merchant details, transaction amounts, dates, and line items using the prebuilt receipt model.

```python { .api }
def begin_recognize_receipts(receipt: Union[bytes, IO[bytes]], **kwargs) -> LROPoller[List[RecognizedForm]]:
    """
    Recognize receipt data from documents.

    Parameters:
    - receipt: Receipt document as bytes or file stream
    - locale: Optional locale hint (e.g., "en-US")
    - include_field_elements: Include field elements in response
    - content_type: MIME type of the document

    Returns:
    LROPoller that yields List[RecognizedForm] with extracted receipt data
    """

def begin_recognize_receipts_from_url(receipt_url: str, **kwargs) -> LROPoller[List[RecognizedForm]]:
    """
    Recognize receipt data from document URL.

    Parameters:
    - receipt_url: Publicly accessible URL to receipt document
    - locale: Optional locale hint
    - include_field_elements: Include field elements in response

    Returns:
    LROPoller that yields List[RecognizedForm] with extracted receipt data
    """
```

#### Usage Example

```python
from azure.ai.formrecognizer import FormRecognizerClient
from azure.core.credentials import AzureKeyCredential

# Placeholder values: substitute your own resource endpoint and key
endpoint = "https://<your-resource>.cognitiveservices.azure.com/"
client = FormRecognizerClient(endpoint, AzureKeyCredential("<your-key>"))

# From local file
with open("receipt.jpg", "rb") as receipt_file:
    poller = client.begin_recognize_receipts(receipt_file, locale="en-US")
    receipts = poller.result()

# Access extracted data
for receipt in receipts:
    merchant_name = receipt.fields.get("MerchantName")
    if merchant_name:
        print(f"Merchant: {merchant_name.value}")

    total = receipt.fields.get("Total")
    if total:
        print(f"Total: {total.value}")

    # Access line items
    items = receipt.fields.get("Items")
    if items:
        for item in items.value:
            name = item.value.get("Name")
            price = item.value.get("TotalPrice")
            if name and price:
                print(f"Item: {name.value} - ${price.value}")
```
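
The per-field lookups in the example can be folded into one helper that returns plain Python values. This is a minimal sketch, not part of the SDK: `summarize_receipt` is a hypothetical name, and the code assumes only what the example shows, namely that each entry in `fields` exposes `.value` and that `Items` holds entries whose `.value` is itself a mapping.

```python
from typing import Any, Dict, List, Mapping


def summarize_receipt(fields: Mapping[str, Any]) -> Dict[str, Any]:
    """Collapse a receipt's `fields` mapping into plain values.

    Hypothetical helper: relies only on each field exposing `.value`.
    """

    def value_of(mapping: Mapping[str, Any], key: str) -> Any:
        # Missing fields become None instead of raising
        field = mapping.get(key)
        return field.value if field is not None else None

    items: List[Dict[str, Any]] = []
    items_field = fields.get("Items")
    if items_field is not None and items_field.value:
        for item in items_field.value:
            items.append({
                "name": value_of(item.value, "Name"),
                "total_price": value_of(item.value, "TotalPrice"),
            })

    return {
        "merchant": value_of(fields, "MerchantName"),
        "total": value_of(fields, "Total"),
        "items": items,
    }
```

With a recognized form in hand, `summarize_receipt(receipt.fields)` would yield a dictionary that is easy to log or serialize.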
71
72
### Business Card Recognition
73
74
Extracts contact information from business cards including names, job titles, organizations, phone numbers, and email addresses.
75
76
```python { .api }
77
def begin_recognize_business_cards(business_card: Union[bytes, IO[bytes]], **kwargs) -> LROPoller[List[RecognizedForm]]:
78
"""
79
Extract business card information.
80
81
Parameters:
82
- business_card: Business card document as bytes or file stream
83
- locale: Optional locale hint
84
- include_field_elements: Include field elements in response
85
- content_type: MIME type of the document
86
87
Returns:
88
LROPoller that yields List[RecognizedForm] with contact information
89
"""
90
91
def begin_recognize_business_cards_from_url(business_card_url: str, **kwargs) -> LROPoller[List[RecognizedForm]]:
92
"""
93
Extract business card information from URL.
94
95
Parameters:
96
- business_card_url: Publicly accessible URL to business card
97
- locale: Optional locale hint
98
- include_field_elements: Include field elements in response
99
100
Returns:
101
LROPoller that yields List[RecognizedForm] with contact information
102
"""
103
```
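
Business card results tend to be list-valued, since a card can carry several names, emails, or phone numbers. The sketch below shows one way to read them. The field names `ContactNames`, `FirstName`, `LastName`, and `Emails` are assumptions about the prebuilt model's output, and both helper names are hypothetical.

```python
from typing import Any, List, Mapping


def list_values(fields: Mapping[str, Any], name: str) -> List[Any]:
    """Return the plain values of a list-valued field, or [] if absent."""
    field = fields.get(name)
    if field is None or not field.value:
        return []
    return [entry.value for entry in field.value]


def contact_names(fields: Mapping[str, Any]) -> List[str]:
    """Join first and last names from each `ContactNames` entry (assumed layout)."""
    names = []
    for entry in list_values(fields, "ContactNames"):
        # Each entry is assumed to be a mapping of sub-fields
        first = entry.get("FirstName")
        last = entry.get("LastName")
        parts = [p.value for p in (first, last) if p is not None and p.value]
        if parts:
            names.append(" ".join(parts))
    return names
```

Calling `list_values(card.fields, "Emails")` on a recognized card would return the email strings directly.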
104
105
### Invoice Recognition
106
107
Processes invoices to extract vendor information, customer details, invoice amounts, due dates, and line item details.
108
109
```python { .api }
110
def begin_recognize_invoices(invoice: Union[bytes, IO[bytes]], **kwargs) -> LROPoller[List[RecognizedForm]]:
111
"""
112
Extract invoice information using prebuilt model.
113
114
Parameters:
115
- invoice: Invoice document as bytes or file stream
116
- locale: Optional locale hint
117
- include_field_elements: Include field elements in response
118
- content_type: MIME type of the document
119
120
Returns:
121
LROPoller that yields List[RecognizedForm] with invoice data
122
"""
123
124
def begin_recognize_invoices_from_url(invoice_url: str, **kwargs) -> LROPoller[List[RecognizedForm]]:
125
"""
126
Extract invoice information from URL.
127
128
Parameters:
129
- invoice_url: Publicly accessible URL to invoice document
130
- locale: Optional locale hint
131
- include_field_elements: Include field elements in response
132
133
Returns:
134
LROPoller that yields List[RecognizedForm] with invoice data
135
"""
136
```
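
One common use of invoice line items is a sanity check against the invoice total. The following sketch sums line-item amounts. It assumes the line items arrive in an `Items` field whose entries carry an `Amount` sub-field, and the field names and the helper name `line_item_total` are illustrative assumptions rather than confirmed API details.

```python
from typing import Any, Mapping, Optional


def line_item_total(fields: Mapping[str, Any]) -> Optional[float]:
    """Sum the `Amount` of each entry in `Items`; None when no amounts exist."""
    items = fields.get("Items")
    if items is None or not items.value:
        return None
    amounts = []
    for item in items.value:
        # Each item is assumed to expose a mapping of sub-fields via `.value`
        amount = item.value.get("Amount")
        if amount is not None and amount.value is not None:
            amounts.append(float(amount.value))
    return sum(amounts) if amounts else None
```

Comparing this sum with the extracted invoice total is one way to flag documents for manual review.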

### Identity Document Recognition

Extracts information from identity documents such as driver's licenses and passports, including personal details, document numbers, and expiration dates.

```python { .api }
def begin_recognize_identity_documents(identity_document: Union[bytes, IO[bytes]], **kwargs) -> LROPoller[List[RecognizedForm]]:
    """
    Extract identity document information.

    Parameters:
    - identity_document: ID document as bytes or file stream
    - include_field_elements: Include field elements in response
    - content_type: MIME type of the document

    Returns:
    LROPoller that yields List[RecognizedForm] with identity information
    """

def begin_recognize_identity_documents_from_url(identity_document_url: str, **kwargs) -> LROPoller[List[RecognizedForm]]:
    """
    Extract identity document information from URL.

    Parameters:
    - identity_document_url: Publicly accessible URL to ID document
    - include_field_elements: Include field elements in response

    Returns:
    LROPoller that yields List[RecognizedForm] with identity information
    """
```
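
Since expiration dates are among the extracted values, a typical follow-up is an expiry check. This sketch assumes the date is exposed under a field named `DateOfExpiration` whose `.value` is a `datetime.date`. Both the field name and the helper `is_expired` are assumptions for illustration.

```python
from datetime import date
from typing import Any, Mapping, Optional


def is_expired(fields: Mapping[str, Any], today: date) -> Optional[bool]:
    """Return True/False when an expiration date was extracted, else None."""
    expiry = fields.get("DateOfExpiration")  # assumed field name
    if expiry is None or expiry.value is None:
        return None
    return expiry.value < today
```

Passing `today` explicitly keeps the check deterministic, which makes it easier to unit test.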

### Content Recognition

Extracts layout information including text, tables, and selection marks without using a specific model. Useful for general document layout analysis.

```python { .api }
def begin_recognize_content(form: Union[bytes, IO[bytes]], **kwargs) -> LROPoller[List[FormPage]]:
    """
    Extract layout information from documents.

    Parameters:
    - form: Document as bytes or file stream
    - language: Language code for text recognition
    - pages: Specific page numbers to analyze
    - reading_order: Reading order algorithm
    - content_type: MIME type of the document

    Returns:
    LROPoller that yields List[FormPage] with layout information
    """

def begin_recognize_content_from_url(form_url: str, **kwargs) -> LROPoller[List[FormPage]]:
    """
    Extract layout information from document URL.

    Parameters:
    - form_url: Publicly accessible URL to document
    - language: Language code for text recognition
    - pages: Specific page numbers to analyze
    - reading_order: Reading order algorithm

    Returns:
    LROPoller that yields List[FormPage] with layout information
    """
```
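
Tables in a layout result are usually easier to work with as a row-major grid. The sketch below rebuilds one, assuming each table object exposes `row_count`, `column_count`, and a `cells` list whose entries carry `row_index`, `column_index`, and `text`. The helper name `table_to_grid` is hypothetical.

```python
from typing import Any, List


def table_to_grid(table: Any) -> List[List[str]]:
    """Place each cell's text at its (row, column) position; gaps stay empty."""
    grid = [["" for _ in range(table.column_count)] for _ in range(table.row_count)]
    for cell in table.cells:
        grid[cell.row_index][cell.column_index] = cell.text
    return grid
```

A caller would typically loop over `page.tables` for each returned page and pass every table through this function. Cells that span several rows or columns are written only at their starting position in this sketch.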

### Custom Form Recognition

Uses custom trained models to extract information from domain-specific forms and documents.

```python { .api }
def begin_recognize_custom_forms(model_id: str, form: Union[bytes, IO[bytes]], **kwargs) -> LROPoller[List[RecognizedForm]]:
    """
    Recognize forms using custom trained model.

    Parameters:
    - model_id: ID of custom trained model
    - form: Form document as bytes or file stream
    - include_field_elements: Include field elements in response
    - content_type: MIME type of the document

    Returns:
    LROPoller that yields List[RecognizedForm] with extracted custom form data
    """

def begin_recognize_custom_forms_from_url(model_id: str, form_url: str, **kwargs) -> LROPoller[List[RecognizedForm]]:
    """
    Recognize forms from URL using custom model.

    Parameters:
    - model_id: ID of custom trained model
    - form_url: Publicly accessible URL to form document
    - include_field_elements: Include field elements in response

    Returns:
    LROPoller that yields List[RecognizedForm] with extracted custom form data
    """
```

#### Custom Form Usage Example

```python
# Recognize custom form (reuses the `client` created in the receipt example)
model_id = "your-custom-model-id"

with open("custom_form.pdf", "rb") as form_file:
    poller = client.begin_recognize_custom_forms(model_id, form_file)
    forms = poller.result()

# Process results
for form in forms:
    print(f"Form type: {form.form_type}")
    print(f"Confidence: {form.form_type_confidence}")

    for field_name, field in form.fields.items():
        print(f"{field_name}: {field.value} (confidence: {field.confidence})")
```
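
The per-field confidence printed above can also drive a simple review workflow. This is a sketch under the assumption that every field exposes `.value` and `.confidence` as in the example. The helper name `split_by_confidence` and the default threshold of 0.8 are arbitrary illustrative choices.

```python
from typing import Any, Dict, Mapping, Tuple


def split_by_confidence(
    fields: Mapping[str, Any], threshold: float = 0.8
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Partition field values into (accepted, needs_review) by confidence."""
    accepted: Dict[str, Any] = {}
    needs_review: Dict[str, Any] = {}
    for name, field in fields.items():
        # Treat a missing confidence as 0.0 so the field is sent to review
        confidence = field.confidence if field.confidence is not None else 0.0
        target = accepted if confidence >= threshold else needs_review
        target[name] = field.value
    return accepted, needs_review
```

For example, `accepted, needs_review = split_by_confidence(form.fields, threshold=0.9)` would route any field scoring below 0.9 to a human check.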

## FormRecognizerClient

```python { .api }
class FormRecognizerClient:
    """
    Client for analyzing forms using Form Recognizer API v2.1 and below.
    """

    def __init__(
        self,
        endpoint: str,
        credential: Union[AzureKeyCredential, TokenCredential],
        **kwargs
    ):
        """
        Initialize FormRecognizerClient.

        Parameters:
        - endpoint: Cognitive Services endpoint URL
        - credential: Authentication credential
        - api_version: API version (default: FormRecognizerApiVersion.V2_1)
        """

    def close(self) -> None:
        """Close client and release resources."""

# Async version (in the SDK this is FormRecognizerClient from azure.ai.formrecognizer.aio)
class AsyncFormRecognizerClient:
    """
    Async client for analyzing forms using Form Recognizer API v2.1 and below.

    Provides the same methods as FormRecognizerClient but with async/await support.
    """

    def __init__(
        self,
        endpoint: str,
        credential: Union[AzureKeyCredential, AsyncTokenCredential],
        **kwargs
    ):
        """
        Initialize AsyncFormRecognizerClient.

        Parameters:
        - endpoint: Cognitive Services endpoint URL
        - credential: Authentication credential (must support async operations)
        - api_version: API version (default: FormRecognizerApiVersion.V2_1)
        """

    async def begin_recognize_receipts(self, receipt: Union[bytes, IO[bytes]], **kwargs) -> AsyncLROPoller[List[RecognizedForm]]: ...
    async def begin_recognize_receipts_from_url(self, receipt_url: str, **kwargs) -> AsyncLROPoller[List[RecognizedForm]]: ...
    async def begin_recognize_business_cards(self, business_card: Union[bytes, IO[bytes]], **kwargs) -> AsyncLROPoller[List[RecognizedForm]]: ...
    async def begin_recognize_business_cards_from_url(self, business_card_url: str, **kwargs) -> AsyncLROPoller[List[RecognizedForm]]: ...
    async def begin_recognize_identity_documents(self, identity_document: Union[bytes, IO[bytes]], **kwargs) -> AsyncLROPoller[List[RecognizedForm]]: ...
    async def begin_recognize_identity_documents_from_url(self, identity_document_url: str, **kwargs) -> AsyncLROPoller[List[RecognizedForm]]: ...
    async def begin_recognize_invoices(self, invoice: Union[bytes, IO[bytes]], **kwargs) -> AsyncLROPoller[List[RecognizedForm]]: ...
    async def begin_recognize_invoices_from_url(self, invoice_url: str, **kwargs) -> AsyncLROPoller[List[RecognizedForm]]: ...
    async def begin_recognize_content(self, form: Union[bytes, IO[bytes]], **kwargs) -> AsyncLROPoller[List[FormPage]]: ...
    async def begin_recognize_content_from_url(self, form_url: str, **kwargs) -> AsyncLROPoller[List[FormPage]]: ...
    async def begin_recognize_custom_forms(self, model_id: str, form: Union[bytes, IO[bytes]], **kwargs) -> AsyncLROPoller[List[RecognizedForm]]: ...
    async def begin_recognize_custom_forms_from_url(self, model_id: str, form_url: str, **kwargs) -> AsyncLROPoller[List[RecognizedForm]]: ...

    async def close(self) -> None:
        """Close client and release resources."""
```
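
With the async client, both starting the operation and fetching its result are awaited. The sketch below captures that two-step pattern in a function that accepts any client exposing the async `begin_recognize_receipts` method listed above. The function name `recognize_receipt_async` is hypothetical, and the wiring in the trailing comment uses placeholder `endpoint` and `key` values that are not executed here.

```python
import asyncio
from typing import Any, List


async def recognize_receipt_async(client: Any, receipt: bytes) -> List[Any]:
    """Start receipt recognition and wait for the result without blocking."""
    # First await: submits the document and returns an async poller
    poller = await client.begin_recognize_receipts(receipt)
    # Second await: waits for the long-running operation to finish
    return await poller.result()


# Possible wiring (requires the Azure SDK and real credentials):
#
#   from azure.ai.formrecognizer.aio import FormRecognizerClient
#   from azure.core.credentials import AzureKeyCredential
#
#   async def main():
#       async with FormRecognizerClient(endpoint, AzureKeyCredential(key)) as client:
#           with open("receipt.jpg", "rb") as f:
#               forms = await recognize_receipt_async(client, f.read())
#
#   asyncio.run(main())
```

Using the client as an async context manager closes it automatically, so an explicit `await client.close()` is only needed when the client is created outside an `async with` block.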

## Common Parameters

### Content Types
```python { .api }
class FormContentType(str, Enum):
    APPLICATION_PDF = "application/pdf"
    IMAGE_JPEG = "image/jpeg"
    IMAGE_PNG = "image/png"
    IMAGE_TIFF = "image/tiff"
    IMAGE_BMP = "image/bmp"
```
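
When documents come from disk, the `content_type` argument can be derived from the file extension. The following sketch maps common extensions to the MIME strings listed in the enum. The extension table and the helper name `guess_content_type` are illustrative assumptions, not SDK features.

```python
import os
from typing import Optional

# Extension-to-MIME table mirroring the FormContentType values above
_CONTENT_TYPES = {
    ".pdf": "application/pdf",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".tif": "image/tiff",
    ".tiff": "image/tiff",
    ".bmp": "image/bmp",
}


def guess_content_type(path: str) -> Optional[str]:
    """Return the MIME type for a supported extension, or None otherwise."""
    extension = os.path.splitext(path)[1].lower()
    return _CONTENT_TYPES.get(extension)
```

The returned string could then be passed as `content_type=guess_content_type(path)` when it is not `None`.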

### Language Codes
Common locale values for enhanced recognition:
- `"en-US"` - English (United States)
- `"en-AU"` - English (Australia)
- `"en-CA"` - English (Canada)
- `"en-GB"` - English (Great Britain)
- `"en-IN"` - English (India)

## Error Handling

```python { .api }
from azure.core.exceptions import HttpResponseError

try:
    poller = client.begin_recognize_receipts(receipt_data)
    result = poller.result()
except HttpResponseError as e:
    # Service failures are raised as azure-core HttpResponseError
    print(f"Recognition failed: {e.status_code} - {e.message}")
    if e.error is not None and e.error.details:
        for detail in e.error.details:
            print(f"Detail: {detail}")
```

## Polling Operations

All recognition operations return Long Running Operation (LRO) pollers:

```python
# Start operation
poller = client.begin_recognize_receipts(receipt_data)

# Check status
print(f"Status: {poller.status()}")

# Wait for completion (blocking)
result = poller.result()

# Wait at most a fixed time for completion
result = poller.result(timeout=300)  # 5 minute timeout
```
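
When a hard deadline matters, the wait can be made explicit with a small loop over the poller's `done()`, `status()`, and `result()` methods. This is a minimal sketch. The helper name `wait_for_result` and its `interval` parameter are hypothetical and separate from the SDK's own polling settings.

```python
import time
from typing import Any


def wait_for_result(poller: Any, timeout: float, interval: float = 1.0) -> Any:
    """Return poller.result() once done, or raise TimeoutError after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while not poller.done():
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"operation still '{poller.status()}' after {timeout} seconds"
            )
        time.sleep(interval)
    return poller.result()
```

Raising on timeout keeps the caller from mistaking an unfinished operation for a completed one.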