# Model Management

Model lifecycle management across both the legacy Form Recognizer API and the modern Document Intelligence API: building custom models, training classifiers, copying models between resources, composing models, and monitoring operations.

## Capabilities

### Legacy Model Management (FormTrainingClient)

Model training and management for Form Recognizer API v2.1 and below, focused on custom form models trained with labels (supervised) or without labels (unsupervised).

#### Model Training

```python { .api }
def begin_training(training_files_url: str, use_training_labels: bool, **kwargs) -> LROPoller[CustomFormModel]:
    """
    Train custom form model from training data.

    Parameters:
    - training_files_url: Azure Blob Storage container URL (typically a SAS URL) containing training documents
    - use_training_labels: Whether to use labeled training data (supervised)
    - model_name: Optional name for the model
    - prefix: Filter training files by prefix

    Returns:
    LROPoller that yields CustomFormModel when training completes
    """
```

#### Usage Example

```python
from azure.ai.formrecognizer import FormRecognizerClient, FormTrainingClient
from azure.core.credentials import AzureKeyCredential

# Placeholder values: replace with your own resource endpoint and key
endpoint = "https://<your-resource>.cognitiveservices.azure.com/"
training_client = FormTrainingClient(endpoint, AzureKeyCredential("key"))

# Train with labeled data (supervised)
training_files_url = "https://yourstorageaccount.blob.core.windows.net/training-data?sas-token"

poller = training_client.begin_training(
    training_files_url=training_files_url,
    use_training_labels=True,
    model_name="Invoice Model v1"
)

model = poller.result()
print(f"Model ID: {model.model_id}")
print(f"Status: {model.status}")
# Accuracy is reported per submodel for models trained with labels
for submodel in model.submodels:
    print(f"Accuracy ({submodel.form_type}): {submodel.accuracy}")

# Use trained model
form_client = FormRecognizerClient(endpoint, AzureKeyCredential("key"))
with open("invoice.pdf", "rb") as invoice:
    poller = form_client.begin_recognize_custom_forms(model.model_id, invoice)
    result = poller.result()
```

#### Model Information and Listing

```python { .api }
def get_custom_model(model_id: str, **kwargs) -> CustomFormModel:
    """
    Get detailed information about a custom model.

    Parameters:
    - model_id: ID of the custom model

    Returns:
    CustomFormModel with complete model details
    """

def list_custom_models(**kwargs) -> ItemPaged[CustomFormModelInfo]:
    """
    List all custom models in the resource.

    Returns:
    ItemPaged iterator of CustomFormModelInfo objects
    """

def get_account_properties(**kwargs) -> AccountProperties:
    """
    Get account information including model quotas.

    Returns:
    AccountProperties with quota and usage information
    """
```
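
The quota reported by `get_account_properties()` can be checked before a training run is started. The following is a minimal sketch, assuming the `custom_model_count` and `custom_model_limit` attributes of `AccountProperties`; the helper name `remaining_model_capacity` is hypothetical, and the client is passed in so that any object exposing `get_account_properties()` works.

```python
def remaining_model_capacity(training_client) -> int:
    """Return how many more custom models the resource can hold.

    Hypothetical helper: `training_client` is expected to behave like
    FormTrainingClient (it only needs get_account_properties()).
    """
    props = training_client.get_account_properties()
    # custom_model_limit is the quota, custom_model_count the current usage
    return max(props.custom_model_limit - props.custom_model_count, 0)
```

A caller might skip `begin_training` and delete unused models first when this returns `0`.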

#### Model Operations

```python { .api }
def delete_model(model_id: str, **kwargs) -> None:
    """
    Delete a custom model.

    Parameters:
    - model_id: ID of model to delete
    """

def get_copy_authorization(resource_id: str, resource_region: str, **kwargs) -> Dict[str, Union[str, int]]:
    """
    Generate authorization for copying model to another resource.
    Call this on the client of the target resource.

    Parameters:
    - resource_id: Target resource ID
    - resource_region: Target resource region

    Returns:
    Dictionary with copy authorization details
    """

def begin_copy_model(model_id: str, target: Dict[str, Union[str, int]], **kwargs) -> LROPoller[CustomFormModelInfo]:
    """
    Copy model to another Form Recognizer resource.

    Parameters:
    - model_id: Source model ID
    - target: Copy authorization from target resource

    Returns:
    LROPoller that yields CustomFormModelInfo for copied model
    """

def begin_create_composed_model(model_ids: List[str], **kwargs) -> LROPoller[CustomFormModel]:
    """
    Create composed model from multiple trained models.

    Parameters:
    - model_ids: List of model IDs to compose
    - model_name: Optional name for composed model

    Returns:
    LROPoller that yields CustomFormModel for composed model
    """
```
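
The legacy copy flow involves two clients: the target resource issues the authorization and the source resource performs the copy. The following is a minimal sketch of that sequence; the function name `copy_custom_model` and its parameter names are hypothetical, and both clients are assumed to behave like `FormTrainingClient`.

```python
def copy_custom_model(source_client, target_client, model_id,
                      target_resource_id, target_region):
    """Copy a legacy custom model and return the copied model's info.

    Hypothetical helper: both clients are expected to expose the
    FormTrainingClient methods used below.
    """
    # Step 1: the target resource generates the copy authorization.
    target = target_client.get_copy_authorization(
        resource_id=target_resource_id,
        resource_region=target_region,
    )
    # Step 2: the source resource starts the copy using that authorization.
    poller = source_client.begin_copy_model(model_id=model_id, target=target)
    # Step 3: block until the long-running operation finishes.
    return poller.result()
```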

### Modern Model Management (DocumentModelAdministrationClient)

Model management for Document Intelligence API 2022-08-31 and later, supporting template and neural build modes, custom model IDs, descriptions, and tags.

#### Model Building

```python { .api }
def begin_build_document_model(build_mode: Union[str, ModelBuildMode], **kwargs) -> DocumentModelAdministrationLROPoller[DocumentModelDetails]:
    """
    Build custom document model from training data.

    Parameters:
    - build_mode: "template" or "neural" (ModelBuildMode enum)
    - blob_container_url: Azure Blob Storage URL with training documents
    - prefix: Filter training files by prefix
    - model_id: Optional custom model ID
    - description: Model description
    - tags: Dictionary of custom tags

    Returns:
    DocumentModelAdministrationLROPoller that yields DocumentModelDetails
    """

def begin_compose_document_model(component_model_ids: List[str], **kwargs) -> DocumentModelAdministrationLROPoller[DocumentModelDetails]:
    """
    Create composed model from multiple document models.

    Parameters:
    - component_model_ids: List of model IDs to compose (max 100)
    - model_id: Optional custom model ID for composed model
    - description: Model description
    - tags: Dictionary of custom tags

    Returns:
    DocumentModelAdministrationLROPoller that yields DocumentModelDetails
    """
```
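
Composition takes the IDs of already built models and returns a single model that routes each document to the best matching component. The following is a minimal sketch; the helper name `compose_models` is hypothetical, the de-duplication step is a design choice of this sketch rather than a service requirement, and `admin_client` is assumed to behave like `DocumentModelAdministrationClient`.

```python
def compose_models(admin_client, component_model_ids, description=None):
    """Compose several document models into one and return its details.

    Hypothetical helper around begin_compose_document_model.
    """
    # Drop duplicate IDs while keeping the caller's order.
    ids = list(dict.fromkeys(component_model_ids))
    if not ids:
        raise ValueError("at least one component model ID is required")
    # The list of component IDs is passed positionally.
    poller = admin_client.begin_compose_document_model(ids, description=description)
    return poller.result()
```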

#### Build Modes

```python { .api }
class ModelBuildMode(str, Enum):
    """Model building approaches for different use cases."""
    TEMPLATE = "template"  # Fast training, structured forms with consistent layout
    NEURAL = "neural"      # Slower training, better for varied layouts and complex documents
```

#### Usage Example

```python
from azure.ai.formrecognizer import (
    DocumentAnalysisClient,
    DocumentModelAdministrationClient,
    ModelBuildMode,
)
from azure.core.credentials import AzureKeyCredential

# Placeholder values: replace with your own resource endpoint and key
endpoint = "https://<your-resource>.cognitiveservices.azure.com/"
admin_client = DocumentModelAdministrationClient(endpoint, AzureKeyCredential("key"))

# Build neural model for complex documents
blob_container_url = "https://yourstorageaccount.blob.core.windows.net/training?sas-token"

poller = admin_client.begin_build_document_model(
    build_mode=ModelBuildMode.NEURAL,
    blob_container_url=blob_container_url,
    description="Contract Analysis Model",
    tags={"project": "legal-docs", "version": "1.0"}
)

model = poller.result()
print(f"Model ID: {model.model_id}")
print(f"Created: {model.created_on}")
print(f"Description: {model.description}")

# Use the model
doc_client = DocumentAnalysisClient(endpoint, AzureKeyCredential("key"))
with open("contract.pdf", "rb") as document:
    poller = doc_client.begin_analyze_document(model.model_id, document)
    result = poller.result()
```

#### Model Information and Management

```python { .api }
def get_document_model(model_id: str, **kwargs) -> DocumentModelDetails:
    """
    Get detailed information about a document model.

    Parameters:
    - model_id: Model identifier

    Returns:
    DocumentModelDetails with complete model information
    """

def list_document_models(**kwargs) -> ItemPaged[DocumentModelSummary]:
    """
    List all document models in the resource.

    Returns:
    ItemPaged iterator of DocumentModelSummary objects
    """

def delete_document_model(model_id: str, **kwargs) -> None:
    """
    Delete a document model.

    Parameters:
    - model_id: Model identifier to delete
    """

def get_resource_details(**kwargs) -> ResourceDetails:
    """
    Get resource information including quotas and usage.

    Returns:
    ResourceDetails with quota information
    """
```
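
Because `list_document_models()` yields summaries that carry the tags assigned at build time, models can be selected by tag before they are inspected or deleted. The following is a minimal sketch, assuming each summary exposes `model_id` and an optional `tags` dictionary; the helper name `models_with_tag` is hypothetical.

```python
def models_with_tag(admin_client, key, value):
    """Return the IDs of models whose tags contain key == value.

    Hypothetical helper: `admin_client` only needs list_document_models().
    """
    return [
        summary.model_id
        for summary in admin_client.list_document_models()
        # tags may be None when a model was built without tags
        if (summary.tags or {}).get(key) == value
    ]
```

The returned IDs could then be passed one at a time to `get_document_model` or `delete_document_model`.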

#### Model Copying

```python { .api }
def get_copy_authorization(**kwargs) -> TargetAuthorization:
    """
    Generate authorization for copying model to this resource.

    Parameters:
    - model_id: Optional target model ID
    - description: Optional description for copied model
    - tags: Optional tags for copied model

    Returns:
    TargetAuthorization for model copying
    """

def begin_copy_document_model_to(model_id: str, target: TargetAuthorization, **kwargs) -> DocumentModelAdministrationLROPoller[DocumentModelDetails]:
    """
    Copy document model to another resource.

    Parameters:
    - model_id: Source model ID
    - target: TargetAuthorization from destination resource

    Returns:
    DocumentModelAdministrationLROPoller that yields DocumentModelDetails
    """
```

#### Model Copying Example

```python
from azure.ai.formrecognizer import DocumentModelAdministrationClient

# target_endpoint/target_credential and source_endpoint/source_credential are
# the endpoint URL and credential of the target and source resources.

# On target resource - generate authorization
target_admin_client = DocumentModelAdministrationClient(target_endpoint, target_credential)
target_auth = target_admin_client.get_copy_authorization(
    model_id="copied-model-id",
    description="Copied invoice model",
    tags={"source": "prod-resource"}
)

# On source resource - perform copy
source_admin_client = DocumentModelAdministrationClient(source_endpoint, source_credential)
copy_poller = source_admin_client.begin_copy_document_model_to(
    "source-model-id",
    target_auth
)

copied_model = copy_poller.result()
print(f"Model copied to: {copied_model.model_id}")
```

### Document Classification

Build and manage document classifiers for automatic document type detection. Classifier operations require API version 2023-07-31 or later.

```python { .api }
def begin_build_document_classifier(doc_types: Mapping[str, ClassifierDocumentTypeDetails], **kwargs) -> DocumentModelAdministrationLROPoller[DocumentClassifierDetails]:
    """
    Build custom document classifier.

    Parameters:
    - doc_types: Mapping of document type names to ClassifierDocumentTypeDetails (training data sources)
    - classifier_id: Optional custom classifier ID
    - description: Classifier description

    Returns:
    DocumentModelAdministrationLROPoller that yields DocumentClassifierDetails
    """

def get_document_classifier(classifier_id: str, **kwargs) -> DocumentClassifierDetails:
    """
    Get document classifier information.

    Parameters:
    - classifier_id: Classifier identifier

    Returns:
    DocumentClassifierDetails with classifier information
    """

def list_document_classifiers(**kwargs) -> ItemPaged[DocumentClassifierDetails]:
    """
    List all document classifiers.

    Returns:
    ItemPaged iterator of DocumentClassifierDetails
    """

def delete_document_classifier(classifier_id: str, **kwargs) -> None:
    """
    Delete document classifier.

    Parameters:
    - classifier_id: Classifier identifier to delete
    """
```

#### Classifier Building Example

```python
from azure.ai.formrecognizer import (
    BlobFileListSource,
    BlobSource,
    ClassifierDocumentTypeDetails,
)

# Define document types and training data.
# Each value wraps a BlobSource (container plus optional prefix) or a
# BlobFileListSource (container plus a JSONL file listing the documents).
doc_types = {
    "invoice": ClassifierDocumentTypeDetails(
        source=BlobSource(
            container_url="https://storage.blob.core.windows.net/invoices?sas",
            prefix="training/"
        )
    ),
    "receipt": ClassifierDocumentTypeDetails(
        source=BlobSource(
            container_url="https://storage.blob.core.windows.net/receipts?sas",
            prefix="training/"
        )
    ),
    "contract": ClassifierDocumentTypeDetails(
        source=BlobFileListSource(
            container_url="https://storage.blob.core.windows.net/contracts?sas",
            file_list="contract_files.jsonl"
        )
    )
}

# Build classifier (admin_client is a DocumentModelAdministrationClient)
poller = admin_client.begin_build_document_classifier(
    doc_types=doc_types,
    description="Financial Document Classifier",
    classifier_id="financial-docs-v1"
)

classifier = poller.result()
print(f"Classifier ID: {classifier.classifier_id}")
print(f"Document types: {list(classifier.doc_types.keys())}")
```
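
Once a classifier is built, it is applied through the analysis client rather than the administration client. The following is a minimal sketch, assuming `DocumentAnalysisClient.begin_classify_document` and a result whose `documents` entries expose `doc_type` and `confidence`; the helper name `classify_stream` is hypothetical.

```python
def classify_stream(doc_client, classifier_id, document):
    """Classify an open binary stream and return (doc_type, confidence) pairs.

    Hypothetical helper: `doc_client` is expected to behave like
    DocumentAnalysisClient.
    """
    poller = doc_client.begin_classify_document(classifier_id, document)
    result = poller.result()
    # result.documents may be empty or None when nothing was recognized
    return [(doc.doc_type, doc.confidence) for doc in (result.documents or [])]
```

With a real client, `document` would typically be a file opened in binary mode, for example `open("mixed-batch.pdf", "rb")` (file name illustrative).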

### Operation Monitoring

Track and monitor long-running operations across the service.

```python { .api }
def list_operations(**kwargs) -> ItemPaged[OperationSummary]:
    """
    List all operations for the resource.

    Returns:
    ItemPaged iterator of OperationSummary objects
    """

def get_operation(operation_id: str, **kwargs) -> OperationDetails:
    """
    Get detailed information about a specific operation.

    Parameters:
    - operation_id: Operation identifier

    Returns:
    OperationDetails with complete operation information
    """
```

#### Operation Monitoring Example

```python
# List recent operations
operations = admin_client.list_operations()

for operation in operations:
    print(f"Operation: {operation.operation_id}")
    print(f"Kind: {operation.kind}")
    print(f"Status: {operation.status}")
    print(f"Progress: {operation.percent_completed}%")
    print(f"Created: {operation.created_on}")

    if operation.status == "failed":
        # Get detailed error information
        details = admin_client.get_operation(operation.operation_id)
        if details.error:
            print(f"Error: {details.error.code} - {details.error.message}")
```
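
For a resource with many operations, an aggregate view is often easier to read than a per-operation printout. The following is a minimal sketch that counts operations by status and collects the IDs of failed ones; the helper name `summarize_operations` is hypothetical, and each item is assumed to expose `operation_id` and `status` as in the example above.

```python
from collections import Counter


def summarize_operations(admin_client):
    """Return (status counts, IDs of failed operations).

    Hypothetical helper: `admin_client` only needs list_operations().
    """
    counts = Counter()
    failed_ids = []
    for operation in admin_client.list_operations():
        counts[operation.status] += 1
        if operation.status == "failed":
            failed_ids.append(operation.operation_id)
    return counts, failed_ids
```

The failed IDs can then be passed to `get_operation` to retrieve error details.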

## FormTrainingClient

```python { .api }
class FormTrainingClient:
    """
    Client for training and managing custom models using Form Recognizer API v2.1 and below.
    """

    def __init__(
        self,
        endpoint: str,
        credential: Union[AzureKeyCredential, TokenCredential],
        **kwargs
    ):
        """
        Initialize FormTrainingClient.

        Parameters:
        - endpoint: Cognitive Services endpoint URL
        - credential: Authentication credential
        - api_version: API version (default: FormRecognizerApiVersion.V2_1)
        """

    def get_form_recognizer_client(self, **kwargs) -> FormRecognizerClient:
        """
        Get FormRecognizerClient using same configuration.

        Returns:
        FormRecognizerClient instance
        """

    def close(self) -> None:
        """Close client and release resources."""

# Async version
# The package exports this class as azure.ai.formrecognizer.aio.FormTrainingClient;
# the "Async" prefix is used here only to distinguish it from the sync client.
class AsyncFormTrainingClient:
    """
    Async client for training and managing custom models using Form Recognizer API v2.1 and below.

    Provides the same methods as FormTrainingClient but with async/await support.
    """

    def __init__(
        self,
        endpoint: str,
        credential: Union[AzureKeyCredential, AsyncTokenCredential],
        **kwargs
    ):
        """
        Initialize AsyncFormTrainingClient.

        Parameters:
        - endpoint: Cognitive Services endpoint URL
        - credential: Authentication credential (must support async operations)
        - api_version: API version (default: FormRecognizerApiVersion.V2_1)
        """

    async def begin_training(self, training_files_url: str, use_training_labels: bool, **kwargs) -> AsyncLROPoller[CustomFormModel]: ...
    async def delete_model(self, model_id: str, **kwargs) -> None: ...
    # Listing is a plain method that returns an async iterable (use "async for")
    def list_custom_models(self, **kwargs) -> AsyncItemPaged[CustomFormModelInfo]: ...
    async def get_account_properties(self, **kwargs) -> AccountProperties: ...
    async def get_custom_model(self, model_id: str, **kwargs) -> CustomFormModel: ...
    async def get_copy_authorization(self, resource_id: str, resource_region: str, **kwargs) -> Dict[str, Union[str, int]]: ...
    async def begin_copy_model(self, model_id: str, target: Dict[str, Union[str, int]], **kwargs) -> AsyncLROPoller[CustomFormModelInfo]: ...
    async def begin_create_composed_model(self, model_ids: List[str], **kwargs) -> AsyncLROPoller[CustomFormModel]: ...

    def get_form_recognizer_client(self, **kwargs) -> AsyncFormRecognizerClient:
        """
        Get AsyncFormRecognizerClient (azure.ai.formrecognizer.aio.FormRecognizerClient) using same configuration.

        Returns:
        AsyncFormRecognizerClient instance
        """

    async def close(self) -> None:
        """Close client and release resources."""
```
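
With the async client, paged listings are consumed with `async for` instead of being awaited. The following is a minimal sketch, assuming `list_custom_models()` returns an async iterable of objects with a `model_id` attribute; the helper name `collect_model_ids` is hypothetical.

```python
async def collect_model_ids(async_training_client):
    """Return the IDs of all custom models visible to the async client.

    Hypothetical helper: the client only needs list_custom_models().
    """
    # The call itself is not awaited; the returned pager is iterated
    # asynchronously, fetching pages on demand.
    return [
        info.model_id
        async for info in async_training_client.list_custom_models()
    ]
```

A caller would typically run this inside `async with` on the client so that the underlying transport is closed afterwards.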

## DocumentModelAdministrationClient

```python { .api }
class DocumentModelAdministrationClient:
    """
    Client for building and managing models using Document Intelligence API 2022-08-31 and later.
    """

    def __init__(
        self,
        endpoint: str,
        credential: Union[AzureKeyCredential, TokenCredential],
        **kwargs
    ):
        """
        Initialize DocumentModelAdministrationClient.

        Parameters:
        - endpoint: Cognitive Services endpoint URL
        - credential: Authentication credential
        - api_version: API version (default: DocumentAnalysisApiVersion.V2023_07_31)
        """

    def get_document_analysis_client(self, **kwargs) -> DocumentAnalysisClient:
        """
        Get DocumentAnalysisClient using same configuration.

        Returns:
        DocumentAnalysisClient instance
        """

    def close(self) -> None:
        """Close client and release resources."""

# Async version
# The package exports this class as azure.ai.formrecognizer.aio.DocumentModelAdministrationClient;
# the "Async" prefix is used here only to distinguish it from the sync client.
class AsyncDocumentModelAdministrationClient:
    """
    Async client for building and managing models using Document Intelligence API 2022-08-31 and later.

    Provides the same methods as DocumentModelAdministrationClient but with async/await support.
    """

    def __init__(
        self,
        endpoint: str,
        credential: Union[AzureKeyCredential, AsyncTokenCredential],
        **kwargs
    ):
        """
        Initialize AsyncDocumentModelAdministrationClient.

        Parameters:
        - endpoint: Cognitive Services endpoint URL
        - credential: Authentication credential (must support async operations)
        - api_version: API version (default: DocumentAnalysisApiVersion.V2023_07_31)
        """

    async def begin_build_document_model(self, build_mode: Union[str, ModelBuildMode], **kwargs) -> AsyncDocumentModelAdministrationLROPoller[DocumentModelDetails]: ...
    async def begin_compose_document_model(self, component_model_ids: List[str], **kwargs) -> AsyncDocumentModelAdministrationLROPoller[DocumentModelDetails]: ...
    async def get_copy_authorization(self, **kwargs) -> TargetAuthorization: ...
    async def begin_copy_document_model_to(self, model_id: str, target: TargetAuthorization, **kwargs) -> AsyncDocumentModelAdministrationLROPoller[DocumentModelDetails]: ...
    async def delete_document_model(self, model_id: str, **kwargs) -> None: ...
    # Listing methods are plain methods that return async iterables (use "async for")
    def list_document_models(self, **kwargs) -> AsyncItemPaged[DocumentModelSummary]: ...
    async def get_resource_details(self, **kwargs) -> ResourceDetails: ...
    async def get_document_model(self, model_id: str, **kwargs) -> DocumentModelDetails: ...
    def list_operations(self, **kwargs) -> AsyncItemPaged[OperationSummary]: ...
    async def get_operation(self, operation_id: str, **kwargs) -> OperationDetails: ...
    async def begin_build_document_classifier(self, doc_types: Mapping[str, ClassifierDocumentTypeDetails], **kwargs) -> AsyncDocumentModelAdministrationLROPoller[DocumentClassifierDetails]: ...
    async def get_document_classifier(self, classifier_id: str, **kwargs) -> DocumentClassifierDetails: ...
    def list_document_classifiers(self, **kwargs) -> AsyncItemPaged[DocumentClassifierDetails]: ...
    async def delete_document_classifier(self, classifier_id: str, **kwargs) -> None: ...

    def get_document_analysis_client(self, **kwargs) -> AsyncDocumentAnalysisClient:
        """
        Get AsyncDocumentAnalysisClient (azure.ai.formrecognizer.aio.DocumentAnalysisClient) using same configuration.

        Returns:
        AsyncDocumentAnalysisClient instance
        """

    async def close(self) -> None:
        """Close client and release resources."""
```
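
Long-running operations on the async client need two awaits: one to start the operation and obtain the poller, and one to wait for the result. The following is a minimal sketch of that pattern; the helper name `build_model` and its `container_url` parameter are hypothetical, and the client is assumed to behave like the async `DocumentModelAdministrationClient`.

```python
async def build_model(async_admin_client, container_url, build_mode="template"):
    """Build a document model with the async client and return its details.

    Hypothetical helper around begin_build_document_model.
    """
    # First await: submit the build request and receive an async poller.
    poller = await async_admin_client.begin_build_document_model(
        build_mode, blob_container_url=container_url
    )
    # Second await: wait until the service finishes building the model.
    return await poller.result()
```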

## Training Data Requirements

### Blob Storage Structure

```
container/
├── training/
│   ├── document1.pdf
│   ├── document2.pdf
│   ├── document3.pdf
│   └── ...
└── labels/  # For supervised training
    ├── document1.pdf.labels.json
    ├── document2.pdf.labels.json
    └── ...
```

### Label Format (Legacy API)

```json
{
  "document": "document1.pdf",
  "labels": [
    {
      "label": "VendorName",
      "key": null,
      "value": [
        {
          "page": 1,
          "text": "Contoso Inc",
          "boundingBoxes": [
            [100, 200, 300, 200, 300, 250, 100, 250]
          ]
        }
      ]
    }
  ]
}
```
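
Label files in the shape shown above can be generated programmatically. The following is a minimal sketch using only the standard library; the function name `label_file_json` and the `(page, text, bounding_box)` tuple layout of its input are assumptions made for this example, and the output mirrors only the fields that appear in the sample above.

```python
import json


def label_file_json(document, fields):
    """Serialize labels for one document in the legacy label format.

    `fields` maps a label name to (page, text, bounding_box), where
    bounding_box holds 8 numbers: x, y for each of the four corners.
    """
    labels = []
    for label, (page, text, box) in fields.items():
        if len(box) != 8:
            raise ValueError(f"{label}: bounding box needs 8 numbers, got {len(box)}")
        labels.append({
            "label": label,
            "key": None,  # serialized as JSON null
            "value": [{"page": page, "text": text, "boundingBoxes": [list(box)]}],
        })
    return json.dumps({"document": document, "labels": labels}, indent=2)
```

For example, `label_file_json("document1.pdf", {"VendorName": (1, "Contoso Inc", [100, 200, 300, 200, 300, 250, 100, 250])})` produces JSON equivalent to the sample above.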

### Modern API Training Data

```json
{
  "fields": {
    "VendorName": {
      "type": "string",
      "valueString": "Contoso Inc"
    },
    "InvoiceTotal": {
      "type": "number",
      "valueNumber": 1234.56
    }
  },
  "boundingRegions": [
    {
      "pageNumber": 1,
      "polygon": [100, 200, 300, 200, 300, 250, 100, 250]
    }
  ]
}
```

## Error Handling

Both clients raise `azure.core.exceptions.HttpResponseError` when a request or long-running operation fails. `FormRecognizerError` and `DocumentAnalysisError` are models that describe error details; they are not exception classes and cannot be used in an `except` clause.

```python { .api }
from azure.core.exceptions import HttpResponseError

# Legacy API errors
try:
    poller = training_client.begin_training(training_url, True)
    model = poller.result()
except HttpResponseError as e:
    print(f"Training failed: {e.status_code} - {e.message}")

# Modern API errors
try:
    poller = admin_client.begin_build_document_model(ModelBuildMode.NEURAL, blob_container_url=training_url)
    model = poller.result()
except HttpResponseError as e:
    print(f"Model building failed: {e.message}")
    # e.error carries the structured service error when one is available
    if e.error is not None:
        print(f"Error code: {e.error.code}")
        if e.error.innererror:
            print(f"Inner error: {e.error.innererror}")
```
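
When the same reporting is needed in several places, the attribute checks can be folded into one function. The following is a minimal sketch that relies only on duck typing: it looks for an optional `error` attribute with a `code`, and a `message` attribute, as found on `HttpResponseError`; the helper name `describe_error` is hypothetical.

```python
def describe_error(exc) -> str:
    """Return a one-line description of a failed service call.

    Hypothetical helper: works with any exception, using the structured
    error code when the exception carries one.
    """
    error = getattr(exc, "error", None)
    code = getattr(error, "code", None) if error is not None else None
    # Fall back to str(exc) when the exception has no message attribute
    message = getattr(exc, "message", None) or str(exc)
    return f"{code}: {message}" if code else message
```

A call site then reduces to `except HttpResponseError as e: print(describe_error(e))`.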