# Document Analysis Operations

Core document processing functionality for analyzing single documents, processing batches, and classifying documents. These operations support both prebuilt models (layout, invoice, receipt, etc.) and custom models, with advanced features such as high-resolution OCR, language detection, and structured data extraction.

## Capabilities

### Single Document Analysis

Analyzes an individual document with a specified model to extract text, tables, key-value pairs, and structured data. Returns an enhanced long-running operation (LRO) poller that exposes operation metadata.

```python { .api }
def begin_analyze_document(
    model_id: str,
    body: Union[AnalyzeDocumentRequest, JSON, IO[bytes]],
    *,
    pages: Optional[str] = None,
    locale: Optional[str] = None,
    string_index_type: Optional[Union[str, StringIndexType]] = None,
    features: Optional[List[Union[str, DocumentAnalysisFeature]]] = None,
    query_fields: Optional[List[str]] = None,
    output_content_format: Optional[Union[str, DocumentContentFormat]] = None,
    output: Optional[List[Union[str, AnalyzeOutputOption]]] = None,
    **kwargs: Any
) -> AnalyzeDocumentLROPoller[AnalyzeResult]:
    """
    Analyzes a document with the specified model.

    Parameters:
    - model_id (str): Model ID for analysis (e.g., "prebuilt-layout", "prebuilt-invoice")
    - body: Document data as AnalyzeDocumentRequest, JSON dict, or file bytes
    - pages (str, optional): Page range specification (e.g., "1-3,5")
    - locale (str, optional): Locale hint for better recognition
    - string_index_type (StringIndexType, optional): Character indexing scheme
    - features (List[DocumentAnalysisFeature], optional): Additional features to enable
    - query_fields (List[str], optional): Custom field extraction queries
    - output_content_format (DocumentContentFormat, optional): Content format (text/markdown)
    - output (List[AnalyzeOutputOption], optional): Additional outputs (pdf/figures)

    Returns:
    AnalyzeDocumentLROPoller[AnalyzeResult]: Enhanced poller with operation metadata
    """
```

Usage example:

```python
# Assumes `client` is an authenticated DocumentIntelligenceClient.

# Analyze with file upload
with open("document.pdf", "rb") as f:
    poller = client.begin_analyze_document(
        model_id="prebuilt-layout",
        body=f,
        features=["languages", "barcodes"],
        output_content_format="markdown"
    )
    result = poller.result()

# Access operation metadata
operation_id = poller.details["operation_id"]

# Analyze with custom fields
with open("invoice.pdf", "rb") as f:
    poller = client.begin_analyze_document(
        "prebuilt-invoice",
        f,
        # Query fields are an add-on capability; enabling the "queryFields"
        # feature alongside them is assumed to be required by the service.
        features=["queryFields"],
        query_fields=["Tax ID", "Purchase Order"]
    )
    result = poller.result()
```
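
The `pages` parameter takes a compact range string such as `"1-3,5"`. As a minimal sketch of how such a string can be validated and expanded client-side before a request is sent, the hypothetical helper `expand_pages` below assumes the grammar is a comma-separated list of 1-based page numbers and inclusive `start-end` ranges; it is not part of the SDK.

```python
import re
from typing import List, Set

# Assumed grammar: each item is a page number or an inclusive "start-end" range.
_PAGE_ITEM = re.compile(r"^(\d+)(?:-(\d+))?$")


def expand_pages(spec: str) -> List[int]:
    """Expand a page range string such as "1-3,5" into sorted, unique page numbers."""
    pages: Set[int] = set()
    for item in spec.split(","):
        match = _PAGE_ITEM.match(item.strip())
        if match is None:
            raise ValueError(f"Invalid page item: {item!r}")
        start = int(match.group(1))
        end = int(match.group(2)) if match.group(2) else start
        # Pages are assumed to be 1-based, and ranges must not run backwards.
        if start < 1 or end < start:
            raise ValueError(f"Invalid page range: {item!r}")
        pages.update(range(start, end + 1))
    return sorted(pages)


print(expand_pages("1-3,5"))  # [1, 2, 3, 5]
```

Validating locally like this surfaces a malformed range before the long-running operation is started.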

### Batch Document Analysis

Processes multiple documents in a single operation for efficient bulk processing. Documents are read from Azure Blob Storage, selected either from a blob container or from an explicit file list.

```python { .api }
def begin_analyze_batch_documents(
    model_id: str,
    body: Union[AnalyzeBatchDocumentsRequest, JSON, IO[bytes]],
    **kwargs: Any
) -> LROPoller[AnalyzeBatchResult]:
    """
    Analyzes multiple documents in batch.

    Parameters:
    - model_id (str): Model ID for batch analysis
    - body: Batch request with Azure Blob source configuration

    Returns:
    LROPoller[AnalyzeBatchResult]: Batch operation poller
    """
```
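
Since `body` also accepts a JSON dict, a batch request can be assembled without importing the model classes. The sketch below uses a hypothetical helper, `build_batch_request`; the camelCase keys (`azureBlobSource`, `containerUrl`, `prefix`, `resultContainerUrl`, `resultPrefix`, `overwriteExisting`) are assumed to mirror the service's wire format, and the container URLs are placeholders.

```python
from typing import Any, Dict, Optional


def build_batch_request(
    source_container_url: str,
    result_container_url: str,
    *,
    source_prefix: Optional[str] = None,
    result_prefix: Optional[str] = None,
    overwrite_existing: bool = False,
) -> Dict[str, Any]:
    """Build a JSON body for a batch analysis request (hypothetical helper)."""
    # Key names are assumed wire-format names, not confirmed by this document.
    blob_source: Dict[str, Any] = {"containerUrl": source_container_url}
    if source_prefix is not None:
        blob_source["prefix"] = source_prefix

    body: Dict[str, Any] = {
        "azureBlobSource": blob_source,
        "resultContainerUrl": result_container_url,
        "overwriteExisting": overwrite_existing,
    }
    if result_prefix is not None:
        body["resultPrefix"] = result_prefix
    return body


body = build_batch_request(
    "https://<account>.blob.core.windows.net/input",    # placeholder URL
    "https://<account>.blob.core.windows.net/results",  # placeholder URL
    source_prefix="invoices/",
    result_prefix="run-01/",
)
print(sorted(body))

# With an authenticated client (not created here):
# poller = client.begin_analyze_batch_documents("prebuilt-invoice", body)
# batch_result = poller.result()
```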

### Batch Results Management

Retrieves and manages batch processing results: list the batch operations for a model, resume an operation from a continuation token, and delete stored results.

```python { .api }
def list_analyze_batch_results(
    model_id: str,
    *,
    skip: Optional[int] = None,
    top: Optional[int] = None,
    **kwargs: Any
) -> Iterable[AnalyzeBatchOperation]:
    """
    Lists batch analysis operations for the specified model.

    Parameters:
    - model_id (str): Model ID to filter operations
    - skip (int, optional): Number of operations to skip
    - top (int, optional): Maximum operations to return

    Returns:
    Iterable[AnalyzeBatchOperation]: Paginated batch operations
    """

def get_analyze_batch_result(
    continuation_token: str,
    **kwargs: Any
) -> LROPoller[AnalyzeBatchResult]:
    """
    Resumes a batch analysis operation from a continuation token.

    Parameters:
    - continuation_token (str): Continuation token for resuming batch operation

    Returns:
    LROPoller[AnalyzeBatchResult]: Batch operation poller
    """

def delete_analyze_batch_result(
    model_id: str,
    result_id: str,
    **kwargs: Any
) -> None:
    """
    Deletes a batch analysis result.

    Parameters:
    - model_id (str): Model ID used for analysis
    - result_id (str): Batch operation result ID to delete
    """
```
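
A listing of batch operations is usually reduced to a status overview. The following sketch defines a hypothetical helper, `summarize_batch_operations`, that reads only the `AnalyzeBatchOperation` and `AnalyzeBatchResult` fields documented in this file. It is exercised here with stand-in objects rather than a live service.

```python
from types import SimpleNamespace
from typing import Any, Dict, Iterable, List


def summarize_batch_operations(operations: Iterable[Any]) -> List[Dict[str, Any]]:
    """Reduce batch operations to plain dicts for logging or display."""
    rows: List[Dict[str, Any]] = []
    for op in operations:
        result = op.result  # None until the operation has produced a result
        rows.append({
            "operation_id": op.operation_id,
            "status": op.status,
            "percent_completed": op.percent_completed,
            "succeeded": result.succeeded_count if result else None,
            "failed": result.failed_count if result else None,
        })
    return rows


# Stand-in objects shaped like AnalyzeBatchOperation, for illustration only.
fake_operations = [
    SimpleNamespace(
        operation_id="op-1", status="succeeded", percent_completed=100,
        result=SimpleNamespace(succeeded_count=3, failed_count=0, skipped_count=0),
    ),
    SimpleNamespace(
        operation_id="op-2", status="running", percent_completed=40, result=None,
    ),
]
summary = summarize_batch_operations(fake_operations)
for row in summary:
    print(row)

# With an authenticated client (not created here):
# summary = summarize_batch_operations(
#     client.list_analyze_batch_results("prebuilt-layout")
# )
```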

### Document Classification

Classifies documents using trained classifiers to automatically determine document types and route processing workflows.

```python { .api }
def begin_classify_document(
    classifier_id: str,
    body: Union[ClassifyDocumentRequest, JSON, IO[bytes]],
    *,
    string_index_type: Optional[Union[str, StringIndexType]] = None,
    split_mode: Optional[Union[str, SplitMode]] = None,
    pages: Optional[str] = None,
    **kwargs: Any
) -> LROPoller[AnalyzeResult]:
    """
    Classifies a document using the specified classifier.

    Parameters:
    - classifier_id (str): Document classifier ID
    - body: Document data as ClassifyDocumentRequest, JSON dict, or file bytes
    - string_index_type (StringIndexType, optional): Character indexing scheme
    - split_mode (SplitMode, optional): Document splitting behavior
    - pages (str, optional): Page range specification

    Returns:
    LROPoller[AnalyzeResult]: Classification result poller
    """
```
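
To illustrate how a classification result might feed a routing step, the sketch below groups the entries of `AnalyzeResult.documents` by type. It assumes each `AnalyzedDocument` exposes `doc_type` and `confidence` attributes, which this file does not list. The helper name `group_by_doc_type`, the `"unknown"` bucket, and the threshold are hypothetical.

```python
from collections import defaultdict
from types import SimpleNamespace
from typing import Any, Dict, List


def group_by_doc_type(result: Any, min_confidence: float = 0.5) -> Dict[str, List[Any]]:
    """Group classified documents by type; low-confidence ones go to "unknown"."""
    groups: Dict[str, List[Any]] = defaultdict(list)
    for doc in result.documents or []:  # documents is Optional on AnalyzeResult
        confidence = doc.confidence if doc.confidence is not None else 0.0
        key = doc.doc_type if confidence >= min_confidence else "unknown"
        groups[key].append(doc)
    return dict(groups)


# Stand-in result shaped like AnalyzeResult, for illustration only.
fake_result = SimpleNamespace(documents=[
    SimpleNamespace(doc_type="invoice", confidence=0.93),
    SimpleNamespace(doc_type="receipt", confidence=0.31),
    SimpleNamespace(doc_type="invoice", confidence=0.88),
])
groups = group_by_doc_type(fake_result)
print({name: len(docs) for name, docs in groups.items()})
# {'invoice': 2, 'unknown': 1}

# With an authenticated client (not created here):
# result = client.begin_classify_document("my-classifier", body).result()
```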

### Analysis Result Retrieval

Retrieves analysis outputs as searchable PDFs or extracted figure images, and deletes stored analysis results.

```python { .api }
def get_analyze_result_pdf(
    model_id: str,
    result_id: str,
    **kwargs: Any
) -> Iterator[bytes]:
    """
    Gets the analysis result as a searchable PDF.

    Parameters:
    - model_id (str): Model ID used for analysis
    - result_id (str): Analysis result ID

    Returns:
    Iterator[bytes]: PDF content stream
    """

def get_analyze_result_figure(
    model_id: str,
    result_id: str,
    figure_id: str,
    **kwargs: Any
) -> Iterator[bytes]:
    """
    Gets an extracted figure as an image.

    Parameters:
    - model_id (str): Model ID used for analysis
    - result_id (str): Analysis result ID
    - figure_id (str): Figure identifier

    Returns:
    Iterator[bytes]: Image content stream
    """

def delete_analyze_result(
    model_id: str,
    result_id: str,
    **kwargs: Any
) -> None:
    """
    Deletes an analysis result.

    Parameters:
    - model_id (str): Model ID used for analysis
    - result_id (str): Analysis result ID to delete
    """
```
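
Both retrieval methods return an `Iterator[bytes]`, so the content has to be consumed chunk by chunk. A minimal sketch of writing such a stream to disk follows; `save_stream` is a hypothetical helper, and the commented usage assumes that the analysis was started with the matching `output` option and that the poller's `operation_id` can serve as `result_id`.

```python
import os
import tempfile
from typing import Iterable


def save_stream(chunks: Iterable[bytes], path: str) -> int:
    """Write an iterable of byte chunks to `path` and return the bytes written."""
    total = 0
    with open(path, "wb") as out:
        for chunk in chunks:
            out.write(chunk)
            total += len(chunk)
    return total


# Stand-in chunks, for illustration only.
demo_path = os.path.join(tempfile.mkdtemp(), "analyzed.pdf")
written = save_stream([b"%PDF-", b"1.7\n", b"..."], demo_path)
print(written)  # 12

# With an authenticated client (not created here):
# poller = client.begin_analyze_document("prebuilt-read", body, output=["pdf"])
# poller.result()
# save_stream(
#     client.get_analyze_result_pdf("prebuilt-read", poller.details["operation_id"]),
#     "analyzed.pdf",
# )
```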

## Request Types

```python { .api }
class AnalyzeDocumentRequest:
    """Request for single document analysis."""
    url_source: Optional[str]
    base64_source: Optional[str]
    pages: Optional[str]
    locale: Optional[str]
    string_index_type: Optional[StringIndexType]
    features: Optional[List[DocumentAnalysisFeature]]
    query_fields: Optional[List[str]]
    output_content_format: Optional[DocumentContentFormat]
    output: Optional[List[AnalyzeOutputOption]]

class AnalyzeBatchDocumentsRequest:
    """Request for batch document analysis."""
    azure_blob_source: Optional[AzureBlobContentSource]
    azure_blob_file_list_source: Optional[AzureBlobFileListContentSource]
    result_container_url: str
    result_prefix: Optional[str]
    overwrite_existing: Optional[bool]
    pages: Optional[str]
    locale: Optional[str]
    string_index_type: Optional[StringIndexType]
    features: Optional[List[DocumentAnalysisFeature]]
    query_fields: Optional[List[str]]
    output_content_format: Optional[DocumentContentFormat]
    output: Optional[List[AnalyzeOutputOption]]

class ClassifyDocumentRequest:
    """Request for document classification."""
    url_source: Optional[str]
    base64_source: Optional[str]
    pages: Optional[str]
    string_index_type: Optional[StringIndexType]
    split_mode: Optional[SplitMode]
```
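
`AnalyzeDocumentRequest` and `ClassifyDocumentRequest` each carry both a `url_source` and a `base64_source` field. The sketch below builds the equivalent JSON dict while enforcing that exactly one source is supplied, which is assumed here to be what the service expects. The helper `make_source_body` is hypothetical, and the camelCase keys `urlSource` and `base64Source` are assumed wire-format names.

```python
import base64
from typing import Any, Dict, Optional


def make_source_body(
    *, url: Optional[str] = None, data: Optional[bytes] = None
) -> Dict[str, Any]:
    """Build a JSON request body from either a document URL or raw bytes."""
    # Exactly one source is allowed: reject both-set and neither-set.
    if (url is None) == (data is None):
        raise ValueError("Provide exactly one of `url` or `data`.")
    if url is not None:
        return {"urlSource": url}
    # JSON cannot carry raw bytes, so the content is base64-encoded as text.
    return {"base64Source": base64.b64encode(data).decode("ascii")}


print(make_source_body(url="https://example.com/invoice.pdf"))
print(make_source_body(data=b"hello"))  # {'base64Source': 'aGVsbG8='}
```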

## Response Types

```python { .api }
class AnalyzeResult:
    """Main analysis result containing extracted content and metadata."""
    api_version: Optional[str]
    model_id: str
    string_index_type: Optional[StringIndexType]
    content: Optional[str]
    pages: Optional[List[DocumentPage]]
    paragraphs: Optional[List[DocumentParagraph]]
    tables: Optional[List[DocumentTable]]
    figures: Optional[List[DocumentFigure]]
    sections: Optional[List[DocumentSection]]
    key_value_pairs: Optional[List[DocumentKeyValuePair]]
    styles: Optional[List[DocumentStyle]]
    languages: Optional[List[DocumentLanguage]]
    documents: Optional[List[AnalyzedDocument]]
    warnings: Optional[List[DocumentIntelligenceWarning]]

class AnalyzeBatchResult:
    """Results from batch document analysis."""
    succeeded_count: int
    failed_count: int
    skipped_count: int
    details: List[AnalyzeBatchOperationDetail]

class AnalyzeBatchOperation:
    """Batch operation metadata and status."""
    operation_id: str
    status: DocumentIntelligenceOperationStatus
    created_date_time: datetime
    last_updated_date_time: datetime
    percent_completed: Optional[int]
    result: Optional[AnalyzeBatchResult]
    error: Optional[DocumentIntelligenceError]
```
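
Most `AnalyzeResult` collections are optional lists of nested objects, so consumers typically flatten them before use. As one hedged example, the sketch below turns a table into a row-major grid. It assumes `DocumentTable` exposes `row_count`, `column_count`, and `cells`, and that each cell has `row_index`, `column_index`, and `content`; none of these attributes are listed in this file. Merged cells (row or column spans) are ignored.

```python
from types import SimpleNamespace
from typing import Any, List


def table_to_rows(table: Any) -> List[List[str]]:
    """Place each cell's text at its (row, column) position in a 2D grid."""
    grid = [["" for _ in range(table.column_count)] for _ in range(table.row_count)]
    for cell in table.cells:
        grid[cell.row_index][cell.column_index] = cell.content
    return grid


# Stand-in table shaped like the assumed DocumentTable, for illustration only.
fake_table = SimpleNamespace(
    row_count=2,
    column_count=2,
    cells=[
        SimpleNamespace(row_index=0, column_index=0, content="Item"),
        SimpleNamespace(row_index=0, column_index=1, content="Qty"),
        SimpleNamespace(row_index=1, column_index=0, content="Widget"),
        SimpleNamespace(row_index=1, column_index=1, content="2"),
    ],
)
rows = table_to_rows(fake_table)
print(rows)  # [['Item', 'Qty'], ['Widget', '2']]

# With a real result: for table in result.tables or []: table_to_rows(table)
```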

## Enhanced LRO Poller

```python { .api }
class AnalyzeDocumentLROPoller(LROPoller[AnalyzeResult]):
    """Enhanced poller for document analysis operations."""

    @property
    def details(self) -> Dict[str, Any]:
        """
        Returns operation metadata including operation_id.

        Returns:
        Dict containing operation_id extracted from Operation-Location header
        """

    @classmethod
    def from_continuation_token(
        cls,
        polling_method: PollingMethod,
        continuation_token: str,
        **kwargs: Any
    ) -> "AnalyzeDocumentLROPoller[AnalyzeResult]":
        """Resume operation from continuation token."""
```
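
The `details` docstring says the `operation_id` is extracted from the `Operation-Location` header. The sketch below shows one way such an extraction could work. It is not the SDK's implementation, and it assumes the header holds a URL whose last path segment is the operation ID. The endpoint and ID in the example are made up.

```python
from urllib.parse import urlparse


def operation_id_from_location(operation_location: str) -> str:
    """Return the last path segment of an Operation-Location style URL."""
    # urlparse separates the query string, so "?api-version=..." is dropped.
    path = urlparse(operation_location).path
    segments = [segment for segment in path.split("/") if segment]
    if not segments:
        raise ValueError(f"No path segments in {operation_location!r}")
    return segments[-1]


# Assumed URL shape, for illustration only.
location = (
    "https://example.cognitiveservices.azure.com/documentintelligence/"
    "documentModels/prebuilt-layout/analyzeResults/"
    "3b31320d-8bab-4f88-b19c-2322a7f11034?api-version=2024-11-30"
)
print(operation_id_from_location(location))
# 3b31320d-8bab-4f88-b19c-2322a7f11034
```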

## Client Utility Methods

```python { .api }
def send_request(
    request: HttpRequest,
    *,
    stream: bool = False,
    **kwargs: Any
) -> HttpResponse:
    """
    Sends a custom HTTP request using the client's pipeline.

    Parameters:
    - request (HttpRequest): HTTP request to send
    - stream (bool): Whether to stream the response

    Returns:
    HttpResponse: Raw HTTP response
    """

def close() -> None:
    """Close the client and release resources."""
```
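
Because the client exposes `close()`, its lifetime can be scoped with the standard library's `contextlib.closing`, which calls `close()` even when the body raises. The sketch below demonstrates the pattern with a hypothetical stand-in class, `FakeClient`, rather than the real client.

```python
from contextlib import closing


class FakeClient:
    """Stand-in for any object exposing close(), for illustration only."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


client = FakeClient()
try:
    with closing(client) as active_client:
        # Work with the client here; an error must not leak the connection.
        raise RuntimeError("simulated failure during analysis")
except RuntimeError:
    pass

print(client.closed)  # True
```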