# Azure AI Document Intelligence


A Python client library for the Azure AI Document Intelligence service. It supports document analysis, custom model management, and document classification. The service uses machine learning to extract text, key-value pairs, tables, structures, and custom fields from documents in formats including PDFs, images, and Office documents.


## Package Information

- **Package Name**: azure-ai-documentintelligence
- **Package Type**: pypi
- **Language**: Python
- **Installation**: `pip install azure-ai-documentintelligence`
- **Version**: 1.0.2
- **API Version**: 2024-11-30

## Core Imports

```python
from azure.ai.documentintelligence import (
    DocumentIntelligenceClient,
    DocumentIntelligenceAdministrationClient,
    AnalyzeDocumentLROPoller
)
```

Async clients:

```python
from azure.ai.documentintelligence.aio import (
    DocumentIntelligenceClient,
    DocumentIntelligenceAdministrationClient
)
```

Authentication:

```python
from azure.core.credentials import AzureKeyCredential, TokenCredential
```

## Basic Usage

```python
from azure.ai.documentintelligence import DocumentIntelligenceClient
from azure.core.credentials import AzureKeyCredential

# Initialize client with endpoint and API key
client = DocumentIntelligenceClient(
    endpoint="https://your-resource.cognitiveservices.azure.com/",
    credential=AzureKeyCredential("your-api-key")
)

# Analyze a document with prebuilt layout model
with open("invoice.pdf", "rb") as document:
    poller = client.begin_analyze_document("prebuilt-layout", document)
result = poller.result()

# Access extracted content
print(f"Content: {result.content}")

# Access extracted tables
for table in result.tables or []:
    print(f"Table with {table.row_count} rows and {table.column_count} columns")
    for cell in table.cells:
        print(f"Cell [{cell.row_index}][{cell.column_index}]: {cell.content}")

# Build custom model (administration client)
from azure.ai.documentintelligence import DocumentIntelligenceAdministrationClient
from azure.ai.documentintelligence.models import BuildDocumentModelRequest, AzureBlobContentSource

admin_client = DocumentIntelligenceAdministrationClient(
    endpoint="https://your-resource.cognitiveservices.azure.com/",
    credential=AzureKeyCredential("your-api-key")
)

# Build a custom model from training data stored in an Azure Blob container
build_request = BuildDocumentModelRequest(
    model_id="my-custom-model",
    build_mode="neural",
    azure_blob_source=AzureBlobContentSource(
        container_url="https://account.blob.core.windows.net/container"
    )
)

poller = admin_client.begin_build_document_model(build_request)
model = poller.result()
print(f"Model built: {model.model_id}")
```
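
The table loop above prints cells one at a time. As a minimal sketch, the same attributes (`row_count`, `column_count`, and each cell's `row_index`, `column_index`, `content`) can be used to arrange the flat cell list into a row-major grid. The helper name `table_to_grid` and the `SimpleNamespace` stand-in objects are hypothetical and exist only so the example runs without a service call; merged cells that span rows or columns are not handled here.

```python
from types import SimpleNamespace
from typing import Any, List


def table_to_grid(table: Any) -> List[List[str]]:
    """Arrange a table's flat cell list into a row-major grid of strings."""
    grid = [["" for _ in range(table.column_count)] for _ in range(table.row_count)]
    for cell in table.cells:
        # Each cell carries its own coordinates, so input order does not matter.
        grid[cell.row_index][cell.column_index] = cell.content or ""
    return grid


# Stand-in object for illustration only; a real table would come from result.tables.
sample_table = SimpleNamespace(
    row_count=2,
    column_count=2,
    cells=[
        SimpleNamespace(row_index=0, column_index=0, content="Item"),
        SimpleNamespace(row_index=0, column_index=1, content="Price"),
        SimpleNamespace(row_index=1, column_index=0, content="Widget"),
        SimpleNamespace(row_index=1, column_index=1, content="9.99"),
    ],
)

for row in table_to_grid(sample_table):
    print(" | ".join(row))
```

With a real analysis result, the helper would be called once per entry in `result.tables or []`.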


## Architecture


The Azure AI Document Intelligence SDK is organized around several key components:

- **DocumentIntelligenceClient**: Main client for document analysis operations including single document analysis, batch processing, and document classification
- **DocumentIntelligenceAdministrationClient**: Management client for custom models, classifiers, and service operations
- **Async Clients**: Async/await counterparts of both clients in the `aio` module, exposing the same operations
- **Custom LRO Poller**: `AnalyzeDocumentLROPoller`, a long-running-operation poller that also exposes operation metadata
- **Rich Type System**: Models for analysis results, document structures, and configuration options

Both clients accept either an API key (`AzureKeyCredential`) or a Microsoft Entra ID (formerly Azure Active Directory) token credential, and provide options for customizing document processing features.
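
One way to switch between the two authentication methods is to read the connection settings from the environment and fall back to a token credential when no key is set. This is a sketch under stated assumptions: the environment variable names, the `Settings` type, and the helpers `load_settings` and `build_client` are hypothetical, and the token path assumes the separate `azure-identity` package is installed.

```python
import os
from typing import Mapping, NamedTuple, Optional


class Settings(NamedTuple):
    endpoint: str
    api_key: Optional[str]  # None means "use a token credential instead"


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read connection settings; the variable names are assumptions for this sketch."""
    endpoint = env.get("DOCUMENTINTELLIGENCE_ENDPOINT", "").strip()
    if not endpoint.startswith("https://"):
        raise ValueError("DOCUMENTINTELLIGENCE_ENDPOINT must be an https:// URL")
    api_key = env.get("DOCUMENTINTELLIGENCE_API_KEY") or None
    return Settings(endpoint=endpoint, api_key=api_key)


def build_client(settings: Settings):
    """Create a DocumentIntelligenceClient with whichever credential is configured."""
    # Imported lazily so load_settings can be used without the SDK installed.
    from azure.ai.documentintelligence import DocumentIntelligenceClient
    from azure.core.credentials import AzureKeyCredential

    if settings.api_key:
        credential = AzureKeyCredential(settings.api_key)
    else:
        # Assumption: azure-identity is installed for token-based authentication.
        from azure.identity import DefaultAzureCredential
        credential = DefaultAzureCredential()
    return DocumentIntelligenceClient(endpoint=settings.endpoint, credential=credential)


# Demonstrate the settings step only; build_client needs the SDK and a real resource.
demo = load_settings({"DOCUMENTINTELLIGENCE_ENDPOINT": "https://your-resource.cognitiveservices.azure.com/"})
print("key auth" if demo.api_key else "token auth")
```

Keeping the credential choice in one place means the rest of the code can take a ready-made client and stay unaware of how it was authenticated.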


## Capabilities


### Document Analysis Operations


Core document processing functionality including single document analysis, batch operations, result retrieval, and resource management. Supports prebuilt models and custom models with advanced features like high-resolution OCR, language detection, and structured data extraction.

```python { .api }
def begin_analyze_document(
    model_id: str,
    body: Union[AnalyzeDocumentRequest, JSON, IO[bytes]],
    **kwargs
) -> AnalyzeDocumentLROPoller[AnalyzeResult]: ...

def begin_analyze_batch_documents(
    model_id: str,
    body: Union[AnalyzeBatchDocumentsRequest, JSON, IO[bytes]],
    **kwargs
) -> LROPoller[AnalyzeBatchResult]: ...

def begin_classify_document(
    classifier_id: str,
    body: Union[ClassifyDocumentRequest, JSON, IO[bytes]],
    **kwargs
) -> LROPoller[AnalyzeResult]: ...

def get_analyze_result_pdf(
    model_id: str, result_id: str, **kwargs
) -> Iterator[bytes]: ...

def get_analyze_result_figure(
    model_id: str, result_id: str, figure_id: str, **kwargs
) -> Iterator[bytes]: ...
```

[Document Analysis Operations](./document-analysis.md)
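
`get_analyze_result_pdf` and `get_analyze_result_figure` both return `Iterator[bytes]`, so the content arrives in chunks rather than as a single buffer. A minimal sketch of writing such an iterator to disk follows; the helper name `save_stream` is hypothetical, the demo chunks are stand-ins rather than real service output, and obtaining a valid `result_id` is outside the scope of this sketch.

```python
import os
import tempfile
from typing import Iterable


def save_stream(chunks: Iterable[bytes], path: str) -> int:
    """Write an iterable of byte chunks to path and return the number of bytes written."""
    written = 0
    with open(path, "wb") as out:
        for chunk in chunks:
            out.write(chunk)
            written += len(chunk)
    return written


# Stand-in chunks for illustration; with the SDK they would come from
# client.get_analyze_result_pdf(model_id, result_id).
demo_path = os.path.join(tempfile.mkdtemp(), "analyze_result.pdf")
size = save_stream([b"%PDF-1.7\n", b"...demo bytes..."], demo_path)
print(f"Wrote {size} bytes to {demo_path}")
```

Writing chunk by chunk avoids holding the whole file in memory, which matters for large searchable PDFs.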


### Model Management Operations


Custom model lifecycle management including building, composing, copying, and managing document models. Supports both template and neural training modes with comprehensive model metadata, operation tracking, and resource management.

```python { .api }
def begin_build_document_model(
    body: Union[BuildDocumentModelRequest, JSON, IO[bytes]],
    **kwargs
) -> LROPoller[DocumentModelDetails]: ...

def begin_compose_model(
    body: Union[ComposeDocumentModelRequest, JSON, IO[bytes]],
    **kwargs
) -> LROPoller[DocumentModelDetails]: ...

def begin_copy_model_to(
    model_id: str,
    body: Union[ModelCopyAuthorization, JSON, IO[bytes]],
    **kwargs
) -> LROPoller[DocumentModelDetails]: ...

def authorize_model_copy(
    body: Union[AuthorizeCopyRequest, JSON, IO[bytes]],
    **kwargs
) -> ModelCopyAuthorization: ...

def get_resource_details(**kwargs) -> DocumentIntelligenceResourceDetails: ...

def list_operations(**kwargs) -> Iterable[DocumentIntelligenceOperationDetails]: ...
```

[Model Management Operations](./model-management.md)
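
The signatures of `authorize_model_copy` and `begin_copy_model_to` suggest a two-step copy: the authorization returned by one call is the body of the other. The sketch below assumes the usual arrangement in which the target resource issues the authorization and the source resource performs the copy. The helper `copy_model`, the stub classes, and the stub's return shapes are hypothetical and only let the example run offline; the JSON key `modelId` is assumed to be the wire name of `AuthorizeCopyRequest.model_id`.

```python
from typing import Any


def copy_model(source_admin: Any, target_admin: Any, model_id: str, new_model_id: str) -> Any:
    """Copy model_id from the source resource into the target resource."""
    # Step 1: the target resource issues an authorization for the new model ID.
    authorization = target_admin.authorize_model_copy({"modelId": new_model_id})
    # Step 2: the source resource performs the copy using that authorization.
    poller = source_admin.begin_copy_model_to(model_id, authorization)
    return poller.result()


class _StubPoller:
    """Stand-in for a long-running-operation poller; not part of the SDK."""

    def __init__(self, value):
        self._value = value

    def result(self):
        return self._value


class _StubAdmin:
    """Stand-in administration client that records which calls it received."""

    def __init__(self):
        self.calls = []

    def authorize_model_copy(self, body):
        self.calls.append("authorize")
        return {"targetModelId": body["modelId"]}

    def begin_copy_model_to(self, model_id, body):
        self.calls.append("copy")
        return _StubPoller({"modelId": body["targetModelId"]})


source, target = _StubAdmin(), _StubAdmin()
copied = copy_model(source, target, "my-custom-model", "my-custom-model-copy")
print(copied["modelId"], source.calls, target.calls)
```

With real clients, `source_admin` and `target_admin` would be two `DocumentIntelligenceAdministrationClient` instances pointing at different resources.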


### Classifier Management Operations


Document classifier lifecycle management for automated document type classification: building, copying, and managing custom classifiers, with support for multi-class document routing.

```python { .api }
def begin_build_classifier(
    body: Union[BuildDocumentClassifierRequest, JSON, IO[bytes]],
    **kwargs
) -> LROPoller[DocumentClassifierDetails]: ...

def begin_copy_classifier_to(
    classifier_id: str,
    body: Union[ClassifierCopyAuthorization, JSON, IO[bytes]],
    **kwargs
) -> LROPoller[DocumentClassifierDetails]: ...

def authorize_classifier_copy(
    body: Union[AuthorizeClassifierCopyRequest, JSON, IO[bytes]],
    **kwargs
) -> ClassifierCopyAuthorization: ...

def get_classifier(classifier_id: str, **kwargs) -> DocumentClassifierDetails: ...

def list_classifiers(**kwargs) -> Iterable[DocumentClassifierDetails]: ...
```

[Classifier Management Operations](./classifier-management.md)
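
Multi-class routing usually means grouping classified documents by their predicted type and setting aside low-confidence predictions. The sketch below assumes each entry in a classification result's `documents` list exposes `doc_type` and `confidence` attributes. The helper `route_by_type`, the `needs-review` bucket name, and the 0.8 threshold are hypothetical choices for illustration, and the sample documents are stand-ins.

```python
from collections import defaultdict
from types import SimpleNamespace
from typing import Any, Dict, Iterable, List, Optional


def route_by_type(documents: Optional[Iterable[Any]], min_confidence: float = 0.8) -> Dict[str, List[Any]]:
    """Group documents by doc_type; low-confidence ones go to 'needs-review'."""
    routes = defaultdict(list)
    for doc in documents or []:
        confident = doc.confidence is not None and doc.confidence >= min_confidence
        routes[doc.doc_type if confident else "needs-review"].append(doc)
    return dict(routes)


# Stand-in objects for illustration; real ones would come from result.documents.
classified = [
    SimpleNamespace(doc_type="invoice", confidence=0.97),
    SimpleNamespace(doc_type="receipt", confidence=0.55),
    SimpleNamespace(doc_type="invoice", confidence=0.91),
]

for doc_type, docs in route_by_type(classified).items():
    print(f"{doc_type}: {len(docs)} document(s)")
```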


### Async Client Implementations


Asynchronous implementations of both `DocumentIntelligenceClient` and `DocumentIntelligenceAdministrationClient`, exposing the same operations as the synchronous clients. Because calls do not block the event loop, they suit workloads that process many documents concurrently.

```python { .api }
async def begin_analyze_document(
    model_id: str,
    body: Union[AnalyzeDocumentRequest, JSON, IO[bytes]],
    **kwargs
) -> AsyncAnalyzeDocumentLROPoller[AnalyzeResult]: ...

async def begin_build_document_model(
    body: Union[BuildDocumentModelRequest, JSON, IO[bytes]],
    **kwargs
) -> AsyncLROPoller[DocumentModelDetails]: ...
```

[Async Client Implementations](./async-clients.md)
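
When many documents are analyzed concurrently, it is usually worth capping the number of requests in flight. The sketch below shows one way to do that with `asyncio.Semaphore` and `asyncio.gather`. The helper `analyze_many` and the `fake_analyze` coroutine are hypothetical; with the real async client, the per-document coroutine would await `begin_analyze_document(...)` and then await the poller's `result()`.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence


async def analyze_many(
    analyze_one: Callable[[str], Awaitable[str]],
    paths: Sequence[str],
    limit: int = 4,
) -> List[str]:
    """Run analyze_one over all paths with at most `limit` calls in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(path: str) -> str:
        async with semaphore:
            return await analyze_one(path)

    # gather preserves input order, so results line up with paths.
    return await asyncio.gather(*(guarded(p) for p in paths))


async def fake_analyze(path: str) -> str:
    """Stand-in for a service call; sleeps briefly instead of doing network I/O."""
    await asyncio.sleep(0.01)
    return f"content of {path}"


results = asyncio.run(analyze_many(fake_analyze, ["a.pdf", "b.pdf", "c.pdf"], limit=2))
print(results)
```

The limit is a tuning knob: a value that respects the resource's request quota avoids throttling while still overlapping network waits.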


### Models and Type Definitions


Comprehensive data models, enums, and type definitions covering analysis results, document structures, configuration options, and service responses. Includes 57 model classes and 19 enums providing complete type safety.

```python { .api }
class AnalyzeResult:
    api_version: Optional[str]
    model_id: str
    content: Optional[str]
    pages: Optional[List[DocumentPage]]
    tables: Optional[List[DocumentTable]]
    documents: Optional[List[AnalyzedDocument]]
    # ... additional properties

class DocumentField:
    type: Optional[DocumentFieldType]
    content: Optional[str]
    confidence: Optional[float]
    # ... type-specific value properties
```

[Models and Type Definitions](./models-and-types.md)
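
Because most properties on these models are `Optional`, code that consumes them needs to guard against `None`. The sketch below summarizes a mapping of field names to `DocumentField`-like objects using only the `content` and `confidence` attributes listed above. It assumes an `AnalyzedDocument` exposes such a mapping as `fields`; the helper `summarize_fields`, the field names, and the sample values are hypothetical.

```python
from types import SimpleNamespace
from typing import Any, Dict, Mapping, Optional, Tuple

FieldSummary = Dict[str, Tuple[Optional[str], Optional[float]]]


def summarize_fields(fields: Optional[Mapping[str, Any]], min_confidence: float = 0.0) -> FieldSummary:
    """Map field name -> (content, confidence), skipping fields scored below the threshold."""
    summary: FieldSummary = {}
    for name, field in (fields or {}).items():
        # Fields without a confidence score are kept rather than guessed at.
        if field.confidence is not None and field.confidence < min_confidence:
            continue
        summary[name] = (field.content, field.confidence)
    return summary


# Stand-in objects for illustration; real ones would come from an analyzed document.
sample_fields = {
    "VendorName": SimpleNamespace(content="Contoso", confidence=0.95),
    "InvoiceTotal": SimpleNamespace(content="$110.00", confidence=0.42),
}

for name, (content, confidence) in summarize_fields(sample_fields, min_confidence=0.5).items():
    print(f"{name}: {content} (confidence {confidence})")
```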