tessl/pypi-azure-ai-documentintelligence

Azure AI Document Intelligence client library for Python - a cloud service that uses machine learning to analyze text and structured data from documents

Workspace: tessl
Visibility: Public
Describes: pkg:pypi/azure-ai-documentintelligence@1.0.x

To install, run

npx @tessl/cli install tessl/pypi-azure-ai-documentintelligence@1.0.0


Azure AI Document Intelligence

The Python client library for the Azure AI Document Intelligence service. It covers document analysis, custom model management, and document classification. The service uses machine learning to extract text, key-value pairs, tables, document structure, and custom fields from PDFs, images, and Office documents.

Package Information

  • Package Name: azure-ai-documentintelligence
  • Package Type: pypi
  • Language: Python
  • Installation: pip install azure-ai-documentintelligence
  • Version: 1.0.2
  • API Version: 2024-11-30

Core Imports

from azure.ai.documentintelligence import (
    DocumentIntelligenceClient,
    DocumentIntelligenceAdministrationClient,
    AnalyzeDocumentLROPoller
)

Async clients:

from azure.ai.documentintelligence.aio import (
    DocumentIntelligenceClient,
    DocumentIntelligenceAdministrationClient
)

Authentication:

from azure.core.credentials import AzureKeyCredential, TokenCredential

Basic Usage

from azure.ai.documentintelligence import DocumentIntelligenceClient
from azure.core.credentials import AzureKeyCredential

# Initialize client with endpoint and API key
client = DocumentIntelligenceClient(
    endpoint="https://your-resource.cognitiveservices.azure.com/",
    credential=AzureKeyCredential("your-api-key")
)

# Analyze a document with prebuilt layout model
with open("invoice.pdf", "rb") as document:
    poller = client.begin_analyze_document("prebuilt-layout", document)
    result = poller.result()
    
    # Access extracted content
    print(f"Content: {result.content}")
    
    # Access extracted tables
    for table in result.tables or []:
        print(f"Table with {table.row_count} rows and {table.column_count} columns")
        for cell in table.cells:
            print(f"Cell [{cell.row_index}][{cell.column_index}]: {cell.content}")

# Build custom model (administration client)
from azure.ai.documentintelligence import DocumentIntelligenceAdministrationClient
from azure.ai.documentintelligence.models import BuildDocumentModelRequest, AzureBlobContentSource

admin_client = DocumentIntelligenceAdministrationClient(
    endpoint="https://your-resource.cognitiveservices.azure.com/",
    credential=AzureKeyCredential("your-api-key")
)

# Build a custom model from labeled training data in a blob container.
# The training data is passed through the azure_blob_source field.
build_request = BuildDocumentModelRequest(
    model_id="my-custom-model",
    build_mode="neural",
    azure_blob_source=AzureBlobContentSource(
        container_url="https://account.blob.core.windows.net/container"
    )
)

poller = admin_client.begin_build_document_model(build_request)
model = poller.result()
print(f"Model built: {model.model_id}")
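The cell loop in the analysis example prints cells one at a time. A minimal sketch of collecting them into a row-major grid follows. It relies only on the attributes used above (row_count, column_count, cells, row_index, column_index, content). The helper name table_to_grid and the SimpleNamespace stand-in data are hypothetical, and cells that span several rows or columns are written only at their top-left position.

```python
from types import SimpleNamespace
from typing import Any, List


def table_to_grid(table: Any) -> List[List[str]]:
    """Arrange a table's cells into a row-major grid of strings."""
    grid = [["" for _ in range(table.column_count)] for _ in range(table.row_count)]
    for cell in table.cells:
        # Spanning cells are not expanded; only their top-left slot is filled.
        grid[cell.row_index][cell.column_index] = cell.content
    return grid


# Stand-in for one entry of result.tables so the sketch runs without the service.
sample_table = SimpleNamespace(
    row_count=2,
    column_count=2,
    cells=[
        SimpleNamespace(row_index=0, column_index=0, content="Item"),
        SimpleNamespace(row_index=0, column_index=1, content="Price"),
        SimpleNamespace(row_index=1, column_index=0, content="Widget"),
        SimpleNamespace(row_index=1, column_index=1, content="9.99"),
    ],
)

for row in table_to_grid(sample_table):
    print(" | ".join(row))
```

With a real result, the same helper would be called as table_to_grid(table) for each table in result.tables.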

Architecture

The Azure AI Document Intelligence SDK is organized around several key components:

  • DocumentIntelligenceClient: Main client for document analysis operations including single document analysis, batch processing, and document classification
  • DocumentIntelligenceAdministrationClient: Management client for custom models, classifiers, and service operations
  • Async Clients: Full async/await support through aio module with identical functionality
  • Custom LRO Poller: Enhanced AnalyzeDocumentLROPoller with operation metadata access
  • Rich Type System: Comprehensive models for analysis results, document structures, and configuration options

Both clients accept either an API key (AzureKeyCredential) or a Microsoft Entra ID (formerly Azure Active Directory) token credential (TokenCredential). Optional document processing features are configured through keyword arguments on individual calls.
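One way to switch between the two authentication methods is to pick the credential from the environment. The sketch below is illustrative only: the variable name DOCUMENTINTELLIGENCE_API_KEY and the helpers choose_auth and make_client are assumptions, not part of the library. DefaultAzureCredential comes from the separate azure-identity package, which must be installed for the token path.

```python
import os
from typing import Mapping, Optional, Tuple


def choose_auth(env: Mapping[str, str]) -> Tuple[str, Optional[str]]:
    """Return ("api_key", key) when a key is configured, else ("entra_id", None)."""
    key = env.get("DOCUMENTINTELLIGENCE_API_KEY")  # hypothetical variable name
    if key:
        return ("api_key", key)
    return ("entra_id", None)


def make_client(endpoint: str, env: Mapping[str, str] = os.environ):
    """Build a DocumentIntelligenceClient with whichever credential is available."""
    # Imports are deferred so choose_auth can be used without the SDK installed.
    from azure.ai.documentintelligence import DocumentIntelligenceClient
    from azure.core.credentials import AzureKeyCredential

    mode, key = choose_auth(env)
    if mode == "api_key":
        credential = AzureKeyCredential(key)
    else:
        # Token-based authentication; requires the azure-identity package.
        from azure.identity import DefaultAzureCredential

        credential = DefaultAzureCredential()
    return DocumentIntelligenceClient(endpoint=endpoint, credential=credential)


print(choose_auth({"DOCUMENTINTELLIGENCE_API_KEY": "your-api-key"})[0])
print(choose_auth({})[0])
```

Keeping the decision in a small pure function makes it easy to test without network access or real credentials.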

Capabilities

Document Analysis Operations

Core document processing functionality including single document analysis, batch operations, result retrieval, and resource management. Supports prebuilt models and custom models with advanced features like high-resolution OCR, language detection, and structured data extraction.

def begin_analyze_document(
    model_id: str, 
    body: Union[AnalyzeDocumentRequest, JSON, IO[bytes]], 
    **kwargs
) -> AnalyzeDocumentLROPoller[AnalyzeResult]: ...

def begin_analyze_batch_documents(
    model_id: str,
    body: Union[AnalyzeBatchDocumentsRequest, JSON, IO[bytes]],
    **kwargs
) -> LROPoller[AnalyzeBatchResult]: ...

def begin_classify_document(
    classifier_id: str,
    body: Union[ClassifyDocumentRequest, JSON, IO[bytes]],
    **kwargs
) -> LROPoller[AnalyzeResult]: ...

def get_analyze_result_pdf(
    model_id: str, result_id: str, **kwargs
) -> Iterator[bytes]: ...

def get_analyze_result_figure(
    model_id: str, result_id: str, figure_id: str, **kwargs
) -> Iterator[bytes]: ...
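get_analyze_result_pdf and get_analyze_result_figure return an iterator of byte chunks rather than a single bytes object, so the chunks have to be written out in order. The sketch below shows one way to do that. The helpers save_chunks and download_searchable_pdf are hypothetical. The flow in download_searchable_pdf (requesting PDF output through the output keyword and reading the result ID from poller.details["operation_id"]) reflects my understanding of the 1.0.x SDK and should be checked against the installed version.

```python
import os
import tempfile
from typing import Any, Iterable


def save_chunks(chunks: Iterable[bytes], path: str) -> int:
    """Write an iterable of byte chunks to path and return the byte count."""
    total = 0
    with open(path, "wb") as out:
        for chunk in chunks:
            out.write(chunk)
            total += len(chunk)
    return total


def download_searchable_pdf(client: Any, source_path: str, target_path: str) -> int:
    """Assumed flow: analyze with PDF output enabled, then fetch the PDF."""
    # Deferred import so save_chunks stays usable without the SDK installed.
    from azure.ai.documentintelligence.models import AnalyzeOutputOption

    with open(source_path, "rb") as document:
        poller = client.begin_analyze_document(
            "prebuilt-read", document, output=[AnalyzeOutputOption.PDF]
        )
        result = poller.result()
    # Assumption: the poller exposes the operation ID used as result_id.
    result_id = poller.details["operation_id"]
    chunks = client.get_analyze_result_pdf(result.model_id, result_id)
    return save_chunks(chunks, target_path)


# Local demonstration of the chunk-writing step with stand-in data.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.bin")
written = save_chunks(iter([b"%PDF-", b"1.7"]), demo_path)
print(written)
```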

See document-analysis.md.

Model Management Operations

Custom model lifecycle management including building, composing, copying, and managing document models. Supports both template and neural training modes with comprehensive model metadata, operation tracking, and resource management.

def begin_build_document_model(
    body: Union[BuildDocumentModelRequest, JSON, IO[bytes]],
    **kwargs
) -> LROPoller[DocumentModelDetails]: ...

def begin_compose_model(
    body: Union[ComposeDocumentModelRequest, JSON, IO[bytes]],
    **kwargs
) -> LROPoller[DocumentModelDetails]: ...

def begin_copy_model_to(
    model_id: str,
    body: Union[ModelCopyAuthorization, JSON, IO[bytes]],
    **kwargs
) -> LROPoller[DocumentModelDetails]: ...

def authorize_model_copy(
    body: Union[AuthorizeCopyRequest, JSON, IO[bytes]],
    **kwargs
) -> ModelCopyAuthorization: ...

def get_resource_details(**kwargs) -> DocumentIntelligenceResourceDetails: ...

def list_operations(**kwargs) -> Iterable[DocumentIntelligenceOperationDetails]: ...
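The two copy-related signatures work as a pair: the target resource authorizes the copy, and the source resource then performs it. A minimal sketch of that ordering follows. The helper copy_model and the fake classes are hypothetical and exist only so the example runs offline. The authorization request is passed as a JSON dict, and the "modelId" key is my assumption about the wire name; AuthorizeCopyRequest(model_id=...) from the models module is the typed alternative.

```python
from typing import Any


def copy_model(source_admin: Any, target_admin: Any,
               source_model_id: str, target_model_id: str) -> Any:
    """Copy a model between resources using the authorize-then-copy sequence."""
    # Step 1: the target resource issues an authorization for the new model ID.
    authorization = target_admin.authorize_model_copy({"modelId": target_model_id})
    # Step 2: the source resource starts the copy and we wait for completion.
    poller = source_admin.begin_copy_model_to(source_model_id, authorization)
    return poller.result()


class _FakePoller:
    """Stand-in for LROPoller that completes immediately."""

    def __init__(self, value: Any) -> None:
        self._value = value

    def result(self) -> Any:
        return self._value


class _FakeAdmin:
    """Stand-in for DocumentIntelligenceAdministrationClient that records calls."""

    def __init__(self) -> None:
        self.calls = []

    def authorize_model_copy(self, body: dict) -> dict:
        self.calls.append(("authorize", body["modelId"]))
        return {"targetModelId": body["modelId"]}

    def begin_copy_model_to(self, model_id: str, body: dict) -> _FakePoller:
        self.calls.append(("copy", model_id))
        return _FakePoller({"model_id": body["targetModelId"]})


source, target = _FakeAdmin(), _FakeAdmin()
copied = copy_model(source, target, "model-a", "model-b")
print(copied["model_id"])
```

With real clients, source_admin and target_admin would be two DocumentIntelligenceAdministrationClient instances pointing at different resources.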

See model-management.md.

Classifier Management Operations

Document classifier lifecycle management for automated document type classification. Covers building, copying, retrieving, and listing custom classifiers, which can then be used to route documents across multiple classes.

def begin_build_classifier(
    body: Union[BuildDocumentClassifierRequest, JSON, IO[bytes]],
    **kwargs
) -> LROPoller[DocumentClassifierDetails]: ...

def begin_copy_classifier_to(
    classifier_id: str,
    body: Union[ClassifierCopyAuthorization, JSON, IO[bytes]],
    **kwargs
) -> LROPoller[DocumentClassifierDetails]: ...

def authorize_classifier_copy(
    body: Union[AuthorizeClassifierCopyRequest, JSON, IO[bytes]],
    **kwargs
) -> ClassifierCopyAuthorization: ...

def get_classifier(classifier_id: str, **kwargs) -> DocumentClassifierDetails: ...

def list_classifiers(**kwargs) -> Iterable[DocumentClassifierDetails]: ...
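Multi-class routing usually comes down to grouping the classified documents by type and setting aside low-confidence matches. The sketch below illustrates this on the result of a classification call. It assumes each entry in result.documents exposes doc_type and confidence attributes; the helper route_by_doc_type, the "needs-review" label, the 0.8 threshold, and the stand-in data are all hypothetical.

```python
from collections import defaultdict
from types import SimpleNamespace
from typing import Any, Dict, List


def route_by_doc_type(result: Any, min_confidence: float = 0.8) -> Dict[str, List[Any]]:
    """Group classified documents by type; low-confidence ones go to review."""
    routes: Dict[str, List[Any]] = defaultdict(list)
    for doc in result.documents or []:
        confident = (doc.confidence or 0.0) >= min_confidence
        label = doc.doc_type if confident else "needs-review"
        routes[label].append(doc)
    return dict(routes)


# Stand-in for the AnalyzeResult returned by begin_classify_document(...).result().
sample_result = SimpleNamespace(
    documents=[
        SimpleNamespace(doc_type="invoice", confidence=0.95),
        SimpleNamespace(doc_type="receipt", confidence=0.40),
        SimpleNamespace(doc_type="invoice", confidence=0.88),
    ]
)

for label, docs in sorted(route_by_doc_type(sample_result).items()):
    print(label, len(docs))
```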

See classifier-management.md.

Async Client Implementations

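Asynchronous implementations of both DocumentIntelligenceClient and DocumentIntelligenceAdministrationClient, available from the aio module. They mirror the synchronous API: each begin_* method is a coroutine that returns an async poller, and the poller's result() must also be awaited. This lets many operations run concurrently on one event loop.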

async def begin_analyze_document(
    model_id: str,
    body: Union[AnalyzeDocumentRequest, JSON, IO[bytes]],
    **kwargs
) -> AsyncAnalyzeDocumentLROPoller[AnalyzeResult]: ...

async def begin_build_document_model(
    body: Union[BuildDocumentModelRequest, JSON, IO[bytes]],
    **kwargs
) -> AsyncLROPoller[DocumentModelDetails]: ...

See async-clients.md.

Models and Type Definitions

Comprehensive data models, enums, and type definitions covering analysis results, document structures, configuration options, and service responses. Includes 57 model classes and 19 enums providing complete type safety.

class AnalyzeResult:
    api_version: Optional[str]
    model_id: str
    content: Optional[str]
    pages: Optional[List[DocumentPage]]
    tables: Optional[List[DocumentTable]]
    documents: Optional[List[AnalyzedDocument]]
    # ... additional properties

class DocumentField:
    type: Optional[DocumentFieldType]
    content: Optional[str]
    confidence: Optional[float]
    # ... type-specific value properties
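Because most properties on these models are optional, code that reads them needs to handle missing values. The sketch below flattens one analyzed document's fields into a plain dict and drops low-confidence or empty entries. It assumes that each item in result.documents carries a fields mapping of field name to DocumentField, which is not shown in the excerpt above. The helper confident_fields, the 0.5 threshold, and the stand-in data are hypothetical.

```python
from types import SimpleNamespace
from typing import Any, Dict


def confident_fields(document: Any, min_confidence: float = 0.5) -> Dict[str, str]:
    """Return {field name: content} for fields at or above min_confidence."""
    selected: Dict[str, str] = {}
    for name, field in (document.fields or {}).items():
        # content and confidence are Optional, so guard against None.
        if field.content is None:
            continue
        if (field.confidence or 0.0) >= min_confidence:
            selected[name] = field.content
    return selected


# Stand-in for one entry of result.documents so the sketch runs offline.
sample_document = SimpleNamespace(
    fields={
        "VendorName": SimpleNamespace(content="Contoso", confidence=0.97),
        "InvoiceTotal": SimpleNamespace(content="$110.00", confidence=0.31),
        "DueDate": SimpleNamespace(content=None, confidence=None),
    }
)

print(confident_fields(sample_document))
```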

See models-and-types.md.