Google Cloud Document AI

Google Cloud Document AI is a machine learning service that extracts structured data from documents using pre-trained and custom document processors. The service can process various document types including invoices, receipts, forms, contracts, and other business documents.

Package Information

Package Name: google-cloud-documentai
Version: 3.6.0
Documentation: Google Cloud Document AI Documentation

Installation

pip install google-cloud-documentai

Authentication

This package requires Google Cloud authentication. Set up authentication using one of these methods:

  1. Application Default Credentials (Recommended):

    gcloud auth application-default login
  2. Service Account Key:

    export GOOGLE_APPLICATION_CREDENTIALS="path/to/service-account-key.json"
  3. Project Configuration (set alongside either method above; this selects the project but does not authenticate by itself):

    export GOOGLE_CLOUD_PROJECT="your-project-id"

Core Imports

# Main module - exports v1 (stable) API
from google.cloud.documentai import DocumentProcessorServiceClient
from google.cloud.documentai import Document, ProcessRequest, ProcessResponse

# Alternative import pattern
from google.cloud import documentai

# For async operations
from google.cloud.documentai import DocumentProcessorServiceAsyncClient

# Core types for document processing (re-exported at the top level)
from google.cloud.documentai import (
    RawDocument,
    GcsDocument,
    Processor,
    ProcessorType,
    BoundingPoly,
    Vertex,
)

Basic Usage Example

from google.cloud.documentai import (
    DocumentProcessorServiceClient,
    ProcessRequest,
    RawDocument,
)

def process_document(project_id: str, location: str, processor_id: str, file_path: str, mime_type: str):
    """
    Process a document using Google Cloud Document AI.
    
    Args:
        project_id: Google Cloud project ID
        location: Processor location (e.g., 'us' or 'eu')  
        processor_id: ID of the document processor to use
        file_path: Path to the document file
        mime_type: MIME type of the document (e.g., 'application/pdf')
    
    Returns:
        Document: Processed document with extracted data
    """
    # Initialize the client
    client = DocumentProcessorServiceClient()
    
    # The full resource name of the processor
    name = client.processor_path(project_id, location, processor_id)
    
    # Read the document file
    with open(file_path, "rb") as f:
        document_content = f.read()
    
    # Create raw document
    raw_document = RawDocument(content=document_content, mime_type=mime_type)
    
    # Configure the process request
    request = ProcessRequest(name=name, raw_document=raw_document)
    
    # Process the document
    result = client.process_document(request=request)
    
    # Access processed document
    document = result.document
    
    print(f"Document text: {document.text}")
    print(f"Number of pages: {len(document.pages)}")
    
    # Extract entities
    for entity in document.entities:
        print(f"Entity: {entity.type_} = {entity.mention_text}")
    
    return document

# Example usage
document = process_document(
    project_id="my-project",
    location="us",
    processor_id="abc123def456",
    file_path="invoice.pdf",
    mime_type="application/pdf"
)

Architecture

Document Processing Workflow

Google Cloud Document AI follows this processing workflow:

  1. Document Input: Raw documents (PDF, images) or Cloud Storage references
  2. Processor Selection: Choose appropriate pre-trained or custom processor
  3. Processing: AI models extract text, layout, and structured data
  4. Output: Structured document with text, entities, tables, and metadata

Key Concepts

Processors

Processors are AI models that extract data from specific document types:

  • Pre-trained processors: Ready-to-use for common documents (invoices, receipts, forms)
  • Custom processors: Trained on your specific document types
  • Processor versions: Different iterations of a processor with varying capabilities

Documents

The Document type represents processed documents with:

  • Text: Extracted text content with character-level positioning
  • Pages: Individual pages with layout elements (blocks, paragraphs, lines, tokens)
  • Entities: Extracted structured data (names, dates, amounts, addresses)
  • Tables: Detected tables with cell-level data
  • Form fields: Key-value pairs from forms
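Most of these elements carry a text anchor (offsets into `document.text`) rather than their own text. The helper below is a minimal, duck-typed sketch of how such anchors are typically resolved; the name `layout_text` is ours, not part of the library:

```python
def layout_text(document_text: str, layout) -> str:
    """Resolve a layout element's text_anchor against the full document text.

    Works with any object exposing layout.text_anchor.text_segments, where
    each segment has start_index and end_index offsets into document_text.
    """
    return "".join(
        document_text[int(segment.start_index):int(segment.end_index)]
        for segment in layout.text_anchor.text_segments
    )

# Typical use with a processed document (field names per the Document proto):
# for page in document.pages:
#     for field in page.form_fields:
#         key = layout_text(document.text, field.field_name)
#         value = layout_text(document.text, field.field_value)
#         print(f"{key}: {value}")
```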

Locations

Processors are deployed in specific regions:

  • us: United States multi-region
  • eu: European Union multi-region
  • Additional regions are available; see the Document AI documentation for the current list
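When a processor lives outside the default endpoint, the client must be pointed at the matching regional endpoint, which follows the documented `<location>-documentai.googleapis.com` pattern. The helper function below is our own sketch of that pattern:

```python
def documentai_endpoint(location: str) -> str:
    """Return the regional API endpoint for a Document AI location."""
    return f"{location}-documentai.googleapis.com"

# Passing the endpoint when constructing the client:
# from google.api_core.client_options import ClientOptions
# from google.cloud.documentai import DocumentProcessorServiceClient
# client = DocumentProcessorServiceClient(
#     client_options=ClientOptions(api_endpoint=documentai_endpoint("eu"))
# )
```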

Capabilities

Document Processing Operations

Core functionality for processing individual and batch documents.

# Process a single document (a ProcessRequest needs both the processor name and a document payload)
from google.cloud.documentai import DocumentProcessorServiceClient, ProcessRequest, RawDocument

client = DocumentProcessorServiceClient()

with open("invoice.pdf", "rb") as f:
    raw_document = RawDocument(content=f.read(), mime_type="application/pdf")

request = ProcessRequest(
    name="projects/my-project/locations/us/processors/abc123",
    raw_document=raw_document,
)
result = client.process_document(request=request)

→ Document Processing Operations

Processor Management

Manage processor lifecycle including creation, deployment, and training.

# List available processors
from google.cloud.documentai import DocumentProcessorServiceClient, ListProcessorsRequest

client = DocumentProcessorServiceClient()
request = ListProcessorsRequest(parent="projects/my-project/locations/us")
response = client.list_processors(request=request)  # returns a pager that handles pagination

for processor in response.processors:
    print(f"Processor: {processor.display_name} ({processor.name})")

→ Processor Management

Document Types and Schemas

Work with document structures, entities, and type definitions.

# Access document structure
from google.cloud.documentai import Document

def analyze_document_structure(document: Document):
    """Analyze the structure of a processed document."""
    print(f"Total text length: {len(document.text)}")
    
    # Analyze pages
    for i, page in enumerate(document.pages):
        print(f"Page {i+1}: {len(page.blocks)} blocks, {len(page.paragraphs)} paragraphs")
    
    # Analyze entities by type
    entity_types = {}
    for entity in document.entities:
        entity_type = entity.type_
        if entity_type not in entity_types:
            entity_types[entity_type] = []
        entity_types[entity_type].append(entity.mention_text)
    
    for entity_type, mentions in entity_types.items():
        print(f"{entity_type}: {len(mentions)} instances")

→ Document Types and Schemas

Batch Operations

Process multiple documents asynchronously for high-volume workflows.

# Batch process documents
from google.cloud.documentai import (
    DocumentProcessorServiceClient,
    BatchProcessRequest,
    GcsDocuments,
)

client = DocumentProcessorServiceClient()

# Configure batch request
gcs_documents = GcsDocuments(documents=[
    {"gcs_uri": "gs://my-bucket/doc1.pdf", "mime_type": "application/pdf"},
    {"gcs_uri": "gs://my-bucket/doc2.pdf", "mime_type": "application/pdf"}
])

request = BatchProcessRequest(
    name="projects/my-project/locations/us/processors/abc123",
    input_documents=gcs_documents,
    document_output_config={
        "gcs_output_config": {"gcs_uri": "gs://my-bucket/output/"}
    }
)

operation = client.batch_process_documents(request=request)  # long-running operation
operation.result(timeout=600)  # block until the batch completes

→ Batch Operations

Beta Features (v1beta3)

Access experimental features including dataset management and enhanced document processing.

# Beta features - DocumentService for dataset management
from google.cloud.documentai_v1beta3 import DocumentServiceClient

client = DocumentServiceClient()

# List documents in a processor's dataset
request = {"dataset": "projects/my-project/locations/us/processors/abc123/dataset"}
response = client.list_documents(request=request)

→ Beta Features

API Versions

V1 (Stable)

The main google.cloud.documentai module exports the stable v1 API:

  • Module: google.cloud.documentai
  • Direct access: google.cloud.documentai_v1
  • Status: Production ready
  • Features: Core document processing and processor management

V1beta3 (Beta)

Extended API with additional features:

  • Module: google.cloud.documentai_v1beta3
  • Status: Beta (subject to breaking changes)
  • Additional features: Dataset management, enhanced document operations, custom training

Error Handling

from google.cloud.documentai import DocumentProcessorServiceClient
from google.cloud.exceptions import GoogleCloudError
from google.api_core.exceptions import NotFound, InvalidArgument

client = DocumentProcessorServiceClient()

try:
    # Process document (`request` is a ProcessRequest built as in the earlier examples)
    result = client.process_document(request=request)
except NotFound as e:
    print(f"Processor not found: {e}")
except InvalidArgument as e:
    print(f"Invalid request: {e}")
except GoogleCloudError as e:
    print(f"Google Cloud error: {e}")
except Exception as e:
    print(f"Unexpected error: {e}")
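Transient failures (e.g. `ServiceUnavailable`, quota exhaustion) are usually worth retrying with exponential backoff. `google.api_core` ships a `Retry` object that client methods accept; the underlying idea can be sketched in plain Python (the `with_backoff` helper below is illustrative, not part of the library):

```python
import time

def with_backoff(fn, retryable=(Exception,), attempts=4, base_delay=1.0):
    """Call fn(), retrying on retryable exceptions with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except retryable:
            if attempt == attempts - 1:
                raise  # out of attempts, surface the error
            time.sleep(base_delay * (2 ** attempt))

# Usage sketch:
# result = with_backoff(lambda: client.process_document(request=request))
```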

Resource Names

Google Cloud Document AI uses hierarchical resource names:

from google.cloud.documentai import DocumentProcessorServiceClient

client = DocumentProcessorServiceClient()

# Build resource names using helper methods
processor_path = client.processor_path("my-project", "us", "processor-id")
# Result: "projects/my-project/locations/us/processors/processor-id"

processor_version_path = client.processor_version_path(
    "my-project", "us", "processor-id", "version-id"
)
# Result: "projects/my-project/locations/us/processors/processor-id/processorVersions/version-id"

location_path = client.common_location_path("my-project", "us") 
# Result: "projects/my-project/locations/us"
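The client also exposes matching `parse_*` classmethods (e.g. `DocumentProcessorServiceClient.parse_processor_path`) that invert these builders. A plain-Python sketch of the same parsing:

```python
import re

PROCESSOR_PATH = re.compile(
    r"^projects/(?P<project>[^/]+)/locations/(?P<location>[^/]+)"
    r"/processors/(?P<processor>[^/]+)$"
)

def parse_processor_path(path: str) -> dict:
    """Split a processor resource name into its components."""
    match = PROCESSOR_PATH.match(path)
    if match is None:
        raise ValueError(f"not a processor path: {path!r}")
    return match.groupdict()
```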

Performance Considerations

  • Document Size: Individual documents up to 20MB, batch operations up to 1000 documents
  • Rate Limits: Varies by processor type and region
  • Async Processing: Use batch operations for high-volume processing
  • Caching: Consider caching processed results for frequently accessed documents
  • Regional Processing: Use the same region as your data for better performance
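The caching suggestion can be as simple as keying stored results by a content hash, so the same document is only sent to the API once. A minimal file-based sketch (cache layout and helper name are ours; assumes `process_fn` returns something JSON-serializable):

```python
import hashlib
import json
from pathlib import Path

def cached_process(content: bytes, process_fn, cache_dir: str = ".docai_cache"):
    """Return process_fn(content), caching JSON-serializable results by content hash."""
    key = hashlib.sha256(content).hexdigest()
    cache_path = Path(cache_dir) / f"{key}.json"
    if cache_path.exists():
        return json.loads(cache_path.read_text())  # cache hit: skip the API call
    result = process_fn(content)
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    cache_path.write_text(json.dumps(result))
    return result
```

Real responses are protobuf messages, so you would serialize them (e.g. via the proto-plus `to_json` support) before caching rather than storing them directly.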

Next Steps