tessl/pypi-apache-airflow-providers-google

Provider package for Google services integration with Apache Airflow, including Google Ads, Google Cloud (GCP), Google Firebase, Google LevelDB, Google Marketing Platform, and Google Workspace

Describes: pkg:pypi/apache-airflow-providers-google@17.1.x

To install, run

npx @tessl/cli install tessl/pypi-apache-airflow-providers-google@17.1.0

Apache Airflow Google Provider

Apache Airflow Google Provider is a comprehensive package that integrates Apache Airflow with the Google services ecosystem. It provides operators, hooks, sensors, and transfer tools for Google Ads, Google Cloud Platform services (BigQuery, Cloud Storage, Cloud Functions, Compute Engine, etc.), Google Firebase, Google LevelDB, Google Marketing Platform, and Google Workspace. The package offers a unified interface for orchestrating data pipelines across Google's suite of products, with authentication through Google Cloud credentials, extensive configuration options, and built-in error handling and retry mechanisms.

Package Information

  • Package Name: apache-airflow-providers-google
  • Language: Python
  • Package Type: Apache Airflow Provider
  • Installation: pip install apache-airflow-providers-google
  • Minimum Airflow Version: 2.10.0

Core Imports

All components follow standard Airflow provider import patterns:

# Hooks - Base connectivity to Google services
from airflow.providers.google.cloud.hooks.bigquery import BigQueryHook
from airflow.providers.google.cloud.hooks.gcs import GCSHook
from airflow.providers.google.ads.hooks.ads import GoogleAdsHook

# Operators - Task execution components
from airflow.providers.google.cloud.operators.bigquery import BigQueryCreateDatasetOperator
from airflow.providers.google.cloud.operators.gcs import GCSCreateBucketOperator

# Sensors - Condition monitoring components
from airflow.providers.google.cloud.sensors.gcs import GCSObjectExistenceSensor
from airflow.providers.google.cloud.sensors.bigquery import BigQueryTableExistenceSensor

# Transfers - Data movement components
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator
from airflow.providers.google.cloud.transfers.s3_to_gcs import S3ToGCSOperator

Basic Usage

from datetime import datetime
from airflow import DAG
from airflow.providers.google.cloud.hooks.bigquery import BigQueryHook
from airflow.providers.google.cloud.operators.bigquery import BigQueryCreateDatasetOperator
from airflow.providers.google.cloud.sensors.gcs import GCSObjectExistenceSensor
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator

# Define DAG
dag = DAG(
    'google_provider_example',
    default_args={'start_date': datetime(2023, 1, 1)},
    schedule='@daily',  # 'schedule' replaces the deprecated 'schedule_interval'
    catchup=False
)

# Create BigQuery dataset
create_dataset = BigQueryCreateDatasetOperator(
    task_id='create_dataset',
    dataset_id='example_dataset',
    project_id='my-gcp-project',
    gcp_conn_id='google_cloud_default',
    dag=dag
)

# Wait for file in GCS
wait_for_file = GCSObjectExistenceSensor(
    task_id='wait_for_file',
    bucket='my-bucket',
    object='data/input.csv',
    gcp_conn_id='google_cloud_default',
    dag=dag
)

# Load data from GCS to BigQuery
load_data = GCSToBigQueryOperator(
    task_id='load_data',
    bucket='my-bucket',
    source_objects=['data/input.csv'],
    destination_project_dataset_table='my-gcp-project.example_dataset.example_table',
    schema_fields=[
        {'name': 'id', 'type': 'INTEGER', 'mode': 'REQUIRED'},
        {'name': 'name', 'type': 'STRING', 'mode': 'NULLABLE'},
    ],
    write_disposition='WRITE_TRUNCATE',
    gcp_conn_id='google_cloud_default',
    dag=dag
)

# Set task dependencies
create_dataset >> wait_for_file >> load_data

Architecture

The provider follows Airflow's architecture patterns with specialized components:

  • Hooks: Low-level interfaces to Google services, handling authentication, connection management, and API calls
  • Operators: Task execution components that use hooks to perform specific operations (create resources, run jobs, etc.)
  • Sensors: Monitoring components that wait for specific conditions (file existence, job completion, etc.)
  • Transfers: Specialized operators for moving data between systems
  • Links: Console link generators for easy navigation to Google Cloud Console
  • Triggers: Async components for long-running operations in deferrable mode
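
For instance, many long-running operators accept a deferrable flag that hands the wait off to a trigger and frees the worker slot. A minimal sketch (the query and project ID are placeholders):

from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator

# Deferrable mode: the task defers to an async trigger while the BigQuery
# job runs, instead of occupying a worker slot for the duration
run_long_query = BigQueryInsertJobOperator(
    task_id='run_long_query',
    configuration={
        'query': {
            'query': 'SELECT 1',  # placeholder query
            'useLegacySql': False,
        }
    },
    project_id='my-gcp-project',
    deferrable=True,
)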

Authentication

All components support multiple authentication methods:

  • Service Account JSON Key Files: Specified via key_path or keyfile_dict
  • Application Default Credentials (ADC): Automatic credential discovery
  • Service Account Impersonation: Cross-project access via impersonation_chain
  • OAuth Flows: For Google Ads and Marketing Platform services

# Service account key files are configured on the Airflow connection itself
# (for example via the connection's key_path or keyfile_dict extras),
# then referenced from code by connection ID
hook = BigQueryHook(gcp_conn_id='my_connection')

# Application Default Credentials: set GOOGLE_APPLICATION_CREDENTIALS or run
# on GCP infrastructure with an attached service account
hook = BigQueryHook(gcp_conn_id='google_cloud_default')

# Using impersonation
hook = BigQueryHook(
    gcp_conn_id='my_connection',
    impersonation_chain='service-account@project.iam.gserviceaccount.com'
)

Capabilities

Google Cloud Platform Services

Comprehensive integration with Google Cloud Platform including BigQuery, Cloud Storage, Dataproc, Dataflow, Vertex AI, Cloud SQL, Pub/Sub, and 40+ other services. Provides complete CRUD operations, batch processing, real-time streaming, and machine learning capabilities.

# Key GCP hooks and operators
class BigQueryHook: ...
class GCSHook: ...
class DataprocHook: ...
class DataflowHook: ...
class VertexAIHook: ...

class BigQueryCreateDatasetOperator: ...
class GCSCreateBucketOperator: ...
class DataprocCreateClusterOperator: ...
class DataflowCreatePythonJobOperator: ...
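
A minimal hook-level sketch (the bucket name and prefix are placeholders):

from airflow.providers.google.cloud.hooks.gcs import GCSHook

# List object names under a prefix, e.g. from inside a @task-decorated function
gcs = GCSHook(gcp_conn_id='google_cloud_default')
objects = gcs.list(bucket_name='my-bucket', prefix='data/')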

Google Cloud Platform

Google Ads Integration

Google Ads API integration with OAuth authentication, account management, and reporting capabilities. Supports campaign data extraction and automated reporting workflows.

class GoogleAdsHook: ...
class GoogleAdsListAccountsOperator: ...
class GoogleAdsToGcsOperator: ...
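
A minimal hook-level sketch (the customer ID and GAQL query are placeholders, and the search signature may vary slightly between provider versions):

from airflow.providers.google.ads.hooks.ads import GoogleAdsHook

# Run a GAQL query against one or more Google Ads accounts
ads = GoogleAdsHook(
    gcp_conn_id='google_cloud_default',
    google_ads_conn_id='google_ads_default',
)
rows = ads.search(
    client_ids=['1234567890'],
    query='SELECT campaign.id, campaign.name FROM campaign',
)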

Google Ads

Google Marketing Platform

Integration with Google Marketing Platform services including Google Analytics Admin, Campaign Manager, Display & Video 360, and Search Ads. Provides comprehensive digital marketing automation and reporting.

class GoogleAnalyticsAdminHook: ...
class GoogleCampaignManagerHook: ...
class GoogleDisplayVideo360Hook: ...
class GoogleSearchAdsHook: ...
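
As a hedged hook-level sketch (assuming the Analytics Admin hook exposes a list_accounts method; exact method names may differ between provider versions):

from airflow.providers.google.marketing_platform.hooks.analytics_admin import GoogleAnalyticsAdminHook

# List the Google Analytics accounts visible to the connection's credentials
analytics = GoogleAnalyticsAdminHook(gcp_conn_id='google_cloud_default')
accounts = analytics.list_accounts()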

Marketing Platform

Google Workspace Integration

Google Workspace (formerly G Suite) integration for Drive, Sheets, and Calendar. Enables document management, spreadsheet automation, and calendar scheduling within data pipelines.

class GoogleDriveHook: ...
class GSheetsHook: ...
class GoogleCalendarHook: ...
class GCSToGoogleSheetsOperator: ...
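
A minimal sheet-reading sketch (the spreadsheet ID and range are placeholders):

from airflow.providers.google.suite.hooks.sheets import GSheetsHook

# Read a cell range from a Google Sheet as a list of rows
sheets = GSheetsHook(gcp_conn_id='google_cloud_default')
values = sheets.get_values(
    spreadsheet_id='my-spreadsheet-id',
    range_='Sheet1!A1:C10',
)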

Google Workspace

Firebase Integration

Google Firebase integration for Firestore database operations, enabling NoSQL database interactions in data pipelines.

class CloudFirestoreHook: ...
class CloudFirestoreExportDatabaseOperator: ...
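
A minimal export sketch (the project ID and bucket are placeholders):

from airflow.providers.google.firebase.operators.firestore import CloudFirestoreExportDatabaseOperator

# Export the default Firestore database to a GCS bucket
export_firestore = CloudFirestoreExportDatabaseOperator(
    task_id='export_firestore',
    project_id='my-gcp-project',
    body={'outputUriPrefix': 'gs://my-bucket/firestore-exports'},
)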

Firebase

Google LevelDB Integration

Google LevelDB integration provides a high-performance, embedded key-value database interface through Apache Airflow. Supports put, get, delete, and batch operations for fast local data storage and retrieval.

class LevelDBHook: ...
class LevelDBOperator: ...
class LevelDBHookException: ...
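
A minimal put sketch (keys and values are bytes; the connection ID is the provider default):

from airflow.providers.google.leveldb.operators.leveldb import LevelDBOperator

# Write a key-value pair to a local LevelDB database
put_value = LevelDBOperator(
    task_id='put_value',
    leveldb_conn_id='leveldb_default',
    command='put',
    key=b'example_key',
    value=b'example_value',
)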

Google LevelDB

Data Transfer Operations

Extensive transfer capabilities between Google services and external systems including AWS S3, Azure Blob Storage, SFTP, local filesystems, and various databases.

class GCSToBigQueryOperator: ...
class S3ToGCSOperator: ...
class BigQueryToGCSOperator: ...
class MySQLToGCSOperator: ...
class AzureBlobStorageToGCSOperator: ...
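
A minimal S3-to-GCS sketch (bucket names are placeholders; connection IDs are the usual defaults):

from airflow.providers.google.cloud.transfers.s3_to_gcs import S3ToGCSOperator

# Copy objects under an S3 prefix into a GCS destination
s3_to_gcs = S3ToGCSOperator(
    task_id='s3_to_gcs',
    bucket='my-s3-bucket',
    prefix='data/',
    dest_gcs='gs://my-gcs-bucket/data/',
    aws_conn_id='aws_default',
    gcp_conn_id='google_cloud_default',
)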

Data Transfers

Common Utilities and Base Classes

Shared utilities, authentication backends, base classes, and helper functions used across all Google service integrations.

class GoogleBaseHook: ...
class GoogleBaseAsyncHook: ...
class GoogleDiscoveryApiHook: ...
class OperationHelper: ...
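
A sketch of how the base class is typically extended (the hook name and body are hypothetical):

from airflow.providers.google.common.hooks.base_google import GoogleBaseHook

class MyServiceHook(GoogleBaseHook):
    """Hypothetical custom hook built on the shared Google base class."""

    def get_conn(self):
        # get_credentials() resolves key files, ADC, or an impersonation chain
        credentials = self.get_credentials()
        # ... build and return a client for the target Google API
        raise NotImplementedError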

Common Utilities

Error Handling

The provider includes comprehensive error handling for Google API errors:

  • Authentication Errors: Invalid credentials, expired tokens, insufficient permissions
  • Resource Errors: Resource not found, quota exceeded, invalid resource states
  • Network Errors: Connection timeouts, API rate limiting, service unavailable
  • Data Errors: Schema mismatches, data validation failures, invalid formats

Most operators support retry mechanisms and provide detailed error messages for troubleshooting.
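
Task-level retries use the standard Airflow BaseOperator arguments; a minimal sketch reusing the earlier load task:

from datetime import timedelta
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator

# Retry transient API failures up to three times, five minutes apart
load_data = GCSToBigQueryOperator(
    task_id='load_data_with_retries',
    bucket='my-bucket',
    source_objects=['data/input.csv'],
    destination_project_dataset_table='my-gcp-project.example_dataset.example_table',
    retries=3,
    retry_delay=timedelta(minutes=5),
)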

Types

# Common type definitions used across the provider
from typing import Dict, List, Optional, Union, Any, Sequence
from airflow.models import BaseOperator
from airflow.providers.google.common.hooks.base_google import GoogleBaseHook

# Connection and authentication types
GoogleCredentials = Union[str, Dict[str, Any]]
ImpersonationChain = Union[str, Sequence[str]]
GcpConnId = str

# Common parameter types
ProjectId = str
Location = str
ResourceId = str
Labels = Dict[str, str]