Provider package for Google services integration with Apache Airflow, including Google Ads, Google Cloud (GCP), Google Firebase, Google LevelDB, Google Marketing Platform, and Google Workspace
```shell
npx @tessl/cli install tessl/pypi-apache-airflow-providers-google@17.1.0
```

Apache Airflow Google Provider is a comprehensive package that integrates Apache Airflow with the Google services ecosystem. It provides operators, hooks, sensors, and transfer tools for Google Ads, Google Cloud Platform services (BigQuery, Cloud Storage, Cloud Functions, Compute Engine, etc.), Google Firebase, Google LevelDB, Google Marketing Platform, and Google Workspace. The package offers a unified interface for orchestrating data pipelines across Google's suite of products, with authentication through Google Cloud credentials, extensive configuration options, and built-in error handling and retry mechanisms.
```shell
pip install apache-airflow-providers-google
```

All components follow standard Airflow provider import patterns:
```python
# Hooks - base connectivity to Google services
from airflow.providers.google.cloud.hooks.bigquery import BigQueryHook
from airflow.providers.google.cloud.hooks.gcs import GCSHook
from airflow.providers.google.ads.hooks.ads import GoogleAdsHook

# Operators - task execution components
from airflow.providers.google.cloud.operators.bigquery import BigQueryCreateDatasetOperator
from airflow.providers.google.cloud.operators.gcs import GCSCreateBucketOperator

# Sensors - condition monitoring components
from airflow.providers.google.cloud.sensors.gcs import GCSObjectExistenceSensor
from airflow.providers.google.cloud.sensors.bigquery import BigQueryTableExistenceSensor

# Transfers - data movement components
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator
from airflow.providers.google.cloud.transfers.s3_to_gcs import S3ToGCSOperator
```

A minimal end-to-end example DAG:

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.bigquery import BigQueryCreateDatasetOperator
from airflow.providers.google.cloud.sensors.gcs import GCSObjectExistenceSensor
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator

# Define the DAG
dag = DAG(
    'google_provider_example',
    default_args={'start_date': datetime(2023, 1, 1)},
    schedule='@daily',
    catchup=False,
)

# Create a BigQuery dataset
create_dataset = BigQueryCreateDatasetOperator(
    task_id='create_dataset',
    dataset_id='example_dataset',
    project_id='my-gcp-project',
    gcp_conn_id='google_cloud_default',
    dag=dag,
)

# Wait for a file to appear in GCS
wait_for_file = GCSObjectExistenceSensor(
    task_id='wait_for_file',
    bucket='my-bucket',
    object='data/input.csv',
    google_cloud_conn_id='google_cloud_default',
    dag=dag,
)

# Load the file from GCS into BigQuery
load_data = GCSToBigQueryOperator(
    task_id='load_data',
    bucket='my-bucket',
    source_objects=['data/input.csv'],
    destination_project_dataset_table='my-gcp-project.example_dataset.example_table',
    schema_fields=[
        {'name': 'id', 'type': 'INTEGER', 'mode': 'REQUIRED'},
        {'name': 'name', 'type': 'STRING', 'mode': 'NULLABLE'},
    ],
    write_disposition='WRITE_TRUNCATE',
    gcp_conn_id='google_cloud_default',
    dag=dag,
)

# Set task dependencies
create_dataset >> wait_for_file >> load_data
```

The provider follows Airflow's architecture patterns, with specialized components: hooks for connectivity, operators for task execution, sensors for condition monitoring, and transfer operators for data movement.
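The `schema_fields` accepted by `GCSToBigQueryOperator` above are plain dicts following BigQuery's schema format (`name`, `type`, `mode`). A small validation sketch; the helper name and the type list (a common subset) are hypothetical, not part of the provider:

```python
# Hypothetical helper: sanity-check a BigQuery schema_fields list before
# handing it to GCSToBigQueryOperator. Not part of the provider API.
VALID_MODES = {'REQUIRED', 'NULLABLE', 'REPEATED'}
# Common subset of BigQuery field types, for illustration only.
VALID_TYPES = {'STRING', 'INTEGER', 'FLOAT', 'BOOLEAN', 'TIMESTAMP', 'DATE', 'NUMERIC', 'RECORD'}

def validate_schema_fields(schema_fields):
    for field in schema_fields:
        if not field.get('name'):
            raise ValueError(f"schema field missing 'name': {field}")
        if field.get('type') not in VALID_TYPES:
            raise ValueError(f"unknown type in field {field.get('name')!r}: {field.get('type')}")
        # 'mode' defaults to NULLABLE when omitted, matching BigQuery's behavior.
        if field.get('mode', 'NULLABLE') not in VALID_MODES:
            raise ValueError(f"unknown mode in field {field['name']!r}: {field['mode']}")
    return schema_fields

validate_schema_fields([
    {'name': 'id', 'type': 'INTEGER', 'mode': 'REQUIRED'},
    {'name': 'name', 'type': 'STRING', 'mode': 'NULLABLE'},
])
```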
All components support multiple authentication methods:

- Service account key file or key dict (`key_path` or `keyfile_dict`, configured in the connection extras)
- Service account impersonation (`impersonation_chain`)

```python
from airflow.providers.google.cloud.hooks.bigquery import BigQueryHook

# Using a service account key file: key_path (or keyfile_dict) is set in the
# extras of the Airflow connection, not passed to the hook directly.
hook = BigQueryHook(
    gcp_conn_id='my_connection',  # connection extras carry key_path='/path/to/service-account.json'
)

# Using impersonation
hook = BigQueryHook(
    gcp_conn_id='my_connection',
    impersonation_chain='service-account@project.iam.gserviceaccount.com',
)
```

Comprehensive integration with Google Cloud Platform, including BigQuery, Cloud Storage, Dataproc, Dataflow, Vertex AI, Cloud SQL, Pub/Sub, and 40+ other services. Provides complete CRUD operations, batch processing, real-time streaming, and machine learning capabilities.
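Throughout the BigQuery components, tables are addressed by a fully qualified `project.dataset.table` string (as in `destination_project_dataset_table`). A sketch of composing and splitting that identifier; the helper names are hypothetical:

```python
# Hypothetical helpers for the project.dataset.table convention used by
# destination_project_dataset_table and similar parameters.
def qualify_table(project_id: str, dataset_id: str, table_id: str) -> str:
    return f"{project_id}.{dataset_id}.{table_id}"

def split_table(qualified: str):
    project_id, dataset_id, table_id = qualified.split('.', 2)
    return project_id, dataset_id, table_id

qualify_table('my-gcp-project', 'example_dataset', 'example_table')
# -> 'my-gcp-project.example_dataset.example_table'
```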
```python
# Key GCP hooks and operators
class BigQueryHook: ...
class GCSHook: ...
class DataprocHook: ...
class DataflowHook: ...
class VertexAIHook: ...

class BigQueryCreateDatasetOperator: ...
class GCSCreateBucketOperator: ...
class DataprocCreateClusterOperator: ...
class DataflowCreatePythonJobOperator: ...
```

Google Ads API integration with OAuth authentication, account management, and reporting capabilities. Supports campaign data extraction and automated reporting workflows.
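Google Ads reporting (e.g. via `GoogleAdsToGcsOperator`) is driven by GAQL query strings. A sketch of composing one; the builder function is hypothetical, and only the resulting GAQL string is what the operator consumes:

```python
# Hypothetical builder for a GAQL (Google Ads Query Language) report query.
def build_gaql(fields, resource, limit=None):
    query = f"SELECT {', '.join(fields)} FROM {resource}"
    if limit is not None:
        query += f" LIMIT {limit}"
    return query

query = build_gaql(
    ['campaign.id', 'campaign.name', 'metrics.clicks'],
    'campaign',
    limit=100,
)
# -> 'SELECT campaign.id, campaign.name, metrics.clicks FROM campaign LIMIT 100'
```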
```python
class GoogleAdsHook: ...
class GoogleAdsListAccountsOperator: ...
class GoogleAdsToGcsOperator: ...
```

Integration with Google Marketing Platform services, including Google Analytics Admin, Campaign Manager, Display & Video 360, and Search Ads. Provides comprehensive digital marketing automation and reporting.
```python
class GoogleAnalyticsAdminHook: ...
class GoogleCampaignManagerHook: ...
class GoogleDisplayVideo360Hook: ...
class GoogleSearchAdsHook: ...
```

Google Workspace (formerly G Suite) integration for Drive, Sheets, and Calendar. Enables document management, spreadsheet automation, and calendar scheduling within data pipelines.
```python
class GoogleDriveHook: ...
class GSheetsHook: ...
class GoogleCalendarHook: ...
class GCSToGoogleSheetsOperator: ...
```

Google Firebase integration for Firestore database operations, enabling NoSQL database interactions in data pipelines.
```python
class CloudFirestoreHook: ...
class CloudFirestoreExportDatabaseOperator: ...
```

Google LevelDB integration provides a high-performance, embedded key-value database interface through Apache Airflow. Supports put, get, delete, and batch operations for fast local data storage and retrieval.
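The put/get/delete/batch semantics above can be sketched with an in-memory stand-in for the actual store; real LevelDB is an on-disk, ordered key-value database whose keys and values are bytes, and this class is illustrative only, not the provider's `LevelDBHook`:

```python
from typing import List, Optional, Tuple

# Illustrative stand-in for LevelDB-style operations, modeled with a dict.
class FakeLevelDB:
    def __init__(self):
        self._db = {}

    def put(self, key: bytes, value: bytes) -> None:
        self._db[key] = value

    def get(self, key: bytes) -> Optional[bytes]:
        return self._db.get(key)

    def delete(self, key: bytes) -> None:
        self._db.pop(key, None)

    def write_batch(self, pairs: List[Tuple[bytes, bytes]]) -> None:
        # A batch applies atomically in real LevelDB; here we just loop.
        for key, value in pairs:
            self._db[key] = value

db = FakeLevelDB()
db.put(b'key', b'value')
db.write_batch([(b'a', b'1'), (b'b', b'2')])
```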
```python
class LevelDBHook: ...
class LevelDBOperator: ...
class LevelDBHookException: ...
```

Extensive transfer capabilities between Google services and external systems, including AWS S3, Azure Blob Storage, SFTP, local filesystems, and various databases.
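Transfer operators are named by their (source, destination) pair. A sketch of that naming convention as a lookup table; the table and helper are illustrative, while the actual classes live under `airflow.providers.google.cloud.transfers`:

```python
# Illustrative map from (source, destination) to the transfer operator
# class name listed in this provider.
TRANSFER_OPERATORS = {
    ('gcs', 'bigquery'): 'GCSToBigQueryOperator',
    ('s3', 'gcs'): 'S3ToGCSOperator',
    ('bigquery', 'gcs'): 'BigQueryToGCSOperator',
    ('mysql', 'gcs'): 'MySQLToGCSOperator',
    ('azure_blob', 'gcs'): 'AzureBlobStorageToGCSOperator',
}

def transfer_operator_for(source: str, destination: str) -> str:
    try:
        return TRANSFER_OPERATORS[(source, destination)]
    except KeyError:
        raise ValueError(f"no transfer operator for {source} -> {destination}") from None
```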
```python
class GCSToBigQueryOperator: ...
class S3ToGCSOperator: ...
class BigQueryToGCSOperator: ...
class MySQLToGCSOperator: ...
class AzureBlobStorageToGCSOperator: ...
```

Shared utilities, authentication backends, base classes, and helper functions used across all Google service integrations.
```python
class GoogleBaseHook: ...
class GoogleBaseAsyncHook: ...
class GoogleDiscoveryApiHook: ...
class OperationHelper: ...
```

The provider includes comprehensive error handling for Google API errors.
Most operators support retry mechanisms and provide detailed error messages for troubleshooting.
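Retries against transient Google API errors (HTTP 429 and 5xx responses) typically follow exponential backoff. A minimal sketch, independent of Airflow's own retry machinery; the exception class and function names here are illustrative:

```python
import time

class TransientApiError(Exception):
    """Stand-in for a retryable Google API error (e.g. HTTP 429 or 5xx)."""

def call_with_backoff(fn, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying on TransientApiError with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except TransientApiError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the task
            sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
```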
```python
# Common type definitions used across the provider
from typing import Any, Dict, List, Optional, Sequence, Union

from airflow.models import BaseOperator
from airflow.providers.google.common.hooks.base_google import GoogleBaseHook

# Connection and authentication types
GoogleCredentials = Union[str, Dict[str, Any]]
ImpersonationChain = Union[str, Sequence[str]]
GcpConnId = str

# Common parameter types
ProjectId = str
Location = str
ResourceId = str
Labels = Dict[str, str]
```
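`ImpersonationChain` accepts either a single service-account email or a sequence of them (a delegation chain). A sketch of normalizing the two forms; the union type matches the parameter above, while the helper itself is hypothetical:

```python
from typing import List, Sequence, Union

ImpersonationChain = Union[str, Sequence[str]]

def normalize_chain(chain: ImpersonationChain) -> List[str]:
    """Normalize an impersonation_chain value to a list of account emails."""
    if isinstance(chain, str):
        return [chain]
    return list(chain)

normalize_chain('sa@project.iam.gserviceaccount.com')
# -> ['sa@project.iam.gserviceaccount.com']
```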