Provider package for Google services integration with Apache Airflow, including Google Ads, Google Cloud (GCP), Google Firebase, Google LevelDB, Google Marketing Platform, and Google Workspace
npx @tessl/cli install tessl/pypi-apache-airflow-providers-google@17.1.00
# Apache Airflow Google Provider
Apache Airflow Google Provider is a comprehensive package that integrates Apache Airflow with the Google services ecosystem. It provides operators, hooks, sensors, and transfer tools for Google Ads, Google Cloud Platform services (BigQuery, Cloud Storage, Cloud Functions, Compute Engine, and more), Google Firebase, Google LevelDB, Google Marketing Platform, and Google Workspace. The package offers a unified interface for orchestrating data pipelines across Google's suite of products, with authentication through Google Cloud credentials, extensive configuration options, and built-in error handling and retry mechanisms.

## Package Information

- **Package Name**: apache-airflow-providers-google
- **Language**: Python
- **Package Type**: Apache Airflow Provider
- **Installation**: `pip install apache-airflow-providers-google`
- **Minimum Airflow Version**: 2.10.0

## Core Imports

All components follow standard Airflow provider import patterns:

```python
# Hooks - Base connectivity to Google services
from airflow.providers.google.cloud.hooks.bigquery import BigQueryHook
from airflow.providers.google.cloud.hooks.gcs import GCSHook
from airflow.providers.google.ads.hooks.ads import GoogleAdsHook

# Operators - Task execution components
from airflow.providers.google.cloud.operators.bigquery import BigQueryCreateDatasetOperator
from airflow.providers.google.cloud.operators.gcs import GCSCreateBucketOperator

# Sensors - Condition monitoring components
from airflow.providers.google.cloud.sensors.gcs import GCSObjectExistenceSensor
from airflow.providers.google.cloud.sensors.bigquery import BigQueryTableExistenceSensor

# Transfers - Data movement components
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator
from airflow.providers.google.cloud.transfers.s3_to_gcs import S3ToGCSOperator
```

## Basic Usage

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.bigquery import BigQueryCreateDatasetOperator
from airflow.providers.google.cloud.sensors.gcs import GCSObjectExistenceSensor
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator

# Define DAG
dag = DAG(
    'google_provider_example',
    default_args={'start_date': datetime(2023, 1, 1)},
    schedule='@daily',
    catchup=False,
)

# Create BigQuery dataset
create_dataset = BigQueryCreateDatasetOperator(
    task_id='create_dataset',
    dataset_id='example_dataset',
    project_id='my-gcp-project',
    gcp_conn_id='google_cloud_default',
    dag=dag,
)

# Wait for file in GCS
wait_for_file = GCSObjectExistenceSensor(
    task_id='wait_for_file',
    bucket='my-bucket',
    object='data/input.csv',
    gcp_conn_id='google_cloud_default',
    dag=dag,
)

# Load data from GCS to BigQuery
load_data = GCSToBigQueryOperator(
    task_id='load_data',
    bucket='my-bucket',
    source_objects=['data/input.csv'],
    destination_project_dataset_table='my-gcp-project.example_dataset.example_table',
    schema_fields=[
        {'name': 'id', 'type': 'INTEGER', 'mode': 'REQUIRED'},
        {'name': 'name', 'type': 'STRING', 'mode': 'NULLABLE'},
    ],
    write_disposition='WRITE_TRUNCATE',
    gcp_conn_id='google_cloud_default',
    dag=dag,
)

# Set task dependencies
create_dataset >> wait_for_file >> load_data
```

## Architecture

The provider follows Airflow's architecture patterns with specialized components:

- **Hooks**: Low-level interfaces to Google services, handling authentication, connection management, and API calls
- **Operators**: Task execution components that use hooks to perform specific operations (create resources, run jobs, etc.)
- **Sensors**: Monitoring components that wait for specific conditions (file existence, job completion, etc.)
- **Transfers**: Specialized operators for moving data between systems
- **Links**: Console link generators for easy navigation to the Google Cloud Console
- **Triggers**: Async components for long-running operations in deferrable mode

## Authentication

All components support multiple authentication methods:

- **Service Account JSON Key Files**: Configured on the Airflow connection via the `key_path` or `keyfile_dict` extras
- **Application Default Credentials (ADC)**: Automatic credential discovery
- **Service Account Impersonation**: Cross-project access via `impersonation_chain`
- **OAuth Flows**: For Google Ads and Marketing Platform services

```python
# Key file details live on the Airflow connection (key_path/keyfile_dict
# extras), so the hook only needs the connection ID
hook = BigQueryHook(gcp_conn_id='my_connection')

# Using service account impersonation
hook = BigQueryHook(
    gcp_conn_id='my_connection',
    impersonation_chain='service-account@project.iam.gserviceaccount.com',
)
```

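Connections can also be supplied without the UI. A minimal sketch, assuming the JSON connection format for `AIRFLOW_CONN_<CONN_ID>` environment variables (Airflow 2.3+); the key-file path and project values below are hypothetical placeholders:

```python
import json
import os

# Sketch: define the 'google_cloud_default' connection as a JSON document
# in an environment variable; Airflow resolves AIRFLOW_CONN_<CONN_ID>
# before looking up the connection in its metadata database.
conn = {
    "conn_type": "google_cloud_platform",
    "extra": {
        "key_path": "/path/to/service-account.json",   # hypothetical path
        "project": "my-gcp-project",                   # hypothetical project
        "scope": "https://www.googleapis.com/auth/cloud-platform",
    },
}
os.environ["AIRFLOW_CONN_GOOGLE_CLOUD_DEFAULT"] = json.dumps(conn)
```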
## Capabilities

### Google Cloud Platform Services

Comprehensive integration with Google Cloud Platform including BigQuery, Cloud Storage, Dataproc, Dataflow, Vertex AI, Cloud SQL, Pub/Sub, and 40+ other services. Provides complete CRUD operations, batch processing, real-time streaming, and machine learning capabilities.

```python { .api }
# Key GCP hooks and operators
class BigQueryHook: ...
class GCSHook: ...
class DataprocHook: ...
class DataflowHook: ...
class VertexAIHook: ...

class BigQueryCreateDatasetOperator: ...
class GCSCreateBucketOperator: ...
class DataprocCreateClusterOperator: ...
class DataflowCreatePythonJobOperator: ...
```

[Google Cloud Platform](./gcp-services.md)

### Google Ads Integration

Google Ads API integration with OAuth authentication, account management, and reporting capabilities. Supports campaign data extraction and automated reporting workflows.

```python { .api }
class GoogleAdsHook: ...
class GoogleAdsListAccountsOperator: ...
class GoogleAdsToGcsOperator: ...
```

[Google Ads](./google-ads.md)

### Google Marketing Platform

Integration with Google Marketing Platform services including Google Analytics Admin, Campaign Manager, Display & Video 360, and Search Ads 360. Provides comprehensive digital marketing automation and reporting.

```python { .api }
class GoogleAnalyticsAdminHook: ...
class GoogleCampaignManagerHook: ...
class GoogleDisplayVideo360Hook: ...
class GoogleSearchAdsHook: ...
```

[Marketing Platform](./marketing-platform.md)

### Google Workspace Integration

Google Workspace (formerly G Suite) integration for Drive, Sheets, and Calendar. Enables document management, spreadsheet automation, and calendar scheduling within data pipelines.

```python { .api }
class GoogleDriveHook: ...
class GSheetsHook: ...
class GoogleCalendarHook: ...
class GCSToGoogleSheetsOperator: ...
```

[Google Workspace](./google-workspace.md)

### Firebase Integration

Google Firebase integration for Firestore database operations, enabling NoSQL database interactions in data pipelines.

```python { .api }
class CloudFirestoreHook: ...
class CloudFirestoreExportDatabaseOperator: ...
```

[Firebase](./firebase.md)

### Google LevelDB Integration

Google LevelDB integration provides a high-performance, embedded key-value database interface through Apache Airflow. Supports put, get, delete, and batch operations for fast local data storage and retrieval.

```python { .api }
class LevelDBHook: ...
class LevelDBOperator: ...
class LevelDBHookException: ...
```

[Google LevelDB](./leveldb.md)

### Data Transfer Operations

Extensive transfer capabilities between Google services and external systems including AWS S3, Azure Blob Storage, SFTP, local filesystems, and various databases.

```python { .api }
class GCSToBigQueryOperator: ...
class S3ToGCSOperator: ...
class BigQueryToGCSOperator: ...
class MySQLToGCSOperator: ...
class AzureBlobStorageToGCSOperator: ...
```

[Data Transfers](./data-transfers.md)

### Common Utilities and Base Classes

Shared utilities, authentication backends, base classes, and helper functions used across all Google service integrations.

```python { .api }
class GoogleBaseHook: ...
class GoogleBaseAsyncHook: ...
class GoogleDiscoveryApiHook: ...
class OperationHelper: ...
```

[Common Utilities](./common-utilities.md)

## Error Handling

The provider includes comprehensive error handling for Google API errors:

- **Authentication Errors**: Invalid credentials, expired tokens, insufficient permissions
- **Resource Errors**: Resource not found, quota exceeded, invalid resource states
- **Network Errors**: Connection timeouts, API rate limiting, service unavailable
- **Data Errors**: Schema mismatches, data validation failures, invalid formats

Most operators support retry mechanisms and provide detailed error messages for troubleshooting.

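As a sketch of the retry knobs mentioned above, the following are standard Airflow task arguments (not provider-specific) that are commonly paired with Google operators to ride out transient API errors and rate limiting:

```python
from datetime import timedelta

# Sketch: task-level retry settings, passed as default_args to a DAG
# or directly to an operator; all are standard BaseOperator arguments.
default_args = {
    'retries': 3,                              # re-run a failed task up to 3 times
    'retry_delay': timedelta(minutes=1),       # wait between attempts
    'retry_exponential_backoff': True,         # grow the wait on repeated failures
    'max_retry_delay': timedelta(minutes=30),  # cap the backoff
}
```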
## Types

```python { .api }
# Common type definitions used across the provider
from typing import Dict, List, Optional, Union, Any, Sequence
from airflow.models import BaseOperator
from airflow.providers.google.common.hooks.base_google import GoogleBaseHook

# Connection and authentication types
GoogleCredentials = Union[str, Dict[str, Any]]
ImpersonationChain = Union[str, Sequence[str]]
GcpConnId = str

# Common parameter types
ProjectId = str
Location = str
ResourceId = str
Labels = Dict[str, str]
```
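A brief sketch of how two of these aliases read at a call site; the service-account addresses and label values are hypothetical:

```python
from typing import Dict, Sequence, Union

# Aliases as defined above
ImpersonationChain = Union[str, Sequence[str]]
Labels = Dict[str, str]

# A delegation chain: each account impersonates the next, ending at the
# target service account whose permissions the task actually uses
chain: ImpersonationChain = [
    'intermediate-sa@project-a.iam.gserviceaccount.com',
    'target-sa@project-b.iam.gserviceaccount.com',
]

# Resource labels attached to a created GCP resource
labels: Labels = {'env': 'prod', 'team': 'data'}
```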