tessl/pypi-apache-airflow-providers-google

Provider package for Google services integration with Apache Airflow, including Google Ads, Google Cloud (GCP), Google Firebase, Google LevelDB, Google Marketing Platform, and Google Workspace

Describes: pypipkg:pypi/apache-airflow-providers-google@17.1.x

To install, run:

npx @tessl/cli install tessl/pypi-apache-airflow-providers-google@17.1.0

# Apache Airflow Google Provider

Apache Airflow Google Provider is a comprehensive package that integrates Apache Airflow with the Google services ecosystem. It provides operators, hooks, sensors, and transfer tools for Google Ads, Google Cloud Platform services (BigQuery, Cloud Storage, Cloud Functions, Compute Engine, and more), Google Firebase, Google LevelDB, Google Marketing Platform, and Google Workspace. The package offers a unified interface for orchestrating data pipelines across Google's product suite, with authentication through Google Cloud credentials, extensive configuration options, and built-in error handling and retry mechanisms.

## Package Information

- **Package Name**: apache-airflow-providers-google
- **Language**: Python
- **Package Type**: Apache Airflow provider
- **Installation**: `pip install apache-airflow-providers-google`
- **Minimum Airflow Version**: 2.10.0

## Core Imports

All components follow standard Airflow provider import patterns:

```python
# Hooks - base connectivity to Google services
from airflow.providers.google.cloud.hooks.bigquery import BigQueryHook
from airflow.providers.google.cloud.hooks.gcs import GCSHook
from airflow.providers.google.ads.hooks.ads import GoogleAdsHook

# Operators - task execution components
from airflow.providers.google.cloud.operators.bigquery import BigQueryCreateDatasetOperator
from airflow.providers.google.cloud.operators.gcs import GCSCreateBucketOperator

# Sensors - condition monitoring components
from airflow.providers.google.cloud.sensors.gcs import GCSObjectExistenceSensor
from airflow.providers.google.cloud.sensors.bigquery import BigQueryTableExistenceSensor

# Transfers - data movement components
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator
from airflow.providers.google.cloud.transfers.s3_to_gcs import S3ToGCSOperator
```

## Basic Usage

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.bigquery import BigQueryCreateDatasetOperator
from airflow.providers.google.cloud.sensors.gcs import GCSObjectExistenceSensor
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator

# Define the DAG
dag = DAG(
    'google_provider_example',
    default_args={'start_date': datetime(2023, 1, 1)},
    schedule='@daily',
    catchup=False,
)

# Create a BigQuery dataset
create_dataset = BigQueryCreateDatasetOperator(
    task_id='create_dataset',
    dataset_id='example_dataset',
    project_id='my-gcp-project',
    gcp_conn_id='google_cloud_default',
    dag=dag,
)

# Wait for a file to appear in GCS
wait_for_file = GCSObjectExistenceSensor(
    task_id='wait_for_file',
    bucket='my-bucket',
    object='data/input.csv',
    google_cloud_conn_id='google_cloud_default',
    dag=dag,
)

# Load data from GCS into BigQuery
load_data = GCSToBigQueryOperator(
    task_id='load_data',
    bucket='my-bucket',
    source_objects=['data/input.csv'],
    destination_project_dataset_table='my-gcp-project.example_dataset.example_table',
    schema_fields=[
        {'name': 'id', 'type': 'INTEGER', 'mode': 'REQUIRED'},
        {'name': 'name', 'type': 'STRING', 'mode': 'NULLABLE'},
    ],
    write_disposition='WRITE_TRUNCATE',
    gcp_conn_id='google_cloud_default',
    dag=dag,
)

# Set task dependencies
create_dataset >> wait_for_file >> load_data
```

## Architecture

The provider follows Airflow's architecture patterns with specialized components:

- **Hooks**: Low-level interfaces to Google services, handling authentication, connection management, and API calls
- **Operators**: Task execution components that use hooks to perform specific operations (create resources, run jobs, etc.)
- **Sensors**: Monitoring components that wait for specific conditions (file existence, job completion, etc.)
- **Transfers**: Specialized operators for moving data between systems
- **Links**: Console link generators for easy navigation to the Google Cloud Console
- **Triggers**: Async components for long-running operations in deferrable mode
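
To make the layering concrete, here is a deliberately simplified sketch of the hook/operator split — illustrative classes only, not the provider's actual implementations:

```python
class ExampleGoogleHook:
    """Hook layer: owns the connection and the low-level API calls."""

    def __init__(self, gcp_conn_id: str):
        self.gcp_conn_id = gcp_conn_id

    def get_records(self, sql: str):
        # A real hook would authenticate and call the Google API here;
        # this sketch just returns canned data.
        return [(42,)]


class ExampleCountOperator:
    """Operator layer: a task that delegates all API work to a hook."""

    def __init__(self, task_id: str, gcp_conn_id: str):
        self.task_id = task_id
        self.gcp_conn_id = gcp_conn_id

    def execute(self, context: dict):
        hook = ExampleGoogleHook(self.gcp_conn_id)
        return hook.get_records("SELECT COUNT(*) FROM t")[0][0]


print(ExampleCountOperator("count", "google_cloud_default").execute({}))  # 42
```

In the real provider, `execute()` is invoked by the Airflow scheduler, and hooks resolve credentials from the Airflow connection identified by `gcp_conn_id`.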

## Authentication

All components support multiple authentication methods:

- **Service Account JSON Key Files**: Configured on the Airflow connection via `key_path` or `keyfile_dict`
- **Application Default Credentials (ADC)**: Automatic credential discovery
- **Service Account Impersonation**: Cross-project access via `impersonation_chain`
- **OAuth Flows**: For Google Ads and Marketing Platform services

```python
# Key files are configured on the Airflow connection (key_path or
# keyfile_dict in the connection extras), then referenced by ID:
hook = BigQueryHook(gcp_conn_id='my_connection')

# Service account impersonation can be set per hook
hook = BigQueryHook(
    gcp_conn_id='my_connection',
    impersonation_chain='service-account@project.iam.gserviceaccount.com',
)
```
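
For reference, a service account key parsed into a `keyfile_dict` has roughly the shape below. All values are placeholders (the real fields come from the JSON key Google issues), and exactly how the dict is stored in the connection extras varies between provider versions:

```python
import json

# Placeholder service account key material - not a real key
keyfile_dict = {
    "type": "service_account",
    "project_id": "my-gcp-project",
    "private_key_id": "abc123",
    "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
    "client_email": "service-account@my-gcp-project.iam.gserviceaccount.com",
    "client_id": "1234567890",
    "token_uri": "https://oauth2.googleapis.com/token",
}

# Serialized into the connection's extra field, it is then picked up
# by any hook that references the connection via gcp_conn_id.
extra = json.dumps({"keyfile_dict": keyfile_dict})
```
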

## Capabilities

### Google Cloud Platform Services

Comprehensive integration with Google Cloud Platform, including BigQuery, Cloud Storage, Dataproc, Dataflow, Vertex AI, Cloud SQL, Pub/Sub, and more than 40 other services. Provides complete CRUD operations, batch processing, real-time streaming, and machine learning capabilities.

```python { .api }
# Key GCP hooks and operators
class BigQueryHook: ...
class GCSHook: ...
class DataprocHook: ...
class DataflowHook: ...
class VertexAIHook: ...

class BigQueryCreateDatasetOperator: ...
class GCSCreateBucketOperator: ...
class DataprocCreateClusterOperator: ...
class DataflowCreatePythonJobOperator: ...
```

[Google Cloud Platform](./gcp-services.md)

### Google Ads Integration

Google Ads API integration with OAuth authentication, account management, and reporting capabilities. Supports campaign data extraction and automated reporting workflows.

```python { .api }
class GoogleAdsHook: ...
class GoogleAdsListAccountsOperator: ...
class GoogleAdsToGcsOperator: ...
```

[Google Ads](./google-ads.md)

### Google Marketing Platform

Integration with Google Marketing Platform services including Google Analytics Admin, Campaign Manager, Display & Video 360, and Search Ads. Provides comprehensive digital marketing automation and reporting.

```python { .api }
class GoogleAnalyticsAdminHook: ...
class GoogleCampaignManagerHook: ...
class GoogleDisplayVideo360Hook: ...
class GoogleSearchAdsHook: ...
```

[Marketing Platform](./marketing-platform.md)

### Google Workspace Integration

Google Workspace (formerly G Suite) integration for Drive, Sheets, and Calendar. Enables document management, spreadsheet automation, and calendar scheduling within data pipelines.

```python { .api }
class GoogleDriveHook: ...
class GSheetsHook: ...
class GoogleCalendarHook: ...
class GCSToGoogleSheetsOperator: ...
```

[Google Workspace](./google-workspace.md)

### Firebase Integration

Google Firebase integration for Firestore database operations, enabling NoSQL database interactions in data pipelines.

```python { .api }
class CloudFirestoreHook: ...
class CloudFirestoreExportDatabaseOperator: ...
```

[Firebase](./firebase.md)

### Google LevelDB Integration

Google LevelDB integration provides a high-performance, embedded key-value database interface through Apache Airflow. Supports put, get, delete, and batch operations for fast local data storage and retrieval.

```python { .api }
class LevelDBHook: ...
class LevelDBOperator: ...
class LevelDBHookException: ...
```

[Google LevelDB](./leveldb.md)
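
The four operations named above map onto a key-value store as sketched below. This is a dict-backed stand-in for illustration only; the provider's `LevelDBHook` wraps an actual LevelDB client:

```python
class MiniKV:
    """Dict-backed stand-in showing LevelDB-style operations."""

    def __init__(self):
        self._db = {}

    def put(self, key: bytes, value: bytes):
        self._db[key] = value

    def get(self, key: bytes):
        return self._db.get(key)

    def delete(self, key: bytes):
        self._db.pop(key, None)

    def write_batch(self, keys, values):
        # Batch: apply many puts as one unit
        for k, v in zip(keys, values):
            self._db[k] = v


db = MiniKV()
db.put(b"k1", b"v1")
db.write_batch([b"k2", b"k3"], [b"v2", b"v3"])
db.delete(b"k1")
print(db.get(b"k2"))  # b'v2'
```
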

### Data Transfer Operations

Extensive transfer capabilities between Google services and external systems, including AWS S3, Azure Blob Storage, SFTP, local filesystems, and various databases.

```python { .api }
class GCSToBigQueryOperator: ...
class S3ToGCSOperator: ...
class BigQueryToGCSOperator: ...
class MySQLToGCSOperator: ...
class AzureBlobStorageToGCSOperator: ...
```

[Data Transfers](./data-transfers.md)

### Common Utilities and Base Classes

Shared utilities, authentication backends, base classes, and helper functions used across all Google service integrations.

```python { .api }
class GoogleBaseHook: ...
class GoogleBaseAsyncHook: ...
class GoogleDiscoveryApiHook: ...
class OperationHelper: ...
```

[Common Utilities](./common-utilities.md)

## Error Handling

The provider includes comprehensive error handling for Google API errors:

- **Authentication Errors**: Invalid credentials, expired tokens, insufficient permissions
- **Resource Errors**: Resource not found, quota exceeded, invalid resource states
- **Network Errors**: Connection timeouts, API rate limiting, service unavailable
- **Data Errors**: Schema mismatches, data validation failures, invalid formats

Most operators support retry mechanisms and provide detailed error messages for troubleshooting.
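
The retry behaviour can be pictured as an exponential backoff loop. The sketch below is a generic illustration, not the provider's actual retry code (which builds on `google-api-core` retries and Airflow task retries):

```python
import time


def call_with_backoff(fn, max_attempts=4, base_delay=0.01):
    """Retry fn on transient errors with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except (TimeoutError, ConnectionError):
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error
            time.sleep(base_delay * (2 ** attempt))


# Example: a flaky call that succeeds on the third attempt
attempts = {"n": 0}

def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise TimeoutError("transient")
    return "ok"


print(call_with_backoff(flaky))  # ok
```
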

## Types

```python { .api }
# Common type definitions used across the provider
from typing import Dict, List, Optional, Union, Any, Sequence
from airflow.models import BaseOperator
from airflow.providers.google.common.hooks.base_google import GoogleBaseHook

# Connection and authentication types
GoogleCredentials = Union[str, Dict[str, Any]]
ImpersonationChain = Union[str, Sequence[str]]
GcpConnId = str

# Common parameter types
ProjectId = str
Location = str
ResourceId = str
Labels = Dict[str, str]
```
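
As a usage sketch, these aliases read naturally in function signatures. The helper below is hypothetical, shown only to illustrate the alias types (note that `ImpersonationChain` accepts either a single account or a delegation sequence):

```python
from typing import Any, Dict, Optional, Sequence, Union

# Aliases as defined above
ImpersonationChain = Union[str, Sequence[str]]
ProjectId = str
Labels = Dict[str, str]


def describe_job(
    project_id: ProjectId,
    labels: Optional[Labels] = None,
    impersonation_chain: Optional[ImpersonationChain] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: normalize common provider-style parameters."""
    chain = (
        [impersonation_chain]
        if isinstance(impersonation_chain, str)
        else list(impersonation_chain or [])
    )
    return {"project": project_id, "labels": labels or {}, "chain": chain}


print(describe_job("my-gcp-project", impersonation_chain="sa@project.iam.gserviceaccount.com"))
```
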