# Connection Types

The BigQuery Connection API supports multiple external data source connection types, each with specific configuration properties. The `Connection` object uses a oneof field structure, meaning exactly one connection type can be set per connection.

## Capabilities

### Core Connection Structure

The base `Connection` object contains common metadata and exactly one connection type configuration.

```python { .api }
class Connection:
    """Configuration parameters for an external data source connection."""
    name: str                # Output only. Resource name of the connection
    friendly_name: str       # User-provided display name for the connection
    description: str         # User-provided description
    creation_time: int       # Output only. Creation timestamp in milliseconds since epoch
    last_modified_time: int  # Output only. Last update timestamp in milliseconds since epoch
    has_credential: bool     # Output only. True if a credential is configured for this connection

    # Oneof connection type (exactly one must be set)
    cloud_sql: CloudSqlProperties
    aws: AwsProperties
    azure: AzureProperties
    cloud_spanner: CloudSpannerProperties
    cloud_resource: CloudResourceProperties
    spark: SparkProperties
    salesforce_data_cloud: SalesforceDataCloudProperties
```

**Usage Example:**

```python
from google.cloud.bigquery_connection import Connection

# Create base connection
connection = Connection()
connection.friendly_name = "My External Database"
connection.description = "Connection to external data source"

# Set exactly one connection type (examples below)
```
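
The oneof constraint can be sketched in plain Python. The helper below is hypothetical (it is not part of the client library, which enforces the oneof at the proto level); it only illustrates the invariant that exactly one of the connection-type fields may be set.

```python
# Hypothetical sketch of the oneof invariant; not part of the
# google.cloud.bigquery_connection library.
CONNECTION_TYPE_FIELDS = (
    "cloud_sql", "aws", "azure", "cloud_spanner",
    "cloud_resource", "spark", "salesforce_data_cloud",
)

def active_connection_type(connection_fields: dict) -> str:
    """Return the single connection-type field that is set, or raise."""
    set_fields = [
        f for f in CONNECTION_TYPE_FIELDS
        if connection_fields.get(f) is not None
    ]
    if len(set_fields) != 1:
        raise ValueError(
            f"exactly one connection type must be set, got {set_fields}"
        )
    return set_fields[0]

# A connection with only Cloud SQL configured
print(active_connection_type({"cloud_sql": {"database": "analytics"}}))  # cloud_sql
```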

### Cloud SQL Connections

Connects to Google Cloud SQL instances (PostgreSQL or MySQL) for querying relational database data.

```python { .api }
class CloudSqlProperties:
    """Properties for a Cloud SQL connection."""
    instance_id: str                # Cloud SQL instance ID in format 'project:location:instance'
    database: str                   # Database name
    type_: DatabaseType             # Type of Cloud SQL database
    credential: CloudSqlCredential  # Input only. Database credentials
    service_account_id: str         # Output only. Service account ID for the connection

    class DatabaseType:
        """Database type enumeration."""
        DATABASE_TYPE_UNSPECIFIED = 0
        POSTGRES = 1
        MYSQL = 2

class CloudSqlCredential:
    """Credential for Cloud SQL connections."""
    username: str  # Database username
    password: str  # Database password
```

**Usage Example:**

```python
from google.cloud.bigquery_connection import (
    Connection,
    CloudSqlProperties,
    CloudSqlCredential,
)

connection = Connection()
connection.friendly_name = "PostgreSQL Analytics DB"
connection.description = "Production analytics database"

# Configure the Cloud SQL connection
connection.cloud_sql = CloudSqlProperties()
connection.cloud_sql.instance_id = "my-project:us-central1:analytics-db"
connection.cloud_sql.database = "analytics"
connection.cloud_sql.type_ = CloudSqlProperties.DatabaseType.POSTGRES
connection.cloud_sql.credential = CloudSqlCredential(
    username="bigquery_service",
    password="secure_password_123",  # never hard-code real credentials
)

# After creation, service_account_id will be populated:
# print(f"Service Account: {connection.cloud_sql.service_account_id}")
```
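
The `instance_id` must follow the `project:location:instance` format. A minimal client-side sanity check might look like the following; `parse_instance_id` is a hypothetical helper for illustration (the API performs its own validation), not part of the library.

```python
def parse_instance_id(instance_id: str) -> tuple:
    """Split a Cloud SQL instance ID of the form 'project:location:instance'.

    Hypothetical validation helper; the Connection API remains the
    source of truth for what it accepts.
    """
    parts = instance_id.split(":")
    if len(parts) != 3 or not all(parts):
        raise ValueError(
            f"expected 'project:location:instance', got {instance_id!r}"
        )
    return tuple(parts)

project, location, instance = parse_instance_id(
    "my-project:us-central1:analytics-db"
)
# project == "my-project", location == "us-central1", instance == "analytics-db"
```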

### AWS Connections

Connects to Amazon Web Services data sources using IAM role-based authentication.

```python { .api }
class AwsProperties:
    """Properties for AWS connections."""
    # Oneof authentication method (exactly one must be set)
    cross_account_role: AwsCrossAccountRole  # Deprecated. Google-owned AWS IAM user access key
    access_role: AwsAccessRole               # Recommended. Google-owned service account authentication

class AwsCrossAccountRole:
    """AWS cross-account role authentication (deprecated)."""
    iam_role_id: str  # User's AWS IAM Role that trusts the Google-owned AWS IAM user
    iam_user_id: str  # Output only. Google-owned AWS IAM User for the connection
    external_id: str  # Output only. Google-generated ID representing the connection's identity in AWS

class AwsAccessRole:
    """AWS access role authentication (recommended)."""
    iam_role_id: str  # User's AWS IAM Role that trusts the Google-owned AWS IAM user
    identity: str     # Unique Google-owned and generated identity for the connection
```

**Usage Example:**

```python
from google.cloud.bigquery_connection import (
    Connection,
    AwsProperties,
    AwsAccessRole,
)

connection = Connection()
connection.friendly_name = "AWS S3 Data Lake"
connection.description = "Connection to S3 data lake for analytics"

# Configure the AWS connection (using the recommended access role method)
connection.aws = AwsProperties()
connection.aws.access_role = AwsAccessRole()
connection.aws.access_role.iam_role_id = "arn:aws:iam::123456789012:role/BigQueryAccessRole"

# After creation, identity will be populated:
# print(f"Google Identity: {connection.aws.access_role.identity}")
```
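
The `iam_role_id` is a full IAM role ARN of the form `arn:aws:iam::<account-id>:role/<role-name>`. A rough pre-flight format check, sketched as a hypothetical helper (not part of the library; AWS and the Connection API remain authoritative for what is valid):

```python
import re

# Hypothetical pre-flight check for the iam_role_id format. The character
# class approximates AWS's allowed role-name characters.
ROLE_ARN_RE = re.compile(r"arn:aws:iam::\d{12}:role/[\w+=,.@/-]+")

def looks_like_role_arn(iam_role_id: str) -> bool:
    """Return True if the string resembles an AWS IAM role ARN."""
    return ROLE_ARN_RE.fullmatch(iam_role_id) is not None

print(looks_like_role_arn("arn:aws:iam::123456789012:role/BigQueryAccessRole"))  # True
print(looks_like_role_arn("BigQueryAccessRole"))                                 # False
```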

### Azure Connections

Connects to Microsoft Azure data sources using Azure Active Directory authentication.

```python { .api }
class AzureProperties:
    """Properties for Azure connections."""
    application: str                      # Output only. Name of the Azure Active Directory Application
    client_id: str                        # Output only. Client ID of the Azure AD Application
    object_id: str                        # Output only. Object ID of the Azure AD Application
    customer_tenant_id: str               # ID of the customer's directory that hosts the data
    redirect_uri: str                     # URL the user is redirected to after granting consent during setup
    federated_application_client_id: str  # Client ID of the user's Azure AD Application for a federated connection
    identity: str                         # Output only. Unique Google identity for the connection
```

**Usage Example:**

```python
from google.cloud.bigquery_connection import Connection, AzureProperties

connection = Connection()
connection.friendly_name = "Azure Data Lake Gen2"
connection.description = "Connection to Azure Data Lake for analytics"

# Configure the Azure connection
connection.azure = AzureProperties()
connection.azure.customer_tenant_id = "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
connection.azure.redirect_uri = "https://console.cloud.google.com/bigquery"
connection.azure.federated_application_client_id = "12345678-90ab-cdef-1234-567890abcdef"

# After creation, output-only fields will be populated:
# print(f"Application: {connection.azure.application}")
# print(f"Client ID: {connection.azure.client_id}")
# print(f"Google Identity: {connection.azure.identity}")
```

### Cloud Spanner Connections

Connects to Google Cloud Spanner databases for analytical queries.

```python { .api }
class CloudSpannerProperties:
    """Properties for Cloud Spanner connections."""
    database: str                   # Cloud Spanner database in format 'projects/{project}/instances/{instance}/databases/{database}'
    use_parallelism: bool           # If parallelism should be used when reading from the Spanner database
    max_parallelism: int            # Allows setting max parallelism per query when executing on Spanner compute resources
    use_serverless_analytics: bool  # If the serverless analytics service should be used to read data from Spanner
    use_data_boost: bool            # If the request should be executed via Spanner independent compute resources
    database_role: str              # Optional. Cloud Spanner database role for fine-grained access control
```
191
192
**Usage Example:**
193
194
```python
195
from google.cloud.bigquery_connection import Connection, CloudSpannerProperties
196
197
connection = Connection()
198
connection.friendly_name = "Spanner OLTP Database"
199
connection.description = "Connection to Spanner for analytical queries"
200
201
# Configure Cloud Spanner connection
202
connection.cloud_spanner = CloudSpannerProperties()
203
connection.cloud_spanner.database = "projects/my-project/instances/my-instance/databases/my-database"
204
connection.cloud_spanner.use_parallelism = True
205
connection.cloud_spanner.max_parallelism = 4
206
connection.cloud_spanner.use_serverless_analytics = True
207
connection.cloud_spanner.use_data_boost = False
208
connection.cloud_spanner.database_role = "analytics_reader"
209
```
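
The `database` field expects the full `projects/{project}/instances/{instance}/databases/{database}` resource name. A small formatter, sketched as a hypothetical convenience helper (the library simply accepts the plain string), can keep these names consistent:

```python
def spanner_database_name(project: str, instance: str, database: str) -> str:
    """Build a Cloud Spanner database resource name.

    Hypothetical helper for illustration; not part of the
    google.cloud.bigquery_connection library.
    """
    return f"projects/{project}/instances/{instance}/databases/{database}"

name = spanner_database_name("my-project", "my-instance", "my-database")
# name == "projects/my-project/instances/my-instance/databases/my-database"
```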

### Cloud Resource Connections

Connects to other Google Cloud resources with automatic service account management.

```python { .api }
class CloudResourceProperties:
    """Properties for Cloud Resource connections."""
    service_account_id: str  # Output only. The account ID of the service created for the connection
```

**Usage Example:**

```python
from google.cloud.bigquery_connection import Connection, CloudResourceProperties

connection = Connection()
connection.friendly_name = "Cloud Storage Data"
connection.description = "Connection to Google Cloud Storage buckets"

# Configure the Cloud Resource connection (no input fields are required)
connection.cloud_resource = CloudResourceProperties()

# After creation, service_account_id will be populated:
# print(f"Service Account: {connection.cloud_resource.service_account_id}")
```

### Spark Connections

Runs Apache Spark workloads (for example, BigQuery stored procedures for Apache Spark), with optional Dataproc Metastore and Spark History Server integration.

```python { .api }
class SparkProperties:
    """Properties for Spark connections."""
    service_account_id: str                                # Output only. The account ID of the service created for the connection
    metastore_service_config: MetastoreServiceConfig       # Optional. Dataproc Metastore Service configuration
    spark_history_server_config: SparkHistoryServerConfig  # Optional. Spark History Server configuration

class MetastoreServiceConfig:
    """Configuration for Dataproc Metastore Service."""
    metastore_service: str  # Optional. Resource name of an existing Dataproc Metastore service

class SparkHistoryServerConfig:
    """Configuration for Spark History Server."""
    dataproc_cluster: str  # Optional. Resource name of an existing Dataproc Cluster to act as a Spark History Server
```

**Usage Example:**

```python
from google.cloud.bigquery_connection import (
    Connection,
    SparkProperties,
    MetastoreServiceConfig,
    SparkHistoryServerConfig,
)

connection = Connection()
connection.friendly_name = "Spark Analytics Cluster"
connection.description = "Connection for Spark-based big data processing"

# Configure the Spark connection
connection.spark = SparkProperties()

# Optional: configure the metastore service
connection.spark.metastore_service_config = MetastoreServiceConfig()
connection.spark.metastore_service_config.metastore_service = (
    "projects/my-project/locations/us-central1/services/my-metastore"
)

# Optional: configure the history server
connection.spark.spark_history_server_config = SparkHistoryServerConfig()
connection.spark.spark_history_server_config.dataproc_cluster = (
    "projects/my-project/regions/us-central1/clusters/spark-history-cluster"
)

# After creation, service_account_id will be populated:
# print(f"Service Account: {connection.spark.service_account_id}")
```

### Salesforce Data Cloud Connections

Connects to Salesforce Data Cloud for CRM and customer data analytics.

```python { .api }
class SalesforceDataCloudProperties:
    """Properties for Salesforce Data Cloud connections."""
    instance_uri: str  # The URL to the user's Salesforce Data Cloud instance
    identity: str      # Output only. Unique Google service account identity for the connection
    tenant_id: str     # The ID of the user's Salesforce tenant
```

**Usage Example:**

```python
from google.cloud.bigquery_connection import Connection, SalesforceDataCloudProperties

connection = Connection()
connection.friendly_name = "Salesforce CRM Data"
connection.description = "Connection to Salesforce Data Cloud for customer analytics"

# Configure the Salesforce Data Cloud connection
connection.salesforce_data_cloud = SalesforceDataCloudProperties()
connection.salesforce_data_cloud.instance_uri = "https://mycompany.my.salesforce-datacloud.com"
connection.salesforce_data_cloud.tenant_id = "00D123456789012345"

# After creation, identity will be populated:
# print(f"Google Identity: {connection.salesforce_data_cloud.identity}")
```

## Connection Type Selection

When creating a connection, you must set exactly one connection type. The choice depends on your external data source:

```python
# Cloud SQL for relational databases (PostgreSQL, MySQL)
connection.cloud_sql = CloudSqlProperties()

# AWS for Amazon S3, Redshift, RDS, etc.
connection.aws = AwsProperties()

# Azure for Azure Data Lake, SQL Database, etc.
connection.azure = AzureProperties()

# Cloud Spanner for Google's globally distributed database
connection.cloud_spanner = CloudSpannerProperties()

# Cloud Resource for other Google Cloud services
connection.cloud_resource = CloudResourceProperties()

# Spark for distributed data processing
connection.spark = SparkProperties()

# Salesforce Data Cloud for CRM data
connection.salesforce_data_cloud = SalesforceDataCloudProperties()
```

## Common Patterns

### Output-Only Fields

Many connection types have output-only fields that are populated by the service after connection creation:

```python
# These fields are set by the service and cannot be modified
connection.name                # Resource name assigned by the service
connection.creation_time       # Timestamp when the connection was created
connection.last_modified_time  # Timestamp when the connection was last updated
connection.has_credential      # Whether credential information is configured

# Connection-type-specific output fields
connection.cloud_sql.service_account_id       # For Cloud SQL
connection.aws.access_role.identity           # For AWS access role
connection.azure.identity                     # For Azure
connection.cloud_resource.service_account_id  # For Cloud Resource
```
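
Since output-only fields are controlled by the service, it can be useful to drop them before assembling an update payload. The dictionary-based sketch below is hypothetical (field names are taken from the API reference above; real clients typically scope writes with an update mask instead):

```python
# Hypothetical sketch: strip output-only Connection fields before an update.
# Field names mirror the API reference above.
OUTPUT_ONLY_FIELDS = {"name", "creation_time", "last_modified_time", "has_credential"}

def writable_fields(connection_fields: dict) -> dict:
    """Return only the fields a client is allowed to set."""
    return {k: v for k, v in connection_fields.items() if k not in OUTPUT_ONLY_FIELDS}

payload = writable_fields({
    "name": "projects/p/locations/us/connections/c",  # output only, dropped
    "friendly_name": "My External Database",
    "creation_time": 1700000000000,                   # output only, dropped
})
# payload == {"friendly_name": "My External Database"}
```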

### Credential Management

Credential information is handled differently by connection type:

- **Cloud SQL**: Direct username/password, stored securely
- **AWS**: IAM role trust relationship with a Google-managed identity
- **Azure**: OAuth-based federated authentication
- **Cloud Spanner**: Uses Google Cloud IAM (no explicit credentials)
- **Cloud Resource**: Automatic service account creation
- **Spark**: Automatic service account creation
- **Salesforce Data Cloud**: OAuth-based authentication with tenant-specific configuration

### Security Considerations

- Credentials are encrypted and stored securely by Google Cloud
- Output-only identity fields provide secure authentication to external services
- IAM policies control access to connection resources
- Service accounts created for connections follow least-privilege principles