Google Cloud BigQuery Connection API client library for managing external data source connections and credentials
The BigQuery Connection API supports multiple external data source connection types, each with its own configuration properties. The Connection object uses a protobuf oneof field, so exactly one connection type can be set per connection; assigning one type clears any other.

The base Connection object contains common metadata plus exactly one connection type configuration.
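The oneof rule can be illustrated with a small plain-Python check (a sketch only, not the library's implementation; the real Connection is a protobuf message whose oneof clears the sibling fields automatically when one is assigned):

```python
# Stand-in for Connection's oneof connection-type field names
CONNECTION_TYPE_FIELDS = (
    "cloud_sql", "aws", "azure", "cloud_spanner",
    "cloud_resource", "spark", "salesforce_data_cloud",
)

def active_connection_type(connection_fields: dict) -> str:
    """Return the single configured connection type, enforcing the oneof rule."""
    set_types = [f for f in CONNECTION_TYPE_FIELDS if connection_fields.get(f) is not None]
    if len(set_types) != 1:
        raise ValueError(f"exactly one connection type must be set, got {set_types}")
    return set_types[0]

print(active_connection_type({"cloud_sql": {"database": "analytics"}}))  # cloud_sql
```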
```python
class Connection:
    """Configuration parameters for an external data source connection."""
    name: str                # Output only. Resource name of the connection
    friendly_name: str       # User-provided display name for the connection
    description: str         # User-provided description
    creation_time: int       # Output only. Creation timestamp in milliseconds since epoch
    last_modified_time: int  # Output only. Last update timestamp in milliseconds since epoch
    has_credential: bool     # Output only. True if a credential is configured for this connection

    # oneof connection type (exactly one must be set)
    cloud_sql: CloudSqlProperties
    aws: AwsProperties
    azure: AzureProperties
    cloud_spanner: CloudSpannerProperties
    cloud_resource: CloudResourceProperties
    spark: SparkProperties
    salesforce_data_cloud: SalesforceDataCloudProperties
```

Usage Example:
```python
from google.cloud.bigquery_connection import Connection

# Create base connection
connection = Connection()
connection.friendly_name = "My External Database"
connection.description = "Connection to external data source"

# Set exactly one connection type (examples below)
```

Connects to Google Cloud SQL instances (PostgreSQL or MySQL) for querying relational database data.
```python
class CloudSqlProperties:
    """Properties for a Cloud SQL connection."""
    instance_id: str         # Cloud SQL instance ID in the form 'project:location:instance'
    database: str            # Database name
    type_: DatabaseType      # Type of the Cloud SQL database
    credential: CloudSqlCredential  # Input only. Database credentials
    service_account_id: str  # Output only. Service account ID for the connection

    class DatabaseType:
        """Database type enumeration."""
        DATABASE_TYPE_UNSPECIFIED = 0
        POSTGRES = 1
        MYSQL = 2


class CloudSqlCredential:
    """Credential for Cloud SQL connections."""
    username: str  # Database username
    password: str  # Database password
```

Usage Example:
```python
from google.cloud.bigquery_connection import (
    Connection,
    CloudSqlProperties,
    CloudSqlCredential,
)

connection = Connection()
connection.friendly_name = "PostgreSQL Analytics DB"
connection.description = "Production analytics database"

# Configure Cloud SQL connection
connection.cloud_sql = CloudSqlProperties()
connection.cloud_sql.instance_id = "my-project:us-central1:analytics-db"
connection.cloud_sql.database = "analytics"
connection.cloud_sql.type_ = CloudSqlProperties.DatabaseType.POSTGRES
connection.cloud_sql.credential = CloudSqlCredential(
    username="bigquery_service",
    password="secure_password_123",
)

# After creation, service_account_id will be populated:
# print(f"Service Account: {connection.cloud_sql.service_account_id}")
```

Connects to Amazon Web Services data sources using IAM role-based authentication.
```python
class AwsProperties:
    """Properties for AWS connections."""
    # oneof authentication method (exactly one must be set)
    cross_account_role: AwsCrossAccountRole  # Deprecated. Authentication via a Google-owned AWS IAM user's access key
    access_role: AwsAccessRole               # Recommended. Authentication via a Google-owned service account


class AwsCrossAccountRole:
    """AWS cross-account role authentication (deprecated)."""
    iam_role_id: str  # User's AWS IAM Role that trusts the Google-owned AWS IAM user
    iam_user_id: str  # Output only. Google-owned AWS IAM User for the connection
    external_id: str  # Output only. Google-generated ID representing the connection's identity in AWS


class AwsAccessRole:
    """AWS access role authentication (recommended)."""
    iam_role_id: str  # User's AWS IAM Role that trusts the Google-owned identity
    identity: str     # Output only. Unique Google-owned and Google-generated identity for the connection
```

Usage Example:
```python
from google.cloud.bigquery_connection import (
    Connection,
    AwsProperties,
    AwsAccessRole,
)

connection = Connection()
connection.friendly_name = "AWS S3 Data Lake"
connection.description = "Connection to S3 data lake for analytics"

# Configure AWS connection (using the recommended access role method)
connection.aws = AwsProperties()
connection.aws.access_role = AwsAccessRole()
connection.aws.access_role.iam_role_id = "arn:aws:iam::123456789012:role/BigQueryAccessRole"

# After creation, identity will be populated:
# print(f"Google Identity: {connection.aws.access_role.identity}")
```

Connects to Microsoft Azure data sources using Azure Active Directory authentication.
```python
class AzureProperties:
    """Properties for Azure connections."""
    application: str         # Output only. Name of the Azure Active Directory Application
    client_id: str           # Output only. Client ID of the Azure AD Application
    object_id: str           # Output only. Object ID of the Azure AD Application
    customer_tenant_id: str  # ID of the customer's directory that hosts the data
    redirect_uri: str        # URL the user is redirected to after granting consent during connection setup
    federated_application_client_id: str  # Client ID of the user's Azure AD Application for a federated connection
    identity: str            # Output only. Unique Google identity for the connection
```

Usage Example:
```python
from google.cloud.bigquery_connection import Connection, AzureProperties

connection = Connection()
connection.friendly_name = "Azure Data Lake Gen2"
connection.description = "Connection to Azure Data Lake for analytics"

# Configure Azure connection
connection.azure = AzureProperties()
connection.azure.customer_tenant_id = "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
connection.azure.redirect_uri = "https://console.cloud.google.com/bigquery"
connection.azure.federated_application_client_id = "12345678-90ab-cdef-1234-567890abcdef"

# After creation, output-only fields will be populated:
# print(f"Application: {connection.azure.application}")
# print(f"Client ID: {connection.azure.client_id}")
# print(f"Google Identity: {connection.azure.identity}")
```

Connects to Google Cloud Spanner databases for analytical queries.
```python
class CloudSpannerProperties:
    """Properties for Cloud Spanner connections."""
    database: str                   # Cloud Spanner database in the form 'projects/{project}/instances/{instance}/databases/{database}'
    use_parallelism: bool           # Whether to use parallelism when reading from the Spanner database
    max_parallelism: int            # Maximum parallelism per query when executing on Spanner compute resources
    use_serverless_analytics: bool  # Whether to read from Spanner via the serverless analytics service
    use_data_boost: bool            # Whether to execute the request via Spanner independent compute resources
    database_role: str              # Optional. Cloud Spanner database role for fine-grained access control
```

Usage Example:
```python
from google.cloud.bigquery_connection import Connection, CloudSpannerProperties

connection = Connection()
connection.friendly_name = "Spanner OLTP Database"
connection.description = "Connection to Spanner for analytical queries"

# Configure Cloud Spanner connection
connection.cloud_spanner = CloudSpannerProperties()
connection.cloud_spanner.database = "projects/my-project/instances/my-instance/databases/my-database"
connection.cloud_spanner.use_parallelism = True
connection.cloud_spanner.max_parallelism = 4
connection.cloud_spanner.use_serverless_analytics = True
connection.cloud_spanner.use_data_boost = False
connection.cloud_spanner.database_role = "analytics_reader"
```

Connects to other Google Cloud resources with automatic service account management.
```python
class CloudResourceProperties:
    """Properties for Cloud Resource connections."""
    service_account_id: str  # Output only. The account ID of the service account created for the connection
```

Usage Example:
```python
from google.cloud.bigquery_connection import Connection, CloudResourceProperties

connection = Connection()
connection.friendly_name = "Cloud Storage Data"
connection.description = "Connection to Google Cloud Storage buckets"

# Configure Cloud Resource connection
connection.cloud_resource = CloudResourceProperties()

# After creation, service_account_id will be populated:
# print(f"Service Account: {connection.cloud_resource.service_account_id}")
```

Connects to Apache Spark clusters for distributed data processing.
```python
class SparkProperties:
    """Properties for Spark connections."""
    service_account_id: str  # Output only. The account ID of the service account created for the connection
    metastore_service_config: MetastoreServiceConfig       # Optional. Dataproc Metastore Service configuration
    spark_history_server_config: SparkHistoryServerConfig  # Optional. Spark History Server configuration


class MetastoreServiceConfig:
    """Configuration for a Dataproc Metastore Service."""
    metastore_service: str  # Optional. Resource name of an existing Dataproc Metastore service


class SparkHistoryServerConfig:
    """Configuration for a Spark History Server."""
    dataproc_cluster: str  # Optional. Resource name of an existing Dataproc Cluster to act as a Spark History Server
```

Usage Example:
```python
from google.cloud.bigquery_connection import (
    Connection,
    SparkProperties,
    MetastoreServiceConfig,
    SparkHistoryServerConfig,
)

connection = Connection()
connection.friendly_name = "Spark Analytics Cluster"
connection.description = "Connection to Spark cluster for big data processing"

# Configure Spark connection
connection.spark = SparkProperties()

# Optional: configure the metastore service
connection.spark.metastore_service_config = MetastoreServiceConfig()
connection.spark.metastore_service_config.metastore_service = (
    "projects/my-project/locations/us-central1/services/my-metastore"
)

# Optional: configure the history server
connection.spark.spark_history_server_config = SparkHistoryServerConfig()
connection.spark.spark_history_server_config.dataproc_cluster = (
    "projects/my-project/regions/us-central1/clusters/spark-history-cluster"
)

# After creation, service_account_id will be populated:
# print(f"Service Account: {connection.spark.service_account_id}")
```

Connects to Salesforce Data Cloud for CRM and customer data analytics.
```python
class SalesforceDataCloudProperties:
    """Properties for Salesforce Data Cloud connections."""
    instance_uri: str  # The URL of the user's Salesforce Data Cloud instance
    identity: str      # Output only. Unique Google service account identity for the connection
    tenant_id: str     # The ID of the user's Salesforce tenant
```

Usage Example:
```python
from google.cloud.bigquery_connection import Connection, SalesforceDataCloudProperties

connection = Connection()
connection.friendly_name = "Salesforce CRM Data"
connection.description = "Connection to Salesforce Data Cloud for customer analytics"

# Configure Salesforce Data Cloud connection
connection.salesforce_data_cloud = SalesforceDataCloudProperties()
connection.salesforce_data_cloud.instance_uri = "https://mycompany.my.salesforce-datacloud.com"
connection.salesforce_data_cloud.tenant_id = "00D123456789012345"

# After creation, identity will be populated:
# print(f"Google Identity: {connection.salesforce_data_cloud.identity}")
```

When creating a connection, you must choose exactly one connection type. The choice depends on your external data source:
```python
# Cloud SQL for relational databases (PostgreSQL, MySQL)
connection.cloud_sql = CloudSqlProperties()

# AWS for Amazon S3, Redshift, RDS, etc.
connection.aws = AwsProperties()

# Azure for Azure Data Lake, SQL Database, etc.
connection.azure = AzureProperties()

# Cloud Spanner for Google's globally distributed database
connection.cloud_spanner = CloudSpannerProperties()

# Cloud Resource for other Google Cloud services
connection.cloud_resource = CloudResourceProperties()

# Spark for distributed data processing
connection.spark = SparkProperties()

# Salesforce Data Cloud for CRM data
connection.salesforce_data_cloud = SalesforceDataCloudProperties()
```

Many connection types have output-only fields that are populated by the service after connection creation:
```python
# These fields are set by the service and cannot be modified
connection.name                # Resource name assigned by the service
connection.creation_time       # Timestamp when the connection was created
connection.last_modified_time  # Timestamp when the connection was last updated
connection.has_credential      # Whether credential information is configured

# Connection-type-specific output fields
connection.cloud_sql.service_account_id       # For Cloud SQL
connection.aws.access_role.identity           # For AWS access role
connection.azure.identity                     # For Azure
connection.cloud_resource.service_account_id  # For Cloud Resource
```

Credential information is handled differently by connection type: Cloud SQL takes an input-only username and password (`CloudSqlCredential`), AWS and Azure rely on identity federation (an IAM role or Azure AD application that trusts a Google-generated identity), and Cloud Resource and Spark connections receive a Google-managed service account with no user-supplied credential. The `has_credential` flag reports whether credential information is stored on the connection.
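The snippets above only build local objects; a connection is actually created with a createConnection call. As a plain-Python sketch (not the client library itself), the v1 REST payload uses camelCase field names, and the project, location, and values below are hypothetical:

```python
def build_create_connection_body(friendly_name: str, type_field: str, type_config: dict) -> dict:
    """Assemble the JSON body for POST /v1/projects/{p}/locations/{l}/connections."""
    return {"friendlyName": friendly_name, type_field: type_config}

# Cloud SQL carries an input-only credential inline in the request...
cloud_sql_body = build_create_connection_body(
    "PostgreSQL Analytics DB",
    "cloudSql",
    {
        "instanceId": "my-project:us-central1:analytics-db",
        "database": "analytics",
        "type": "POSTGRES",
        "credential": {"username": "bigquery_service", "password": "secure_password_123"},
    },
)

# ...while a Cloud Resource connection sends an empty object and receives a
# Google-managed service account in the response.
cloud_resource_body = build_create_connection_body("Cloud Storage Data", "cloudResource", {})

print(cloud_sql_body["cloudSql"]["type"])    # POSTGRES
print(cloud_resource_body["cloudResource"])  # {}
```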
Install with Tessl CLI:

```shell
npx tessl i tessl/pypi-google-cloud-bigquery-connection
```