Google Cloud BigQuery Connection API client library for managing external data source connections and credentials
The BigQuery Connection API supports multiple external data source connection types, each with its own configuration properties. The Connection object uses a protobuf oneof field, so exactly one connection type can be set per connection; assigning one type clears any other.

The base Connection object contains common metadata plus exactly one connection type configuration.
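The oneof rule can be illustrated with a small plain-Python check (a sketch only, not the library's implementation; the real Connection is a protobuf message whose oneof clears the sibling fields automatically when one is assigned):

```python
# Stand-in for Connection's oneof connection-type field names
CONNECTION_TYPE_FIELDS = (
    "cloud_sql", "aws", "azure", "cloud_spanner",
    "cloud_resource", "spark", "salesforce_data_cloud",
)

def active_connection_type(connection_fields: dict) -> str:
    """Return the single configured connection type, enforcing the oneof rule."""
    set_types = [f for f in CONNECTION_TYPE_FIELDS if connection_fields.get(f) is not None]
    if len(set_types) != 1:
        raise ValueError(f"exactly one connection type must be set, got {set_types}")
    return set_types[0]

print(active_connection_type({"cloud_sql": {"database": "analytics"}}))  # cloud_sql
```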
```python
class Connection:
    """Configuration parameters for an external data source connection."""
    name: str                # Output only. Resource name of the connection
    friendly_name: str       # User-provided display name for the connection
    description: str         # User-provided description
    creation_time: int       # Output only. Creation timestamp in milliseconds since epoch
    last_modified_time: int  # Output only. Last update timestamp in milliseconds since epoch
    has_credential: bool     # Output only. True if a credential is configured for this connection

    # oneof connection type (exactly one must be set)
    cloud_sql: CloudSqlProperties
    aws: AwsProperties
    azure: AzureProperties
    cloud_spanner: CloudSpannerProperties
    cloud_resource: CloudResourceProperties
    spark: SparkProperties
    salesforce_data_cloud: SalesforceDataCloudProperties
```

Usage Example:
```python
from google.cloud.bigquery_connection import Connection

# Create base connection
connection = Connection()
connection.friendly_name = "My External Database"
connection.description = "Connection to external data source"

# Set exactly one connection type (examples below)
```

Connects to Google Cloud SQL instances (PostgreSQL or MySQL) for querying relational database data.
```python
class CloudSqlProperties:
    """Properties for a Cloud SQL connection."""
    instance_id: str         # Cloud SQL instance ID in the form 'project:location:instance'
    database: str            # Database name
    type_: DatabaseType      # Type of the Cloud SQL database
    credential: CloudSqlCredential  # Input only. Database credentials
    service_account_id: str  # Output only. Service account ID for the connection

    class DatabaseType:
        """Database type enumeration."""
        DATABASE_TYPE_UNSPECIFIED = 0
        POSTGRES = 1
        MYSQL = 2


class CloudSqlCredential:
    """Credential for Cloud SQL connections."""
    username: str  # Database username
    password: str  # Database password
```

Usage Example:
```python
from google.cloud.bigquery_connection import (
    Connection,
    CloudSqlProperties,
    CloudSqlCredential,
)

connection = Connection()
connection.friendly_name = "PostgreSQL Analytics DB"
connection.description = "Production analytics database"

# Configure Cloud SQL connection
connection.cloud_sql = CloudSqlProperties()
connection.cloud_sql.instance_id = "my-project:us-central1:analytics-db"
connection.cloud_sql.database = "analytics"
connection.cloud_sql.type_ = CloudSqlProperties.DatabaseType.POSTGRES
connection.cloud_sql.credential = CloudSqlCredential(
    username="bigquery_service",
    password="secure_password_123",
)

# After creation, service_account_id will be populated:
# print(f"Service Account: {connection.cloud_sql.service_account_id}")
```

Connects to Amazon Web Services data sources using IAM role-based authentication.
```python
class AwsProperties:
    """Properties for AWS connections."""
    # oneof authentication method (exactly one must be set)
    cross_account_role: AwsCrossAccountRole  # Deprecated. Authentication via a Google-owned AWS IAM user's access key
    access_role: AwsAccessRole               # Recommended. Authentication via a Google-owned service account


class AwsCrossAccountRole:
    """AWS cross-account role authentication (deprecated)."""
    iam_role_id: str  # User's AWS IAM Role that trusts the Google-owned AWS IAM user
    iam_user_id: str  # Output only. Google-owned AWS IAM User for the connection
    external_id: str  # Output only. Google-generated ID representing the connection's identity in AWS


class AwsAccessRole:
    """AWS access role authentication (recommended)."""
    iam_role_id: str  # User's AWS IAM Role that trusts the Google-owned identity
    identity: str     # Output only. Unique Google-owned and Google-generated identity for the connection
```

Usage Example:
```python
from google.cloud.bigquery_connection import (
    Connection,
    AwsProperties,
    AwsAccessRole,
)

connection = Connection()
connection.friendly_name = "AWS S3 Data Lake"
connection.description = "Connection to S3 data lake for analytics"

# Configure AWS connection (using the recommended access role method)
connection.aws = AwsProperties()
connection.aws.access_role = AwsAccessRole()
connection.aws.access_role.iam_role_id = "arn:aws:iam::123456789012:role/BigQueryAccessRole"

# After creation, identity will be populated:
# print(f"Google Identity: {connection.aws.access_role.identity}")
```

Connects to Microsoft Azure data sources using Azure Active Directory authentication.
```python
class AzureProperties:
    """Properties for Azure connections."""
    application: str         # Output only. Name of the Azure Active Directory Application
    client_id: str           # Output only. Client ID of the Azure AD Application
    object_id: str           # Output only. Object ID of the Azure AD Application
    customer_tenant_id: str  # ID of the customer's directory that hosts the data
    redirect_uri: str        # URL the user is redirected to after granting consent during connection setup
    federated_application_client_id: str  # Client ID of the user's Azure AD Application for a federated connection
    identity: str            # Output only. Unique Google identity for the connection
```

Usage Example:
```python
from google.cloud.bigquery_connection import Connection, AzureProperties

connection = Connection()
connection.friendly_name = "Azure Data Lake Gen2"
connection.description = "Connection to Azure Data Lake for analytics"

# Configure Azure connection
connection.azure = AzureProperties()
connection.azure.customer_tenant_id = "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
connection.azure.redirect_uri = "https://console.cloud.google.com/bigquery"
connection.azure.federated_application_client_id = "12345678-90ab-cdef-1234-567890abcdef"

# After creation, output-only fields will be populated:
# print(f"Application: {connection.azure.application}")
# print(f"Client ID: {connection.azure.client_id}")
# print(f"Google Identity: {connection.azure.identity}")
```

Connects to Google Cloud Spanner databases for analytical queries.
```python
class CloudSpannerProperties:
    """Properties for Cloud Spanner connections."""
    database: str                   # Cloud Spanner database in the form 'projects/{project}/instances/{instance}/databases/{database}'
    use_parallelism: bool           # Whether to use parallelism when reading from the Spanner database
    max_parallelism: int            # Maximum parallelism per query when executing on Spanner compute resources
    use_serverless_analytics: bool  # Whether to read from Spanner via the serverless analytics service
    use_data_boost: bool            # Whether to execute the request via Spanner independent compute resources
    database_role: str              # Optional. Cloud Spanner database role for fine-grained access control
```

Usage Example:
```python
from google.cloud.bigquery_connection import Connection, CloudSpannerProperties

connection = Connection()
connection.friendly_name = "Spanner OLTP Database"
connection.description = "Connection to Spanner for analytical queries"

# Configure Cloud Spanner connection
connection.cloud_spanner = CloudSpannerProperties()
connection.cloud_spanner.database = "projects/my-project/instances/my-instance/databases/my-database"
connection.cloud_spanner.use_parallelism = True
connection.cloud_spanner.max_parallelism = 4
connection.cloud_spanner.use_serverless_analytics = True
connection.cloud_spanner.use_data_boost = False
connection.cloud_spanner.database_role = "analytics_reader"
```

Connects to other Google Cloud resources with automatic service account management.
```python
class CloudResourceProperties:
    """Properties for Cloud Resource connections."""
    service_account_id: str  # Output only. The account ID of the service account created for the connection
```

Usage Example:
```python
from google.cloud.bigquery_connection import Connection, CloudResourceProperties

connection = Connection()
connection.friendly_name = "Cloud Storage Data"
connection.description = "Connection to Google Cloud Storage buckets"

# Configure Cloud Resource connection
connection.cloud_resource = CloudResourceProperties()

# After creation, service_account_id will be populated:
# print(f"Service Account: {connection.cloud_resource.service_account_id}")
```

Connects to Apache Spark clusters for distributed data processing.
```python
class SparkProperties:
    """Properties for Spark connections."""
    service_account_id: str  # Output only. The account ID of the service account created for the connection
    metastore_service_config: MetastoreServiceConfig       # Optional. Dataproc Metastore Service configuration
    spark_history_server_config: SparkHistoryServerConfig  # Optional. Spark History Server configuration


class MetastoreServiceConfig:
    """Configuration for a Dataproc Metastore Service."""
    metastore_service: str  # Optional. Resource name of an existing Dataproc Metastore service


class SparkHistoryServerConfig:
    """Configuration for a Spark History Server."""
    dataproc_cluster: str  # Optional. Resource name of an existing Dataproc Cluster to act as a Spark History Server
```

Usage Example:
```python
from google.cloud.bigquery_connection import (
    Connection,
    SparkProperties,
    MetastoreServiceConfig,
    SparkHistoryServerConfig,
)

connection = Connection()
connection.friendly_name = "Spark Analytics Cluster"
connection.description = "Connection to Spark cluster for big data processing"

# Configure Spark connection
connection.spark = SparkProperties()

# Optional: configure the metastore service
connection.spark.metastore_service_config = MetastoreServiceConfig()
connection.spark.metastore_service_config.metastore_service = (
    "projects/my-project/locations/us-central1/services/my-metastore"
)

# Optional: configure the history server
connection.spark.spark_history_server_config = SparkHistoryServerConfig()
connection.spark.spark_history_server_config.dataproc_cluster = (
    "projects/my-project/regions/us-central1/clusters/spark-history-cluster"
)

# After creation, service_account_id will be populated:
# print(f"Service Account: {connection.spark.service_account_id}")
```

Connects to Salesforce Data Cloud for CRM and customer data analytics.
```python
class SalesforceDataCloudProperties:
    """Properties for Salesforce Data Cloud connections."""
    instance_uri: str  # The URL of the user's Salesforce Data Cloud instance
    identity: str      # Output only. Unique Google service account identity for the connection
    tenant_id: str     # The ID of the user's Salesforce tenant
```

Usage Example:
```python
from google.cloud.bigquery_connection import Connection, SalesforceDataCloudProperties

connection = Connection()
connection.friendly_name = "Salesforce CRM Data"
connection.description = "Connection to Salesforce Data Cloud for customer analytics"

# Configure Salesforce Data Cloud connection
connection.salesforce_data_cloud = SalesforceDataCloudProperties()
connection.salesforce_data_cloud.instance_uri = "https://mycompany.my.salesforce-datacloud.com"
connection.salesforce_data_cloud.tenant_id = "00D123456789012345"

# After creation, identity will be populated:
# print(f"Google Identity: {connection.salesforce_data_cloud.identity}")
```

When creating a connection, you must choose exactly one connection type. The choice depends on your external data source:
```python
# Cloud SQL for relational databases (PostgreSQL, MySQL)
connection.cloud_sql = CloudSqlProperties()

# AWS for Amazon S3, Redshift, RDS, etc.
connection.aws = AwsProperties()

# Azure for Azure Data Lake, SQL Database, etc.
connection.azure = AzureProperties()

# Cloud Spanner for Google's globally distributed database
connection.cloud_spanner = CloudSpannerProperties()

# Cloud Resource for other Google Cloud services
connection.cloud_resource = CloudResourceProperties()

# Spark for distributed data processing
connection.spark = SparkProperties()

# Salesforce Data Cloud for CRM data
connection.salesforce_data_cloud = SalesforceDataCloudProperties()
```

Many connection types have output-only fields that are populated by the service after connection creation:
```python
# These fields are set by the service and cannot be modified
connection.name                # Resource name assigned by the service
connection.creation_time       # Timestamp when the connection was created
connection.last_modified_time  # Timestamp when the connection was last updated
connection.has_credential      # Whether credential information is configured

# Connection-type-specific output fields
connection.cloud_sql.service_account_id       # For Cloud SQL
connection.aws.access_role.identity           # For AWS access role
connection.azure.identity                     # For Azure
connection.cloud_resource.service_account_id  # For Cloud Resource
```

Credential information is handled differently by connection type: Cloud SQL takes an input-only username and password (`CloudSqlCredential`), AWS and Azure rely on identity federation (an IAM role or Azure AD application that trusts a Google-generated identity), and Cloud Resource and Spark connections receive a Google-managed service account with no user-supplied credential. The `has_credential` flag reports whether credential information is stored on the connection.
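The snippets above only build local objects; a connection is actually created with a createConnection call. As a plain-Python sketch (not the client library itself), the v1 REST payload uses camelCase field names, and the project, location, and values below are hypothetical:

```python
def build_create_connection_body(friendly_name: str, type_field: str, type_config: dict) -> dict:
    """Assemble the JSON body for POST /v1/projects/{p}/locations/{l}/connections."""
    return {"friendlyName": friendly_name, type_field: type_config}

# Cloud SQL carries an input-only credential inline in the request...
cloud_sql_body = build_create_connection_body(
    "PostgreSQL Analytics DB",
    "cloudSql",
    {
        "instanceId": "my-project:us-central1:analytics-db",
        "database": "analytics",
        "type": "POSTGRES",
        "credential": {"username": "bigquery_service", "password": "secure_password_123"},
    },
)

# ...while a Cloud Resource connection sends an empty object and receives a
# Google-managed service account in the response.
cloud_resource_body = build_create_connection_body("Cloud Storage Data", "cloudResource", {})

print(cloud_sql_body["cloudSql"]["type"])    # POSTGRES
print(cloud_resource_body["cloudResource"])  # {}
```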
Install with Tessl CLI:

```shell
npx tessl i tessl/pypi-google-cloud-bigquery-connection
```