BigQuery Client

Google Cloud BigQuery client with native async query support.

Quick Start

import { BigqueryWarehouseClient } from '@lightdash/warehouses';
import { type CreateBigqueryCredentials } from '@lightdash/common';

const client = new BigqueryWarehouseClient({
  type: 'bigquery',
  project: 'my-project-id',
  dataset: 'my_dataset',
  location: 'US',
  keyfileContents: {
    type: 'service_account',
    project_id: 'my-project-id',
    private_key: '-----BEGIN PRIVATE KEY-----\n...',
    client_email: 'service@project.iam.gserviceaccount.com',
  },
  timeoutSeconds: 300,
  priority: 'interactive',
  maximumBytesBilled: 1000000000,
  retries: 3,
});

await client.test(); // verify connection and credentials

Key Features

  • Native Async Queries: Returns job ID for long-running queries (results stream immediately)
  • Query Labels: Track queries with custom labels (auto-sanitized to lowercase, max 60 chars)
  • Maximum Bytes Billed: Set cost limits to prevent runaway query costs
  • Priority: INTERACTIVE (fast, higher cost) or BATCH (slower, lower cost)
  • Partition Detection: getAllTables() includes partition column metadata
  • OAuth Support: Application Default Credentials for local development
  • Error Enhancement: Line number extraction from query syntax errors
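
For instance, the maximumBytesBilled setting from Quick Start causes BigQuery to reject any query that would scan past the limit before it runs; a small sketch of catching that, with a hypothetical table name:

try {
  await client.streamQuery('SELECT * FROM `my_dataset.huge_table`', () => {}, { tags: {} });
} catch (error) {
  // Over-limit queries surface as query errors, e.g. a bytes-billed limit message
  console.error(client.parseError(error).message);
}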

Common Patterns

Async Query Execution

// Note: BigQuery requires the callback parameter (streams results immediately)
// Returns: Promise<WarehouseExecuteAsyncQuery>
const result = await client.executeAsyncQuery(
  {
    sql: 'SELECT * FROM `my-dataset.large-table` WHERE date >= @date',
    tags: { source: 'app' },          // Required
    timezone: 'America/New_York',     // Optional
    queryParams: { date: '2023-01-01' }, // Named parameters
  },
  (rows, fields) => {
    // Callback invoked as results stream from BigQuery job
    console.log(`Received ${rows.length} rows`);
  }
);

// Result contains job metadata
console.log('Job ID:', result.queryId);           // string (BigQuery job ID)
console.log('Total rows:', result.totalRows);     // number
console.log('Duration:', result.durationMs);      // number
console.log('Location:', result.queryMetadata.jobLocation); // 'US', 'EU', etc.

Edge Cases:

  • BigQuery streams results immediately via callback (unlike other warehouses)
  • queryId contains the BigQuery job ID for tracking in GCP console
  • queryMetadata.jobLocation is required for retrieving job status later (see the sketch below)
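
To check on the job afterwards, you can go through the underlying @google-cloud/bigquery package directly; a minimal sketch, separate from @lightdash/warehouses and assuming your own project ID:

import { BigQuery } from '@google-cloud/bigquery';

const bq = new BigQuery({ projectId: 'my-project-id' });
const job = bq.job(result.queryId, { location: result.queryMetadata.jobLocation });
const [metadata] = await job.getMetadata();
console.log(metadata.status.state); // 'PENDING' | 'RUNNING' | 'DONE'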

Query with Parameters

await client.streamQuery(
  'SELECT * FROM `dataset.table` WHERE status = @status',
  (data) => { /* process rows */ },
  { queryParams: { status: 'active' }, tags: {} }
);

Get All Tables with Partitions

const tables = await client.getAllTables();
tables.forEach(t => {
  if (t.partitionColumn) {
    console.log(`${t.table}: partitioned by ${t.partitionColumn.field}`);
  }
});
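
Partition metadata is handy for pruning scanned bytes; a hedged sketch that filters on the partition column (the schema field name is an assumption about the catalog shape):

const partitioned = tables.find((t) => t.partitionColumn);
if (partitioned?.partitionColumn) {
  // `schema` is an assumed field name; adjust to your catalog's shape
  const sql = `SELECT COUNT(*) AS n
    FROM \`${partitioned.schema}.${partitioned.table}\`
    WHERE ${partitioned.partitionColumn.field} >= '2023-01-01'`;
  await client.streamQuery(sql, () => {}, { tags: {} });
}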

SQL Builder

import { BigquerySqlBuilder } from '@lightdash/warehouses';

const builder = new BigquerySqlBuilder();
builder.getFieldQuoteChar(); // '`'
builder.getFloatingType(); // 'FLOAT64'
builder.escapeString("O'Reilly"); // "O\'Reilly"

Percentile SQL: Uses APPROX_QUANTILES (approximate, fast for large datasets)

import { MetricType } from '@lightdash/common';

// Median (50th percentile)
const medianSql = builder.getMetricSql('revenue', {
  type: MetricType.MEDIAN,
  sql: 'revenue'
});
// Returns: "APPROX_QUANTILES(revenue, 100)[OFFSET(50)]"

// 95th percentile
const p95Sql = builder.getMetricSql('latency', {
  type: MetricType.PERCENTILE,
  sql: 'latency',
  percentile: 95
});
// Returns: "APPROX_QUANTILES(latency, 100)[OFFSET(95)]"

Note: APPROX_QUANTILES provides approximate results (faster but not exact). For exact percentiles on small datasets, use custom SQL with PERCENTILE_CONT.
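
For example, an exact median can be written with BigQuery's PERCENTILE_CONT, which is an analytic function and needs OVER(); the table and column names here are hypothetical:

// Exact median over the whole table (DISTINCT collapses the repeated window value)
const exactMedianSql = `
  SELECT DISTINCT PERCENTILE_CONT(revenue, 0.5) OVER () AS median_revenue
  FROM \`my-dataset.orders\`
`;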

Error Handling

The BigQuery client provides enhanced error messages with line number extraction:

try {
  await client.streamQuery(sql, callback, { tags: {} });
} catch (error) {
  const parsed = client.parseError(error);
  console.error(parsed.message); // Enhanced message with line numbers and guidance
}

Common Error Codes and Messages:

  • accessDenied: "Access denied. Check your BigQuery permissions and ensure the service account has appropriate roles (BigQuery Data Viewer, BigQuery Job User)." Action: verify IAM roles on the service account.
  • invalidQuery: "Query syntax error at line X: [error details]" Action: fix the syntax at the indicated line.
  • stopped: "Query was stopped or cancelled by user." Action: check whether the query was manually cancelled.
  • quotaExceeded: "BigQuery quota exceeded. Check your project quotas and billing." Action: review GCP quotas and enable billing.
  • rateLimitExceeded: "BigQuery rate limit exceeded. Reduce query frequency or request quota increase." Action: throttle queries or request a quota increase.
  • invalid: "Invalid query: [details]" Action: review the query structure.

Line Number Extraction: The parser extracts line numbers from errors like "[1:34] Expected end of input but got keyword SELECT" and formats them for clarity.
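
The exact parsing is internal to the client, but the [line:column] prefix is easy to pull out yourself; a minimal sketch:

const message = '[1:34] Expected end of input but got keyword SELECT';
const match = message.match(/\[(\d+):(\d+)\]/);
if (match) {
  const [, line, column] = match;
  console.log(`Syntax error at line ${line}, column ${column}`);
}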

Static Methods

The BigQuery client provides several static utility methods for working with BigQuery outside of a client instance.

Label Sanitization

/**
 * Sanitize labels to meet BigQuery requirements
 * - Converts to lowercase
 * - Truncates keys/values to 60 characters
 * - Removes invalid characters (only alphanumeric, underscore, hyphen allowed)
 * @param labels - Raw label key-value pairs
 * @returns Sanitized labels or undefined if input is undefined
 */
static sanitizeLabelsWithValues(
  labels?: Record<string, string>
): Record<string, string> | undefined;

// Usage
const labels = BigqueryWarehouseClient.sanitizeLabelsWithValues({
  'User-ID': '123',
  'UPPERCASE': 'value',
  'Very-Long-Label-Name-That-Exceeds-Maximum-Length-Of-60-Characters': 'val'
});
// Result: { 'user-id': '123', 'uppercase': 'value', 'very-long-label-name-that-exceeds-maximum-length-of-60-cha': 'val' }

Edge Cases:

  • Empty strings in keys/values are preserved
  • Consecutive hyphens/underscores are allowed
  • Leading/trailing invalid characters are removed
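
The rules above can be approximated in a few lines; a rough sketch for illustration, not the library's exact implementation:

const sanitizeOne = (value: string): string =>
  value.toLowerCase().replace(/[^a-z0-9_-]/g, '').slice(0, 60);

sanitizeOne('User-ID');   // 'user-id'
sanitizeOne('UPPERCASE'); // 'uppercase'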

OAuth Dataset Listing

/**
 * Get available datasets (schemas) using OAuth refresh token
 * Creates temporary BigQuery client using OAuth for authentication
 * Useful for UI dataset selection without service account
 * @param projectId - GCP project ID
 * @param refresh_token - OAuth refresh token from Google OAuth flow
 * @returns Promise resolving to array of dataset metadata
 */
static getDatabases(
  projectId: string,
  refresh_token: string
): Promise<Array<{ projectId: string; datasetId: string; location?: string }>>;

// Usage
const datasets = await BigqueryWarehouseClient.getDatabases(
  'my-project-id',
  'oauth-refresh-token'
);
datasets.forEach(ds => {
  console.log(`${ds.projectId}.${ds.datasetId} (${ds.location})`);
});

Requirements:

  • OAuth refresh token must have bigquery.datasets.get permission
  • Project must exist and user must have access

Field Type Extraction

/**
 * Extract field type information from BigQuery query response
 * Maps BigQuery types to Lightdash DimensionType
 * @param response - BigQuery QueryRowsResponse[2] with schema metadata
 * @returns Field metadata mapping column names to types
 */
static getFieldsFromResponse(
  response: QueryRowsResponse[2] | undefined
): Record<string, { type: DimensionType }>;

// Usage
const [rows, , metadata] = await dataset.query({ query: sql });
const fields = BigqueryWarehouseClient.getFieldsFromResponse(metadata);
console.log(fields);
// { user_id: { type: 'number' }, created_at: { type: 'timestamp' } }

Field Type Mapping

/**
 * Map BigQuery field type to Lightdash DimensionType
 * @param fieldType - BigQuery field type string (e.g., 'STRING', 'INT64', 'TIMESTAMP')
 * @returns DimensionType enum value
 */
static mapFieldType(fieldType: string): DimensionType;

// Usage
const dimType = BigqueryWarehouseClient.mapFieldType('INT64');
// Returns: DimensionType.NUMBER

Type Mappings:

  • STRING, BYTES → DimensionType.STRING
  • INT64, INTEGER, FLOAT, FLOAT64, NUMERIC, BIGNUMERIC → DimensionType.NUMBER
  • BOOLEAN, BOOL → DimensionType.BOOLEAN
  • DATE → DimensionType.DATE
  • TIMESTAMP, DATETIME, TIME → DimensionType.TIMESTAMP
  • Others → DimensionType.STRING
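
A minimal sketch of that mapping as a switch, not the library's exact implementation:

import { DimensionType } from '@lightdash/common';

const mapType = (fieldType: string): DimensionType => {
  switch (fieldType.toUpperCase()) {
    case 'INT64':
    case 'INTEGER':
    case 'FLOAT':
    case 'FLOAT64':
    case 'NUMERIC':
    case 'BIGNUMERIC':
      return DimensionType.NUMBER;
    case 'BOOLEAN':
    case 'BOOL':
      return DimensionType.BOOLEAN;
    case 'DATE':
      return DimensionType.DATE;
    case 'TIMESTAMP':
    case 'DATETIME':
    case 'TIME':
      return DimensionType.TIMESTAMP;
    default:
      return DimensionType.STRING; // STRING, BYTES, and anything unrecognized
  }
};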

Error Type Guard

/**
 * Type guard to check if error is a BigQuery error with error details
 * @param error - Error object to check
 * @returns True if error contains BigQuery error details (IErrorProto)
 */
static isBigqueryError(error: unknown): error is BigqueryError;

// Usage
try {
  await client.streamQuery(sql, callback, { tags: {} });
} catch (error) {
  if (BigqueryWarehouseClient.isBigqueryError(error)) {
    console.error('BigQuery error code:', error.errors[0].reason);
    console.error('Error message:', error.errors[0].message);
  }
}

BigQuery Error Structure:

interface BigqueryError {
  errors: Array<{
    reason: string;     // Error code (e.g., 'accessDenied', 'invalidQuery')
    message: string;    // Error message with details
    location?: string;  // Query location if syntax error
  }>;
}

Authentication

Service Account (most common):

keyfileContents: {
  type: 'service_account',
  project_id: 'my-project-id',
  private_key: '-----BEGIN PRIVATE KEY-----\n...',
  client_email: 'service@project.iam.gserviceaccount.com',
  // ... other service account fields
}

Application Default Credentials: Set GOOGLE_APPLICATION_CREDENTIALS environment variable or use gcloud CLI auth.

Field Type Enum

import { BigqueryFieldType } from '@lightdash/warehouses';

enum BigqueryFieldType {
  STRING, INTEGER, INT64, BYTES, FLOAT, FLOAT64,
  BOOLEAN, BOOL, TIMESTAMP, DATE, TIME, DATETIME,
  GEOGRAPHY, NUMERIC, BIGNUMERIC, RECORD, STRUCT, ARRAY,
}