Lightdash Warehouses

Unified interface for BigQuery, ClickHouse, Databricks, PostgreSQL, Redshift, Snowflake, and Trino data warehouses.

Package: @lightdash/warehouses (npm)
Install: npm install @lightdash/warehouses

Quick Start

TypeScript/ESM:

import { warehouseClientFromCredentials } from '@lightdash/warehouses';
import { WarehouseTypes } from '@lightdash/common';

const client = warehouseClientFromCredentials({
  type: WarehouseTypes.POSTGRES,
  host: 'localhost',
  port: 5432,
  user: 'myuser',
  password: 'mypassword',
  dbname: 'mydb',
  schema: 'public',
});

await client.test();

await client.streamQuery(
  'SELECT * FROM users',
  (data) => { /* process data.rows */ },
  { tags: {} }
);

CommonJS:

const { warehouseClientFromCredentials } = require('@lightdash/warehouses');
const { WarehouseTypes } = require('@lightdash/common');

const client = warehouseClientFromCredentials({
  type: WarehouseTypes.POSTGRES,
  host: 'localhost',
  port: 5432,
  user: 'myuser',
  password: 'mypassword',
  dbname: 'mydb',
  schema: 'public',
});

// CommonJS modules have no top-level await - wrap calls in an async function
(async () => {
  await client.test();

  await client.streamQuery(
    'SELECT * FROM users',
    (data) => { /* process data.rows */ },
    { tags: {} }
  );
})();

Core Patterns

Creating Clients

Via Factory (recommended):

// Import from @lightdash/warehouses (factory function)
// Import from @lightdash/common (types and enums)
import { warehouseClientFromCredentials } from '@lightdash/warehouses';
import { WarehouseTypes } from '@lightdash/common';

const client = warehouseClientFromCredentials(credentials);

CommonJS:

const { warehouseClientFromCredentials } = require('@lightdash/warehouses');
const { WarehouseTypes } = require('@lightdash/common');

const client = warehouseClientFromCredentials(credentials);

Direct Import:

// All client classes exported from @lightdash/warehouses
// Exception: ClickhouseWarehouseClient - NOT exported, use factory
import {
  BigqueryWarehouseClient,
  // ClickhouseWarehouseClient - NOT exported, use factory
  DatabricksWarehouseClient,
  PostgresWarehouseClient,
  RedshiftWarehouseClient,
  SnowflakeWarehouseClient,
  TrinoWarehouseClient,
} from '@lightdash/warehouses';

const client = new PostgresWarehouseClient(credentials);

CommonJS Direct Import:

const {
  BigqueryWarehouseClient,
  // ClickhouseWarehouseClient - NOT exported
  DatabricksWarehouseClient,
  PostgresWarehouseClient,
  RedshiftWarehouseClient,
  SnowflakeWarehouseClient,
  TrinoWarehouseClient,
} = require('@lightdash/warehouses');

const client = new PostgresWarehouseClient(credentials);

Query Execution

Streaming (memory-efficient):

import type { WarehouseResults } from '@lightdash/common';

// Callback receives batches of results - memory-efficient for large datasets
await client.streamQuery(
  'SELECT * FROM large_table',
  (data: WarehouseResults) => {
    // data.rows: Record<string, any>[]
    // data.fields: Record<string, { type: DimensionType }>
    data.rows.forEach(row => processRow(row));
  },
  {
    tags: {},              // Required: tags for query tracking (use {} if no tags)
    timezone: 'UTC',       // Optional: session timezone (e.g., 'UTC', 'America/New_York')
    values: [param1, param2],    // Optional: positional params ($1, $2) for PostgreSQL/Redshift
    queryParams: { key: val }    // Optional: named params for ClickHouse
  }
);

Parameters:

  • query (string): SQL query to execute
  • streamCallback (function): Called for each batch of results
    • data.rows: Array of result rows (objects with column key-value pairs)
    • data.fields: Column metadata with DimensionType for each field
  • options (object):
    • tags: Required - Query tracking labels (use {} if no tags needed)
    • timezone: Optional - Session timezone for timestamp interpretation
    • values: Optional - Positional parameters for parameterized queries (PostgreSQL/Redshift)
    • queryParams: Optional - Named parameters for parameterized queries (ClickHouse)

All Results (loads all into memory):

// Returns: Promise<WarehouseResults>
// Note: tags parameter is required (use {} if no tags needed)
const results = await client.runQuery(
  'SELECT COUNT(*) FROM users',
  { source: 'app' }, // Required tags parameter
  'UTC' // Optional timezone
);
console.log(results.rows[0]);

Edge Cases:

  • tags parameter is always required (use {} if no tags needed)
  • values array uses positional placeholders ($1, $2) in PostgreSQL/Redshift
  • ClickHouse uses named parameters via queryParams instead of values

Metadata Operations

// Get all tables in warehouse
// Returns: Promise<WarehouseTables>
const tables = await client.getAllTables(
  'public',  // schema?: string (optional - ignored by PostgreSQL, Databricks, Trino, ClickHouse)
  {}         // tags?: Record<string, string> (optional - ignored by most warehouses)
);

// Get catalog metadata for specific tables
// Returns: Promise<WarehouseCatalog> - nested structure: database -> schema -> table -> columns
const catalog = await client.getCatalog([
  { database: 'db1', schema: 'public', table: 'users' },
  { database: 'db1', schema: 'public', table: 'orders' }
]);

// Get fields for a specific table
// Returns: Promise<WarehouseCatalog>
const fields = await client.getFields(
  'users',    // tableName: string (required)
  'public',   // schema?: string (optional, uses credentials schema if omitted)
  'db1',      // database?: string (optional, uses current database if omitted)
  {}          // tags?: Record<string, string> (optional)
);

Parameters for getAllTables:

  • schema: Optional schema filter
    • BigQuery/Snowflake: Filters by schema/dataset
    • PostgreSQL/Redshift/Databricks/Trino/ClickHouse: Parameter ignored, returns all tables
  • tags: Optional query tracking labels (ignored by most warehouses)

Parameters for getCatalog:

  • config: Array of table identifiers
    • Each entry requires: database (project/catalog), schema, table
    • Returns metadata for specified tables only

Parameters for getFields:

  • tableName: Required - Name of the table
  • schema: Optional - Schema name (defaults to credentials schema)
  • database: Optional - Database/project/catalog name (defaults to current)
  • tags: Optional - Query tracking labels
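
The WarehouseCatalog returned by getCatalog and getFields is a plain nested object. A minimal sketch of walking it, assuming the database -> schema -> table -> columns nesting described above, with each column mapped to a DimensionType:

const catalog = await client.getCatalog([
  { database: 'db1', schema: 'public', table: 'users' },
]);

for (const [database, schemas] of Object.entries(catalog)) {
  for (const [schema, tables] of Object.entries(schemas)) {
    for (const [table, columns] of Object.entries(tables)) {
      // columns: Record<string, DimensionType>
      for (const [column, type] of Object.entries(columns)) {
        console.log(`${database}.${schema}.${table}.${column}: ${type}`);
      }
    }
  }
}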

SQL Generation

import { MetricType } from '@lightdash/common';

// All clients implement the WarehouseSqlBuilder interface
const quote = client.getFieldQuoteChar(); // Returns: '"' for Postgres, '`' for BigQuery
const safe = client.escapeString(userInput); // Returns: string (escaped for warehouse)
const sql = client.getMetricSql('revenue', {
  type: MetricType.SUM,
  sql: 'amount'
}); // Returns: string (warehouse-specific aggregation SQL)

// Other SQL builder methods:
client.getStringQuoteChar();         // Returns: "'" (single quote)
client.getEscapeStringQuoteChar();   // Returns: "\\" or "''" depending on warehouse
client.getFloatingType();            // Returns: "FLOAT64" (BigQuery), "DOUBLE" (Postgres), etc.
client.concatString('a', 'b', 'c');  // Returns: warehouse-specific concatenation SQL
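
These builder methods compose naturally. A minimal sketch of quoting identifiers and escaping a string literal for the active warehouse (the quoteIdent helper is illustrative, not part of the package):

// Illustrative helper - not part of @lightdash/warehouses
const quoteIdent = (name: string) => {
  const q = client.getFieldQuoteChar(); // '"' or '`' depending on warehouse
  return `${q}${name}${q}`;
};

const sq = client.getStringQuoteChar(); // "'"
const literal = `${sq}${client.escapeString("O'Brien")}${sq}`;

// e.g. on PostgreSQL: SELECT "name" FROM "users" WHERE "name" = 'O''Brien'
const sql = `SELECT ${quoteIdent('name')} FROM ${quoteIdent('users')} WHERE ${quoteIdent('name')} = ${literal}`;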

Architecture

The package is built with these design principles:

  • Unified Client Interface: WarehouseClient interface provides consistent API across all warehouse types
  • Streaming Architecture: All queries use streaming to handle large result sets efficiently (see the sketch after this list)
    • PostgreSQL/Redshift: Server-side cursors with 1000-row batches
    • BigQuery: Native streaming API with automatic pagination
    • Snowflake: Native result set streaming
    • Others: Memory-efficient result iteration
  • Connection Management:
    • PostgreSQL/Redshift: Connection pooling via pg library (5s connection timeout, 5min query timeout)
    • BigQuery: Managed via Google Cloud SDK
    • Snowflake: Native connection pooling
    • Others: Warehouse-specific connection management
  • Type Safety: Full TypeScript support with generic types and strict credential typing
  • Automatic Type Conversion:
    • PostgreSQL/Redshift: NUMERIC → number (via parseFloat), INT8 → BigInt
    • All warehouses: Native types mapped to DimensionType (STRING, NUMBER, TIMESTAMP, DATE, BOOLEAN)
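
As an example of the streaming design, an aggregation can consume arbitrarily large result sets in constant memory (a sketch using the streamQuery API shown earlier):

let rowCount = 0;
let revenue = 0;

await client.streamQuery(
  'SELECT amount FROM orders',
  (data) => {
    // Each callback invocation receives one batch (e.g. 1000 rows on PostgreSQL)
    rowCount += data.rows.length;
    for (const row of data.rows) revenue += Number(row.amount);
  },
  { tags: {} }
);

console.log(`${rowCount} rows, total revenue ${revenue}`);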

Warehouse Support Matrix

Warehouse    Factory Type  Async Queries  SSH Tunnel  Auth Methods
BigQuery     BIGQUERY      ✓ Native       —           Service Account, OAuth
ClickHouse   CLICKHOUSE    Fallback       —           Password
Databricks   DATABRICKS    Fallback       —           Token, OAuth M2M/U2M
PostgreSQL   POSTGRES      Fallback       ✓           Password, SSL
Redshift     REDSHIFT      Fallback       ✓           Password, SSL
Snowflake    SNOWFLAKE     ✓ Native       —           Password, Key Pair, OAuth, Browser
Trino        TRINO         Fallback       —           BasicAuth

Credentials Reference

All credential types are defined in @lightdash/common and include required fields like type, host/account, authentication details, and optional configuration.

Credentials Reference - Complete credential types with all required/optional fields

Quick credential patterns:

  • BigQuery: Requires project, location, keyfileContents (service account JSON)
  • ClickHouse: Requires host, port, user, password, schema (database name)
  • Databricks: Requires serverHostName, httpPath, database, and auth (token/OAuth)
  • PostgreSQL: Requires host, port, user, password, dbname, schema, sslmode
  • Redshift: Same as PostgreSQL plus optional ra3Node flag
  • Snowflake: Requires account, user, auth method, database, warehouse, schema
  • Trino: Requires host, port, user, password, dbname (catalog), schema, http_scheme
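
A hedged sketch of two of these credential shapes, using only the fields listed above (the full types, including optional fields, are defined in @lightdash/common):

import { WarehouseTypes } from '@lightdash/common';

// PostgreSQL - field names taken from the list above
const pgCredentials = {
  type: WarehouseTypes.POSTGRES,
  host: 'db.example.com',
  port: 5432,
  user: 'analytics',
  password: 'secret',
  dbname: 'analytics',
  schema: 'public',
  sslmode: 'require',
};

// BigQuery - keyfileContents is the parsed service account JSON
const bqCredentials = {
  type: WarehouseTypes.BIGQUERY,
  project: 'my-gcp-project',
  location: 'US',
  keyfileContents: JSON.parse(serviceAccountJson), // serviceAccountJson: hypothetical, e.g. read from a secret store
};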

Warehouse-Specific Guides

  • BigQuery - GCP BigQuery with native async queries
  • ClickHouse - ClickHouse with named parameters
  • Databricks - Databricks with Unity Catalog
  • PostgreSQL - PostgreSQL with SSH tunnel
  • Redshift - Amazon Redshift (PostgreSQL-compatible)
  • Snowflake - Snowflake with multiple auth methods
  • Trino - Trino distributed SQL engine

Additional Topics

Key Imports

TypeScript/ESM:

// ========================================
// From @lightdash/warehouses
// ========================================

// Factory functions
import {
  warehouseClientFromCredentials,  // Create client from credentials
  warehouseSqlBuilderFromType,     // Create SQL builder from adapter type
} from '@lightdash/warehouses';

// Warehouse client classes (all exported except ClickHouse)
// Note: ClickhouseWarehouseClient is NOT exported - use factory function
import {
  BigqueryWarehouseClient,
  DatabricksWarehouseClient,
  PostgresWarehouseClient,
  RedshiftWarehouseClient,
  SnowflakeWarehouseClient,
  TrinoWarehouseClient,
} from '@lightdash/warehouses';

// SQL Builder classes (all exported except ClickHouse)
// Note: ClickhouseSqlBuilder is NOT exported - use factory function
import {
  BigquerySqlBuilder,
  DatabricksSqlBuilder,
  PostgresSqlBuilder,
  RedshiftSqlBuilder,
  SnowflakeSqlBuilder,
  TrinoSqlBuilder,
} from '@lightdash/warehouses';

// Warehouse-specific type enums
// Note: ClickhouseTypes is NOT exported - ClickHouse support only via factory
import {
  BigqueryFieldType,  // BigQuery field type enum
  PostgresTypes,      // PostgreSQL data type enum
  SnowflakeTypes,     // Snowflake data type enum
  TrinoTypes,         // Trino data type enum
} from '@lightdash/warehouses';

// SSH Tunnel
import { SshTunnel } from '@lightdash/warehouses';

// Databricks OAuth helper functions
import {
  exchangeDatabricksOAuthCredentials,  // M2M OAuth token exchange
  refreshDatabricksOAuthToken,         // Refresh OAuth access token
} from '@lightdash/warehouses';

// ========================================
// From @lightdash/common
// ========================================

// Core warehouse interfaces and types
import {
  type WarehouseClient,              // Unified warehouse client interface
  type WarehouseCatalog,             // Catalog structure (database->schema->table->columns)
  type WarehouseTableSchema,         // Table schema (column->DimensionType)
  type WarehouseResults,             // Query results (rows + fields)
  type WarehouseTables,              // Array of table metadata
  type WarehouseExecuteAsyncQueryArgs,      // Async query arguments
  type WarehouseExecuteAsyncQuery,          // Async query metadata result
  type WarehouseGetAsyncQueryResultsArgs,   // Paginated results arguments
  type WarehouseGetAsyncQueryResults,       // Paginated results
  type WarehouseSqlBuilder,          // SQL builder interface
} from '@lightdash/common';

// Credential types (all defined in @lightdash/common)
import {
  type CreateWarehouseCredentials,   // Union of all credential types
  type CreateBigqueryCredentials,
  type CreateClickhouseCredentials,
  type CreateDatabricksCredentials,
  type CreatePostgresCredentials,
  type CreateRedshiftCredentials,
  type CreateSnowflakeCredentials,
  type CreateTrinoCredentials,
} from '@lightdash/common';

// Enums
import {
  WarehouseTypes,        // Warehouse type discriminator enum
  DimensionType,         // Normalized column types (STRING, NUMBER, etc.)
  MetricType,            // Metric aggregation types (SUM, AVG, etc.)
  SupportedDbtAdapter,   // DBT adapter type enum
  WeekDay,               // Week start day (0=Monday, 6=Sunday)
  PartitionType,         // Partition types (DATE, RANGE) for BigQuery
} from '@lightdash/common';

CommonJS:

// ========================================
// From @lightdash/warehouses
// ========================================

const {
  // Factory functions
  warehouseClientFromCredentials,
  warehouseSqlBuilderFromType,

  // Client classes (ClickhouseWarehouseClient NOT exported)
  BigqueryWarehouseClient,
  DatabricksWarehouseClient,
  PostgresWarehouseClient,
  RedshiftWarehouseClient,
  SnowflakeWarehouseClient,
  TrinoWarehouseClient,

  // SQL Builders (ClickhouseSqlBuilder NOT exported)
  BigquerySqlBuilder,
  DatabricksSqlBuilder,
  PostgresSqlBuilder,
  RedshiftSqlBuilder,
  SnowflakeSqlBuilder,
  TrinoSqlBuilder,

  // Type enums (ClickhouseTypes NOT exported)
  BigqueryFieldType,
  PostgresTypes,
  SnowflakeTypes,
  TrinoTypes,

  // SSH Tunnel
  SshTunnel,

  // Databricks OAuth
  exchangeDatabricksOAuthCredentials,
  refreshDatabricksOAuthToken,
} = require('@lightdash/warehouses');

// ========================================
// From @lightdash/common
// ========================================

const {
  // Enums
  WarehouseTypes,
  DimensionType,
  MetricType,
  SupportedDbtAdapter,
  WeekDay,
  PartitionType,
} = require('@lightdash/common');

// For types, use JSDoc in CommonJS:
/**
 * @typedef {import('@lightdash/common').WarehouseClient} WarehouseClient
 * @typedef {import('@lightdash/common').CreateWarehouseCredentials} CreateWarehouseCredentials
 * @typedef {import('@lightdash/common').WarehouseResults} WarehouseResults
 */

Common Patterns

Error Handling

All clients provide enhanced error parsing via parseError():

try {
  await client.test();
} catch (error) {
  // Use client.parseError() for warehouse-specific error enhancement
  const parsed = client.parseError(error);
  console.error(parsed.message); // Enhanced with line numbers, specific guidance

  // BigQuery-specific error types: accessDenied, invalidQuery, stopped, quotaExceeded
  // PostgreSQL-specific: connection errors, syntax errors
  // Snowflake-specific: authentication, network, session errors
}

Common Error Types:

  • Connection errors: Check host, port, credentials, network access
  • Authentication errors: Verify credentials, permissions, token expiry
  • Query syntax errors: Line numbers extracted when available
  • Quota/rate limit errors: Reduce frequency or request quota increase
  • Permission errors: Check warehouse-specific IAM/roles (e.g., BigQuery Data Viewer)
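
Because quota and connection errors are often transient, a common pattern is a small retry wrapper around parseError. A sketch (withRetry is illustrative, not part of the package):

// Illustrative helper - not part of @lightdash/warehouses
async function withRetry<T>(fn: () => Promise<T>, attempts = 3): Promise<T> {
  for (let i = 0; ; i += 1) {
    try {
      return await fn();
    } catch (error) {
      if (i >= attempts - 1) throw client.parseError(error);
      // Exponential backoff: 1s, 2s, 4s, ...
      await new Promise((resolve) => setTimeout(resolve, 1000 * 2 ** i));
    }
  }
}

const results = await withRetry(() =>
  client.runQuery('SELECT COUNT(*) FROM users', {})
);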

Parameterized Queries

PostgreSQL/Redshift (positional):

await client.streamQuery(
  'SELECT * FROM users WHERE id = $1 AND status = $2',
  callback,
  { values: [123, 'active'], tags: {} }
);

ClickHouse (named):

await client.streamQuery(
  'SELECT * FROM users WHERE id = {userId: UInt64} AND status = {status: String}',
  callback,
  { queryParams: { userId: 123, status: 'active' }, tags: {} }
);

Async Queries (BigQuery/Snowflake)

// Execute async query with native warehouse support
// Returns: Promise<WarehouseExecuteAsyncQuery>
// Note: Callback is REQUIRED for most warehouses, optional only for BigQuery/Snowflake
const result = await client.executeAsyncQuery(
  {
    sql: 'SELECT * FROM large_table',
    tags: {},               // Required
    timezone: 'UTC',        // Optional
    values: [],             // Optional: positional params
    queryParams: {}         // Optional: named params (ClickHouse)
  },
  (rows, fields) => {
    // Stream results as they arrive (BigQuery, Snowflake stream immediately)
    // For other warehouses, this is called after query completes
  }
);

// Result contains query metadata
console.log('Query ID:', result.queryId);           // string | null
console.log('Total rows:', result.totalRows);       // number
console.log('Duration:', result.durationMs, 'ms'); // number

// Later: retrieve paginated results (Snowflake only - native pagination support)
// Returns: Promise<WarehouseGetAsyncQueryResults<T>>
const page = await client.getAsyncQueryResults({
  sql: 'SELECT * FROM large_table',
  queryId: result.queryId,              // From executeAsyncQuery
  queryMetadata: result.queryMetadata,  // From executeAsyncQuery
  page: 1,                              // 1-indexed page number
  pageSize: 1000,                       // Rows per page
});
// page.rows, page.fields, page.pageCount, page.totalRows

Async Query Support by Warehouse:

  • BigQuery: Native async with job ID, results stream immediately via callback
  • Snowflake: Native async with query ID, supports pagination via getAsyncQueryResults
  • Others: Fallback implementation (streams after query completes, queryId is null)
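
For Snowflake, the pagination metadata makes it possible to walk every page. A sketch (buffers all rows for brevity - in practice, process each page as it arrives):

const firstPage = await client.getAsyncQueryResults({
  sql: 'SELECT * FROM large_table',
  queryId: result.queryId,
  queryMetadata: result.queryMetadata,
  page: 1,
  pageSize: 1000,
});

let rows = [...firstPage.rows];
for (let page = 2; page <= firstPage.pageCount; page += 1) {
  const next = await client.getAsyncQueryResults({
    sql: 'SELECT * FROM large_table',
    queryId: result.queryId,
    queryMetadata: result.queryMetadata,
    page,
    pageSize: 1000,
  });
  rows = rows.concat(next.rows);
}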

SSH Tunnel (PostgreSQL/Redshift)

import { PostgresWarehouseClient } from '@lightdash/warehouses';
import { WarehouseTypes } from '@lightdash/common';

const credentials = {
  type: WarehouseTypes.POSTGRES,
  host: 'internal-db.local',
  port: 5432,
  // ... other credentials
  useSshTunnel: true,
  sshTunnelHost: 'bastion.example.com',
  sshTunnelUser: 'sshuser',
  sshTunnelPrivateKey: '-----BEGIN RSA PRIVATE KEY-----\n...',
};

const client = new PostgresWarehouseClient(credentials);
// SSH tunnel automatically managed

Type System Reference

Core Enums

// Warehouse type discriminator
enum WarehouseTypes {
  BIGQUERY = 'bigquery',
  CLICKHOUSE = 'clickhouse',
  DATABRICKS = 'databricks',
  POSTGRES = 'postgres',
  REDSHIFT = 'redshift',
  SNOWFLAKE = 'snowflake',
  TRINO = 'trino',
}

// DBT adapter types
enum SupportedDbtAdapter {
  BIGQUERY = 'bigquery',
  CLICKHOUSE = 'clickhouse',
  DATABRICKS = 'databricks',
  POSTGRES = 'postgres',
  REDSHIFT = 'redshift',
  SNOWFLAKE = 'snowflake',
  TRINO = 'trino',
}

// Normalized dimension types
enum DimensionType {
  STRING = 'string',
  NUMBER = 'number',
  TIMESTAMP = 'timestamp',
  DATE = 'date',
  BOOLEAN = 'boolean',
}

// Metric aggregation types
enum MetricType {
  PERCENTILE = 'percentile',
  AVERAGE = 'average',
  COUNT = 'count',
  COUNT_DISTINCT = 'count_distinct',
  SUM = 'sum',
  MIN = 'min',
  MAX = 'max',
  MEDIAN = 'median',
}

// Week start configuration (0=Monday, 6=Sunday)
enum WeekDay {
  MONDAY = 0,
  TUESDAY = 1,
  WEDNESDAY = 2,
  THURSDAY = 3,
  FRIDAY = 4,
  SATURDAY = 5,
  SUNDAY = 6,
}

// Partition types (BigQuery)
enum PartitionType {
  DATE = 'DATE',
  RANGE = 'RANGE',
}
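
The normalized DimensionType is what makes cross-warehouse row handling uniform. A minimal sketch that formats values using the fields metadata returned with every query (formatValue is illustrative):

import { DimensionType } from '@lightdash/common';

const formatValue = (value: unknown, type: DimensionType): string => {
  switch (type) {
    case DimensionType.DATE:
    case DimensionType.TIMESTAMP:
      // Warehouses may return strings or Date objects for temporal types
      return new Date(value as string).toISOString();
    case DimensionType.NUMBER:
      return Number(value).toLocaleString();
    case DimensionType.BOOLEAN:
      return value ? 'true' : 'false';
    default:
      return String(value);
  }
};

await client.streamQuery('SELECT * FROM users', (data) => {
  for (const row of data.rows) {
    for (const [column, meta] of Object.entries(data.fields)) {
      console.log(column, formatValue(row[column], meta.type));
    }
  }
}, { tags: {} });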

Warehouse-Specific Type Enums

See Core Types for complete warehouse-specific type enums:

  • BigqueryFieldType: STRING, INT64, FLOAT64, BOOLEAN, TIMESTAMP, DATE, etc.
  • PostgresTypes: INTEGER, VARCHAR, NUMERIC, BOOLEAN, TIMESTAMP, etc.
  • SnowflakeTypes: NUMBER, STRING, TIMESTAMP, VARIANT, etc.
  • TrinoTypes: INTEGER, VARCHAR, DOUBLE, TIMESTAMP, etc.
  • ClickhouseTypes (not exported - internal)