
# Database Connectors

Connector framework supporting SQL databases through SQLAlchemy and Druid through its native APIs. Provides a unified interface for datasource registration, metadata discovery, query execution, and data exploration across diverse data sources.


## Capabilities


### Connector Registry

Central registry system for managing and accessing different datasource types through a unified interface.


```python { .api }
class ConnectorRegistry:
    """
    Central registry for datasource types and instances.
    Manages registration and discovery of available connector implementations.
    """

    @classmethod
    def register_sources(cls, datasource_config):
        """
        Register datasource classes in the registry.

        Parameters:
        - datasource_config: dict, mapping of datasource types to implementation classes

        Usage:
        Typically called during application initialization to register
        SQLAlchemy tables, Druid datasources, and custom connectors.
        """

    @classmethod
    def get_datasource(cls, datasource_type, datasource_id, session):
        """
        Get a datasource instance by type and identifier.

        Parameters:
        - datasource_type: str, type identifier ('table', 'druid', etc.)
        - datasource_id: int, datasource unique identifier
        - session: SQLAlchemy session for database operations

        Returns:
        Datasource instance (SqlaTable, DruidDatasource, or custom type)

        Raises:
        DatasourceNotFound if the datasource doesn't exist
        """

    @classmethod
    def get_all_datasources(cls, session):
        """
        Get all available datasource instances across all types.

        Parameters:
        - session: SQLAlchemy session for database operations

        Returns:
        List of all registered datasource instances
        """

    @classmethod
    def get_datasource_by_name(cls, session, datasource_type, datasource_name, schema, database_name):
        """
        Find a datasource by name and context.

        Parameters:
        - session: SQLAlchemy session
        - datasource_type: str, datasource type identifier
        - datasource_name: str, datasource name
        - schema: str, schema context (for SQL databases)
        - database_name: str, database context

        Returns:
        Matching datasource instance, or None if not found
        """

    @classmethod
    def query_datasources_by_permissions(cls, session, database, permissions):
        """
        Filter datasources by user permissions.

        Parameters:
        - session: SQLAlchemy session
        - database: Database instance for context
        - permissions: set, user permission strings

        Returns:
        List of accessible datasource instances
        """

    @classmethod
    def get_eager_datasource(cls, session, datasource_type, datasource_id):
        """
        Get a datasource with eagerly loaded relationships.

        Parameters:
        - session: SQLAlchemy session
        - datasource_type: str, datasource type
        - datasource_id: int, datasource identifier

        Returns:
        Datasource instance with loaded columns, metrics, and relationships
        """

    @classmethod
    def query_datasources_by_name(cls, session, database, datasource_name, schema):
        """
        Query datasources by name pattern.

        Parameters:
        - session: SQLAlchemy session
        - database: Database instance
        - datasource_name: str, name pattern for matching
        - schema: str, schema context

        Returns:
        Query object for further filtering and execution
        """
```
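
Beyond ID-based access, datasources can also be resolved by name within a database and schema context. A minimal sketch of a name-based lookup (the `db.session` handle and all table, schema, and database names here are hypothetical):

```python
from superset.connectors.connector_registry import ConnectorRegistry

# Resolve a datasource by name within a schema/database context;
# returns None rather than raising when nothing matches.
datasource = ConnectorRegistry.get_datasource_by_name(
    session=db.session,              # assumed Flask-SQLAlchemy session
    datasource_type='table',
    datasource_name='user_events',   # hypothetical table name
    schema='analytics',              # hypothetical schema
    database_name='warehouse',       # hypothetical database
)
if datasource is None:
    raise ValueError('user_events not found in analytics.warehouse')
```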


## SQLAlchemy Connector

SQL database connector supporting traditional relational databases through the SQLAlchemy ORM.


```python { .api }
class SqlaTable:
    """
    SQL table/view datasource with comprehensive metadata management.

    Key Fields:
    - table_name: str, name of database table or view
    - main_dttm_col: str, primary datetime column for time-series operations
    - default_endpoint: str, default API endpoint for data access
    - database_id: int, foreign key to Database connection
    - fetch_values_predicate: str, SQL WHERE clause for value fetching
    - is_sqllab_view: bool, indicates if created from SQL Lab
    - template_params: str, JSON-encoded Jinja template parameters

    Relationships:
    - columns: TableColumn[], table column definitions (one-to-many)
    - metrics: SqlMetric[], calculated metric definitions (one-to-many)
    - database: Database, database connection instance (many-to-one)
    """

    def query(self, query_obj):
        """
        Execute queries against this datasource.
        Core method for data retrieval with filtering, grouping, and aggregation.

        Returns:
        Query result object with data, metadata, and performance information
        """

    def get_sqla_table(self):
        """
        Get the SQLAlchemy Table object.

        Returns:
        SQLAlchemy Table instance with column definitions and constraints
        """

    def fetch_metadata(self):
        """
        Update column metadata from the database schema.
        Discovers column names, types, and constraints from the database catalog.

        Side Effects:
        Creates or updates TableColumn instances for all table columns
        """

    def values_for_column(self, column_name, limit=10000):
        """
        Get distinct column values for filter dropdowns.

        Returns:
        List of distinct values from the specified column,
        limited and filtered according to datasource configuration
        """


class TableColumn:
    """
    Individual table column definition and metadata.

    Key Fields:
    - column_name: str, database column name
    - type: str, SQLAlchemy data type string
    - groupby: bool, available for grouping operations
    - filterable: bool, available for filtering operations
    - description: str, human-readable column description
    - is_dttm: bool, indicates datetime/timestamp column
    - python_date_format: str, Python strftime format for datetime parsing
    - database_expression: str, custom SQL expression for computed columns
    """


class SqlMetric:
    """
    Calculated metric definition using SQL expressions.

    Key Fields:
    - metric_name: str, display name for metric
    - metric_type: str, aggregation type identifier
    - expression: str, SQL expression for metric calculation
    - description: str, metric description and documentation
    - d3format: str, D3.js format string for number display
    """
```
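
To illustrate how these pieces fit together, here is a hedged sketch of declaring a table datasource with one time column and one metric. The import path follows the `superset.connectors.sqla.models` module, and all table and column names are invented for the example:

```python
from superset.connectors.sqla.models import SqlaTable, TableColumn, SqlMetric

# Declare a table datasource pointed at an existing Database row.
table = SqlaTable(
    table_name='user_events',    # hypothetical table
    main_dttm_col='event_time',  # drives time-series slicing
    database_id=1,               # assumed existing Database id
)

# Columns are normally discovered from the catalog via
# table.fetch_metadata(), but can also be declared explicitly:
table.columns.append(TableColumn(
    column_name='event_time',
    type='TIMESTAMP',
    is_dttm=True,
))

# A metric is a named SQL aggregation expression.
table.metrics.append(SqlMetric(
    metric_name='event_count',
    metric_type='count',
    expression='COUNT(*)',
    d3format=',d',
))
```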


## Druid Connector


Native Druid connector for real-time analytics and OLAP operations.


```python { .api }
class DruidDatasource:
    """
    Druid datasource with native query interface.

    Key Fields:
    - datasource_name: str, name of Druid datasource
    - cluster_name: str, Druid cluster identifier
    - description: str, datasource description
    - default_endpoint: str, default API endpoint
    - fetch_values_from: str, method for fetching filter values

    Relationships:
    - columns: DruidColumn[], dimension definitions (one-to-many)
    - metrics: DruidMetric[], metric aggregation definitions (one-to-many)
    - cluster: DruidCluster, cluster connection configuration (many-to-one)
    """


class DruidCluster:
    """
    Druid cluster connection configuration and management.

    Key Fields:
    - cluster_name: str, unique cluster identifier
    - coordinator_host: str, Druid coordinator hostname
    - coordinator_port: int, coordinator HTTP port
    - coordinator_endpoint: str, coordinator API endpoint path
    - broker_host: str, Druid broker hostname
    - broker_port: int, broker HTTP port
    - broker_endpoint: str, broker query endpoint path
    - cache_timeout: int, default cache duration for queries
    - verbose_name: str, human-readable cluster name
    """


class DruidColumn:
    """
    Druid dimension column definition.

    Key Fields:
    - column_name: str, dimension name in Druid schema
    - type: str, Druid dimension type (string, long, float, etc.)
    - groupby: bool, available for grouping in queries
    - filterable: bool, available for filtering operations
    - description: str, dimension description
    """


class DruidMetric:
    """
    Druid aggregation metric definition.

    Key Fields:
    - metric_name: str, metric display name
    - metric_type: str, Druid aggregation type
    - json: str, complete Druid aggregation JSON specification
    - description: str, metric description and usage notes
    - d3format: str, number formatting specification
    """
```
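
A sketch of wiring a datasource to a cluster, assuming the `superset.connectors.druid.models` import path; hostnames, ports, and endpoint paths are placeholders that vary by deployment:

```python
from superset.connectors.druid.models import DruidCluster, DruidDatasource

# Cluster connection details (all values are placeholders).
cluster = DruidCluster(
    cluster_name='analytics-druid',
    broker_host='druid-broker.internal',
    broker_port=8082,
    broker_endpoint='druid/v2',
    coordinator_host='druid-coordinator.internal',
    coordinator_port=8081,
)

# Attach a datasource to the cluster by name.
datasource = DruidDatasource(
    datasource_name='events',
    cluster_name=cluster.cluster_name,
)
```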


## Database Engine Specifications


Engine-specific configurations for different database systems.


```python { .api }
class BaseEngineSpec:
    """
    Abstract base class for database engine specifications.

    Key Properties:
    - engine: str, SQLAlchemy engine identifier
    - time_grain_functions: dict, time grouping function mappings
    - time_groupby_inline: bool, inline time grouping support
    - limit_method: enum, result limiting strategy
    - time_secondary_columns: bool, secondary time column support
    - inner_joins: bool, inner join capability flag
    - allows_subquery: bool, subquery support indicator
    - force_column_alias_quotes: bool, quoted alias requirement
    - arraysize: int, default database cursor array size
    """


# Supported Database Engines

class PostgresEngineSpec(BaseEngineSpec):
    """PostgreSQL database engine specification."""

class MySQLEngineSpec(BaseEngineSpec):
    """MySQL/MariaDB database engine specification."""

class RedshiftEngineSpec(BaseEngineSpec):
    """Amazon Redshift data warehouse specification."""

class SnowflakeEngineSpec(BaseEngineSpec):
    """Snowflake cloud data warehouse specification."""

class BigQueryEngineSpec(BaseEngineSpec):
    """Google BigQuery specification."""

class PrestoEngineSpec(BaseEngineSpec):
    """Presto distributed SQL query engine specification."""

class HiveEngineSpec(BaseEngineSpec):
    """Apache Hive data warehouse specification."""

class DruidEngineSpec(BaseEngineSpec):
    """Apache Druid OLAP database specification."""

class ClickHouseEngineSpec(BaseEngineSpec):
    """ClickHouse columnar database specification."""

class OracleEngineSpec(BaseEngineSpec):
    """Oracle Database specification."""

class MssqlEngineSpec(BaseEngineSpec):
    """Microsoft SQL Server specification."""
```
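
Supporting a new database typically means subclassing `BaseEngineSpec` and overriding the class-level properties above. A minimal, hypothetical sketch (the `exampledb` dialect and its time-grain SQL are invented for illustration, following the `{col}` templating pattern):

```python
class ExampleDbEngineSpec(BaseEngineSpec):
    """Hypothetical spec for an 'exampledb' SQLAlchemy dialect."""

    engine = 'exampledb'                 # must match the SQLAlchemy dialect name
    limit_method = LimitMethod.WRAP_SQL  # wrap queries to enforce row limits
    allows_subquery = True
    inner_joins = True

    # Map ISO 8601 grain identifiers to engine-specific bucketing SQL.
    time_grain_functions = {
        None: '{col}',
        'PT1H': "DATE_TRUNC('hour', {col})",
        'P1D': "DATE_TRUNC('day', {col})",
        'P1M': "DATE_TRUNC('month', {col})",
    }
```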


## Time Grain Functions


Standardized time grouping capabilities across database engines.


```python { .api }
# Built-in Time Grains
TIME_GRAINS = {
    'PT1S': 'Second',
    'PT1M': 'Minute',
    'PT5M': '5 Minutes',
    'PT10M': '10 Minutes',
    'PT15M': '15 Minutes',
    'PT0.5H': '30 Minutes',
    'PT1H': 'Hour',
    'P1D': 'Day',
    'P1W': 'Week',
    'P1M': 'Month',
    'P0.25Y': 'Quarter',
    'P1Y': 'Year',
}

# Week Variations
WEEK_GRAINS = {
    '1969-12-28T00:00:00Z/P1W': 'Week (Sunday Start)',
    '1969-12-29T00:00:00Z/P1W': 'Week (Monday Start)',
    'P1W/1970-01-03T00:00:00Z': 'Week (Saturday End)',
    'P1W/1970-01-04T00:00:00Z': 'Week (Sunday End)',
}
```
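
The grain identifiers are ISO 8601 durations; each engine spec translates them into its own SQL. A small sketch of how such a mapping could be applied (the PostgreSQL-style templates and helper below are illustrative, not part of the documented API):

```python
# Hypothetical PostgreSQL-style grain templates.
PG_TIME_GRAIN_SQL = {
    'PT1H': "DATE_TRUNC('hour', {col})",
    'P1D': "DATE_TRUNC('day', {col})",
    'P1W': "DATE_TRUNC('week', {col})",
    'P1M': "DATE_TRUNC('month', {col})",
}

def time_grain_expr(grain, col):
    """Render the SQL expression that buckets `col` at `grain`."""
    return PG_TIME_GRAIN_SQL[grain].format(col=col)

# time_grain_expr('P1D', 'event_time') -> "DATE_TRUNC('day', event_time)"
```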


## Query Limiting Methods


Different strategies for limiting query results based on database capabilities.


```python { .api }
class LimitMethod:
    """Query result limiting strategies."""

    FETCH_MANY = 'fetch_many'
    """Use cursor.fetchmany() for result limiting."""

    WRAP_SQL = 'wrap_sql'
    """Wrap the query in a LIMIT clause or equivalent."""

    FORCE_LIMIT = 'force_limit'
    """Always apply a limit regardless of query structure."""
```
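
To make the strategies concrete, a simplified sketch of how the two SQL-level methods rewrite a statement (real implementations parse the SQL rather than doing string surgery; `FETCH_MANY` limits at the cursor, so the SQL is untouched):

```python
def apply_limit(sql, limit, method):
    """Illustrative only: rewrite `sql` according to a limiting strategy."""
    if method == LimitMethod.WRAP_SQL:
        # Wrap the original statement so its own semantics stay intact.
        return 'SELECT * FROM ({sql}) AS inline_view LIMIT {limit}'.format(
            sql=sql, limit=limit)
    if method == LimitMethod.FORCE_LIMIT:
        # Append (or force) a trailing LIMIT on the statement itself.
        return '{sql} LIMIT {limit}'.format(sql=sql.rstrip(' ;'), limit=limit)
    # FETCH_MANY: limiting happens via cursor.fetchmany(), not in SQL.
    return sql

# apply_limit('SELECT * FROM logs', 100, LimitMethod.WRAP_SQL)
# -> 'SELECT * FROM (SELECT * FROM logs) AS inline_view LIMIT 100'
```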


## Usage Examples


### Registering Custom Connector


```python
from superset.connectors.connector_registry import ConnectorRegistry

# Register a custom datasource type
ConnectorRegistry.register_sources({
    'custom_type': CustomDatasourceClass
})
```


### Accessing Datasources


```python
from superset.connectors.connector_registry import ConnectorRegistry

# Get a specific datasource
datasource = ConnectorRegistry.get_datasource(
    datasource_type='table',
    datasource_id=123,
    session=db.session
)

# Get all accessible datasources
all_sources = ConnectorRegistry.get_all_datasources(db.session)
```


### Engine-Specific Operations


```python
# Get the engine specification for a database connection
engine_spec = database.db_engine_spec

# Get available time grains
time_grains = engine_spec.time_grain_functions

# Check capabilities
supports_subqueries = engine_spec.allows_subquery
supports_joins = engine_spec.inner_joins
```


The connector framework provides a flexible and extensible architecture for integrating diverse data sources while maintaining a consistent interface for data exploration and visualization.