# Database Connectors

Connector framework supporting SQL databases through SQLAlchemy and Druid through native APIs. Provides a unified interface for datasource registration, metadata discovery, query execution, and data exploration across diverse data sources.

## Capabilities

### Connector Registry

Central registry system for managing and accessing different datasource types through a unified interface.

```python { .api }
class ConnectorRegistry:
    """
    Central registry for datasource types and instances.
    Manages registration and discovery of available connector implementations.
    """

    @classmethod
    def register_sources(cls, datasource_config):
        """
        Register datasource classes in the registry.

        Parameters:
        - datasource_config: dict, mapping of datasource types to implementation classes

        Usage:
        Typically called during application initialization to register
        SQLAlchemy tables, Druid datasources, and custom connectors.
        """

    @classmethod
    def get_datasource(cls, datasource_type, datasource_id, session):
        """
        Get a datasource instance by type and identifier.

        Parameters:
        - datasource_type: str, type identifier ('table', 'druid', etc.)
        - datasource_id: int, datasource unique identifier
        - session: SQLAlchemy session for database operations

        Returns:
        Datasource instance (SqlaTable, DruidDatasource, or custom type)

        Raises:
        DatasourceNotFound if the datasource doesn't exist
        """

    @classmethod
    def get_all_datasources(cls, session):
        """
        Get all available datasource instances across all types.

        Parameters:
        - session: SQLAlchemy session for database operations

        Returns:
        List of all registered datasource instances
        """

    @classmethod
    def get_datasource_by_name(cls, session, datasource_type, datasource_name, schema, database_name):
        """
        Find a datasource by name and context.

        Parameters:
        - session: SQLAlchemy session
        - datasource_type: str, datasource type identifier
        - datasource_name: str, datasource name
        - schema: str, schema context (for SQL databases)
        - database_name: str, database context

        Returns:
        Matching datasource instance, or None if not found
        """

    @classmethod
    def query_datasources_by_permissions(cls, session, database, permissions):
        """
        Filter datasources by user permissions.

        Parameters:
        - session: SQLAlchemy session
        - database: Database instance for context
        - permissions: set, user permission strings

        Returns:
        List of accessible datasource instances
        """

    @classmethod
    def get_eager_datasource(cls, session, datasource_type, datasource_id):
        """
        Get a datasource with eagerly loaded relationships.

        Parameters:
        - session: SQLAlchemy session
        - datasource_type: str, datasource type
        - datasource_id: int, datasource identifier

        Returns:
        Datasource instance with loaded columns, metrics, and relationships
        """

    @classmethod
    def query_datasources_by_name(cls, session, database, datasource_name, schema):
        """
        Query datasources by name pattern.

        Parameters:
        - session: SQLAlchemy session
        - database: Database instance
        - datasource_name: str, name pattern for matching
        - schema: str, schema context

        Returns:
        Query object for further filtering and execution
        """
```
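
The registry pattern above can be sketched in isolation. The following is a minimal, hypothetical stand-in (not Superset's actual `ConnectorRegistry` implementation) showing how `register_sources` and `get_datasource` interact; `MiniConnectorRegistry`, `FakeTable`, and the `lookup` hook are illustrative names:

```python
class DatasourceNotFound(Exception):
    """Raised when no datasource matches the requested type and id."""

class MiniConnectorRegistry:
    # Maps a type identifier (e.g. 'table') to an implementation class.
    sources = {}

    @classmethod
    def register_sources(cls, datasource_config):
        # Merge new type -> class mappings into the registry.
        cls.sources.update(datasource_config)

    @classmethod
    def get_datasource(cls, datasource_type, datasource_id, session):
        # Look up the implementation class, then delegate instance lookup.
        source_class = cls.sources.get(datasource_type)
        if source_class is None:
            raise DatasourceNotFound(datasource_type)
        return source_class.lookup(datasource_id, session)

# Example datasource type with a class-level lookup hook (hypothetical).
class FakeTable:
    instances = {1: "users_table"}

    @classmethod
    def lookup(cls, datasource_id, session):
        return cls.instances.get(datasource_id)

MiniConnectorRegistry.register_sources({"table": FakeTable})
print(MiniConnectorRegistry.get_datasource("table", 1, session=None))
# -> users_table
```

The key design point is the class-level mapping: new connector types plug in at initialization time without the registry knowing anything about their internals.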

## SQLAlchemy Connector

SQL database connector supporting traditional relational databases through the SQLAlchemy ORM.

```python { .api }
class SqlaTable:
    """
    SQL table/view datasource with comprehensive metadata management.

    Key Fields:
    - table_name: str, name of database table or view
    - main_dttm_col: str, primary datetime column for time-series operations
    - default_endpoint: str, default API endpoint for data access
    - database_id: int, foreign key to Database connection
    - fetch_values_predicate: str, SQL WHERE clause for value fetching
    - is_sqllab_view: bool, indicates whether created from SQL Lab
    - template_params: str, JSON-encoded Jinja template parameters

    Relationships:
    - columns: TableColumn[], table column definitions (one-to-many)
    - metrics: SqlMetric[], calculated metric definitions (one-to-many)
    - database: Database, database connection instance (many-to-one)
    """

    def query(self, query_obj):
        """
        Execute a query against this datasource.
        Core method for data retrieval with filtering, grouping, and aggregation.

        Parameters:
        - query_obj: dict, query specification (columns, metrics, filters, time range)

        Returns:
        Query result object with data, metadata, and performance information
        """

    def get_sqla_table(self):
        """
        Get the SQLAlchemy Table object.

        Returns:
        SQLAlchemy Table instance with column definitions and constraints
        """

    def fetch_metadata(self):
        """
        Update column metadata from the database schema.
        Discovers column names, types, and constraints from the database catalog.

        Side Effects:
        Creates or updates TableColumn instances for all table columns
        """

    def values_for_column(self, column_name, limit=10000):
        """
        Get distinct column values for filter dropdowns.

        Parameters:
        - column_name: str, column to fetch values from
        - limit: int, maximum number of values to return

        Returns:
        List of distinct values from the specified column,
        limited and filtered according to datasource configuration
        """

class TableColumn:
    """
    Individual table column definition and metadata.

    Key Fields:
    - column_name: str, database column name
    - type: str, SQLAlchemy data type string
    - groupby: bool, available for grouping operations
    - filterable: bool, available for filtering operations
    - description: str, human-readable column description
    - is_dttm: bool, indicates datetime/timestamp column
    - python_date_format: str, Python strftime format for datetime parsing
    - database_expression: str, custom SQL expression for computed columns
    """

class SqlMetric:
    """
    Calculated metric definition using SQL expressions.

    Key Fields:
    - metric_name: str, display name for metric
    - metric_type: str, aggregation type identifier
    - expression: str, SQL expression for metric calculation
    - description: str, metric description and documentation
    - d3format: str, D3.js format string for number display
    """
```
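
To make the `SqlMetric` fields concrete, here is a hypothetical sketch of how a metric's `expression` and `metric_name` might be rendered into a SELECT list. The `render_select` helper and the `sales` table are illustrative, not Superset's actual SQL generation:

```python
# A metric record mirroring the SqlMetric fields documented above.
revenue = {
    "metric_name": "total_revenue",
    "metric_type": "sum",
    "expression": "SUM(price * quantity)",
    "description": "Gross revenue across all line items",
    "d3format": ",.2f",
}

def render_select(table_name, metrics, group_by):
    # Each metric's SQL expression is aliased by its metric_name,
    # and group-by columns appear both in SELECT and GROUP BY.
    select_items = group_by + [
        f"{m['expression']} AS {m['metric_name']}" for m in metrics
    ]
    return (
        f"SELECT {', '.join(select_items)} "
        f"FROM {table_name} GROUP BY {', '.join(group_by)}"
    )

print(render_select("sales", [revenue], ["region"]))
# -> SELECT region, SUM(price * quantity) AS total_revenue FROM sales GROUP BY region
```

The `d3format` field never reaches the database; it only controls how the resulting numbers are displayed in the UI.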

## Druid Connector

Native Druid connector for real-time analytics and OLAP operations.

```python { .api }
class DruidDatasource:
    """
    Druid datasource with native query interface.

    Key Fields:
    - datasource_name: str, name of Druid datasource
    - cluster_name: str, Druid cluster identifier
    - description: str, datasource description
    - default_endpoint: str, default API endpoint
    - fetch_values_from: str, method for fetching filter values

    Relationships:
    - columns: DruidColumn[], dimension definitions (one-to-many)
    - metrics: DruidMetric[], metric aggregation definitions (one-to-many)
    - cluster: DruidCluster, cluster connection configuration (many-to-one)
    """

class DruidCluster:
    """
    Druid cluster connection configuration and management.

    Key Fields:
    - cluster_name: str, unique cluster identifier
    - coordinator_host: str, Druid coordinator hostname
    - coordinator_port: int, coordinator HTTP port
    - coordinator_endpoint: str, coordinator API endpoint path
    - broker_host: str, Druid broker hostname
    - broker_port: int, broker HTTP port
    - broker_endpoint: str, broker query endpoint path
    - cache_timeout: int, default cache duration for queries
    - verbose_name: str, human-readable cluster name
    """

class DruidColumn:
    """
    Druid dimension column definition.

    Key Fields:
    - column_name: str, dimension name in Druid schema
    - type: str, Druid dimension type (string, long, float, etc.)
    - groupby: bool, available for grouping in queries
    - filterable: bool, available for filtering operations
    - description: str, dimension description
    """

class DruidMetric:
    """
    Druid aggregation metric definition.

    Key Fields:
    - metric_name: str, metric display name
    - metric_type: str, Druid aggregation type
    - json: str, complete Druid aggregation JSON specification
    - description: str, metric description and usage notes
    - d3format: str, number formatting specification
    """
```
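
The `json` field on `DruidMetric` holds a native Druid aggregation spec. The sketch below builds one for a hypothetical `clicks` column; `doubleSum` is one of Druid's native aggregator types, and the surrounding record is illustrative:

```python
import json

# A metric record mirroring the DruidMetric fields documented above,
# with the `json` field carrying the native Druid aggregation spec.
clicks_metric = {
    "metric_name": "total_clicks",
    "metric_type": "doubleSum",
    "json": json.dumps({
        "type": "doubleSum",      # Druid native aggregator type
        "name": "total_clicks",   # output name in query results
        "fieldName": "clicks",    # source column in the datasource
    }),
    "d3format": ",d",
}

# The stored string round-trips back into the spec Druid consumes.
spec = json.loads(clicks_metric["json"])
print(spec["type"], spec["fieldName"])
# -> doubleSum clicks
```

Storing the full spec as JSON lets the connector pass arbitrary aggregators (including ones added in newer Druid releases) straight through without modeling each type.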

## Database Engine Specifications

Engine-specific configurations for different database systems.

```python { .api }
class BaseEngineSpec:
    """
    Abstract base class for database engine specifications.

    Key Properties:
    - engine: str, SQLAlchemy engine identifier
    - time_grain_functions: dict, time grouping function mappings
    - time_groupby_inline: bool, inline time grouping support
    - limit_method: enum, result limiting strategy
    - time_secondary_columns: bool, secondary time column support
    - inner_joins: bool, inner join capability flag
    - allows_subquery: bool, subquery support indicator
    - force_column_alias_quotes: bool, quoted alias requirement
    - arraysize: int, default database cursor array size
    """

# Supported Database Engines
class PostgresEngineSpec(BaseEngineSpec):
    """PostgreSQL database engine specification."""

class MySQLEngineSpec(BaseEngineSpec):
    """MySQL/MariaDB database engine specification."""

class RedshiftEngineSpec(BaseEngineSpec):
    """Amazon Redshift data warehouse specification."""

class SnowflakeEngineSpec(BaseEngineSpec):
    """Snowflake cloud data warehouse specification."""

class BigQueryEngineSpec(BaseEngineSpec):
    """Google BigQuery specification."""

class PrestoEngineSpec(BaseEngineSpec):
    """Presto distributed SQL query engine specification."""

class HiveEngineSpec(BaseEngineSpec):
    """Apache Hive data warehouse specification."""

class DruidEngineSpec(BaseEngineSpec):
    """Apache Druid OLAP database specification."""

class ClickHouseEngineSpec(BaseEngineSpec):
    """ClickHouse columnar database specification."""

class OracleEngineSpec(BaseEngineSpec):
    """Oracle Database specification."""

class MssqlEngineSpec(BaseEngineSpec):
    """Microsoft SQL Server specification."""
```
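
Supporting a new engine amounts to subclassing `BaseEngineSpec` and overriding the class-level properties listed above. The sketch below uses a simplified stand-in base class (not the real implementation), and `DuckDBEngineSpec` is a hypothetical example engine:

```python
# Simplified stand-in for BaseEngineSpec; property names mirror the
# API above, but defaults here are illustrative.
class BaseEngineSpec:
    engine = None
    time_grain_functions = {}
    limit_method = "force_limit"
    allows_subquery = True
    force_column_alias_quotes = False

class DuckDBEngineSpec(BaseEngineSpec):
    """Hypothetical engine spec for a DATE_TRUNC-capable database."""
    engine = "duckdb"
    # Each grain key maps to a SQL template with a {col} placeholder.
    time_grain_functions = {
        None: "{col}",
        "P1D": "DATE_TRUNC('day', {col})",
        "P1M": "DATE_TRUNC('month', {col})",
        "P1Y": "DATE_TRUNC('year', {col})",
    }

# Engine specs are used as classes, not instances: callers read the
# class attributes to decide how to build SQL.
print(DuckDBEngineSpec.engine)
print(DuckDBEngineSpec.time_grain_functions["P1D"].format(col="ts"))
# -> duckdb
# -> DATE_TRUNC('day', ts)
```

Because specs are consumed as class attributes, a new engine needs no registration beyond defining the subclass where the framework can discover it.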

## Time Grain Functions

Standardized time grouping capabilities across database engines. Grain keys are ISO 8601 duration strings.

```python { .api }
# Built-in Time Grains
TIME_GRAINS = {
    'PT1S': 'Second',
    'PT1M': 'Minute',
    'PT5M': '5 Minutes',
    'PT10M': '10 Minutes',
    'PT15M': '15 Minutes',
    'PT0.5H': '30 Minutes',
    'PT1H': 'Hour',
    'P1D': 'Day',
    'P1W': 'Week',
    'P1M': 'Month',
    'P0.25Y': 'Quarter',
    'P1Y': 'Year',
}

# Week Variations
WEEK_GRAINS = {
    '1969-12-28T00:00:00Z/P1W': 'Week (Sunday Start)',
    '1969-12-29T00:00:00Z/P1W': 'Week (Monday Start)',
    'P1W/1970-01-03T00:00:00Z': 'Week (Saturday End)',
    'P1W/1970-01-04T00:00:00Z': 'Week (Sunday End)',
}
```
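
Each engine spec translates these grain keys into engine-specific SQL via its `time_grain_functions` mapping. The sketch below shows a plausible PostgreSQL-style mapping using `DATE_TRUNC`; the dictionary contents are illustrative, not copied from any engine spec:

```python
# Illustrative grain-key -> SQL-template mapping in the style of a
# PostgreSQL engine spec. {col} is the placeholder for the time column.
POSTGRES_TIME_GRAINS = {
    "PT1S": "DATE_TRUNC('second', {col})",
    "PT1M": "DATE_TRUNC('minute', {col})",
    "PT1H": "DATE_TRUNC('hour', {col})",
    "P1D": "DATE_TRUNC('day', {col})",
    "P1W": "DATE_TRUNC('week', {col})",
    "P1M": "DATE_TRUNC('month', {col})",
    "P0.25Y": "DATE_TRUNC('quarter', {col})",
    "P1Y": "DATE_TRUNC('year', {col})",
}

def time_grain_sql(grain, column):
    # Substitute the time column into the engine's grain template.
    return POSTGRES_TIME_GRAINS[grain].format(col=column)

print(time_grain_sql("P1D", "order_ts"))
# -> DATE_TRUNC('day', order_ts)
```

The indirection is what makes time grains portable: charts reference the ISO key, and each engine supplies whatever SQL implements it.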

## Query Limiting Methods

Different strategies for limiting query results based on database capabilities.

```python { .api }
class LimitMethod:
    """Query result limiting strategies."""

    FETCH_MANY = 'fetch_many'
    """Use cursor.fetchmany() for result limiting."""

    WRAP_SQL = 'wrap_sql'
    """Wrap query in LIMIT clause or equivalent."""

    FORCE_LIMIT = 'force_limit'
    """Always apply limit regardless of query structure."""
```
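
The three strategies differ in where the limit is enforced: `WRAP_SQL` and `FORCE_LIMIT` rewrite the SQL text, while `FETCH_MANY` caps rows on the client side through the DB-API cursor. The helpers below are an illustrative sketch, not Superset's actual rewriting code:

```python
def wrap_sql(sql, limit):
    # WRAP_SQL: wrap the original query in a subselect and apply
    # LIMIT outside, which is safe for arbitrary query structure.
    return f"SELECT * FROM ({sql}) AS inner_qry LIMIT {limit}"

def force_limit(sql, limit):
    # FORCE_LIMIT: append a LIMIT clause directly (assumes the
    # query does not already end in one).
    return f"{sql.rstrip(';')} LIMIT {limit}"

def fetch_many(cursor, limit):
    # FETCH_MANY: no SQL rewrite; rely on the DB-API cursor to cap
    # the number of rows returned.
    return cursor.fetchmany(limit)

print(wrap_sql("SELECT * FROM logs", 100))
# -> SELECT * FROM (SELECT * FROM logs) AS inner_qry LIMIT 100
```

`FETCH_MANY` is the fallback for engines whose dialect has no usable LIMIT syntax, at the cost of the server potentially computing the full result.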

## Usage Examples

### Registering a Custom Connector

```python
from superset.connectors.connector_registry import ConnectorRegistry

# Register a custom datasource type
ConnectorRegistry.register_sources({
    'custom_type': CustomDatasourceClass
})
```

### Accessing Datasources

```python
from superset.connectors.connector_registry import ConnectorRegistry

# Get a specific datasource
datasource = ConnectorRegistry.get_datasource(
    datasource_type='table',
    datasource_id=123,
    session=db.session
)

# Get all accessible datasources
all_sources = ConnectorRegistry.get_all_datasources(db.session)
```

### Engine-Specific Operations

```python
# Get the engine specification (a property on the Database model)
engine_spec = database.db_engine_spec

# Get available time grains
time_grains = engine_spec.time_grain_functions

# Check capabilities
supports_subqueries = engine_spec.allows_subquery
supports_joins = engine_spec.inner_joins
```

The connector framework provides a flexible and extensible architecture for integrating diverse data sources while maintaining a consistent interface for data exploration and visualization.