0
# SQL Lab
1
2
Interactive SQL editor with syntax highlighting, auto-completion, query history, and asynchronous execution capabilities. Provides comprehensive SQL development environment with result management, saved queries, and collaborative features for data exploration and analysis.
3
4
## Capabilities
5
6
### Query Execution
7
8
Core SQL execution engine with synchronous and asynchronous processing capabilities.
9
10
```python { .api }
11
def get_sql_results(ctask, query_id, rendered_query, return_results=True, store_results=False, user_name=None, start_time=None):
12
"""
13
Celery task for asynchronous SQL execution.
14
Main entry point for background query processing with comprehensive
15
error handling, progress tracking, and result storage.
16
17
Parameters:
18
- ctask: Celery task instance for progress updates
19
- query_id: str, unique identifier for query tracking
20
- rendered_query: str, final SQL text after template processing
21
- return_results: bool, whether to return data in response
22
- store_results: bool, whether to persist results for large datasets
23
- user_name: str, optional username for audit logging
24
- start_time: datetime, query initiation timestamp
25
26
Returns:
27
dict with query results, metadata, and execution statistics
28
"""
29
30
def execute_sql(ctask, query_id, rendered_query, return_results, store_results, user_name, session, start_time):
31
"""
32
Core SQL execution logic with database interaction.
33
Handles query parsing, security validation, execution,
34
and result processing with comprehensive error handling.
35
36
Parameters:
37
- ctask: Celery task instance
38
- query_id: str, query identifier
39
- rendered_query: str, executable SQL statement
40
- return_results: bool, include data in response
41
- store_results: bool, persist results to backend storage
42
- user_name: str, executing user identification
43
- session: SQLAlchemy session for database operations
44
- start_time: datetime, execution start timestamp
45
46
Returns:
47
dict with execution results, query metadata, and performance metrics
48
"""
49
50
def has_table_query(sql):
51
"""
52
Check if query creates or modifies tables.
53
Analyzes SQL for DDL operations requiring special permissions.
54
55
Parameters:
56
- sql: str, SQL query text to analyze
57
58
Returns:
59
bool, True if query contains table creation/modification operations
60
"""
61
62
def get_query(query_id, session, retry_count=5):
63
"""
64
Retrieve query object with retry logic.
65
Handles concurrent access and temporary database unavailability.
66
67
Parameters:
68
- query_id: str, query identifier to retrieve
69
- session: SQLAlchemy session for database access
70
- retry_count: int, maximum retry attempts for retrieval
71
72
Returns:
73
Query object or None if not found after retries
74
"""
75
76
def session_scope(nullpool):
77
"""
78
Context manager for database session management.
79
Ensures proper session cleanup and transaction handling.
80
81
Parameters:
82
- nullpool: bool, whether to use null connection pooling
83
84
Yields:
85
SQLAlchemy session configured for query operations
86
"""
87
```
88
89
### SQL Lab Views
90
91
Web interface endpoints for SQL Lab functionality and user interaction.
92
93
```python { .api }
94
class SqlLabView:
95
"""
96
SQL Lab web interface controller.
97
Provides REST API endpoints for SQL development environment
98
including query execution, result management, and metadata browsing.
99
"""
100
101
def index(self):
102
"""
103
SQL Lab main interface.
104
105
Returns:
106
HTML interface for SQL Lab with editor, results, and navigation
107
"""
108
109
def format_sql(self):
110
"""
111
Format SQL query text.
112
Applies standard formatting rules for improved readability.
113
114
HTTP Method: POST
115
116
Request Body:
117
- sql: str, SQL text to format
118
119
Returns:
120
JSON response with formatted SQL text
121
"""
122
123
def runsql(self):
124
"""
125
Execute SQL query synchronously.
126
Immediate execution for small queries with direct result return.
127
128
HTTP Method: POST
129
130
Request Body:
131
- database_id: int, target database identifier
132
- sql: str, SQL query text to execute
133
- schema: str, optional schema context
134
- tmp_table_name: str, optional temporary table name
135
- select_as_cta: bool, CREATE TABLE AS SELECT flag
136
- client_id: str, client session identifier
137
138
Returns:
139
JSON response with query results and execution metadata
140
"""
141
142
def schemas(self):
143
"""
144
List database schemas.
145
146
HTTP Method: GET
147
148
Query Parameters:
149
- database_id: int, database identifier
150
151
Returns:
152
JSON array of available schema names
153
"""
154
155
def tables(self, database_id, schema):
156
"""
157
List schema tables.
158
159
Parameters:
160
- database_id: int, database identifier
161
- schema: str, schema name to explore
162
163
Returns:
164
JSON array of table names in specified schema
165
"""
166
167
def table(self, database_id, schema, table):
168
"""
169
Get table metadata and column information.
170
171
Parameters:
172
- database_id: int, database identifier
173
- schema: str, schema name
174
- table: str, table name
175
176
Returns:
177
JSON object with table metadata, columns, and constraints
178
"""
179
180
def select_star(self, database_id, schema, table):
181
"""
182
Generate SELECT * query template.
183
184
Parameters:
185
- database_id: int, database identifier
186
- schema: str, schema name
187
- table: str, table name
188
189
HTTP Method: POST
190
191
Returns:
192
JSON response with generated SELECT query
193
"""
194
195
def queries(self, last_updated_ms):
196
"""
197
Get recent queries for current user.
198
199
Parameters:
200
- last_updated_ms: int, timestamp filter for incremental updates
201
202
Returns:
203
JSON array of recent query objects with status and metadata
204
"""
205
206
def sql_json(self):
207
"""
208
Execute SQL and return JSON results.
209
Asynchronous execution endpoint for complex queries.
210
211
HTTP Method: POST
212
213
Request Body:
214
- database_id: int, target database
215
- sql: str, SQL query text
216
- async: bool, asynchronous execution flag
217
- run_async: bool, force async execution
218
219
Returns:
220
JSON response with query_id for async tracking or immediate results
221
"""
222
223
def csv(self, query_id):
224
"""
225
Export query results as CSV.
226
227
Parameters:
228
- query_id: str, query identifier for result export
229
230
Returns:
231
CSV file download response with query results
232
"""
233
234
def results(self, query_id):
235
"""
236
Get asynchronous query results.
237
238
Parameters:
239
- query_id: str, query identifier to retrieve results for
240
241
Returns:
242
JSON response with query results, status, and metadata
243
"""
244
245
def stop_query(self):
246
"""
247
Stop running query execution.
248
249
HTTP Method: POST
250
251
Request Body:
252
- client_id: str, client session identifier
253
254
Returns:
255
JSON confirmation of query cancellation
256
"""
257
258
def copy_query(self, query_id):
259
"""
260
Copy existing query to new SQL Lab tab.
261
262
Parameters:
263
- query_id: str, source query identifier
264
265
HTTP Method: POST
266
267
Returns:
268
JSON response with copied query details
269
"""
270
```
271
272
### Query Status Management
273
274
Query lifecycle state tracking and status monitoring system.
275
276
```python { .api }
277
class QueryStatus:
278
"""
279
Query execution status enumeration.
280
Defines all possible states during query lifecycle.
281
"""
282
283
STOPPED = 'stopped'
284
"""Query execution was stopped by user or system."""
285
286
FAILED = 'failed'
287
"""Query execution failed due to error."""
288
289
PENDING = 'pending'
290
"""Query is queued for execution."""
291
292
RUNNING = 'running'
293
"""Query is currently executing."""
294
295
SCHEDULED = 'scheduled'
296
"""Query is scheduled for future execution."""
297
298
SUCCESS = 'success'
299
"""Query completed successfully."""
300
301
TIMED_OUT = 'timed_out'
302
"""Query exceeded maximum execution time."""
303
```
304
305
### SQL Parsing and Validation
306
307
Query analysis and security validation functionality.
308
309
```python { .api }
310
class SupersetQuery:
311
"""
312
SQL query analysis and validation utilities.
313
Provides parsing capabilities for security and feature detection.
314
315
Properties:
316
- tables: set, extracted table names from query
317
- limit: int, query LIMIT clause value if present
318
"""
319
320
def is_select(self):
321
"""
322
Check if query is a SELECT statement.
323
324
Returns:
325
bool, True for SELECT queries, False for DDL/DML
326
"""
327
328
def is_explain(self):
329
"""
330
Check if query is an EXPLAIN statement.
331
332
Returns:
333
bool, True for EXPLAIN queries
334
"""
335
336
def is_readonly(self):
337
"""
338
Check if query is read-only.
339
Validates that query doesn't modify data or schema.
340
341
Returns:
342
bool, True for read-only queries (SELECT, EXPLAIN, SHOW, etc.)
343
"""
344
345
def stripped(self):
346
"""
347
Get SQL text with comments and whitespace removed.
348
349
Returns:
350
str, cleaned SQL text for analysis
351
"""
352
353
def extract_tables_from_sql(sql):
354
"""
355
Extract table references from SQL query.
356
357
Parameters:
358
- sql: str, SQL query text to analyze
359
360
Returns:
361
set of table names referenced in the query
362
"""
363
364
def validate_sql(sql):
365
"""
366
Basic SQL validation and syntax checking.
367
368
Parameters:
369
- sql: str, SQL query to validate
370
371
Returns:
372
bool, True if SQL appears syntactically valid
373
374
Raises:
375
SupersetException for invalid SQL syntax
376
"""
377
```
378
379
### Result Management
380
381
Query result storage, caching, and export functionality.
382
383
```python { .api }
384
# Result Storage Configuration
385
RESULTS_BACKEND = {
386
'cache_type': 'redis',
387
'cache_key_prefix': 'superset_results',
388
'cache_timeout': 3600
389
}
390
"""
391
Configurable backend for storing large query results.
392
Supports Redis, S3, and custom storage implementations.
393
"""
394
395
# CSV Export Configuration
396
CSV_EXPORT = {
397
'encoding': 'utf-8',
398
'delimiter': ',',
399
'line_terminator': '\n',
400
'quote_char': '"',
401
'escape_char': '\\',
402
'max_rows': 100000
403
}
404
"""
405
CSV export format and limits configuration.
406
"""
407
408
# Query Result Caching
409
QUERY_CACHE_TIMEOUT = 3600
410
"""
411
Default timeout for caching query results in seconds.
412
"""
413
414
# Async Query Configuration
415
SQLLAB_ASYNC_TIME_LIMIT_SEC = 21600 # 6 hours
416
"""
417
Maximum execution time for asynchronous queries.
418
"""
419
420
SQL_MAX_ROW = 100000
421
"""
422
Maximum number of rows for async SQL results.
423
"""
424
```
425
426
## Usage Examples
427
428
### Synchronous Query Execution
429
430
```python
431
# Execute small query immediately
432
response = requests.post('/superset/runsql/', json={
433
'database_id': 1,
434
'sql': 'SELECT COUNT(*) FROM users',
435
'schema': 'public'
436
})
437
438
result = response.json()
439
if result['status'] == 'success':
440
data = result['data']
441
```
442
443
### Asynchronous Query Execution
444
445
```python
446
# Start long-running query
447
response = requests.post('/superset/sql_json/', json={
448
'database_id': 1,
449
'sql': 'SELECT * FROM large_table',
450
'async': True,
451
'run_async': True
452
})
453
454
query_id = response.json()['query']['queryId']
455
456
# Check query status
457
status_response = requests.get(f'/superset/results/{query_id}')
458
status = status_response.json()
459
460
if status['status'] == 'success':
461
results = status['data']
462
elif status['status'] == 'running':
463
# Query still executing, check again later
464
pass
465
```
466
467
### Query Result Export
468
469
```python
470
# Export results as CSV
471
csv_response = requests.get(f'/superset/csv/{query_id}')
472
with open('results.csv', 'wb') as f:
473
f.write(csv_response.content)
474
```
475
476
### Database Metadata Browsing
477
478
```python
479
# Get available schemas
480
schemas = requests.get('/superset/schemas/', params={
481
'database_id': 1
482
}).json()
483
484
# Get tables in schema
485
tables = requests.get(f'/superset/tables/1/public/').json()
486
487
# Get table metadata
488
table_info = requests.get('/superset/table/1/public/users').json()
489
columns = table_info['columns']
490
```
491
492
## Configuration
493
494
### SQL Lab Settings
495
496
```python
497
# Enable SQL Lab
498
SQLLAB_ASYNC_TIME_LIMIT_SEC = 21600
499
500
# Default database for SQL Lab
501
SQLLAB_DEFAULT_DBID = 1
502
503
# Synchronous query timeout
504
SQLLAB_TIMEOUT = 30
505
506
# Maximum rows for display
507
DISPLAY_MAX_ROW = 1000
508
509
# Query search limit
510
QUERY_SEARCH_LIMIT = 1000
511
```
512
513
### Security Configuration
514
515
```python
516
# SQL Lab permissions
517
SQLLAB_PERMISSIONS = [
518
'can_sql_json',
519
'can_csv',
520
'can_search_queries'
521
]
522
523
# Allow DML operations
524
PREVENT_UNSAFE_DB_CONNECTIONS = True
525
526
# Query validation rules
527
SQLLAB_CTAS_NO_LIMIT = True
528
```
529
530
## Integration
531
532
SQL Lab integrates with:
533
534
- **Celery**: Asynchronous query processing and task management
535
- **Redis/Database**: Result caching and temporary storage
536
- **Security Manager**: Permission-based database and table access
537
- **Database Connectors**: Multi-engine SQL execution support
538
- **Frontend**: React-based SQL editor with syntax highlighting
539
- **Export Systems**: CSV, JSON, and custom result format support
540
541
The SQL Lab provides a comprehensive SQL development environment enabling data analysts and engineers to explore datasets, develop queries, and collaborate on data analysis within the Superset ecosystem.