0
# Utility Functions
1
2
Additional tools for SQL analysis, comparison, and manipulation including AST diffing, column lineage analysis, time parsing utilities, and error handling for comprehensive SQL processing workflows.
3
4
## Capabilities
5
6
### AST Diffing and Comparison
7
8
Compare SQL expressions and identify differences between query structures.
9
10
```python { .api }
11
def diff(
12
source: Expression,
13
target: Expression,
14
**opts
15
) -> str:
16
"""
17
Compare two SQL expressions and generate diff output.
18
19
Args:
20
source (Expression): Source expression for comparison
21
target (Expression): Target expression for comparison
22
**opts: Diff formatting options
23
24
Returns:
25
str: Human-readable diff showing changes between expressions
26
"""
27
```
28
29
### Column Lineage Analysis
30
31
Trace column dependencies and data flow through complex SQL queries.
32
33
```python { .api }
34
def lineage(
35
sql: str,
36
schema: Optional[Schema] = None,
37
**opts
38
) -> Dict:
39
"""
40
Analyze column lineage and dependencies in SQL query.
41
42
Args:
43
sql (str): SQL query to analyze
44
schema (Schema): Schema for column resolution
45
**opts: Lineage analysis options
46
47
Returns:
48
Dict: Column lineage mapping showing data flow
49
"""
50
```
51
52
### Error Handling Classes
53
54
Comprehensive error handling for SQL processing operations.
55
56
```python { .api }
57
class SqlglotError(Exception):
58
"""Base exception class for all SQLGlot errors."""
59
60
class ParseError(SqlglotError):
61
"""Exception raised when SQL parsing fails."""
62
63
def __init__(
64
self,
65
message: str,
66
errors: Optional[List[Dict[str, Any]]] = None
67
):
68
"""
69
Initialize parse error with message and detailed error information.
70
71
Args:
72
message (str): Error message
73
errors (List[Dict]): List of detailed error dictionaries
74
"""
75
76
@classmethod
77
def new(
78
cls,
79
message: str,
80
description: Optional[str] = None,
81
line: Optional[int] = None,
82
col: Optional[int] = None,
83
start_context: Optional[str] = None,
84
highlight: Optional[str] = None,
85
end_context: Optional[str] = None,
86
into_expression: Optional[str] = None
87
) -> ParseError:
88
"""Create new ParseError with detailed location information."""
89
90
class TokenError(SqlglotError):
91
"""Exception raised when tokenization fails."""
92
93
class UnsupportedError(SqlglotError):
94
"""Exception raised when encountering unsupported SQL features."""
95
96
class OptimizeError(SqlglotError):
97
"""Exception raised during query optimization."""
98
99
class SchemaError(SqlglotError):
100
"""Exception raised for schema-related issues."""
101
102
class ExecuteError(SqlglotError):
103
"""Exception raised during SQL execution."""
104
105
class ErrorLevel:
106
"""Error handling levels for parser and validator."""
107
108
IGNORE: str # Ignore all errors
109
WARN: str # Log errors but continue processing
110
RAISE: str # Collect all errors and raise single exception
111
IMMEDIATE: str # Raise exception immediately on first error
112
```
113
114
### Helper Utilities
115
116
General utility functions and classes for SQL processing.
117
118
```python { .api }
119
class AutoName:
120
"""Base class for auto-naming enumerations."""
121
122
def dict_depth(mapping: Dict) -> int:
123
"""
124
Calculate maximum nesting depth of dictionary.
125
126
Args:
127
mapping (Dict): Dictionary to analyze
128
129
Returns:
130
int: Maximum nesting depth
131
"""
132
133
def first(iterable) -> Any:
134
"""
135
Return first element from iterable.
136
137
Args:
138
iterable: Iterable to get first element from
139
140
Returns:
141
Any: First element or None if empty
142
"""
143
144
def concat_messages(errors: List[Any], maximum: int) -> str:
145
"""
146
Concatenate error messages with truncation.
147
148
Args:
149
errors (List): List of error objects
150
maximum (int): Maximum number of errors to include
151
152
Returns:
153
str: Concatenated error message string
154
"""
155
156
def merge_errors(errors: List[ParseError]) -> List[Dict[str, Any]]:
157
"""
158
Merge multiple ParseError instances into unified error list.
159
160
Args:
161
errors (List[ParseError]): Parse errors to merge
162
163
Returns:
164
List[Dict]: Merged error information dictionaries
165
"""
166
```
167
168
### Time and Date Utilities
169
170
Specialized utilities for handling SQL time and date operations.
171
172
```python { .api }
173
# Time parsing and formatting utilities for SQL time literals
174
# Located in sqlglot.time module - provides functions for:
175
# - Parsing time strings in various formats
176
# - Converting between time representations
177
# - Formatting time values for different SQL dialects
178
```
179
180
### Transformation Utilities
181
182
Various SQL transformation and manipulation utilities.
183
184
```python { .api }
185
# SQL transformation utilities for advanced query manipulation
186
# Located in sqlglot.transforms module - provides functions for:
187
# - Custom expression transformations
188
# - Dialect-specific syntax conversions
189
# - Advanced AST manipulation patterns
190
```
191
192
### JSON Path Support
193
194
JSON path expression support for SQL queries with JSON data.
195
196
```python { .api }
197
# JSON path expression utilities for SQL JSON operations
198
# Located in sqlglot.jsonpath module - provides:
199
# - JSON path parsing and evaluation
200
# - Integration with SQL JSON functions
201
# - Cross-dialect JSON path support
202
```
203
204
### Data Structure Utilities
205
206
Trie and other data structure utilities for efficient processing.
207
208
```python { .api }
209
def new_trie() -> Dict:
210
"""
211
Create new trie data structure.
212
213
Returns:
214
Dict: Empty trie structure
215
"""
216
217
def in_trie(trie: Dict, key: str) -> TrieResult:
218
"""
219
Check if key exists in trie structure.
220
221
Args:
222
trie (Dict): Trie to search
223
key (str): Key to search for
224
225
Returns:
226
TrieResult: Result indicating presence and type of match
227
"""
228
229
class TrieResult:
230
"""Result of trie lookup operation."""
231
232
def __init__(self, exists: bool, prefix: bool = False):
233
"""
234
Initialize trie result.
235
236
Args:
237
exists (bool): Whether key exists in trie
238
prefix (bool): Whether key is a prefix of existing keys
239
"""
240
```
241
242
### Serialization Utilities
243
244
Serialization and deserialization utilities for expressions.
245
246
```python { .api }
247
# Serialization utilities for SQLGlot expressions
248
# Located in sqlglot.serde module - provides:
249
# - Expression serialization to JSON/dict format
250
# - Deserialization from stored representations
251
# - Cross-process expression transfer support
252
```
253
254
## Usage Examples
255
256
### Expression Diffing
257
258
```python
259
import sqlglot
260
from sqlglot.diff import diff
261
262
# Compare two similar queries
263
query1 = sqlglot.parse_one("SELECT name, age FROM users WHERE age > 25")
264
query2 = sqlglot.parse_one("SELECT name, age, email FROM users WHERE age >= 25")
265
266
# Generate diff
267
diff_output = diff(query1, query2)
268
print(diff_output)
269
270
# Compare more complex queries
271
original = sqlglot.parse_one("""
272
SELECT u.name, COUNT(o.id) as order_count
273
FROM users u
274
LEFT JOIN orders o ON u.id = o.user_id
275
WHERE u.active = 1
276
GROUP BY u.name
277
""")
278
279
modified = sqlglot.parse_one("""
280
SELECT u.name, u.email, COUNT(o.id) as total_orders
281
FROM users u
282
LEFT JOIN orders o ON u.id = o.user_id
283
WHERE u.active = 1 AND u.verified = 1
284
GROUP BY u.name, u.email
285
""")
286
287
print(diff(original, modified))
288
```
289
290
### Column Lineage Analysis
291
292
```python
293
import sqlglot
294
from sqlglot.lineage import lineage
295
from sqlglot.schema import MappingSchema
296
297
# Define schema for lineage analysis
298
schema = MappingSchema({
299
"raw_data": {
300
"user_id": "INT",
301
"first_name": "VARCHAR",
302
"last_name": "VARCHAR",
303
"email": "VARCHAR",
304
"signup_date": "DATE"
305
},
306
"orders": {
307
"id": "INT",
308
"user_id": "INT",
309
"amount": "DECIMAL",
310
"order_date": "DATE"
311
}
312
})
313
314
# Analyze complex query lineage
315
sql = """
316
SELECT
317
CONCAT(u.first_name, ' ', u.last_name) as full_name,
318
u.email,
319
COUNT(o.id) as total_orders,
320
SUM(o.amount) as total_spent,
321
AVG(o.amount) as avg_order_value,
322
MAX(o.order_date) as last_order_date
323
FROM raw_data u
324
LEFT JOIN orders o ON u.user_id = o.user_id
325
WHERE u.signup_date >= '2023-01-01'
326
GROUP BY u.user_id, u.first_name, u.last_name, u.email
327
"""
328
329
# Get lineage information
330
lineage_info = lineage(sql, schema=schema)
331
print("Column lineage:")
332
for output_col, source_cols in lineage_info.items():
333
print(f"{output_col} <- {source_cols}")
334
```
335
336
### Error Handling and Validation
337
338
```python
339
import sqlglot
340
from sqlglot import ParseError, UnsupportedError, ErrorLevel
341
342
# Handle parsing errors gracefully
343
def safe_parse(sql_queries):
344
results = []
345
errors = []
346
347
for sql in sql_queries:
348
try:
349
parsed = sqlglot.parse_one(sql)
350
results.append(parsed)
351
except ParseError as e:
352
errors.append(f"Parse error in '{sql}': {e}")
353
except UnsupportedError as e:
354
errors.append(f"Unsupported feature in '{sql}': {e}")
355
356
return results, errors
357
358
# Test with mixed valid/invalid SQL
359
test_queries = [
360
"SELECT * FROM users",
361
"SELECT FROM", # Invalid
362
"SELECT name FROM users WHERE age > 25",
363
"INVALID SQL SYNTAX", # Invalid
364
]
365
366
parsed_queries, parse_errors = safe_parse(test_queries)
367
print(f"Successfully parsed: {len(parsed_queries)} queries")
368
print(f"Errors: {len(parse_errors)}")
369
for error in parse_errors:
370
print(f" {error}")
371
372
# Use different error levels
373
try:
374
# Parse with warning level - continues on errors
375
expressions = sqlglot.parse(
376
"SELECT 1; INVALID; SELECT 2;",
377
error_level=ErrorLevel.WARN
378
)
379
print(f"Parsed {len([e for e in expressions if e])} valid expressions")
380
except Exception as e:
381
print(f"Error: {e}")
382
```
383
384
### Working with Helper Utilities
385
386
```python
387
import sqlglot
388
from sqlglot.helper import dict_depth, first
389
390
# Calculate nested dictionary depth
391
schema_dict = {
392
"database1": {
393
"schema1": {
394
"table1": {"col1": "INT", "col2": "VARCHAR"}
395
}
396
},
397
"database2": {
398
"schema1": {
399
"table1": {"col1": "INT"},
400
"table2": {"col1": "INT", "col2": "VARCHAR", "col3": "DATE"}
401
}
402
}
403
}
404
405
depth = dict_depth(schema_dict)
406
print(f"Schema depth: {depth}")
407
408
# Get first valid expression from parse results
409
sql_statements = "INVALID; SELECT 1; SELECT 2;"
410
expressions = sqlglot.parse(sql_statements, error_level=ErrorLevel.WARN)
411
first_valid = first(expr for expr in expressions if expr is not None)
412
if first_valid:
413
print(f"First valid expression: {first_valid.sql()}")
414
```
415
416
### Trie Data Structure Usage
417
418
```python
419
from sqlglot.trie import new_trie, in_trie
420
421
# Create trie for keyword matching
422
keywords_trie = new_trie()
423
424
# Add keywords (this would typically be done internally)
425
keywords = ["SELECT", "FROM", "WHERE", "GROUP", "ORDER", "HAVING"]
426
# Note: Actual trie population would use internal SQLGlot methods
427
428
# Check for keyword presence
429
for word in ["SELECT", "SELEC", "SELECTED", "FROM", "TABLE"]:
430
result = in_trie(keywords_trie, word.upper())
431
if result.exists:
432
print(f"'{word}' is a complete keyword")
433
elif result.prefix:
434
print(f"'{word}' is a keyword prefix")
435
else:
436
print(f"'{word}' is not a keyword or prefix")
437
```
438
439
### Advanced Error Information
440
441
```python
442
import sqlglot
443
from sqlglot import ParseError
444
445
# Parse with detailed error information
446
invalid_sql = """
447
SELECT name, age
448
FROM users
449
WHERE age > 25 AND
450
status = 'active' AND
451
created_date >= '2023-01-01'
452
missing_clause
453
"""
454
455
try:
456
parsed = sqlglot.parse_one(invalid_sql)
457
except ParseError as e:
458
print(f"Parse error: {e}")
459
460
# Access detailed error information
461
for error_detail in e.errors:
462
print(f" Line: {error_detail.get('line')}")
463
print(f" Column: {error_detail.get('col')}")
464
print(f" Description: {error_detail.get('description')}")
465
print(f" Context: {error_detail.get('start_context')}")
466
print(f" Highlight: {error_detail.get('highlight')}")
467
```
468
469
## Types
470
471
```python { .api }
472
class SqlglotError(Exception):
473
"""Base exception for all SQLGlot errors."""
474
475
class ParseError(SqlglotError):
476
"""SQL parsing error with detailed information."""
477
478
errors: List[Dict[str, Any]] # Detailed error information
479
480
def __init__(self, message: str, errors: Optional[List[Dict]] = None): ...
481
482
class TokenError(SqlglotError):
483
"""Tokenization error."""
484
485
class UnsupportedError(SqlglotError):
486
"""Unsupported feature error."""
487
488
class OptimizeError(SqlglotError):
489
"""Query optimization error."""
490
491
class SchemaError(SqlglotError):
492
"""Schema-related error."""
493
494
class ExecuteError(SqlglotError):
495
"""SQL execution error."""
496
497
class ErrorLevel:
498
"""Error handling level enumeration."""
499
500
IGNORE: str # Ignore all errors
501
WARN: str # Log errors but continue
502
RAISE: str # Collect errors and raise exception
503
IMMEDIATE: str # Raise immediately on first error
504
505
class TrieResult:
506
"""Result of trie lookup operation."""
507
508
exists: bool # Whether key exists completely in trie
509
prefix: bool # Whether key is prefix of existing keys
510
511
def __init__(self, exists: bool, prefix: bool = False): ...
512
513
class AutoName:
514
"""Base class for auto-naming enumerations."""
515
```