0
# SQL Optimization
1
2
Comprehensive query optimization engine with 15+ optimization rules for improving query performance, simplifying expressions, and analyzing query structure. The optimizer transforms SQL expressions to equivalent but more efficient forms.
3
4
## Capabilities
5
6
### Main Optimization Engine
7
8
Core optimization functionality that applies multiple optimization rules to SQL expressions.
9
10
```python { .api }
11
def optimize(
12
sql: str | Expression,
13
schema: Optional[Schema] = None,
14
dialect: str = None,
15
**opts
16
) -> Expression:
17
"""
18
Applies optimization rules to SQL expressions for improved performance.
19
20
Args:
21
sql: SQL string or Expression to optimize
22
schema (Schema): Database schema for column/table resolution
23
dialect (str): SQL dialect for parsing and optimization
24
**opts: Additional optimization options
25
26
Returns:
27
Expression: Optimized expression tree
28
"""
29
30
RULES: List[Callable] = [
31
# List of available optimization rules
32
# Applied in sequence for maximum effectiveness
33
]
34
```
35
36
### Type Annotation and Analysis
37
38
Add type information to expressions for better optimization and validation.
39
40
```python { .api }
41
def annotate_types(
42
expression: Expression,
43
schema: Optional[Schema] = None,
44
**opts
45
) -> Expression:
46
"""
47
Annotates expressions with type information from schema.
48
49
Args:
50
expression (Expression): Expression to annotate
51
schema (Schema): Schema providing type information
52
**opts: Type annotation options
53
54
Returns:
55
Expression: Expression with type annotations
56
"""
57
```
58
59
### Column and Table Qualification
60
61
Resolve column and table references using schema information.
62
63
```python { .api }
64
def qualify_columns(
65
expression: Expression,
66
schema: Schema,
67
**opts
68
) -> Expression:
69
"""
70
Qualifies column references with table names using schema.
71
72
Args:
73
expression (Expression): Expression to qualify
74
schema (Schema): Schema for column resolution
75
**opts: Qualification options
76
77
Returns:
78
Expression: Expression with qualified column references
79
"""
80
81
def qualify_tables(
82
expression: Expression,
83
schema: Optional[Schema] = None,
84
**opts
85
) -> Expression:
86
"""
87
Qualifies table references with database/schema names.
88
89
Args:
90
expression (Expression): Expression to qualify
91
schema (Schema): Schema for table resolution
92
**opts: Qualification options
93
94
Returns:
95
Expression: Expression with qualified table references
96
"""
97
```
98
99
### Predicate Optimization
100
101
Optimize WHERE clause conditions for better performance.
102
103
```python { .api }
104
def pushdown_predicates(expression: Expression, **opts) -> Expression:
105
"""
106
Pushes WHERE predicates down to lowest possible level in query tree.
107
108
Args:
109
expression (Expression): Expression to optimize
110
**opts: Pushdown options
111
112
Returns:
113
Expression: Expression with optimized predicate placement
114
"""
115
116
def pushdown_projections(expression: Expression, **opts) -> Expression:
117
"""
118
Pushes SELECT projections down to reduce data movement.
119
120
Args:
121
expression (Expression): Expression to optimize
122
**opts: Projection pushdown options
123
124
Returns:
125
Expression: Expression with optimized projections
126
"""
127
```
128
129
### Subquery Optimization
130
131
Eliminate and optimize subqueries for better performance.
132
133
```python { .api }
134
def eliminate_subqueries(expression: Expression, **opts) -> Expression:
135
"""
136
Eliminates unnecessary subqueries by flattening or merging.
137
138
Args:
139
expression (Expression): Expression to optimize
140
**opts: Subquery elimination options
141
142
Returns:
143
Expression: Expression with eliminated subqueries
144
"""
145
146
def unnest_subqueries(expression: Expression, **opts) -> Expression:
147
"""
148
Unnests subqueries where possible for performance.
149
150
Args:
151
expression (Expression): Expression to optimize
152
**opts: Unnesting options
153
154
Returns:
155
Expression: Expression with unnested subqueries
156
"""
157
158
def merge_subqueries(expression: Expression, **opts) -> Expression:
159
"""
160
Merges compatible subqueries to reduce query complexity.
161
162
Args:
163
expression (Expression): Expression to optimize
164
**opts: Merge options
165
166
Returns:
167
Expression: Expression with merged subqueries
168
"""
169
```
170
171
### Join Optimization
172
173
Optimize JOIN operations and eliminate unnecessary joins.
174
175
```python { .api }
176
def optimize_joins(expression: Expression, **opts) -> Expression:
177
"""
178
Optimizes JOIN conditions and order for better performance.
179
180
Args:
181
expression (Expression): Expression to optimize
182
**opts: Join optimization options
183
184
Returns:
185
Expression: Expression with optimized joins
186
"""
187
188
def eliminate_joins(expression: Expression, **opts) -> Expression:
189
"""
190
Eliminates unnecessary JOINs that don't affect results.
191
192
Args:
193
expression (Expression): Expression to optimize
194
**opts: Join elimination options
195
196
Returns:
197
Expression: Expression with eliminated joins
198
"""
199
```
200
201
### CTE and Expression Simplification
202
203
Optimize Common Table Expressions and simplify complex expressions.
204
205
```python { .api }
206
def eliminate_ctes(expression: Expression, **opts) -> Expression:
207
"""
208
Eliminates unnecessary Common Table Expressions.
209
210
Args:
211
expression (Expression): Expression to optimize
212
**opts: CTE elimination options
213
214
Returns:
215
Expression: Expression with eliminated CTEs
216
"""
217
218
def simplify(expression: Expression, **opts) -> Expression:
219
"""
220
Simplifies expressions by applying algebraic rules.
221
222
Args:
223
expression (Expression): Expression to simplify
224
**opts: Simplification options
225
226
Returns:
227
Expression: Simplified expression
228
"""
229
230
def canonicalize(expression: Expression, **opts) -> Expression:
231
"""
232
Canonicalizes expressions to standard form.
233
234
Args:
235
expression (Expression): Expression to canonicalize
236
**opts: Canonicalization options
237
238
Returns:
239
Expression: Canonicalized expression
240
"""
241
```
242
243
### Schema and Identifier Normalization
244
245
Normalize identifiers and schema references for consistency.
246
247
```python { .api }
248
def normalize(expression: Expression, **opts) -> Expression:
249
"""
250
Normalizes expressions to consistent format.
251
252
Args:
253
expression (Expression): Expression to normalize
254
**opts: Normalization options
255
256
Returns:
257
Expression: Normalized expression
258
"""
259
260
def normalize_identifiers(expression: Expression, **opts) -> Expression:
261
"""
262
Normalizes identifier casing and quoting.
263
264
Args:
265
expression (Expression): Expression to normalize
266
**opts: Identifier normalization options
267
268
Returns:
269
Expression: Expression with normalized identifiers
270
"""
271
272
def isolate_table_selects(expression: Expression, **opts) -> Expression:
273
"""
274
Isolates table SELECT expressions for optimization.
275
276
Args:
277
expression (Expression): Expression to isolate
278
**opts: Isolation options
279
280
Returns:
281
Expression: Expression with isolated table selects
282
"""
283
```
284
285
### Scope Analysis
286
287
Analyze and traverse expression scopes for column and table resolution.
288
289
```python { .api }
290
class Scope:
291
"""Represents a scope in SQL expression for variable resolution."""
292
293
def __init__(self, expression: Expression, sources: Dict = None): ...
294
295
@property
296
def columns(self) -> List[Column]:
297
"""Available columns in this scope."""
298
299
@property
300
def tables(self) -> List[Table]:
301
"""Available tables in this scope."""
302
303
def build_scope(expression: Expression, **opts) -> Scope:
304
"""
305
Builds scope tree for expression with column/table resolution.
306
307
Args:
308
expression (Expression): Expression to analyze
309
**opts: Scope building options
310
311
Returns:
312
Scope: Root scope of expression tree
313
"""
314
315
def traverse_scope(scope: Scope, **opts) -> Iterator[Scope]:
316
"""
317
Traverses scope hierarchy depth-first.
318
319
Args:
320
scope (Scope): Root scope to traverse
321
**opts: Traversal options
322
323
Yields:
324
Scope: Each scope in traversal order
325
"""
326
327
def find_in_scope(scope: Scope, expression_type: Type, **opts) -> Optional[Expression]:
328
"""
329
Finds first expression of type in scope.
330
331
Args:
332
scope (Scope): Scope to search
333
expression_type (Type): Expression type to find
334
**opts: Search options
335
336
Returns:
337
Optional[Expression]: Found expression or None
338
"""
339
340
def find_all_in_scope(scope: Scope, expression_type: Type, **opts) -> List[Expression]:
341
"""
342
Finds all expressions of type in scope.
343
344
Args:
345
scope (Scope): Scope to search
346
expression_type (Type): Expression type to find
347
**opts: Search options
348
349
Returns:
350
List[Expression]: List of found expressions
351
"""
352
353
def walk_in_scope(scope: Scope, **opts) -> Iterator[Expression]:
354
"""
355
Walks all expressions within scope.
356
357
Args:
358
scope (Scope): Scope to walk
359
**opts: Walk options
360
361
Yields:
362
Expression: Each expression in scope
363
"""
364
```
365
366
## Usage Examples
367
368
### Basic Optimization
369
370
```python
371
import sqlglot
372
from sqlglot.optimizer import optimize
373
from sqlglot.schema import MappingSchema
374
375
# Define schema for optimization
376
schema = MappingSchema({
377
"users": {
378
"id": "INT",
379
"name": "VARCHAR",
380
"email": "VARCHAR",
381
"created_date": "DATE"
382
},
383
"orders": {
384
"id": "INT",
385
"user_id": "INT",
386
"amount": "DECIMAL",
387
"order_date": "DATE"
388
}
389
})
390
391
# Optimize complex query
392
sql = """
393
SELECT u.name, COUNT(o.id) as order_count
394
FROM users u
395
LEFT JOIN orders o ON u.id = o.user_id
396
WHERE u.created_date > '2023-01-01'
397
AND (SELECT COUNT(*) FROM orders WHERE user_id = u.id) > 5
398
GROUP BY u.name
399
"""
400
401
optimized = optimize(sql, schema=schema, dialect="postgres")
402
print(optimized.sql(pretty=True))
403
```
404
405
### Individual Optimization Rules
406
407
```python
408
import sqlglot
409
from sqlglot.optimizer import (
410
annotate_types, qualify_columns, pushdown_predicates,
411
eliminate_subqueries, simplify
412
)
413
from sqlglot.schema import MappingSchema
414
415
# Apply specific optimization rules
416
expression = sqlglot.parse_one("""
417
SELECT name FROM users
418
WHERE age > (SELECT AVG(age) FROM users) + 10 - 5
419
""")
420
421
schema = MappingSchema({"users": {"name": "VARCHAR", "age": "INT"}})
422
423
# Step-by-step optimization
424
expression = annotate_types(expression, schema=schema)
425
expression = qualify_columns(expression, schema=schema)
426
expression = pushdown_predicates(expression)
427
expression = eliminate_subqueries(expression)
428
expression = simplify(expression)
429
430
print(expression.sql(pretty=True))
431
```
432
433
### Scope Analysis
434
435
```python
436
import sqlglot
437
from sqlglot.optimizer import build_scope, traverse_scope
438
439
# Analyze query scope
440
sql = """
441
SELECT u.name, o.amount
442
FROM users u
443
JOIN (
444
SELECT user_id, amount
445
FROM orders
446
WHERE amount > 100
447
) o ON u.id = o.user_id
448
"""
449
450
expression = sqlglot.parse_one(sql)
451
root_scope = build_scope(expression)
452
453
# Traverse scopes
454
for scope in traverse_scope(root_scope):
455
print(f"Scope columns: {[c.name for c in scope.columns]}")
456
print(f"Scope tables: {[t.name for t in scope.tables]}")
457
```
458
459
### Custom Optimization Pipeline
460
461
```python
462
import sqlglot
463
from sqlglot.optimizer import (
464
annotate_types, qualify_columns, qualify_tables,
465
pushdown_predicates, eliminate_subqueries, optimize_joins,
466
simplify, canonicalize
467
)
468
469
def custom_optimize(expression, schema=None):
470
"""Custom optimization pipeline with specific rules."""
471
472
# Core qualification and type annotation
473
if schema:
474
expression = annotate_types(expression, schema=schema)
475
expression = qualify_tables(expression, schema=schema)
476
expression = qualify_columns(expression, schema=schema)
477
478
# Predicate and projection optimization
479
expression = pushdown_predicates(expression)
480
481
# Subquery optimization
482
expression = eliminate_subqueries(expression)
483
484
# Join optimization
485
expression = optimize_joins(expression)
486
487
# Expression simplification
488
expression = simplify(expression)
489
expression = canonicalize(expression)
490
491
return expression
492
493
# Use custom optimizer
494
sql = "SELECT * FROM users WHERE age > 25 AND status = 'active'"
495
expression = sqlglot.parse_one(sql)
496
optimized = custom_optimize(expression)
497
print(optimized.sql())
498
```
499
500
## Types
501
502
```python { .api }
503
class Scope:
504
"""Query scope for column and table resolution."""
505
506
expression: Expression # Root expression of scope
507
sources: Dict # Available data sources
508
509
@property
510
def columns(self) -> List[Column]:
511
"""Columns available in this scope."""
512
513
@property
514
def tables(self) -> List[Table]:
515
"""Tables available in this scope."""
516
517
def find(self, expression_type: Type) -> Optional[Expression]:
518
"""Find expression of specific type in scope."""
519
520
def find_all(self, expression_type: Type) -> List[Expression]:
521
"""Find all expressions of specific type in scope."""
522
523
# Optimization rule type
524
OptimizationRule = Callable[[Expression], Expression]
525
526
# Available optimization rules
527
RULES: List[OptimizationRule] = [
528
annotate_types,
529
qualify_columns,
530
qualify_tables,
531
pushdown_predicates,
532
pushdown_projections,
533
eliminate_subqueries,
534
unnest_subqueries,
535
merge_subqueries,
536
optimize_joins,
537
eliminate_joins,
538
eliminate_ctes,
539
simplify,
540
canonicalize,
541
normalize,
542
normalize_identifiers,
543
isolate_table_selects
544
]
545
```