0
# SQLGlot
1
2
A comprehensive no-dependency SQL parser, transpiler, optimizer, and engine written in Python. SQLGlot supports translation between 30+ different SQL dialects including DuckDB, Presto/Trino, Spark/Databricks, Snowflake, and BigQuery, providing robust SQL parsing capabilities with syntax error detection, comprehensive formatting and optimization features, AST manipulation tools for programmatic SQL building, and a built-in SQL execution engine.
3
4
## Package Information
5
6
- **Package Name**: sqlglot
7
- **Language**: Python
8
- **Installation**: `pip install sqlglot` or `pip install "sqlglot[rs]"` (with Rust tokenizer for better performance)
9
10
## Core Imports
11
12
```python
13
import sqlglot
14
```
15
16
Common for parsing and transpilation:
17
18
```python
19
from sqlglot import parse, parse_one, transpile, tokenize
20
```
21
22
Expression building:
23
24
```python
25
from sqlglot import select, column, table_, func, and_, or_, case
26
```
27
28
Advanced features:
29
30
```python
31
from sqlglot import Expression, Dialect, Parser, Generator
32
from sqlglot.optimizer import optimize
33
from sqlglot.executor import execute
34
from sqlglot.schema import MappingSchema
35
```
36
37
## Basic Usage
38
39
```python
40
import sqlglot
41
42
# Parse SQL into abstract syntax tree
43
sql = "SELECT user_id, COUNT(*) FROM users WHERE age > 21 GROUP BY user_id"
44
parsed = sqlglot.parse_one(sql)
45
46
# Transpile between SQL dialects
47
spark_sql = "SELECT DATE_ADD(CURRENT_DATE(), 7)"
48
postgres_sql = sqlglot.transpile(spark_sql, read="spark", write="postgres")[0]
49
# Result: "SELECT (CURRENT_DATE + INTERVAL '7' DAY)"
50
51
# Build SQL programmatically
52
query = (
53
sqlglot.select("user_id", sqlglot.func("COUNT", "*").as_("total"))
54
.from_("users")
55
.where(sqlglot.column("age") > 21)
56
.group_by("user_id")
57
)
58
print(query.sql()) # SELECT user_id, COUNT(*) AS total FROM users WHERE age > 21 GROUP BY user_id
59
60
# Format SQL with pretty printing
61
formatted = sqlglot.transpile(sql, pretty=True)[0]
62
```
63
64
## Architecture
65
66
SQLGlot's architecture is built around four core components:
67
68
- **Tokenizer**: Lexical analysis that converts SQL text into tokens, with support for dialect-specific keywords and syntax
69
- **Parser**: Recursive descent parser that builds Abstract Syntax Trees (ASTs) from tokens, handling dialect-specific grammar rules
70
- **Generator**: Converts ASTs back into SQL text with dialect-specific formatting and syntax rules
71
- **Expression System**: Rich object model representing all SQL constructs as Python objects with manipulation methods
72
73
This design enables SQLGlot to serve as both a high-level transpilation tool and a low-level SQL manipulation library, supporting everything from simple format conversion to complex query optimization and analysis.
74
75
## Capabilities
76
77
### Core Parsing and Transpilation
78
79
Essential SQL parsing and dialect translation functionality. Parse SQL strings into abstract syntax trees, transpile between 30+ SQL dialects, and tokenize SQL for lexical analysis.
80
81
```python { .api }
82
def parse(sql: str, read: str = None, dialect: str = None, **opts) -> List[Optional[Expression]]: ...
83
def parse_one(sql: str, read: str = None, dialect: str = None, into: Optional[Type] = None, **opts) -> Expression: ...
84
def transpile(sql: str, read: str = None, write: str = None, identity: bool = True, error_level: Optional[ErrorLevel] = None, **opts) -> List[str]: ...
85
def tokenize(sql: str, read: str = None, dialect: str = None) -> List[Token]: ...
86
```
87
88
[Core Parsing and Transpilation](./core-parsing.md)
89
90
### Expression Building and AST Manipulation
91
92
Programmatic SQL construction using builder functions and direct AST manipulation. Create complex SQL queries through code, modify existing parsed queries, and traverse expression trees.
93
94
```python { .api }
95
def select(*expressions) -> Select: ...
96
def column(col: str) -> Column: ...
97
def table_(name: str) -> Table: ...
98
def func(name: str, *args) -> Function: ...
99
def and_(*conditions) -> And: ...
100
def or_(*conditions) -> Or: ...
101
def case() -> Case: ...
102
def cast(expression, to: str) -> Cast: ...
103
```
104
105
[Expression Building](./expression-building.md)
106
107
### SQL Dialects
108
109
Support for 30+ SQL dialects with dialect-specific parsing, generation, and transformation rules. Each dialect handles unique syntax, functions, and data types.
110
111
```python { .api }
112
class Dialect:
113
def parse(self, sql: str, **opts) -> List[Optional[Expression]]: ...
114
def generate(self, expression: Expression, **opts) -> str: ...
115
def tokenize(self, sql: str) -> List[Token]: ...
116
117
# Available dialects
118
class BigQuery(Dialect): ...
119
class Snowflake(Dialect): ...
120
class Spark(Dialect): ...
121
class DuckDB(Dialect): ...
122
# ... and 26 more
123
```
124
125
[SQL Dialects](./dialects.md)
126
127
### SQL Optimization
128
129
Comprehensive query optimization engine with 15+ optimization rules. Optimize SQL for performance, simplify expressions, and analyze query structure.
130
131
```python { .api }
132
def optimize(sql: str | Expression, schema: Optional[Schema] = None, **opts) -> Expression: ...
133
134
# Optimization rules and components
135
def annotate_types(expression: Expression, **opts) -> Expression: ...
136
def qualify_columns(expression: Expression, schema: Schema, **opts) -> Expression: ...
137
def pushdown_predicates(expression: Expression, **opts) -> Expression: ...
138
def eliminate_subqueries(expression: Expression, **opts) -> Expression: ...
139
```
140
141
[SQL Optimization](./optimization.md)
142
143
### SQL Execution Engine
144
145
Built-in SQL execution engine for running queries against in-memory data structures. Execute SQL against Python data, perform joins and aggregations, and integrate with pandas DataFrames.
146
147
```python { .api }
148
def execute(sql: str | Expression, schema: Optional[Schema] = None, read: str = None, dialect: str = None, tables: Optional[Dict] = None) -> Table: ...
149
150
class Table:
151
def __init__(self, columns: List[str], rows: List[List]): ...
152
def to_dict(self) -> Dict: ...
153
```
154
155
[SQL Execution](./execution.md)
156
157
### Schema Management
158
159
Database schema representation and validation for accurate parsing and optimization. Define table structures, column types, and relationships.
160
161
```python { .api }
162
class Schema:
163
def add_table(self, table: str, column_mapping: Dict = None, **opts) -> None: ...
164
def column_names(self, table: str, **opts) -> List[str]: ...
165
def get_column_type(self, table: str, column: str, **opts) -> Optional[str]: ...
166
167
class MappingSchema(Schema):
168
def __init__(self, schema: Dict = None, **opts): ...
169
```
170
171
[Schema Management](./schema.md)
172
173
### Utility Functions
174
175
Additional tools for SQL analysis, comparison, and manipulation including AST diffing, column lineage analysis, and time parsing utilities.
176
177
```python { .api }
178
def diff(source: Expression, target: Expression, **opts) -> str: ...
179
180
# Lineage analysis
181
def lineage(sql: str, schema: Optional[Schema] = None, **opts) -> Dict: ...
182
183
# Error classes
184
class ParseError(Exception): ...
185
class UnsupportedError(Exception): ...
186
class ErrorLevel: ...
187
```
188
189
[Utilities](./utilities.md)
190
191
## Types
192
193
```python { .api }
194
class Expression:
195
"""Base class for all SQL expressions with AST manipulation methods."""
196
def sql(self, dialect: str = None, **opts) -> str: ...
197
def transform(self, fun: Callable, **opts) -> Expression: ...
198
def find(self, expression_type: Type, **opts) -> Optional[Expression]: ...
199
def find_all(self, expression_type: Type, **opts) -> List[Expression]: ...
200
def replace(self, expression: Expression, **opts) -> Expression: ...
201
202
class Token:
203
"""Represents a lexical token from SQL tokenization."""
204
def __init__(self, token_type: TokenType, text: str, line: int = 1, col: int = 1): ...
205
206
class ErrorLevel:
207
"""Error handling levels for parsing and validation."""
208
IGNORE: str
209
WARN: str
210
RAISE: str
211
IMMEDIATE: str
212
```