# Core Parsing and Transpilation

Essential SQL parsing and dialect translation functionality that forms the foundation of SQLGlot. These functions handle tokenization, parsing SQL into abstract syntax trees, and transpiling between different SQL dialects.

## Capabilities
### SQL Parsing

Parse SQL strings into abstract syntax trees (ASTs) for analysis and manipulation. Supports parsing multiple statements and handling various SQL dialects.

```python { .api }
def parse(sql: str, read: str = None, dialect: str = None, **opts) -> List[Optional[Expression]]:
    """
    Parses SQL string into collection of syntax trees, one per statement.

    Args:
        sql (str): SQL code string to parse
        read (str): SQL dialect for parsing (e.g., "spark", "hive", "presto", "mysql")
        dialect (str): SQL dialect (alias for read)
        **opts: Additional parser options

    Returns:
        List[Optional[Expression]]: Collection of parsed expression trees
    """
```

### Single Statement Parsing

Parse a single SQL statement into an expression tree. Most commonly used parsing function for single queries.

```python { .api }
def parse_one(sql: str, read: str = None, dialect: str = None, into: Optional[Type] = None, **opts) -> Expression:
    """
    Parses SQL string and returns syntax tree for the first statement.

    Args:
        sql (str): SQL code string to parse
        read (str): SQL dialect for parsing
        dialect (str): SQL dialect (alias for read)
        into (Type): Specific SQLGlot Expression type to parse into
        **opts: Additional parser options

    Returns:
        Expression: Syntax tree for the first parsed statement

    Raises:
        ParseError: If no valid expression could be parsed
    """
```

### SQL Transpilation

Convert SQL between different dialects while preserving semantic meaning. Handles dialect-specific syntax, functions, and data types.

```python { .api }
def transpile(
    sql: str,
    read: str = None,
    write: str = None,
    identity: bool = True,
    error_level: Optional[ErrorLevel] = None,
    **opts
) -> List[str]:
    """
    Transpiles SQL from source dialect to target dialect.

    Args:
        sql (str): SQL code string to transpile
        read (str): Source dialect (e.g., "spark", "hive", "presto", "mysql")
        write (str): Target dialect (e.g., "postgres", "bigquery", "snowflake")
        identity (bool): Use source dialect as target if write not specified
        error_level (ErrorLevel): Desired error handling level
        **opts: Additional generator options for output formatting

    Returns:
        List[str]: List of transpiled SQL statements
    """
```

### SQL Tokenization

Break SQL strings into lexical tokens for low-level analysis and custom processing.

```python { .api }
def tokenize(sql: str, read: str = None, dialect: str = None) -> List[Token]:
    """
    Tokenizes SQL string into list of lexical tokens.

    Args:
        sql (str): SQL code string to tokenize
        read (str): SQL dialect for tokenization
        dialect (str): SQL dialect (alias for read)

    Returns:
        List[Token]: List of tokens representing the SQL input
    """
```

### Utility Functions

Additional parsing utilities for expression handling and analysis.

```python { .api }
def maybe_parse(sql: str | Expression, **opts) -> Expression:
    """
    Parses SQL string or returns Expression if already parsed.

    Args:
        sql: SQL string or Expression object
        **opts: Parse options if parsing needed

    Returns:
        Expression: Parsed or existing expression
    """

def diff(source: Expression, target: Expression, **opts) -> List[Edit]:
    """
    Compares two SQL expressions and returns the edit script between them.

    Args:
        source (Expression): Source expression to compare
        target (Expression): Target expression to compare against
        **opts: Additional diff options

    Returns:
        List[Edit]: Edit operations (Insert, Remove, Move, Update, Keep)
            that transform source into target
    """
```

## Usage Examples

### Basic Parsing

```python
import sqlglot

# Parse a simple SELECT statement
sql = "SELECT name, age FROM users WHERE age > 25"
expression = sqlglot.parse_one(sql)

# Parse with specific dialect
spark_sql = "SELECT explode(array_col) FROM table"
expression = sqlglot.parse_one(spark_sql, dialect="spark")

# Parse multiple statements
multi_sql = "SELECT 1; SELECT 2; SELECT 3;"
expressions = sqlglot.parse(multi_sql)
```

### Dialect Transpilation

```python
import sqlglot

# Convert Spark SQL to PostgreSQL
spark_query = "SELECT DATE_ADD(current_date(), 7) as future_date"
postgres_query = sqlglot.transpile(spark_query, read="spark", write="postgres")[0]
# Result: CURRENT_DATE plus a 7-day interval, aliased as future_date
# (the exact generated text depends on the SQLGlot version)

# Convert BigQuery to Snowflake
bq_query = "SELECT EXTRACT(YEAR FROM date_col) FROM table"
sf_query = sqlglot.transpile(bq_query, read="bigquery", write="snowflake")[0]

# Format SQL with pretty printing
formatted = sqlglot.transpile(
    "SELECT a,b,c FROM table WHERE x=1 AND y=2",
    pretty=True
)[0]
```

### Working with Tokens

```python
import sqlglot

sql = "SELECT * FROM users"
tokens = sqlglot.tokenize(sql)

for token in tokens:
    print(f"{token.token_type}: {token.text}")
# Output:
# TokenType.SELECT: SELECT
# TokenType.STAR: *
# TokenType.FROM: FROM
# TokenType.VAR: users
```

### Error Handling

```python
import sqlglot
from sqlglot import ParseError, ErrorLevel

# Handle parsing errors
try:
    # Invalid SQL: the subquery's parenthesis is never closed
    expression = sqlglot.parse_one("SELECT foo FROM (SELECT baz FROM t")
except ParseError as e:
    print(f"Parse error: {e}")

# Control error level
expressions = sqlglot.parse(
    "SELECT 1; SELECT (2; SELECT 3",  # The second statement is invalid
    error_level=ErrorLevel.WARN  # Log errors but continue
)
```

## Types

```python { .api }
class Token:
    """Represents a lexical token from SQL tokenization."""
    token_type: TokenType
    text: str
    line: int
    col: int

    def __init__(self, token_type: TokenType, text: str, line: int = 1, col: int = 1): ...

class TokenType:
    """Enumeration of all possible token types in SQL."""
    # Keywords
    SELECT: str
    FROM: str
    WHERE: str
    # Operators
    PLUS: str
    MINUS: str
    STAR: str
    # Literals
    STRING: str
    NUMBER: str
    # ... and many more

class ErrorLevel:
    """Error handling levels for parsing operations."""
    IGNORE: str     # Ignore all errors
    WARN: str       # Log errors but continue
    RAISE: str      # Collect errors and raise single exception
    IMMEDIATE: str  # Raise exception on first error
```