or run

npx @tessl/cli init
Log in

Version

Tile

Overview

Evals

Files

Files

docs

core-parsing.mddialects.mdexecution.mdexpression-building.mdindex.mdoptimization.mdschema.mdutilities.md

utilities.mddocs/

0

# Utility Functions

1

2

Additional tools for SQL analysis, comparison, and manipulation including AST diffing, column lineage analysis, time parsing utilities, and error handling for comprehensive SQL processing workflows.

3

4

## Capabilities

5

6

### AST Diffing and Comparison

7

8

Compare SQL expressions and identify differences between query structures.

9

10

```python { .api }

11

def diff(

12

source: Expression,

13

target: Expression,

14

**opts

15

) -> str:

16

"""

17

Compare two SQL expressions and generate diff output.

18

19

Args:

20

source (Expression): Source expression for comparison

21

target (Expression): Target expression for comparison

22

**opts: Diff formatting options

23

24

Returns:

25

str: Human-readable diff showing changes between expressions

26

"""

27

```

28

29

### Column Lineage Analysis

30

31

Trace column dependencies and data flow through complex SQL queries.

32

33

```python { .api }

34

def lineage(

35

sql: str,

36

schema: Optional[Schema] = None,

37

**opts

38

) -> Dict:

39

"""

40

Analyze column lineage and dependencies in SQL query.

41

42

Args:

43

sql (str): SQL query to analyze

44

schema (Schema): Schema for column resolution

45

**opts: Lineage analysis options

46

47

Returns:

48

Dict: Column lineage mapping showing data flow

49

"""

50

```

51

52

### Error Handling Classes

53

54

Comprehensive error handling for SQL processing operations.

55

56

```python { .api }

57

class SqlglotError(Exception):

58

"""Base exception class for all SQLGlot errors."""

59

60

class ParseError(SqlglotError):

61

"""Exception raised when SQL parsing fails."""

62

63

def __init__(

64

self,

65

message: str,

66

errors: Optional[List[Dict[str, Any]]] = None

67

):

68

"""

69

Initialize parse error with message and detailed error information.

70

71

Args:

72

message (str): Error message

73

errors (List[Dict]): List of detailed error dictionaries

74

"""

75

76

@classmethod

77

def new(

78

cls,

79

message: str,

80

description: Optional[str] = None,

81

line: Optional[int] = None,

82

col: Optional[int] = None,

83

start_context: Optional[str] = None,

84

highlight: Optional[str] = None,

85

end_context: Optional[str] = None,

86

into_expression: Optional[str] = None

87

) -> ParseError:

88

"""Create new ParseError with detailed location information."""

89

90

class TokenError(SqlglotError):

91

"""Exception raised when tokenization fails."""

92

93

class UnsupportedError(SqlglotError):

94

"""Exception raised when encountering unsupported SQL features."""

95

96

class OptimizeError(SqlglotError):

97

"""Exception raised during query optimization."""

98

99

class SchemaError(SqlglotError):

100

"""Exception raised for schema-related issues."""

101

102

class ExecuteError(SqlglotError):

103

"""Exception raised during SQL execution."""

104

105

class ErrorLevel:

106

"""Error handling levels for parser and validator."""

107

108

IGNORE: str # Ignore all errors

109

WARN: str # Log errors but continue processing

110

RAISE: str # Collect all errors and raise single exception

111

IMMEDIATE: str # Raise exception immediately on first error

112

```

113

114

### Helper Utilities

115

116

General utility functions and classes for SQL processing.

117

118

```python { .api }

119

class AutoName:

120

"""Base class for auto-naming enumerations."""

121

122

def dict_depth(mapping: Dict) -> int:

123

"""

124

Calculate maximum nesting depth of dictionary.

125

126

Args:

127

mapping (Dict): Dictionary to analyze

128

129

Returns:

130

int: Maximum nesting depth

131

"""

132

133

def first(iterable) -> Any:

134

"""

135

Return first element from iterable.

136

137

Args:

138

iterable: Iterable to get first element from

139

140

Returns:

141

Any: First element or None if empty

142

"""

143

144

def concat_messages(errors: List[Any], maximum: int) -> str:

145

"""

146

Concatenate error messages with truncation.

147

148

Args:

149

errors (List): List of error objects

150

maximum (int): Maximum number of errors to include

151

152

Returns:

153

str: Concatenated error message string

154

"""

155

156

def merge_errors(errors: List[ParseError]) -> List[Dict[str, Any]]:

157

"""

158

Merge multiple ParseError instances into unified error list.

159

160

Args:

161

errors (List[ParseError]): Parse errors to merge

162

163

Returns:

164

List[Dict]: Merged error information dictionaries

165

"""

166

```

167

168

### Time and Date Utilities

169

170

Specialized utilities for handling SQL time and date operations.

171

172

```python { .api }

173

# Time parsing and formatting utilities for SQL time literals

174

# Located in sqlglot.time module - provides functions for:

175

# - Parsing time strings in various formats

176

# - Converting between time representations

177

# - Formatting time values for different SQL dialects

178

```

179

180

### Transformation Utilities

181

182

Various SQL transformation and manipulation utilities.

183

184

```python { .api }

185

# SQL transformation utilities for advanced query manipulation

186

# Located in sqlglot.transforms module - provides functions for:

187

# - Custom expression transformations

188

# - Dialect-specific syntax conversions

189

# - Advanced AST manipulation patterns

190

```

191

192

### JSON Path Support

193

194

JSON path expression support for SQL queries with JSON data.

195

196

```python { .api }

197

# JSON path expression utilities for SQL JSON operations

198

# Located in sqlglot.jsonpath module - provides:

199

# - JSON path parsing and evaluation

200

# - Integration with SQL JSON functions

201

# - Cross-dialect JSON path support

202

```

203

204

### Data Structure Utilities

205

206

Trie and other data structure utilities for efficient processing.

207

208

```python { .api }

209

def new_trie() -> Dict:

210

"""

211

Create new trie data structure.

212

213

Returns:

214

Dict: Empty trie structure

215

"""

216

217

def in_trie(trie: Dict, key: str) -> TrieResult:

218

"""

219

Check if key exists in trie structure.

220

221

Args:

222

trie (Dict): Trie to search

223

key (str): Key to search for

224

225

Returns:

226

TrieResult: Result indicating presence and type of match

227

"""

228

229

class TrieResult:

230

"""Result of trie lookup operation."""

231

232

def __init__(self, exists: bool, prefix: bool = False):

233

"""

234

Initialize trie result.

235

236

Args:

237

exists (bool): Whether key exists in trie

238

prefix (bool): Whether key is a prefix of existing keys

239

"""

240

```

241

242

### Serialization Utilities

243

244

Serialization and deserialization utilities for expressions.

245

246

```python { .api }

247

# Serialization utilities for SQLGlot expressions

248

# Located in sqlglot.serde module - provides:

249

# - Expression serialization to JSON/dict format

250

# - Deserialization from stored representations

251

# - Cross-process expression transfer support

252

```

253

254

## Usage Examples

255

256

### Expression Diffing

257

258

```python

259

import sqlglot

260

from sqlglot.diff import diff

261

262

# Compare two similar queries

263

query1 = sqlglot.parse_one("SELECT name, age FROM users WHERE age > 25")

264

query2 = sqlglot.parse_one("SELECT name, age, email FROM users WHERE age >= 25")

265

266

# Generate diff

267

diff_output = diff(query1, query2)

268

print(diff_output)

269

270

# Compare more complex queries

271

original = sqlglot.parse_one("""

272

SELECT u.name, COUNT(o.id) as order_count

273

FROM users u

274

LEFT JOIN orders o ON u.id = o.user_id

275

WHERE u.active = 1

276

GROUP BY u.name

277

""")

278

279

modified = sqlglot.parse_one("""

280

SELECT u.name, u.email, COUNT(o.id) as total_orders

281

FROM users u

282

LEFT JOIN orders o ON u.id = o.user_id

283

WHERE u.active = 1 AND u.verified = 1

284

GROUP BY u.name, u.email

285

""")

286

287

print(diff(original, modified))

288

```

289

290

### Column Lineage Analysis

291

292

```python

293

import sqlglot

294

from sqlglot.lineage import lineage

295

from sqlglot.schema import MappingSchema

296

297

# Define schema for lineage analysis

298

schema = MappingSchema({

299

"raw_data": {

300

"user_id": "INT",

301

"first_name": "VARCHAR",

302

"last_name": "VARCHAR",

303

"email": "VARCHAR",

304

"signup_date": "DATE"

305

},

306

"orders": {

307

"id": "INT",

308

"user_id": "INT",

309

"amount": "DECIMAL",

310

"order_date": "DATE"

311

}

312

})

313

314

# Analyze complex query lineage

315

sql = """

316

SELECT

317

CONCAT(u.first_name, ' ', u.last_name) as full_name,

318

u.email,

319

COUNT(o.id) as total_orders,

320

SUM(o.amount) as total_spent,

321

AVG(o.amount) as avg_order_value,

322

MAX(o.order_date) as last_order_date

323

FROM raw_data u

324

LEFT JOIN orders o ON u.user_id = o.user_id

325

WHERE u.signup_date >= '2023-01-01'

326

GROUP BY u.user_id, u.first_name, u.last_name, u.email

327

"""

328

329

# Get lineage information

330

lineage_info = lineage(sql, schema=schema)

331

print("Column lineage:")

332

for output_col, source_cols in lineage_info.items():

333

print(f"{output_col} <- {source_cols}")

334

```

335

336

### Error Handling and Validation

337

338

```python

339

import sqlglot

340

from sqlglot import ParseError, UnsupportedError, ErrorLevel

341

342

# Handle parsing errors gracefully

343

def safe_parse(sql_queries):

344

results = []

345

errors = []

346

347

for sql in sql_queries:

348

try:

349

parsed = sqlglot.parse_one(sql)

350

results.append(parsed)

351

except ParseError as e:

352

errors.append(f"Parse error in '{sql}': {e}")

353

except UnsupportedError as e:

354

errors.append(f"Unsupported feature in '{sql}': {e}")

355

356

return results, errors

357

358

# Test with mixed valid/invalid SQL

359

test_queries = [

360

"SELECT * FROM users",

361

"SELECT FROM", # Invalid

362

"SELECT name FROM users WHERE age > 25",

363

"INVALID SQL SYNTAX", # Invalid

364

]

365

366

parsed_queries, parse_errors = safe_parse(test_queries)

367

print(f"Successfully parsed: {len(parsed_queries)} queries")

368

print(f"Errors: {len(parse_errors)}")

369

for error in parse_errors:

370

print(f" {error}")

371

372

# Use different error levels

373

try:

374

# Parse with warning level - continues on errors

375

expressions = sqlglot.parse(

376

"SELECT 1; INVALID; SELECT 2;",

377

error_level=ErrorLevel.WARN

378

)

379

print(f"Parsed {len([e for e in expressions if e])} valid expressions")

380

except Exception as e:

381

print(f"Error: {e}")

382

```

383

384

### Working with Helper Utilities

385

386

```python

387

import sqlglot

388

from sqlglot.helper import dict_depth, first

389

390

# Calculate nested dictionary depth

391

schema_dict = {

392

"database1": {

393

"schema1": {

394

"table1": {"col1": "INT", "col2": "VARCHAR"}

395

}

396

},

397

"database2": {

398

"schema1": {

399

"table1": {"col1": "INT"},

400

"table2": {"col1": "INT", "col2": "VARCHAR", "col3": "DATE"}

401

}

402

}

403

}

404

405

depth = dict_depth(schema_dict)

406

print(f"Schema depth: {depth}")

407

408

# Get first valid expression from parse results

409

sql_statements = "INVALID; SELECT 1; SELECT 2;"

410

expressions = sqlglot.parse(sql_statements, error_level=ErrorLevel.WARN)

411

first_valid = first(expr for expr in expressions if expr is not None)

412

if first_valid:

413

print(f"First valid expression: {first_valid.sql()}")

414

```

415

416

### Trie Data Structure Usage

417

418

```python

419

from sqlglot.trie import new_trie, in_trie

420

421

# Create trie for keyword matching

422

keywords_trie = new_trie()

423

424

# Add keywords (this would typically be done internally)

425

keywords = ["SELECT", "FROM", "WHERE", "GROUP", "ORDER", "HAVING"]

426

# Note: Actual trie population would use internal SQLGlot methods

427

428

# Check for keyword presence

429

for word in ["SELECT", "SELEC", "SELECTED", "FROM", "TABLE"]:

430

result = in_trie(keywords_trie, word.upper())

431

if result.exists:

432

print(f"'{word}' is a complete keyword")

433

elif result.prefix:

434

print(f"'{word}' is a keyword prefix")

435

else:

436

print(f"'{word}' is not a keyword or prefix")

437

```

438

439

### Advanced Error Information

440

441

```python

442

import sqlglot

443

from sqlglot import ParseError

444

445

# Parse with detailed error information

446

invalid_sql = """

447

SELECT name, age

448

FROM users

449

WHERE age > 25 AND

450

status = 'active' AND

451

created_date >= '2023-01-01'

452

missing_clause

453

"""

454

455

try:

456

parsed = sqlglot.parse_one(invalid_sql)

457

except ParseError as e:

458

print(f"Parse error: {e}")

459

460

# Access detailed error information

461

for error_detail in e.errors:

462

print(f" Line: {error_detail.get('line')}")

463

print(f" Column: {error_detail.get('col')}")

464

print(f" Description: {error_detail.get('description')}")

465

print(f" Context: {error_detail.get('start_context')}")

466

print(f" Highlight: {error_detail.get('highlight')}")

467

```

468

469

## Types

470

471

```python { .api }

472

class SqlglotError(Exception):

473

"""Base exception for all SQLGlot errors."""

474

475

class ParseError(SqlglotError):

476

"""SQL parsing error with detailed information."""

477

478

errors: List[Dict[str, Any]] # Detailed error information

479

480

def __init__(self, message: str, errors: Optional[List[Dict]] = None): ...

481

482

class TokenError(SqlglotError):

483

"""Tokenization error."""

484

485

class UnsupportedError(SqlglotError):

486

"""Unsupported feature error."""

487

488

class OptimizeError(SqlglotError):

489

"""Query optimization error."""

490

491

class SchemaError(SqlglotError):

492

"""Schema-related error."""

493

494

class ExecuteError(SqlglotError):

495

"""SQL execution error."""

496

497

class ErrorLevel:

498

"""Error handling level enumeration."""

499

500

IGNORE: str # Ignore all errors

501

WARN: str # Log errors but continue

502

RAISE: str # Collect errors and raise exception

503

IMMEDIATE: str # Raise immediately on first error

504

505

class TrieResult:

506

"""Result of trie lookup operation."""

507

508

exists: bool # Whether key exists completely in trie

509

prefix: bool # Whether key is prefix of existing keys

510

511

def __init__(self, exists: bool, prefix: bool = False): ...

512

513

class AutoName:

514

"""Base class for auto-naming enumerations."""

515

```