or run

npx @tessl/cli init
Log in

Version

Tile

Overview

Evals

Files

Files

docs

config-utilities.mdcore-data-structures.mddata-types.mdexpressions.mdfunctions.mdindex.mdio-operations.mdselectors.mdsql-interface.md

sql-interface.mddocs/

0

# SQL Interface

1

2

SQL query interface allowing standard SQL operations on DataFrames and integration with existing SQL workflows. Polars provides a comprehensive SQL engine that supports most standard SQL features while maintaining the performance characteristics of the native API.

3

4

## Capabilities

5

6

### SQLContext Class

7

8

Context manager for executing SQL queries against registered DataFrames and LazyFrames.

9

10

```python { .api }

11

class SQLContext:

12

def __init__(self, frames: dict[str, DataFrame | LazyFrame] | None = None, **named_frames: DataFrame | LazyFrame):

13

"""

14

Create SQL execution context.

15

16

Parameters:

17

- frames: Dictionary mapping table names to DataFrames/LazyFrames

18

- named_frames: Named DataFrames/LazyFrames as keyword arguments

19

20

Examples:

21

ctx = SQLContext({"users": df1, "orders": df2})

22

ctx = SQLContext(users=df1, orders=df2)

23

"""

24

25

def execute(self, query: str, *, eager: bool = True) -> DataFrame | LazyFrame:

26

"""

27

Execute SQL query.

28

29

Parameters:

30

- query: SQL query string

31

- eager: Return DataFrame (True) or LazyFrame (False)

32

33

Returns:

34

Query result as DataFrame or LazyFrame

35

36

Examples:

37

result = ctx.execute("SELECT * FROM users WHERE age > 25")

38

lazy_result = ctx.execute("SELECT * FROM users", eager=False)

39

"""

40

41

def register(self, name: str, frame: DataFrame | LazyFrame) -> SQLContext:

42

"""

43

Register DataFrame/LazyFrame as table.

44

45

Parameters:

46

- name: Table name for SQL queries

47

- frame: DataFrame or LazyFrame to register

48

49

Returns:

50

Self for method chaining

51

52

Examples:

53

ctx.register("customers", customer_df)

54

"""

55

56

def unregister(self, name: str) -> SQLContext:

57

"""

58

Remove registered table.

59

60

Parameters:

61

- name: Table name to remove

62

63

Returns:

64

Self for method chaining

65

66

Examples:

67

ctx.unregister("temp_table")

68

"""

69

70

def tables(self) -> list[str]:

71

"""

72

Get list of registered table names.

73

74

Returns:

75

List of table names

76

"""

77

```

78

79

### SQL Function

80

81

Standalone function for executing SQL queries with inline table registration.

82

83

```python { .api }

84

def sql(query: str, *, eager: bool = True, **named_frames: DataFrame | LazyFrame) -> DataFrame | LazyFrame:

85

"""

86

Execute SQL query with inline table registration.

87

88

Parameters:

89

- query: SQL query string

90

- eager: Return DataFrame (True) or LazyFrame (False)

91

- named_frames: DataFrames/LazyFrames to use as tables

92

93

Returns:

94

Query result as DataFrame or LazyFrame

95

96

Examples:

97

result = pl.sql("SELECT * FROM users WHERE age > 25", users=df)

98

result = pl.sql("SELECT u.name, o.amount FROM users u JOIN orders o ON u.id = o.user_id", users=users_df, orders=orders_df)

99

"""

100

101

def sql_expr(sql: str) -> Expr:

102

"""

103

Create expression from SQL fragment.

104

105

Parameters:

106

- sql: SQL expression string

107

108

Returns:

109

Expression object

110

111

Examples:

112

expr = pl.sql_expr("CASE WHEN age > 18 THEN 'Adult' ELSE 'Minor' END")

113

df.with_columns(expr.alias("category"))

114

"""

115

```

116

117

## Supported SQL Features

118

119

### SELECT Statements

120

121

Standard SELECT syntax with column selection, aliases, and expressions.

122

123

```sql

124

-- Basic selection

125

SELECT name, age FROM users;

126

127

-- Column aliases

128

SELECT name AS full_name, age * 12 AS age_months FROM users;

129

130

-- Expressions and functions

131

SELECT

132

name,

133

UPPER(name) AS name_upper,

134

age + 1 AS next_year_age,

135

CASE WHEN age >= 18 THEN 'Adult' ELSE 'Minor' END AS category

136

FROM users;

137

138

-- All columns

139

SELECT * FROM users;

140

141

-- Distinct values

142

SELECT DISTINCT department FROM employees;

143

```

144

145

### WHERE Clauses

146

147

Filtering with various conditions and operators.

148

149

```sql

150

-- Simple conditions

151

SELECT * FROM users WHERE age > 25;

152

SELECT * FROM users WHERE name = 'John';

153

SELECT * FROM users WHERE active = true;

154

155

-- Multiple conditions

156

SELECT * FROM users WHERE age > 18 AND age < 65;

157

SELECT * FROM users WHERE department = 'Sales' OR department = 'Marketing';

158

159

-- IN and NOT IN

160

SELECT * FROM users WHERE department IN ('Sales', 'Marketing', 'Support');

161

SELECT * FROM users WHERE age NOT IN (25, 30, 35);

162

163

-- BETWEEN

164

SELECT * FROM sales WHERE amount BETWEEN 1000 AND 5000;

165

166

-- LIKE patterns

167

SELECT * FROM users WHERE name LIKE 'John%';

168

SELECT * FROM users WHERE email LIKE '%@company.com';

169

170

-- NULL handling

171

SELECT * FROM users WHERE phone IS NOT NULL;

172

SELECT * FROM users WHERE notes IS NULL;

173

```

174

175

### JOINs

176

177

Various types of joins between tables.

178

179

```sql

180

-- INNER JOIN

181

SELECT u.name, o.amount

182

FROM users u

183

INNER JOIN orders o ON u.id = o.user_id;

184

185

-- LEFT JOIN

186

SELECT u.name, o.amount

187

FROM users u

188

LEFT JOIN orders o ON u.id = o.user_id;

189

190

-- RIGHT JOIN

191

SELECT u.name, o.amount

192

FROM users u

193

RIGHT JOIN orders o ON u.id = o.user_id;

194

195

-- FULL OUTER JOIN

196

SELECT u.name, o.amount

197

FROM users u

198

FULL OUTER JOIN orders o ON u.id = o.user_id;

199

200

-- Multiple joins

201

SELECT u.name, o.amount, p.name AS product_name

202

FROM users u

203

JOIN orders o ON u.id = o.user_id

204

JOIN products p ON o.product_id = p.id;

205

206

-- Self join

207

SELECT u1.name AS user1, u2.name AS user2

208

FROM users u1

209

JOIN users u2 ON u1.manager_id = u2.id;

210

```

211

212

### GROUP BY and Aggregations

213

214

Grouping data with aggregate functions.

215

216

```sql

217

-- Basic grouping

218

SELECT department, COUNT(*) AS employee_count

219

FROM employees

220

GROUP BY department;

221

222

-- Multiple aggregations

223

SELECT

224

department,

225

COUNT(*) AS count,

226

AVG(salary) AS avg_salary,

227

MAX(salary) AS max_salary,

228

MIN(salary) AS min_salary,

229

SUM(salary) AS total_salary

230

FROM employees

231

GROUP BY department;

232

233

-- Multiple grouping columns

234

SELECT department, location, COUNT(*) AS count

235

FROM employees

236

GROUP BY department, location;

237

238

-- HAVING clause

239

SELECT department, AVG(salary) AS avg_salary

240

FROM employees

241

GROUP BY department

242

HAVING AVG(salary) > 50000;

243

```

244

245

### ORDER BY

246

247

Sorting query results.

248

249

```sql

250

-- Single column ascending

251

SELECT * FROM users ORDER BY age;

252

253

-- Single column descending

254

SELECT * FROM users ORDER BY age DESC;

255

256

-- Multiple columns

257

SELECT * FROM users ORDER BY department, age DESC;

258

259

-- Order by expression

260

SELECT name, age, salary FROM employees ORDER BY salary / age DESC;

261

262

-- Order by column position

263

SELECT name, age FROM users ORDER BY 2 DESC;

264

```

265

266

### LIMIT and OFFSET

267

268

Limiting and paginating results.

269

270

```sql

271

-- Limit results

272

SELECT * FROM users LIMIT 10;

273

274

-- Offset and limit (pagination)

275

SELECT * FROM users ORDER BY id LIMIT 10 OFFSET 20;

276

277

-- Top N results

278

SELECT * FROM sales ORDER BY amount DESC LIMIT 5;

279

```

280

281

### Subqueries

282

283

Nested queries for complex operations.

284

285

```sql

286

-- Subquery in WHERE

287

SELECT * FROM users

288

WHERE department IN (

289

SELECT department FROM departments WHERE budget > 100000

290

);

291

292

-- Correlated subquery

293

SELECT u.name, u.salary

294

FROM users u

295

WHERE u.salary > (

296

SELECT AVG(salary) FROM users u2 WHERE u2.department = u.department

297

);

298

299

-- Subquery in SELECT

300

SELECT

301

name,

302

salary,

303

(SELECT AVG(salary) FROM users) AS company_avg

304

FROM users;

305

306

-- EXISTS

307

SELECT * FROM users u

308

WHERE EXISTS (

309

SELECT 1 FROM orders o WHERE o.user_id = u.id

310

);

311

```

312

313

### Window Functions

314

315

Advanced analytical functions with window specifications.

316

317

```sql

318

-- Row number

319

SELECT

320

name,

321

salary,

322

ROW_NUMBER() OVER (ORDER BY salary DESC) AS salary_rank

323

FROM employees;

324

325

-- Partition by

326

SELECT

327

name,

328

department,

329

salary,

330

ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS dept_rank

331

FROM employees;

332

333

-- Aggregate window functions

334

SELECT

335

name,

336

salary,

337

SUM(salary) OVER (ORDER BY salary) AS running_total,

338

AVG(salary) OVER (PARTITION BY department) AS dept_avg

339

FROM employees;

340

341

-- LAG and LEAD

342

SELECT

343

date,

344

sales,

345

LAG(sales, 1) OVER (ORDER BY date) AS prev_sales,

346

LEAD(sales, 1) OVER (ORDER BY date) AS next_sales

347

FROM daily_sales;

348

```

349

350

### Common Table Expressions (CTEs)

351

352

WITH clauses for complex queries.

353

354

```sql

355

-- Simple CTE

356

WITH high_earners AS (

357

SELECT * FROM employees WHERE salary > 80000

358

)

359

SELECT department, COUNT(*)

360

FROM high_earners

361

GROUP BY department;

362

363

-- Multiple CTEs

364

WITH

365

sales_summary AS (

366

SELECT user_id, SUM(amount) AS total_sales

367

FROM orders

368

GROUP BY user_id

369

),

370

user_categories AS (

371

SELECT

372

user_id,

373

CASE

374

WHEN total_sales > 10000 THEN 'VIP'

375

WHEN total_sales > 5000 THEN 'Premium'

376

ELSE 'Standard'

377

END AS category

378

FROM sales_summary

379

)

380

SELECT category, COUNT(*) AS user_count

381

FROM user_categories

382

GROUP BY category;

383

384

-- Recursive CTE (if supported)

385

WITH RECURSIVE subordinates AS (

386

SELECT id, name, manager_id, 1 as level

387

FROM employees

388

WHERE manager_id IS NULL

389

390

UNION ALL

391

392

SELECT e.id, e.name, e.manager_id, s.level + 1

393

FROM employees e

394

JOIN subordinates s ON e.manager_id = s.id

395

)

396

SELECT * FROM subordinates;

397

```

398

399

### Data Types and Functions

400

401

SQL functions and type operations.

402

403

```sql

404

-- String functions

405

SELECT

406

UPPER(name) AS name_upper,

407

LOWER(name) AS name_lower,

408

LENGTH(name) AS name_length,

409

SUBSTRING(name, 1, 3) AS name_prefix,

410

CONCAT(first_name, ' ', last_name) AS full_name

411

FROM users;

412

413

-- Numeric functions

414

SELECT

415

ROUND(salary, -3) AS salary_rounded,

416

ABS(balance) AS balance_abs,

417

CEIL(rating) AS rating_ceil,

418

FLOOR(rating) AS rating_floor

419

FROM accounts;

420

421

-- Date functions

422

SELECT

423

EXTRACT(YEAR FROM created_at) AS year,

424

EXTRACT(MONTH FROM created_at) AS month,

425

DATE_TRUNC('month', created_at) AS month_start,

426

created_at + INTERVAL '30 days' AS future_date

427

FROM orders;

428

429

-- Conditional functions

430

SELECT

431

name,

432

CASE

433

WHEN age < 18 THEN 'Minor'

434

WHEN age < 65 THEN 'Adult'

435

ELSE 'Senior'

436

END AS age_category,

437

COALESCE(phone, email, 'No contact') AS contact_info,

438

NULLIF(status, 'inactive') AS active_status

439

FROM users;

440

```

441

442

## Usage Examples

443

444

### Basic SQL Operations

445

446

```python

447

import polars as pl

448

449

# Create sample data

450

users = pl.DataFrame({

451

"id": [1, 2, 3, 4, 5],

452

"name": ["Alice", "Bob", "Charlie", "Diana", "Eve"],

453

"age": [25, 30, 35, 28, 32],

454

"department": ["Sales", "Engineering", "Sales", "Marketing", "Engineering"]

455

})

456

457

orders = pl.DataFrame({

458

"id": [101, 102, 103, 104, 105],

459

"user_id": [1, 2, 1, 3, 2],

460

"amount": [100, 250, 150, 300, 200],

461

"date": ["2023-01-15", "2023-01-16", "2023-01-17", "2023-01-18", "2023-01-19"]

462

})

463

464

# Simple SQL query

465

result = pl.sql("""

466

SELECT name, age, department

467

FROM users

468

WHERE age > 30

469

ORDER BY age DESC

470

""", users=users)

471

472

print(result)

473

```

474

475

### Using SQLContext

476

477

```python

478

# Create SQL context with multiple tables

479

ctx = pl.SQLContext({

480

"users": users,

481

"orders": orders

482

})

483

484

# Complex query with joins

485

result = ctx.execute("""

486

SELECT

487

u.name,

488

u.department,

489

COUNT(o.id) as order_count,

490

SUM(o.amount) as total_spent,

491

AVG(o.amount) as avg_order

492

FROM users u

493

LEFT JOIN orders o ON u.id = o.user_id

494

GROUP BY u.id, u.name, u.department

495

ORDER BY total_spent DESC

496

""")

497

498

# Register additional tables

499

products = pl.DataFrame({

500

"id": [1, 2, 3],

501

"name": ["Widget", "Gadget", "Tool"],

502

"category": ["A", "B", "A"]

503

})

504

505

ctx.register("products", products)

506

```

507

508

### Advanced SQL Features

509

510

```python

511

# Window functions

512

result = ctx.execute("""

513

SELECT

514

name,

515

department,

516

age,

517

ROW_NUMBER() OVER (PARTITION BY department ORDER BY age) as dept_rank,

518

AVG(age) OVER (PARTITION BY department) as dept_avg_age,

519

age - AVG(age) OVER (PARTITION BY department) as age_diff

520

FROM users

521

""")

522

523

# CTE with aggregations

524

result = ctx.execute("""

525

WITH department_stats AS (

526

SELECT

527

department,

528

COUNT(*) as emp_count,

529

AVG(age) as avg_age,

530

MAX(age) as max_age

531

FROM users

532

GROUP BY department

533

),

534

order_stats AS (

535

SELECT

536

u.department,

537

SUM(o.amount) as dept_sales

538

FROM users u

539

JOIN orders o ON u.id = o.user_id

540

GROUP BY u.department

541

)

542

SELECT

543

ds.department,

544

ds.emp_count,

545

ds.avg_age,

546

COALESCE(os.dept_sales, 0) as total_sales,

547

COALESCE(os.dept_sales, 0) / ds.emp_count as sales_per_employee

548

FROM department_stats ds

549

LEFT JOIN order_stats os ON ds.department = os.department

550

ORDER BY sales_per_employee DESC

551

""")

552

```

553

554

### Mixing SQL with Polars API

555

556

```python

557

# Start with SQL, continue with Polars API

558

lazy_result = pl.sql("""

559

SELECT u.name, u.department, o.amount, o.date

560

FROM users u

561

JOIN orders o ON u.id = o.user_id

562

WHERE u.age > 25

563

""", users=users, orders=orders, eager=False)

564

565

# Continue with Polars operations

566

final_result = (lazy_result

567

.with_columns([

568

pl.col("date").str.strptime(pl.Date, "%Y-%m-%d"),

569

pl.col("amount").cast(pl.Float64)

570

])

571

.filter(pl.col("amount") > 150)

572

.group_by("department")

573

.agg([

574

pl.col("amount").sum().alias("total_sales"),

575

pl.col("name").n_unique().alias("unique_customers")

576

])

577

.collect()

578

)

579

```

580

581

### SQL Expression Integration

582

583

```python

584

# Use SQL expressions in Polars operations

585

df = users.with_columns([

586

pl.sql_expr("CASE WHEN age >= 30 THEN 'Senior' ELSE 'Junior' END").alias("seniority"),

587

pl.sql_expr("UPPER(name)").alias("name_upper"),

588

pl.sql_expr("age * 12").alias("age_months")

589

])

590

591

# Complex SQL expressions

592

df = users.with_columns([

593

pl.sql_expr("""

594

CASE

595

WHEN department = 'Engineering' AND age > 30 THEN 'Senior Engineer'

596

WHEN department = 'Engineering' THEN 'Engineer'

597

WHEN department = 'Sales' AND age > 28 THEN 'Senior Sales'

598

WHEN department = 'Sales' THEN 'Sales Rep'

599

ELSE 'Other'

600

END

601

""").alias("role")

602

])

603

```

604

605

### Performance Optimization

606

607

```python

608

# Lazy execution with SQL

609

lazy_query = pl.sql("""

610

SELECT

611

department,

612

COUNT(*) as count,

613

AVG(age) as avg_age

614

FROM users

615

WHERE age > 25

616

GROUP BY department

617

""", users=users.lazy(), eager=False)

618

619

# Check execution plan

620

print(lazy_query.explain())

621

622

# Collect with optimizations

623

result = lazy_query.collect(

624

predicate_pushdown=True,

625

projection_pushdown=True

626

)

627

```

628

629

### Error Handling

630

631

```python

632

try:

633

result = ctx.execute("SELECT * FROM nonexistent_table")

634

except pl.SQLSyntaxError as e:

635

print(f"SQL syntax error: {e}")

636

except pl.SQLInterfaceError as e:

637

print(f"SQL interface error: {e}")

638

except Exception as e:

639

print(f"Other error: {e}")

640

```

641

642

## SQL Limitations and Differences

643

644

### Polars-Specific Features

645

646

- Column expressions can use Polars syntax within SQL

647

- Lazy evaluation integration with `.collect()` optimization

648

- Native data type support including nested types (List, Struct)

649

650

### Standard SQL Differences

651

652

- Some advanced SQL features may not be supported

653

- Window function syntax follows standard SQL but may have limitations

654

- Date/time handling uses Polars temporal types and functions

655

- Case sensitivity may differ from traditional SQL databases

656

657

### Performance Considerations

658

659

- SQL queries are translated to Polars expressions for execution

660

- Lazy evaluation provides query optimization opportunities

661

- Large datasets benefit from lazy SQL execution with streaming

662

- Join performance may differ from specialized SQL databases