0
# SQL Interface
1
2
SQL query interface allowing standard SQL operations on DataFrames and integration with existing SQL workflows. Polars provides a comprehensive SQL engine that supports most standard SQL features while maintaining the performance characteristics of the native API.
3
4
## Capabilities
5
6
### SQLContext Class
7
8
Context manager for executing SQL queries against registered DataFrames and LazyFrames.
9
10
```python { .api }
11
class SQLContext:
12
def __init__(self, frames: dict[str, DataFrame | LazyFrame] | None = None, **named_frames: DataFrame | LazyFrame):
13
"""
14
Create SQL execution context.
15
16
Parameters:
17
- frames: Dictionary mapping table names to DataFrames/LazyFrames
18
- named_frames: Named DataFrames/LazyFrames as keyword arguments
19
20
Examples:
21
ctx = SQLContext({"users": df1, "orders": df2})
22
ctx = SQLContext(users=df1, orders=df2)
23
"""
24
25
def execute(self, query: str, *, eager: bool = True) -> DataFrame | LazyFrame:
26
"""
27
Execute SQL query.
28
29
Parameters:
30
- query: SQL query string
31
- eager: Return DataFrame (True) or LazyFrame (False)
32
33
Returns:
34
Query result as DataFrame or LazyFrame
35
36
Examples:
37
result = ctx.execute("SELECT * FROM users WHERE age > 25")
38
lazy_result = ctx.execute("SELECT * FROM users", eager=False)
39
"""
40
41
def register(self, name: str, frame: DataFrame | LazyFrame) -> SQLContext:
42
"""
43
Register DataFrame/LazyFrame as table.
44
45
Parameters:
46
- name: Table name for SQL queries
47
- frame: DataFrame or LazyFrame to register
48
49
Returns:
50
Self for method chaining
51
52
Examples:
53
ctx.register("customers", customer_df)
54
"""
55
56
def unregister(self, name: str) -> SQLContext:
57
"""
58
Remove registered table.
59
60
Parameters:
61
- name: Table name to remove
62
63
Returns:
64
Self for method chaining
65
66
Examples:
67
ctx.unregister("temp_table")
68
"""
69
70
def tables(self) -> list[str]:
71
"""
72
Get list of registered table names.
73
74
Returns:
75
List of table names
76
"""
77
```
78
79
### SQL Function
80
81
Standalone function for executing SQL queries with inline table registration.
82
83
```python { .api }
84
def sql(query: str, *, eager: bool = True, **named_frames: DataFrame | LazyFrame) -> DataFrame | LazyFrame:
85
"""
86
Execute SQL query with inline table registration.
87
88
Parameters:
89
- query: SQL query string
90
- eager: Return DataFrame (True) or LazyFrame (False)
91
- named_frames: DataFrames/LazyFrames to use as tables
92
93
Returns:
94
Query result as DataFrame or LazyFrame
95
96
Examples:
97
result = pl.sql("SELECT * FROM users WHERE age > 25", users=df)
98
result = pl.sql("SELECT u.name, o.amount FROM users u JOIN orders o ON u.id = o.user_id", users=users_df, orders=orders_df)
99
"""
100
101
def sql_expr(sql: str) -> Expr:
102
"""
103
Create expression from SQL fragment.
104
105
Parameters:
106
- sql: SQL expression string
107
108
Returns:
109
Expression object
110
111
Examples:
112
expr = pl.sql_expr("CASE WHEN age > 18 THEN 'Adult' ELSE 'Minor' END")
113
df.with_columns(expr.alias("category"))
114
"""
115
```
116
117
## Supported SQL Features
118
119
### SELECT Statements
120
121
Standard SELECT syntax with column selection, aliases, and expressions.
122
123
```sql
124
-- Basic selection
125
SELECT name, age FROM users;
126
127
-- Column aliases
128
SELECT name AS full_name, age * 12 AS age_months FROM users;
129
130
-- Expressions and functions
131
SELECT
132
name,
133
UPPER(name) AS name_upper,
134
age + 1 AS next_year_age,
135
CASE WHEN age >= 18 THEN 'Adult' ELSE 'Minor' END AS category
136
FROM users;
137
138
-- All columns
139
SELECT * FROM users;
140
141
-- Distinct values
142
SELECT DISTINCT department FROM employees;
143
```
144
145
### WHERE Clauses
146
147
Filtering with various conditions and operators.
148
149
```sql
150
-- Simple conditions
151
SELECT * FROM users WHERE age > 25;
152
SELECT * FROM users WHERE name = 'John';
153
SELECT * FROM users WHERE active = true;
154
155
-- Multiple conditions
156
SELECT * FROM users WHERE age > 18 AND age < 65;
157
SELECT * FROM users WHERE department = 'Sales' OR department = 'Marketing';
158
159
-- IN and NOT IN
160
SELECT * FROM users WHERE department IN ('Sales', 'Marketing', 'Support');
161
SELECT * FROM users WHERE age NOT IN (25, 30, 35);
162
163
-- BETWEEN
164
SELECT * FROM sales WHERE amount BETWEEN 1000 AND 5000;
165
166
-- LIKE patterns
167
SELECT * FROM users WHERE name LIKE 'John%';
168
SELECT * FROM users WHERE email LIKE '%@company.com';
169
170
-- NULL handling
171
SELECT * FROM users WHERE phone IS NOT NULL;
172
SELECT * FROM users WHERE notes IS NULL;
173
```
174
175
### JOINs
176
177
Various types of joins between tables.
178
179
```sql
180
-- INNER JOIN
181
SELECT u.name, o.amount
182
FROM users u
183
INNER JOIN orders o ON u.id = o.user_id;
184
185
-- LEFT JOIN
186
SELECT u.name, o.amount
187
FROM users u
188
LEFT JOIN orders o ON u.id = o.user_id;
189
190
-- RIGHT JOIN
191
SELECT u.name, o.amount
192
FROM users u
193
RIGHT JOIN orders o ON u.id = o.user_id;
194
195
-- FULL OUTER JOIN
196
SELECT u.name, o.amount
197
FROM users u
198
FULL OUTER JOIN orders o ON u.id = o.user_id;
199
200
-- Multiple joins
201
SELECT u.name, o.amount, p.name AS product_name
202
FROM users u
203
JOIN orders o ON u.id = o.user_id
204
JOIN products p ON o.product_id = p.id;
205
206
-- Self join
207
SELECT u1.name AS user1, u2.name AS user2
208
FROM users u1
209
JOIN users u2 ON u1.manager_id = u2.id;
210
```
211
212
### GROUP BY and Aggregations
213
214
Grouping data with aggregate functions.
215
216
```sql
217
-- Basic grouping
218
SELECT department, COUNT(*) AS employee_count
219
FROM employees
220
GROUP BY department;
221
222
-- Multiple aggregations
223
SELECT
224
department,
225
COUNT(*) AS count,
226
AVG(salary) AS avg_salary,
227
MAX(salary) AS max_salary,
228
MIN(salary) AS min_salary,
229
SUM(salary) AS total_salary
230
FROM employees
231
GROUP BY department;
232
233
-- Multiple grouping columns
234
SELECT department, location, COUNT(*) AS count
235
FROM employees
236
GROUP BY department, location;
237
238
-- HAVING clause
239
SELECT department, AVG(salary) AS avg_salary
240
FROM employees
241
GROUP BY department
242
HAVING AVG(salary) > 50000;
243
```
244
245
### ORDER BY
246
247
Sorting query results.
248
249
```sql
250
-- Single column ascending
251
SELECT * FROM users ORDER BY age;
252
253
-- Single column descending
254
SELECT * FROM users ORDER BY age DESC;
255
256
-- Multiple columns
257
SELECT * FROM users ORDER BY department, age DESC;
258
259
-- Order by expression
260
SELECT name, age, salary FROM employees ORDER BY salary / age DESC;
261
262
-- Order by column position
263
SELECT name, age FROM users ORDER BY 2 DESC;
264
```
265
266
### LIMIT and OFFSET
267
268
Limiting and paginating results.
269
270
```sql
271
-- Limit results
272
SELECT * FROM users LIMIT 10;
273
274
-- Offset and limit (pagination)
275
SELECT * FROM users ORDER BY id LIMIT 10 OFFSET 20;
276
277
-- Top N results
278
SELECT * FROM sales ORDER BY amount DESC LIMIT 5;
279
```
280
281
### Subqueries
282
283
Nested queries for complex operations.
284
285
```sql
286
-- Subquery in WHERE
287
SELECT * FROM users
288
WHERE department IN (
289
SELECT department FROM departments WHERE budget > 100000
290
);
291
292
-- Correlated subquery
293
SELECT u.name, u.salary
294
FROM users u
295
WHERE u.salary > (
296
SELECT AVG(salary) FROM users u2 WHERE u2.department = u.department
297
);
298
299
-- Subquery in SELECT
300
SELECT
301
name,
302
salary,
303
(SELECT AVG(salary) FROM users) AS company_avg
304
FROM users;
305
306
-- EXISTS
307
SELECT * FROM users u
308
WHERE EXISTS (
309
SELECT 1 FROM orders o WHERE o.user_id = u.id
310
);
311
```
312
313
### Window Functions
314
315
Advanced analytical functions with window specifications.
316
317
```sql
318
-- Row number
319
SELECT
320
name,
321
salary,
322
ROW_NUMBER() OVER (ORDER BY salary DESC) AS salary_rank
323
FROM employees;
324
325
-- Partition by
326
SELECT
327
name,
328
department,
329
salary,
330
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS dept_rank
331
FROM employees;
332
333
-- Aggregate window functions
334
SELECT
335
name,
336
salary,
337
SUM(salary) OVER (ORDER BY salary) AS running_total,
338
AVG(salary) OVER (PARTITION BY department) AS dept_avg
339
FROM employees;
340
341
-- LAG and LEAD
342
SELECT
343
date,
344
sales,
345
LAG(sales, 1) OVER (ORDER BY date) AS prev_sales,
346
LEAD(sales, 1) OVER (ORDER BY date) AS next_sales
347
FROM daily_sales;
348
```
349
350
### Common Table Expressions (CTEs)
351
352
WITH clauses for complex queries.
353
354
```sql
355
-- Simple CTE
356
WITH high_earners AS (
357
SELECT * FROM employees WHERE salary > 80000
358
)
359
SELECT department, COUNT(*)
360
FROM high_earners
361
GROUP BY department;
362
363
-- Multiple CTEs
364
WITH
365
sales_summary AS (
366
SELECT user_id, SUM(amount) AS total_sales
367
FROM orders
368
GROUP BY user_id
369
),
370
user_categories AS (
371
SELECT
372
user_id,
373
CASE
374
WHEN total_sales > 10000 THEN 'VIP'
375
WHEN total_sales > 5000 THEN 'Premium'
376
ELSE 'Standard'
377
END AS category
378
FROM sales_summary
379
)
380
SELECT category, COUNT(*) AS user_count
381
FROM user_categories
382
GROUP BY category;
383
384
-- Recursive CTE (if supported)
385
WITH RECURSIVE subordinates AS (
386
SELECT id, name, manager_id, 1 as level
387
FROM employees
388
WHERE manager_id IS NULL
389
390
UNION ALL
391
392
SELECT e.id, e.name, e.manager_id, s.level + 1
393
FROM employees e
394
JOIN subordinates s ON e.manager_id = s.id
395
)
396
SELECT * FROM subordinates;
397
```
398
399
### Data Types and Functions
400
401
SQL functions and type operations.
402
403
```sql
404
-- String functions
405
SELECT
406
UPPER(name) AS name_upper,
407
LOWER(name) AS name_lower,
408
LENGTH(name) AS name_length,
409
SUBSTRING(name, 1, 3) AS name_prefix,
410
CONCAT(first_name, ' ', last_name) AS full_name
411
FROM users;
412
413
-- Numeric functions
414
SELECT
415
ROUND(salary, -3) AS salary_rounded,
416
ABS(balance) AS balance_abs,
417
CEIL(rating) AS rating_ceil,
418
FLOOR(rating) AS rating_floor
419
FROM accounts;
420
421
-- Date functions
422
SELECT
423
EXTRACT(YEAR FROM created_at) AS year,
424
EXTRACT(MONTH FROM created_at) AS month,
425
DATE_TRUNC('month', created_at) AS month_start,
426
created_at + INTERVAL '30 days' AS future_date
427
FROM orders;
428
429
-- Conditional functions
430
SELECT
431
name,
432
CASE
433
WHEN age < 18 THEN 'Minor'
434
WHEN age < 65 THEN 'Adult'
435
ELSE 'Senior'
436
END AS age_category,
437
COALESCE(phone, email, 'No contact') AS contact_info,
438
NULLIF(status, 'inactive') AS active_status
439
FROM users;
440
```
441
442
## Usage Examples
443
444
### Basic SQL Operations
445
446
```python
447
import polars as pl
448
449
# Create sample data
450
users = pl.DataFrame({
451
"id": [1, 2, 3, 4, 5],
452
"name": ["Alice", "Bob", "Charlie", "Diana", "Eve"],
453
"age": [25, 30, 35, 28, 32],
454
"department": ["Sales", "Engineering", "Sales", "Marketing", "Engineering"]
455
})
456
457
orders = pl.DataFrame({
458
"id": [101, 102, 103, 104, 105],
459
"user_id": [1, 2, 1, 3, 2],
460
"amount": [100, 250, 150, 300, 200],
461
"date": ["2023-01-15", "2023-01-16", "2023-01-17", "2023-01-18", "2023-01-19"]
462
})
463
464
# Simple SQL query
465
result = pl.sql("""
466
SELECT name, age, department
467
FROM users
468
WHERE age > 30
469
ORDER BY age DESC
470
""", users=users)
471
472
print(result)
473
```
474
475
### Using SQLContext
476
477
```python
478
# Create SQL context with multiple tables
479
ctx = pl.SQLContext({
480
"users": users,
481
"orders": orders
482
})
483
484
# Complex query with joins
485
result = ctx.execute("""
486
SELECT
487
u.name,
488
u.department,
489
COUNT(o.id) as order_count,
490
SUM(o.amount) as total_spent,
491
AVG(o.amount) as avg_order
492
FROM users u
493
LEFT JOIN orders o ON u.id = o.user_id
494
GROUP BY u.id, u.name, u.department
495
ORDER BY total_spent DESC
496
""")
497
498
# Register additional tables
499
products = pl.DataFrame({
500
"id": [1, 2, 3],
501
"name": ["Widget", "Gadget", "Tool"],
502
"category": ["A", "B", "A"]
503
})
504
505
ctx.register("products", products)
506
```
507
508
### Advanced SQL Features
509
510
```python
511
# Window functions
512
result = ctx.execute("""
513
SELECT
514
name,
515
department,
516
age,
517
ROW_NUMBER() OVER (PARTITION BY department ORDER BY age) as dept_rank,
518
AVG(age) OVER (PARTITION BY department) as dept_avg_age,
519
age - AVG(age) OVER (PARTITION BY department) as age_diff
520
FROM users
521
""")
522
523
# CTE with aggregations
524
result = ctx.execute("""
525
WITH department_stats AS (
526
SELECT
527
department,
528
COUNT(*) as emp_count,
529
AVG(age) as avg_age,
530
MAX(age) as max_age
531
FROM users
532
GROUP BY department
533
),
534
order_stats AS (
535
SELECT
536
u.department,
537
SUM(o.amount) as dept_sales
538
FROM users u
539
JOIN orders o ON u.id = o.user_id
540
GROUP BY u.department
541
)
542
SELECT
543
ds.department,
544
ds.emp_count,
545
ds.avg_age,
546
COALESCE(os.dept_sales, 0) as total_sales,
547
COALESCE(os.dept_sales, 0) / ds.emp_count as sales_per_employee
548
FROM department_stats ds
549
LEFT JOIN order_stats os ON ds.department = os.department
550
ORDER BY sales_per_employee DESC
551
""")
552
```
553
554
### Mixing SQL with Polars API
555
556
```python
557
# Start with SQL, continue with Polars API
558
lazy_result = pl.sql("""
559
SELECT u.name, u.department, o.amount, o.date
560
FROM users u
561
JOIN orders o ON u.id = o.user_id
562
WHERE u.age > 25
563
""", users=users, orders=orders, eager=False)
564
565
# Continue with Polars operations
566
final_result = (lazy_result
567
.with_columns([
568
pl.col("date").str.strptime(pl.Date, "%Y-%m-%d"),
569
pl.col("amount").cast(pl.Float64)
570
])
571
.filter(pl.col("amount") > 150)
572
.group_by("department")
573
.agg([
574
pl.col("amount").sum().alias("total_sales"),
575
pl.col("name").n_unique().alias("unique_customers")
576
])
577
.collect()
578
)
579
```
580
581
### SQL Expression Integration
582
583
```python
584
# Use SQL expressions in Polars operations
585
df = users.with_columns([
586
pl.sql_expr("CASE WHEN age >= 30 THEN 'Senior' ELSE 'Junior' END").alias("seniority"),
587
pl.sql_expr("UPPER(name)").alias("name_upper"),
588
pl.sql_expr("age * 12").alias("age_months")
589
])
590
591
# Complex SQL expressions
592
df = users.with_columns([
593
pl.sql_expr("""
594
CASE
595
WHEN department = 'Engineering' AND age > 30 THEN 'Senior Engineer'
596
WHEN department = 'Engineering' THEN 'Engineer'
597
WHEN department = 'Sales' AND age > 28 THEN 'Senior Sales'
598
WHEN department = 'Sales' THEN 'Sales Rep'
599
ELSE 'Other'
600
END
601
""").alias("role")
602
])
603
```
604
605
### Performance Optimization
606
607
```python
608
# Lazy execution with SQL
609
lazy_query = pl.sql("""
610
SELECT
611
department,
612
COUNT(*) as count,
613
AVG(age) as avg_age
614
FROM users
615
WHERE age > 25
616
GROUP BY department
617
""", users=users.lazy(), eager=False)
618
619
# Check execution plan
620
print(lazy_query.explain())
621
622
# Collect with optimizations
623
result = lazy_query.collect(
624
predicate_pushdown=True,
625
projection_pushdown=True
626
)
627
```
628
629
### Error Handling
630
631
```python
632
try:
633
result = ctx.execute("SELECT * FROM nonexistent_table")
634
except pl.SQLSyntaxError as e:
635
print(f"SQL syntax error: {e}")
636
except pl.SQLInterfaceError as e:
637
print(f"SQL interface error: {e}")
638
except Exception as e:
639
print(f"Other error: {e}")
640
```
641
642
## SQL Limitations and Differences
643
644
### Polars-Specific Features
645
646
- Column expressions can use Polars syntax within SQL
647
- Lazy evaluation integration with `.collect()` optimization
648
- Native data type support including nested types (List, Struct)
649
650
### Standard SQL Differences
651
652
- Some advanced SQL features may not be supported
653
- Window function syntax follows standard SQL but may have limitations
654
- Date/time handling uses Polars temporal types and functions
655
- Case sensitivity may differ from traditional SQL databases
656
657
### Performance Considerations
658
659
- SQL queries are translated to Polars expressions for execution
660
- Lazy evaluation provides query optimization opportunities
661
- Large datasets benefit from lazy SQL execution with streaming
662
- Join performance may differ from specialized SQL databases