# SQL Functionality

Native SQL support lets you query DataFrames and LazyFrames using familiar SQL syntax, fully integrated with the Polars ecosystem. Execute SQL queries directly on your data structures or as part of broader data pipelines.

## Capabilities

### SQL Context

Manage and execute SQL queries across multiple named DataFrames and LazyFrames.
```python { .api }
class SQLContext:
    def __init__(self, frames: dict[str, DataFrame | LazyFrame] | None = None):
        """
        Create a SQL context for managing multiple data sources.

        Parameters:
        - frames: Dictionary of {name: DataFrame/LazyFrame} to register
        """

    def register(self, name: str, frame: DataFrame | LazyFrame) -> SQLContext:
        """
        Register a DataFrame or LazyFrame under a name for SQL queries.

        Parameters:
        - name: Table name to use in SQL queries
        - frame: DataFrame or LazyFrame to register

        Returns:
        - SQLContext: Self for method chaining
        """

    def register_globals(self, n: int | None = None) -> SQLContext:
        """
        Register DataFrames/LazyFrames from the calling scope's global namespace.

        Parameters:
        - n: If given, register only the n most recently defined frames

        Returns:
        - SQLContext: Self for method chaining
        """

    def register_many(self, frames: dict[str, DataFrame | LazyFrame]) -> SQLContext:
        """
        Register multiple frames at once.

        Parameters:
        - frames: Dictionary of {name: frame} pairs

        Returns:
        - SQLContext: Self for method chaining
        """

    def unregister(self, name: str) -> SQLContext:
        """
        Unregister a table from the context.

        Parameters:
        - name: Table name to unregister

        Returns:
        - SQLContext: Self for method chaining
        """

    def execute(
        self,
        query: str,
        *,
        eager: bool = False
    ) -> DataFrame | LazyFrame:
        """
        Execute a SQL query and return the result.

        Parameters:
        - query: SQL query string
        - eager: Return a DataFrame (True) or a LazyFrame (False)

        Returns:
        - DataFrame | LazyFrame: Query result
        """

    def tables(self) -> list[str]:
        """
        Get the list of registered table names.

        Returns:
        - list[str]: Registered table names
        """
```
### SQL Query Function

Execute SQL queries directly on DataFrames and LazyFrames.

```python { .api }
def sql(
    query: str,
    *,
    eager: bool = True,
    **frames: DataFrame | LazyFrame
) -> DataFrame | LazyFrame:
    """
    Execute a SQL query on the provided frames.

    Parameters:
    - query: SQL query string
    - eager: Return a DataFrame (True) or a LazyFrame (False)
    - frames: Keyword arguments mapping table names to DataFrames/LazyFrames

    Returns:
    - DataFrame | LazyFrame: Query result
    """
```
### DataFrame/LazyFrame SQL Methods

Execute SQL queries directly on individual frames.

```python { .api }
# Available on DataFrame and LazyFrame classes
def sql(self, query: str, *, table_name: str = "self") -> DataFrame | LazyFrame:
    """
    Execute a SQL query with this frame as the primary table.

    Parameters:
    - query: SQL query string (can reference table_name)
    - table_name: Name to use for this frame in the query

    Returns:
    - DataFrame | LazyFrame: Query result (same type as the caller)
    """
```
### SQL Expression Function

Create expressions from SQL fragments for use in `select`, `filter`, and similar operations.

```python { .api }
def sql_expr(sql: str) -> Expr:
    """
    Create an expression from a SQL fragment.

    Parameters:
    - sql: SQL expression string

    Returns:
    - Expr: Expression parsed from the SQL fragment
    """
```
## Supported SQL Features

### Query Clauses
- **SELECT**: Column selection, aliases, expressions
- **FROM**: Table references, subqueries
- **WHERE**: Filtering conditions
- **GROUP BY**: Grouping operations
- **HAVING**: Group filtering
- **ORDER BY**: Sorting
- **LIMIT/OFFSET**: Result limiting
- **WITH**: Common Table Expressions (CTEs)

### Set Operations
- **UNION/UNION ALL**: Set union (distinct or all rows)
- **INTERSECT**: Set intersection
- **EXCEPT**: Set difference
### Joins
- **INNER JOIN**: Inner joins
- **LEFT/RIGHT JOIN**: Left and right outer joins
- **FULL OUTER JOIN**: Full outer joins
- **CROSS JOIN**: Cartesian product
### Functions
- **Aggregate**: SUM, COUNT, AVG, MIN, MAX, STDDEV, VARIANCE
- **String**: UPPER, LOWER, LENGTH, SUBSTRING, CONCAT
- **Math**: ABS, ROUND, CEIL, FLOOR, SQRT, POW
- **Date/Time**: EXTRACT, DATE_TRUNC, NOW, CURRENT_DATE
- **Conditional**: CASE WHEN, COALESCE, NULLIF
- **Window**: ROW_NUMBER, RANK, DENSE_RANK, LAG, LEAD
### Data Types
- **Numeric**: INTEGER, BIGINT, FLOAT, DOUBLE, DECIMAL
- **Text**: VARCHAR, TEXT, CHAR
- **Temporal**: DATE, TIME, TIMESTAMP
- **Boolean**: BOOLEAN
- **Complex**: ARRAY, STRUCT
## Usage Examples

### Basic SQL Queries

```python
import polars as pl

# Create sample data
df = pl.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "name": ["Alice", "Bob", "Charlie", "Diana", "Eve"],
    "age": [25, 30, 35, 28, 32],
    "salary": [50000, 60000, 70000, 55000, 65000],
    "department": ["IT", "HR", "IT", "Finance", "IT"]
})

# Simple SELECT query
result = df.sql("SELECT name, age, salary FROM self WHERE age > 30")
print(result)

# Aggregation query
agg_result = df.sql("""
    SELECT
        department,
        COUNT(*) AS employee_count,
        AVG(salary) AS avg_salary,
        MAX(age) AS max_age
    FROM self
    GROUP BY department
    ORDER BY avg_salary DESC
""")
```
### SQL Context Usage

```python
# Create multiple DataFrames
employees = pl.DataFrame({
    "emp_id": [1, 2, 3, 4],
    "name": ["Alice", "Bob", "Charlie", "Diana"],
    "dept_id": [1, 2, 1, 3]
})

departments = pl.DataFrame({
    "dept_id": [1, 2, 3],
    "dept_name": ["IT", "HR", "Finance"],
    "budget": [100000, 50000, 75000]
})

# Create SQL context and register tables
ctx = pl.SQLContext()
ctx.register("employees", employees)
ctx.register("departments", departments)

# Execute join query; eager=True returns a DataFrame directly
result = ctx.execute("""
    SELECT
        e.name,
        d.dept_name,
        d.budget
    FROM employees e
    JOIN departments d ON e.dept_id = d.dept_id
    ORDER BY d.budget DESC
""", eager=True)
```
### Using sql() Function

```python
# Execute SQL with keyword arguments
result = pl.sql(
    """
    SELECT
        e.name,
        d.dept_name,
        d.budget / 1000.0 AS budget_in_thousands
    FROM emp e
    JOIN dept d ON e.dept_id = d.dept_id
    """,
    emp=employees,
    dept=departments
)
```
### Complex Queries with CTEs

```python
sales_data = pl.DataFrame({
    "product": ["A", "B", "A", "C", "B", "A"],
    "quarter": ["Q1", "Q1", "Q2", "Q2", "Q3", "Q3"],
    "revenue": [1000, 1500, 1200, 800, 1800, 1100]
})

result = sales_data.sql("""
    WITH quarterly_totals AS (
        SELECT
            quarter,
            SUM(revenue) AS total_revenue,
            COUNT(*) AS product_count
        FROM self
        GROUP BY quarter
    ),
    product_performance AS (
        SELECT
            product,
            SUM(revenue) AS total_product_revenue,
            AVG(revenue) AS avg_product_revenue
        FROM self
        GROUP BY product
    )
    SELECT
        qt.quarter,
        qt.total_revenue,
        pp.product,
        pp.avg_product_revenue
    FROM quarterly_totals qt
    CROSS JOIN product_performance pp
    WHERE pp.avg_product_revenue > 1000
    ORDER BY qt.quarter, pp.avg_product_revenue DESC
""")
```
### Window Functions

```python
from datetime import date

time_series = pl.DataFrame({
    "date": pl.date_range(date(2023, 1, 1), date(2023, 1, 10), "1d", eager=True),
    "value": [100, 105, 102, 108, 110, 107, 112, 115, 109, 118]
})

result = time_series.sql("""
    SELECT
        date,
        value,
        LAG(value, 1) OVER (ORDER BY date) AS prev_value,
        value - LAG(value, 1) OVER (ORDER BY date) AS change,
        AVG(value) OVER (ORDER BY date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS moving_avg_3,
        ROW_NUMBER() OVER (ORDER BY value DESC) AS value_rank
    FROM self
    ORDER BY date
""")
```
### Subqueries and Set Operations

```python
large_dataset = pl.DataFrame({
    "category": ["A", "B", "A", "C", "B", "A", "C", "B"],
    "value": [10, 20, 15, 25, 30, 12, 28, 22],
    "region": ["North", "South", "North", "West", "South", "West", "North", "West"]
})

# Scalar subquery: select rows above the overall average, then append
# the high-value rows from the North region
result = large_dataset.sql("""
    SELECT category, region, value
    FROM self
    WHERE value > (
        SELECT AVG(value) FROM self
    )

    UNION ALL

    SELECT category, region, value
    FROM self
    WHERE region = 'North' AND value > 20

    ORDER BY category, value DESC
""")
```
### SQL Expressions in Polars Operations

```python
# Use SQL expressions within regular Polars operations
df_with_sql_expr = df.select([
    pl.col("name"),
    pl.sql_expr("age * 12").alias("age_in_months"),
    pl.sql_expr("CASE WHEN salary > 60000 THEN 'High' ELSE 'Low' END").alias("salary_category"),
    pl.sql_expr("UPPER(department)").alias("dept_upper")
])

# Filter using SQL expressions
filtered_df = df.filter(
    pl.sql_expr("age BETWEEN 25 AND 35 AND department IN ('IT', 'Finance')")
)
```
### LazyFrame SQL Integration

```python
# Create a LazyFrame from a CSV scan
lazy_df = pl.scan_csv("large_dataset.csv")

# Execute SQL on the LazyFrame (returns a LazyFrame)
lazy_result = lazy_df.sql("""
    SELECT
        category,
        COUNT(*) AS count,
        AVG(value) AS avg_value,
        SUM(CASE WHEN value > 100 THEN 1 ELSE 0 END) AS high_value_count
    FROM self
    WHERE date >= '2023-01-01'
    GROUP BY category
    HAVING AVG(value) > 50
    ORDER BY avg_value DESC
""")

# Collect the lazy result
final_result = lazy_result.collect()
```
### Advanced SQL Context Operations

```python
# Register DataFrames from the global namespace
df1 = pl.DataFrame({"a": [1, 2, 3]})
df2 = pl.DataFrame({"b": [4, 5, 6]})

ctx = pl.SQLContext()
ctx.register_globals()  # Automatically registers df1 and df2

# Register many frames at once (customer_df, order_df, and product_df
# are assumed to be defined elsewhere)
additional_frames = {
    "customers": customer_df,
    "orders": order_df,
    "products": product_df
}
ctx.register_many(additional_frames)

# Complex multi-table query
result = ctx.execute("""
    SELECT
        c.customer_name,
        p.product_name,
        o.quantity,
        o.price * o.quantity AS total_amount,
        RANK() OVER (PARTITION BY c.customer_id ORDER BY o.price * o.quantity DESC) AS purchase_rank
    FROM customers c
    JOIN orders o ON c.customer_id = o.customer_id
    JOIN products p ON o.product_id = p.product_id
    WHERE o.order_date >= '2023-01-01'
      AND p.category = 'Electronics'
    ORDER BY c.customer_name, purchase_rank
""", eager=True)

# Get the list of registered tables
tables_list = ctx.tables()
print(f"Registered tables: {tables_list}")
```
## SQL and Polars Integration Benefits

### Performance Optimization
- **Predicate Pushdown**: WHERE conditions pushed to scan level
- **Projection Pushdown**: Only required columns read from storage
- **Query Optimization**: Polars optimizer works with SQL queries
- **Lazy Evaluation**: SQL queries on LazyFrames remain lazy
### Type Safety
- **Schema Inference**: Automatic type inference for SQL results
- **Type Preservation**: Polars types maintained through SQL operations
- **Error Handling**: Clear error messages for type mismatches

### Interoperability
- **Mixed Syntax**: Combine SQL with Polars expressions
- **DataFrame Methods**: SQL results work with all DataFrame methods
- **Streaming**: SQL queries work with streaming operations
- **Memory Efficiency**: Zero-copy operations where possible
## Limitations and Considerations

### SQL Feature Limitations
- Some advanced SQL features may not be supported
- Complex recursive queries not available
- Database-specific functions not included
- DDL operations (CREATE, DROP) not supported

### Performance Considerations
- SQL parsing adds slight overhead compared to native Polars
- Complex queries may benefit from native Polars expressions
- String-based queries lack compile-time type checking
- Very large queries may hit parser limits

### Best Practices
- Use SQL for complex analytical queries
- Prefer native Polars for simple transformations
- Test SQL query performance against native equivalents
- Use parameterized approaches for dynamic queries