0
# SQL Interface
1
2
SQL query execution capabilities with SQLContext for managing multiple DataFrames and native SQL expression support within DataFrame operations for familiar SQL-based data manipulation.
3
4
## Capabilities
5
6
### SQLContext
7
8
Context manager for executing SQL queries across multiple DataFrames with table registration and management.
9
10
```python { .api }
11
class SQLContext:
12
def __init__(self):
13
"""Create new SQL context."""
14
15
def register(self, name: str, frame) -> None:
16
"""
17
Register DataFrame or LazyFrame for SQL queries.
18
19
Parameters:
20
- name: Table name for SQL queries
21
- frame: DataFrame or LazyFrame to register
22
"""
23
24
def execute(self, query: str, *, eager=True) -> DataFrame | LazyFrame:
25
"""
26
Execute SQL query.
27
28
Parameters:
29
- query: SQL query string
30
- eager: Return DataFrame if True, LazyFrame if False
31
32
Returns:
33
Query results as DataFrame or LazyFrame
34
"""
35
36
def tables(self) -> list[str]:
37
"""
38
Get list of registered table names.
39
40
Returns:
41
List of registered table names
42
"""
43
44
def unregister(self, name: str) -> None:
45
"""
46
Unregister table from context.
47
48
Parameters:
49
- name: Table name to unregister
50
"""
51
```
52
53
### SQL Functions
54
55
Direct SQL execution functions for single queries and SQL expressions within DataFrame operations.
56
57
```python { .api }
58
def sql(query: str, *, eager=True, **kwargs) -> DataFrame | LazyFrame:
59
"""
60
Execute SQL query on DataFrames.
61
62
Parameters:
63
- query: SQL query string
64
- eager: Return DataFrame if True, LazyFrame if False
65
- kwargs: DataFrames accessible by name in SQL
66
67
Returns:
68
Query results as DataFrame or LazyFrame
69
"""
70
71
def sql_expr(sql: str) -> Expr:
72
"""
73
Create expression from SQL string.
74
75
Parameters:
76
- sql: SQL expression string
77
78
Returns:
79
Expression object
80
"""
81
```
82
83
## Usage Examples
84
85
### Basic SQL Queries
86
87
```python
88
import polars as pl
89
90
# Create sample data
91
df1 = pl.DataFrame({
92
"id": [1, 2, 3],
93
"name": ["Alice", "Bob", "Charlie"],
94
"age": [25, 30, 35]
95
})
96
97
df2 = pl.DataFrame({
98
"id": [1, 2, 3],
99
"department": ["Engineering", "Sales", "Marketing"],
100
"salary": [70000, 50000, 60000]
101
})
102
103
# Direct SQL execution
104
result = pl.sql(
105
"SELECT * FROM df1 WHERE age > 28",
106
df1=df1
107
)
108
109
# Join tables with SQL
110
result = pl.sql("""
111
SELECT u.name, u.age, d.department, d.salary
112
FROM df1 u
113
JOIN df2 d ON u.id = d.id
114
WHERE u.age > 25
115
""", df1=df1, df2=df2)
116
```
117
118
### SQLContext Usage
119
120
```python
121
# Create SQL context
122
ctx = pl.SQLContext()
123
124
# Register tables
125
ctx.register("users", df1)
126
ctx.register("departments", df2)
127
128
# Execute queries
129
result = ctx.execute("""
130
SELECT
131
u.name,
132
u.age,
133
d.department,
134
d.salary,
135
CASE
136
WHEN d.salary > 60000 THEN 'High'
137
WHEN d.salary > 50000 THEN 'Medium'
138
ELSE 'Low'
139
END as salary_tier
140
FROM users u
141
JOIN departments d ON u.id = d.id
142
ORDER BY d.salary DESC
143
""")
144
145
# Check registered tables
146
print(ctx.tables()) # ['users', 'departments']
147
```
148
149
### SQL Expressions in DataFrame Operations
150
151
```python
152
# Use SQL expressions within DataFrame operations
153
df = pl.DataFrame({
154
"x": [1, 2, 3, 4, 5],
155
"y": [10, 20, 30, 40, 50]
156
})
157
158
result = df.with_columns([
159
pl.sql_expr("x * 2 + y").alias("calculated"),
160
pl.sql_expr("CASE WHEN x > 3 THEN 'high' ELSE 'low' END").alias("category")
161
])
162
```