Tessl Tile for pypi/daft@0.6.0

or run

npx @tessl/cli init

Version

Tile

Overview

Evals

Files

docs

ai-ml.md catalog.md data-io.md dataframe-operations.md expressions.md index.md session.md sql.md udf.md

sql.mddocs/

0
# SQL Interface
1

2
Execute SQL queries directly on DataFrames and registered tables with full support for standard SQL syntax, joins, aggregations, and Daft-specific extensions for multimodal data operations.
3

4
## Capabilities
5

6
### SQL Query Execution
7

8
Execute SQL queries that return DataFrames.
9

10
```python { .api }
11
def sql(query: str) -> DataFrame:
12
    """
13
    Execute SQL query and return DataFrame.
14
    
15
    Parameters:
16
    - query: SQL query string with standard SQL syntax
17
    
18
    Returns:
19
    DataFrame: Result of SQL query
20
    
21
    Examples:
22
    >>> df = daft.sql("SELECT name, age FROM my_table WHERE age > 25")
23
    >>> result = daft.sql("SELECT department, AVG(salary) FROM employees GROUP BY department")
24
    """
25
```
26

27
### SQL Expression Creation
28

29
Create SQL expressions for use in DataFrame operations.
30

31
```python { .api }
32
def sql_expr(expression: str) -> Expression:
33
    """
34
    Create expression from SQL string.
35
    
36
    Parameters:
37
    - expression: SQL expression string
38
    
39
    Returns:
40
    Expression: Expression object for DataFrame operations
41
    
42
    Examples:
43
    >>> expr = daft.sql_expr("age * 2 + 1")
44
    >>> df.select(sql_expr("UPPER(name)").alias("upper_name"))
45
    """
46
```
47

48
## Usage Examples
49

50
### Basic SQL Queries
51
```python
52
import daft
53

54
# Create sample data
55
df = daft.from_pydict({
56
    "name": ["Alice", "Bob", "Charlie", "Diana"],
57
    "age": [25, 30, 35, 28],
58
    "department": ["Engineering", "Sales", "Engineering", "Marketing"],
59
    "salary": [75000, 65000, 85000, 70000]
60
})
61

62
# Register DataFrame for SQL queries
63
daft.attach_table(df, "employees")
64

65
# Simple SELECT query
66
result1 = daft.sql("SELECT name, age FROM employees WHERE age > 27")
67

68
# Aggregation query
69
result2 = daft.sql("""
70
    SELECT department, 
71
           COUNT(*) as employee_count,
72
           AVG(salary) as avg_salary,
73
           MAX(age) as max_age
74
    FROM employees 
75
    GROUP BY department
76
    ORDER BY avg_salary DESC
77
""")
78
```
79

80
### Advanced SQL Operations
81
```python
82
# Complex JOIN operations
83
orders = daft.from_pydict({
84
    "order_id": [1, 2, 3, 4],
85
    "employee_name": ["Alice", "Bob", "Alice", "Charlie"],
86
    "amount": [1500, 2000, 1200, 1800],
87
    "order_date": ["2024-01-15", "2024-01-16", "2024-01-17", "2024-01-18"]
88
})
89

90
daft.attach_table(orders, "orders") 
91

92
# JOIN with aggregation
93
sales_summary = daft.sql("""
94
    SELECT e.name,
95
           e.department,
96
           COUNT(o.order_id) as total_orders,
97
           SUM(o.amount) as total_sales,
98
           AVG(o.amount) as avg_order_value
99
    FROM employees e
100
    LEFT JOIN orders o ON e.name = o.employee_name
101
    GROUP BY e.name, e.department
102
    HAVING COUNT(o.order_id) > 0
103
    ORDER BY total_sales DESC
104
""")
105
```
106

107
### Window Functions in SQL
108
```python
109
# Window functions for ranking and running totals
110
windowed_results = daft.sql("""
111
    SELECT name,
112
           department,
113
           salary,
114
           ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) as dept_rank,
115
           RANK() OVER (ORDER BY salary DESC) as overall_rank,
116
           SUM(salary) OVER (PARTITION BY department) as dept_total_salary,
117
           AVG(salary) OVER () as company_avg_salary
118
    FROM employees
119
    ORDER BY department, salary DESC
120
""")
121
```
122

123
### Common Table Expressions (CTEs)
124
```python
125
# Using WITH clauses for complex queries
126
cte_query = daft.sql("""
127
    WITH department_stats AS (
128
        SELECT department,
129
               COUNT(*) as emp_count,
130
               AVG(salary) as avg_salary,
131
               MIN(age) as min_age,
132
               MAX(age) as max_age
133
        FROM employees
134
        GROUP BY department
135
    ),
136
    high_performing_depts AS (
137
        SELECT department
138
        FROM department_stats
139
        WHERE avg_salary > 70000 AND emp_count >= 2
140
    )
141
    SELECT e.name, e.age, e.salary, e.department
142
    FROM employees e
143
    INNER JOIN high_performing_depts h ON e.department = h.department
144
    ORDER BY e.salary DESC
145
""")
146
```
147

148
### Subqueries and Nested Operations
149
```python
150
# Subqueries for complex filtering
151
subquery_result = daft.sql("""
152
    SELECT name, age, salary, department,
153
           salary - (SELECT AVG(salary) FROM employees) as salary_diff
154
    FROM employees
155
    WHERE salary > (
156
        SELECT AVG(salary) 
157
        FROM employees 
158
        WHERE department = 'Engineering'
159
    )
160
    ORDER BY salary_diff DESC
161
""")
162
```
163

164
### Mixing SQL with DataFrame Operations
165
```python
166
# Combine SQL queries with DataFrame methods
167
sql_df = daft.sql("SELECT * FROM employees WHERE age BETWEEN 25 AND 35")
168

169
# Continue with DataFrame operations
170
final_result = (sql_df
171
    .filter(daft.col("salary") > 70000)
172
    .select("name", "department", (daft.col("salary") * 1.1).alias("with_bonus"))
173
    .collect()
174
)
175
```
176

177
### Using SQL Expressions in DataFrames
178
```python
179
# SQL expressions in DataFrame select
180
df_with_sql_expr = df.select(
181
    daft.col("name"),
182
    daft.sql_expr("age * 2").alias("double_age"),
183
    daft.sql_expr("UPPER(department)").alias("dept_upper"),
184
    daft.sql_expr("CASE WHEN salary > 70000 THEN 'High' ELSE 'Standard' END").alias("salary_tier")
185
)
186

187
# SQL expressions in filters  
188
filtered_df = df.filter(
189
    daft.sql_expr("age BETWEEN 25 AND 35 AND department IN ('Engineering', 'Sales')")
190
)
191
```
192

193
### Working with Multiple Tables
194
```python
195
# Register multiple tables
196
customers = daft.from_pydict({
197
    "customer_id": [1, 2, 3],
198
    "name": ["Corp A", "Corp B", "Corp C"],
199
    "industry": ["Tech", "Finance", "Healthcare"]
200
})
201

202
daft.attach_table(customers, "customers")
203

204
# Multi-table JOIN
205
multi_table_query = daft.sql("""
206
    SELECT c.name as customer_name,
207
           c.industry,
208
           e.name as employee_name,
209
           e.department,
210
           o.amount,
211
           o.order_date
212
    FROM customers c
213
    JOIN orders o ON c.customer_id = o.order_id  -- Simplified relationship
214
    JOIN employees e ON o.employee_name = e.name
215
    WHERE c.industry = 'Tech'
216
    ORDER BY o.order_date DESC
217
""")
218
```
219

220
### Data Type Handling in SQL
221
```python
222
# Working with different data types
223
mixed_data = daft.from_pydict({
224
    "id": [1, 2, 3],
225
    "created_at": ["2024-01-01 10:00:00", "2024-01-02 15:30:00", "2024-01-03 09:15:00"],
226
    "tags": [["python", "sql"], ["data", "analytics"], ["machine", "learning"]],
227
    "metadata": ['{"priority": 1}', '{"priority": 2}', '{"priority": 3}']
228
})
229

230
daft.attach_table(mixed_data, "mixed_data")
231

232
# SQL with complex data types
233
complex_query = daft.sql("""
234
    SELECT id,
235
           EXTRACT(YEAR FROM created_at) as year,
236
           EXTRACT(MONTH FROM created_at) as month,
237
           ARRAY_LENGTH(tags) as tag_count,
238
           JSON_EXTRACT(metadata, '$.priority') as priority
239
    FROM mixed_data
240
    WHERE ARRAY_LENGTH(tags) > 1
241
    ORDER BY priority DESC
242
""")
243
```
244

245
### Temporary Tables and Views
246
```python
247
# Create temporary table from query result
248
temp_result = daft.sql("""
249
    SELECT department, AVG(salary) as avg_salary
250
    FROM employees
251
    GROUP BY department
252
""")
253

254
daft.attach_table(temp_result, "dept_averages")
255

256
# Use temporary table in subsequent queries
257
comparison = daft.sql("""
258
    SELECT e.name, 
259
           e.department, 
260
           e.salary,
261
           d.avg_salary,
262
           e.salary - d.avg_salary as salary_diff
263
    FROM employees e
264
    JOIN dept_averages d ON e.department = d.department
265
    ORDER BY salary_diff DESC
266
""")
267
```
268

269
### Error Handling and Validation
270
```python
271
try:
272
    # Execute potentially problematic query
273
    result = daft.sql("""
274
        SELECT name, salary / 0 as invalid_calc
275
        FROM employees
276
    """)
277
except Exception as e:
278
    print(f"SQL error: {e}")
279

280
# Validate table exists before querying
281
if daft.has_table("employees"):
282
    result = daft.sql("SELECT COUNT(*) FROM employees")
283
else:
284
    print("Table 'employees' not found")
285
```
286

287
## SQL Feature Support
288

289
Daft's SQL interface supports:
290

291
### Standard SQL Features
292
- SELECT, FROM, WHERE, GROUP BY, HAVING, ORDER BY
293
- JOINs (INNER, LEFT, RIGHT, FULL OUTER)
294
- Subqueries and CTEs (WITH clauses)
295
- Window functions (ROW_NUMBER, RANK, DENSE_RANK, etc.)
296
- Aggregate functions (COUNT, SUM, AVG, MIN, MAX, etc.)
297
- Mathematical and string functions
298
- Date/time functions
299
- CASE expressions and conditional logic
300

301
### Daft-Specific Extensions
302
- Integration with registered DataFrames and temporary tables
303
- Support for complex data types (arrays, structs, nested data)
304
- Multimodal data operations
305
- Cloud storage path references
306
- Custom UDF integration in SQL context
307

308
## Table Management for SQL
309

310
```python { .api }
311
def attach_table(df: DataFrame, name: str) -> None:
312
    """Register DataFrame as named table for SQL queries."""
313

314
def detach_table(name: str) -> None:
315
    """Remove registered table from SQL context."""
316

317
def has_table(name: str) -> bool:
318
    """Check if table is registered for SQL queries."""
319

320
def list_tables() -> List[str]:
321
    """List all registered tables."""
322
```
323

324
SQL queries can reference any registered table by name, enabling complex multi-table operations and reusable query patterns.

Version

Tile

Files

sql.md.css-3qkkll{font-size:var(--chakra-font-sizes-sm);font-weight:var(--chakra-font-weights-normal);color:var(--chakra-colors-gray-300);}docs/

sql.mddocs/