or run

npx @tessl/cli init
Log in

Version

Tile

Overview

Evals

Files

Files

docs

ai-ml.mdcatalog.mddata-io.mddataframe-operations.mdexpressions.mdindex.mdsession.mdsql.mdudf.md

sql.mddocs/

0

# SQL Interface

1

2

Execute SQL queries directly on DataFrames and registered tables with full support for standard SQL syntax, joins, aggregations, and Daft-specific extensions for multimodal data operations.

3

4

## Capabilities

5

6

### SQL Query Execution

7

8

Execute SQL queries that return DataFrames.

9

10

```python { .api }

11

def sql(query: str) -> DataFrame:

12

"""

13

Execute SQL query and return DataFrame.

14

15

Parameters:

16

- query: SQL query string with standard SQL syntax

17

18

Returns:

19

DataFrame: Result of SQL query

20

21

Examples:

22

>>> df = daft.sql("SELECT name, age FROM my_table WHERE age > 25")

23

>>> result = daft.sql("SELECT department, AVG(salary) FROM employees GROUP BY department")

24

"""

25

```

26

27

### SQL Expression Creation

28

29

Create SQL expressions for use in DataFrame operations.

30

31

```python { .api }

32

def sql_expr(expression: str) -> Expression:

33

"""

34

Create expression from SQL string.

35

36

Parameters:

37

- expression: SQL expression string

38

39

Returns:

40

Expression: Expression object for DataFrame operations

41

42

Examples:

43

>>> expr = daft.sql_expr("age * 2 + 1")

44

>>> df.select(sql_expr("UPPER(name)").alias("upper_name"))

45

"""

46

```

47

48

## Usage Examples

49

50

### Basic SQL Queries

51

```python

52

import daft

53

54

# Create sample data

55

df = daft.from_pydict({

56

"name": ["Alice", "Bob", "Charlie", "Diana"],

57

"age": [25, 30, 35, 28],

58

"department": ["Engineering", "Sales", "Engineering", "Marketing"],

59

"salary": [75000, 65000, 85000, 70000]

60

})

61

62

# Register DataFrame for SQL queries

63

daft.attach_table(df, "employees")

64

65

# Simple SELECT query

66

result1 = daft.sql("SELECT name, age FROM employees WHERE age > 27")

67

68

# Aggregation query

69

result2 = daft.sql("""

70

SELECT department,

71

COUNT(*) as employee_count,

72

AVG(salary) as avg_salary,

73

MAX(age) as max_age

74

FROM employees

75

GROUP BY department

76

ORDER BY avg_salary DESC

77

""")

78

```

79

80

### Advanced SQL Operations

81

```python

82

# Complex JOIN operations

83

orders = daft.from_pydict({

84

"order_id": [1, 2, 3, 4],

85

"employee_name": ["Alice", "Bob", "Alice", "Charlie"],

86

"amount": [1500, 2000, 1200, 1800],

87

"order_date": ["2024-01-15", "2024-01-16", "2024-01-17", "2024-01-18"]

88

})

89

90

daft.attach_table(orders, "orders")

91

92

# JOIN with aggregation

93

sales_summary = daft.sql("""

94

SELECT e.name,

95

e.department,

96

COUNT(o.order_id) as total_orders,

97

SUM(o.amount) as total_sales,

98

AVG(o.amount) as avg_order_value

99

FROM employees e

100

LEFT JOIN orders o ON e.name = o.employee_name

101

GROUP BY e.name, e.department

102

HAVING COUNT(o.order_id) > 0

103

ORDER BY total_sales DESC

104

""")

105

```

106

107

### Window Functions in SQL

108

```python

109

# Window functions for ranking and running totals

110

windowed_results = daft.sql("""

111

SELECT name,

112

department,

113

salary,

114

ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) as dept_rank,

115

RANK() OVER (ORDER BY salary DESC) as overall_rank,

116

SUM(salary) OVER (PARTITION BY department) as dept_total_salary,

117

AVG(salary) OVER () as company_avg_salary

118

FROM employees

119

ORDER BY department, salary DESC

120

""")

121

```

122

123

### Common Table Expressions (CTEs)

124

```python

125

# Using WITH clauses for complex queries

126

cte_query = daft.sql("""

127

WITH department_stats AS (

128

SELECT department,

129

COUNT(*) as emp_count,

130

AVG(salary) as avg_salary,

131

MIN(age) as min_age,

132

MAX(age) as max_age

133

FROM employees

134

GROUP BY department

135

),

136

high_performing_depts AS (

137

SELECT department

138

FROM department_stats

139

WHERE avg_salary > 70000 AND emp_count >= 2

140

)

141

SELECT e.name, e.age, e.salary, e.department

142

FROM employees e

143

INNER JOIN high_performing_depts h ON e.department = h.department

144

ORDER BY e.salary DESC

145

""")

146

```

147

148

### Subqueries and Nested Operations

149

```python

150

# Subqueries for complex filtering

151

subquery_result = daft.sql("""

152

SELECT name, age, salary, department,

153

salary - (SELECT AVG(salary) FROM employees) as salary_diff

154

FROM employees

155

WHERE salary > (

156

SELECT AVG(salary)

157

FROM employees

158

WHERE department = 'Engineering'

159

)

160

ORDER BY salary_diff DESC

161

""")

162

```

163

164

### Mixing SQL with DataFrame Operations

165

```python

166

# Combine SQL queries with DataFrame methods

167

sql_df = daft.sql("SELECT * FROM employees WHERE age BETWEEN 25 AND 35")

168

169

# Continue with DataFrame operations

170

final_result = (sql_df

171

.filter(daft.col("salary") > 70000)

172

.select("name", "department", (daft.col("salary") * 1.1).alias("with_bonus"))

173

.collect()

174

)

175

```

176

177

### Using SQL Expressions in DataFrames

178

```python

179

# SQL expressions in DataFrame select

180

df_with_sql_expr = df.select(

181

daft.col("name"),

182

daft.sql_expr("age * 2").alias("double_age"),

183

daft.sql_expr("UPPER(department)").alias("dept_upper"),

184

daft.sql_expr("CASE WHEN salary > 70000 THEN 'High' ELSE 'Standard' END").alias("salary_tier")

185

)

186

187

# SQL expressions in filters

188

filtered_df = df.filter(

189

daft.sql_expr("age BETWEEN 25 AND 35 AND department IN ('Engineering', 'Sales')")

190

)

191

```

192

193

### Working with Multiple Tables

194

```python

195

# Register multiple tables

196

customers = daft.from_pydict({

197

"customer_id": [1, 2, 3],

198

"name": ["Corp A", "Corp B", "Corp C"],

199

"industry": ["Tech", "Finance", "Healthcare"]

200

})

201

202

daft.attach_table(customers, "customers")

203

204

# Multi-table JOIN

205

multi_table_query = daft.sql("""

206

SELECT c.name as customer_name,

207

c.industry,

208

e.name as employee_name,

209

e.department,

210

o.amount,

211

o.order_date

212

FROM customers c

213

JOIN orders o ON c.customer_id = o.order_id -- Simplified relationship

214

JOIN employees e ON o.employee_name = e.name

215

WHERE c.industry = 'Tech'

216

ORDER BY o.order_date DESC

217

""")

218

```

219

220

### Data Type Handling in SQL

221

```python

222

# Working with different data types

223

mixed_data = daft.from_pydict({

224

"id": [1, 2, 3],

225

"created_at": ["2024-01-01 10:00:00", "2024-01-02 15:30:00", "2024-01-03 09:15:00"],

226

"tags": [["python", "sql"], ["data", "analytics"], ["machine", "learning"]],

227

"metadata": ['{"priority": 1}', '{"priority": 2}', '{"priority": 3}']

228

})

229

230

daft.attach_table(mixed_data, "mixed_data")

231

232

# SQL with complex data types

233

complex_query = daft.sql("""

234

SELECT id,

235

EXTRACT(YEAR FROM created_at) as year,

236

EXTRACT(MONTH FROM created_at) as month,

237

ARRAY_LENGTH(tags) as tag_count,

238

JSON_EXTRACT(metadata, '$.priority') as priority

239

FROM mixed_data

240

WHERE ARRAY_LENGTH(tags) > 1

241

ORDER BY priority DESC

242

""")

243

```

244

245

### Temporary Tables and Views

246

```python

247

# Create temporary table from query result

248

temp_result = daft.sql("""

249

SELECT department, AVG(salary) as avg_salary

250

FROM employees

251

GROUP BY department

252

""")

253

254

daft.attach_table(temp_result, "dept_averages")

255

256

# Use temporary table in subsequent queries

257

comparison = daft.sql("""

258

SELECT e.name,

259

e.department,

260

e.salary,

261

d.avg_salary,

262

e.salary - d.avg_salary as salary_diff

263

FROM employees e

264

JOIN dept_averages d ON e.department = d.department

265

ORDER BY salary_diff DESC

266

""")

267

```

268

269

### Error Handling and Validation

270

```python

271

try:

272

# Execute potentially problematic query

273

result = daft.sql("""

274

SELECT name, salary / 0 as invalid_calc

275

FROM employees

276

""")

277

except Exception as e:

278

print(f"SQL error: {e}")

279

280

# Validate table exists before querying

281

if daft.has_table("employees"):

282

result = daft.sql("SELECT COUNT(*) FROM employees")

283

else:

284

print("Table 'employees' not found")

285

```

286

287

## SQL Feature Support

288

289

Daft's SQL interface supports:

290

291

### Standard SQL Features

292

- SELECT, FROM, WHERE, GROUP BY, HAVING, ORDER BY

293

- JOINs (INNER, LEFT, RIGHT, FULL OUTER)

294

- Subqueries and CTEs (WITH clauses)

295

- Window functions (ROW_NUMBER, RANK, DENSE_RANK, etc.)

296

- Aggregate functions (COUNT, SUM, AVG, MIN, MAX, etc.)

297

- Mathematical and string functions

298

- Date/time functions

299

- CASE expressions and conditional logic

300

301

### Daft-Specific Extensions

302

- Integration with registered DataFrames and temporary tables

303

- Support for complex data types (arrays, structs, nested data)

304

- Multimodal data operations

305

- Cloud storage path references

306

- Custom UDF integration in SQL context

307

308

## Table Management for SQL

309

310

```python { .api }

311

def attach_table(df: DataFrame, name: str) -> None:

312

"""Register DataFrame as named table for SQL queries."""

313

314

def detach_table(name: str) -> None:

315

"""Remove registered table from SQL context."""

316

317

def has_table(name: str) -> bool:

318

"""Check if table is registered for SQL queries."""

319

320

def list_tables() -> List[str]:

321

"""List all registered tables."""

322

```

323

324

SQL queries can reference any registered table by name, enabling complex multi-table operations and reusable query patterns.