0
# Table Operations and Transformations
1
2
Comprehensive table operations including filtering, selection, aggregation, joins, sorting, and transformations for data analysis.
3
4
## Capabilities
5
6
### Column Selection
7
8
Select specific columns or computed expressions from tables.
9
10
```python { .api }
11
def select(*exprs):
12
"""
13
Select columns or computed expressions.
14
15
Parameters:
16
- *exprs: column names, expressions, or keyword expressions
17
18
Returns:
19
Table with selected columns
20
"""
21
```
22
23
**Usage Examples:**
24
```python
25
# Select specific columns
26
result = table.select('name', 'age')
27
28
# Select with expressions
29
result = table.select(
30
'name',
31
age_next_year=table.age + 1,
32
is_adult=table.age >= 18
33
)
34
35
# Select all columns
36
result = table.select('*')
37
```
38
39
### Row Filtering
40
41
Filter rows based on boolean predicates.
42
43
```python { .api }
44
def filter(*predicates):
45
"""
46
Filter rows based on predicates.
47
48
Parameters:
49
- *predicates: boolean expressions
50
51
Returns:
52
Table with filtered rows
53
"""
54
```
55
56
**Usage Examples:**
57
```python
58
# Single condition
59
result = table.filter(table.age > 25)
60
61
# Multiple conditions (AND)
62
result = table.filter(table.age > 25, table.salary > 50000)
63
64
# Complex conditions
65
result = table.filter(
66
(table.age > 25) & (table.department == 'Engineering')
67
)
68
```
69
70
### Grouping and Aggregation
71
72
Group rows and compute aggregate statistics.
73
74
```python { .api }
75
def group_by(*exprs):
76
"""
77
Group table by expressions.
78
79
Parameters:
80
- *exprs: column names or expressions to group by
81
82
Returns:
83
GroupedTable for aggregation
84
"""
85
86
def aggregate(**kwargs):
87
"""
88
Aggregate entire table or grouped table.
89
90
Parameters:
91
- **kwargs: name=expression pairs for aggregations
92
93
Returns:
94
Table with aggregated results
95
"""
96
```
97
98
**Usage Examples:**
99
```python
100
# Group by single column
101
result = table.group_by('department').aggregate(
102
avg_salary=table.salary.mean(),
103
count=table.count()
104
)
105
106
# Group by multiple columns
107
result = table.group_by('department', 'level').aggregate(
108
total_salary=table.salary.sum(),
109
employee_count=table.count()
110
)
111
112
# Aggregate without grouping
113
result = table.aggregate(
114
total_employees=table.count(),
115
avg_age=table.age.mean()
116
)
117
```
118
119
### Sorting
120
121
Sort table rows by one or more expressions.
122
123
```python { .api }
124
def order_by(*exprs):
125
"""
126
Sort table by expressions.
127
128
Parameters:
129
- *exprs: column names, expressions, or sort keys
130
131
Returns:
132
Sorted table
133
"""
134
135
def asc(expr):
136
"""Create ascending sort key."""
137
138
def desc(expr):
139
"""Create descending sort key."""
140
```
141
142
**Usage Examples:**
143
```python
144
# Sort by single column
145
result = table.order_by('name')
146
147
# Sort by multiple columns
148
result = table.order_by('department', 'salary')
149
150
# Explicit sort direction
151
result = table.order_by(
152
ibis.asc('department'),
153
ibis.desc('salary')
154
)
155
```
156
157
### Row Limiting
158
159
Limit the number of rows returned.
160
161
```python { .api }
162
def limit(n, offset=0):
163
"""
164
Limit number of rows.
165
166
Parameters:
167
- n: int, maximum number of rows
168
- offset: int, number of rows to skip
169
170
Returns:
171
Table with limited rows
172
"""
173
174
def head(n=5):
175
"""Return first n rows."""
176
177
def tail(n=5):
178
"""Return last n rows (requires ordering)."""
179
```
180
181
**Usage Examples:**
182
```python
183
# Top 10 rows
184
result = table.limit(10)
185
186
# Skip first 20, take next 10
187
result = table.limit(10, offset=20)
188
189
# Convenient methods
190
top_5 = table.head(5)
191
bottom_5 = table.order_by('salary').tail(5)
192
```
193
194
### Joins
195
196
Join tables using various join types and conditions.
197
198
```python { .api }
199
def join(other, predicates=None, how='inner'):
200
"""
201
Join with another table.
202
203
Parameters:
204
- other: Table to join with
205
- predicates: join conditions or None for cross join
206
- how: join type ('inner', 'left', 'right', 'outer', 'semi', 'anti')
207
208
Returns:
209
Joined table
210
"""
211
212
def cross_join(other):
213
"""Cross join (Cartesian product)."""
214
215
def asof_join(other, predicates, tolerance=None):
216
"""As-of join for time-series data."""
217
```
218
219
**Usage Examples:**
220
```python
221
# Inner join
222
result = employees.join(
223
departments,
224
employees.dept_id == departments.id
225
)
226
227
# Left join with multiple conditions
228
result = employees.join(
229
departments,
230
[employees.dept_id == departments.id,
231
employees.location == departments.location],
232
how='left'
233
)
234
235
# Cross join
236
result = table1.cross_join(table2)
237
```
238
239
### Set Operations
240
241
Combine tables using set operations.
242
243
```python { .api }
244
def union(table, /, *rest, distinct=False):
245
"""
246
Compute multiset (or set) union of multiple tables.
247
248
Parameters:
249
- table: Table to union with (positional-only)
250
- *rest: Additional tables to union
251
- distinct: bool, use set union (True) or multiset union (False, default)
252
253
Returns:
254
Union of all tables
255
"""
256
257
def intersect(other):
258
"""Intersection with another table."""
259
260
def difference(other):
261
"""Difference from another table."""
262
```
263
264
**Usage Examples:**
265
```python
266
# Union tables (multiset - keeps duplicates by default)
267
combined = table1.union(table2)
268
269
# Union with distinct values only
270
combined = table1.union(table2, distinct=True)
271
272
# Set operations
273
common = table1.intersect(table2)
274
unique_to_first = table1.difference(table2)
275
```
276
277
### Column Operations
278
279
Add, drop, and rename columns.
280
281
```python { .api }
282
def mutate(**kwargs):
283
"""
284
Add or modify columns.
285
286
Parameters:
287
- **kwargs: name=expression pairs
288
289
Returns:
290
Table with new/modified columns
291
"""
292
293
def drop(*columns):
294
"""
295
Drop columns.
296
297
Parameters:
298
- *columns: column names to drop
299
300
Returns:
301
Table without specified columns
302
"""
303
304
def rename(**kwargs):
305
"""
306
Rename columns.
307
308
Parameters:
309
- **kwargs: old_name=new_name pairs
310
311
Returns:
312
Table with renamed columns
313
"""
314
```
315
316
**Usage Examples:**
317
```python
318
# Add computed columns
319
result = table.mutate(
320
age_next_year=table.age + 1,
321
full_name=table.first_name + ' ' + table.last_name
322
)
323
324
# Drop columns
325
result = table.drop('temp_column', 'unused_field')
326
327
# Rename columns
328
result = table.rename(old_name='new_name', id='employee_id')
329
```
330
331
### Distinct Values
332
333
Remove duplicate rows or get unique values.
334
335
```python { .api }
336
def distinct(*exprs):
337
"""
338
Get distinct rows or distinct values for expressions.
339
340
Parameters:
341
- *exprs: expressions to consider for distinctness (all columns if empty)
342
343
Returns:
344
Table with distinct rows
345
"""
346
```
347
348
**Usage Examples:**
349
```python
350
# Distinct rows
351
unique_rows = table.distinct()
352
353
# Distinct values for specific columns
354
unique_combinations = table.distinct('department', 'level')
355
```
356
357
### Sampling
358
359
Sample random rows from the table.
360
361
```python { .api }
362
def sample(fraction=None, method='row'):
363
"""
364
Sample rows from table.
365
366
Parameters:
367
- fraction: float, fraction of rows to sample (0.0 to 1.0)
368
- method: str, sampling method
369
370
Returns:
371
Sampled table
372
"""
373
```
374
375
**Usage Example:**
376
```python
377
# Sample 10% of rows
378
sample = table.sample(0.1)
379
```