# Transformations

Data transformation operations that can be applied to datasets within chart specifications, including filtering, aggregation, binning, and calculations. Transformations allow data preprocessing without modifying the original dataset.

## Capabilities

### Filtering Transformations

Transform methods for filtering data based on predicates and conditions.
```python { .api }
def transform_filter(self, predicate):
    """
    Filter data based on a predicate expression.

    Parameters:
    - predicate: Filter expression (string, dict, or Predicate object)

    Returns:
    Chart: New chart with filter transformation
    """

def transform_sample(self, sample=1000):
    """
    Sample random subset of data.

    Parameters:
    - sample: Number of rows to sample

    Returns:
    Chart: New chart with sample transformation
    """
```
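
For sampling, a minimal sketch (the `data` variable and its `x`/`y` columns are placeholders):

```python
import altair as alt

# Keep roughly 500 randomly sampled rows before rendering
chart = alt.Chart(data).mark_point().encode(
    x='x:Q',
    y='y:Q'
).transform_sample(500)
```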

### Aggregation Transformations

Transform methods for aggregating and summarizing data.

```python { .api }
def transform_aggregate(self, aggregate=None, groupby=None, **kwargs):
    """
    Aggregate data with summary statistics.

    Parameters:
    - aggregate: List of aggregation specifications
    - groupby: List of fields to group by

    Returns:
    Chart: New chart with aggregate transformation
    """

def transform_joinaggregate(self, joinaggregate=None, groupby=None, **kwargs):
    """
    Join aggregated values back to individual records.

    Parameters:
    - joinaggregate: List of aggregate calculations
    - groupby: List of fields to group by

    Returns:
    Chart: New chart with joinaggregate transformation
    """

def transform_window(
    self,
    window=None,
    frame=None,
    groupby=None,
    sort=None,
    ignorePeers=None,
    **kwargs
):
    """
    Apply sliding window calculations.

    Parameters:
    - window: List of window operations
    - frame: Window frame specification
    - groupby: List of fields to group by
    - sort: Sort specification for window
    - ignorePeers: Whether to ignore peer values

    Returns:
    Chart: New chart with window transformation
    """
```
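
The joinaggregate variant is typically paired with a calculate step so each row can reference a group-level statistic; a hedged sketch assuming numeric `value` and categorical `category` columns:

```python
import altair as alt

# Join the overall sum onto every row, then derive each row's share of it
chart = alt.Chart(data).mark_bar().encode(
    x='category:N',
    y='fraction:Q'
).transform_joinaggregate(
    total='sum(value)'
).transform_calculate(
    fraction='datum.value / datum.total'
)
```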

### Data Reshaping Transformations

Transform methods for reshaping and restructuring data.

```python { .api }
def transform_fold(self, fold, as_=None, **kwargs):
    """
    Fold specified fields into key-value pairs.

    Parameters:
    - fold: List of field names to fold
    - as_: Output field names [key, value]

    Returns:
    Chart: New chart with fold transformation
    """

def transform_pivot(
    self,
    pivot,
    value,
    groupby=None,
    limit=None,
    op=None,
    **kwargs
):
    """
    Pivot data from long to wide format.

    Parameters:
    - pivot: Field to use for new column names
    - value: Field containing values to pivot
    - groupby: Fields to group by
    - limit: Maximum number of pivot columns
    - op: Aggregation operation for duplicate values

    Returns:
    Chart: New chart with pivot transformation
    """

def transform_flatten(self, flatten, as_=None, **kwargs):
    """
    Flatten array fields into separate records.

    Parameters:
    - flatten: List of array fields to flatten
    - as_: Output field names

    Returns:
    Chart: New chart with flatten transformation
    """
```
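
For flattening, each element of an array-valued column becomes its own row; a sketch assuming records shaped like `{"name": ..., "scores": [...]}`:

```python
import altair as alt

# One row per element of the "scores" array, exposed as a new "score" field
chart = alt.Chart(data).mark_tick().encode(
    x='score:Q',
    y='name:N'
).transform_flatten(
    ['scores'],
    as_=['score']
)
```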

### Calculation Transformations

Transform methods for creating calculated fields and derived data.

```python { .api }
def transform_calculate(self, calculate=None, as_=None, **kwargs):
    """
    Calculate new fields using expressions.

    Parameters:
    - calculate: Expression string or list of calculations
    - as_: Output field name(s)

    Returns:
    Chart: New chart with calculate transformation
    """

def transform_timeunit(self, timeUnit=None, field=None, as_=None, **kwargs):
    """
    Apply time unit transformations to temporal data.

    Parameters:
    - timeUnit: Time unit specification
    - field: Input temporal field
    - as_: Output field name

    Returns:
    Chart: New chart with timeunit transformation
    """
```
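
A timeunit transform makes the truncated time value available as a regular field for later encodings or transforms; a sketch assuming a temporal `date` column and a numeric `value` column:

```python
import altair as alt

# Truncate dates to month so the derived field can be reused downstream
chart = alt.Chart(data).mark_line().encode(
    x='month:T',
    y='mean(value):Q'
).transform_timeunit(
    as_='month',
    field='date',
    timeUnit='month'
)
```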

### Binning Transformations

Transform methods for binning continuous data into discrete groups.

```python { .api }
def transform_bin(
    self,
    field=None,
    as_=None,
    bin=None,
    anchor=None,
    base=None,
    divide=None,
    extent=None,
    maxbins=None,
    minstep=None,
    nice=None,
    step=None,
    steps=None,
    **kwargs
):
    """
    Bin continuous data into discrete groups.

    Parameters:
    - field: Field to bin
    - as_: Output field names [start, end]
    - bin: Binning parameters (bool or BinParams)
    - anchor: Anchor value for bin boundaries
    - base: Base value for logarithmic binning
    - divide: Division factors for nice bins
    - extent: Data extent for binning
    - maxbins: Maximum number of bins
    - minstep: Minimum step size
    - nice: Whether to use nice bin boundaries
    - step: Explicit step size
    - steps: List of allowable step sizes

    Returns:
    Chart: New chart with bin transformation
    """
```
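
An explicit bin transform exposes the bin boundaries as ordinary fields; a hedged sketch assuming a numeric `value` column (the output field name is illustrative):

```python
import altair as alt

# Histogram built from an explicit bin transform rather than encoding-level binning
chart = alt.Chart(data).mark_bar().encode(
    x='value_binned:O',
    y='count():Q'
).transform_bin(
    as_='value_binned',          # output field holding each bin's start
    field='value',
    bin=alt.Bin(maxbins=20)
)
```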

### Data Enrichment Transformations

Transform methods for enriching data through lookups and joins.

```python { .api }
def transform_lookup(
    self,
    lookup=None,
    from_=None,
    as_=None,
    default=None,
    **kwargs
):
    """
    Lookup and join data from external sources.

    Parameters:
    - lookup: Field in current data to lookup
    - from_: External data source specification
    - as_: Output field names
    - default: Default value for missing lookups

    Returns:
    Chart: New chart with lookup transformation
    """

def transform_impute(
    self,
    impute=None,
    key=None,
    value=None,
    method=None,
    frame=None,
    groupby=None,
    keyvals=None,
    **kwargs
):
    """
    Impute missing values using various methods.

    Parameters:
    - impute: Field to impute
    - key: Key field for imputation
    - value: Value to impute
    - method: Imputation method ('value', 'mean', 'median', 'max', 'min')
    - frame: Window frame for imputation
    - groupby: Fields to group by
    - keyvals: Explicit key values for imputation

    Returns:
    Chart: New chart with impute transformation
    """
```
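
A sketch of both methods; `lookup_data` and all column names here are placeholders:

```python
import altair as alt

# Pull "population" from a second table into the main dataset via the shared "id" key
chart = alt.Chart(data).mark_bar().encode(
    x='id:N',
    y='population:Q'
).transform_lookup(
    lookup='id',
    from_=alt.LookupData(data=lookup_data, key='id', fields=['population']),
    default='0'
)

# Fill gaps in each series before drawing a line (insert y=0 wherever an x value is missing)
line = alt.Chart(data).mark_line().encode(
    x='x:O',
    y='y:Q',
    color='series:N'
).transform_impute(
    impute='y',
    key='x',
    value=0,
    groupby=['series']
)
```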

### Statistical Transformations

Transform methods for statistical analysis and modeling.

```python { .api }
def transform_regression(
    self,
    regression=None,
    on=None,
    method=None,
    order=None,
    extent=None,
    params=None,
    as_=None,
    **kwargs
):
    """
    Fit regression models to data.

    Parameters:
    - regression: Y field for regression
    - on: X field for regression
    - method: Regression method ('linear', 'log', 'exp', 'pow', 'quad', 'poly')
    - order: Polynomial order for 'poly' method
    - extent: X extent for predictions
    - params: Whether to output model parameters
    - as_: Output field names

    Returns:
    Chart: New chart with regression transformation
    """

def transform_loess(
    self,
    loess=None,
    on=None,
    bandwidth=None,
    as_=None,
    **kwargs
):
    """
    Apply LOESS smoothing regression.

    Parameters:
    - loess: Y field for smoothing
    - on: X field for smoothing
    - bandwidth: LOESS bandwidth parameter
    - as_: Output field names

    Returns:
    Chart: New chart with loess transformation
    """

def transform_quantile(
    self,
    quantile=None,
    probs=None,
    step=None,
    as_=None,
    **kwargs
):
    """
    Calculate quantile values.

    Parameters:
    - quantile: Field to calculate quantiles for
    - probs: List of quantile probabilities
    - step: Step size for quantile sequence
    - as_: Output field names

    Returns:
    Chart: New chart with quantile transformation
    """

def transform_density(
    self,
    density=None,
    bandwidth=None,
    extent=None,
    as_=None,
    counts=None,
    cumulative=None,
    **kwargs
):
    """
    Estimate probability density functions.

    Parameters:
    - density: Field to estimate density for
    - bandwidth: Kernel bandwidth
    - extent: Data extent for estimation
    - as_: Output field names
    - counts: Whether to output counts instead of densities
    - cumulative: Whether to output cumulative distribution

    Returns:
    Chart: New chart with density transformation
    """
```
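
Density and quantile transforms emit new (x, y)-style columns rather than modifying existing ones; a hedged sketch assuming a numeric `value` column:

```python
import altair as alt

# Kernel density estimate rendered as a smooth area
density_chart = alt.Chart(data).mark_area().encode(
    x='value:Q',
    y='density:Q'
).transform_density(
    'value',
    as_=['value', 'density']
)

# Quantile plot: probabilities on x, the corresponding values on y
quantile_chart = alt.Chart(data).mark_point().encode(
    x='prob:Q',
    y='value:Q'
).transform_quantile(
    'value',
    step=0.01,
    as_=['prob', 'value']
)
```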

### Extent Transformations

Transform methods for computing the extent (minimum and maximum) of a field so it can be referenced elsewhere in the specification.

```python { .api }
def transform_extent(self, extent=None, param=None, **kwargs):
    """
    Compute the [min, max] extent of a field and store it in a named parameter.

    Parameters:
    - extent: Field to compute the extent over
    - param: Name of the parameter that stores the extent

    Returns:
    Chart: New chart with extent transformation
    """
```
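
A hedged sketch of reusing the stored extent, assuming a Vega-Lite 5 style parameter reference is accepted for the bin extent:

```python
import altair as alt

# Store the [min, max] of "value" under the name "value_extent",
# then reuse it so the bins span the full data range (parameter reference is an assumption)
chart = alt.Chart(data).mark_bar().encode(
    x='value_binned:O',
    y='count():Q'
).transform_extent(
    extent='value',
    param='value_extent'
).transform_bin(
    as_='value_binned',
    field='value',
    bin=alt.Bin(maxbins=20, extent={'param': 'value_extent'})
)
```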

### Layout Transformations

Transform methods for arranging and stacking data for specific visualizations.

```python { .api }
def transform_stack(
    self,
    stack=None,
    groupby=None,
    sort=None,
    offset=None,
    as_=None,
    **kwargs
):
    """
    Stack data for cumulative visualizations.

    Parameters:
    - stack: Field to stack
    - groupby: Fields to group by
    - sort: Sort specification
    - offset: Stack offset ('zero', 'center', 'normalize')
    - as_: Output field names [start, end]

    Returns:
    Chart: New chart with stack transformation
    """
```
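
An explicit stack transform exposes each segment's start and end as fields; a sketch assuming `month`, `series`, and `value` columns (all placeholders):

```python
import altair as alt

# Manually stacked bars: y0/y1 hold each segment's lower and upper bound
chart = alt.Chart(data).mark_bar().encode(
    x='month:N',
    y=alt.Y('y0:Q', title='value'),
    y2='y1',
    color='series:N'
).transform_stack(
    stack='value',
    groupby=['month'],
    sort=[alt.SortField(field='series', order='ascending')],
    as_=['y0', 'y1'],
    offset='zero'
)
```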
414
415
## Usage Examples
416
417
### Basic Filtering
418
419
```python
420
import altair as alt
421
422
# Filter with expression string
423
chart = alt.Chart(data).mark_point().encode(
424
x='x:Q',
425
y='y:Q'
426
).transform_filter(
427
'datum.value > 100'
428
)
429
430
# Filter with field predicate
431
chart = alt.Chart(data).mark_point().encode(
432
x='x:Q',
433
y='y:Q'
434
).transform_filter(
435
alt.datum.category == 'A'
436
)
437
```

### Aggregation

```python
# Basic aggregation
chart = alt.Chart(data).mark_bar().encode(
    x='category:N',
    y='mean_value:Q'
).transform_aggregate(
    mean_value='mean(value)',
    groupby=['category']
)

# Window calculation
chart = alt.Chart(data).mark_line().encode(
    x='date:T',
    y='cumulative_sum:Q'
).transform_window(
    cumulative_sum='sum(value)',
    frame=[None, 0]
)
```

### Data Reshaping

```python
# Fold wide data to long format
chart = alt.Chart(data).mark_bar().encode(
    x='key:N',
    y='value:Q',
    color='key:N'
).transform_fold(
    ['col1', 'col2', 'col3'],
    as_=['key', 'value']
)

# Pivot long data to wide format: each distinct "metric" value becomes its own column
# (here the illustrative metric names "a" and "b" become columns "a" and "b"),
# leaving one row per category
chart = alt.Chart(data).mark_point().encode(
    x='a:Q',
    y='b:Q',
    tooltip='category:N'
).transform_pivot(
    'metric',
    value='value',
    groupby=['category']
)
```

### Calculated Fields

```python
# Simple calculation
chart = alt.Chart(data).mark_point().encode(
    x='x:Q',
    y='calculated_y:Q'
).transform_calculate(
    calculated_y='datum.y * 2 + 10'
)

# Multiple calculations
chart = alt.Chart(data).mark_point().encode(
    x='x:Q',
    y='ratio:Q',
    size='total:Q'
).transform_calculate(
    ratio='datum.numerator / datum.denominator',
    total='datum.numerator + datum.denominator'
)
```

### Statistical Analysis

```python
# Linear regression (keyword arguments avoid ambiguity about argument order)
chart = alt.Chart(data).mark_line().encode(
    x='x:Q',
    y='y:Q'
).transform_regression(
    regression='y',
    on='x',
    method='linear'
)

# LOESS smoothing
chart = alt.Chart(data).mark_line().encode(
    x='x:Q',
    y='y:Q'
).transform_loess(
    loess='y',
    on='x',
    bandwidth=0.3
)
```

## Types

```python { .api }
from typing import Union, Dict, Any, Literal

# Predicate types
Predicate = Union[str, Dict[str, Any]]

# Aggregation operation
AggregateOp = Literal[
    'argmax', 'argmin', 'average', 'count', 'distinct', 'max', 'mean',
    'median', 'min', 'missing', 'q1', 'q3', 'ci0', 'ci1', 'stderr',
    'stdev', 'stdevp', 'sum', 'valid', 'values', 'variance', 'variancep'
]

# Window operation
WindowOp = Union[AggregateOp, Literal[
    'row_number', 'rank', 'dense_rank', 'percent_rank', 'cume_dist',
    'ntile', 'lag', 'lead', 'first_value', 'last_value', 'nth_value'
]]

# Imputation method
ImputeMethod = Literal['value', 'mean', 'median', 'max', 'min']

# Stack offset
StackOffset = Literal['zero', 'center', 'normalize']

# Regression method
RegressionMethod = Literal['linear', 'log', 'exp', 'pow', 'quad', 'poly']

# Transform specifications
AggregateTransform = Dict[str, Any]
FilterTransform = Dict[str, Any]
CalculateTransform = Dict[str, Any]
BinTransform = Dict[str, Any]
WindowTransform = Dict[str, Any]
```