or run

npx @tessl/cli init
Log in

Version

Tile

Overview

Evals

Files

Files

docs

core-data-structures.mddata-manipulation.mdexpression-system.mdfile-io.mdindex.mdmathematical-functions.mdreductions-aggregations.mdrow-operations.mdset-operations.mdstring-operations.mdtime-operations.mdtype-system.md

expression-system.mddocs/

0

# Expression System

1

2

datatable's expression system provides a powerful and flexible way to reference columns and build complex data transformations using f and g objects.

3

4

## Capabilities

5

6

### Column References

7

8

Primary objects for referencing columns in Frame operations and building expressions.

9

10

```python { .api }

11

f: object

12

"""

13

Primary column reference object for selecting and transforming columns.

14

Used in most datatable operations for column selection and expressions.

15

"""

16

17

g: object

18

"""

19

Secondary column reference object primarily used in join operations

20

to distinguish between columns from different frames.

21

"""

22

```

23

24

### Expression Object

25

26

```python { .api }

27

class FExpr:

28

"""

29

Expression object representing column operations and transformations.

30

Created automatically when using f/g objects with operators.

31

"""

32

33

def __add__(self, other): ...

34

def __sub__(self, other): ...

35

def __mul__(self, other): ...

36

def __truediv__(self, other): ...

37

def __mod__(self, other): ...

38

def __pow__(self, other): ...

39

40

def __eq__(self, other): ...

41

def __ne__(self, other): ...

42

def __lt__(self, other): ...

43

def __le__(self, other): ...

44

def __gt__(self, other): ...

45

def __ge__(self, other): ...

46

47

def __and__(self, other): ...

48

def __or__(self, other): ...

49

def __invert__(self): ...

50

```

51

52

### Grouping and Joining Helpers

53

54

```python { .api }

55

def by(*args):

56

"""

57

Create grouping specification for group-by operations.

58

59

Parameters:

60

- *args: Column expressions or names to group by

61

62

Returns:

63

Grouping object for use in Frame operations

64

"""

65

66

def join(frame, on=None, how='inner'):

67

"""

68

Create join specification for joining frames.

69

70

Parameters:

71

- frame: Frame to join with

72

- on: Columns to join on (auto-detected if None)

73

- how: Join type ('inner', 'left', 'right', 'outer')

74

75

Returns:

76

Join object for use in Frame operations

77

"""

78

```

79

80

### Update Operations

81

82

```python { .api }

83

def update(**kwargs):

84

"""

85

Create update specification for adding or modifying columns.

86

87

Parameters:

88

- **kwargs: column_name=expression pairs

89

90

Returns:

91

Update object for use in Frame operations

92

"""

93

```

94

95

## Expression Examples

96

97

### Basic Column Selection

98

99

```python

100

import datatable as dt

101

from datatable import f, g

102

103

DT = dt.Frame({

104

'A': [1, 2, 3, 4, 5],

105

'B': [10, 20, 30, 40, 50],

106

'C': ['x', 'y', 'z', 'x', 'y']

107

})

108

109

# Column selection

110

DT[:, f.A] # Select column A

111

DT[:, f[0]] # Select first column by index

112

DT[:, f['A']] # Select column by name string

113

DT[:, [f.A, f.B]] # Select multiple columns

114

DT[:, f.A:f.C] # Select column range

115

DT[:, f[:]] # Select all columns

116

```

117

118

### Mathematical Expressions

119

120

```python

121

# Arithmetic operations

122

DT[:, f.A + f.B] # Add columns

123

DT[:, f.A * 2] # Multiply by scalar

124

DT[:, f.A / f.B] # Division

125

DT[:, f.A ** 2] # Power

126

DT[:, f.A % 3] # Modulo

127

128

# Create new columns with expressions

129

DT[:, dt.update(D=f.A + f.B, E=f.A * f.B)]

130

```

131

132

### Conditional Expressions

133

134

```python

135

# Boolean operations

136

DT[f.A > 2, :] # Filter rows

137

DT[f.A > f.B, :] # Compare columns

138

DT[(f.A > 2) & (f.B < 40), :] # Logical AND

139

DT[(f.A < 2) | (f.A > 4), :] # Logical OR

140

DT[~(f.A == 3), :] # Logical NOT

141

142

# Conditional assignment

143

DT[:, dt.update(flag=dt.ifelse(f.A > 3, "high", "low"))]

144

```

145

146

### String Operations

147

148

```python

149

# String operations using dt.str module

150

DT[:, dt.str.len(f.C)] # String length

151

DT[:, dt.str.slice(f.C, 0, 1)] # String slicing

152

153

# String comparisons

154

DT[f.C == 'x', :] # String equality

155

DT[dt.re.match(f.C, '[xy]'), :] # Regex matching

156

```

157

158

### Aggregation with Expressions

159

160

```python

161

# Basic aggregations

162

DT[:, dt.sum(f.A)] # Sum of column A

163

DT[:, dt.mean(f.B)] # Mean of column B

164

DT[:, dt.count()] # Row count

165

166

# Group-by operations

167

DT[:, dt.sum(f.A), dt.by(f.C)] # Sum A grouped by C

168

DT[:, [dt.sum(f.A), dt.mean(f.B)], dt.by(f.C)] # Multiple aggregations

169

170

# Multiple grouping columns

171

DT[:, dt.sum(f.A), dt.by(f.C, f.A > 2)]

172

```

173

174

### Join Operations with g

175

176

```python

177

DT2 = dt.Frame({

178

'C': ['x', 'y', 'z'],

179

'X': [100, 200, 300]

180

})

181

182

# Join using g to reference right frame columns

183

result = DT[:, [f.A, f.B, g.X], dt.join(DT2)]

184

185

# Join with conditions

186

result = DT[:, :, dt.join(DT2, on=f.C==g.C)]

187

```

188

189

### Time Series Expressions

190

191

```python

192

# Time component extraction

193

DT_time = dt.Frame({

194

'date': [dt.datetime(2023, 1, 1), dt.datetime(2023, 6, 15)]

195

})

196

197

DT_time[:, dt.time.year(f.date)] # Extract year

198

DT_time[:, dt.time.month(f.date)] # Extract month

199

DT_time[:, dt.time.day(f.date)] # Extract day

200

```

201

202

### Mathematical Functions

203

204

```python

205

# Mathematical functions

206

DT[:, dt.abs(f.A - 3)] # Absolute value

207

DT[:, dt.exp(f.A)] # Exponential

208

DT[:, dt.log(f.A)] # Natural logarithm

209

DT[:, dt.sqrt(f.A)] # Square root

210

DT[:, dt.isna(f.A)] # Check for missing values

211

```

212

213

### Row-wise Operations

214

215

```python

216

# Row-wise operations across columns

217

DT[:, dt.rowsum(f.A, f.B)] # Row-wise sum

218

DT[:, dt.rowmean(f.A, f.B)] # Row-wise mean

219

DT[:, dt.rowmax(f.A, f.B)] # Row-wise maximum

220

DT[:, dt.rowcount(f.A, f.B)] # Row-wise count of non-missing

221

```

222

223

### Complex Expressions

224

225

```python

226

# Chained operations

227

result = DT[f.A > dt.mean(f.A),

228

[f.A, f.B, dt.update(ratio=f.A/f.B)]]

229

230

# Conditional aggregations

231

DT[:, dt.sum(dt.ifelse(f.A > 3, f.B, 0)), dt.by(f.C)]

232

233

# Window functions with shift

234

DT[:, dt.update(prev_A=dt.shift(f.A, 1)), dt.by(f.C)]

235

```

236

237

## Types

238

239

### Expression Types

240

241

```python { .api }

242

class Expr:

243

"""Base expression class for column operations"""

244

pass

245

```