# Expression System and Evaluation

Catalyst's expression system provides a framework for representing computations, predicates, and data transformations. Expressions form the foundation of SQL operations and can be evaluated either by interpretation (`eval`) or through generated Java code (`genCode`).

## Capabilities

### Expression Base Class

Core interface for all expressions in Catalyst, with evaluation and code generation capabilities.

```scala { .api }
/**
 * Base class for all expressions in Catalyst
 * Expressions represent computations that can be evaluated on input data
 */
abstract class Expression extends TreeNode[Expression] {
  /** Data type of the expression result */
  def dataType: DataType

  /** Whether expression can produce null values */
  def nullable: Boolean

  /** Whether expression can be constant folded */
  def foldable: Boolean

  /** Whether expression is deterministic (same input -> same output) */
  def deterministic: Boolean = true

  /** Set of attributes referenced by this expression */
  def references: AttributeSet

  /** Evaluate expression on input row (interpreted mode) */
  def eval(input: InternalRow = null): Any

  /** Generate Java code for expression (code generation mode) */
  def genCode(ctx: CodegenContext): ExprCode

  /** Validate input data types */
  def checkInputDataTypes(): TypeCheckResult

  /** Human-readable name for expression */
  def prettyName: String

  /** String representation for SQL */
  def sql: String

  /** Clone expression with new children */
  def withNewChildren(newChildren: Seq[Expression]): Expression
}
```

**Usage Examples:**

```scala
import org.apache.spark.sql.catalyst.expressions._
import org.apache.spark.sql.types._

// Create literal expression
val literal = Literal(42, IntegerType)
println(literal.dataType)      // IntegerType
println(literal.nullable)      // false
println(literal.foldable)      // true
println(literal.deterministic) // true

// Evaluate expression (a literal needs no input row)
val result = literal.eval() // Returns 42

// Expression properties
println(literal.prettyName) // "literal" (defaults to the lowercased node name)
println(literal.sql)        // "42"
```
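
The `foldable` and `eval` members work together: an expression whose inputs are all constants can be evaluated once, before execution, and replaced by a literal. The following is a minimal sketch of that idea on a deliberately tiny expression tree. `MiniExpr`, `Lit`, `Col`, `Plus`, and `constantFold` are hypothetical names used for illustration only, not Catalyst classes, and an input row is modeled as a plain `Map`.

```scala
object FoldingSketch {
  // Hypothetical miniature of an expression tree; not the Catalyst API.
  sealed trait MiniExpr {
    def foldable: Boolean
    def eval(row: Map[String, Int]): Int
  }

  case class Lit(value: Int) extends MiniExpr {
    val foldable = true
    def eval(row: Map[String, Int]): Int = value
  }

  case class Col(name: String) extends MiniExpr {
    val foldable = false
    def eval(row: Map[String, Int]): Int = row(name)
  }

  case class Plus(left: MiniExpr, right: MiniExpr) extends MiniExpr {
    // Foldable only when both children are foldable
    def foldable: Boolean = left.foldable && right.foldable
    def eval(row: Map[String, Int]): Int = left.eval(row) + right.eval(row)
  }

  // Bottom-up pass: fold the children first, then replace any node that has
  // become foldable by a literal holding its value.
  def constantFold(e: MiniExpr): MiniExpr = e match {
    case Plus(l, r) =>
      val folded = Plus(constantFold(l), constantFold(r))
      if (folded.foldable) Lit(folded.eval(Map.empty)) else folded
    case other => other
  }
}
```

In this model, `constantFold(Plus(Col("a"), Plus(Lit(1), Lit(2))))` collapses the constant subtree and leaves the column reference untouched.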
71
72
### Expression Traits and Hierarchy
73
74
Specialized traits for different expression patterns:
75
76
```scala { .api }
77
/** Expression with no child expressions */
78
trait LeafExpression extends Expression {
79
def children: Seq[Expression] = Nil
80
}
81
82
/** Expression with one child expression */
83
trait UnaryExpression extends Expression {
84
def child: Expression
85
def children: Seq[Expression] = child :: Nil
86
}
87
88
/** Expression with two child expressions */
89
trait BinaryExpression extends Expression {
90
def left: Expression
91
def right: Expression
92
def children: Seq[Expression] = Seq(left, right)
93
}
94
95
/** Expression with three child expressions */
96
trait TernaryExpression extends Expression {
97
def first: Expression
98
def second: Expression
99
def third: Expression
100
def children: Seq[Expression] = Seq(first, second, third)
101
}
102
103
/** Special case of BinaryExpression requiring same data type for both children */
104
trait BinaryOperator extends BinaryExpression {
105
def dataType: DataType = left.dataType
106
}
107
108
/** Marker for expressions that are not deterministic */
109
trait Nondeterministic extends Expression {
110
final override def deterministic: Boolean = false
111
112
/** Initialize per-partition state */
113
def initializeInternal(partitionIndex: Int): Unit
114
115
/** Evaluate with nondeterministic behavior */
116
def evalInternal(input: InternalRow): Any
117
}
118
119
/** Expression that should not be evaluated (analysis-time only) */
120
trait Unevaluable extends Expression {
121
final override def eval(input: InternalRow = null): Any =
122
throw new UnsupportedOperationException(s"Cannot evaluate expression: $this")
123
}
124
125
/** Expression that falls back to interpreted mode (no code generation) */
126
trait CodegenFallback extends Expression
127
128
/** Expression that cannot be expressed in SQL */
129
trait NonSQLExpression extends Expression
130
131
/** Expression replaced by another expression at runtime */
132
trait RuntimeReplaceable extends UnaryExpression with Unevaluable {
133
def replacement: Expression
134
}
135
```
136
137
### Named Expressions
138
139
Expressions that have names and can be referenced in query plans:
140
141
```scala { .api }
142
/**
143
* Expression that has a name and can be referenced
144
*/
145
trait NamedExpression extends Expression {
146
/** Expression name */
147
def name: String
148
149
/** Unique expression ID */
150
def exprId: ExprId
151
152
/** Qualification path (e.g., table name) */
153
def qualifiers: Seq[String]
154
155
/** Convert to attribute reference */
156
def toAttribute: Attribute
157
158
/** Create new instance with same name but different ID */
159
def newInstance(): NamedExpression
160
161
/** Create copy with new qualifiers */
162
def withQualifier(newQualifier: Seq[String]): NamedExpression
163
164
/** Metadata associated with this expression */
165
def metadata: Metadata
166
}
167
168
/**
169
* Reference to an attribute/column
170
*/
171
abstract class Attribute extends LeafExpression with NamedExpression with Unevaluable {
172
/** Create copy with new name */
173
def withName(newName: String): Attribute
174
175
/** Create copy with new qualifier */
176
def withQualifier(newQualifier: Seq[String]): Attribute
177
178
/** Create copy with new expression ID */
179
def withExprId(newExprId: ExprId): Attribute
180
181
/** Create copy with new data type */
182
def withDataType(newType: DataType): Attribute
183
184
/** Create copy with new nullability */
185
def withNullability(newNullability: Boolean): Attribute
186
187
/** Create copy with new metadata */
188
def withMetadata(newMetadata: Metadata): Attribute
189
}
190
191
/**
192
* Reference to an attribute with name, data type, and nullability
193
*/
194
case class AttributeReference(
195
name: String,
196
dataType: DataType,
197
nullable: Boolean = true,
198
metadata: Metadata = Metadata.empty)(
199
val exprId: ExprId = NamedExpression.newExprId,
200
val qualifier: Seq[String] = Seq.empty[String]) extends Attribute {
201
202
def qualifiers: Seq[String] = qualifier
203
204
// Implementation of withX methods for creating copies
205
def withName(newName: String): AttributeReference = copy(name = newName)(exprId, qualifier)
206
def withQualifier(newQualifier: Seq[String]): AttributeReference = copy()(exprId, newQualifier)
207
def withExprId(newExprId: ExprId): AttributeReference = copy()(newExprId, qualifier)
208
def withDataType(newType: DataType): AttributeReference = copy(dataType = newType)(exprId, qualifier)
209
def withNullability(newNullability: Boolean): AttributeReference = copy(nullable = newNullability)(exprId, qualifier)
210
def withMetadata(newMetadata: Metadata): AttributeReference = copy(metadata = newMetadata)(exprId, qualifier)
211
212
def newInstance(): AttributeReference = copy()(NamedExpression.newExprId, qualifier)
213
def toAttribute: AttributeReference = this
214
}
215
216
/**
217
* Expression with an alias name
218
*/
219
case class Alias(
220
child: Expression,
221
name: String)(
222
val exprId: ExprId = NamedExpression.newExprId,
223
val qualifier: Seq[String] = Seq.empty[String],
224
val explicitMetadata: Option[Metadata] = None) extends UnaryExpression with NamedExpression {
225
226
def qualifiers: Seq[String] = qualifier
227
def dataType: DataType = child.dataType
228
def nullable: Boolean = child.nullable
229
def metadata: Metadata = explicitMetadata.getOrElse(Metadata.empty)
230
231
def newInstance(): NamedExpression = copy()(NamedExpression.newExprId, qualifier, explicitMetadata)
232
def toAttribute: Attribute = AttributeReference(name, dataType, nullable, metadata)(exprId, qualifier)
233
}
234
```
235
236
**Usage Examples:**
237
238
```scala
239
// Attribute reference creation
240
val nameAttr = AttributeReference("name", StringType, nullable = false)()
241
val ageAttr = AttributeReference("age", IntegerType, nullable = false)()
242
243
// Qualified attribute
244
val qualifiedAttr = AttributeReference("id", LongType)(
245
exprId = NamedExpression.newExprId,
246
qualifier = Seq("users")
247
)
248
249
// Alias creation
250
val ageAlias = Alias(ageAttr, "user_age")()
251
val computedAlias = Alias(Add(ageAttr, Literal(1)), "age_plus_one")()
252
253
// Attribute properties
254
println(nameAttr.name) // "name"
255
println(nameAttr.dataType) // StringType
256
println(nameAttr.nullable) // false
257
println(nameAttr.exprId) // Unique ExprId
258
259
// Create copies with modifications
260
val nullableNameAttr = nameAttr.withNullability(true)
261
val renamedAttr = nameAttr.withName("full_name")
262
```
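
Because `Attribute` is `Unevaluable`, a column reference has to be resolved to a position in the input row before it can be evaluated, and the `exprId` is what keeps two columns with the same name apart during that step. The sketch below illustrates the idea with plain Scala types. `Attr`, `Bound`, and `bind` are hypothetical stand-ins chosen for this example, not Catalyst classes, and a row is modeled as an `IndexedSeq[Any]`.

```scala
object BindingSketch {
  // Hypothetical stand-in for an attribute reference: a name plus a unique ID.
  case class Attr(name: String, exprId: Long)

  // Hypothetical stand-in for a bound reference: reads a row slot by ordinal.
  case class Bound(ordinal: Int) {
    def eval(row: IndexedSeq[Any]): Any = row(ordinal)
  }

  // Resolve an attribute against the input schema by exprId, never by name,
  // so that two columns that share a name are not confused.
  def bind(attr: Attr, schema: Seq[Attr]): Bound = {
    val ordinal = schema.indexWhere(_.exprId == attr.exprId)
    require(ordinal >= 0, s"Attribute ${attr.name}#${attr.exprId} not found in input schema")
    Bound(ordinal)
  }
}
```

With a schema holding two attributes both named `id` (for example after a self-join), binding the second one by `exprId` still selects the second row slot.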
263
264
### Literals and Constants
265
266
Constant value expressions for representing literal data:
267
268
```scala { .api }
269
/**
270
* Constant literal value expression
271
*/
272
case class Literal(value: Any, dataType: DataType) extends LeafExpression {
273
def nullable: Boolean = value == null
274
def foldable: Boolean = true
275
276
def eval(input: InternalRow = null): Any = value
277
278
override def toString: String = if (value == null) "null" else value.toString
279
}
280
281
object Literal {
282
/** Create literal with inferred type */
283
def apply(v: Any): Literal = v match {
284
case null => Literal(null, NullType)
285
case b: Boolean => Literal(b, BooleanType)
286
case b: Byte => Literal(b, ByteType)
287
case s: Short => Literal(s, ShortType)
288
case i: Int => Literal(i, IntegerType)
289
case l: Long => Literal(l, LongType)
290
case f: Float => Literal(f, FloatType)
291
case d: Double => Literal(d, DoubleType)
292
case s: String => Literal(s, StringType)
293
case d: java.math.BigDecimal => Literal(d, DecimalType.fromBigDecimal(d))
294
case d: java.sql.Date => Literal(d, DateType)
295
case t: java.sql.Timestamp => Literal(t, TimestampType)
296
case a: Array[Byte] => Literal(a, BinaryType)
297
case _ => throw new RuntimeException(s"Unsupported literal type: ${v.getClass}")
298
}
299
300
/** Create literal with explicit type */
301
def create(v: Any, dataType: DataType): Literal = Literal(v, dataType)
302
303
/** Create from Scala object with proper internal representation */
304
def fromObject(obj: Any): Literal = apply(obj)
305
306
/** Create default value for data type */
307
def default(dataType: DataType): Literal = dataType match {
308
case BooleanType => Literal(false)
309
case ByteType => Literal(0.toByte)
310
case ShortType => Literal(0.toShort)
311
case IntegerType => Literal(0)
312
case LongType => Literal(0L)
313
case FloatType => Literal(0.0f)
314
case DoubleType => Literal(0.0)
315
case StringType => Literal("")
316
case _ => Literal(null, dataType)
317
}
318
}
319
```
320
321
**Usage Examples:**
322
323
```scala
324
// Literal creation with type inference
325
val intLit = Literal(42) // Literal(42, IntegerType)
326
val stringLit = Literal("hello") // Literal("hello", StringType)
327
val nullLit = Literal(null) // Literal(null, NullType)
328
329
// Explicit type creation
330
val decimalLit = Literal.create(new java.math.BigDecimal("123.45"), DecimalType(5, 2))
331
332
// Default values
333
val defaultInt = Literal.default(IntegerType) // Literal(0, IntegerType)
334
val defaultString = Literal.default(StringType) // Literal("", StringType)
335
336
// Evaluation
337
val result = intLit.eval() // Returns 42
338
println(intLit.foldable) // true (can be constant folded)
339
```
340
341
### Predicates and Comparisons
342
343
Boolean expressions for filtering and conditional logic:
344
345
```scala { .api }
346
/**
347
* Base class for boolean expressions/predicates
348
*/
349
abstract class Predicate extends Expression {
350
def dataType: DataType = BooleanType
351
}
352
353
/**
354
* Base class for binary comparison expressions
355
*/
356
abstract class Comparison extends BinaryExpression with Predicate
357
358
/** Equality comparison (=) */
359
case class EqualTo(left: Expression, right: Expression) extends BinaryOperator with Comparison {
360
def symbol: String = "="
361
362
def nullSafeEval(input1: Any, input2: Any): Any = input1 == input2
363
}
364
365
/** Null-safe equality comparison (<=>) */
366
case class EqualNullSafe(left: Expression, right: Expression) extends BinaryOperator with Comparison {
367
def symbol: String = "<=>"
368
override def nullable: Boolean = false
369
370
def nullSafeEval(input1: Any, input2: Any): Any = {
371
if (input1 == null && input2 == null) true
372
else if (input1 == null || input2 == null) false
373
else input1 == input2
374
}
375
}
376
377
/** Greater than comparison (>) */
378
case class GreaterThan(left: Expression, right: Expression) extends BinaryOperator with Comparison {
379
def symbol: String = ">"
380
381
def nullSafeEval(input1: Any, input2: Any): Any = {
382
RowOrdering.compare(input1, input2, left.dataType) > 0
383
}
384
}
385
386
/** Greater than or equal comparison (>=) */
387
case class GreaterThanOrEqual(left: Expression, right: Expression) extends BinaryOperator with Comparison {
388
def symbol: String = ">="
389
390
def nullSafeEval(input1: Any, input2: Any): Any = {
391
RowOrdering.compare(input1, input2, left.dataType) >= 0
392
}
393
}
394
395
/** Less than comparison (<) */
396
case class LessThan(left: Expression, right: Expression) extends BinaryOperator with Comparison {
397
def symbol: String = "<"
398
399
def nullSafeEval(input1: Any, input2: Any): Any = {
400
RowOrdering.compare(input1, input2, left.dataType) < 0
401
}
402
}
403
404
/** Less than or equal comparison (<=) */
405
case class LessThanOrEqual(left: Expression, right: Expression) extends BinaryOperator with Comparison {
406
def symbol: String = "<="
407
408
def nullSafeEval(input1: Any, input2: Any): Any = {
409
RowOrdering.compare(input1, input2, left.dataType) <= 0
410
}
411
}
412
413
/** Logical AND operation */
414
case class And(left: Expression, right: Expression) extends BinaryOperator with Predicate {
415
def symbol: String = "&&"
416
417
def eval(input: InternalRow): Any = {
418
val leftResult = left.eval(input)
419
if (leftResult == false) {
420
false
421
} else {
422
val rightResult = right.eval(input)
423
if (rightResult == false) false
424
else if (leftResult == null || rightResult == null) null
425
else true
426
}
427
}
428
}
429
430
/** Logical OR operation */
431
case class Or(left: Expression, right: Expression) extends BinaryOperator with Predicate {
432
def symbol: String = "||"
433
434
def eval(input: InternalRow): Any = {
435
val leftResult = left.eval(input)
436
if (leftResult == true) {
437
true
438
} else {
439
val rightResult = right.eval(input)
440
if (rightResult == true) true
441
else if (leftResult == null || rightResult == null) null
442
else false
443
}
444
}
445
}
446
447
/** Logical NOT operation */
448
case class Not(child: Expression) extends UnaryExpression with Predicate {
449
def dataType: DataType = BooleanType
450
451
def eval(input: InternalRow): Any = {
452
val childResult = child.eval(input)
453
if (childResult == null) null
454
else !childResult.asInstanceOf[Boolean]
455
}
456
}
457
458
/** IS NULL predicate */
459
case class IsNull(child: Expression) extends UnaryExpression with Predicate {
460
override def nullable: Boolean = false
461
462
def eval(input: InternalRow): Any = child.eval(input) == null
463
}
464
465
/** IS NOT NULL predicate */
466
case class IsNotNull(child: Expression) extends UnaryExpression with Predicate {
467
override def nullable: Boolean = false
468
469
def eval(input: InternalRow): Any = child.eval(input) != null
470
}
471
472
/** IN predicate (value IN (list)) */
473
case class In(value: Expression, list: Seq[Expression]) extends Predicate {
474
def children: Seq[Expression] = value +: list
475
def nullable: Boolean = children.exists(_.nullable)
476
477
def eval(input: InternalRow): Any = {
478
val evaluatedValue = value.eval(input)
479
if (evaluatedValue == null) {
480
null
481
} else {
482
list.exists { item =>
483
val itemValue = item.eval(input)
484
itemValue != null && evaluatedValue == itemValue
485
}
486
}
487
}
488
}
489
```
490
491
**Usage Examples:**
492
493
```scala
494
// Comparison expressions
495
val ageAttr = AttributeReference("age", IntegerType)()
496
val nameAttr = AttributeReference("name", StringType)()
497
498
val ageFilter = GreaterThan(ageAttr, Literal(25))
499
val nameFilter = EqualTo(nameAttr, Literal("Alice"))
500
501
// Logical operations
502
val combinedFilter = And(ageFilter, nameFilter)
503
val eitherFilter = Or(ageFilter, nameFilter)
504
val notFilter = Not(ageFilter)
505
506
// Null checks
507
val nullCheck = IsNull(nameAttr)
508
val notNullCheck = IsNotNull(nameAttr)
509
510
// IN predicate
511
val statusIn = In(
512
AttributeReference("status", StringType)(),
513
Seq(Literal("active"), Literal("pending"), Literal("verified"))
514
)
515
516
// Expression evaluation with sample data
517
val sampleRow = InternalRow(30, "Alice", "active")
518
val result1 = ageFilter.eval(sampleRow) // true (30 > 25)
519
val result2 = nameFilter.eval(sampleRow) // true ("Alice" == "Alice")
520
val result3 = combinedFilter.eval(sampleRow) // true (both conditions true)
521
```
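
The `eval` bodies of `And`, `Or`, and `Not` above follow SQL's three-valued logic, in which a NULL operand yields NULL unless the other operand already decides the result. As a rough, self-contained illustration, the sketch below restates those rules with `Option[Boolean]`, where `None` stands for NULL. The object name `ThreeValuedLogic` and its functions are hypothetical and exist only for this example.

```scala
object ThreeValuedLogic {
  // SQL booleans modeled as Option[Boolean]; None stands for NULL.
  type SqlBool = Option[Boolean]

  def and(left: SqlBool, right: SqlBool): SqlBool = (left, right) match {
    case (Some(false), _) | (_, Some(false)) => Some(false) // false decides the result
    case (Some(true), Some(true))            => Some(true)
    case _                                   => None        // NULL with no false operand
  }

  def or(left: SqlBool, right: SqlBool): SqlBool = (left, right) match {
    case (Some(true), _) | (_, Some(true)) => Some(true) // true decides the result
    case (Some(false), Some(false))        => Some(false)
    case _                                 => None       // NULL with no true operand
  }

  // NOT NULL stays NULL
  def not(value: SqlBool): SqlBool = value.map(!_)
}
```

This is why a filter such as `age > 25 AND name = 'Alice'` can still evaluate to `false` for a row whose `name` is NULL, as long as the `age` comparison is false.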
522
523
### Arithmetic Expressions
524
525
Mathematical operations and numeric computations:
526
527
```scala { .api }
528
/**
529
* Base trait for binary arithmetic expressions
530
*/
531
trait BinaryArithmetic extends BinaryOperator {
532
/** Symbol for the arithmetic operation */
533
def symbol: String
534
535
/** Null-safe evaluation of the arithmetic operation */
536
def nullSafeEval(input1: Any, input2: Any): Any
537
538
def eval(input: InternalRow): Any = {
539
val leftResult = left.eval(input)
540
if (leftResult == null) {
541
null
542
} else {
543
val rightResult = right.eval(input)
544
if (rightResult == null) null
545
else nullSafeEval(leftResult, rightResult)
546
}
547
}
548
}
549
550
/** Addition operation (+) */
551
case class Add(left: Expression, right: Expression) extends BinaryArithmetic {
552
def symbol: String = "+"
553
554
def nullSafeEval(input1: Any, input2: Any): Any = dataType match {
555
case ByteType => input1.asInstanceOf[Byte] + input2.asInstanceOf[Byte]
556
case ShortType => input1.asInstanceOf[Short] + input2.asInstanceOf[Short]
557
case IntegerType => input1.asInstanceOf[Int] + input2.asInstanceOf[Int]
558
case LongType => input1.asInstanceOf[Long] + input2.asInstanceOf[Long]
559
case FloatType => input1.asInstanceOf[Float] + input2.asInstanceOf[Float]
560
case DoubleType => input1.asInstanceOf[Double] + input2.asInstanceOf[Double]
561
case DecimalType() => input1.asInstanceOf[Decimal] + input2.asInstanceOf[Decimal]
562
}
563
}
564
565
/** Subtraction operation (-) */
566
case class Subtract(left: Expression, right: Expression) extends BinaryArithmetic {
567
def symbol: String = "-"
568
569
def nullSafeEval(input1: Any, input2: Any): Any = dataType match {
570
case ByteType => input1.asInstanceOf[Byte] - input2.asInstanceOf[Byte]
571
case ShortType => input1.asInstanceOf[Short] - input2.asInstanceOf[Short]
572
case IntegerType => input1.asInstanceOf[Int] - input2.asInstanceOf[Int]
573
case LongType => input1.asInstanceOf[Long] - input2.asInstanceOf[Long]
574
case FloatType => input1.asInstanceOf[Float] - input2.asInstanceOf[Float]
575
case DoubleType => input1.asInstanceOf[Double] - input2.asInstanceOf[Double]
576
case DecimalType() => input1.asInstanceOf[Decimal] - input2.asInstanceOf[Decimal]
577
}
578
}
579
580
/** Multiplication operation (*) */
581
case class Multiply(left: Expression, right: Expression) extends BinaryArithmetic {
582
def symbol: String = "*"
583
584
def nullSafeEval(input1: Any, input2: Any): Any = dataType match {
585
case ByteType => input1.asInstanceOf[Byte] * input2.asInstanceOf[Byte]
586
case ShortType => input1.asInstanceOf[Short] * input2.asInstanceOf[Short]
587
case IntegerType => input1.asInstanceOf[Int] * input2.asInstanceOf[Int]
588
case LongType => input1.asInstanceOf[Long] * input2.asInstanceOf[Long]
589
case FloatType => input1.asInstanceOf[Float] * input2.asInstanceOf[Float]
590
case DoubleType => input1.asInstanceOf[Double] * input2.asInstanceOf[Double]
591
case DecimalType() => input1.asInstanceOf[Decimal] * input2.asInstanceOf[Decimal]
592
}
593
}
594
595
/** Division operation (/) */
596
case class Divide(left: Expression, right: Expression) extends BinaryArithmetic {
597
def symbol: String = "/"
598
599
def nullSafeEval(input1: Any, input2: Any): Any = dataType match {
600
case DoubleType => input1.asInstanceOf[Double] / input2.asInstanceOf[Double]
601
case DecimalType() => input1.asInstanceOf[Decimal] / input2.asInstanceOf[Decimal]
602
case _ => sys.error(s"Type $dataType not supported.")
603
}
604
}
605
606
/** Modulo/remainder operation (%) */
607
case class Remainder(left: Expression, right: Expression) extends BinaryArithmetic {
608
def symbol: String = "%"
609
610
def nullSafeEval(input1: Any, input2: Any): Any = dataType match {
611
case ByteType => input1.asInstanceOf[Byte] % input2.asInstanceOf[Byte]
612
case ShortType => input1.asInstanceOf[Short] % input2.asInstanceOf[Short]
613
case IntegerType => input1.asInstanceOf[Int] % input2.asInstanceOf[Int]
614
case LongType => input1.asInstanceOf[Long] % input2.asInstanceOf[Long]
615
case FloatType => input1.asInstanceOf[Float] % input2.asInstanceOf[Float]
616
case DoubleType => input1.asInstanceOf[Double] % input2.asInstanceOf[Double]
617
case DecimalType() => input1.asInstanceOf[Decimal] % input2.asInstanceOf[Decimal]
618
}
619
}
620
621
/** Unary minus operation (-) */
622
case class UnaryMinus(child: Expression) extends UnaryExpression {
623
def dataType: DataType = child.dataType
624
def nullable: Boolean = child.nullable
625
626
def eval(input: InternalRow): Any = {
627
val childResult = child.eval(input)
628
if (childResult == null) {
629
null
630
} else {
631
dataType match {
632
case ByteType => -childResult.asInstanceOf[Byte]
633
case ShortType => -childResult.asInstanceOf[Short]
634
case IntegerType => -childResult.asInstanceOf[Int]
635
case LongType => -childResult.asInstanceOf[Long]
636
case FloatType => -childResult.asInstanceOf[Float]
637
case DoubleType => -childResult.asInstanceOf[Double]
638
case DecimalType() => -childResult.asInstanceOf[Decimal]
639
}
640
}
641
}
642
}
643
644
/** Unary plus operation (+) */
645
case class UnaryPositive(child: Expression) extends UnaryExpression {
646
def dataType: DataType = child.dataType
647
def nullable: Boolean = child.nullable
648
649
def eval(input: InternalRow): Any = child.eval(input)
650
}
651
652
/** Absolute value function */
653
case class Abs(child: Expression) extends UnaryExpression {
654
def dataType: DataType = child.dataType
655
def nullable: Boolean = child.nullable
656
657
def eval(input: InternalRow): Any = {
658
val childResult = child.eval(input)
659
if (childResult == null) {
660
null
661
} else {
662
dataType match {
663
case ByteType => math.abs(childResult.asInstanceOf[Byte])
664
case ShortType => math.abs(childResult.asInstanceOf[Short])
665
case IntegerType => math.abs(childResult.asInstanceOf[Int])
666
case LongType => math.abs(childResult.asInstanceOf[Long])
667
case FloatType => math.abs(childResult.asInstanceOf[Float])
668
case DoubleType => math.abs(childResult.asInstanceOf[Double])
669
case DecimalType() => childResult.asInstanceOf[Decimal].abs
670
}
671
}
672
}
673
}
674
```
675
676
**Usage Examples:**
677
678
```scala
679
// Arithmetic expression creation
680
val ageAttr = AttributeReference("age", IntegerType)()
681
val salaryAttr = AttributeReference("salary", DoubleType)()
682
683
val agePlus10 = Add(ageAttr, Literal(10))
684
val salaryMinus1000 = Subtract(salaryAttr, Literal(1000.0))
685
val doubleSalary = Multiply(salaryAttr, Literal(2.0))
686
val halfSalary = Divide(salaryAttr, Literal(2.0))
687
688
// Unary operations
689
val negativeAge = UnaryMinus(ageAttr)
690
val absoluteValue = Abs(UnaryMinus(salaryAttr))
691
692
// Evaluation with sample data
693
val row = InternalRow(25, 50000.0)
694
val result1 = agePlus10.eval(row) // 35
695
val result2 = doubleSalary.eval(row) // 100000.0
696
val result3 = negativeAge.eval(row) // -25
697
```
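
`BinaryArithmetic.eval` above checks each operand for null before delegating to `nullSafeEval`, and it skips evaluating the right operand entirely when the left one is already null. The sketch below isolates that pattern in plain Scala. `binaryEval` and `addInts` are hypothetical helper names for this illustration, not Catalyst functions, and `null` stands for SQL NULL as in the interpreted path.

```scala
object NullPropagation {
  // Hypothetical helper mirroring the null-check-then-nullSafeEval pattern.
  // `right` is passed by name so it is only evaluated when `left` is non-null.
  def binaryEval(left: Any, right: => Any)(nullSafeEval: (Any, Any) => Any): Any = {
    if (left == null) {
      null
    } else {
      val rightResult = right
      if (rightResult == null) null
      else nullSafeEval(left, rightResult)
    }
  }

  // Integer addition expressed through the helper; the casts are safe only
  // because both operands are assumed to have been type-checked as Int.
  def addInts(left: Any, right: => Any): Any =
    binaryEval(left, right)((a, b) => a.asInstanceOf[Int] + b.asInstanceOf[Int])
}
```

Keeping the null checks in one place means each concrete operator only has to supply the non-null case.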
698
699
### String Expressions
700
701
String manipulation and text processing operations:
702
703
```scala { .api }
704
/** String substring operation */
705
case class Substring(str: Expression, pos: Expression, len: Expression) extends TernaryExpression {
706
def dataType: DataType = StringType
707
def nullable: Boolean = str.nullable || pos.nullable || len.nullable
708
709
def first: Expression = str
710
def second: Expression = pos
711
def third: Expression = len
712
713
def eval(input: InternalRow): Any = {
714
val string = str.eval(input)
715
if (string == null) return null
716
717
val position = pos.eval(input)
718
if (position == null) return null
719
720
val length = len.eval(input)
721
if (length == null) return null
722
723
val s = string.asInstanceOf[UTF8String]
724
val start = position.asInstanceOf[Int] - 1 // 1-based to 0-based
725
val len = length.asInstanceOf[Int]
726
727
if (start < 0 || len < 0) UTF8String.EMPTY_UTF8
728
else s.substring(start, start + len)
729
}
730
}
731
732
/** String length function */
733
case class Length(child: Expression) extends UnaryExpression {
734
def dataType: DataType = IntegerType
735
def nullable: Boolean = child.nullable
736
737
def eval(input: InternalRow): Any = {
738
val childResult = child.eval(input)
739
if (childResult == null) null
740
else childResult.asInstanceOf[UTF8String].numChars()
741
}
742
}
743
744
/** Convert string to uppercase */
745
case class Upper(child: Expression) extends UnaryExpression {
746
def dataType: DataType = StringType
747
def nullable: Boolean = child.nullable
748
749
def eval(input: InternalRow): Any = {
750
val childResult = child.eval(input)
751
if (childResult == null) null
752
else childResult.asInstanceOf[UTF8String].toUpperCase
753
}
754
}
755
756
/** Convert string to lowercase */
757
case class Lower(child: Expression) extends UnaryExpression {
758
def dataType: DataType = StringType
759
def nullable: Boolean = child.nullable
760
761
def eval(input: InternalRow): Any = {
762
val childResult = child.eval(input)
763
if (childResult == null) null
764
else childResult.asInstanceOf[UTF8String].toLowerCase
765
}
766
}
767
768
/** String concatenation */
769
case class Concat(children: Seq[Expression]) extends Expression {
770
def dataType: DataType = StringType
771
def nullable: Boolean = children.exists(_.nullable)
772
773
def eval(input: InternalRow): Any = {
774
val inputs = children.map(_.eval(input))
775
if (inputs.contains(null)) return null
776
777
val strings = inputs.map(_.asInstanceOf[UTF8String])
778
UTF8String.concat(strings: _*)
779
}
780
}
781
782
/** String trimming (remove leading/trailing whitespace) */
783
case class StringTrim(child: Expression) extends UnaryExpression {
784
def dataType: DataType = StringType
785
def nullable: Boolean = child.nullable
786
787
def eval(input: InternalRow): Any = {
788
val childResult = child.eval(input)
789
if (childResult == null) null
790
else childResult.asInstanceOf[UTF8String].trim()
791
}
792
}
793
794
/** Left trim (remove leading whitespace) */
795
case class StringLTrim(child: Expression) extends UnaryExpression {
796
def dataType: DataType = StringType
797
def nullable: Boolean = child.nullable
798
799
def eval(input: InternalRow): Any = {
800
val childResult = child.eval(input)
801
if (childResult == null) null
802
else childResult.asInstanceOf[UTF8String].trimLeft()
803
}
804
}
805
806
/** Right trim (remove trailing whitespace) */
807
case class StringRTrim(child: Expression) extends UnaryExpression {
808
def dataType: DataType = StringType
809
def nullable: Boolean = child.nullable
810
811
def eval(input: InternalRow): Any = {
812
val childResult = child.eval(input)
813
if (childResult == null) null
814
else childResult.asInstanceOf[UTF8String].trimRight()
815
}
816
}
817
```
818
819
**Usage Examples:**
820
821
```scala
822
// String expression creation
823
val nameAttr = AttributeReference("name", StringType)()
824
val messageAttr = AttributeReference("message", StringType)()
825
826
val upperName = Upper(nameAttr)
827
val lowerName = Lower(nameAttr)
828
val nameLength = Length(nameAttr)
829
val substring = Substring(nameAttr, Literal(1), Literal(3)) // First 3 characters
830
831
// String concatenation
832
val greeting = Concat(Seq(Literal("Hello "), nameAttr, Literal("!")))
833
834
// String trimming
835
val trimmed = StringTrim(messageAttr)
836
837
// Evaluation with sample data
838
val row = InternalRow(UTF8String.fromString(" Alice "), UTF8String.fromString(" Hello World "))
839
val result1 = upperName.eval(row) // " ALICE "
840
val result2 = nameLength.eval(row) // 8 (including spaces)
841
val result3 = trimmed.eval(row) // "Hello World" (spaces removed)
842
```
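
The `Substring` definition above takes a 1-based position, converts it to a 0-based offset, and returns an empty string for a negative offset or length. The sketch below restates that arithmetic on a plain `String` so the index handling can be checked in isolation. `SubstringSketch.substring` is a hypothetical helper, it additionally clamps the range to the end of the string, and it counts UTF-16 code units rather than the characters of a `UTF8String`. Real SQL engines may treat non-positive positions differently.

```scala
object SubstringSketch {
  // Hypothetical helper on plain String; follows the simplified rules of the
  // Substring.eval listing above (1-based pos, empty result for bad input).
  def substring(s: String, pos: Int, len: Int): String = {
    val start = pos - 1 // 1-based to 0-based
    if (start < 0 || len < 0) {
      ""
    } else {
      // Clamp both ends so a range past the end of the string is not an error
      val from = math.min(start, s.length)
      val until = math.min(from.toLong + len, s.length.toLong).toInt
      s.substring(from, until)
    }
  }
}
```

For instance, position 1 with length 3 selects the first three characters, and a length that runs past the end of the string is silently truncated.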
843
844
### Mathematical Expressions
845
846
Mathematical functions and operations:
847
848
```scala { .api }
849
/**
850
* Base trait for unary mathematical expressions
851
*/
852
trait UnaryMathExpression extends UnaryExpression {
853
def dataType: DataType = DoubleType
854
def nullable: Boolean = child.nullable
855
856
/** The mathematical function to apply */
857
def mathFunction(input: Double): Double
858
859
def eval(input: InternalRow): Any = {
860
val childResult = child.eval(input)
861
if (childResult == null) {
862
null
863
} else {
864
val doubleValue = childResult match {
865
case d: Double => d
866
case f: Float => f.toDouble
867
case i: Int => i.toDouble
868
case l: Long => l.toDouble
869
case _ => sys.error(s"Unsupported data type: ${child.dataType}")
870
}
871
mathFunction(doubleValue)
872
}
873
}
874
}
875
876
/** Sine function */
877
case class Sin(child: Expression) extends UnaryMathExpression {
878
def mathFunction(input: Double): Double = math.sin(input)
879
override def prettyName: String = "sin"
880
}
881
882
/** Cosine function */
883
case class Cos(child: Expression) extends UnaryMathExpression {
884
def mathFunction(input: Double): Double = math.cos(input)
885
override def prettyName: String = "cos"
886
}
887
888
/** Tangent function */
889
case class Tan(child: Expression) extends UnaryMathExpression {
890
def mathFunction(input: Double): Double = math.tan(input)
891
override def prettyName: String = "tan"
892
}
893
894
/** Square root function */
895
case class Sqrt(child: Expression) extends UnaryMathExpression {
896
def mathFunction(input: Double): Double = math.sqrt(input)
897
override def prettyName: String = "sqrt"
898
}
899
900
/** Natural logarithm function */
901
case class Log(child: Expression) extends UnaryMathExpression {
902
def mathFunction(input: Double): Double = math.log(input)
903
override def prettyName: String = "ln"
904
}
905
906
/** Exponential function (e^x) */
907
case class Exp(child: Expression) extends UnaryMathExpression {
908
def mathFunction(input: Double): Double = math.exp(input)
909
override def prettyName: String = "exp"
910
}
911
912
/** Floor function */
913
case class Floor(child: Expression) extends UnaryMathExpression {
914
def mathFunction(input: Double): Double = math.floor(input)
915
override def prettyName: String = "floor"
916
}
917
918
/** Ceiling function */
919
case class Ceil(child: Expression) extends UnaryMathExpression {
920
def mathFunction(input: Double): Double = math.ceil(input)
921
override def prettyName: String = "ceil"
922
}
923
924
/**
925
* Base trait for binary mathematical expressions
926
*/
927
trait BinaryMathExpression extends BinaryExpression {
928
def dataType: DataType = DoubleType
929
def nullable: Boolean = left.nullable || right.nullable
930
931
/** The mathematical function to apply */
932
def mathFunction(left: Double, right: Double): Double
933
934
def eval(input: InternalRow): Any = {
935
val leftResult = left.eval(input)
936
if (leftResult == null) return null
937
938
val rightResult = right.eval(input)
939
if (rightResult == null) return null
940
941
val leftDouble = leftResult.asInstanceOf[Double]
942
val rightDouble = rightResult.asInstanceOf[Double]
943
mathFunction(leftDouble, rightDouble)
944
}
945
}
946
947
/** Round function with precision */
948
case class Round(child: Expression, scale: Expression) extends BinaryExpression {
949
def left: Expression = child
950
def right: Expression = scale
951
def dataType: DataType = child.dataType
952
def nullable: Boolean = child.nullable || scale.nullable
953
954
def eval(input: InternalRow): Any = {
955
val childResult = child.eval(input)
956
if (childResult == null) return null
957
958
val scaleResult = scale.eval(input)
959
if (scaleResult == null) return null
960
961
val scaleValue = scaleResult.asInstanceOf[Int]
962
// Implementation depends on child data type
963
childResult match {
964
case d: Double => BigDecimal(d).setScale(scaleValue, BigDecimal.RoundingMode.HALF_UP).doubleValue
965
case f: Float => BigDecimal(f.toDouble).setScale(scaleValue, BigDecimal.RoundingMode.HALF_UP).floatValue
966
case _ => childResult
967
}
968
}
969
}
970
971
/** Power function (base^exponent) */
972
case class Pow(left: Expression, right: Expression) extends BinaryMathExpression {
973
def mathFunction(left: Double, right: Double): Double = math.pow(left, right)
974
override def prettyName: String = "power"
975
}
976
```

**Usage Examples:**

```scala
// Mathematical expression creation
// BoundReference reads a column by ordinal; an unbound AttributeReference cannot be evaluated directly
val angleRef = BoundReference(0, DoubleType, nullable = true)
val valueRef = BoundReference(1, DoubleType, nullable = true)

val sineValue = Sin(angleRef)
val sqrtValue = Sqrt(valueRef)
val logValue = Log(valueRef)
val powerValue = Pow(valueRef, Literal(2.0)) // Square

// Rounding
val rounded = Round(valueRef, Literal(2)) // Round to 2 decimal places

// Evaluation with sample data
val row = InternalRow(math.Pi / 2, 16.0) // 90 degrees, value 16
val result1 = sineValue.eval(row) // ~1.0 (sin(π/2))
val result2 = sqrtValue.eval(row) // 4.0 (√16)
val result3 = powerValue.eval(row) // 256.0 (16²)
```
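
The null-propagation rule used by `BinaryMathExpression.eval` can be tried without a Spark dependency. The following is a minimal sketch rather than Catalyst code: `Expr`, `Col`, `Lit`, and `BinaryMath` are hypothetical stand-ins, and a row is modeled as a plain `Seq[Any]`.

```scala
// Hypothetical mini expression tree (not Catalyst classes); rows are plain Seq[Any].
object MathSketch {
  sealed trait Expr { def eval(row: Seq[Any]): Any }

  // Reads the value at a fixed ordinal, like a bound column reference.
  final case class Col(ordinal: Int) extends Expr {
    def eval(row: Seq[Any]): Any = row(ordinal)
  }

  // A constant value that ignores the input row.
  final case class Lit(value: Any) extends Expr {
    def eval(row: Seq[Any]): Any = value
  }

  // Applies f only when both operands are non-null; otherwise the result is null.
  final case class BinaryMath(left: Expr, right: Expr, f: (Double, Double) => Double) extends Expr {
    def eval(row: Seq[Any]): Any = {
      val l = left.eval(row)
      if (l == null) return null
      val r = right.eval(row)
      if (r == null) return null
      f(l.asInstanceOf[Double], r.asInstanceOf[Double])
    }
  }

  def pow(base: Expr, exponent: Expr): Expr = BinaryMath(base, exponent, math.pow)

  val squared: Any = pow(Col(0), Lit(2.0)).eval(Seq(16.0)) // 16 squared
  val nullIn: Any = pow(Col(0), Lit(2.0)).eval(Seq(null))  // null operand propagates
}
```

Checking the left operand before evaluating the right one keeps the same short-circuit order as the listing above, so the right child is never evaluated when the left is already null.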

### Complex Type Operations

Operations for working with arrays, maps, and structs:

```scala { .api }
/** Extract item from array by index */
case class GetArrayItem(left: Expression, right: Expression) extends BinaryExpression {
  def dataType: DataType = left.dataType.asInstanceOf[ArrayType].elementType
  def nullable: Boolean = true // Array access can return null

  def eval(input: InternalRow): Any = {
    val arrayResult = left.eval(input)
    if (arrayResult == null) return null

    val indexResult = right.eval(input)
    if (indexResult == null) return null

    val array = arrayResult.asInstanceOf[ArrayData]
    val index = indexResult.asInstanceOf[Int]

    if (index < 0 || index >= array.numElements() || array.isNullAt(index)) {
      null
    } else {
      array.get(index, dataType)
    }
  }
}

/** Extract value from map by key */
case class GetMapValue(left: Expression, right: Expression) extends BinaryExpression {
  def dataType: DataType = left.dataType.asInstanceOf[MapType].valueType
  def nullable: Boolean = true

  def eval(input: InternalRow): Any = {
    val mapResult = left.eval(input)
    if (mapResult == null) return null

    val keyResult = right.eval(input)
    if (keyResult == null) return null

    val map = mapResult.asInstanceOf[MapData]
    val key = keyResult

    // Find key in map and return corresponding value
    val keys = map.keyArray()
    val values = map.valueArray()
    val keyType = left.dataType.asInstanceOf[MapType].keyType

    for (i <- 0 until keys.numElements()) {
      if (!keys.isNullAt(i) && keys.get(i, keyType) == key) {
        if (values.isNullAt(i)) return null
        else return values.get(i, dataType)
      }
    }
    null // Key not found
  }
}

/** Extract field from struct by index */
case class GetStructField(child: Expression, ordinal: Int, name: Option[String] = None) extends UnaryExpression {
  def dataType: DataType = child.dataType.asInstanceOf[StructType].fields(ordinal).dataType
  def nullable: Boolean = child.nullable || child.dataType.asInstanceOf[StructType].fields(ordinal).nullable

  def eval(input: InternalRow): Any = {
    val structResult = child.eval(input)
    if (structResult == null) return null

    val struct = structResult.asInstanceOf[InternalRow]
    if (struct.isNullAt(ordinal)) null
    else struct.get(ordinal, dataType)
  }
}

/** Create array from expressions */
case class CreateArray(children: Seq[Expression]) extends Expression {
  def dataType: DataType = {
    val elementType = if (children.nonEmpty) children.head.dataType else NullType
    ArrayType(elementType, containsNull = children.exists(_.nullable))
  }
  def nullable: Boolean = false // Array itself is never null (but elements can be)

  def eval(input: InternalRow): Any = {
    val values = children.map(_.eval(input))
    new GenericArrayData(values)
  }
}

/** Create map from key-value expressions (alternating keys and values) */
case class CreateMap(children: Seq[Expression]) extends Expression {
  require(children.size % 2 == 0, "CreateMap should have an even number of arguments")

  def dataType: DataType = {
    val keyType = if (children.nonEmpty) children.head.dataType else StringType
    val valueType = if (children.size > 1) children(1).dataType else StringType
    val valueContainsNull = children.indices.filter(_ % 2 == 1).exists(i => children(i).nullable)
    MapType(keyType, valueType, valueContainsNull)
  }
  def nullable: Boolean = false

  def eval(input: InternalRow): Any = {
    val evaluatedChildren = children.map(_.eval(input))
    val keys = evaluatedChildren.indices.filter(_ % 2 == 0).map(evaluatedChildren(_))
    val values = evaluatedChildren.indices.filter(_ % 2 == 1).map(evaluatedChildren(_))

    // ArrayBasedMapData is a plain class, so it is constructed with `new`
    new ArrayBasedMapData(new GenericArrayData(keys), new GenericArrayData(values))
  }
}

/** Create struct from field expressions */
case class CreateStruct(children: Seq[Expression]) extends Expression {
  def dataType: DataType = {
    val fields = children.zipWithIndex.map { case (child, index) =>
      StructField(s"col${index + 1}", child.dataType, child.nullable)
    }
    StructType(fields)
  }
  def nullable: Boolean = false

  def eval(input: InternalRow): Any = {
    val values = children.map(_.eval(input))
    InternalRow.fromSeq(values)
  }
}
```

**Usage Examples:**

```scala
// Array operations
val arrayAttr = AttributeReference("items", ArrayType(StringType))()
val arrayAccess = GetArrayItem(arrayAttr, Literal(0)) // First element

val newArray = CreateArray(Seq(
  Literal("apple"),
  Literal("banana"),
  Literal("orange")
))

// Map operations
val mapAttr = AttributeReference("properties", MapType(StringType, IntegerType))()
val mapAccess = GetMapValue(mapAttr, Literal("age"))

// A map has a single value type, so every value here is a string
val newMap = CreateMap(Seq(
  Literal("name"), Literal("Alice"),
  Literal("age"), Literal("25")
))

// Struct operations
val structAttr = AttributeReference("person", StructType(Seq(
  StructField("name", StringType),
  StructField("age", IntegerType)
)))()

val nameField = GetStructField(structAttr, ordinal = 0, name = Some("name"))
val ageField = GetStructField(structAttr, ordinal = 1, name = Some("age"))

val newStruct = CreateStruct(Seq(
  Literal("Bob"),
  Literal(30)
))
```
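
The lookup rules in `GetArrayItem` and `GetMapValue` (out-of-range index, null key, and missing key all yield null) can be illustrated with ordinary Scala collections. This is only a sketch under that simplification: `arrayItem` and `mapValue` are hypothetical helpers, and `IndexedSeq[Any]` stands in for Catalyst's `ArrayData` and `MapData`.

```scala
// Stand-in collections (not Catalyst's ArrayData/MapData); null marks a missing value.
object ComplexTypeSketch {
  // Mirrors the bounds check in GetArrayItem: out-of-range indexes yield null.
  def arrayItem(items: IndexedSeq[Any], index: Int): Any =
    if (index < 0 || index >= items.length) null else items(index)

  // Mirrors GetMapValue: keys and values are parallel sequences scanned linearly.
  def mapValue(keys: IndexedSeq[Any], values: IndexedSeq[Any], key: Any): Any = {
    if (key == null) return null
    val i = keys.indexWhere(k => k != null && k == key)
    if (i < 0) null else values(i)
  }

  val fruits: IndexedSeq[Any] = Vector("apple", "banana", "orange")
  val first: Any = arrayItem(fruits, 0)
  val outOfRange: Any = arrayItem(fruits, 5)

  val keys: IndexedSeq[Any] = Vector("name", "age")
  val values: IndexedSeq[Any] = Vector("Alice", "25")
  val age: Any = mapValue(keys, values, "age")
  val missing: Any = mapValue(keys, values, "email")
}
```

The linear scan costs O(n) per lookup, which matches the loop in the listing above; it trades lookup speed for a compact two-array layout.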

### Type Conversion and Casting

Type conversion operations for data transformation:

```scala { .api }
/**
 * Type casting expression
 */
case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String] = None)
  extends UnaryExpression {

  def nullable: Boolean = child.nullable

  def eval(input: InternalRow): Any = {
    val childResult = child.eval(input)
    if (childResult == null) return null

    // Cast implementation depends on source and target types
    cast(childResult, child.dataType, dataType)
  }

  private def cast(value: Any, from: DataType, to: DataType): Any = {
    (from, to) match {
      case (StringType, IntegerType) =>
        try { value.asInstanceOf[UTF8String].toString.toInt }
        catch { case _: NumberFormatException => null }

      case (IntegerType, StringType) =>
        UTF8String.fromString(value.asInstanceOf[Int].toString)

      case (DoubleType, IntegerType) =>
        value.asInstanceOf[Double].toInt

      case (IntegerType, DoubleType) =>
        value.asInstanceOf[Int].toDouble

      // Many more casting rules...
      case _ if from == to => value // No cast needed
      case _ => null // Unsupported cast
    }
  }
}

/** Safe cast that returns null on conversion failure */
case class TryCast(child: Expression, dataType: DataType, timeZoneId: Option[String] = None)
  extends UnaryExpression {

  def nullable: Boolean = true // Always nullable since cast can fail

  def eval(input: InternalRow): Any = {
    try {
      Cast(child, dataType, timeZoneId).eval(input)
    } catch {
      case _: Exception => null // Return null on any cast failure
    }
  }
}
```

**Usage Examples:**

```scala
// Type casting
// BoundReference reads a column by ordinal; an unbound AttributeReference cannot be evaluated directly
val stringRef = BoundReference(0, StringType, nullable = true)
val numberRef = BoundReference(1, DoubleType, nullable = true)

val stringToInt = Cast(stringRef, IntegerType)
val doubleToInt = Cast(numberRef, IntegerType)
val intToString = Cast(Literal(42), StringType)

// Safe casting (returns null on failure)
val safeCast = TryCast(stringRef, IntegerType)

// Evaluation
val row1 = InternalRow(UTF8String.fromString("123"), 45.67)
val result1 = stringToInt.eval(row1) // 123
val result2 = doubleToInt.eval(row1) // 45
val result3 = intToString.eval(row1) // "42"

val row2 = InternalRow(UTF8String.fromString("invalid"), 45.67)
val result4 = stringToInt.eval(row2) // null (cast failure)
val result5 = safeCast.eval(row2) // null (safe cast)
```
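
The difference between a conversion that throws and one that absorbs failures, which is the idea behind `TryCast`, can be sketched with the standard library alone. The helper names below (`strictToInt`, `tryToInt`, `doubleToInt`) are hypothetical, and `Option` stands in for a nullable result.

```scala
import scala.util.Try

object CastSketch {
  // Strict conversion: malformed input raises NumberFormatException.
  def strictToInt(s: String): Int = s.toInt

  // Lenient conversion in the spirit of TryCast: any failure becomes None.
  def tryToInt(s: String): Option[Int] = Try(s.toInt).toOption

  // Double-to-int conversion truncates toward zero rather than rounding.
  def doubleToInt(d: Double): Int = d.toInt

  val ok: Option[Int] = tryToInt("123")
  val bad: Option[Int] = tryToInt("invalid")
  val truncated: Int = doubleToInt(45.67)
}
```

Truncation explains why `45.67` becomes `45` in the example above instead of `46`.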

### Date and Time Expressions

Date and time operations for temporal data processing:

```scala { .api }
/** Extract year from date/timestamp */
case class Year(child: Expression) extends UnaryExpression {
  def dataType: DataType = IntegerType
  def nullable: Boolean = child.nullable

  def eval(input: InternalRow): Any = {
    val childResult = child.eval(input)
    if (childResult == null) return null

    child.dataType match {
      case DateType =>
        val days = childResult.asInstanceOf[Int]
        DateTimeUtils.getYear(days)
      case TimestampType =>
        // getYear takes days since the epoch, so convert microseconds first (UTC assumed)
        val microseconds = childResult.asInstanceOf[Long]
        DateTimeUtils.getYear(Math.floorDiv(microseconds, 86400000000L).toInt)
    }
  }
}

/** Extract month from date/timestamp */
case class Month(child: Expression) extends UnaryExpression {
  def dataType: DataType = IntegerType
  def nullable: Boolean = child.nullable

  def eval(input: InternalRow): Any = {
    val childResult = child.eval(input)
    if (childResult == null) return null

    child.dataType match {
      case DateType =>
        val days = childResult.asInstanceOf[Int]
        DateTimeUtils.getMonth(days)
      case TimestampType =>
        // getMonth takes days since the epoch, so convert microseconds first (UTC assumed)
        val microseconds = childResult.asInstanceOf[Long]
        DateTimeUtils.getMonth(Math.floorDiv(microseconds, 86400000000L).toInt)
    }
  }
}

/** Extract day of month from date/timestamp */
case class DayOfMonth(child: Expression) extends UnaryExpression {
  def dataType: DataType = IntegerType
  def nullable: Boolean = child.nullable

  def eval(input: InternalRow): Any = {
    val childResult = child.eval(input)
    if (childResult == null) return null

    child.dataType match {
      case DateType =>
        val days = childResult.asInstanceOf[Int]
        DateTimeUtils.getDayOfMonth(days)
      case TimestampType =>
        // getDayOfMonth takes days since the epoch, so convert microseconds first (UTC assumed)
        val microseconds = childResult.asInstanceOf[Long]
        DateTimeUtils.getDayOfMonth(Math.floorDiv(microseconds, 86400000000L).toInt)
    }
  }
}

/** Extract hour from timestamp */
case class Hour(child: Expression) extends UnaryExpression {
  def dataType: DataType = IntegerType
  def nullable: Boolean = child.nullable

  def eval(input: InternalRow): Any = {
    val childResult = child.eval(input)
    if (childResult == null) return null

    val microseconds = childResult.asInstanceOf[Long]
    // DateTimeUtils names these helpers in the plural; newer Spark versions also take a time zone argument
    DateTimeUtils.getHours(microseconds)
  }
}

/** Extract minute from timestamp */
case class Minute(child: Expression) extends UnaryExpression {
  def dataType: DataType = IntegerType
  def nullable: Boolean = child.nullable

  def eval(input: InternalRow): Any = {
    val childResult = child.eval(input)
    if (childResult == null) return null

    val microseconds = childResult.asInstanceOf[Long]
    DateTimeUtils.getMinutes(microseconds)
  }
}

/** Extract second from timestamp */
case class Second(child: Expression) extends UnaryExpression {
  def dataType: DataType = IntegerType
  def nullable: Boolean = child.nullable

  def eval(input: InternalRow): Any = {
    val childResult = child.eval(input)
    if (childResult == null) return null

    val microseconds = childResult.asInstanceOf[Long]
    DateTimeUtils.getSeconds(microseconds)
  }
}

/** Current timestamp function */
case class CurrentTimestamp() extends LeafExpression with Nondeterministic {
  def dataType: DataType = TimestampType
  def nullable: Boolean = false

  def initializeInternal(partitionIndex: Int): Unit = {
    // Initialize with current time at task start
  }

  def evalInternal(input: InternalRow): Any = {
    System.currentTimeMillis() * 1000L // Convert to microseconds
  }
}

/** Current date function */
case class CurrentDate(timeZoneId: Option[String] = None) extends LeafExpression with Nondeterministic {
  def dataType: DataType = DateType
  def nullable: Boolean = false

  def initializeInternal(partitionIndex: Int): Unit = {
    // Initialize with current date at task start
  }

  def evalInternal(input: InternalRow): Any = {
    val currentTimeMillis = System.currentTimeMillis()
    DateTimeUtils.millisToDays(currentTimeMillis)
  }
}

/** Add a number of days to a date */
case class DateAdd(startDate: Expression, days: Expression) extends BinaryExpression {
  def left: Expression = startDate
  def right: Expression = days
  def dataType: DataType = DateType
  def nullable: Boolean = startDate.nullable || days.nullable

  def eval(input: InternalRow): Any = {
    val startResult = startDate.eval(input)
    if (startResult == null) return null

    val daysResult = days.eval(input)
    if (daysResult == null) return null

    // Dates are stored as days since the epoch, so the addition is plain integer arithmetic
    val startDays = startResult.asInstanceOf[Int]
    val additionalDays = daysResult.asInstanceOf[Int]
    startDays + additionalDays
  }
}

/** Subtract a number of days from a date */
case class DateSub(startDate: Expression, days: Expression) extends BinaryExpression {
  def left: Expression = startDate
  def right: Expression = days
  def dataType: DataType = DateType
  def nullable: Boolean = startDate.nullable || days.nullable

  def eval(input: InternalRow): Any = {
    val startResult = startDate.eval(input)
    if (startResult == null) return null

    val daysResult = days.eval(input)
    if (daysResult == null) return null

    val startDays = startResult.asInstanceOf[Int]
    val subtractDays = daysResult.asInstanceOf[Int]
    startDays - subtractDays
  }
}
```

**Usage Examples:**

```scala
// Date/time expression creation
// BoundReference reads a column by ordinal; an unbound AttributeReference cannot be evaluated directly
val timestampRef = BoundReference(0, TimestampType, nullable = true)
val dateRef = BoundReference(1, DateType, nullable = true)

val yearExtracted = Year(timestampRef)
val monthExtracted = Month(dateRef)
val hourExtracted = Hour(timestampRef)

// Current date/time
val now = CurrentTimestamp()
val today = CurrentDate()

// Date arithmetic
val futureDate = DateAdd(dateRef, Literal(30)) // Add 30 days
val pastDate = DateSub(dateRef, Literal(7)) // Subtract 7 days

// Evaluation with sample data: timestamp in microseconds, date in days since the epoch
val currentTime = System.currentTimeMillis() * 1000L
val sampleRow = InternalRow(currentTime, DateTimeUtils.millisToDays(System.currentTimeMillis()))

val yearResult = yearExtracted.eval(sampleRow) // Current year
val monthResult = monthExtracted.eval(sampleRow) // Current month
val hourResult = hourExtracted.eval(sampleRow) // Current hour
```
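
The internal encodings used above (a date as days since 1970-01-01, a timestamp as microseconds since the epoch) can be explored with `java.time`. This sketch assumes UTC throughout and does not call `DateTimeUtils`; `yearOfDays`, `monthOfDays`, `hourOfMicros`, and `dateAdd` are hypothetical helper names.

```scala
import java.time.{Instant, LocalDate, ZoneOffset}

object DateSketch {
  // Dates are modeled as days since 1970-01-01, timestamps as microseconds since the epoch.
  def yearOfDays(days: Int): Int = LocalDate.ofEpochDay(days.toLong).getYear
  def monthOfDays(days: Int): Int = LocalDate.ofEpochDay(days.toLong).getMonthValue

  // Hour of day in UTC; a session time zone would shift this value.
  def hourOfMicros(micros: Long): Int = {
    val seconds = Math.floorDiv(micros, 1000000L) // floorDiv keeps pre-1970 values correct
    Instant.ofEpochSecond(seconds).atZone(ZoneOffset.UTC).getHour
  }

  // Adding days to a date is plain integer arithmetic on the day count.
  def dateAdd(days: Int, delta: Int): Int = days + delta

  val startOf1971: Int = 365 // 1970 is not a leap year, so day 365 is 1971-01-01
  val february: Int = monthOfDays(dateAdd(startOf1971, 31))
  val fiveAm: Int = hourOfMicros(5L * 3600L * 1000000L)
}
```

Because a date is just an integer day count, `DateAdd` and `DateSub` need no calendar logic; only field extraction does.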

### Conditional Expressions

Conditional logic and branching expressions:

```scala { .api }
/** Conditional if-then-else expression */
case class If(predicate: Expression, trueValue: Expression, falseValue: Expression) extends TernaryExpression {
  def first: Expression = predicate
  def second: Expression = trueValue
  def third: Expression = falseValue

  def dataType: DataType = trueValue.dataType
  def nullable: Boolean = trueValue.nullable || falseValue.nullable

  def eval(input: InternalRow): Any = {
    // A null predicate is not true, so it selects the else branch (SQL semantics)
    if (java.lang.Boolean.TRUE.equals(predicate.eval(input))) {
      trueValue.eval(input)
    } else {
      falseValue.eval(input)
    }
  }
}

/** Case-when conditional expression */
case class CaseWhen(
    branches: Seq[(Expression, Expression)],
    elseValue: Option[Expression] = None) extends Expression {

  def children: Seq[Expression] = branches.flatMap { case (condition, value) =>
    Seq(condition, value)
  } ++ elseValue.toSeq

  def dataType: DataType = branches.head._2.dataType
  def nullable: Boolean = branches.exists(_._2.nullable) || elseValue.exists(_.nullable)

  def eval(input: InternalRow): Any = {
    for ((condition, value) <- branches) {
      val conditionResult = condition.eval(input)
      if (conditionResult != null && conditionResult.asInstanceOf[Boolean]) {
        return value.eval(input)
      }
    }

    elseValue match {
      case Some(expr) => expr.eval(input)
      case None => null
    }
  }
}

/** Return first non-null expression */
case class Coalesce(children: Seq[Expression]) extends Expression {
  require(children.nonEmpty, "Coalesce must have at least one child")

  def dataType: DataType = children.head.dataType
  def nullable: Boolean = children.forall(_.nullable)

  def eval(input: InternalRow): Any = {
    for (child <- children) {
      val result = child.eval(input)
      if (result != null) {
        return result
      }
    }
    null
  }
}

/** Null if two expressions are equal, otherwise returns first expression */
case class NullIf(left: Expression, right: Expression) extends BinaryExpression {
  def dataType: DataType = left.dataType
  def nullable: Boolean = true // Always nullable since it can return null

  def eval(input: InternalRow): Any = {
    val leftResult = left.eval(input)
    val rightResult = right.eval(input)

    if (leftResult == rightResult) {
      null
    } else {
      leftResult
    }
  }
}

/** If first expression is null, return second expression, otherwise return first */
case class Nvl(left: Expression, right: Expression) extends BinaryExpression {
  def dataType: DataType = left.dataType
  def nullable: Boolean = right.nullable

  def eval(input: InternalRow): Any = {
    val leftResult = left.eval(input)
    if (leftResult == null) {
      right.eval(input)
    } else {
      leftResult
    }
  }
}

/** Three-argument null handling: if first is null return third, otherwise return second */
case class Nvl2(expr1: Expression, expr2: Expression, expr3: Expression) extends TernaryExpression {
  def first: Expression = expr1
  def second: Expression = expr2
  def third: Expression = expr3

  def dataType: DataType = expr2.dataType
  def nullable: Boolean = expr2.nullable || expr3.nullable

  def eval(input: InternalRow): Any = {
    val firstResult = expr1.eval(input)
    if (firstResult == null) {
      expr3.eval(input)
    } else {
      expr2.eval(input)
    }
  }
}
```

**Usage Examples:**

```scala
// Conditional expressions
// BoundReference reads a column by ordinal; an unbound AttributeReference cannot be evaluated directly
val ageRef = BoundReference(0, IntegerType, nullable = true)
val nameRef = BoundReference(1, StringType, nullable = true)
val statusRef = BoundReference(2, StringType, nullable = true)

// If-then-else
val ageCategory = If(
  GreaterThan(ageRef, Literal(18)),
  Literal("Adult"),
  Literal("Minor")
)

// Case-when with multiple branches
val riskLevel = CaseWhen(
  branches = Seq(
    (GreaterThan(ageRef, Literal(65)), Literal("High Risk")),
    (GreaterThan(ageRef, Literal(40)), Literal("Medium Risk")),
    (GreaterThan(ageRef, Literal(18)), Literal("Low Risk"))
  ),
  elseValue = Some(Literal("No Risk"))
)

// Coalesce - return first non-null
val preferredName = Coalesce(Seq(
  AttributeReference("nickname", StringType)(),
  AttributeReference("first_name", StringType)(),
  Literal("Unknown")
))

// Null handling
val statusOrDefault = Nvl(statusRef, Literal("Active"))

// Evaluation with sample data
val sampleRow = InternalRow(25, UTF8String.fromString("Alice"), null)
val result1 = ageCategory.eval(sampleRow) // "Adult" (25 > 18)
val result2 = riskLevel.eval(sampleRow) // "Low Risk" (25 > 18 but not > 40)
val result3 = statusOrDefault.eval(sampleRow) // "Active" (status is null)
```
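
The branch-selection rule in `CaseWhen.eval` (a branch fires only when its condition is true, never when it is null) and the first-non-null rule in `Coalesce.eval` can be modeled with `Option`. This is a sketch under that modeling assumption, with `None` standing for SQL NULL; `caseWhen`, `coalesce`, and `riskLevel` are hypothetical helpers, and branch values are computed eagerly here rather than lazily as in the expression tree.

```scala
object ConditionalSketch {
  // Option[Boolean] models three-valued logic: None stands for NULL.
  // A branch is taken only when its condition is Some(true).
  def caseWhen[A](branches: Seq[(Option[Boolean], A)], elseValue: Option[A]): Option[A] =
    branches.collectFirst { case (Some(true), value) => value }.orElse(elseValue)

  // First non-null value in argument order.
  def coalesce[A](values: Option[A]*): Option[A] = values.collectFirst { case Some(v) => v }

  // Same thresholds as the CaseWhen usage example above.
  def riskLevel(age: Option[Int]): Option[String] = caseWhen(
    Seq(
      (age.map(_ > 65), "High Risk"),
      (age.map(_ > 40), "Medium Risk"),
      (age.map(_ > 18), "Low Risk")
    ),
    elseValue = Some("No Risk")
  )
}
```

With a null age every condition is `None`, so no branch fires and the else value is returned.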

### Aggregate Expressions

Aggregation operations for computing statistics across multiple rows:

```scala { .api }
/**
 * Base class for all aggregate expressions
 */
abstract class AggregateExpression extends Expression {
  /** Whether this aggregate is distinct */
  def isDistinct: Boolean

  /** Aggregate function implementation */
  def aggregateFunction: AggregateFunction

  def dataType: DataType = aggregateFunction.dataType
  def nullable: Boolean = aggregateFunction.nullable
  def children: Seq[Expression] = aggregateFunction.children
}

/**
 * Base class for aggregate functions
 */
abstract class AggregateFunction extends Expression {
  /** Attributes representing the aggregation buffer */
  def aggBufferAttributes: Seq[AttributeReference]

  /** Initial values for the buffer */
  def initialValues: Seq[Expression]

  /** Update buffer with new input row */
  def updateExpressions: Seq[Expression]

  /** Merge two buffers */
  def mergeExpressions: Seq[Expression]

  /** Extract final result from buffer */
  def evaluateExpression: Expression
}

/**
 * Declarative aggregate functions using expressions
 */
abstract class DeclarativeAggregate extends AggregateFunction

/** Count aggregate function */
case class Count(children: Seq[Expression]) extends DeclarativeAggregate {
  def this(child: Expression) = this(Seq(child))

  def dataType: DataType = LongType
  def nullable: Boolean = false

  // Buffer has single long value for count
  lazy val count = AttributeReference("count", LongType, nullable = false)()
  def aggBufferAttributes: Seq[AttributeReference] = Seq(count)

  // Initialize count to 0
  def initialValues: Seq[Expression] = Seq(Literal(0L))

  // Increment count for non-null values (or all rows if no children)
  def updateExpressions: Seq[Expression] = {
    if (children.isEmpty) {
      // COUNT(*) - count all rows
      Seq(Add(count, Literal(1L)))
    } else {
      // COUNT(expr) - count non-null values
      val condition = children.map(IsNotNull).reduce(And)
      Seq(If(condition, Add(count, Literal(1L)), count))
    }
  }

  // Merge counts by adding them
  def mergeExpressions: Seq[Expression] = Seq(Add(count.left, count.right))

  // Final result is the count value
  def evaluateExpression: Expression = count
}

/** Sum aggregate function */
case class Sum(child: Expression) extends DeclarativeAggregate {
  def children: Seq[Expression] = Seq(child)
  def dataType: DataType = child.dataType
  def nullable: Boolean = true

  // Buffer has sum and isEmpty flag
  lazy val sum = AttributeReference("sum", child.dataType, nullable = true)()
  lazy val isEmpty = AttributeReference("isEmpty", BooleanType, nullable = false)()
  def aggBufferAttributes: Seq[AttributeReference] = Seq(sum, isEmpty)

  // Initialize sum to null, isEmpty to true
  def initialValues: Seq[Expression] = Seq(Literal(null, child.dataType), Literal(true))

  // Add non-null values to sum
  def updateExpressions: Seq[Expression] = Seq(
    If(IsNull(child), sum,
      If(isEmpty, child, Add(sum, child))),
    If(IsNull(child), isEmpty, Literal(false))
  )

  // Merge sums; the isEmpty flag of each side decides which partial sum is usable
  def mergeExpressions: Seq[Expression] = Seq(
    If(isEmpty.left, sum.right,
      If(isEmpty.right, sum.left, Add(sum.left, sum.right))),
    And(isEmpty.left, isEmpty.right)
  )

  // Final result is sum (null if no non-null values)
  def evaluateExpression: Expression = If(isEmpty, Literal(null, dataType), sum)
}

/** Maximum aggregate function */
case class Max(child: Expression) extends DeclarativeAggregate {
  def children: Seq[Expression] = Seq(child)
  def dataType: DataType = child.dataType
  def nullable: Boolean = true

  lazy val max = AttributeReference("max", child.dataType, nullable = true)()
  def aggBufferAttributes: Seq[AttributeReference] = Seq(max)

  def initialValues: Seq[Expression] = Seq(Literal(null, child.dataType))

  def updateExpressions: Seq[Expression] = Seq(
    If(IsNull(child), max,
      If(IsNull(max), child,
        If(GreaterThan(child, max), child, max)))
  )

  def mergeExpressions: Seq[Expression] = Seq(
    If(IsNull(max.left), max.right,
      If(IsNull(max.right), max.left,
        If(GreaterThan(max.left, max.right), max.left, max.right)))
  )

  def evaluateExpression: Expression = max
}

/** Minimum aggregate function */
case class Min(child: Expression) extends DeclarativeAggregate {
  def children: Seq[Expression] = Seq(child)
  def dataType: DataType = child.dataType
  def nullable: Boolean = true

  lazy val min = AttributeReference("min", child.dataType, nullable = true)()
  def aggBufferAttributes: Seq[AttributeReference] = Seq(min)

  def initialValues: Seq[Expression] = Seq(Literal(null, child.dataType))

  def updateExpressions: Seq[Expression] = Seq(
    If(IsNull(child), min,
      If(IsNull(min), child,
        If(LessThan(child, min), child, min)))
  )

  def mergeExpressions: Seq[Expression] = Seq(
    If(IsNull(min.left), min.right,
      If(IsNull(min.right), min.left,
        If(LessThan(min.left, min.right), min.left, min.right)))
  )

  def evaluateExpression: Expression = min
}

/** Average aggregate function */
case class Average(child: Expression) extends DeclarativeAggregate {
  def children: Seq[Expression] = Seq(child)
  def dataType: DataType = DoubleType
  def nullable: Boolean = true

  lazy val sum = AttributeReference("sum", DoubleType, nullable = true)()
  lazy val count = AttributeReference("count", LongType, nullable = false)()
  def aggBufferAttributes: Seq[AttributeReference] = Seq(sum, count)

  def initialValues: Seq[Expression] = Seq(Literal(null, DoubleType), Literal(0L))

  def updateExpressions: Seq[Expression] = Seq(
    If(IsNull(child), sum,
      If(IsNull(sum), Cast(child, DoubleType), Add(sum, Cast(child, DoubleType)))),
    If(IsNull(child), count, Add(count, Literal(1L)))
  )

  def mergeExpressions: Seq[Expression] = Seq(
    If(IsNull(sum.left), sum.right,
      If(IsNull(sum.right), sum.left, Add(sum.left, sum.right))),
    Add(count.left, count.right)
  )

  def evaluateExpression: Expression =
    If(EqualTo(count, Literal(0L)), Literal(null, DoubleType), Divide(sum, Cast(count, DoubleType)))
}
```

**Usage Examples:**

```scala
// Aggregate expression creation
val ageAttr = AttributeReference("age", IntegerType)()
val salaryAttr = AttributeReference("salary", DoubleType)()

val totalCount = Count(Seq.empty) // COUNT(*)
val ageCount = Count(Seq(ageAttr)) // COUNT(age)
val totalSalary = Sum(salaryAttr)
val maxAge = Max(ageAttr)
val minSalary = Min(salaryAttr)
val avgSalary = Average(salaryAttr)

// Usage in aggregation context (conceptual - normally used in query plans)
val aggregates = Seq(
  Alias(totalCount, "total_employees")(),
  Alias(avgSalary, "avg_salary")(),
  Alias(maxAge, "max_age")()
)
```
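
The four phases that every `AggregateFunction` declares (initial values, update, merge, evaluate) can be followed end to end on plain values. This sketch mirrors the `(sum, count)` buffer of `Average` but is not Catalyst code: `Buffer`, `update`, `merge`, `evaluate`, and `aggregate` are hypothetical names, and `Option[Double]` stands in for a nullable column.

```scala
object AverageSketch {
  // Buffer mirroring Average's (sum, count) pair; None stands for a NULL sum.
  final case class Buffer(sum: Option[Double], count: Long)

  // Initial values: NULL sum, zero count.
  val initial: Buffer = Buffer(None, 0L)

  // Update step: NULL inputs leave the buffer unchanged.
  def update(buf: Buffer, input: Option[Double]): Buffer = input match {
    case None => buf
    case Some(v) => Buffer(Some(buf.sum.getOrElse(0.0) + v), buf.count + 1)
  }

  // Merge step: combines partial buffers computed on different partitions.
  def merge(a: Buffer, b: Buffer): Buffer = {
    val sum = (a.sum, b.sum) match {
      case (None, s) => s
      case (s, None) => s
      case (Some(x), Some(y)) => Some(x + y)
    }
    Buffer(sum, a.count + b.count)
  }

  // Evaluate step: an empty group yields None instead of dividing by zero.
  def evaluate(buf: Buffer): Option[Double] =
    if (buf.count == 0L) None else buf.sum.map(_ / buf.count)

  // Runs the update step over one partition, starting from the initial buffer.
  def aggregate(partition: Seq[Option[Double]]): Buffer = partition.foldLeft(initial)(update)

  val part1: Buffer = aggregate(Seq(Some(10.0), None, Some(20.0)))
  val part2: Buffer = aggregate(Seq(Some(30.0)))
  val average: Option[Double] = evaluate(merge(part1, part2))
}
```

Keeping the sum and count separate until the evaluate step is what makes the merge associative: averaging two partial averages directly would weight the partitions incorrectly.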

### Code Generation Framework

Catalyst's code generation system for high-performance expression evaluation:

```scala { .api }
/**
 * Context for Java code generation
 */
class CodegenContext {
  /** Add object reference and return variable name */
  def addReferenceObj(obj: AnyRef): String = {
    val objName = s"obj${references.size}"
    references += obj
    objName
  }

  /** Add mutable state variable */
  def addMutableState(
      javaType: String,
      variableName: String,
      initCode: String = ""): String = {
    val name = if (variableName.nonEmpty) variableName else freshName("mutableState")
    mutableStateInitCode += initCode
    mutableStateVars += s"private $javaType $name;"
    name
  }

  /** Generate fresh variable name */
  def freshName(name: String): String = {
    val count = freshNameIds.getOrElse(name, 0)
    freshNameIds(name) = count + 1
    s"${name}_$count"
  }

  /** Add new function to generated class */
  def addNewFunction(funcName: String, funcCode: String): String = {
    addedFunctions += funcCode
    funcName
  }

  /** Split large expressions into multiple functions */
  def splitExpressionsByRows(expressions: Seq[Expression], functionName: String): String = {
    // Split expressions to avoid Java method size limits
    val splitSize = 1000
    val splitExprs = expressions.grouped(splitSize).zipWithIndex.map { case (exprs, index) =>
      val subFuncName = s"${functionName}_$index"
      val codes = exprs.map(_.genCode(this))
      // generateSubFunction is a helper that is not shown in this listing
      generateSubFunction(subFuncName, codes)
    }
    splitExprs.mkString("\n")
  }

  // Internal state
  private val references = mutable.ArrayBuffer[AnyRef]()
  private val mutableStateInitCode = mutable.ArrayBuffer[String]()
  private val mutableStateVars = mutable.ArrayBuffer[String]()
  private val freshNameIds = mutable.HashMap[String, Int]()
  private val addedFunctions = mutable.ArrayBuffer[String]()
}
1888
1889
/**
 * Generated code for an expression
 */
case class ExprCode(code: String, isNull: String, value: String) {
  /** Copy with selected fields replaced (same behavior as the compiler-generated copy) */
  def copy(
      code: String = this.code,
      isNull: String = this.isNull,
      value: String = this.value): ExprCode = {
    ExprCode(code, isNull, value)
  }
}

/**
 * Utility functions for code generation
 */
object CodeGenerator {
  /** Generate Java code for expression evaluation */
  def generateCode(ctx: CodegenContext, expressions: Seq[Expression]): (String, String, String) = {
    val codes = expressions.map(_.genCode(ctx))

    val evalCodes = codes.map(_.code).mkString("\n")
    val nullChecks = codes.map(_.isNull).mkString(", ")
    val values = codes.map(_.value).mkString(", ")

    (evalCodes, nullChecks, values)
  }

  /** Java type mapping from Catalyst DataType */
  def javaType(dataType: DataType): String = dataType match {
    case BooleanType => "boolean"
    case ByteType => "byte"
    case ShortType => "short"
    case IntegerType => "int"
    case LongType => "long"
    case FloatType => "float"
    case DoubleType => "double"
    case StringType => "UTF8String"
    case BinaryType => "byte[]"
    case DateType => "int"
    case TimestampType => "long"
    case _: DecimalType => "Decimal"
    case _: ArrayType => "ArrayData"
    case _: MapType => "MapData"
    case _: StructType => "InternalRow"
    case _ => "Object"
  }

  /** Default value for Java type */
  def defaultValue(dataType: DataType): String = dataType match {
    case BooleanType => "false"
    // DateType and TimestampType map to primitive int/long above, so they need primitive defaults
    case ByteType | ShortType | IntegerType | DateType => "0"
    case LongType | TimestampType => "0L"
    case FloatType => "0.0f"
    case DoubleType => "0.0"
    case _ => "null"
  }
}

/**
 * Code generation support for expressions
 */
trait CodegenSupport extends Expression {
  /** Generate Java code for this expression */
  def genCode(ctx: CodegenContext): ExprCode = {
    val nullVar = ctx.freshName("isNull")
    val valueVar = ctx.freshName("value")
    val javaType = CodeGenerator.javaType(dataType)

    val code = s"""
      |boolean $nullVar = false;
      |$javaType $valueVar = ${CodeGenerator.defaultValue(dataType)};
      |${doGenCode(ctx, nullVar, valueVar)}
    """.stripMargin

    ExprCode(code, nullVar, valueVar)
  }

  /** Subclasses implement this to generate expression-specific code */
  protected def doGenCode(ctx: CodegenContext, nullVar: String, valueVar: String): String
}

/**
 * Example: Add expression with code generation
 */
case class Add(left: Expression, right: Expression) extends BinaryArithmetic with CodegenSupport {
  def symbol: String = "+"

  protected def doGenCode(ctx: CodegenContext, nullVar: String, valueVar: String): String = {
    val leftGen = left.genCode(ctx)
    val rightGen = right.genCode(ctx)

    s"""
      |${leftGen.code}
      |${rightGen.code}
      |
      |if (${leftGen.isNull} || ${rightGen.isNull}) {
      |  $nullVar = true;
      |} else {
      |  $valueVar = ${leftGen.value} + ${rightGen.value};
      |}
    """.stripMargin
  }

  // Fallback interpreted evaluation
  def nullSafeEval(input1: Any, input2: Any): Any = {
    dataType match {
      case IntegerType => input1.asInstanceOf[Int] + input2.asInstanceOf[Int]
      case LongType => input1.asInstanceOf[Long] + input2.asInstanceOf[Long]
      case DoubleType => input1.asInstanceOf[Double] + input2.asInstanceOf[Double]
      // ... other numeric types
    }
  }
}
```
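
The chunking idea behind `splitExpressionsByRows` can be tried without Spark on the classpath. The following is a minimal sketch under stated assumptions: `SplitSketch`, `SplitResult`, `splitIntoFunctions`, and the shape of the emitted Java methods are hypothetical names chosen for illustration, not Catalyst's actual implementation. It groups already-generated snippets into fixed-size chunks, wraps each chunk in its own method, and returns both the method definitions and the calls that invoke them in order.

```scala
object SplitSketch {
  /** Method definitions for the generated class, plus the calls that run them. */
  final case class SplitResult(functions: Seq[String], calls: String)

  /**
   * Group `snippets` into chunks of at most `chunkSize` and wrap each chunk
   * in its own private Java method (hypothetical shape, for illustration only).
   */
  def splitIntoFunctions(
      snippets: Seq[String],
      functionName: String,
      chunkSize: Int): SplitResult = {
    require(chunkSize > 0, "chunkSize must be positive")
    val named = snippets.grouped(chunkSize).zipWithIndex.map { case (chunk, index) =>
      val name = s"${functionName}_$index"
      val body = chunk.map("  " + _).mkString("\n")
      (name, s"private void $name(InternalRow input) {\n$body\n}")
    }.toList
    SplitResult(
      functions = named.map(_._2),
      calls = named.map { case (name, _) => s"$name(input);" }.mkString("\n"))
  }

  // Five one-line snippets split into chunks of two yield three methods.
  val result: SplitResult =
    splitIntoFunctions((1 to 5).map(i => s"int v$i = $i;"), "apply", chunkSize = 2)
}
```

With `chunkSize = 2`, the five snippets land in `apply_0`, `apply_1`, and `apply_2`. Local variables declared in one Java method are not visible in another, so a real splitter has to carry values across methods through parameters or class-level state such as the variables registered with `addMutableState`.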

**Usage Examples:**

```scala
import org.apache.spark.sql.catalyst.expressions._
import org.apache.spark.sql.catalyst.expressions.codegen.CodegenContext
import org.apache.spark.sql.types._

// Code generation context
val ctx = new CodegenContext()

// Expression for code generation
val ageAttr = AttributeReference("age", IntegerType)()
val expr = Add(ageAttr, Literal(10))

// Generate code
val generated = expr.genCode(ctx)
println(generated.code)
// Output: Java code for evaluating age + 10

// Illustrative structure of the generated code (simplified):
/*
boolean isNull_0 = false;
int value_0 = 0;

// Load age from input row
boolean isNull_1 = input.isNullAt(0);
int value_1 = isNull_1 ? 0 : input.getInt(0);

// Literal 10
boolean isNull_2 = false;
int value_2 = 10;

// Add operation
if (isNull_1 || isNull_2) {
  isNull_0 = true;
} else {
  value_0 = value_1 + value_2;
}
*/

// Why generated code is typically faster than interpreted evaluation:
// - No virtual eval call per expression node
// - Primitive values stay unboxed
// - Direct access to fields of the input row
// - Straight-line code that the JIT compiler can optimize
```
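
To see how the numbered `isNull_N` / `value_N` variables in the structure above come about, the pattern can be reproduced in a few lines of plain Scala. This is a minimal sketch and not Catalyst code: `MiniContext`, `MiniCode`, `MiniExpr`, `IntLit`, and `Plus` are hypothetical stand-ins for `CodegenContext`, `ExprCode`, `Expression`, `Literal`, and `Add`, and only integer literals are supported.

```scala
import scala.collection.mutable

object MiniCodegen {
  // Hypothetical stand-in for CodegenContext: only the fresh-name counter.
  final class MiniContext {
    private val freshNameIds = mutable.HashMap[String, Int]()

    def freshName(name: String): String = {
      val count = freshNameIds.getOrElse(name, 0)
      freshNameIds(name) = count + 1
      s"${name}_$count"
    }
  }

  // A generated snippet plus the names of the variables holding its result.
  final case class MiniCode(code: String, isNull: String, value: String)

  sealed trait MiniExpr { def gen(ctx: MiniContext): MiniCode }

  final case class IntLit(v: Int) extends MiniExpr {
    def gen(ctx: MiniContext): MiniCode = {
      val isNull = ctx.freshName("isNull")
      val value = ctx.freshName("value")
      MiniCode(s"boolean $isNull = false;\nint $value = $v;", isNull, value)
    }
  }

  final case class Plus(l: MiniExpr, r: MiniExpr) extends MiniExpr {
    def gen(ctx: MiniContext): MiniCode = {
      // The parent reserves its names first, then the children take theirs.
      val isNull = ctx.freshName("isNull")
      val value = ctx.freshName("value")
      val lg = l.gen(ctx)
      val rg = r.gen(ctx)
      val code =
        s"""boolean $isNull = false;
           |int $value = 0;
           |${lg.code}
           |${rg.code}
           |if (${lg.isNull} || ${rg.isNull}) {
           |  $isNull = true;
           |} else {
           |  $value = ${lg.value} + ${rg.value};
           |}""".stripMargin
      MiniCode(code, isNull, value)
    }
  }

  // Generate Java source text for 1 + 2.
  val generated: MiniCode = Plus(IntLit(1), IntLit(2)).gen(new MiniContext)
}
```

Printing `MiniCodegen.generated.code` shows the same layout as the comment above: the result variables of `Plus` get suffix `_0`, and its children get `_1` and `_2`, because names are handed out in the order `freshName` is called.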

## Summary

The Catalyst expression system provides a powerful and extensible framework for representing and evaluating computations in Spark SQL. Key capabilities include:

**Expression Hierarchy**: Base Expression class with specialized traits for different patterns (Unary, Binary, Ternary, etc.)

**Type Safety**: Strong typing with DataType system integration and comprehensive type checking

**Evaluation Modes**: Both interpreted evaluation (eval) and high-performance code generation (genCode)

**Rich Function Library**: Comprehensive set of built-in operations for arithmetic, comparisons, string manipulation, date/time operations, complex types, aggregation, and more

**Named Expressions**: Support for aliasing and attribute references enabling query planning and optimization

**Conditional Logic**: Full support for if-then-else, case-when, and null handling operations

**Code Generation**: Advanced code generation framework that produces optimized Java code for high-performance execution

**Extensibility**: Clean interfaces for adding custom expressions and functions

This expression system forms the foundation for all SQL operations in Spark, enabling both correctness through strong typing and performance through code generation optimization.