or run

npx @tessl/cli init
Log in

Version

Tile

Overview

Evals

Files

Files

docs

analysis.mdcode-generation.mddata-types.mdexpressions.mdindex.mdoptimization.mdparsing.mdquery-plans.mdutilities.md

expressions.mddocs/

0

# Expression System and Evaluation

1

2

Catalyst's expression system provides a comprehensive framework for representing computations, predicates, and data transformations. Expressions form the foundation of SQL operations and can be evaluated both interpretively and through code generation.

3

4

## Capabilities

5

6

### Expression Base Class

7

8

Core interface for all expressions in Catalyst with evaluation and code generation capabilities.

9

10

```scala { .api }

11

/**

12

* Base class for all expressions in Catalyst

13

* Expressions represent computations that can be evaluated on input data

14

*/

15

abstract class Expression extends TreeNode[Expression] {

16

/** Data type of the expression result */

17

def dataType: DataType

18

19

/** Whether expression can produce null values */

20

def nullable: Boolean

21

22

/** Whether expression can be constant folded */

23

def foldable: Boolean

24

25

/** Whether expression is deterministic (same input -> same output) */

26

def deterministic: Boolean = true

27

28

/** Set of attributes referenced by this expression */

29

def references: AttributeSet

30

31

/** Evaluate expression on input row (interpreted mode) */

32

def eval(input: InternalRow = null): Any

33

34

/** Generate Java code for expression (code generation mode) */

35

def genCode(ctx: CodegenContext): ExprCode

36

37

/** Validate input data types */

38

def checkInputDataTypes(): TypeCheckResult

39

40

/** Human-readable name for expression */

41

def prettyName: String

42

43

/** String representation for SQL */

44

def sql: String

45

46

/** Clone expression with new children */

47

def withNewChildren(newChildren: Seq[Expression]): Expression

48

}

49

```

50

51

**Usage Examples:**

52

53

```scala

54

import org.apache.spark.sql.catalyst.expressions._

55

import org.apache.spark.sql.types._

56

57

// Create literal expression

58

val literal = Literal(42, IntegerType)

59

println(literal.dataType) // IntegerType

60

println(literal.nullable) // false

61

println(literal.foldable) // true

62

println(literal.deterministic) // true

63

64

// Evaluate expression

65

val result = literal.eval() // Returns 42

66

67

// Expression properties

68

println(literal.prettyName) // "42"

69

println(literal.sql) // "42"

70

```

71

72

### Expression Traits and Hierarchy

73

74

Specialized traits for different expression patterns:

75

76

```scala { .api }

77

/** Expression with no child expressions */

78

trait LeafExpression extends Expression {

79

def children: Seq[Expression] = Nil

80

}

81

82

/** Expression with one child expression */

83

trait UnaryExpression extends Expression {

84

def child: Expression

85

def children: Seq[Expression] = child :: Nil

86

}

87

88

/** Expression with two child expressions */

89

trait BinaryExpression extends Expression {

90

def left: Expression

91

def right: Expression

92

def children: Seq[Expression] = Seq(left, right)

93

}

94

95

/** Expression with three child expressions */

96

trait TernaryExpression extends Expression {

97

def first: Expression

98

def second: Expression

99

def third: Expression

100

def children: Seq[Expression] = Seq(first, second, third)

101

}

102

103

/** Special case of BinaryExpression requiring same data type for both children */

104

trait BinaryOperator extends BinaryExpression {

105

def dataType: DataType = left.dataType

106

}

107

108

/** Marker for expressions that are not deterministic */

109

trait Nondeterministic extends Expression {

110

final override def deterministic: Boolean = false

111

112

/** Initialize per-partition state */

113

def initializeInternal(partitionIndex: Int): Unit

114

115

/** Evaluate with nondeterministic behavior */

116

def evalInternal(input: InternalRow): Any

117

}

118

119

/** Expression that should not be evaluated (analysis-time only) */

120

trait Unevaluable extends Expression {

121

final override def eval(input: InternalRow = null): Any =

122

throw new UnsupportedOperationException(s"Cannot evaluate expression: $this")

123

}

124

125

/** Expression that falls back to interpreted mode (no code generation) */

126

trait CodegenFallback extends Expression

127

128

/** Expression that cannot be expressed in SQL */

129

trait NonSQLExpression extends Expression

130

131

/** Expression replaced by another expression at runtime */

132

trait RuntimeReplaceable extends UnaryExpression with Unevaluable {

133

def replacement: Expression

134

}

135

```

136

137

### Named Expressions

138

139

Expressions that have names and can be referenced in query plans:

140

141

```scala { .api }

142

/**

143

* Expression that has a name and can be referenced

144

*/

145

trait NamedExpression extends Expression {

146

/** Expression name */

147

def name: String

148

149

/** Unique expression ID */

150

def exprId: ExprId

151

152

/** Qualification path (e.g., table name) */

153

def qualifiers: Seq[String]

154

155

/** Convert to attribute reference */

156

def toAttribute: Attribute

157

158

/** Create new instance with same name but different ID */

159

def newInstance(): NamedExpression

160

161

/** Create copy with new qualifiers */

162

def withQualifier(newQualifier: Seq[String]): NamedExpression

163

164

/** Metadata associated with this expression */

165

def metadata: Metadata

166

}

167

168

/**

169

* Reference to an attribute/column

170

*/

171

abstract class Attribute extends LeafExpression with NamedExpression with Unevaluable {

172

/** Create copy with new name */

173

def withName(newName: String): Attribute

174

175

/** Create copy with new qualifier */

176

def withQualifier(newQualifier: Seq[String]): Attribute

177

178

/** Create copy with new expression ID */

179

def withExprId(newExprId: ExprId): Attribute

180

181

/** Create copy with new data type */

182

def withDataType(newType: DataType): Attribute

183

184

/** Create copy with new nullability */

185

def withNullability(newNullability: Boolean): Attribute

186

187

/** Create copy with new metadata */

188

def withMetadata(newMetadata: Metadata): Attribute

189

}

190

191

/**

192

* Reference to an attribute with name, data type, and nullability

193

*/

194

case class AttributeReference(

195

name: String,

196

dataType: DataType,

197

nullable: Boolean = true,

198

metadata: Metadata = Metadata.empty)(

199

val exprId: ExprId = NamedExpression.newExprId,

200

val qualifier: Seq[String] = Seq.empty[String]) extends Attribute {

201

202

def qualifiers: Seq[String] = qualifier

203

204

// Implementation of withX methods for creating copies

205

def withName(newName: String): AttributeReference = copy(name = newName)(exprId, qualifier)

206

def withQualifier(newQualifier: Seq[String]): AttributeReference = copy()(exprId, newQualifier)

207

def withExprId(newExprId: ExprId): AttributeReference = copy()(newExprId, qualifier)

208

def withDataType(newType: DataType): AttributeReference = copy(dataType = newType)(exprId, qualifier)

209

def withNullability(newNullability: Boolean): AttributeReference = copy(nullable = newNullability)(exprId, qualifier)

210

def withMetadata(newMetadata: Metadata): AttributeReference = copy(metadata = newMetadata)(exprId, qualifier)

211

212

def newInstance(): AttributeReference = copy()(NamedExpression.newExprId, qualifier)

213

def toAttribute: AttributeReference = this

214

}

215

216

/**

217

* Expression with an alias name

218

*/

219

case class Alias(

220

child: Expression,

221

name: String)(

222

val exprId: ExprId = NamedExpression.newExprId,

223

val qualifier: Seq[String] = Seq.empty[String],

224

val explicitMetadata: Option[Metadata] = None) extends UnaryExpression with NamedExpression {

225

226

def qualifiers: Seq[String] = qualifier

227

def dataType: DataType = child.dataType

228

def nullable: Boolean = child.nullable

229

def metadata: Metadata = explicitMetadata.getOrElse(Metadata.empty)

230

231

def newInstance(): NamedExpression = copy()(NamedExpression.newExprId, qualifier, explicitMetadata)

232

def toAttribute: Attribute = AttributeReference(name, dataType, nullable, metadata)(exprId, qualifier)

233

}

234

```

235

236

**Usage Examples:**

237

238

```scala

239

// Attribute reference creation

240

val nameAttr = AttributeReference("name", StringType, nullable = false)()

241

val ageAttr = AttributeReference("age", IntegerType, nullable = false)()

242

243

// Qualified attribute

244

val qualifiedAttr = AttributeReference("id", LongType)(

245

exprId = NamedExpression.newExprId,

246

qualifier = Seq("users")

247

)

248

249

// Alias creation

250

val ageAlias = Alias(ageAttr, "user_age")()

251

val computedAlias = Alias(Add(ageAttr, Literal(1)), "age_plus_one")()

252

253

// Attribute properties

254

println(nameAttr.name) // "name"

255

println(nameAttr.dataType) // StringType

256

println(nameAttr.nullable) // false

257

println(nameAttr.exprId) // Unique ExprId

258

259

// Create copies with modifications

260

val nullableNameAttr = nameAttr.withNullability(true)

261

val renamedAttr = nameAttr.withName("full_name")

262

```

263

264

### Literals and Constants

265

266

Constant value expressions for representing literal data:

267

268

```scala { .api }

269

/**

270

* Constant literal value expression

271

*/

272

case class Literal(value: Any, dataType: DataType) extends LeafExpression {

273

def nullable: Boolean = value == null

274

def foldable: Boolean = true

275

276

def eval(input: InternalRow = null): Any = value

277

278

override def toString: String = if (value == null) "null" else value.toString

279

}

280

281

object Literal {

282

/** Create literal with inferred type */

283

def apply(v: Any): Literal = v match {

284

case null => Literal(null, NullType)

285

case b: Boolean => Literal(b, BooleanType)

286

case b: Byte => Literal(b, ByteType)

287

case s: Short => Literal(s, ShortType)

288

case i: Int => Literal(i, IntegerType)

289

case l: Long => Literal(l, LongType)

290

case f: Float => Literal(f, FloatType)

291

case d: Double => Literal(d, DoubleType)

292

case s: String => Literal(s, StringType)

293

case d: java.math.BigDecimal => Literal(d, DecimalType.fromBigDecimal(d))

294

case d: java.sql.Date => Literal(d, DateType)

295

case t: java.sql.Timestamp => Literal(t, TimestampType)

296

case a: Array[Byte] => Literal(a, BinaryType)

297

case _ => throw new RuntimeException(s"Unsupported literal type: ${v.getClass}")

298

}

299

300

/** Create literal with explicit type */

301

def create(v: Any, dataType: DataType): Literal = Literal(v, dataType)

302

303

/** Create from Scala object with proper internal representation */

304

def fromObject(obj: Any): Literal = apply(obj)

305

306

/** Create default value for data type */

307

def default(dataType: DataType): Literal = dataType match {

308

case BooleanType => Literal(false)

309

case ByteType => Literal(0.toByte)

310

case ShortType => Literal(0.toShort)

311

case IntegerType => Literal(0)

312

case LongType => Literal(0L)

313

case FloatType => Literal(0.0f)

314

case DoubleType => Literal(0.0)

315

case StringType => Literal("")

316

case _ => Literal(null, dataType)

317

}

318

}

319

```

320

321

**Usage Examples:**

322

323

```scala

324

// Literal creation with type inference

325

val intLit = Literal(42) // Literal(42, IntegerType)

326

val stringLit = Literal("hello") // Literal("hello", StringType)

327

val nullLit = Literal(null) // Literal(null, NullType)

328

329

// Explicit type creation

330

val decimalLit = Literal.create(new java.math.BigDecimal("123.45"), DecimalType(5, 2))

331

332

// Default values

333

val defaultInt = Literal.default(IntegerType) // Literal(0, IntegerType)

334

val defaultString = Literal.default(StringType) // Literal("", StringType)

335

336

// Evaluation

337

val result = intLit.eval() // Returns 42

338

println(intLit.foldable) // true (can be constant folded)

339

```

340

341

### Predicates and Comparisons

342

343

Boolean expressions for filtering and conditional logic:

344

345

```scala { .api }

346

/**

347

* Base class for boolean expressions/predicates

348

*/

349

abstract class Predicate extends Expression {

350

def dataType: DataType = BooleanType

351

}

352

353

/**

354

* Base class for binary comparison expressions

355

*/

356

abstract class Comparison extends BinaryExpression with Predicate

357

358

/** Equality comparison (=) */

359

case class EqualTo(left: Expression, right: Expression) extends BinaryOperator with Comparison {

360

def symbol: String = "="

361

362

def nullSafeEval(input1: Any, input2: Any): Any = input1 == input2

363

}

364

365

/** Null-safe equality comparison (<=>) */

366

case class EqualNullSafe(left: Expression, right: Expression) extends BinaryOperator with Comparison {

367

def symbol: String = "<=>"

368

override def nullable: Boolean = false

369

370

def nullSafeEval(input1: Any, input2: Any): Any = {

371

if (input1 == null && input2 == null) true

372

else if (input1 == null || input2 == null) false

373

else input1 == input2

374

}

375

}

376

377

/** Greater than comparison (>) */

378

case class GreaterThan(left: Expression, right: Expression) extends BinaryOperator with Comparison {

379

def symbol: String = ">"

380

381

def nullSafeEval(input1: Any, input2: Any): Any = {

382

RowOrdering.compare(input1, input2, left.dataType) > 0

383

}

384

}

385

386

/** Greater than or equal comparison (>=) */

387

case class GreaterThanOrEqual(left: Expression, right: Expression) extends BinaryOperator with Comparison {

388

def symbol: String = ">="

389

390

def nullSafeEval(input1: Any, input2: Any): Any = {

391

RowOrdering.compare(input1, input2, left.dataType) >= 0

392

}

393

}

394

395

/** Less than comparison (<) */

396

case class LessThan(left: Expression, right: Expression) extends BinaryOperator with Comparison {

397

def symbol: String = "<"

398

399

def nullSafeEval(input1: Any, input2: Any): Any = {

400

RowOrdering.compare(input1, input2, left.dataType) < 0

401

}

402

}

403

404

/** Less than or equal comparison (<=) */

405

case class LessThanOrEqual(left: Expression, right: Expression) extends BinaryOperator with Comparison {

406

def symbol: String = "<="

407

408

def nullSafeEval(input1: Any, input2: Any): Any = {

409

RowOrdering.compare(input1, input2, left.dataType) <= 0

410

}

411

}

412

413

/** Logical AND operation */

414

case class And(left: Expression, right: Expression) extends BinaryOperator with Predicate {

415

def symbol: String = "&&"

416

417

def eval(input: InternalRow): Any = {

418

val leftResult = left.eval(input)

419

if (leftResult == false) {

420

false

421

} else {

422

val rightResult = right.eval(input)

423

if (rightResult == false) false

424

else if (leftResult == null || rightResult == null) null

425

else true

426

}

427

}

428

}

429

430

/** Logical OR operation */

431

case class Or(left: Expression, right: Expression) extends BinaryOperator with Predicate {

432

def symbol: String = "||"

433

434

def eval(input: InternalRow): Any = {

435

val leftResult = left.eval(input)

436

if (leftResult == true) {

437

true

438

} else {

439

val rightResult = right.eval(input)

440

if (rightResult == true) true

441

else if (leftResult == null || rightResult == null) null

442

else false

443

}

444

}

445

}

446

447

/** Logical NOT operation */

448

case class Not(child: Expression) extends UnaryExpression with Predicate {

449

def dataType: DataType = BooleanType

450

451

def eval(input: InternalRow): Any = {

452

val childResult = child.eval(input)

453

if (childResult == null) null

454

else !childResult.asInstanceOf[Boolean]

455

}

456

}

457

458

/** IS NULL predicate */

459

case class IsNull(child: Expression) extends UnaryExpression with Predicate {

460

override def nullable: Boolean = false

461

462

def eval(input: InternalRow): Any = child.eval(input) == null

463

}

464

465

/** IS NOT NULL predicate */

466

case class IsNotNull(child: Expression) extends UnaryExpression with Predicate {

467

override def nullable: Boolean = false

468

469

def eval(input: InternalRow): Any = child.eval(input) != null

470

}

471

472

/** IN predicate (value IN (list)) */

473

case class In(value: Expression, list: Seq[Expression]) extends Predicate {

474

def children: Seq[Expression] = value +: list

475

def nullable: Boolean = children.exists(_.nullable)

476

477

def eval(input: InternalRow): Any = {

478

val evaluatedValue = value.eval(input)

479

if (evaluatedValue == null) {

480

null

481

} else {

482

list.exists { item =>

483

val itemValue = item.eval(input)

484

itemValue != null && evaluatedValue == itemValue

485

}

486

}

487

}

488

}

489

```

490

491

**Usage Examples:**

492

493

```scala

494

// Comparison expressions

495

val ageAttr = AttributeReference("age", IntegerType)()

496

val nameAttr = AttributeReference("name", StringType)()

497

498

val ageFilter = GreaterThan(ageAttr, Literal(25))

499

val nameFilter = EqualTo(nameAttr, Literal("Alice"))

500

501

// Logical operations

502

val combinedFilter = And(ageFilter, nameFilter)

503

val eitherFilter = Or(ageFilter, nameFilter)

504

val notFilter = Not(ageFilter)

505

506

// Null checks

507

val nullCheck = IsNull(nameAttr)

508

val notNullCheck = IsNotNull(nameAttr)

509

510

// IN predicate

511

val statusIn = In(

512

AttributeReference("status", StringType)(),

513

Seq(Literal("active"), Literal("pending"), Literal("verified"))

514

)

515

516

// Expression evaluation with sample data

517

val sampleRow = InternalRow(30, "Alice", "active")

518

val result1 = ageFilter.eval(sampleRow) // true (30 > 25)

519

val result2 = nameFilter.eval(sampleRow) // true ("Alice" == "Alice")

520

val result3 = combinedFilter.eval(sampleRow) // true (both conditions true)

521

```

522

523

### Arithmetic Expressions

524

525

Mathematical operations and numeric computations:

526

527

```scala { .api }

528

/**

529

* Base trait for binary arithmetic expressions

530

*/

531

trait BinaryArithmetic extends BinaryOperator {

532

/** Symbol for the arithmetic operation */

533

def symbol: String

534

535

/** Null-safe evaluation of the arithmetic operation */

536

def nullSafeEval(input1: Any, input2: Any): Any

537

538

def eval(input: InternalRow): Any = {

539

val leftResult = left.eval(input)

540

if (leftResult == null) {

541

null

542

} else {

543

val rightResult = right.eval(input)

544

if (rightResult == null) null

545

else nullSafeEval(leftResult, rightResult)

546

}

547

}

548

}

549

550

/** Addition operation (+) */

551

case class Add(left: Expression, right: Expression) extends BinaryArithmetic {

552

def symbol: String = "+"

553

554

def nullSafeEval(input1: Any, input2: Any): Any = dataType match {

555

case ByteType => input1.asInstanceOf[Byte] + input2.asInstanceOf[Byte]

556

case ShortType => input1.asInstanceOf[Short] + input2.asInstanceOf[Short]

557

case IntegerType => input1.asInstanceOf[Int] + input2.asInstanceOf[Int]

558

case LongType => input1.asInstanceOf[Long] + input2.asInstanceOf[Long]

559

case FloatType => input1.asInstanceOf[Float] + input2.asInstanceOf[Float]

560

case DoubleType => input1.asInstanceOf[Double] + input2.asInstanceOf[Double]

561

case DecimalType() => input1.asInstanceOf[Decimal] + input2.asInstanceOf[Decimal]

562

}

563

}

564

565

/** Subtraction operation (-) */

566

case class Subtract(left: Expression, right: Expression) extends BinaryArithmetic {

567

def symbol: String = "-"

568

569

def nullSafeEval(input1: Any, input2: Any): Any = dataType match {

570

case ByteType => input1.asInstanceOf[Byte] - input2.asInstanceOf[Byte]

571

case ShortType => input1.asInstanceOf[Short] - input2.asInstanceOf[Short]

572

case IntegerType => input1.asInstanceOf[Int] - input2.asInstanceOf[Int]

573

case LongType => input1.asInstanceOf[Long] - input2.asInstanceOf[Long]

574

case FloatType => input1.asInstanceOf[Float] - input2.asInstanceOf[Float]

575

case DoubleType => input1.asInstanceOf[Double] - input2.asInstanceOf[Double]

576

case DecimalType() => input1.asInstanceOf[Decimal] - input2.asInstanceOf[Decimal]

577

}

578

}

579

580

/** Multiplication operation (*) */

581

case class Multiply(left: Expression, right: Expression) extends BinaryArithmetic {

582

def symbol: String = "*"

583

584

def nullSafeEval(input1: Any, input2: Any): Any = dataType match {

585

case ByteType => input1.asInstanceOf[Byte] * input2.asInstanceOf[Byte]

586

case ShortType => input1.asInstanceOf[Short] * input2.asInstanceOf[Short]

587

case IntegerType => input1.asInstanceOf[Int] * input2.asInstanceOf[Int]

588

case LongType => input1.asInstanceOf[Long] * input2.asInstanceOf[Long]

589

case FloatType => input1.asInstanceOf[Float] * input2.asInstanceOf[Float]

590

case DoubleType => input1.asInstanceOf[Double] * input2.asInstanceOf[Double]

591

case DecimalType() => input1.asInstanceOf[Decimal] * input2.asInstanceOf[Decimal]

592

}

593

}

594

595

/** Division operation (/) */

596

case class Divide(left: Expression, right: Expression) extends BinaryArithmetic {

597

def symbol: String = "/"

598

599

def nullSafeEval(input1: Any, input2: Any): Any = dataType match {

600

case DoubleType => input1.asInstanceOf[Double] / input2.asInstanceOf[Double]

601

case DecimalType() => input1.asInstanceOf[Decimal] / input2.asInstanceOf[Decimal]

602

case _ => sys.error(s"Type $dataType not supported.")

603

}

604

}

605

606

/** Modulo/remainder operation (%) */

607

case class Remainder(left: Expression, right: Expression) extends BinaryArithmetic {

608

def symbol: String = "%"

609

610

def nullSafeEval(input1: Any, input2: Any): Any = dataType match {

611

case ByteType => input1.asInstanceOf[Byte] % input2.asInstanceOf[Byte]

612

case ShortType => input1.asInstanceOf[Short] % input2.asInstanceOf[Short]

613

case IntegerType => input1.asInstanceOf[Int] % input2.asInstanceOf[Int]

614

case LongType => input1.asInstanceOf[Long] % input2.asInstanceOf[Long]

615

case FloatType => input1.asInstanceOf[Float] % input2.asInstanceOf[Float]

616

case DoubleType => input1.asInstanceOf[Double] % input2.asInstanceOf[Double]

617

case DecimalType() => input1.asInstanceOf[Decimal] % input2.asInstanceOf[Decimal]

618

}

619

}

620

621

/** Unary minus operation (-) */

622

case class UnaryMinus(child: Expression) extends UnaryExpression {

623

def dataType: DataType = child.dataType

624

def nullable: Boolean = child.nullable

625

626

def eval(input: InternalRow): Any = {

627

val childResult = child.eval(input)

628

if (childResult == null) {

629

null

630

} else {

631

dataType match {

632

case ByteType => -childResult.asInstanceOf[Byte]

633

case ShortType => -childResult.asInstanceOf[Short]

634

case IntegerType => -childResult.asInstanceOf[Int]

635

case LongType => -childResult.asInstanceOf[Long]

636

case FloatType => -childResult.asInstanceOf[Float]

637

case DoubleType => -childResult.asInstanceOf[Double]

638

case DecimalType() => -childResult.asInstanceOf[Decimal]

639

}

640

}

641

}

642

}

643

644

/** Unary plus operation (+) */

645

case class UnaryPositive(child: Expression) extends UnaryExpression {

646

def dataType: DataType = child.dataType

647

def nullable: Boolean = child.nullable

648

649

def eval(input: InternalRow): Any = child.eval(input)

650

}

651

652

/** Absolute value function */

653

case class Abs(child: Expression) extends UnaryExpression {

654

def dataType: DataType = child.dataType

655

def nullable: Boolean = child.nullable

656

657

def eval(input: InternalRow): Any = {

658

val childResult = child.eval(input)

659

if (childResult == null) {

660

null

661

} else {

662

dataType match {

663

case ByteType => math.abs(childResult.asInstanceOf[Byte])

664

case ShortType => math.abs(childResult.asInstanceOf[Short])

665

case IntegerType => math.abs(childResult.asInstanceOf[Int])

666

case LongType => math.abs(childResult.asInstanceOf[Long])

667

case FloatType => math.abs(childResult.asInstanceOf[Float])

668

case DoubleType => math.abs(childResult.asInstanceOf[Double])

669

case DecimalType() => childResult.asInstanceOf[Decimal].abs

670

}

671

}

672

}

673

}

674

```

675

676

**Usage Examples:**

677

678

```scala

679

// Arithmetic expression creation

680

val ageAttr = AttributeReference("age", IntegerType)()

681

val salaryAttr = AttributeReference("salary", DoubleType)()

682

683

val agePlus10 = Add(ageAttr, Literal(10))

684

val salaryMinus1000 = Subtract(salaryAttr, Literal(1000.0))

685

val doubleSalary = Multiply(salaryAttr, Literal(2.0))

686

val halfSalary = Divide(salaryAttr, Literal(2.0))

687

688

// Unary operations

689

val negativeAge = UnaryMinus(ageAttr)

690

val absoluteValue = Abs(UnaryMinus(salaryAttr))

691

692

// Evaluation with sample data

693

val row = InternalRow(25, 50000.0)

694

val result1 = agePlus10.eval(row) // 35

695

val result2 = doubleSalary.eval(row) // 100000.0

696

val result3 = negativeAge.eval(row) // -25

697

```

698

699

### String Expressions

700

701

String manipulation and text processing operations:

702

703

```scala { .api }

704

/** String substring operation */

705

case class Substring(str: Expression, pos: Expression, len: Expression) extends TernaryExpression {

706

def dataType: DataType = StringType

707

def nullable: Boolean = str.nullable || pos.nullable || len.nullable

708

709

def first: Expression = str

710

def second: Expression = pos

711

def third: Expression = len

712

713

def eval(input: InternalRow): Any = {

714

val string = str.eval(input)

715

if (string == null) return null

716

717

val position = pos.eval(input)

718

if (position == null) return null

719

720

val length = len.eval(input)

721

if (length == null) return null

722

723

val s = string.asInstanceOf[UTF8String]

724

val start = position.asInstanceOf[Int] - 1 // 1-based to 0-based

725

val len = length.asInstanceOf[Int]

726

727

if (start < 0 || len < 0) UTF8String.EMPTY_UTF8

728

else s.substring(start, start + len)

729

}

730

}

731

732

/** String length function */

733

case class Length(child: Expression) extends UnaryExpression {

734

def dataType: DataType = IntegerType

735

def nullable: Boolean = child.nullable

736

737

def eval(input: InternalRow): Any = {

738

val childResult = child.eval(input)

739

if (childResult == null) null

740

else childResult.asInstanceOf[UTF8String].numChars()

741

}

742

}

743

744

/** Convert string to uppercase */

745

case class Upper(child: Expression) extends UnaryExpression {

746

def dataType: DataType = StringType

747

def nullable: Boolean = child.nullable

748

749

def eval(input: InternalRow): Any = {

750

val childResult = child.eval(input)

751

if (childResult == null) null

752

else childResult.asInstanceOf[UTF8String].toUpperCase

753

}

754

}

755

756

/** Convert string to lowercase */

757

case class Lower(child: Expression) extends UnaryExpression {

758

def dataType: DataType = StringType

759

def nullable: Boolean = child.nullable

760

761

def eval(input: InternalRow): Any = {

762

val childResult = child.eval(input)

763

if (childResult == null) null

764

else childResult.asInstanceOf[UTF8String].toLowerCase

765

}

766

}

767

768

/** String concatenation */

769

case class Concat(children: Seq[Expression]) extends Expression {

770

def dataType: DataType = StringType

771

def nullable: Boolean = children.exists(_.nullable)

772

773

def eval(input: InternalRow): Any = {

774

val inputs = children.map(_.eval(input))

775

if (inputs.contains(null)) return null

776

777

val strings = inputs.map(_.asInstanceOf[UTF8String])

778

UTF8String.concat(strings: _*)

779

}

780

}

781

782

/** String trimming (remove leading/trailing whitespace) */

783

case class StringTrim(child: Expression) extends UnaryExpression {

784

def dataType: DataType = StringType

785

def nullable: Boolean = child.nullable

786

787

def eval(input: InternalRow): Any = {

788

val childResult = child.eval(input)

789

if (childResult == null) null

790

else childResult.asInstanceOf[UTF8String].trim()

791

}

792

}

793

794

/** Left trim (remove leading whitespace) */

795

case class StringLTrim(child: Expression) extends UnaryExpression {

796

def dataType: DataType = StringType

797

def nullable: Boolean = child.nullable

798

799

def eval(input: InternalRow): Any = {

800

val childResult = child.eval(input)

801

if (childResult == null) null

802

else childResult.asInstanceOf[UTF8String].trimLeft()

803

}

804

}

805

806

/** Right trim (remove trailing whitespace) */

807

case class StringRTrim(child: Expression) extends UnaryExpression {

808

def dataType: DataType = StringType

809

def nullable: Boolean = child.nullable

810

811

def eval(input: InternalRow): Any = {

812

val childResult = child.eval(input)

813

if (childResult == null) null

814

else childResult.asInstanceOf[UTF8String].trimRight()

815

}

816

}

817

```

818

819

**Usage Examples:**

820

821

```scala

822

// String expression creation

823

val nameAttr = AttributeReference("name", StringType)()

824

val messageAttr = AttributeReference("message", StringType)()

825

826

val upperName = Upper(nameAttr)

827

val lowerName = Lower(nameAttr)

828

val nameLength = Length(nameAttr)

829

val substring = Substring(nameAttr, Literal(1), Literal(3)) // First 3 characters

830

831

// String concatenation

832

val greeting = Concat(Seq(Literal("Hello "), nameAttr, Literal("!")))

833

834

// String trimming

835

val trimmed = StringTrim(messageAttr)

836

837

// Evaluation with sample data

838

val row = InternalRow(UTF8String.fromString(" Alice "), UTF8String.fromString(" Hello World "))

839

val result1 = upperName.eval(row) // " ALICE "

840

val result2 = nameLength.eval(row) // 8 (including spaces)

841

val result3 = trimmed.eval(row) // "Hello World" (spaces removed)

842

```

843

844

### Mathematical Expressions

845

846

Mathematical functions and operations:

847

848

```scala { .api }

849

/**

850

* Base trait for unary mathematical expressions

851

*/

852

trait UnaryMathExpression extends UnaryExpression {

853

def dataType: DataType = DoubleType

854

def nullable: Boolean = child.nullable

855

856

/** The mathematical function to apply */

857

def mathFunction(input: Double): Double

858

859

def eval(input: InternalRow): Any = {

860

val childResult = child.eval(input)

861

if (childResult == null) {

862

null

863

} else {

864

val doubleValue = childResult match {

865

case d: Double => d

866

case f: Float => f.toDouble

867

case i: Int => i.toDouble

868

case l: Long => l.toDouble

869

case _ => sys.error(s"Unsupported data type: ${child.dataType}")

870

}

871

mathFunction(doubleValue)

872

}

873

}

874

}

875

876

/** Sine function */

877

case class Sin(child: Expression) extends UnaryMathExpression {

878

def mathFunction(input: Double): Double = math.sin(input)

879

override def prettyName: String = "sin"

880

}

881

882

/** Cosine function */

883

case class Cos(child: Expression) extends UnaryMathExpression {

884

def mathFunction(input: Double): Double = math.cos(input)

885

override def prettyName: String = "cos"

886

}

887

888

/** Tangent function */

889

case class Tan(child: Expression) extends UnaryMathExpression {

890

def mathFunction(input: Double): Double = math.tan(input)

891

override def prettyName: String = "tan"

892

}

893

894

/** Square root function */

895

case class Sqrt(child: Expression) extends UnaryMathExpression {

896

def mathFunction(input: Double): Double = math.sqrt(input)

897

override def prettyName: String = "sqrt"

898

}

899

900

/** Natural logarithm function */

901

case class Log(child: Expression) extends UnaryMathExpression {

902

def mathFunction(input: Double): Double = math.log(input)

903

override def prettyName: String = "ln"

904

}

905

906

/** Exponential function (e^x) */

907

case class Exp(child: Expression) extends UnaryMathExpression {

908

def mathFunction(input: Double): Double = math.exp(input)

909

override def prettyName: String = "exp"

910

}

911

912

/** Floor function */

913

case class Floor(child: Expression) extends UnaryMathExpression {

914

def mathFunction(input: Double): Double = math.floor(input)

915

override def prettyName: String = "floor"

916

}

917

918

/** Ceiling function */

919

case class Ceil(child: Expression) extends UnaryMathExpression {

920

def mathFunction(input: Double): Double = math.ceil(input)

921

override def prettyName: String = "ceil"

922

}

923

924

/**

925

* Base trait for binary mathematical expressions

926

*/

927

trait BinaryMathExpression extends BinaryExpression {

928

def dataType: DataType = DoubleType

929

def nullable: Boolean = left.nullable || right.nullable

930

931

/** The mathematical function to apply */

932

def mathFunction(left: Double, right: Double): Double

933

934

def eval(input: InternalRow): Any = {

935

val leftResult = left.eval(input)

936

if (leftResult == null) return null

937

938

val rightResult = right.eval(input)

939

if (rightResult == null) return null

940

941

val leftDouble = leftResult.asInstanceOf[Double]

942

val rightDouble = rightResult.asInstanceOf[Double]

943

mathFunction(leftDouble, rightDouble)

944

}

945

}

946

947

/** Round function with precision */

948

case class Round(child: Expression, scale: Expression) extends BinaryExpression {

949

def left: Expression = child

950

def right: Expression = scale

951

def dataType: DataType = child.dataType

952

def nullable: Boolean = child.nullable || scale.nullable

953

954

def eval(input: InternalRow): Any = {

955

val childResult = child.eval(input)

956

if (childResult == null) return null

957

958

val scaleResult = scale.eval(input)

959

if (scaleResult == null) return null

960

961

val scaleValue = scaleResult.asInstanceOf[Int]

962

// Implementation depends on child data type

963

childResult match {

964

case d: Double => BigDecimal(d).setScale(scaleValue, BigDecimal.RoundingMode.HALF_UP).doubleValue

965

case f: Float => BigDecimal(f.toDouble).setScale(scaleValue, BigDecimal.RoundingMode.HALF_UP).floatValue

966

case _ => childResult

967

}

968

}

969

}

970

971

/** Power function (base^exponent) */

972

case class Pow(left: Expression, right: Expression) extends BinaryMathExpression {

973

def mathFunction(left: Double, right: Double): Double = math.pow(left, right)

974

override def prettyName: String = "power"

975

}

976

```

977

978

**Usage Examples:**

979

980

```scala

981

// Mathematical expression creation

982

val angleAttr = AttributeReference("angle", DoubleType)()

983

val valueAttr = AttributeReference("value", DoubleType)()

984

985

val sineValue = Sin(angleAttr)

986

val sqrtValue = Sqrt(valueAttr)

987

val logValue = Log(valueAttr)

988

val powerValue = Pow(valueAttr, Literal(2.0)) // Square

989

990

// Rounding

991

val rounded = Round(valueAttr, Literal(2)) // Round to 2 decimal places

992

993

// Evaluation with sample data

994

val row = InternalRow(math.Pi / 2, 16.0) // 90 degrees, value 16

995

val result1 = sineValue.eval(row) // ~1.0 (sin(π/2))

996

val result2 = sqrtValue.eval(row) // 4.0 (√16)

997

val result3 = powerValue.eval(row) // 256.0 (16²)

998

```

999

1000

### Complex Type Operations

1001

1002

Operations for working with arrays, maps, and structs:

1003

1004

```scala { .api }

1005

/** Extract item from array by index */

1006

case class GetArrayItem(left: Expression, right: Expression) extends BinaryExpression {

1007

def dataType: DataType = left.dataType.asInstanceOf[ArrayType].elementType

1008

def nullable: Boolean = true // Array access can return null

1009

1010

def eval(input: InternalRow): Any = {

1011

val arrayResult = left.eval(input)

1012

if (arrayResult == null) return null

1013

1014

val indexResult = right.eval(input)

1015

if (indexResult == null) return null

1016

1017

val array = arrayResult.asInstanceOf[ArrayData]

1018

val index = indexResult.asInstanceOf[Int]

1019

1020

if (index < 0 || index >= array.numElements() || array.isNullAt(index)) {

1021

null

1022

} else {

1023

array.get(index, dataType)

1024

}

1025

}

1026

}

1027

1028

/** Extract value from map by key */

1029

case class GetMapValue(left: Expression, right: Expression) extends BinaryExpression {

1030

def dataType: DataType = left.dataType.asInstanceOf[MapType].valueType

1031

def nullable: Boolean = true

1032

1033

def eval(input: InternalRow): Any = {

1034

val mapResult = left.eval(input)

1035

if (mapResult == null) return null

1036

1037

val keyResult = right.eval(input)

1038

if (keyResult == null) return null

1039

1040

val map = mapResult.asInstanceOf[MapData]

1041

val key = keyResult

1042

1043

// Find key in map and return corresponding value

1044

val keys = map.keyArray()

1045

val values = map.valueArray()

1046

val keyType = left.dataType.asInstanceOf[MapType].keyType

1047

1048

for (i <- 0 until keys.numElements()) {

1049

if (!keys.isNullAt(i) && keys.get(i, keyType) == key) {

1050

if (values.isNullAt(i)) return null

1051

else return values.get(i, dataType)

1052

}

1053

}

1054

null // Key not found

1055

}

1056

}

1057

1058

/** Extract field from struct by index */

1059

case class GetStructField(child: Expression, ordinal: Int, name: Option[String] = None) extends UnaryExpression {

1060

def dataType: DataType = child.dataType.asInstanceOf[StructType].fields(ordinal).dataType

1061

def nullable: Boolean = child.nullable || child.dataType.asInstanceOf[StructType].fields(ordinal).nullable

1062

1063

def eval(input: InternalRow): Any = {

1064

val structResult = child.eval(input)

1065

if (structResult == null) return null

1066

1067

val struct = structResult.asInstanceOf[InternalRow]

1068

if (struct.isNullAt(ordinal)) null

1069

else struct.get(ordinal, dataType)

1070

}

1071

}

1072

1073

/** Create array from expressions */

1074

case class CreateArray(children: Seq[Expression]) extends Expression {

1075

def dataType: DataType = {

1076

val elementType = if (children.nonEmpty) children.head.dataType else NullType

1077

ArrayType(elementType, containsNull = children.exists(_.nullable))

1078

}

1079

def nullable: Boolean = false // Array itself is never null (but elements can be)

1080

1081

def eval(input: InternalRow): Any = {

1082

val values = children.map(_.eval(input))

1083

new GenericArrayData(values)

1084

}

1085

}

1086

1087

/** Create map from key-value expressions (alternating keys and values) */

1088

case class CreateMap(children: Seq[Expression]) extends Expression {

1089

require(children.size % 2 == 0, "CreateMap should have an even number of arguments")

1090

1091

def dataType: DataType = {

1092

val keyType = if (children.nonEmpty) children.head.dataType else StringType

1093

val valueType = if (children.size > 1) children(1).dataType else StringType

1094

val valueContainsNull = children.indices.filter(_ % 2 == 1).exists(i => children(i).nullable)

1095

MapType(keyType, valueType, valueContainsNull)

1096

}

1097

def nullable: Boolean = false

1098

1099

def eval(input: InternalRow): Any = {

1100

val evaluatedChildren = children.map(_.eval(input))

1101

val keys = evaluatedChildren.indices.filter(_ % 2 == 0).map(evaluatedChildren(_))

1102

val values = evaluatedChildren.indices.filter(_ % 2 == 1).map(evaluatedChildren(_))

1103

1104

ArrayBasedMapData(new GenericArrayData(keys), new GenericArrayData(values))

1105

}

1106

}

1107

1108

/** Create struct from field expressions */

1109

case class CreateStruct(children: Seq[Expression]) extends Expression {

1110

def dataType: DataType = {

1111

val fields = children.zipWithIndex.map { case (child, index) =>

1112

StructField(s"col${index + 1}", child.dataType, child.nullable)

1113

}

1114

StructType(fields)

1115

}

1116

def nullable: Boolean = false

1117

1118

def eval(input: InternalRow): Any = {

1119

val values = children.map(_.eval(input))

1120

InternalRow.fromSeq(values)

1121

}

1122

}

1123

```

1124

1125

**Usage Examples:**

1126

1127

```scala

1128

// Array operations

1129

val arrayAttr = AttributeReference("items", ArrayType(StringType))()

1130

val arrayAccess = GetArrayItem(arrayAttr, Literal(0)) // First element

1131

1132

val newArray = CreateArray(Seq(

1133

Literal("apple"),

1134

Literal("banana"),

1135

Literal("orange")

1136

))

1137

1138

// Map operations

1139

val mapAttr = AttributeReference("properties", MapType(StringType, IntegerType))()

1140

val mapAccess = GetMapValue(mapAttr, Literal("age"))

1141

1142

val newMap = CreateMap(Seq(

1143

Literal("name"), Literal("Alice"),

1144

Literal("age"), Literal(25)

1145

))

1146

1147

// Struct operations

1148

val structAttr = AttributeReference("person", StructType(Seq(

1149

StructField("name", StringType),

1150

StructField("age", IntegerType)

1151

)))()

1152

1153

val nameField = GetStructField(structAttr, ordinal = 0, name = Some("name"))

1154

val ageField = GetStructField(structAttr, ordinal = 1, name = Some("age"))

1155

1156

val newStruct = CreateStruct(Seq(

1157

Literal("Bob"),

1158

Literal(30)

1159

))

1160

```

1161

1162

### Type Conversion and Casting

1163

1164

Type conversion operations for data transformation:

1165

1166

```scala { .api }

1167

/**

1168

* Type casting expression

1169

*/

1170

case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String] = None)

1171

extends UnaryExpression {

1172

1173

def nullable: Boolean = child.nullable

1174

1175

def eval(input: InternalRow): Any = {

1176

val childResult = child.eval(input)

1177

if (childResult == null) return null

1178

1179

// Cast implementation depends on source and target types

1180

cast(childResult, child.dataType, dataType)

1181

}

1182

1183

private def cast(value: Any, from: DataType, to: DataType): Any = {

1184

(from, to) match {

1185

case (StringType, IntegerType) =>

1186

try { value.asInstanceOf[UTF8String].toString.toInt }

1187

catch { case _: NumberFormatException => null }

1188

1189

case (IntegerType, StringType) =>

1190

UTF8String.fromString(value.asInstanceOf[Int].toString)

1191

1192

case (DoubleType, IntegerType) =>

1193

value.asInstanceOf[Double].toInt

1194

1195

case (IntegerType, DoubleType) =>

1196

value.asInstanceOf[Int].toDouble

1197

1198

// Many more casting rules...

1199

case _ if from == to => value // No cast needed

1200

case _ => null // Unsupported cast

1201

}

1202

}

1203

}

1204

1205

/** Safe cast that returns null on conversion failure */

1206

case class TryCast(child: Expression, dataType: DataType, timeZoneId: Option[String] = None)

1207

extends UnaryExpression {

1208

1209

def nullable: Boolean = true // Always nullable since cast can fail

1210

1211

def eval(input: InternalRow): Any = {

1212

try {

1213

Cast(child, dataType, timeZoneId).eval(input)

1214

} catch {

1215

case _: Exception => null // Return null on any cast failure

1216

}

1217

}

1218

}

1219

```

1220

1221

**Usage Examples:**

1222

1223

```scala

1224

// Type casting

1225

val stringAttr = AttributeReference("stringValue", StringType)()

1226

val numberAttr = AttributeReference("numberValue", DoubleType)()

1227

1228

val stringToInt = Cast(stringAttr, IntegerType)

1229

val doubleToInt = Cast(numberAttr, IntegerType)

1230

val intToString = Cast(Literal(42), StringType)

1231

1232

// Safe casting (returns null on failure)

1233

val safeCast = TryCast(stringAttr, IntegerType)

1234

1235

// Evaluation

1236

val row1 = InternalRow(UTF8String.fromString("123"), 45.67)

1237

val result1 = stringToInt.eval(row1) // 123

1238

val result2 = doubleToInt.eval(row1) // 45

1239

val result3 = intToString.eval(row1) // "42"

1240

1241

val row2 = InternalRow(UTF8String.fromString("invalid"), 45.67)

1242

val result4 = stringToInt.eval(row2) // null (cast failure)

1243

val result5 = safeCast.eval(row2) // null (safe cast)

1244

```

1245

1246

### Date and Time Expressions

1247

1248

Date and time operations for temporal data processing:

1249

1250

```scala { .api }

1251

/** Extract year from date/timestamp */

1252

case class Year(child: Expression) extends UnaryExpression {

1253

def dataType: DataType = IntegerType

1254

def nullable: Boolean = child.nullable

1255

1256

def eval(input: InternalRow): Any = {

1257

val childResult = child.eval(input)

1258

if (childResult == null) return null

1259

1260

child.dataType match {

1261

case DateType =>

1262

val days = childResult.asInstanceOf[Int]

1263

DateTimeUtils.getYear(days)

1264

case TimestampType =>

1265

val microseconds = childResult.asInstanceOf[Long]

1266

DateTimeUtils.getYear(microseconds)

1267

}

1268

}

1269

}

1270

1271

/** Extract month from date/timestamp */

1272

case class Month(child: Expression) extends UnaryExpression {

1273

def dataType: DataType = IntegerType

1274

def nullable: Boolean = child.nullable

1275

1276

def eval(input: InternalRow): Any = {

1277

val childResult = child.eval(input)

1278

if (childResult == null) return null

1279

1280

child.dataType match {

1281

case DateType =>

1282

val days = childResult.asInstanceOf[Int]

1283

DateTimeUtils.getMonth(days)

1284

case TimestampType =>

1285

val microseconds = childResult.asInstanceOf[Long]

1286

DateTimeUtils.getMonth(microseconds)

1287

}

1288

}

1289

}

1290

1291

/** Extract day of month from date/timestamp */

1292

case class DayOfMonth(child: Expression) extends UnaryExpression {

1293

def dataType: DataType = IntegerType

1294

def nullable: Boolean = child.nullable

1295

1296

def eval(input: InternalRow): Any = {

1297

val childResult = child.eval(input)

1298

if (childResult == null) return null

1299

1300

child.dataType match {

1301

case DateType =>

1302

val days = childResult.asInstanceOf[Int]

1303

DateTimeUtils.getDayOfMonth(days)

1304

case TimestampType =>

1305

val microseconds = childResult.asInstanceOf[Long]

1306

DateTimeUtils.getDayOfMonth(microseconds)

1307

}

1308

}

1309

}

1310

1311

/** Extract hour from timestamp */

1312

case class Hour(child: Expression) extends UnaryExpression {

1313

def dataType: DataType = IntegerType

1314

def nullable: Boolean = child.nullable

1315

1316

def eval(input: InternalRow): Any = {

1317

val childResult = child.eval(input)

1318

if (childResult == null) return null

1319

1320

val microseconds = childResult.asInstanceOf[Long]

1321

DateTimeUtils.getHour(microseconds)

1322

}

1323

}

1324

1325

/** Extract minute from timestamp */

1326

case class Minute(child: Expression) extends UnaryExpression {

1327

def dataType: DataType = IntegerType

1328

def nullable: Boolean = child.nullable

1329

1330

def eval(input: InternalRow): Any = {

1331

val childResult = child.eval(input)

1332

if (childResult == null) return null

1333

1334

val microseconds = childResult.asInstanceOf[Long]

1335

DateTimeUtils.getMinute(microseconds)

1336

}

1337

}

1338

1339

/** Extract second from timestamp */

1340

case class Second(child: Expression) extends UnaryExpression {

1341

def dataType: DataType = IntegerType

1342

def nullable: Boolean = child.nullable

1343

1344

def eval(input: InternalRow): Any = {

1345

val childResult = child.eval(input)

1346

if (childResult == null) return null

1347

1348

val microseconds = childResult.asInstanceOf[Long]

1349

DateTimeUtils.getSecond(microseconds)

1350

}

1351

}

1352

1353

/** Current timestamp function */

1354

case class CurrentTimestamp() extends LeafExpression with Nondeterministic {

1355

def dataType: DataType = TimestampType

1356

def nullable: Boolean = false

1357

1358

def initializeInternal(partitionIndex: Int): Unit = {

1359

// Initialize with current time at task start

1360

}

1361

1362

def evalInternal(input: InternalRow): Any = {

1363

System.currentTimeMillis() * 1000L // Convert to microseconds

1364

}

1365

}

1366

1367

/** Current date function */

1368

case class CurrentDate(timeZoneId: Option[String] = None) extends LeafExpression with Nondeterministic {

1369

def dataType: DataType = DateType

1370

def nullable: Boolean = false

1371

1372

def initializeInternal(partitionIndex: Int): Unit = {

1373

// Initialize with current date at task start

1374

}

1375

1376

def evalInternal(input: InternalRow): Any = {

1377

val currentTimeMillis = System.currentTimeMillis()

1378

DateTimeUtils.millisToDays(currentTimeMillis)

1379

}

1380

}

1381

1382

/** Add interval to timestamp */

1383

case class DateAdd(startDate: Expression, days: Expression) extends BinaryExpression {

1384

def left: Expression = startDate

1385

def right: Expression = days

1386

def dataType: DataType = DateType

1387

def nullable: Boolean = startDate.nullable || days.nullable

1388

1389

def eval(input: InternalRow): Any = {

1390

val startResult = startDate.eval(input)

1391

if (startResult == null) return null

1392

1393

val daysResult = days.eval(input)

1394

if (daysResult == null) return null

1395

1396

val startDays = startResult.asInstanceOf[Int]

1397

val additionalDays = daysResult.asInstanceOf[Int]

1398

startDays + additionalDays

1399

}

1400

}

1401

1402

/** Subtract interval from timestamp */

1403

case class DateSub(startDate: Expression, days: Expression) extends BinaryExpression {

1404

def left: Expression = startDate

1405

def right: Expression = days

1406

def dataType: DataType = DateType

1407

def nullable: Boolean = startDate.nullable || days.nullable

1408

1409

def eval(input: InternalRow): Any = {

1410

val startResult = startDate.eval(input)

1411

if (startResult == null) return null

1412

1413

val daysResult = days.eval(input)

1414

if (daysResult == null) return null

1415

1416

val startDays = startResult.asInstanceOf[Int]

1417

val subtractDays = daysResult.asInstanceOf[Int]

1418

startDays - subtractDays

1419

}

1420

}

1421

```

1422

1423

**Usage Examples:**

1424

1425

```scala

1426

// Date/time expression creation

1427

val timestampAttr = AttributeReference("created_at", TimestampType)()

1428

val dateAttr = AttributeReference("birth_date", DateType)()

1429

1430

val yearExtracted = Year(timestampAttr)

1431

val monthExtracted = Month(dateAttr)

1432

val hourExtracted = Hour(timestampAttr)

1433

1434

// Current date/time

1435

val now = CurrentTimestamp()

1436

val today = CurrentDate()

1437

1438

// Date arithmetic

1439

val futureDate = DateAdd(dateAttr, Literal(30)) // Add 30 days

1440

val pastDate = DateSub(dateAttr, Literal(7)) // Subtract 7 days

1441

1442

// Evaluation with sample data

1443

val currentTime = System.currentTimeMillis() * 1000L

1444

val sampleRow = InternalRow(currentTime, DateTimeUtils.millisToDays(System.currentTimeMillis()))

1445

1446

val yearResult = yearExtracted.eval(sampleRow) // Current year

1447

val monthResult = monthExtracted.eval(sampleRow) // Current month

1448

val hourResult = hourExtracted.eval(sampleRow) // Current hour

1449

```

1450

1451

### Conditional Expressions

1452

1453

Conditional logic and branching expressions:

1454

1455

```scala { .api }

1456

/** Conditional if-then-else expression */

1457

case class If(predicate: Expression, trueValue: Expression, falseValue: Expression) extends TernaryExpression {

1458

def first: Expression = predicate

1459

def second: Expression = trueValue

1460

def third: Expression = falseValue

1461

1462

def dataType: DataType = trueValue.dataType

1463

def nullable: Boolean = trueValue.nullable || falseValue.nullable

1464

1465

def eval(input: InternalRow): Any = {

1466

val predicateResult = predicate.eval(input)

1467

if (predicateResult == null) {

1468

null

1469

} else if (predicateResult.asInstanceOf[Boolean]) {

1470

trueValue.eval(input)

1471

} else {

1472

falseValue.eval(input)

1473

}

1474

}

1475

}

1476

1477

/** Case-when conditional expression */

1478

case class CaseWhen(

1479

branches: Seq[(Expression, Expression)],

1480

elseValue: Option[Expression] = None) extends Expression {

1481

1482

def children: Seq[Expression] = branches.flatMap { case (condition, value) =>

1483

Seq(condition, value)

1484

} ++ elseValue.toSeq

1485

1486

def dataType: DataType = branches.head._2.dataType

1487

def nullable: Boolean = branches.exists(_._2.nullable) || elseValue.exists(_.nullable)

1488

1489

def eval(input: InternalRow): Any = {

1490

for ((condition, value) <- branches) {

1491

val conditionResult = condition.eval(input)

1492

if (conditionResult != null && conditionResult.asInstanceOf[Boolean]) {

1493

return value.eval(input)

1494

}

1495

}

1496

1497

elseValue match {

1498

case Some(expr) => expr.eval(input)

1499

case None => null

1500

}

1501

}

1502

}

1503

1504

/** Return first non-null expression */

1505

case class Coalesce(children: Seq[Expression]) extends Expression {

1506

require(children.nonEmpty, "Coalesce must have at least one child")

1507

1508

def dataType: DataType = children.head.dataType

1509

def nullable: Boolean = children.forall(_.nullable)

1510

1511

def eval(input: InternalRow): Any = {

1512

for (child <- children) {

1513

val result = child.eval(input)

1514

if (result != null) {

1515

return result

1516

}

1517

}

1518

null

1519

}

1520

}

1521

1522

/** Null if two expressions are equal, otherwise returns first expression */

1523

case class NullIf(left: Expression, right: Expression) extends BinaryExpression {

1524

def dataType: DataType = left.dataType

1525

def nullable: Boolean = true // Always nullable since it can return null

1526

1527

def eval(input: InternalRow): Any = {

1528

val leftResult = left.eval(input)

1529

val rightResult = right.eval(input)

1530

1531

if (leftResult == rightResult) {

1532

null

1533

} else {

1534

leftResult

1535

}

1536

}

1537

}

1538

1539

/** If first expression is null, return second expression, otherwise return first */

1540

case class Nvl(left: Expression, right: Expression) extends BinaryExpression {

1541

def dataType: DataType = left.dataType

1542

def nullable: Boolean = right.nullable

1543

1544

def eval(input: InternalRow): Any = {

1545

val leftResult = left.eval(input)

1546

if (leftResult == null) {

1547

right.eval(input)

1548

} else {

1549

leftResult

1550

}

1551

}

1552

}

1553

1554

/** Three-argument null handling: if first is null return third, otherwise return second */

1555

case class Nvl2(expr1: Expression, expr2: Expression, expr3: Expression) extends TernaryExpression {

1556

def first: Expression = expr1

1557

def second: Expression = expr2

1558

def third: Expression = expr3

1559

1560

def dataType: DataType = expr2.dataType

1561

def nullable: Boolean = expr2.nullable || expr3.nullable

1562

1563

def eval(input: InternalRow): Any = {

1564

val firstResult = expr1.eval(input)

1565

if (firstResult == null) {

1566

expr3.eval(input)

1567

} else {

1568

expr2.eval(input)

1569

}

1570

}

1571

}

1572

```

1573

1574

**Usage Examples:**

1575

1576

```scala

1577

// Conditional expressions

1578

val ageAttr = AttributeReference("age", IntegerType)()

1579

val nameAttr = AttributeReference("name", StringType)()

1580

val statusAttr = AttributeReference("status", StringType)()

1581

1582

// If-then-else

1583

val ageCategory = If(

1584

GreaterThan(ageAttr, Literal(18)),

1585

Literal("Adult"),

1586

Literal("Minor")

1587

)

1588

1589

// Case-when with multiple branches

1590

val riskLevel = CaseWhen(

1591

branches = Seq(

1592

(GreaterThan(ageAttr, Literal(65)), Literal("High Risk")),

1593

(GreaterThan(ageAttr, Literal(40)), Literal("Medium Risk")),

1594

(GreaterThan(ageAttr, Literal(18)), Literal("Low Risk"))

1595

),

1596

elseValue = Some(Literal("No Risk"))

1597

)

1598

1599

// Coalesce - return first non-null

1600

val preferredName = Coalesce(Seq(

1601

AttributeReference("nickname", StringType)(),

1602

AttributeReference("first_name", StringType)(),

1603

Literal("Unknown")

1604

))

1605

1606

// Null handling

1607

val statusOrDefault = Nvl(statusAttr, Literal("Active"))

1608

1609

// Evaluation with sample data

1610

val sampleRow = InternalRow(25, UTF8String.fromString("Alice"), null)

1611

val result1 = ageCategory.eval(sampleRow) // "Adult" (25 > 18)

1612

val result2 = riskLevel.eval(sampleRow) // "Low Risk" (25 > 18 but not > 40)

1613

val result3 = statusOrDefault.eval(sampleRow) // "Active" (status is null)

1614

```

1615

1616

### Aggregate Expressions

1617

1618

Aggregation operations for computing statistics across multiple rows:

1619

1620

```scala { .api }

1621

/**

1622

* Base class for all aggregate expressions

1623

*/

1624

abstract class AggregateExpression extends Expression {

1625

/** Whether this aggregate is distinct */

1626

def isDistinct: Boolean

1627

1628

/** Aggregate function implementation */

1629

def aggregateFunction: AggregateFunction

1630

1631

def dataType: DataType = aggregateFunction.dataType

1632

def nullable: Boolean = aggregateFunction.nullable

1633

def children: Seq[Expression] = aggregateFunction.children

1634

}

1635

1636

/**

1637

* Base trait for aggregate functions

1638

*/

1639

abstract class AggregateFunction extends Expression {

1640

/** Attributes representing the aggregation buffer */

1641

def aggBufferAttributes: Seq[AttributeReference]

1642

1643

/** Initial values for the buffer */

1644

def initialValues: Seq[Expression]

1645

1646

/** Update buffer with new input row */

1647

def updateExpressions: Seq[Expression]

1648

1649

/** Merge two buffers */

1650

def mergeExpressions: Seq[Expression]

1651

1652

/** Extract final result from buffer */

1653

def evaluateExpression: Expression

1654

}

1655

1656

/**

1657

* Declarative aggregate functions using expressions

1658

*/

1659

abstract class DeclarativeAggregate extends AggregateFunction

1660

1661

/** Count aggregate function */

1662

case class Count(children: Seq[Expression]) extends DeclarativeAggregate {

1663

def this(child: Expression) = this(Seq(child))

1664

1665

def dataType: DataType = LongType

1666

def nullable: Boolean = false

1667

1668

// Buffer has single long value for count

1669

lazy val count = AttributeReference("count", LongType, nullable = false)()

1670

def aggBufferAttributes: Seq[AttributeReference] = Seq(count)

1671

1672

// Initialize count to 0

1673

def initialValues: Seq[Expression] = Seq(Literal(0L))

1674

1675

// Increment count for non-null values (or all rows if no children)

1676

def updateExpressions: Seq[Expression] = {

1677

if (children.isEmpty) {

1678

// COUNT(*) - count all rows

1679

Seq(Add(count, Literal(1L)))

1680

} else {

1681

// COUNT(expr) - count non-null values

1682

val condition = children.map(IsNotNull).reduce(And)

1683

Seq(If(condition, Add(count, Literal(1L)), count))

1684

}

1685

}

1686

1687

// Merge counts by adding them

1688

def mergeExpressions: Seq[Expression] = Seq(Add(count.left, count.right))

1689

1690

// Final result is the count value

1691

def evaluateExpression: Expression = count

1692

}

1693

1694

/** Sum aggregate function */

1695

case class Sum(child: Expression) extends DeclarativeAggregate {

1696

def children: Seq[Expression] = Seq(child)

1697

def dataType: DataType = child.dataType

1698

def nullable: Boolean = true

1699

1700

// Buffer has sum and isEmpty flag

1701

lazy val sum = AttributeReference("sum", child.dataType, nullable = true)()

1702

lazy val isEmpty = AttributeReference("isEmpty", BooleanType, nullable = false)()

1703

def aggBufferAttributes: Seq[AttributeReference] = Seq(sum, isEmpty)

1704

1705

// Initialize sum to null, isEmpty to true

1706

def initialValues: Seq[Expression] = Seq(Literal(null, child.dataType), Literal(true))

1707

1708

// Add non-null values to sum

1709

def updateExpressions: Seq[Expression] = Seq(

1710

If(IsNull(child), sum,

1711

If(isEmpty, child, Add(sum, child))),

1712

If(IsNull(child), isEmpty, Literal(false))

1713

)

1714

1715

// Merge sums

1716

def mergeExpressions: Seq[Expression] = Seq(

1717

If(sum.left.isEmpty, sum.right,

1718

If(sum.right.isEmpty, sum.left, Add(sum.left, sum.right))),

1719

And(sum.left.isEmpty, sum.right.isEmpty)

1720

)

1721

1722

// Final result is sum (null if no non-null values)

1723

def evaluateExpression: Expression = If(isEmpty, Literal(null, dataType), sum)

1724

}

1725

1726

/** Maximum aggregate function */

1727

case class Max(child: Expression) extends DeclarativeAggregate {

1728

def children: Seq[Expression] = Seq(child)

1729

def dataType: DataType = child.dataType

1730

def nullable: Boolean = true

1731

1732

lazy val max = AttributeReference("max", child.dataType, nullable = true)()

1733

def aggBufferAttributes: Seq[AttributeReference] = Seq(max)

1734

1735

def initialValues: Seq[Expression] = Seq(Literal(null, child.dataType))

1736

1737

def updateExpressions: Seq[Expression] = Seq(

1738

If(IsNull(child), max,

1739

If(IsNull(max), child,

1740

If(GreaterThan(child, max), child, max)))

1741

)

1742

1743

def mergeExpressions: Seq[Expression] = Seq(

1744

If(IsNull(max.left), max.right,

1745

If(IsNull(max.right), max.left,

1746

If(GreaterThan(max.left, max.right), max.left, max.right)))

1747

)

1748

1749

def evaluateExpression: Expression = max

1750

}

1751

1752

/** Minimum aggregate function */

1753

case class Min(child: Expression) extends DeclarativeAggregate {

1754

def children: Seq[Expression] = Seq(child)

1755

def dataType: DataType = child.dataType

1756

def nullable: Boolean = true

1757

1758

lazy val min = AttributeReference("min", child.dataType, nullable = true)()

1759

def aggBufferAttributes: Seq[AttributeReference] = Seq(min)

1760

1761

def initialValues: Seq[Expression] = Seq(Literal(null, child.dataType))

1762

1763

def updateExpressions: Seq[Expression] = Seq(

1764

If(IsNull(child), min,

1765

If(IsNull(min), child,

1766

If(LessThan(child, min), child, min)))

1767

)

1768

1769

def mergeExpressions: Seq[Expression] = Seq(

1770

If(IsNull(min.left), min.right,

1771

If(IsNull(min.right), min.left,

1772

If(LessThan(min.left, min.right), min.left, min.right)))

1773

)

1774

1775

def evaluateExpression: Expression = min

1776

}

1777

1778

/** Average aggregate function */

1779

case class Average(child: Expression) extends DeclarativeAggregate {

1780

def children: Seq[Expression] = Seq(child)

1781

def dataType: DataType = DoubleType

1782

def nullable: Boolean = true

1783

1784

lazy val sum = AttributeReference("sum", DoubleType, nullable = true)()

1785

lazy val count = AttributeReference("count", LongType, nullable = false)()

1786

def aggBufferAttributes: Seq[AttributeReference] = Seq(sum, count)

1787

1788

def initialValues: Seq[Expression] = Seq(Literal(null, DoubleType), Literal(0L))

1789

1790

def updateExpressions: Seq[Expression] = Seq(

1791

If(IsNull(child), sum,

1792

If(IsNull(sum), Cast(child, DoubleType), Add(sum, Cast(child, DoubleType)))),

1793

If(IsNull(child), count, Add(count, Literal(1L)))

1794

)

1795

1796

def mergeExpressions: Seq[Expression] = Seq(

1797

If(IsNull(sum.left), sum.right,

1798

If(IsNull(sum.right), sum.left, Add(sum.left, sum.right))),

1799

Add(count.left, count.right)

1800

)

1801

1802

def evaluateExpression: Expression =

1803

If(EqualTo(count, Literal(0L)), Literal(null, DoubleType), Divide(sum, Cast(count, DoubleType)))

1804

}

1805

```

1806

1807

**Usage Examples:**

1808

1809

```scala

1810

// Aggregate expression creation

1811

val ageAttr = AttributeReference("age", IntegerType)()

1812

val salaryAttr = AttributeReference("salary", DoubleType)()

1813

1814

val totalCount = Count(Seq.empty) // COUNT(*)

1815

val ageCount = Count(Seq(ageAttr)) // COUNT(age)

1816

val totalSalary = Sum(salaryAttr)

1817

val maxAge = Max(ageAttr)

1818

val minSalary = Min(salaryAttr)

1819

val avgSalary = Average(salaryAttr)

1820

1821

// Usage in aggregation context (conceptual - normally used in query plans)

1822

val aggregates = Seq(

1823

Alias(totalCount, "total_employees")(),

1824

Alias(avgSalary, "avg_salary")(),

1825

Alias(maxAge, "max_age")()

1826

)

1827

```

1828

1829

### Code Generation Framework

1830

1831

Catalyst's code generation system for high-performance expression evaluation:

1832

1833

```scala { .api }

1834

/**

1835

* Context for Java code generation

1836

*/

1837

class CodegenContext {

1838

/** Add object reference and return variable name */

1839

def addReferenceObj(obj: AnyRef): String = {

1840

val objName = s"obj$${references.size}"

1841

references += obj

1842

objName

1843

}

1844

1845

/** Add mutable state variable */

1846

def addMutableState(

1847

javaType: String,

1848

variableName: String,

1849

initCode: String = ""): String = {

1850

val name = if (variableName.nonEmpty) variableName else freshName("mutableState")

1851

mutableStateInitCode += s"$initCode"

1852

mutableStateVars += s"private $javaType $name;"

1853

name

1854

}

1855

1856

/** Generate fresh variable name */

1857

def freshName(name: String): String = {

1858

val count = freshNameIds.getOrElse(name, 0)

1859

freshNameIds(name) = count + 1

1860

s"${name}_$count"

1861

}

1862

1863

/** Add new function to generated class */

1864

def addNewFunction(funcName: String, funcCode: String): String = {

1865

addedFunctions += funcCode

1866

funcName

1867

}

1868

1869

/** Split large expressions into multiple functions */

1870

def splitExpressionsByRows(expressions: Seq[Expression], functionName: String): String = {

1871

// Split expressions to avoid Java method size limits

1872

val splitSize = 1000

1873

val splitExprs = expressions.grouped(splitSize).zipWithIndex.map { case (exprs, index) =>

1874

val subFuncName = s"${functionName}_$index"

1875

val codes = exprs.map(_.genCode(this))

1876

generateSubFunction(subFuncName, codes)

1877

}

1878

splitExprs.mkString("\\n")

1879

}

1880

1881

// Internal state

1882

private val references = mutable.ArrayBuffer[AnyRef]()

1883

private val mutableStateInitCode = mutable.ArrayBuffer[String]()

1884

private val mutableStateVars = mutable.ArrayBuffer[String]()

1885

private val freshNameIds = mutable.HashMap[String, Int]()

1886

private val addedFunctions = mutable.ArrayBuffer[String]()

1887

}

1888

1889

/**

1890

* Generated code for an expression

1891

*/

1892

case class ExprCode(code: String, isNull: String, value: String) {

1893

/** Copy with new null check code */

1894

def copy(code: String = this.code,

1895

isNull: String = this.isNull,

1896

value: String = this.value): ExprCode = {

1897

ExprCode(code, isNull, value)

1898

}

1899

}

1900

1901

/**

1902

* Utility functions for code generation

1903

*/

1904

object CodeGenerator {

1905

/** Generate Java code for expression evaluation */

1906

def generateCode(ctx: CodegenContext, expressions: Seq[Expression]): (String, String, String) = {

1907

val codes = expressions.map(_.genCode(ctx))

1908

1909

val evalCodes = codes.map(_.code).mkString("\\n")

1910

val nullChecks = codes.map(_.isNull).mkString(", ")

1911

val values = codes.map(_.value).mkString(", ")

1912

1913

(evalCodes, nullChecks, values)

1914

}

1915

1916

/** Java type mapping from Catalyst DataType */

1917

def javaType(dataType: DataType): String = dataType match {

1918

case BooleanType => "boolean"

1919

case ByteType => "byte"

1920

case ShortType => "short"

1921

case IntegerType => "int"

1922

case LongType => "long"

1923

case FloatType => "float"

1924

case DoubleType => "double"

1925

case StringType => "UTF8String"

1926

case BinaryType => "byte[]"

1927

case DateType => "int"

1928

case TimestampType => "long"

1929

case _: DecimalType => "Decimal"

1930

case _: ArrayType => "ArrayData"

1931

case _: MapType => "MapData"

1932

case _: StructType => "InternalRow"

1933

case _ => "Object"

1934

}

1935

1936

/** Default value for Java type */

1937

def defaultValue(dataType: DataType): String = dataType match {

1938

case BooleanType => "false"

1939

case ByteType | ShortType | IntegerType => "0"

1940

case LongType => "0L"

1941

case FloatType => "0.0f"

1942

case DoubleType => "0.0"

1943

case _ => "null"

1944

}

1945

}

1946

1947

/**

1948

* Code generation support for expressions

1949

*/

1950

trait CodegenSupport extends Expression {

1951

/** Generate Java code for this expression */

1952

def genCode(ctx: CodegenContext): ExprCode = {

1953

val nullVar = ctx.freshName("isNull")

1954

val valueVar = ctx.freshName("value")

1955

val javaType = CodeGenerator.javaType(dataType)

1956

1957

val code = s"""

1958

|boolean $nullVar = false;

1959

|$javaType $valueVar = ${CodeGenerator.defaultValue(dataType)};

1960

|${doGenCode(ctx, nullVar, valueVar)}

1961

""".stripMargin

1962

1963

ExprCode(code, nullVar, valueVar)

1964

}

1965

1966

/** Subclasses implement this to generate expression-specific code */

1967

protected def doGenCode(ctx: CodegenContext, nullVar: String, valueVar: String): String

1968

}

1969

1970

/**

1971

* Example: Add expression with code generation

1972

*/

1973

case class Add(left: Expression, right: Expression) extends BinaryArithmetic with CodegenSupport {

1974

def symbol: String = "+"

1975

1976

protected def doGenCode(ctx: CodegenContext, nullVar: String, valueVar: String): String = {

1977

val leftGen = left.genCode(ctx)

1978

val rightGen = right.genCode(ctx)

1979

1980

s"""

1981

|${leftGen.code}

1982

|${rightGen.code}

1983

|

1984

|if (${leftGen.isNull} || ${rightGen.isNull}) {

1985

| $nullVar = true;

1986

|} else {

1987

| $valueVar = ${leftGen.value} + ${rightGen.value};

1988

|}

1989

""".stripMargin

1990

}

1991

1992

// Fallback interpreted evaluation

1993

def nullSafeEval(input1: Any, input2: Any): Any = {

1994

dataType match {

1995

case IntegerType => input1.asInstanceOf[Int] + input2.asInstanceOf[Int]

1996

case LongType => input1.asInstanceOf[Long] + input2.asInstanceOf[Long]

1997

case DoubleType => input1.asInstanceOf[Double] + input2.asInstanceOf[Double]

1998

// ... other numeric types

1999

}

2000

}

2001

}

2002

```

2003

2004

**Usage Examples:**

2005

2006

```scala

2007

// Code generation context

2008

val ctx = new CodegenContext()

2009

2010

// Expression for code generation

2011

val ageAttr = AttributeReference("age", IntegerType)()

2012

val expr = Add(ageAttr, Literal(10))

2013

2014

// Generate code

2015

val generated = expr.genCode(ctx)

2016

println(generated.code)

2017

// Output: Java code for evaluating age + 10

2018

2019

// Generated code structure:

2020

/*

2021

boolean isNull_0 = false;

2022

int value_0 = 0;

2023

2024

// Load age from input row

2025

boolean isNull_1 = input.isNullAt(0);

2026

int value_1 = isNull_1 ? 0 : input.getInt(0);

2027

2028

// Literal 10

2029

boolean isNull_2 = false;

2030

int value_2 = 10;

2031

2032

// Add operation

2033

if (isNull_1 || isNull_2) {

2034

isNull_0 = true;

2035

} else {

2036

value_0 = value_1 + value_2;

2037

}

2038

*/

2039

2040

// Performance: Generated code is much faster than interpretation

2041

// - No virtual method calls

2042

// - No boxing/unboxing

2043

// - Direct memory access

2044

// - JIT-friendly code patterns

2045

```

2046

2047

## Summary

2048

2049

The Catalyst expression system provides a powerful and extensible framework for representing and evaluating computations in Spark SQL. Key capabilities include:

2050

2051

**Expression Hierarchy**: Base Expression class with specialized traits for different patterns (Unary, Binary, Ternary, etc.)

2052

2053

**Type Safety**: Strong typing with DataType system integration and comprehensive type checking

2054

2055

**Evaluation Modes**: Both interpreted evaluation (eval) and high-performance code generation (genCode)

2056

2057

**Rich Function Library**: Comprehensive set of built-in operations for arithmetic, comparisons, string manipulation, date/time operations, complex types, aggregation, and more

2058

2059

**Named Expressions**: Support for aliasing and attribute references enabling query planning and optimization

2060

2061

**Conditional Logic**: Full support for if-then-else, case-when, and null handling operations

2062

2063

**Code Generation**: Advanced code generation framework that produces optimized Java code for high-performance execution

2064

2065

**Extensibility**: Clean interfaces for adding custom expressions and functions

2066

2067

This expression system forms the foundation for all SQL operations in Spark, enabling both correctness through strong typing and performance through code generation optimization.