tessl/maven-org-apache-spark--spark-catalyst

Catalyst query optimization framework and expression evaluation engine for Apache Spark SQL

- **Workspace**: tessl
- **Visibility**: Public
- **Describes**: mavenpkg:maven/org.apache.spark/spark-catalyst_2.11@2.2.x

To install, run

npx @tessl/cli install tessl/maven-org-apache-spark--spark-catalyst@2.2.0

# Spark Catalyst

Catalyst is Apache Spark's query optimization framework and expression evaluation engine for Spark SQL. It represents queries as trees of relational operators and expressions, and rewrites SQL queries and DataFrame operations into efficient execution plans using rule-based and cost-based optimization.

## Package Information

- **Package Name**: spark-catalyst_2.11
- **Package Type**: maven
- **Language**: Scala
- **Version**: 2.2.3
- **Maven Coordinates**: `org.apache.spark:spark-catalyst_2.11:2.2.3`
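
For an sbt build, the coordinates above might be declared as in the sketch below. It assumes a Scala 2.11 project; the exact `scalaVersion` patch release shown is an assumption, not something this package prescribes.

```scala
// build.sbt (sketch): assumes a Scala 2.11 build, matching the _2.11 artifact suffix
scalaVersion := "2.11.12"

// %% appends the Scala binary version, so this resolves to spark-catalyst_2.11
libraryDependencies += "org.apache.spark" %% "spark-catalyst" % "2.2.3"
```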

## Core Imports

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._
import org.apache.spark.sql.catalyst.expressions._
import org.apache.spark.sql.catalyst.plans.logical._
import org.apache.spark.sql.catalyst.trees._
```

## Basic Usage

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._
import org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute
import org.apache.spark.sql.catalyst.expressions._

// Reference built-in data types (they are singleton objects)
val stringType = StringType
val intType = IntegerType

// Create a row
val row = Row("Alice", 25, true)

// Access row data by position
val name: String = row.getString(0)
val age: Int = row.getInt(1)
val isActive: Boolean = row.getBoolean(2)

// Create expressions: an unresolved column reference compared with a literal
val col = UnresolvedAttribute("name")
val literal = Literal("Alice")
val predicate = EqualTo(col, literal)
```

## Architecture

Catalyst is built around several key components:

- **Tree Framework**: All query components (expressions, plans) are represented as trees that can be transformed using rules
- **Expression System**: Rich set of expressions for computations, predicates, and data access with code generation support
- **Query Plans**: Logical and physical plan representations for query execution
- **Analysis Framework**: Rule-based system for resolving references and type checking
- **Optimization Engine**: Cost-based and rule-based optimizations for query performance
- **Code Generation**: Dynamic code generation for high-performance expression evaluation
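
The tree-plus-rules idea behind these components can be sketched in plain Scala without any Spark dependency. Everything below (`Expr`, `Lit`, `Col`, `Plus`, `Times`, `transformUp`, `constantFolding`) is a toy written for illustration; it mirrors the shape of Catalyst's approach but is not its implementation.

```scala
// A toy expression tree; these types are illustrative and are NOT Catalyst classes.
sealed trait Expr {
  // Apply `rule` bottom-up: rewrite the children first, then the node itself.
  def transformUp(rule: PartialFunction[Expr, Expr]): Expr = {
    val withNewChildren = this match {
      case Plus(l, r)  => Plus(l.transformUp(rule), r.transformUp(rule))
      case Times(l, r) => Times(l.transformUp(rule), r.transformUp(rule))
      case leaf        => leaf
    }
    // Nodes the rule does not match are returned unchanged.
    rule.applyOrElse(withNewChildren, identity[Expr])
  }
}
case class Lit(value: Int) extends Expr
case class Col(name: String) extends Expr
case class Plus(left: Expr, right: Expr) extends Expr
case class Times(left: Expr, right: Expr) extends Expr

object TreeSketch {
  // A "rule" is a partial function from tree nodes to replacement nodes.
  val constantFolding: PartialFunction[Expr, Expr] = {
    case Plus(Lit(a), Lit(b))  => Lit(a + b)
    case Times(Lit(a), Lit(b)) => Lit(a * b)
  }

  def main(args: Array[String]): Unit = {
    // x * (2 + 3)
    val tree = Times(Col("x"), Plus(Lit(2), Lit(3)))
    println(tree.transformUp(constantFolding)) // Times(Col(x),Lit(5))
  }
}
```

Modelling a rule as a partial function keeps each rewrite small and composable: the rule only names the patterns it cares about, and the traversal handles everything else.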

## Capabilities

### Data Types and Structures

Core data type system including primitive types, complex types (arrays, maps, structs), and the `Row` interface for data access.

```scala { .api }
trait Row {
  def apply(i: Int): Any
  def get(i: Int): Any
  def isNullAt(i: Int): Boolean
  def getInt(i: Int): Int
  def getString(i: Int): String
  def getBoolean(i: Int): Boolean
  // ... additional primitive accessors
}

abstract class DataType {
  def typeName: String
  def json: String
  def prettyJson: String
  def simpleString: String
}
```

[Data Types and Structures](./data-types.md)
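
As a rough illustration of how the type classes and `Row` fit together, the sketch below builds a struct schema and a matching row. The field names and values are made up for the example; `StructType`, `StructField`, and `ArrayType` come from `org.apache.spark.sql.types`.

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

object DataTypesSketch {
  // A struct schema with one array column; the field names are illustrative.
  val schema: StructType = StructType(Seq(
    StructField("name", StringType, nullable = false),
    StructField("age", IntegerType, nullable = true),
    StructField("tags", ArrayType(StringType), nullable = true)
  ))

  // Rows are positional; a null is stored as-is and detected with isNullAt.
  val row: Row = Row("Alice", null, Seq("admin", "dev"))

  def main(args: Array[String]): Unit = {
    println(schema.simpleString)               // compact one-line description of the schema
    println(schema.fieldNames.mkString(", "))  // name, age, tags
    println(row.isNullAt(1))                   // true
    println(row.getString(0))                  // Alice
  }
}
```

Checking `isNullAt` before calling a primitive accessor such as `getInt` is the safer pattern, since primitive accessors cannot represent a null.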

### Expression System

Comprehensive expression framework for computations, predicates, aggregations, and data transformations with built-in code generation.

```scala { .api }
abstract class Expression extends TreeNode[Expression] {
  def dataType: DataType
  def nullable: Boolean
  def eval(input: InternalRow): Any
  def genCode(ctx: CodegenContext): ExprCode
  def prettyName: String
}

abstract class UnaryExpression extends Expression {
  def child: Expression
}

abstract class BinaryExpression extends Expression {
  def left: Expression
  def right: Expression
}
```

[Expression System](./expressions.md)
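
A minimal sketch of interpreted evaluation, assuming the expression is built by hand rather than produced by the parser and analyzer. `BoundReference` reads a fixed slot of the input row; the object and value names are made up for the example.

```scala
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions._
import org.apache.spark.sql.types.IntegerType

object ExpressionSketch {
  // Slot 0 of the input row, typed as a non-nullable Int.
  val age = BoundReference(0, IntegerType, nullable = false)

  // (age + 1) > 18, assembled directly from expression nodes.
  val isAdultNextYear: Expression = GreaterThan(Add(age, Literal(1)), Literal(18))

  def main(args: Array[String]): Unit = {
    // eval interprets the tree against an InternalRow; no code generation is involved.
    println(isAdultNextYear.eval(InternalRow(18))) // true
    println(isAdultNextYear.eval(InternalRow(10))) // false
    println(isAdultNextYear.dataType)              // BooleanType
  }
}
```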

### Query Plans

Logical and physical query plan representations with tree-based transformations and optimization support.

```scala { .api }
abstract class LogicalPlan extends QueryPlan[LogicalPlan] {
  def output: Seq[Attribute]
  def references: AttributeSet
  def inputSet: AttributeSet
  def resolved: Boolean
}

abstract class QueryPlan[PlanType <: QueryPlan[PlanType]] extends TreeNode[PlanType] {
  def output: Seq[Attribute]
  def outputSet: AttributeSet
  def references: AttributeSet
}
```

[Query Plans](./query-plans.md)
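
The sketch below assembles a small logical plan by hand to show how `output` and `resolved` behave. It uses `LocalRelation`, `Filter`, and `Project` from `org.apache.spark.sql.catalyst.plans.logical`; the column names stand in for a hypothetical table.

```scala
import org.apache.spark.sql.catalyst.expressions._
import org.apache.spark.sql.catalyst.plans.logical._
import org.apache.spark.sql.types.{IntegerType, StringType}

object PlanSketch {
  // Already-resolved attributes standing in for the columns of a hypothetical table.
  val id = AttributeReference("id", IntegerType)()
  val name = AttributeReference("name", StringType)()

  // Roughly: SELECT name FROM <relation> WHERE id > 1, assembled bottom-up.
  val relation = LocalRelation(id, name)
  val plan: LogicalPlan =
    Project(Seq(name), Filter(GreaterThan(id, Literal(1)), relation))

  def main(args: Array[String]): Unit = {
    println(plan.output.map(_.name)) // List(name)
    println(plan.resolved)           // true: every attribute is an AttributeReference
    println(plan.treeString)         // indented rendering of the operator tree
  }
}
```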

### Analysis Framework

Rule-based analysis system for resolving unresolved references, type checking, and semantic validation.

```scala { .api }
class Analyzer(catalog: SessionCatalog, conf: SQLConf, maxIterations: Int)
  extends RuleExecutor[LogicalPlan] {
  def execute(plan: LogicalPlan): LogicalPlan
}

abstract class Rule[TreeType <: TreeNode[TreeType]] {
  def ruleName: String
  def apply(plan: TreeType): TreeType
}
```

[Analysis Framework](./analysis.md)
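
To show the shape of a `Rule`, here is a small hand-written one. `RemoveTrueFilter` is a hypothetical rule invented for this example (it is not one of the analyzer's own rules); it relies only on `Rule.apply` and `TreeNode.transform`.

```scala
import org.apache.spark.sql.catalyst.expressions._
import org.apache.spark.sql.catalyst.plans.logical._
import org.apache.spark.sql.catalyst.rules.Rule
import org.apache.spark.sql.types.{BooleanType, IntegerType}

// Hypothetical rule: drop Filter nodes whose condition is the literal `true`.
object RemoveTrueFilter extends Rule[LogicalPlan] {
  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    case Filter(Literal(true, BooleanType), child) => child
  }
}

object RuleSketch {
  val id = AttributeReference("id", IntegerType)()
  val relation = LocalRelation(id)
  // WHERE true over the relation; the filter is redundant.
  val before: LogicalPlan = Filter(Literal(true), relation)

  def main(args: Array[String]): Unit = {
    val after = RemoveTrueFilter(before)
    println(after == relation) // true: the Filter node has been removed
  }
}
```

A rule is just a function from tree to tree, so it can be applied directly as above or scheduled in batches by a `RuleExecutor`.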

### Optimization

Query optimization engine with rule-based and cost-based optimization techniques.

```scala { .api }
abstract class Optimizer extends RuleExecutor[LogicalPlan] {
  def batches: Seq[Batch]
}

case class Batch(name: String, strategy: Strategy, rules: Rule[LogicalPlan]*)

abstract class Strategy {
  def maxIterations: Int
}
```

[Optimization](./optimization.md)
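
Batches can be exercised without the full `Optimizer` by extending `RuleExecutor` directly, as in the sketch below. `MiniOptimizer`, the batch name, and the iteration cap of 10 are arbitrary choices for the example; `ConstantFolding` and `BooleanSimplification` are rules from `org.apache.spark.sql.catalyst.optimizer`.

```scala
import org.apache.spark.sql.catalyst.expressions._
import org.apache.spark.sql.catalyst.optimizer.{BooleanSimplification, ConstantFolding}
import org.apache.spark.sql.catalyst.plans.logical._
import org.apache.spark.sql.catalyst.rules.RuleExecutor
import org.apache.spark.sql.types.IntegerType

// A small executor with a single fixed-point batch (names and limits are arbitrary).
object MiniOptimizer extends RuleExecutor[LogicalPlan] {
  val batches = Seq(
    Batch("Simplify", FixedPoint(10), ConstantFolding, BooleanSimplification)
  )
}

object OptimizerSketch {
  val id = AttributeReference("id", IntegerType)()

  // WHERE id > (1 + 2) AND true
  val condition = And(GreaterThan(id, Add(Literal(1), Literal(2))), Literal(true))
  val plan: LogicalPlan = Filter(condition, LocalRelation(id))

  def main(args: Array[String]): Unit = {
    // The batch reruns its rules until the plan stops changing or 10 iterations pass.
    println(MiniOptimizer.execute(plan).treeString)
  }
}
```

With these two rules, the constant sub-expression `1 + 2` should fold to `3` and the redundant `AND true` should drop out, leaving a filter on `id > 3`.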

### SQL Parsing

SQL parsing interfaces and abstract syntax tree representations.

```scala { .api }
trait ParserInterface {
  def parsePlan(sqlText: String): LogicalPlan
  def parseExpression(sqlText: String): Expression
  def parseDataType(sqlText: String): DataType
}

class AstBuilder extends SqlBaseBaseVisitor[AnyRef] {
  def visitSingleStatement(ctx: SingleStatementContext): LogicalPlan
  def visitSingleExpression(ctx: SingleExpressionContext): Expression
}
```

[SQL Parsing](./parsing.md)
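
A short sketch of the parser entry points, using `CatalystSqlParser`, a concrete `ParserInterface` in `org.apache.spark.sql.catalyst.parser`. The table and column names in the SQL text are invented for the example.

```scala
import org.apache.spark.sql.catalyst.parser.CatalystSqlParser
import org.apache.spark.sql.types.{ArrayType, IntegerType}

object ParserSketch {
  def main(args: Array[String]): Unit = {
    // Parsing yields an *unresolved* plan: names are not yet bound to any catalog.
    val plan = CatalystSqlParser.parsePlan("SELECT name FROM people WHERE age > 21")
    println(plan.treeString)
    println(plan.resolved) // false

    // A standalone expression can be parsed the same way.
    val expr = CatalystSqlParser.parseExpression("age + 1")
    println(expr)

    // Data type strings use the same syntax as DDL column types.
    val dataType = CatalystSqlParser.parseDataType("array<int>")
    println(dataType == ArrayType(IntegerType)) // true
  }
}
```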

### Code Generation

Framework for generating efficient Java code for expression evaluation and query execution.

```scala { .api }
class CodegenContext {
  def freshName(name: String): String
  def addReferenceObj(objName: String, obj: Any, className: String = null): String
  def addMutableState(javaType: String, variableName: String, initFunc: String = ""): String
}

case class ExprCode(code: String, isNull: String, value: String)

trait CodeGenerator[InType <: AnyRef, OutType <: AnyRef] {
  def generate(expressions: InType): OutType
}
```

[Code Generation](./code-generation.md)
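
One way to see a `CodeGenerator` in action is `GenerateUnsafeProjection`, which compiles a sequence of expressions into a projection at runtime. The sketch below assumes the Janino compiler that Catalyst depends on is on the classpath; the object and value names are made up for the example.

```scala
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions._
import org.apache.spark.sql.catalyst.expressions.codegen.GenerateUnsafeProjection
import org.apache.spark.sql.types.IntegerType

object CodegenSketch {
  // Two Int inputs read from slots 0 and 1 of the input row.
  val a = BoundReference(0, IntegerType, nullable = false)
  val b = BoundReference(1, IntegerType, nullable = false)

  // Generates and compiles Java source that evaluates `a + b` into a one-column row.
  val projection: UnsafeProjection = GenerateUnsafeProjection.generate(Seq(Add(a, b)))

  def main(args: Array[String]): Unit = {
    val result = projection(InternalRow(2, 3))
    println(result.getInt(0)) // 5
  }
}
```

The generated projection computes the same value as calling `eval` on the expression, but through compiled code rather than tree interpretation.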

### Utilities

Utility classes for date/time operations, string manipulation, and other common operations.

```scala { .api }
object DateTimeUtils {
  def stringToTimestamp(s: UTF8String): Option[Long]
  def timestampToString(us: Long): String
  def dateToString(days: Int): String
  def stringToDate(s: UTF8String): Option[Int]
}

object UTF8String {
  def fromString(str: String): UTF8String
  def fromBytes(bytes: Array[Byte]): UTF8String
}
```

[Utilities](./utilities.md)
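
A brief sketch of the date helpers, assuming `DateTimeUtils` from `org.apache.spark.sql.catalyst.util` and `UTF8String` from `org.apache.spark.unsafe.types`. The input strings are arbitrary examples.

```scala
import org.apache.spark.sql.catalyst.util.DateTimeUtils
import org.apache.spark.unsafe.types.UTF8String

object UtilitiesSketch {
  def main(args: Array[String]): Unit = {
    val text = UTF8String.fromString("1970-01-02")

    // Dates are represented internally as the number of days since the Unix epoch.
    val days: Option[Int] = DateTimeUtils.stringToDate(text)
    println(days) // Some(1)

    // Convert the day count back into a formatted date string.
    println(days.map(d => DateTimeUtils.dateToString(d)))

    // Unparseable input yields None rather than throwing.
    println(DateTimeUtils.stringToDate(UTF8String.fromString("not a date"))) // None
  }
}
```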