
tessl/maven-org-apache-spark--spark-catalyst_2-10

Catalyst is a library for manipulating relational query plans, used as the foundation of Spark SQL's query optimizer and execution engine.

Workspace: tessl
Visibility: Public
Describes: mavenpkg:maven/org.apache.spark/spark-catalyst_2.10@1.6.x

To install, run

npx @tessl/cli install tessl/maven-org-apache-spark--spark-catalyst_2-10@1.6.0

# Spark Catalyst

Spark Catalyst is an extensible query optimizer and execution planning framework that serves as the foundation of Apache Spark's SQL engine. It provides a comprehensive set of tools for representing, manipulating, and optimizing relational query plans through a tree-based representation that supports complex transformations including predicate pushdown, constant folding, and projection pruning.

## Package Information

- **Package Name**: org.apache.spark:spark-catalyst_2.10
- **Package Type**: maven
- **Language**: Scala
- **Installation**: Add to your Maven pom.xml:

```xml
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-catalyst_2.10</artifactId>
  <version>1.6.3</version>
</dependency>
```

## Core Imports

```scala
import org.apache.spark.sql._
import org.apache.spark.sql.catalyst.expressions._
import org.apache.spark.sql.catalyst.plans.logical._
import org.apache.spark.sql.catalyst.trees._
import org.apache.spark.sql.types._
```

## Basic Usage

```scala
import org.apache.spark.sql._
import org.apache.spark.sql.catalyst.expressions._
import org.apache.spark.sql.catalyst.plans.logical._
import org.apache.spark.sql.types._

// Define a simple two-column schema
val schema = StructType(Seq(
  StructField("id", IntegerType, nullable = false),
  StructField("name", StringType, nullable = true)
))

// Work with expressions
val idRef = AttributeReference("id", IntegerType, nullable = false)()
val nameRef = AttributeReference("name", StringType, nullable = true)()
val literal = Literal(42, IntegerType)

// Create a simple filter expression: id = 42
val filterExpr = EqualTo(idRef, literal)

// Create a Row and access its fields
val row = Row(1, "Alice")
val name = row.getString(1)
val id = row.getInt(0)
```

## Architecture

Spark Catalyst is built around several key architectural components:

- **Tree-based Representation**: All plans and expressions extend TreeNode, providing uniform tree traversal and transformation capabilities
- **Type System**: Rich type system supporting SQL types including primitive, complex, and user-defined types
- **Expression Framework**: Extensible expression evaluation system with code generation support for high performance
- **Analysis Phase**: Resolution of unresolved references, type checking, and semantic validation
- **Optimization Phase**: Rule-based optimization framework with built-in optimizations like predicate pushdown and constant folding
- **Code Generation**: Runtime bytecode generation for expression evaluation to accelerate query execution
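The phases above compose into a pipeline: each one takes a plan and returns a rewritten plan. A minimal sketch of that wiring in plain Scala (a hypothetical `pipeline` helper with string "plans" standing in for real plan trees, not the Catalyst API):

```scala
// Toy phase pipeline: each phase is a plan-to-plan function, and the
// engine simply composes the phases in order.
def pipeline[P](phases: Seq[P => P]): P => P =
  phases.foldLeft((p: P) => p)((acc, phase) => acc.andThen(phase))

// Hypothetical phases, just to show the wiring.
val analyze: String => String = plan => plan + " -> analyzed"
val optimize: String => String = plan => plan + " -> optimized"
val run = pipeline(Seq(analyze, optimize))
```

Calling `run("parsed")` threads the plan through each phase in order.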

## Capabilities

### Row Data Interface

Core interface for working with structured row data, providing both generic and type-safe access methods.

```scala { .api }
trait Row extends Serializable {
  def size: Int
  def length: Int
  def schema: StructType
  def apply(i: Int): Any
  def get(i: Int): Any
  def isNullAt(i: Int): Boolean
  def getBoolean(i: Int): Boolean
  def getByte(i: Int): Byte
  def getShort(i: Int): Short
  def getInt(i: Int): Int
  def getLong(i: Int): Long
  def getFloat(i: Int): Float
  def getDouble(i: Int): Double
  def getString(i: Int): String
  def getDecimal(i: Int): java.math.BigDecimal
  def getDate(i: Int): java.sql.Date
  def getTimestamp(i: Int): java.sql.Timestamp
  def getAs[T](i: Int): T
  def getAs[T](fieldName: String): T
  def copy(): Row
}

object Row {
  def apply(values: Any*): Row
  def fromSeq(values: Seq[Any]): Row
  def fromTuple(tuple: Product): Row
  def merge(rows: Row*): Row
  val empty: Row
}
```

[Row Operations](./row-operations.md)
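The split between the generic accessors (`apply`, `get`) and the typed getters can be sketched with a toy row class (an illustration, not the Catalyst `Row`): the typed getters are thin casting wrappers over a single untyped store.

```scala
// Toy stand-in for the Row access pattern: one untyped backing sequence,
// with typed getters implemented as casts over the generic accessor.
class SimpleRow(values: Seq[Any]) {
  def length: Int = values.length
  def apply(i: Int): Any = values(i)
  def isNullAt(i: Int): Boolean = values(i) == null
  def getAs[T](i: Int): T = values(i).asInstanceOf[T]
  def getInt(i: Int): Int = getAs[Int](i)
  def getString(i: Int): String = getAs[String](i)
}

val r = new SimpleRow(Seq(1, "Alice", null))
```

As in the real interface, a typed getter on the wrong ordinal fails at runtime rather than compile time, which is why `isNullAt` and `schema` checks matter in practice.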

### Tree Framework

Foundation for all Catalyst data structures, providing tree traversal, transformation, and manipulation capabilities.

```scala { .api }
abstract class TreeNode[BaseType <: TreeNode[BaseType]] extends Product {
  def children: Seq[BaseType]
  def fastEquals(other: TreeNode[_]): Boolean
  // Tree traversal and transformation methods
}

case class Origin(
  line: Option[Int] = None,
  startPosition: Option[Int] = None
)

object CurrentOrigin {
  def get: Origin
  def set(o: Origin): Unit
  def reset(): Unit
  def withOrigin[A](o: Origin)(f: => A): A
}
```

[Tree Operations](./tree-operations.md)
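The transformation style that `TreeNode` enables can be illustrated with a small self-contained tree (a sketch under toy types, not the real class): a partial function is applied bottom-up, and nodes it does not match are kept unchanged.

```scala
// Minimal sketch of TreeNode-style transformation: rewrite children
// first, then offer the rebuilt node to the rule (bottom-up).
sealed trait Node {
  def children: Seq[Node]
  def withChildren(newChildren: Seq[Node]): Node
  def transformUp(rule: PartialFunction[Node, Node]): Node = {
    val rebuilt = withChildren(children.map(_.transformUp(rule)))
    if (rule.isDefinedAt(rebuilt)) rule(rebuilt) else rebuilt
  }
}
case class Leaf(value: Int) extends Node {
  def children: Seq[Node] = Nil
  def withChildren(newChildren: Seq[Node]): Node = this
}
case class Branch(left: Node, right: Node) extends Node {
  def children: Seq[Node] = Seq(left, right)
  def withChildren(newChildren: Seq[Node]): Node =
    Branch(newChildren(0), newChildren(1))
}

// Increment every leaf; Branch nodes are left untouched by the rule.
val tree = Branch(Leaf(1), Branch(Leaf(2), Leaf(3)))
val bumped = tree.transformUp { case Leaf(v) => Leaf(v + 1) }
```

This is the mechanism behind plan and expression rewrites throughout Catalyst: rules only describe the cases they care about, and the tree machinery handles the traversal.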

### Type System

Comprehensive type system supporting all SQL types including primitives, complex nested types, and user-defined types.

```scala { .api }
abstract class DataType extends AbstractDataType {
  def defaultSize: Int
  def typeName: String
  def json: String
  def prettyJson: String
  def simpleString: String
  def sameType(other: DataType): Boolean
}

// Primitive types (objects)
object BooleanType extends DataType
object ByteType extends DataType
object ShortType extends DataType
object IntegerType extends DataType
object LongType extends DataType
object FloatType extends DataType
object DoubleType extends DataType
object StringType extends DataType
object BinaryType extends DataType
object DateType extends DataType
object TimestampType extends DataType
```

[Type System](./types.md)
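Because types form a recursive tree, derived operations like a `simpleString`-style rendering fall out of the structure. A toy sketch (hypothetical `TType` hierarchy, not the Catalyst classes):

```scala
// Toy type tree: each node renders itself by recursively rendering
// its component types.
sealed trait TType { def simpleString: String }
case object TInt extends TType { def simpleString: String = "int" }
case object TString extends TType { def simpleString: String = "string" }
case class TArray(elementType: TType) extends TType {
  def simpleString: String = s"array<${elementType.simpleString}>"
}
case class TStruct(fields: Seq[(String, TType)]) extends TType {
  def simpleString: String =
    fields.map { case (n, t) => s"$n:${t.simpleString}" }
      .mkString("struct<", ",", ">")
}

val t = TStruct(Seq("id" -> TInt, "tags" -> TArray(TString)))
```

Here `t.simpleString` renders as `struct<id:int,tags:array<string>>`, mirroring the compact form the real type system uses.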

### Expression System

Extensible expression evaluation framework with support for code generation and complex expression trees.

```scala { .api }
abstract class Expression extends TreeNode[Expression] {
  def foldable: Boolean
  def deterministic: Boolean
  def nullable: Boolean
  def references: AttributeSet
  def eval(input: InternalRow): Any
  def dataType: DataType
}

// Expression hierarchy
trait LeafExpression extends Expression
trait UnaryExpression extends Expression
trait BinaryExpression extends Expression
trait BinaryOperator extends BinaryExpression

// Key expression types
case class Literal(value: Any, dataType: DataType) extends LeafExpression
case class AttributeReference(name: String, dataType: DataType, nullable: Boolean, metadata: Metadata = Metadata.empty) extends Attribute
case class Cast(child: Expression, dataType: DataType) extends UnaryExpression
```

[Expressions](./expressions.md)
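Interpreted evaluation via `eval` can be sketched with a toy expression tree (plain Scala stand-ins, not the Catalyst classes): each node evaluates its children against an input row and combines the results.

```scala
// Toy expression tree with interpreted evaluation: every node implements
// eval against an input row, mirroring the Expression.eval shape above.
sealed trait SimpleExpr { def eval(input: Seq[Any]): Any }
case class Lit(value: Any) extends SimpleExpr {
  def eval(input: Seq[Any]): Any = value
}
case class BoundRef(ordinal: Int) extends SimpleExpr {
  def eval(input: Seq[Any]): Any = input(ordinal)
}
case class Eq(left: SimpleExpr, right: SimpleExpr) extends SimpleExpr {
  def eval(input: Seq[Any]): Any = left.eval(input) == right.eval(input)
}

// The predicate `id = 42` evaluated against two rows
val predicate = Eq(BoundRef(0), Lit(42))
val hit  = predicate.eval(Seq(42, "Bob"))
val miss = predicate.eval(Seq(7, "Eve"))
```

Code generation exists to eliminate exactly this kind of recursive virtual dispatch on the hot path, but the interpreted semantics are the reference behavior.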

### Query Planning

Logical and physical query plan representations with transformation and optimization capabilities.

```scala { .api }
abstract class QueryPlan[PlanType <: TreeNode[PlanType]] extends TreeNode[PlanType] {
  def output: Seq[Attribute]
  def outputSet: AttributeSet
  def references: AttributeSet
  def inputSet: AttributeSet
  def transformExpressions(rule: PartialFunction[Expression, Expression]): this.type
}

abstract class LogicalPlan extends QueryPlan[LogicalPlan] {
  def analyzed: Boolean
  def resolveOperators(rule: PartialFunction[LogicalPlan, LogicalPlan]): LogicalPlan
  def resolveExpressions(r: PartialFunction[Expression, Expression]): LogicalPlan
}
```

[Query Planning](./query-planning.md)
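How `output` flows through operators (filters preserve their child's schema, projections narrow it) can be sketched with toy plan nodes (hypothetical names, not the real classes):

```scala
// Toy plan nodes: output is computed structurally from each operator's
// children, the way QueryPlan derives its output attributes.
sealed trait SimplePlan { def output: Seq[String] }
case class Relation(cols: Seq[String]) extends SimplePlan {
  def output: Seq[String] = cols
}
case class Filter(cond: String, child: SimplePlan) extends SimplePlan {
  def output: Seq[String] = child.output // filtering keeps the schema
}
case class Project(cols: Seq[String], child: SimplePlan) extends SimplePlan {
  def output: Seq[String] = cols         // projection narrows it
}

val base = Relation(Seq("id", "name"))
val filtered = Filter("id = 42", base)
val plan = Project(Seq("name"), filtered)
```

Optimizations like column pruning exploit this structure: since `Project` determines the plan's output, columns unused above it can be dropped below it.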

### Analysis Framework

Semantic analysis system for resolving unresolved references, type checking, and plan validation.

```scala { .api }
class Analyzer(catalog: Catalog, registry: FunctionRegistry, conf: CatalystConf) extends RuleExecutor[LogicalPlan] {
  def execute(plan: LogicalPlan): LogicalPlan
}

trait Catalog {
  def lookupRelation(name: Seq[String]): LogicalPlan
  def functionExists(name: String): Boolean
  def lookupFunction(name: String, children: Seq[Expression]): Expression
}

trait FunctionRegistry {
  def registerFunction(name: String, info: ExpressionInfo, builder: Seq[Expression] => Expression): Unit
  def lookupFunction(name: String, children: Seq[Expression]): Expression
  def functionExists(name: String): Boolean
}
```

[Analysis](./analysis.md)
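The `FunctionRegistry` contract amounts to a map from function names to expression builders. A minimal sketch with toy types (integer functions stand in for `Expression` builders, and case-insensitive lookup is an assumption of this sketch):

```scala
// Toy registry: builders keyed by lower-cased name, so lookups behave
// case-insensitively.
class SimpleRegistry {
  private var builders = Map.empty[String, Seq[Int] => Int]
  def registerFunction(name: String, builder: Seq[Int] => Int): Unit =
    builders += (name.toLowerCase -> builder)
  def functionExists(name: String): Boolean =
    builders.contains(name.toLowerCase)
  def lookupFunction(name: String, args: Seq[Int]): Int =
    builders.getOrElse(name.toLowerCase,
      sys.error(s"Undefined function: $name"))(args)
}

val registry = new SimpleRegistry
registry.registerFunction("add", args => args.sum)
```

During analysis, an unresolved function call is replaced by whatever expression the matching builder constructs from the call's child expressions.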

### Optimization Framework

Rule-based optimization system with built-in optimizations for query plan improvement.

```scala { .api }
object Optimizer extends RuleExecutor[LogicalPlan] {
  // Key optimization rules available as objects
  object ConstantFolding extends Rule[LogicalPlan]
  object BooleanSimplification extends Rule[LogicalPlan]
  object ColumnPruning extends Rule[LogicalPlan]
  object FilterPushdown extends Rule[LogicalPlan]
  object ProjectCollapsing extends Rule[LogicalPlan]
}

abstract class Rule[TreeType <: TreeNode[TreeType]] {
  def apply(plan: TreeType): TreeType
}

abstract class RuleExecutor[TreeType <: TreeNode[TreeType]] {
  def execute(plan: TreeType): TreeType
  def batches: Seq[Batch]
}
```

[Optimization](./optimization.md)
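The driver loop behind `RuleExecutor` applies its rules repeatedly until the plan stops changing. A toy fixed-point sketch over a miniature expression type (not the real framework), using constant folding as the example rule:

```scala
// Toy fixed-point optimizer: keep applying rules until the plan stops
// changing, in the spirit of RuleExecutor.execute.
sealed trait E
case class Num(v: Int) extends E
case class Plus(l: E, r: E) extends E

// Bottom-up constant folding: collapse Plus over two literals.
def fold(e: E): E = e match {
  case Plus(l, r) =>
    (fold(l), fold(r)) match {
      case (Num(a), Num(b)) => Num(a + b)
      case (fl, fr)         => Plus(fl, fr)
    }
  case other => other
}

def execute(rules: Seq[E => E])(plan: E): E = {
  var current = plan
  var changed = true
  while (changed) {
    val next = rules.foldLeft(current)((p, rule) => rule(p))
    changed = next != current
    current = next
  }
  current
}

val optimized = execute(Seq(fold))(Plus(Num(1), Plus(Num(2), Num(3))))
```

Running to a fixed point matters because one rule's rewrite can expose opportunities for another; the real executor additionally groups rules into batches with per-batch iteration strategies.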

## Types

```scala { .api }
// Complex data types
case class ArrayType(elementType: DataType, containsNull: Boolean) extends DataType
case class MapType(keyType: DataType, valueType: DataType, valueContainsNull: Boolean) extends DataType
case class StructType(fields: Array[StructField]) extends DataType
case class StructField(name: String, dataType: DataType, nullable: Boolean, metadata: Metadata)

// Decimal types
case class DecimalType(precision: Int, scale: Int) extends DataType
case class Decimal(value: java.math.BigDecimal) extends Ordered[Decimal]

// Analysis types
type Resolver = (String, String) => Boolean
case class TypeCheckResult(isSuccess: Boolean, errorMessage: Option[String])

// Configuration
trait CatalystConf {
  def caseSensitiveAnalysis: Boolean
  def orderByOrdinal: Boolean
  def groupByOrdinal: Boolean
}

// Table identification
case class TableIdentifier(table: String, database: Option[String])

// Exceptions
class AnalysisException(message: String, line: Option[Int], startPosition: Option[Int]) extends Exception
```