tessl/maven-org-apache-spark--spark-catalyst

Catalyst query optimization framework and expression evaluation engine for Apache Spark SQL

Spark Catalyst

Catalyst is the query optimization framework and expression evaluation engine behind Apache Spark SQL. It represents SQL queries and DataFrame operations as trees of relational operators and expressions, then analyzes and rewrites those trees into efficient plans using rule-based and cost-based optimization.

Package Information

  • Package Name: spark-catalyst_2.11
  • Package Type: maven
  • Language: Scala
  • Version: 2.2.3
  • Maven Coordinates: org.apache.spark:spark-catalyst_2.11:2.2.3

Core Imports

import org.apache.spark.sql.Row
import org.apache.spark.sql.types._
import org.apache.spark.sql.catalyst.expressions._
import org.apache.spark.sql.catalyst.plans.logical._
import org.apache.spark.sql.catalyst.trees._

Basic Usage

import org.apache.spark.sql.Row
import org.apache.spark.sql.types._
import org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute
import org.apache.spark.sql.catalyst.expressions._

// Create a data type
val stringType = StringType
val intType = IntegerType

// Create a row
val row = Row("Alice", 25, true)

// Access row data
val name: String = row.getString(0)
val age: Int = row.getInt(1)
val isActive: Boolean = row.getBoolean(2)

// Create expressions (the attribute stays unresolved until analysis binds it to a plan's output)
val col = UnresolvedAttribute("name")
val literal = Literal("Alice")
val predicate = EqualTo(col, literal)

Architecture

Catalyst is built around several key components:

  • Tree Framework: All query components (expressions, plans) are represented as trees that can be transformed using rules
  • Expression System: Rich set of expressions for computations, predicates, and data access with code generation support
  • Query Plans: Logical and physical plan representations for query execution
  • Analysis Framework: Rule-based system for resolving references and type checking
  • Optimization Engine: Cost-based and rule-based optimizations for query performance
  • Code Generation: Dynamic code generation for high-performance expression evaluation
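As a rough illustration of the tree framework listed above, the sketch below builds a small expression tree and rewrites it bottom-up with a partial function. The folding logic is a toy written for this example; it is not Catalyst's own constant-folding rule.

```scala
import org.apache.spark.sql.catalyst.expressions.{Add, Expression, Literal}

// Build the expression tree for (1 + 2) + 3.
val expr: Expression = Add(Add(Literal(1), Literal(2)), Literal(3))

// Toy rewrite (illustration only): replace an Add of two integer
// literals with a single literal. transformUp visits children first,
// so the inner Add is folded before the outer one is examined.
val folded: Expression = expr.transformUp {
  case Add(Literal(a: Int, _), Literal(b: Int, _)) => Literal(a + b)
}

println(expr.treeString)
println(folded.treeString)
```

Because every node is immutable, `transformUp` returns a new tree and leaves `expr` unchanged.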

Capabilities

Data Types and Structures

Core data type system, including primitive types, complex types (arrays, maps, structs), and the Row interface for data access.

trait Row {
  def apply(i: Int): Any
  def get(i: Int): Any
  def isNullAt(i: Int): Boolean
  def getInt(i: Int): Int
  def getString(i: Int): String
  def getBoolean(i: Int): Boolean
  // ... additional primitive accessors
}

abstract class DataType {
  def typeName: String
  def json: String
  def prettyJson: String
  def simpleString: String
}
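A minimal sketch of how these types fit together, assuming a made-up three-column schema (the column names and values are illustrative only). It shows a struct type with a nested array, and null-safe access on a Row.

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

// Hypothetical schema for illustration: a string, a nullable int, and an array of strings.
val schema = StructType(Seq(
  StructField("name", StringType, nullable = false),
  StructField("age", IntegerType, nullable = true),
  StructField("tags", ArrayType(StringType), nullable = true)
))

// A Row is positional; it does not carry the schema by itself.
val row = Row("Alice", null, Seq("admin", "dev"))

val name: String = row.getString(0)
// Primitive getters cannot return null, so check isNullAt first.
val age: Option[Int] = if (row.isNullAt(1)) None else Some(row.getInt(1))
val tags: Seq[String] = row.getSeq[String](2)

println(schema.simpleString)   // compact, single-line description of the struct
println(IntegerType.typeName)  // name used in the JSON representation
```

Checking `isNullAt` before calling a primitive accessor avoids a runtime failure on null fields.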

Expression System

Comprehensive expression framework for computations, predicates, aggregations, and data transformations with built-in code generation.

abstract class Expression extends TreeNode[Expression] {
  def dataType: DataType
  def nullable: Boolean
  def eval(input: InternalRow): Any
  def genCode(ctx: CodegenContext): ExprCode
  def prettyName: String
}

abstract class UnaryExpression extends Expression {
  def child: Expression
}

abstract class BinaryExpression extends Expression {
  def left: Expression
  def right: Expression
}
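To make `eval` concrete, here is a small sketch that evaluates a predicate in interpreted mode. It assumes a single-column input row; `BoundReference` reads a value by ordinal, so no analysis step is needed.

```scala
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions.{BoundReference, GreaterThanOrEqual, Literal}
import org.apache.spark.sql.types.IntegerType

// Read column 0 of the input row as a nullable integer.
val age = BoundReference(0, IntegerType, nullable = true)

// Predicate: age >= 18
val isAdult = GreaterThanOrEqual(age, Literal(18))

// eval returns Any: a Boolean here, or null when an input is null.
val result = isAdult.eval(InternalRow(25))
val missing = isAdult.eval(InternalRow(null))

println(s"result=$result missing=$missing")
```

The null result follows SQL three-valued logic: comparing against a null input yields null rather than false.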

Query Plans

Query plan representations with tree-based transformations and optimization support. LogicalPlan is defined in this module; QueryPlan is the base class that physical plans in spark-sql also extend.

abstract class LogicalPlan extends QueryPlan[LogicalPlan] {
  def output: Seq[Attribute]
  def references: AttributeSet
  def inputSet: AttributeSet
  def resolved: Boolean
}

abstract class QueryPlan[PlanType <: QueryPlan[PlanType]] extends TreeNode[PlanType] {
  def output: Seq[Attribute]
  def outputSet: AttributeSet
  def references: AttributeSet
}
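The following sketch assembles a small logical plan by hand, roughly equivalent to `SELECT name FROM <relation> WHERE age > 18`. The relation and its two columns are invented for the example.

```scala
import org.apache.spark.sql.catalyst.expressions.{AttributeReference, GreaterThan, Literal}
import org.apache.spark.sql.catalyst.plans.logical.{Filter, LocalRelation, LogicalPlan, Project}
import org.apache.spark.sql.types.{IntegerType, StringType}

// Hypothetical columns; the empty second parameter list lets Catalyst assign a fresh expression ID.
val name = AttributeReference("name", StringType)()
val age = AttributeReference("age", IntegerType)()

// Leaf node: an in-memory relation with no rows, only a schema.
val relation = LocalRelation(name, age)

// Filter and Project wrap their child, forming a tree.
val adults = Filter(GreaterThan(age, Literal(18)), relation)
val plan: LogicalPlan = Project(Seq(name), adults)

println(plan.treeString)
println(plan.output.map(_.name))
```

Since the plan is built from already-resolved attributes, `plan.resolved` holds without running the analyzer.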

Analysis Framework

Rule-based analysis system for resolving unresolved references, type checking, and semantic validation.

class Analyzer(catalog: SessionCatalog, conf: SQLConf, maxIterations: Int) 
  extends RuleExecutor[LogicalPlan] {
  def execute(plan: LogicalPlan): LogicalPlan
}

abstract class Rule[TreeType <: TreeNode[_]] {
  val ruleName: String
  def apply(plan: TreeType): TreeType
}
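As a sketch of what analysis does, the example below resolves an unresolved column reference against a relation's output. It uses `SimpleAnalyzer`, a ready-made Analyzer instance backed by an empty in-memory catalog; the relation and column name are made up for illustration.

```scala
import org.apache.spark.sql.catalyst.analysis.{SimpleAnalyzer, UnresolvedAttribute}
import org.apache.spark.sql.catalyst.expressions.AttributeReference
import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, Project}
import org.apache.spark.sql.types.StringType

// Hypothetical single-column relation.
val relation = LocalRelation(AttributeReference("name", StringType)())

// The projection refers to "name" only by its string name, so it is unresolved.
val unresolved = Project(Seq(UnresolvedAttribute("name")), relation)

// Running the analyzer's rule batches binds the name to the relation's attribute.
val analyzed = SimpleAnalyzer.execute(unresolved)

println(s"before: ${unresolved.resolved}, after: ${analyzed.resolved}")
```

`SimpleAnalyzer` matches names case-sensitively, so the reference must use the same casing as the relation's column.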

Optimization

Query optimization engine with rule-based and cost-based optimization techniques.

abstract class Optimizer extends RuleExecutor[LogicalPlan] {
  def batches: Seq[Batch]
}

// Batch and Strategy are members of RuleExecutor[TreeType]; shown here specialized to LogicalPlan
case class Batch(name: String, strategy: Strategy, rules: Rule[LogicalPlan]*)

abstract class Strategy {
  def maxIterations: Int
}
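To show how rules, batches, and strategies combine, here is a minimal sketch of a custom rule executor. `RemoveTrueFilter` and `TinyOptimizer` are hypothetical names invented for this example; they are not part of Spark.

```scala
import org.apache.spark.sql.catalyst.expressions.{AttributeReference, Literal}
import org.apache.spark.sql.catalyst.plans.logical.{Filter, LocalRelation, LogicalPlan}
import org.apache.spark.sql.catalyst.rules.{Rule, RuleExecutor}
import org.apache.spark.sql.types.{BooleanType, IntegerType}

// Hypothetical rule: drop filters whose condition is the literal `true`.
object RemoveTrueFilter extends Rule[LogicalPlan] {
  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    case Filter(Literal(true, BooleanType), child) => child
  }
}

// Hypothetical executor with one batch that reruns until the plan
// stops changing, or until 10 iterations have been made.
object TinyOptimizer extends RuleExecutor[LogicalPlan] {
  val batches = Seq(Batch("Prune filters", FixedPoint(10), RemoveTrueFilter))
}

val relation = LocalRelation(AttributeReference("id", IntegerType)())
val nested = Filter(Literal(true), Filter(Literal(true), relation))

val optimized = TinyOptimizer.execute(nested)
println(optimized.treeString)
```

A single top-down pass removes only the outer filter here, because the replacement node is not revisited within the same pass; the `FixedPoint` strategy runs the batch again to remove the inner one. `Once` is the alternative strategy for rules that should run a single time.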

SQL Parsing

SQL parsing interfaces and abstract syntax tree representations.

trait ParserInterface {
  def parsePlan(sqlText: String): LogicalPlan
  def parseExpression(sqlText: String): Expression
  def parseDataType(sqlText: String): DataType
}

class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] {
  def visitSingleStatement(ctx: SingleStatementContext): LogicalPlan
  def visitSingleExpression(ctx: SingleExpressionContext): Expression
}
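A short sketch of the parser in use, assuming the `CatalystSqlParser` singleton as the entry point. The table and column names in the SQL text are placeholders; parsing does not check that they exist.

```scala
import org.apache.spark.sql.catalyst.parser.CatalystSqlParser

// Parse a standalone expression into an (unresolved) expression tree.
val expr = CatalystSqlParser.parseExpression("age >= 18 AND name = 'Alice'")

// Parse a full statement into an unresolved logical plan.
// "people" is a placeholder table name; no catalog lookup happens here.
val plan = CatalystSqlParser.parsePlan("SELECT name FROM people WHERE age >= 18")

// Parse a data type written in DDL-style syntax.
val dataType = CatalystSqlParser.parseDataType("array<struct<id:int,label:string>>")

println(expr.sql)
println(plan.treeString)
println(dataType.simpleString)
```

The parser only builds the tree. Binding `people`, `name`, and `age` to real relations and columns is the analyzer's job.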

Code Generation

Framework for generating efficient Java code for expression evaluation and query execution.

class CodegenContext {
  def freshName(name: String): String
  def addReferenceObj(objName: String, obj: Any, className: String = null): String
  def addMutableState(javaType: String, variableName: String, initCode: String): Unit
}

case class ExprCode(code: String, isNull: String, value: String)

abstract class CodeGenerator[InType <: AnyRef, OutType <: AnyRef] {
  def generate(expressions: InType): OutType
}
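The sketch below exercises code generation indirectly through `UnsafeProjection.create`, which compiles a list of expressions into a generated Java class at runtime. The expression and input value are arbitrary choices for illustration.

```scala
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions.{Add, BoundReference, Literal, UnsafeProjection}
import org.apache.spark.sql.catalyst.expressions.codegen.CodegenContext
import org.apache.spark.sql.types.IntegerType

// Expression to compile: (column 0) + 1
val plusOne = Add(BoundReference(0, IntegerType, nullable = false), Literal(1))

// Generates and compiles Java source for the projection, then instantiates it.
val projection = UnsafeProjection.create(Seq(plusOne))

// Apply the compiled projection to an input row; the result is an UnsafeRow.
val out = projection(InternalRow(41))
val value = out.getInt(0)

// freshName hands out collision-free variable names for generated code.
val ctx = new CodegenContext
val first = ctx.freshName("value")
val second = ctx.freshName("value")

println(s"value=$value names=$first,$second")
```

The generated projection may reuse its output row between calls, so copy the result (`out.copy()`) before holding on to it across invocations.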

Utilities

Utility classes for date/time operations, string manipulation, and other common operations.

object DateTimeUtils {
  def stringToTimestamp(s: UTF8String): Option[Long]
  def timestampToString(us: Long): String
  def dateToString(days: Int): String
  def stringToDate(s: UTF8String): Option[Int]
}

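// UTF8String is a Java class in org.apache.spark.unsafe.types; its static factories are shown as an object
object UTF8String {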
  def fromString(str: String): UTF8String
  def fromBytes(bytes: Array[Byte]): UTF8String
}
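A brief sketch of the date helpers, assuming an arbitrary sample date. Catalyst stores dates internally as the number of days since the Unix epoch, and strings as `UTF8String`.

```scala
import org.apache.spark.sql.catalyst.util.DateTimeUtils
import org.apache.spark.unsafe.types.UTF8String

// Parse a date string into days since 1970-01-01; None if it cannot be parsed.
val days: Option[Int] = DateTimeUtils.stringToDate(UTF8String.fromString("2017-01-01"))

// Convert the internal day count back to its string form.
val text: Option[String] = days.map(d => DateTimeUtils.dateToString(d))

// Malformed input yields None rather than throwing.
val bad: Option[Int] = DateTimeUtils.stringToDate(UTF8String.fromString("not-a-date"))

println(s"days=$days text=$text bad=$bad")
```

Returning `Option` lets callers map unparseable input to SQL null instead of handling an exception.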

Install with Tessl CLI

npx tessl i tessl/maven-org-apache-spark--spark-catalyst