
tessl/maven-org-apache-spark--spark-repl-2-11

Interactive Scala shell for Apache Spark with distributed computing capabilities


Apache Spark REPL

Apache Spark REPL is an interactive Scala shell that provides a command-line interface for Apache Spark. It allows users to interactively execute Spark code, explore data, run SQL queries, and perform distributed computing operations in real time. The REPL extends the standard Scala interpreter with Spark-specific functionality, automatically creating a SparkSession and SparkContext, and providing seamless access to Spark's core APIs including RDDs, DataFrames, and Datasets.

Package Information

  • Package Name: spark-repl_2.11
  • Package Type: maven
  • Language: Scala
  • Installation (Maven):

    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-repl_2.11</artifactId>
      <version>2.4.8</version>
    </dependency>

Core Imports

import org.apache.spark.repl._

For main entry point:

import org.apache.spark.repl.Main

For interactive loop:

import org.apache.spark.repl.SparkILoop

For custom class loading:

import org.apache.spark.repl.ExecutorClassLoader

Basic Usage

Command Line Usage

# Start Spark REPL
spark-shell

# Or via main class
scala -cp <spark-classpath> org.apache.spark.repl.Main
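
Once the shell starts, it pre-binds `spark` (a SparkSession) and `sc` (a SparkContext), so Spark APIs are available immediately. A short illustrative session (entered at the `scala>` prompt; job output depends on your environment):

```scala
// `sc` and `spark` are created automatically by the REPL
val nums = sc.parallelize(1 to 100)   // distribute a local range as an RDD
val evens = nums.filter(_ % 2 == 0)   // lazy transformation
evens.count()                         // action: triggers a job, returns 50

// DataFrame/SQL access through the pre-created SparkSession
val df = spark.range(10).toDF("n")    // rows 0 through 9
df.filter("n > 5").count()            // returns 4
```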

Programmatic Usage

import org.apache.spark.repl.SparkILoop
import scala.tools.nsc.Settings

// Run a snippet in a throwaway REPL; the captured interpreter
// output is returned as a String
val code = """
val data = sc.parallelize(1 to 10)
data.sum()
"""
val result = SparkILoop.run(code)

// Or drive a custom REPL instance directly
val settings = new Settings
settings.usejavacp.value = true  // reuse the JVM classpath when embedding
val repl = new SparkILoop()
repl.process(settings)

Architecture

Apache Spark REPL is built around several key components:

  • Main Entry Point: The Main object provides application entry point and SparkSession/SparkContext creation
  • Interactive Shell: SparkILoop extends Scala's standard REPL with Spark-specific initialization and commands
  • Distributed Class Loading: ExecutorClassLoader enables loading of REPL-compiled classes on remote executors
  • Signal Handling: Integration with Spark's job cancellation system for interactive interruption
  • Scala Version Support: Special handling for Scala 2.11 compatibility issues with imports and type inference

Capabilities

REPL Entry Point and Session Management

Main application entry point and SparkSession/SparkContext lifecycle management for the interactive shell.

object Main extends Logging {
  var sparkContext: SparkContext
  var sparkSession: SparkSession  
  var interp: SparkILoop
  val conf: SparkConf
  
  def main(args: Array[String]): Unit
  def createSparkSession(): SparkSession
}
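
An embedding application can reuse the entry object's session factory instead of spinning up the full shell. A hedged sketch, assuming the Spark jars are on the classpath and a master URL is configured (e.g. via `-Dspark.master=local[*]`); `createSparkSession` reads its settings from `Main.conf`:

```scala
import org.apache.spark.repl.Main

// Create the same session the interactive shell would use
val spark = Main.createSparkSession()
val sc = spark.sparkContext

// The session behaves like any other SparkSession
println(sc.parallelize(1 to 5).sum())
```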


Interactive Shell Loop

Core interactive shell functionality with Spark-specific initialization, commands, and REPL processing.

class SparkILoop(in0: Option[BufferedReader], out: JPrintWriter) extends ILoop {
  def this(in0: BufferedReader, out: JPrintWriter)
  def this()
  
  def initializeSpark(): Unit
  def process(settings: Settings): Boolean
  override def createInterpreter(): Unit
  override def printWelcome(): Unit
  override def commands: List[LoopCommand]
  override def resetCommand(line: String): Unit
  override def replay(): Unit
}

object SparkILoop {
  def run(code: String, sets: Settings = new Settings): String  
  def run(lines: List[String]): String
}
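
The companion object's `run` helpers start a throwaway REPL, feed it the given code, and return the captured interpreter transcript as a single string, which is convenient for testing. A sketch (the exact transcript format depends on the Scala version):

```scala
import org.apache.spark.repl.SparkILoop

// Each list element is interpreted as one REPL input line
val transcript = SparkILoop.run(List(
  "val xs = sc.parallelize(Seq(1, 2, 3))",
  "xs.reduce(_ + _)"))

// `transcript` contains the echoed inputs and results, e.g. a line like "res0: Int = 6"
println(transcript)
```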


Distributed Class Loading

Custom class loader system for loading REPL-compiled classes on remote Spark executors with support for RPC and Hadoop filesystem access.

class ExecutorClassLoader(
  conf: SparkConf,
  env: SparkEnv, 
  classUri: String,
  parent: ClassLoader,
  userClassPathFirst: Boolean
) extends ClassLoader with Logging {
  
  override def findClass(name: String): Class[_]
  def findClassLocally(name: String): Option[Class[_]]
  def readAndTransformClass(name: String, in: InputStream): Array[Byte]
  def urlEncode(str: String): String
  override def getResource(name: String): URL
  override def getResources(name: String): java.util.Enumeration[URL]
  override def getResourceAsStream(name: String): InputStream
}
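
On the driver, the REPL compiles each input line to class files and serves them at a URI; each executor installs an `ExecutorClassLoader` pointed at that URI so tasks can resolve shell-defined classes. A hedged sketch of the executor-side wiring using the constructor above; the `spark.repl.class.uri` key and the loaded class name are illustrative:

```scala
// Illustrative only: mirrors what Spark's executor startup does internally
val classUri = conf.get("spark.repl.class.uri")  // where the driver serves REPL classes
val loader = new ExecutorClassLoader(
  conf,                        // SparkConf
  SparkEnv.get,                // executor-side environment (RPC endpoints, etc.)
  classUri,                    // URI of compiled REPL classes (RPC, HTTP, or HDFS)
  getClass.getClassLoader,     // parent: the executor's normal classpath
  userClassPathFirst = false)  // let Spark's own classes win on conflict

// Task deserialization can now resolve classes compiled in the shell
val cls = loader.loadClass("$line3.$read")  // hypothetical REPL-generated name
```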


Signal Handling

Signal handling utilities for interactive job cancellation and REPL interrupt management.

object Signaling extends Logging {
  def cancelOnInterrupt(): Unit
}


Scala 2.11 Compatibility Components

Specialized interpreter and expression typing components for Scala 2.11 compatibility fixes.

class SparkILoopInterpreter(settings: Settings, out: JPrintWriter) extends IMain {
  def symbolOfLine(code: String): global.Symbol
  def typeOfExpression(expr: String, silent: Boolean): global.Type
  def importsCode(wanted: Set[Name], wrapper: Request#Wrapper, 
                  definesClass: Boolean, generousImports: Boolean): ComputedImports
}

trait SparkExprTyper extends ExprTyper {
  def doInterpret(code: String): IR.Result
  def symbolOfLine(code: String): Symbol
}


Types

Core Configuration Types

// From Spark Core / Spark SQL
class SparkConf
class SparkContext
class SparkSession   // org.apache.spark.sql.SparkSession
class SparkEnv

// From the Scala toolchain (existing classes/aliases, not REPL-defined)
type Settings = scala.tools.nsc.Settings
type BufferedReader = java.io.BufferedReader
type JPrintWriter = scala.tools.nsc.interpreter.JPrintWriter  // alias for java.io.PrintWriter

REPL-Specific Types

// REPL interpreter result types (scala.tools.nsc.interpreter.IR)
object IR {
  sealed abstract class Result
  case object Success extends Result
  case object Error extends Result
  case object Incomplete extends Result
}
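
Code driving the interpreter directly (for example via `IMain.interpret`) typically dispatches on these results; `Incomplete` signals that the input parses as the start of a larger statement. A minimal sketch against the real `scala.tools.nsc.interpreter.IR` alias, where each result is a case object:

```scala
import scala.tools.nsc.interpreter.IR

// Illustrative dispatcher for interpreter results
def describe(result: IR.Result): String = result match {
  case IR.Success    => "statement evaluated"
  case IR.Error      => "failed to compile or threw an exception"
  case IR.Incomplete => "input needs more lines"
}
```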

// Class loading builds directly on java.lang.ClassLoader (see ExecutorClassLoader above)
trait Logging  // Spark-internal logging mixin