Interactive Scala shell for Apache Spark with distributed computing capabilities
Apache Spark REPL is an interactive Scala shell that provides a command-line interface for Apache Spark. It allows users to interactively execute Spark code, explore data, run SQL queries, and perform distributed computing operations in real time. The REPL extends the standard Scala interpreter with Spark-specific functionality, automatically creating a SparkSession and SparkContext, and providing seamless access to Spark's core APIs including RDDs, DataFrames, and Datasets.
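For instance, once spark-shell is running, the pre-created spark (SparkSession) and sc (SparkContext) handles are immediately usable, and spark.implicits._ is auto-imported so the $ column interpolator works out of the box (a representative session, output abbreviated):

scala> val df = spark.range(1, 1000).toDF("id")
df: org.apache.spark.sql.DataFrame = [id: bigint]

scala> df.filter($"id" % 2 === 0).count()
res0: Long = 499

scala> sc.parallelize(1 to 100).map(_ * 2).sum()
res1: Double = 10100.0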
To use the Spark REPL as a library, add the Maven dependency:

<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-repl_2.11</artifactId>
  <version>2.4.8</version>
</dependency>

Then import what you need:

import org.apache.spark.repl._

For the main entry point:

import org.apache.spark.repl.Main

For the interactive loop:

import org.apache.spark.repl.SparkILoop

For custom class loading:

import org.apache.spark.repl.ExecutorClassLoader

# Start the Spark REPL
spark-shell

# Or via the main class
scala -cp <spark-classpath> org.apache.spark.repl.Main

To embed or drive the REPL programmatically:

import org.apache.spark.repl.SparkILoop
import scala.tools.nsc.Settings
// Execute code in REPL
val code = """
val data = sc.parallelize(1 to 10)
data.sum()
"""
val result = SparkILoop.run(code)
// Create custom REPL instance
val settings = new Settings
val repl = new SparkILoop()
repl.process(settings)

Apache Spark REPL is built around several key components:
- Main provides the application entry point and SparkSession/SparkContext creation
- SparkILoop extends Scala's standard REPL with Spark-specific initialization and commands
- ExecutorClassLoader enables loading of REPL-compiled classes on remote executors

Each component is described below, with a short usage sketch after each signature listing.

Main is the application entry point and manages the SparkSession/SparkContext lifecycle for the interactive shell.
object Main extends Logging {
var sparkContext: SparkContext
var sparkSession: SparkSession
var interp: SparkILoop
val conf: SparkConf
def main(args: Array[String]): Unit
def createSparkSession(): SparkSession
}
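As a sketch (not a public API contract), the members listed above can be inspected from inside a running spark-shell:

// Sketch: inspecting the shell's own entry-point state from within spark-shell.
// Main.sparkSession and Main.conf are the members listed in the signature above.
import org.apache.spark.repl.Main

val activeSession = Main.sparkSession        // the SparkSession created at startup
val master = Main.conf.get("spark.master")   // configuration assembled by Main
println(s"Shell running against master: $master")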
SparkILoop provides the core interactive shell functionality, with Spark-specific initialization, commands, and REPL processing.

class SparkILoop(in0: Option[BufferedReader], out: JPrintWriter) extends ILoop {
def this(in0: BufferedReader, out: JPrintWriter)
def this()
def initializeSpark(): Unit
def process(settings: Settings): Boolean
override def createInterpreter(): Unit
override def printWelcome(): Unit
override def commands: List[LoopCommand]
override def resetCommand(line: String): Unit
override def replay(): Unit
}
object SparkILoop {
def run(code: String, sets: Settings = new Settings): String
def run(lines: List[String]): String
}
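A minimal sketch of driving the loop programmatically via SparkILoop.run, assuming a local Spark build on the classpath (usejavacp is a standard scalac setting, not a Spark one):

import org.apache.spark.repl.SparkILoop
import scala.tools.nsc.Settings

val settings = new Settings
settings.usejavacp.value = true  // let the interpreter reuse the JVM classpath

// Feed a batch of lines through a throwaway REPL and capture its console output.
val output = SparkILoop.run(
  """
    |val rdd = sc.parallelize(Seq("a", "b", "a"))
    |rdd.countByValue()
  """.stripMargin, settings)

println(output)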
ExecutorClassLoader is a custom class loader for loading REPL-compiled classes on remote Spark executors, with support for RPC and Hadoop filesystem access.

class ExecutorClassLoader(
conf: SparkConf,
env: SparkEnv,
classUri: String,
parent: ClassLoader,
userClassPathFirst: Boolean
) extends ClassLoader with Logging {
override def findClass(name: String): Class[_]
def findClassLocally(name: String): Option[Class[_]]
def readAndTransformClass(name: String, in: InputStream): Array[Byte]
def urlEncode(str: String): String
override def getResource(name: String): URL
override def getResources(name: String): java.util.Enumeration[URL]
override def getResourceAsStream(name: String): InputStream
}
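This constructor is normally invoked by Spark itself on each executor when the driver serves REPL classes via spark.repl.class.uri; the following is only an illustrative sketch of that wiring, using the parameters listed above (the REPL-generated class name is hypothetical):

import org.apache.spark.SparkEnv

// Illustrative only: Spark performs this wiring internally on each executor.
val env = SparkEnv.get
val replClassLoader = new ExecutorClassLoader(
  env.conf,                                      // SparkConf
  env,                                           // SparkEnv, used for RPC-based class fetching
  env.conf.get("spark.repl.class.uri"),          // where the driver serves REPL-compiled classes
  Thread.currentThread().getContextClassLoader,  // parent loader
  false)                                         // userClassPathFirst

// Classes compiled line-by-line on the driver can now be resolved remotely
// ("$line3.$read" is a hypothetical REPL-generated class name):
val cls = replClassLoader.loadClass("$line3.$read")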
Signaling provides signal handling utilities for interactive job cancellation and REPL interrupt management.

object Signaling extends Logging {
def cancelOnInterrupt(): Unit
}
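A sketch of how a custom shell could opt into the same Ctrl+C behavior spark-shell uses:

import org.apache.spark.repl.Signaling

// After this call, the first Ctrl+C cancels running Spark jobs
// instead of terminating the shell process.
Signaling.cancelOnInterrupt()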
SparkILoopInterpreter and SparkExprTyper are specialized interpreter and expression-typing components carrying Scala 2.11 compatibility fixes.

class SparkILoopInterpreter(settings: Settings, out: JPrintWriter) extends IMain {
def symbolOfLine(code: String): global.Symbol
def typeOfExpression(expr: String, silent: Boolean): global.Type
def importsCode(wanted: Set[Name], wrapper: Request#Wrapper,
definesClass: Boolean, generousImports: Boolean): ComputedImports
}
trait SparkExprTyper extends ExprTyper {
def doInterpret(code: String): IR.Result
def symbolOfLine(code: String): Symbol
}
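These are internal APIs, but as a sketch, an instantiated interpreter can be queried for the static type of an expression (initializeSynchronous comes from the underlying IMain; classpath reuse via usejavacp is an assumption for local experimentation):

import scala.tools.nsc.Settings
import scala.tools.nsc.interpreter.JPrintWriter

val settings = new Settings
settings.usejavacp.value = true
val out = new JPrintWriter(Console.out, true)

val intp = new SparkILoopInterpreter(settings, out)
intp.initializeSynchronous()  // force compiler setup before first use

val tpe = intp.typeOfExpression("1 to 10", silent = true)
println(tpe)  // e.g. scala.collection.immutable.Range.Inclusive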
// From Spark Core and Spark SQL
class SparkConf
class SparkContext
class SparkSession
class SparkEnv
// From Scala
type Settings = scala.tools.nsc.Settings
type BufferedReader = java.io.BufferedReader
type JPrintWriter = scala.tools.nsc.interpreter.JPrintWriter

// REPL interpreter result types
object IR {
sealed abstract class Result
case object Success extends Result
case object Error extends Result
case object Incomplete extends Result
}
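A sketch of reacting to these results when feeding lines to an interpreter (intp as constructed in the earlier interpreter sketch; interpret is inherited from IMain):

import scala.tools.nsc.interpreter.IR

intp.interpret("val xs = List(1, 2, 3)") match {
  case IR.Success    => println("line compiled and ran")
  case IR.Incomplete => println("statement needs more input")
  case IR.Error      => println("compilation or runtime error")
}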
// Class loading types
type ClassLoader = java.lang.ClassLoader
trait Logging  // Spark's internal logging mix-in