Interactive Scala shell for Apache Spark with distributed computing capabilities
npx @tessl/cli install tessl/maven-org-apache-spark--spark-repl-2-11@2.4.0

Apache Spark REPL is an interactive Scala shell that provides a command-line interface for Apache Spark. It lets users interactively execute Spark code, explore data, run SQL queries, and perform distributed computing operations in real time. The REPL extends the standard Scala interpreter with Spark-specific functionality, automatically creating a SparkSession and SparkContext and providing seamless access to Spark's core APIs, including RDDs, DataFrames, and Datasets.
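Once the shell starts, the pre-bound spark (SparkSession) and sc (SparkContext) values are available immediately. A minimal illustrative session might look like the following; the data, view name, and query are examples only:

// Inside spark-shell: spark and sc are created for you
val rdd = sc.parallelize(1 to 100)
rdd.filter(_ % 2 == 0).count()                    // distributed RDD operation

val df = spark.range(10).toDF("id")               // DataFrame API
df.createOrReplaceTempView("numbers")
spark.sql("SELECT count(*) FROM numbers").show()  // SQL query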
Maven dependency:

<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-repl_2.11</artifactId>
  <version>2.4.8</version>
</dependency>

Package import:

import org.apache.spark.repl._

For the main entry point:
import org.apache.spark.repl.Main

For the interactive loop:
import org.apache.spark.repl.SparkILoop

For custom class loading:
import org.apache.spark.repl.ExecutorClassLoader

# Start Spark REPL
spark-shell
# Or via main class
scala -cp <spark-classpath> org.apache.spark.repl.Main

import org.apache.spark.repl.SparkILoop
import scala.tools.nsc.Settings
// Execute code in REPL
val code = """
val data = sc.parallelize(1 to 10)
data.sum()
"""
val result = SparkILoop.run(code)
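The companion object also accepts the script as a list of separate lines (see the SparkILoop object signature later in this document). A minimal sketch of the equivalent call:

// Same script, passed line by line instead of as one string
val lineResult = SparkILoop.run(List(
  "val data = sc.parallelize(1 to 10)",
  "data.sum()"
))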
// Create custom REPL instance
val settings = new Settings
settings.usejavacp.value = true  // let the embedded REPL resolve classes from the host JVM classpath
val repl = new SparkILoop()
repl.process(settings)

Apache Spark REPL is built around several key components:
- Main object provides the application entry point and SparkSession/SparkContext creation
- SparkILoop extends Scala's standard REPL with Spark-specific initialization and commands
- ExecutorClassLoader enables loading of REPL-compiled classes on remote executors

Main application entry point and SparkSession/SparkContext lifecycle management for the interactive shell.
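Normally spark-shell drives this entry point itself. As a hedged sketch of launching it programmatically, assuming spark-repl and its dependencies are on the classpath (the ReplLauncher name, master URL, and app name are illustrative, not defaults of Main):

import org.apache.spark.repl.Main

// Hypothetical launcher: tune the REPL's SparkConf before the shell creates its SparkSession
object ReplLauncher {
  def main(args: Array[String]): Unit = {
    Main.conf.setIfMissing("spark.master", "local[*]")
    Main.conf.setIfMissing("spark.app.name", "Embedded Spark REPL")
    Main.main(args)  // starts the interactive shell
  }
}

The Main object's public surface is summarized below.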
object Main extends Logging {
var sparkContext: SparkContext
var sparkSession: SparkSession
var interp: SparkILoop
val conf: SparkConf
def main(args: Array[String]): Unit
def createSparkSession(): SparkSession
}

Core interactive shell functionality with Spark-specific initialization, commands, and REPL processing.
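A sketch of driving SparkILoop with explicit input and output, roughly what SparkILoop.run does internally; the script content here is only an example:

import java.io.{BufferedReader, StringReader, StringWriter}
import scala.tools.nsc.Settings
import scala.tools.nsc.interpreter.JPrintWriter
import org.apache.spark.repl.SparkILoop

// Feed a fixed script into the loop and capture its transcript
val input  = new BufferedReader(new StringReader("println(1 + 1)"))
val output = new StringWriter()
val repl   = new SparkILoop(Some(input), new JPrintWriter(output, true))

val settings = new Settings
settings.usejavacp.value = true  // resolve classes from the host JVM classpath
repl.process(settings)           // runs the script, then exits at end of input

println(output.toString)         // captured REPL output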
class SparkILoop(in0: Option[BufferedReader], out: JPrintWriter) extends ILoop {
def this(in0: BufferedReader, out: JPrintWriter)
def this()
def initializeSpark(): Unit
def process(settings: Settings): Boolean
override def createInterpreter(): Unit
override def printWelcome(): Unit
override def commands: List[LoopCommand]
override def resetCommand(line: String): Unit
override def replay(): Unit
}
object SparkILoop {
def run(code: String, sets: Settings = new Settings): String
def run(lines: List[String]): String
}

Custom class loader system for loading REPL-compiled classes on remote Spark executors with support for RPC and Hadoop filesystem access.
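Executors normally construct this loader themselves from the class URI published by the driver; the sketch below is conceptual only, and the fallback URI and the class name being loaded are hypothetical:

import org.apache.spark.{SparkConf, SparkEnv}
import org.apache.spark.repl.ExecutorClassLoader

val conf = new SparkConf()
val env  = SparkEnv.get  // SparkEnv of the running executor
val classUri = conf.get("spark.repl.class.uri", "spark://driver-host:port/classes")  // hypothetical fallback

val loader = new ExecutorClassLoader(
  conf, env, classUri,
  Thread.currentThread().getContextClassLoader,
  userClassPathFirst = false)

// REPL-generated classes can now be resolved on the executor side
val cls = loader.loadClass("$line3.$read")  // hypothetical REPL-generated class name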
class ExecutorClassLoader(
conf: SparkConf,
env: SparkEnv,
classUri: String,
parent: ClassLoader,
userClassPathFirst: Boolean
) extends ClassLoader with Logging {
override def findClass(name: String): Class[_]
def findClassLocally(name: String): Option[Class[_]]
def readAndTransformClass(name: String, in: InputStream): Array[Byte]
def urlEncode(str: String): String
override def getResource(name: String): URL
override def getResources(name: String): java.util.Enumeration[URL]
override def getResourceAsStream(name: String): InputStream
}

Signal handling utilities for interactive job cancellation and REPL interrupt management.
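A minimal sketch of its use: the shell calls this once after the SparkContext exists so that Ctrl-C cancels running Spark jobs instead of killing the session.

import org.apache.spark.repl.Signaling

// Install the interrupt handler; subsequent Ctrl-C presses cancel active jobs
Signaling.cancelOnInterrupt()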
object Signaling extends Logging {
def cancelOnInterrupt(): Unit
}

Specialized interpreter and expression typing components for Scala 2.11 compatibility fixes.
class SparkILoopInterpreter(settings: Settings, out: JPrintWriter) extends IMain {
def symbolOfLine(code: String): global.Symbol
def typeOfExpression(expr: String, silent: Boolean): global.Type
def importsCode(wanted: Set[Name], wrapper: Request#Wrapper,
definesClass: Boolean, generousImports: Boolean): ComputedImports
}
trait SparkExprTyper extends ExprTyper {
def doInterpret(code: String): IR.Result
def symbolOfLine(code: String): Symbol
}

// From Spark Core (SparkSession is provided by Spark SQL)
class SparkConf
class SparkContext
class SparkSession
class SparkEnv
// From Scala and Java
type Settings = scala.tools.nsc.Settings
type BufferedReader = java.io.BufferedReader
type JPrintWriter = scala.tools.nsc.interpreter.JPrintWriter

// REPL interpreter result types
object IR {
sealed abstract class Result
case object Success extends Result
case object Error extends Result
case object Incomplete extends Result
}
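These are the standard Scala 2.11 interpreter result values (scala.tools.nsc.interpreter.Results, aliased as IR). A minimal sketch of inspecting them with the plain Scala IMain, which reports results the same way SparkILoop's interpreter does; scala-compiler must be on the classpath:

import scala.tools.nsc.Settings
import scala.tools.nsc.interpreter.{IMain, IR}

val settings = new Settings
settings.usejavacp.value = true
val intp = new IMain(settings)

intp.interpret("val x = 40 + 2") match {
  case IR.Success    => println("line compiled and ran")
  case IR.Error      => println("compile or runtime error")
  case IR.Incomplete => println("needs more input")
}
intp.close()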
// Class loading and logging types
type ClassLoader = java.lang.ClassLoader
trait Logging  // org.apache.spark.internal.Logging