Interactive Scala shell for Apache Spark with distributed computing capabilities
```
npx @tessl/cli install tessl/maven-org-apache-spark--spark-repl-2-10@1.6.0
```

The Apache Spark REPL (Read-Eval-Print Loop) provides an interactive Scala shell specifically designed for Apache Spark. It enables users to interactively execute Spark operations, explore datasets, and prototype distributed computing solutions in real time, with automatic SparkContext initialization and seamless integration with Spark's core APIs.
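For example, once the shell is running, the automatically initialized SparkContext is bound to the `sc` variable, so distributed operations work immediately (a minimal session sketch):

```scala
// Inside a running Spark REPL session; `sc` is the auto-created SparkContext.
val rdd = sc.parallelize(1 to 100)  // distribute a local range across the cluster
val total = rdd.reduce(_ + _)       // distributed sum: 5050
println(s"sum = $total")
```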
For Maven, add to your `pom.xml`:

```xml
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-repl_2.10</artifactId>
  <version>1.6.3</version>
</dependency>
```

For SBT:
```scala
libraryDependencies += "org.apache.spark" %% "spark-repl" % "1.6.3"
```

Core imports:

```scala
import org.apache.spark.repl.Main
import org.apache.spark.repl.SparkILoop
import org.apache.spark.repl.SparkIMain
```

For advanced usage:

```scala
import org.apache.spark.repl.{SparkCommandLine, ExecutorClassLoader}
import org.apache.spark.repl.SparkJLineCompletion
```

To start the interactive REPL programmatically and access the global interpreter:

```scala
import org.apache.spark.repl.Main

// Start interactive REPL
Main.main(Array.empty)

// Access current interpreter
val interpreter = Main.interp
```
To embed the interpreter directly, use `SparkIMain`:

```scala
import org.apache.spark.repl.SparkIMain
import scala.tools.nsc.interpreter.Results

// Create interpreter
val interpreter = new SparkIMain()
interpreter.initializeSynchronous()

// Execute Scala code
val result = interpreter.interpret("val x = 42")
result match {
  case Results.Success    => println("Code executed successfully")
  case Results.Error      => println("Execution failed")
  case Results.Incomplete => println("Code incomplete")
}

// Bind values
interpreter.bind("myValue", "String", "Hello World")

// Add imports
interpreter.addImports("scala.collection.mutable._")
```

To build a custom REPL with `SparkILoop`:

```scala
import org.apache.spark.repl.SparkILoop
import java.io.{BufferedReader, InputStreamReader, PrintWriter}
// Create custom REPL
val in = new BufferedReader(new InputStreamReader(System.in))
val out = new PrintWriter(System.out, true)
val repl = new SparkILoop(in, out)
// Process with arguments
repl.process(Array("-i", "init.scala"))
```

The Spark REPL is built around several key components:

- The `Main` object provides the primary application entry point and manages the global interpreter instance.
- `SparkILoop` handles user interaction, command processing, and session management.
- `SparkIMain` performs Scala code compilation and execution with Spark integration.
- `ExecutorClassLoader` enables loading of REPL-defined classes across Spark clusters.
- `SparkCommandLine` handles Spark-specific command line options and settings.
- `SparkJLineCompletion` provides intelligent tab completion for Scala code.
`SparkILoop` provides the core REPL loop for interactive Scala development with Spark integration, covering command processing, prompt customization, and session management.

```scala
class SparkILoop(
  in0: Option[BufferedReader],
  out: JPrintWriter,
  master: Option[String]
)

def process(args: Array[String]): Boolean
def setPrompt(prompt: String): Unit
def prompt: String
def commands: List[LoopCommand]
```
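A minimal sketch of driving the loop with a custom prompt, assuming the signatures above (`JPrintWriter` is an alias for `java.io.PrintWriter`; the choice of `Console.out` and empty arguments is illustrative):

```scala
import java.io.PrintWriter
import org.apache.spark.repl.SparkILoop

// None for in0 falls back to standard input; None for master assumes
// Spark resolves the master from its configuration (sketch assumption).
val repl = new SparkILoop(None, new PrintWriter(Console.out, true), None)
repl.setPrompt("spark> ")                             // customize the input prompt
val exitedCleanly: Boolean = repl.process(Array.empty[String])
```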
`SparkIMain` is the Scala code compilation and execution engine with Spark context integration. It handles code parsing, compilation, binding, and result evaluation.

```scala
class SparkIMain(
  initialSettings: Settings,
  out: JPrintWriter,
  propagateExceptions: Boolean = false
)

def interpret(line: String): Results.Result
def bind(name: String, boundType: String, value: Any, modifiers: List[String] = Nil): Results.Result
def addImports(ids: String*): Results.Result
def compileString(code: String): Boolean
```
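For instance, `compileString` can serve as a compile-only validity check, and the `modifiers` parameter of `bind` lets a binding carry a keyword such as `implicit`. A sketch assuming the signatures above (the `Probe` object and `defaultTimeout` name are hypothetical):

```scala
import org.apache.spark.repl.SparkIMain

val intp = new SparkIMain()   // no-arg construction as in the quick-start above
intp.initializeSynchronous()

// Compile without executing; returns true when the snippet compiles.
val ok: Boolean = intp.compileString("object Probe { def twice(i: Int) = i * 2 }")

// Bind a value with a modifier so later interpreted lines see it as implicit.
intp.bind("defaultTimeout", "Int", 30, List("implicit"))
```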
`ExecutorClassLoader` is a ClassLoader implementation for loading REPL-defined classes from a Hadoop FileSystem or HTTP URI, enabling distributed execution of user-defined code across Spark clusters.

```scala
class ExecutorClassLoader(
  conf: SparkConf,
  classUri: String,
  parent: ClassLoader,
  userClassPathFirst: Boolean
)

def findClass(name: String): Class[_]
def findClassLocally(name: String): Option[Class[_]]
```
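A sketch of executor-side construction; the URI and class name below are illustrative placeholders, since in a real cluster the driver advertises the class-server URI to executors:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.repl.ExecutorClassLoader

val conf = new SparkConf()
val classUri = "http://driver-host:41000"  // illustrative; provided by the driver

val loader = new ExecutorClassLoader(conf, classUri, getClass.getClassLoader,
  userClassPathFirst = false)

// The class name is hypothetical; the REPL generates wrapper names per input line.
val maybeCls: Option[Class[_]] = loader.findClassLocally("$line3.$read")
```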
`SparkCommandLine` handles command line options and settings management for Spark-specific REPL configurations and compiler settings.

```scala
class SparkCommandLine(
  args: List[String],
  override val settings: Settings
)

val settings: Settings
```
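A sketch of parsing REPL arguments into compiler `Settings` and handing them to an interpreter (`-usejavacp` is a standard scalac flag; the `SparkIMain` wiring assumes the constructor shown earlier):

```scala
import java.io.PrintWriter
import scala.tools.nsc.Settings
import org.apache.spark.repl.{SparkCommandLine, SparkIMain}

// Parse arguments into a Settings instance the interpreter can consume.
val cmdLine = new SparkCommandLine(List("-usejavacp"), new Settings())
val settings: Settings = cmdLine.settings

// Feed the parsed settings into an interpreter instance.
val intp = new SparkIMain(settings, new PrintWriter(Console.out, true))
```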
`SparkJLineCompletion` is the intelligent tab completion system for Scala code within the REPL environment, providing context-aware suggestions for methods, variables, and types.

```scala
class SparkJLineCompletion(val intp: SparkIMain)

def completer(): ScalaCompleter
var verbosity: Int
def resetVerbosity(): Unit
```
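A sketch of obtaining a completer from a live interpreter; attaching it to a JLine console reader is omitted, since that wiring depends on the JLine version in use:

```scala
import org.apache.spark.repl.{SparkIMain, SparkJLineCompletion}

val intp = new SparkIMain()
intp.initializeSynchronous()
intp.interpret("val greeting = \"hello\"")

// Build a completer that is aware of the session's definitions.
val completion = new SparkJLineCompletion(intp)
val completer = completion.completer()  // ScalaCompleter used for tab completion
```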