Apache Spark's interactive Scala shell (REPL) for Scala 2.10, providing a read-eval-print loop interface for interactive data analysis and cluster computing.
The Apache Spark REPL provides an interactive Scala shell for distributed data processing and cluster computing. Its read-eval-print loop lets users execute Scala code against a Spark cluster interactively, with a SparkContext and SparkSession initialized automatically for immediate data analysis.
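For example, a minimal sketch of an interactive session, assuming the conventional sc and spark bindings that the Spark 2.x shell creates on startup:

// Typed at the scala> prompt; sc (SparkContext) and spark (SparkSession)
// are created automatically when the REPL starts.
val evens = sc.parallelize(1 to 1000).filter(_ % 2 == 0).count()
val df = spark.range(100).toDF("id")
df.filter("id > 90").show()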
mvn dependency:get -Dartifact=org.apache.spark:spark-repl_2.10:2.2.3

import org.apache.spark.repl.{Main, SparkILoop, SparkIMain}
import org.apache.spark.repl.SparkJLineCompletion
import org.apache.spark.repl.{ExecutorClassLoader, Signaling}
import scala.tools.nsc.interpreter.JPrintWriter
import scala.tools.nsc.Settings
import java.io.BufferedReader

import org.apache.spark.repl.{Main, SparkILoop}
// Start REPL with default settings
Main.main(Array())
// Or create custom REPL instance
val repl = new SparkILoop()
repl.process(Array("-master", "local[*]"))

import org.apache.spark.repl.SparkIMain
import scala.tools.nsc.interpreter.{Results => IR}
val interpreter = new SparkIMain()
interpreter.initializeSynchronous()
// Execute Scala code
val result = interpreter.interpret("val x = 1 + 1")
val value = interpreter.valueOfTerm("x")

The Spark REPL is built around several key components:
- Main object provides the application entry point and global interpreter access
- SparkILoop class manages the read-eval-print cycle with Spark-specific features
- SparkIMain class handles code compilation, execution, and state management
- SparkJLineCompletion provides auto-completion functionality using JLine
- ExecutorClassLoader enables loading REPL-defined classes on Spark executors
- Signaling utilities provide interrupt handling and job cancellation

The main REPL entry points provide an interactive shell with Spark integration, command processing, and session management.
object Main {
def main(args: Array[String]): Unit
def interp: SparkILoop
def interp_=(i: SparkILoop): Unit
}
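// Sketch (assumption): the global accessor on Main can be used to register and
// later retrieve the active SparkILoop when embedding the REPL in another application.
val loop = new SparkILoop()
Main.interp = loop                    // make this instance globally visible
val active: SparkILoop = Main.interp  // retrieved elsewhere, e.g. by helper code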
@DeveloperApi
class SparkILoop(
in0: Option[BufferedReader] = None,
out: JPrintWriter = new JPrintWriter(Console.out, true),
master: Option[String] = None
) {
def process(args: Array[String]): Boolean
def createSparkSession(): SparkSession
var sparkContext: SparkContext
}

Core interpreter providing code compilation, execution, variable management, and introspection capabilities for interactive Scala code evaluation.
@DeveloperApi
class SparkIMain(
initialSettings: Settings,
out: JPrintWriter,
propagateExceptions: Boolean = false
) {
def initializeSynchronous(): Unit
def interpret(line: String): IR.Result
def compileSources(sources: SourceFile*): Boolean
def compileString(code: String): Boolean
}

Interactive code completion functionality providing context-aware suggestions and symbol completion using JLine integration.
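A minimal sketch of wiring completion to an interpreter, assuming interpreter is the initialized SparkIMain from the quick-start example above:

val completion = new SparkJLineCompletion(interpreter)
val tabCompleter = completion.completer()  // JLine tab completer backed by interpreter state
// The returned completer can then be attached to a JLine console reader.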
@DeveloperApi
class SparkJLineCompletion(val intp: SparkIMain) {
var verbosity: Int
def resetVerbosity(): Unit
def completer(): JLineTabCompletion
}

Class loading system for distributing REPL-compiled classes to Spark executors across the cluster from various sources including HTTP, Hadoop FS, and Spark RPC.
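A sketch of how an executor-side loader might be constructed; the configuration key and class name below are illustrative assumptions rather than part of this API:

import org.apache.spark.{SparkConf, SparkEnv}

val conf = new SparkConf()
val loader = new ExecutorClassLoader(
  conf,
  SparkEnv.get,                                   // current executor environment
  conf.get("spark.repl.class.uri"),               // URI where the driver publishes REPL classes (assumed config key)
  Thread.currentThread().getContextClassLoader,   // fallback parent loader
  userClassPathFirst = false
)
val cls = loader.loadClass("$line3.$read")        // illustrative REPL-generated class name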
class ExecutorClassLoader(
conf: SparkConf,
env: SparkEnv,
classUri: String,
parent: ClassLoader,
userClassPathFirst: Boolean
) extends ClassLoader {
override def findClass(name: String): Class[_]
def readAndTransformClass(name: String, in: InputStream): Array[Byte]
}

// Result enumeration for code interpretation
object IR {
sealed abstract class Result
case object Success extends Result
case object Error extends Result
case object Incomplete extends Result
}
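// Example (sketch): dispatching on the result of SparkIMain.interpret, assuming
// `interpreter` is the initialized SparkIMain from the quick-start example.
interpreter.interpret("val y = x * 2") match {
  case IR.Success    => println("statement evaluated")
  case IR.Error      => println("compilation or runtime error")
  case IR.Incomplete => println("more input is required")
}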
// Command line settings
@DeveloperApi
class SparkRunnerSettings(error: String => Unit) extends Settings {
val loadfiles: MultiStringSetting
}
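// Sketch (assumption): constructing runner settings with a simple error reporter
// and reading back any files requested for preloading into the interpreter.
val runnerSettings = new SparkRunnerSettings(msg => Console.err.println(msg))
val filesToLoad: List[String] = runnerSettings.loadfiles.value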
// Signal handling utilities
object Signaling {
def cancelOnInterrupt(): Unit
}
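// Sketch: installing interrupt handling in a custom REPL bootstrap so that Ctrl-C
// cancels running Spark jobs instead of terminating the shell (assumes an active
// SparkContext, as in the quick-start example).
Signaling.cancelOnInterrupt()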