Interactive Scala shell for Apache Spark with distributed computing capabilities.

Main application entry point: the `Main` object manages the REPL application lifecycle, including SparkSession/SparkContext creation and the interactive `SparkILoop` interpreter.
```scala
/**
 * Main entry point for the Spark REPL application.
 * Manages SparkContext, SparkSession, and SparkILoop instances.
 */
object Main extends Logging {
  /** Spark configuration object */
  val conf: SparkConf

  /** Current Spark context (mutable, null initially) */
  var sparkContext: SparkContext

  /** Current Spark session (mutable, null initially) */
  var sparkSession: SparkSession

  /** Current interpreter instance (mutable, used by tests) */
  var interp: SparkILoop

  /**
   * Main application entry point.
   * @param args command-line arguments passed to the REPL
   */
  def main(args: Array[String]): Unit

  /**
   * Creates and configures a new SparkSession with proper catalog support.
   * Automatically determines Hive support based on classpath availability.
   * @return configured SparkSession instance
   */
  def createSparkSession(): SparkSession
}
```

Usage Examples:
```scala
// Start the REPL programmatically
org.apache.spark.repl.Main.main(Array("-classpath", "/path/to/jars"))

// Access the current session and context
val session = org.apache.spark.repl.Main.sparkSession
val context = org.apache.spark.repl.Main.sparkContext

// Create a new session
val newSession = org.apache.spark.repl.Main.createSparkSession()
```

Internal main method used for testing and custom REPL initialization.
```scala
/**
 * Internal main method used by tests and custom initialization.
 * @param args command-line arguments
 * @param _interp custom SparkILoop instance to use
 */
private[repl] def doMain(args: Array[String], _interp: SparkILoop): Unit
```

The Main object uses a pre-configured SparkConf instance with REPL-specific settings:
- `spark.repl.classdir`: directory for REPL class files
- `spark.repl.class.outputDir`: output directory for compiled classes
- `spark.app.name`: default application name ("Spark shell")
- `spark.executor.uri`: executor URI, taken from the environment
- `spark.home`: Spark home directory, taken from the environment

The REPL manages temporary directories for dynamically compiled classes:
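The actual snippet below relies on Spark-internal `Utils.createTempDir`. As a JDK-only approximation of the same setup (object and method names here are illustrative, not part of the Spark API):

```scala
import java.nio.file.{Files, Path, Paths}

// Illustrative stand-in for Utils.createTempDir: ensure the root directory
// exists, then create a uniquely named "repl"-prefixed subdirectory in it.
object ReplDirSketch {
  def createReplOutputDir(rootDir: String): Path = {
    val root = Paths.get(rootDir)
    Files.createDirectories(root)            // create the root if missing
    Files.createTempDirectory(root, "repl")  // unique repl* subdirectory
  }
}

val outputDir = ReplDirSketch.createReplOutputDir(System.getProperty("java.io.tmpdir"))
```

Each shell session gets its own directory, so concurrently running REPLs never overwrite each other's compiled classes.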
```scala
val rootDir = conf.getOption("spark.repl.classdir").getOrElse(Utils.getLocalDir(conf))
val outputDir = Utils.createTempDir(root = rootDir, namePrefix = "repl")
```

Hive catalog support is enabled automatically based on the classpath:
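This decision can be sketched as a simple classpath probe; the probed class name (`org.apache.hadoop.hive.conf.HiveConf`) is an assumption modeled on Spark's known check, not something exposed by this API:

```scala
import scala.util.Try

// Sketch: enable the Hive catalog only when Hive classes are loadable.
// The probed class name is an assumption, not taken from this API.
def hiveClassesArePresent: Boolean =
  Try(Class.forName("org.apache.hadoop.hive.conf.HiveConf")).isSuccess

val catalogImplementation = if (hiveClassesArePresent) "hive" else "in-memory"
```

Probing at session-creation time means a single build of the REPL works both with and without Hive on the classpath.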
- `spark.sql.catalogImplementation=hive`: enables Hive support; otherwise the `spark.sql.catalogImplementation` property is left at its non-Hive default

A failure to initialize the shell session is fatal:
```scala
case e: Exception if isShellSession =>
  logError("Failed to initialize Spark session.", e)
  sys.exit(1)
```

Invalid Scala compiler arguments are handled via an error callback:
```scala
private def scalaOptionError(msg: String): Unit = {
  hasErrors = true
  Console.err.println(msg)
}
```

Environment variables:

- `SPARK_EXECUTOR_URI`: sets the executor URI for distributed execution
- `SPARK_HOME`: sets the Spark installation directory

Arguments are processed and passed to the Scala interpreter:
```scala
val interpArguments = List(
  "-Yrepl-class-based",
  "-Yrepl-outdir", s"${outputDir.getAbsolutePath}",
  "-classpath", jars
) ++ args.toList
```

The REPL automatically registers interrupt signal handling:
```scala
Signaling.cancelOnInterrupt()
```

Install with Tessl CLI:
```shell
npx tessl i tessl/maven-org-apache-spark--spark-repl-2-11
```