tessl/maven-org-apache-spark--spark-repl-2-11

Interactive Scala shell for Apache Spark with distributed computing capabilities


docs/interactive-shell.md

Interactive Shell

Core interactive shell functionality with Spark-specific initialization, commands, and REPL processing.

Capabilities

SparkILoop Class

Spark-specific interactive shell loop that extends Scala's standard ILoop with Spark initialization and custom behavior.

/**
 * A Spark-specific interactive shell extending Scala's ILoop
 * Provides automatic Spark context/session creation and initialization
 */
class SparkILoop(in0: Option[BufferedReader], out: JPrintWriter) extends ILoop(in0, out) {
  /**
   * Alternative constructor with BufferedReader
   * @param in0 Input reader for REPL commands
   * @param out Output writer for REPL responses
   */
  def this(in0: BufferedReader, out: JPrintWriter)
  
  /**
   * Default constructor using console I/O
   */
  def this()
  
  /**
   * Initialize Spark context and session in the REPL environment
   * Executes initialization commands to create 'spark' and 'sc' variables
   * Imports common Spark APIs automatically
   */
  def initializeSpark(): Unit
  
  /**
   * Main REPL processing loop
   * Handles startup, interpreter creation, and command processing
   * @param settings Scala compiler settings
   * @return true if processing completed successfully
   */  
  def process(settings: Settings): Boolean
  
  /**
   * Create the Scala interpreter with Spark-specific customizations
   * Uses SparkILoopInterpreter for Scala 2.11 compatibility
   */
  override def createInterpreter(): Unit
  
  /** Print Spark welcome message with version info */
  override def printWelcome(): Unit
  
  /** Available REPL commands (uses standard commands) */
  override def commands: List[LoopCommand]
  
  /**
   * Handle :reset command
   * Preserves SparkSession and SparkContext state after reset
   * @param line Command line input
   */
  override def resetCommand(line: String): Unit
  
  /** Replay command history with Spark re-initialization */
  override def replay(): Unit
}

Usage Examples:

import org.apache.spark.repl.SparkILoop
import java.io.{BufferedReader, StringReader, PrintWriter, StringWriter}

// Create REPL with custom I/O
val input = new BufferedReader(new StringReader("val data = sc.parallelize(1 to 10)\ndata.sum()"))
val output = new StringWriter()
val repl = new SparkILoop(input, new PrintWriter(output))

// Process with default settings
import scala.tools.nsc.Settings
val settings = new Settings
repl.process(settings)

// Access output
val result = output.toString

SparkILoop Companion Object

Utility methods for running code in REPL instances programmatically.

object SparkILoop {
  /**
   * Creates an interpreter loop with default settings and feeds
   * the given code to it as input
   * @param code Scala code to execute
   * @param sets Scala compiler settings (optional)
   * @return String output from REPL execution
   */
  def run(code: String, sets: Settings = new Settings): String
  
  /**
   * Run multiple lines of code in REPL
   * @param lines List of code lines to execute
   * @return String output from REPL execution  
   */
  def run(lines: List[String]): String
}

Usage Examples:

// Execute single code block
val result = SparkILoop.run("""
  val rdd = sc.parallelize(1 to 100)
  rdd.filter(_ % 2 == 0).count()
""")

// Execute multiple lines
val lines = List(
  "val data = sc.parallelize(1 to 10)",
  "val doubled = data.map(_ * 2)", 
  "doubled.collect()"
)
val output = SparkILoop.run(lines)

Initialization Commands

Pre-defined commands executed during REPL startup to set up the Spark environment.

/**
 * Commands run automatically during REPL initialization
 * Creates 'spark' and 'sc' variables and imports common APIs
 */
val initializationCommands: Seq[String]

The initialization commands include:

  1. SparkSession Creation: Creates spark variable
  2. SparkContext Access: Creates sc variable
  3. Standard Imports: Imports SparkContext implicits, SQL functions, etc.
  4. UI Information: Displays Spark UI URL

// Actual initialization commands:
"""
@transient val spark = if (org.apache.spark.repl.Main.sparkSession != null) {
    org.apache.spark.repl.Main.sparkSession
  } else {
    org.apache.spark.repl.Main.createSparkSession()
  }
@transient val sc = {
  val _sc = spark.sparkContext
  // UI URL display logic
  _sc
}
"""
"import org.apache.spark.SparkContext._"
"import spark.implicits._" 
"import spark.sql"
"import org.apache.spark.sql.functions._"
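
The environment these commands build can be reproduced manually outside the REPL. A minimal sketch, assuming Spark is on the classpath; the `local[*]` master and application name are illustrative choices, not part of the shell's own setup:

import org.apache.spark.sql.SparkSession

// Equivalent of the shell's automatic setup: create the session,
// expose the context, and bring the common APIs into scope.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("manual-shell-setup")
  .getOrCreate()
val sc = spark.sparkContext

import spark.implicits._
import org.apache.spark.sql.functions._

// The shell also prints the UI URL during initialization;
// sc.uiWebUrl is an Option[String] (None when the UI is disabled).
println(s"Spark UI: ${sc.uiWebUrl.getOrElse("disabled")}")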

REPL Customizations

Welcome Message

Custom Spark ASCII art welcome message with version information:

Welcome to
      ____              __  
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.8
      /_/
      
Using Scala 2.11.12 (OpenJDK 64-Bit Server VM, Java 1.8.0_275)
Type in expressions to have them evaluated.
Type :help for more information.

Scala Version Compatibility

Special handling for Scala 2.11 compatibility issues:

  • Uses SparkILoopInterpreter for Scala 2.11 to fix import handling bugs
  • Manages context classloader correctly for thread safety
  • Custom process method to ensure proper initialization order

Command Processing

Enhanced command processing with Spark-specific features:

  • Reset Command: Preserves Spark session state across resets
  • Replay Command: Re-initializes Spark environment during replay
  • Help System: Standard Scala REPL help with Spark context
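
The effect of the customized :reset can be seen at the prompt: user definitions are forgotten, but sc and spark remain usable. An illustrative transcript (exact interpreter messages vary by version):

scala> val x = 42
x: Int = 42

scala> :reset
Resetting interpreter state.
Forgetting all expression results and named terms: x ...

scala> sc.parallelize(1 to 3).count()
res0: Long = 3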

Error Handling

Interpreter Errors

if (!intp.reporter.hasErrors) {
  // Proceed with initialization
} else {
  throw new RuntimeException(s"Scala $versionString interpreter encountered errors during initialization")
}

Context Classloader Management

Special handling for Scala 2.11 classloader bugs:

private def runClosure(body: => Boolean): Boolean = {
  if (isScala2_11) {
    val original = Thread.currentThread().getContextClassLoader
    try {
      body
    } finally {
      Thread.currentThread().setContextClassLoader(original)
    }
  } else {
    body
  }
}

Integration Features

Auto-Import System

Automatic import of commonly used Spark APIs:

  • SparkContext._: RDD operations and implicits
  • spark.implicits._: Dataset/DataFrame encoders
  • spark.sql: SQL interface access
  • org.apache.spark.sql.functions._: SQL functions
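
Because these imports are already in scope, a fresh shell session can work with Datasets, column functions, and SQL directly. A sketch of what the shell accepts without any explicit imports (the column names and data are illustrative):

// spark.implicits._ provides toDF on local collections;
// org.apache.spark.sql.functions._ provides column helpers like upper.
val df = Seq(("alice", 1), ("bob", 2)).toDF("name", "id")
df.select(upper($"name"), $"id" + 1).show()

// spark.sql gives direct SQL access over registered views
df.createOrReplaceTempView("people")
spark.sql("SELECT name FROM people WHERE id > 1").show()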

UI Integration

Automatic display of Spark UI information:

  • Detects reverse proxy configuration
  • Shows appropriate UI URLs based on deployment
  • Displays master URL and application ID

File Loading

Support for loading Scala files during startup:

  • :load command support for script files
  • :paste command support for code blocks
  • Integration with Scala compiler settings
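
For example, a script saved as setup.scala (a hypothetical file name) can be loaded at the prompt, and :paste accepts a multi-line block in one compilation unit:

scala> :load setup.scala
Loading setup.scala...

scala> :paste
// Entering paste mode (ctrl-D to finish)

val data = sc.parallelize(1 to 100)
val total = data.sum()

// Exiting paste mode, now interpreting.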

Install with Tessl CLI

npx tessl i tessl/maven-org-apache-spark--spark-repl-2-11
