tessl/maven-org-apache-spark--spark-repl-2-11

Interactive Scala shell for Apache Spark with distributed computing capabilities


docs/interactive-shell.md

Interactive Shell

Core interactive shell functionality with Spark-specific initialization, commands, and REPL processing.

Capabilities

SparkILoop Class

Spark-specific interactive shell loop that extends Scala's standard ILoop with Spark initialization and custom behavior.

/**
 * A Spark-specific interactive shell extending Scala's ILoop
 * Provides automatic Spark context/session creation and initialization
 */
class SparkILoop(in0: Option[BufferedReader], out: JPrintWriter) extends ILoop(in0, out) {
  /**
   * Alternative constructor with BufferedReader
   * @param in0 Input reader for REPL commands
   * @param out Output writer for REPL responses
   */
  def this(in0: BufferedReader, out: JPrintWriter)
  
  /**
   * Default constructor using console I/O
   */
  def this()
  
  /**
   * Initialize Spark context and session in the REPL environment
   * Executes initialization commands to create 'spark' and 'sc' variables
   * Imports common Spark APIs automatically
   */
  def initializeSpark(): Unit
  
  /**
   * Main REPL processing loop
   * Handles startup, interpreter creation, and command processing
   * @param settings Scala compiler settings
   * @return true if processing completed successfully
   */  
  def process(settings: Settings): Boolean
  
  /**
   * Create the Scala interpreter with Spark-specific customizations
   * Uses SparkILoopInterpreter for Scala 2.11 compatibility
   */
  override def createInterpreter(): Unit
  
  /** Print Spark welcome message with version info */
  override def printWelcome(): Unit
  
  /** Available REPL commands (uses standard commands) */
  override def commands: List[LoopCommand]
  
  /**
   * Handle :reset command
   * Preserves SparkSession and SparkContext state after reset
   * @param line Command line input
   */
  override def resetCommand(line: String): Unit
  
  /** Replay command history with Spark re-initialization */
  override def replay(): Unit
}

Usage Examples:

import org.apache.spark.repl.SparkILoop
import java.io.{BufferedReader, StringReader, PrintWriter, StringWriter}

// Create REPL with custom I/O
val input = new BufferedReader(new StringReader("val data = sc.parallelize(1 to 10)\ndata.sum()"))
val output = new StringWriter()
val repl = new SparkILoop(input, new PrintWriter(output))

// Process with default settings
import scala.tools.nsc.Settings
val settings = new Settings
repl.process(settings)

// Access output
val result = output.toString

SparkILoop Companion Object

Utility methods for running code in REPL instances programmatically.

object SparkILoop {
  /**
   * Creates an interpreter loop with default settings and feeds
   * the given code to it as input
   * @param code Scala code to execute
   * @param sets Scala compiler settings (optional)
   * @return String output from REPL execution
   */
  def run(code: String, sets: Settings = new Settings): String
  
  /**
   * Run multiple lines of code in REPL
   * @param lines List of code lines to execute
   * @return String output from REPL execution  
   */
  def run(lines: List[String]): String
}

Usage Examples:

// Execute single code block
val result = SparkILoop.run("""
  val rdd = sc.parallelize(1 to 100)
  rdd.filter(_ % 2 == 0).count()
""")

// Execute multiple lines
val lines = List(
  "val data = sc.parallelize(1 to 10)",
  "val doubled = data.map(_ * 2)", 
  "doubled.collect()"
)
val output = SparkILoop.run(lines)

Initialization Commands

Pre-defined commands executed during REPL startup to set up the Spark environment.

/**
 * Commands run automatically during REPL initialization
 * Creates 'spark' and 'sc' variables and imports common APIs
 */
val initializationCommands: Seq[String]

The initialization commands include:

  1. SparkSession Creation: Creates spark variable
  2. SparkContext Access: Creates sc variable
  3. Standard Imports: Imports SparkContext implicits, SQL functions, etc.
  4. UI Information: Displays Spark UI URL

// Actual initialization commands:
"""
@transient val spark = if (org.apache.spark.repl.Main.sparkSession != null) {
    org.apache.spark.repl.Main.sparkSession
  } else {
    org.apache.spark.repl.Main.createSparkSession()
  }
@transient val sc = {
  val _sc = spark.sparkContext
  // UI URL display logic
  _sc
}
"""
"import org.apache.spark.SparkContext._"
"import spark.implicits._" 
"import spark.sql"
"import org.apache.spark.sql.functions._"
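
The environment these commands build can be reproduced manually outside the REPL. A minimal sketch, assuming Spark is on the classpath; the `local[*]` master and application name are illustrative choices, not part of the shell's own setup:

import org.apache.spark.sql.SparkSession

// Equivalent of the shell's automatic setup: create the session,
// expose the context, and bring the common APIs into scope.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("manual-shell-setup")
  .getOrCreate()
val sc = spark.sparkContext

import spark.implicits._
import org.apache.spark.sql.functions._

// The shell also prints the UI URL during initialization;
// sc.uiWebUrl is an Option[String] (None when the UI is disabled).
println(s"Spark UI: ${sc.uiWebUrl.getOrElse("disabled")}")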

REPL Customizations

Welcome Message

Custom Spark ASCII art welcome message with version information:

Welcome to
      ____              __  
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.8
      /_/
      
Using Scala 2.11.12 (OpenJDK 64-Bit Server VM, Java 1.8.0_275)
Type in expressions to have them evaluated.
Type :help for more information.

Scala Version Compatibility

Special handling for Scala 2.11 compatibility issues:

  • Uses SparkILoopInterpreter for Scala 2.11 to fix import handling bugs
  • Manages context classloader correctly for thread safety
  • Custom process method to ensure proper initialization order

Command Processing

Enhanced command processing with Spark-specific features:

  • Reset Command: Preserves Spark session state across resets
  • Replay Command: Re-initializes Spark environment during replay
  • Help System: Standard Scala REPL help with Spark context
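
The effect of the customized :reset can be seen at the prompt: user definitions are forgotten, but sc and spark remain usable. An illustrative transcript (exact interpreter messages vary by version):

scala> val x = 42
x: Int = 42

scala> :reset
Resetting interpreter state.
Forgetting all expression results and named terms: x ...

scala> sc.parallelize(1 to 3).count()
res0: Long = 3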

Error Handling

Interpreter Errors

if (!intp.reporter.hasErrors) {
  // Proceed with initialization
} else {
  throw new RuntimeException(s"Scala $versionString interpreter encountered errors during initialization")
}

Context Classloader Management

Special handling for Scala 2.11 classloader bugs:

private def runClosure(body: => Boolean): Boolean = {
  if (isScala2_11) {
    val original = Thread.currentThread().getContextClassLoader
    try {
      body
    } finally {
      Thread.currentThread().setContextClassLoader(original)
    }
  } else {
    body
  }
}

Integration Features

Auto-Import System

Automatic import of commonly used Spark APIs:

  • SparkContext._: RDD operations and implicits
  • spark.implicits._: Dataset/DataFrame encoders
  • spark.sql: SQL interface access
  • org.apache.spark.sql.functions._: SQL functions
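
Because these imports are already in scope, a fresh shell session can work with Datasets, column functions, and SQL directly. A sketch of what the shell accepts without any explicit imports (the column names and data are illustrative):

// spark.implicits._ provides toDF on local collections;
// org.apache.spark.sql.functions._ provides column helpers like upper.
val df = Seq(("alice", 1), ("bob", 2)).toDF("name", "id")
df.select(upper($"name"), $"id" + 1).show()

// spark.sql gives direct SQL access over registered views
df.createOrReplaceTempView("people")
spark.sql("SELECT name FROM people WHERE id > 1").show()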

UI Integration

Automatic display of Spark UI information:

  • Detects reverse proxy configuration
  • Shows appropriate UI URLs based on deployment
  • Displays master URL and application ID

File Loading

Support for loading Scala files during startup:

  • :load command support for script files
  • :paste command support for code blocks
  • Integration with Scala compiler settings
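
For example, a script saved as setup.scala (a hypothetical file name) can be loaded at the prompt, and :paste accepts a multi-line block in one compilation unit:

scala> :load setup.scala
Loading setup.scala...

scala> :paste
// Entering paste mode (ctrl-D to finish)

val data = sc.parallelize(1 to 100)
val total = data.sum()

// Exiting paste mode, now interpreting.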

Install with Tessl CLI

npx tessl i tessl/maven-org-apache-spark--spark-repl-2-11
