Interactive Scala shell for Apache Spark with distributed computing capabilities
Apache Spark REPL is an interactive Scala shell that provides a command-line interface for Apache Spark. It allows users to interactively execute Spark code, explore data, run SQL queries, and perform distributed computing operations in real time. The REPL extends the standard Scala interpreter with Spark-specific functionality, automatically creating a SparkSession and SparkContext, and providing seamless access to Spark's core APIs including RDDs, DataFrames, and Datasets.
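For instance, once spark-shell is running, the pre-created spark (SparkSession) and sc (SparkContext) handles are immediately usable, and spark.implicits._ is auto-imported so the $ column interpolator works out of the box (a representative session, output abbreviated):

scala> val df = spark.range(1, 1000).toDF("id")
df: org.apache.spark.sql.DataFrame = [id: bigint]

scala> df.filter($"id" % 2 === 0).count()
res0: Long = 499

scala> sc.parallelize(1 to 100).map(_ * 2).sum()
res1: Double = 10100.0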
To use the Spark REPL as a library, add the Maven dependency:

<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-repl_2.11</artifactId>
  <version>2.4.8</version>
</dependency>

Then import what you need:

import org.apache.spark.repl._

For the main entry point:

import org.apache.spark.repl.Main

For the interactive loop:

import org.apache.spark.repl.SparkILoop

For custom class loading:

import org.apache.spark.repl.ExecutorClassLoader

# Start the Spark REPL
spark-shell

# Or via the main class
scala -cp <spark-classpath> org.apache.spark.repl.Main

To embed or drive the REPL programmatically:

import org.apache.spark.repl.SparkILoop
import scala.tools.nsc.Settings
// Execute code in REPL
val code = """
val data = sc.parallelize(1 to 10)
data.sum()
"""
val result = SparkILoop.run(code)
// Create custom REPL instance
val settings = new Settings
val repl = new SparkILoop()
repl.process(settings)

Apache Spark REPL is built around several key components:
- Main provides the application entry point and SparkSession/SparkContext creation
- SparkILoop extends Scala's standard REPL with Spark-specific initialization and commands
- ExecutorClassLoader enables loading of REPL-compiled classes on remote executors

Each component is described below, with a short usage sketch after each signature listing.

Main is the application entry point and manages the SparkSession/SparkContext lifecycle for the interactive shell.
object Main extends Logging {
var sparkContext: SparkContext
var sparkSession: SparkSession
var interp: SparkILoop
val conf: SparkConf
def main(args: Array[String]): Unit
def createSparkSession(): SparkSession
}
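As a sketch (not a public API contract), the members listed above can be inspected from inside a running spark-shell:

// Sketch: inspecting the shell's own entry-point state from within spark-shell.
// Main.sparkSession and Main.conf are the members listed in the signature above.
import org.apache.spark.repl.Main

val activeSession = Main.sparkSession        // the SparkSession created at startup
val master = Main.conf.get("spark.master")   // configuration assembled by Main
println(s"Shell running against master: $master")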
SparkILoop provides the core interactive shell functionality, with Spark-specific initialization, commands, and REPL processing.

class SparkILoop(in0: Option[BufferedReader], out: JPrintWriter) extends ILoop {
def this(in0: BufferedReader, out: JPrintWriter)
def this()
def initializeSpark(): Unit
def process(settings: Settings): Boolean
override def createInterpreter(): Unit
override def printWelcome(): Unit
override def commands: List[LoopCommand]
override def resetCommand(line: String): Unit
override def replay(): Unit
}
object SparkILoop {
def run(code: String, sets: Settings = new Settings): String
def run(lines: List[String]): String
}
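A minimal sketch of driving the loop programmatically via SparkILoop.run, assuming a local Spark build on the classpath (usejavacp is a standard scalac setting, not a Spark one):

import org.apache.spark.repl.SparkILoop
import scala.tools.nsc.Settings

val settings = new Settings
settings.usejavacp.value = true  // let the interpreter reuse the JVM classpath

// Feed a batch of lines through a throwaway REPL and capture its console output.
val output = SparkILoop.run(
  """
    |val rdd = sc.parallelize(Seq("a", "b", "a"))
    |rdd.countByValue()
  """.stripMargin, settings)

println(output)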
ExecutorClassLoader is a custom class loader for loading REPL-compiled classes on remote Spark executors, with support for RPC and Hadoop filesystem access.

class ExecutorClassLoader(
conf: SparkConf,
env: SparkEnv,
classUri: String,
parent: ClassLoader,
userClassPathFirst: Boolean
) extends ClassLoader with Logging {
override def findClass(name: String): Class[_]
def findClassLocally(name: String): Option[Class[_]]
def readAndTransformClass(name: String, in: InputStream): Array[Byte]
def urlEncode(str: String): String
override def getResource(name: String): URL
override def getResources(name: String): java.util.Enumeration[URL]
override def getResourceAsStream(name: String): InputStream
}
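This constructor is normally invoked by Spark itself on each executor when the driver serves REPL classes via spark.repl.class.uri; the following is only an illustrative sketch of that wiring, using the parameters listed above (the REPL-generated class name is hypothetical):

import org.apache.spark.SparkEnv

// Illustrative only: Spark performs this wiring internally on each executor.
val env = SparkEnv.get
val replClassLoader = new ExecutorClassLoader(
  env.conf,                                      // SparkConf
  env,                                           // SparkEnv, used for RPC-based class fetching
  env.conf.get("spark.repl.class.uri"),          // where the driver serves REPL-compiled classes
  Thread.currentThread().getContextClassLoader,  // parent loader
  false)                                         // userClassPathFirst

// Classes compiled line-by-line on the driver can now be resolved remotely
// ("$line3.$read" is a hypothetical REPL-generated class name):
val cls = replClassLoader.loadClass("$line3.$read")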
Signaling provides signal handling utilities for interactive job cancellation and REPL interrupt management.

object Signaling extends Logging {
def cancelOnInterrupt(): Unit
}
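A sketch of how a custom shell could opt into the same Ctrl+C behavior spark-shell uses:

import org.apache.spark.repl.Signaling

// After this call, the first Ctrl+C cancels running Spark jobs
// instead of terminating the shell process.
Signaling.cancelOnInterrupt()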
SparkILoopInterpreter and SparkExprTyper are specialized interpreter and expression-typing components carrying Scala 2.11 compatibility fixes.

class SparkILoopInterpreter(settings: Settings, out: JPrintWriter) extends IMain {
def symbolOfLine(code: String): global.Symbol
def typeOfExpression(expr: String, silent: Boolean): global.Type
def importsCode(wanted: Set[Name], wrapper: Request#Wrapper,
definesClass: Boolean, generousImports: Boolean): ComputedImports
}
trait SparkExprTyper extends ExprTyper {
def doInterpret(code: String): IR.Result
def symbolOfLine(code: String): Symbol
}
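These are internal APIs, but as a sketch, an instantiated interpreter can be queried for the static type of an expression (initializeSynchronous comes from the underlying IMain; classpath reuse via usejavacp is an assumption for local experimentation):

import scala.tools.nsc.Settings
import scala.tools.nsc.interpreter.JPrintWriter

val settings = new Settings
settings.usejavacp.value = true
val out = new JPrintWriter(Console.out, true)

val intp = new SparkILoopInterpreter(settings, out)
intp.initializeSynchronous()  // force compiler setup before first use

val tpe = intp.typeOfExpression("1 to 10", silent = true)
println(tpe)  // e.g. scala.collection.immutable.Range.Inclusive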
// From Spark Core and Spark SQL
class SparkConf
class SparkContext
class SparkSession
class SparkEnv
// From Scala
type Settings = scala.tools.nsc.Settings
type BufferedReader = java.io.BufferedReader
type JPrintWriter = scala.tools.nsc.interpreter.JPrintWriter

// REPL interpreter result types
object IR {
sealed abstract class Result
case object Success extends Result
case object Error extends Result
case object Incomplete extends Result
}
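A sketch of reacting to these results when feeding lines to an interpreter (intp as constructed in the earlier interpreter sketch; interpret is inherited from IMain):

import scala.tools.nsc.interpreter.IR

intp.interpret("val xs = List(1, 2, 3)") match {
  case IR.Success    => println("line compiled and ran")
  case IR.Incomplete => println("statement needs more input")
  case IR.Error      => println("compilation or runtime error")
}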
// Class loading types
type ClassLoader = java.lang.ClassLoader
trait Logging  // Spark's internal logging mix-in