tessl/maven-org-apache-spark--spark-repl-2-10

Interactive Scala shell for Apache Spark with distributed computing capabilities

Workspace: tessl
Visibility: Public
Describes: pkg:maven/org.apache.spark/spark-repl_2.10@1.6.x

To install, run:

npx @tessl/cli install tessl/maven-org-apache-spark--spark-repl-2-10@1.6.0


Apache Spark REPL

The Apache Spark REPL (Read-Eval-Print Loop) provides an interactive Scala shell designed specifically for Apache Spark. It lets users execute Spark operations interactively, explore datasets, and prototype distributed computing solutions in real time, with automatic SparkContext initialization and seamless integration with Spark's core APIs.

Package Information

  • Package Name: spark-repl_2.10
  • Package Type: maven
  • Language: Scala (with Java interoperability)
  • Installation: Add the dependency to your Maven pom.xml:
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-repl_2.10</artifactId>
  <version>1.6.3</version>
</dependency>

For SBT:

libraryDependencies += "org.apache.spark" %% "spark-repl" % "1.6.3"

Core Imports

import org.apache.spark.repl.Main
import org.apache.spark.repl.SparkILoop
import org.apache.spark.repl.SparkIMain

For advanced usage:

import org.apache.spark.repl.{SparkCommandLine, ExecutorClassLoader}
import org.apache.spark.repl.SparkJLineCompletion

Basic Usage

Starting the REPL

import org.apache.spark.repl.Main

// Start interactive REPL
Main.main(Array.empty)

// Access current interpreter
val interpreter = Main.interp

Programmatic Code Execution

import org.apache.spark.repl.SparkIMain
import scala.tools.nsc.interpreter.Results

// Create interpreter
val interpreter = new SparkIMain()
interpreter.initializeSynchronous()

// Execute Scala code
val result = interpreter.interpret("val x = 42")
result match {
  case Results.Success => println("Code executed successfully")
  case Results.Error => println("Execution failed")
  case Results.Incomplete => println("Code incomplete")
}

// Bind values
interpreter.bind("myValue", "String", "Hello World")

// Add imports
interpreter.addImports("scala.collection.mutable._")

Custom REPL Loop

import org.apache.spark.repl.SparkILoop
import java.io.{BufferedReader, InputStreamReader, PrintWriter}

// Create custom REPL
val in = new BufferedReader(new InputStreamReader(System.in))
val out = new PrintWriter(System.out, true)
val repl = new SparkILoop(in, out)

// Process with arguments
repl.process(Array("-i", "init.scala"))

Architecture

The Spark REPL is built around several key components (a combined sketch of how they fit together follows the list):

  • Main Entry Point: Main object provides the primary application entry point and manages the global interpreter instance
  • Interactive Loop: SparkILoop handles user interaction, command processing, and session management
  • Code Interpreter: SparkIMain performs Scala code compilation and execution with Spark integration
  • Distributed Class Loading: ExecutorClassLoader enables loading of REPL-defined classes across Spark clusters
  • Command Line Processing: SparkCommandLine handles Spark-specific command line options and settings
  • Auto-completion: SparkJLineCompletion provides intelligent tab completion for Scala code
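
As a rough sketch of how these components fit together, the snippet below parses compiler arguments with SparkCommandLine and feeds the resulting settings into a SparkIMain instance; the -usejavacp flag and the interpreted line are illustrative:

import org.apache.spark.repl.{SparkCommandLine, SparkIMain}
import scala.tools.nsc.Settings
import scala.tools.nsc.interpreter.Results
import java.io.PrintWriter

// SparkCommandLine turns REPL arguments into compiler Settings;
// SparkIMain consumes those Settings to compile and execute code
val cmdLine = new SparkCommandLine(List("-usejavacp"), new Settings())
val interp = new SparkIMain(cmdLine.settings, new PrintWriter(System.out, true))
interp.initializeSynchronous()

interp.interpret("val greeting = \"hello\"") match {
  case Results.Success => println("interpreted")
  case _               => println("failed or incomplete")
}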

Capabilities

Interactive Shell Management

Core REPL loop functionality for interactive Scala development with Spark integration. Provides command processing, prompt customization, and session management.

class SparkILoop(
  in0: Option[BufferedReader], 
  out: JPrintWriter, 
  master: Option[String]
)

def process(args: Array[String]): Boolean
def setPrompt(prompt: String): Unit
def prompt: String
def commands: List[LoopCommand]
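
A minimal embedding sketch, assuming stdin/stdout streams and no explicit master; the prompt string and the -usejavacp flag are illustrative:

import org.apache.spark.repl.SparkILoop
import java.io.{BufferedReader, InputStreamReader, PrintWriter}

// Wrap stdin/stdout, customize the prompt, then run the loop
val in = new BufferedReader(new InputStreamReader(System.in))
val out = new PrintWriter(System.out, true)
val loop = new SparkILoop(Some(in), out, None)

loop.setPrompt("spark> ")
println(s"Prompt is now: ${loop.prompt}")

// process() returns true on a clean exit
val ok = loop.process(Array("-usejavacp"))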

Interactive Shell

Code Interpretation and Execution

Scala code compilation and execution engine with Spark context integration. Handles code parsing, compilation, binding, and result evaluation.

class SparkIMain(
  initialSettings: Settings, 
  out: JPrintWriter, 
  propagateExceptions: Boolean = false
)

def interpret(line: String): Results.Result
def bind(name: String, boundType: String, value: Any, modifiers: List[String] = Nil): Results.Result
def addImports(ids: String*): Results.Result
def compileString(code: String): Boolean
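
A short sketch of the explicit constructor together with compileString and the modifiers parameter of bind; the settings flag and bound values are illustrative:

import org.apache.spark.repl.SparkIMain
import scala.tools.nsc.Settings
import scala.tools.nsc.interpreter.Results
import java.io.PrintWriter

val settings = new Settings()
settings.usejavacp.value = true // compile against the JVM classpath

val imain = new SparkIMain(settings, new PrintWriter(System.out, true))
imain.initializeSynchronous()

// compileString compiles without evaluating; interpret compiles and runs
val compiled: Boolean = imain.compileString("object Probe { val answer = 42 }")

// bind can attach modifiers, e.g. marking the bound value @transient
imain.bind("answer", "Int", 42, List("@transient")) match {
  case Results.Success => println("bound")
  case _               => println("bind failed")
}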

Code Interpretation

Distributed Class Loading

ClassLoader implementation for loading REPL-defined classes from Hadoop FileSystem or HTTP URIs, enabling distributed execution of user-defined code across Spark clusters.

class ExecutorClassLoader(
  conf: SparkConf, 
  classUri: String, 
  parent: ClassLoader, 
  userClassPathFirst: Boolean
)

def findClass(name: String): Class[_]
def findClassLocally(name: String): Option[Class[_]]
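
A hedged construction sketch; in practice this loader is created on executors, and the class URI below is a stand-in for the value the driver publishes (e.g. via spark.repl.class.uri):

import org.apache.spark.SparkConf
import org.apache.spark.repl.ExecutorClassLoader

val conf = new SparkConf()
// Illustrative URI; the driver serves REPL-generated class files
// over HTTP or from a Hadoop FileSystem path
val classUri = "http://driver-host:40123"

val loader = new ExecutorClassLoader(
  conf,
  classUri,
  getClass.getClassLoader, // fall back to the regular classpath
  false                    // REPL classes do not take precedence
)

// Fetches REPL-defined bytecode from classUri, falling back to the
// parent loader via findClassLocally; the class name is illustrative
val clazz: Class[_] = loader.findClass("ReplDefinedClass")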

Distributed Class Loading

Command Line Configuration

Command line option handling and settings management for Spark-specific REPL configurations and compiler settings.

class SparkCommandLine(
  args: List[String], 
  override val settings: Settings
)

val settings: Settings
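
A minimal parsing sketch; the flags are standard scalac options, used here for illustration:

import org.apache.spark.repl.SparkCommandLine
import scala.tools.nsc.Settings

// Parse REPL/compiler arguments into a Settings instance
val cmdLine = new SparkCommandLine(List("-deprecation", "-usejavacp"), new Settings())

val settings: Settings = cmdLine.settings
println(settings.deprecation.value) // true
println(settings.usejavacp.value)   // true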

Command Line Configuration

Auto-completion System

Intelligent tab completion system for Scala code within the REPL environment, providing context-aware suggestions for methods, variables, and types.

class SparkJLineCompletion(val intp: SparkIMain)

def completer(): ScalaCompleter
var verbosity: Int
def resetVerbosity(): Unit
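
A sketch of driving completion programmatically; the buffer text and cursor position are illustrative:

import org.apache.spark.repl.{SparkIMain, SparkJLineCompletion}

val intp = new SparkIMain()
intp.initializeSynchronous()
intp.interpret("val numbers = List(1, 2, 3)")

val completion = new SparkJLineCompletion(intp)
val completer = completion.completer()

// Complete "numbers.ma" with the cursor at the end of the buffer;
// candidates should include methods such as map and max
val result = completer.complete("numbers.ma", 10)
println(result.candidates)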

Auto-completion