tessl/maven-org-apache-spark--spark-repl_2-12

Interactive Scala shell (REPL) component for Apache Spark providing real-time data processing capabilities and exploratory data analysis

- **Workspace**: tessl
- **Visibility**: Public
- **Describes**: mavenpkg:maven/org.apache.spark/spark-repl_2.12@3.5.x

To install, run `npx @tessl/cli install tessl/maven-org-apache-spark--spark-repl_2-12@3.5.0`

# Apache Spark REPL

Apache Spark REPL provides an interactive Scala shell for Apache Spark, enabling developers to interactively explore data and execute Spark computations in a command-line environment. It integrates with Spark's core functionality to provide real-time data processing capabilities and serves as both a learning tool and a development environment for Spark applications.

## Package Information

- **Package Name**: org.apache.spark:spark-repl_2.12
- **Package Type**: Maven
- **Language**: Scala 2.12
- **Installation**: Include as a Maven dependency or use via the `spark-shell` command
- **Version**: 3.5.6

## Maven Dependency

```xml
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-repl_2.12</artifactId>
  <version>3.5.6</version>
</dependency>
```

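For sbt builds, an equivalent dependency declaration looks like the following sketch; the `%%` operator appends the Scala binary suffix, so this resolves to `spark-repl_2.12` on a Scala 2.12 build:

```scala
// build.sbt (sketch): %% appends the Scala binary version suffix,
// resolving to spark-repl_2.12 when scalaVersion is 2.12.x
libraryDependencies += "org.apache.spark" %% "spark-repl" % "3.5.6"
```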
## Core Imports

```scala
import org.apache.spark.repl.{Main, SparkILoop, Signaling}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SparkSession
import scala.tools.nsc.Settings
import scala.tools.nsc.interpreter.JPrintWriter
import java.io.BufferedReader
```

## Basic Usage

### Starting the Interactive Shell

The REPL is typically started from the command line:

```shell
spark-shell
```

It can also be started programmatically:

```scala
import org.apache.spark.repl.Main

object MyApp {
  def main(args: Array[String]): Unit = {
    Main.main(args)
  }
}
```

### Programmatic Code Execution

```scala
import org.apache.spark.repl.SparkILoop

// Execute code in the REPL and capture the printed output
val result = SparkILoop.run("""
  val rdd = sc.parallelize(1 to 100)
  val sum = rdd.sum()
  println(s"Sum: $sum")
""")

// Execute multiple code blocks
val lines = List(
  "val data = 1 to 1000",
  "val rdd = sc.parallelize(data)",
  "val squares = rdd.map(x => x * x)",
  "squares.take(10)"
)
val output = SparkILoop.run(lines)
```

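`SparkILoop.run` returns whatever the interpreter prints as a `String`. The general capture style can be sketched in plain Scala, with no Spark required; `captureOutput` here is a hypothetical helper for illustration, not Spark's actual implementation:

```scala
import java.io.ByteArrayOutputStream

// Hypothetical helper illustrating output capture: redirect Console
// output into a buffer while the body runs, then return it as a String.
def captureOutput(body: => Unit): String = {
  val buffer = new ByteArrayOutputStream()
  Console.withOut(buffer)(body)
  buffer.toString
}

val transcript = captureOutput {
  println("Sum: 5050")
}
// transcript now contains the line "Sum: 5050"
```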
## Architecture

The Spark REPL is built around several key components:

- **Main Entry Point**: The `Main` object handles application startup, SparkSession creation, and REPL lifecycle management
- **Interactive Loop**: The `SparkILoop` class extends Scala's standard REPL with Spark-specific functionality and initialization commands
- **Session Management**: Automatic SparkSession and SparkContext setup with proper configuration for interactive use
- **Signal Handling**: Graceful job cancellation via Ctrl+C interrupt handling
- **Class Loading**: Dynamic compilation and loading of user code with proper Spark integration

## Capabilities

### REPL Session Management

Core functionality for starting, configuring, and managing interactive Spark shell sessions. Handles SparkSession creation, configuration, and lifecycle management.

```scala { .api }
object Main extends Logging {
  val conf: SparkConf
  val outputDir: File
  var sparkContext: SparkContext
  var sparkSession: SparkSession
  var interp: SparkILoop

  def main(args: Array[String]): Unit
  def createSparkSession(): SparkSession
  private[repl] def doMain(args: Array[String], _interp: SparkILoop): Unit
}
```

[Session Management](./session-management.md)

### Interactive Shell Interface

Interactive shell implementation providing Spark-specific REPL functionality with automatic context initialization and enhanced command support.

```scala { .api }
class SparkILoop(in0: Option[BufferedReader], out: JPrintWriter) extends ILoop(in0, out) {
  def this(in0: BufferedReader, out: JPrintWriter)
  def this()

  val initializationCommands: Seq[String]
  def initializeSpark(): Unit
  def printWelcome(): Unit
  def resetCommand(line: String): Unit
  def replay(): Unit
  def process(settings: Settings): Boolean
  def commands: List[LoopCommand]
}

object SparkILoop {
  def run(code: String, sets: Settings = new Settings): String
  def run(lines: List[String]): String
}
```

[Interactive Shell](./interactive-shell.md)

### Signal Handling

Interrupt and job cancellation functionality for graceful handling of Ctrl+C and job termination in interactive sessions.

```scala { .api }
object Signaling extends Logging {
  def cancelOnInterrupt(): Unit
}
```

[Signal Handling](./signaling.md)

### Global Variables and Context

When the REPL starts, several key variables are automatically available:

```scala { .api }
// Available in the REPL session after initialization
@transient val spark: SparkSession  // The active SparkSession
@transient val sc: SparkContext     // The SparkContext from the session

// Standard imports are automatically available:
import org.apache.spark.SparkContext._
import spark.implicits._
import spark.sql
import org.apache.spark.sql.functions._
```

## Error Handling

The REPL provides robust error handling for common scenarios:

- **Initialization Failures**: Graceful handling of SparkSession creation errors
- **Job Cancellation**: Ctrl+C handling for running jobs with user-friendly messaging
- **Compilation Errors**: Clear reporting of Scala compilation issues
- **Runtime Exceptions**: Proper exception handling and reporting within the REPL context

## Platform Considerations

### Scala Version Compatibility

The REPL supports multiple Scala versions with version-specific implementations:

- **Scala 2.12**: Uses the `process()` method for REPL execution
- **Scala 2.13**: Uses the `run()` method (an API change in the Scala compiler)

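The distinction can be made concrete with a small, Spark-free sketch; `entryMethodFor` is a hypothetical helper mapping a Scala binary version to the compiler-REPL entry method, and `Properties.versionNumberString` is the standard way to read the running Scala library version:

```scala
import scala.util.Properties

// Hypothetical helper: name the compiler-REPL entry method used for a
// given Scala binary version.
def entryMethodFor(binaryVersion: String): String = binaryVersion match {
  case "2.12" => "process" // ILoop.process(settings) on Scala 2.12
  case "2.13" => "run"     // the 2.13 compiler renamed this entry point
  case other  => sys.error(s"no spark-repl variant for Scala $other")
}

// Binary version of the running Scala library, e.g. "2.12" for 2.12.x
val binaryVersion = Properties.versionNumberString.split('.').take(2).mkString(".")
```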
### Environment Integration

- **SPARK_HOME**: Automatically detected and configured via `System.getenv("SPARK_HOME")`
- **SPARK_EXECUTOR_URI**: Custom executor URI configuration via environment variable
- **Classpath Management**: Dynamic JAR loading with `file://` URL scheme normalization
- **Class Output**: Temporary directory creation with the `spark.repl.classdir` configuration
- **Web UI**: Automatic display of the Spark Web UI URL with reverse proxy support
- **Hive Support**: Conditional enablement based on `SparkSession.hiveClassesArePresent`
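
These lookups can be mirrored from user code with standard JDK/Scala calls. This sketch assumes `spark.repl.classdir` is supplied as a JVM system property and falls back to the JVM temp dir when unset; in Spark itself it is a SparkConf entry, so this is an illustration only:

```scala
// SPARK_HOME is read from the process environment; None when not set
val sparkHome: Option[String] = Option(System.getenv("SPARK_HOME"))

// Sketch: where REPL-compiled classes would be written. Falls back to the
// JVM temp dir when spark.repl.classdir is not provided (an assumption here).
val classDir: String =
  sys.props.getOrElse("spark.repl.classdir", System.getProperty("java.io.tmpdir"))
```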