# Apache Spark REPL


Apache Spark REPL is an interactive Scala shell designed for Apache Spark distributed computing. It provides a command-line interface that lets users interactively execute Spark code, explore datasets, and prototype distributed data processing workflows in real time. The REPL extends the standard Scala interpreter with Spark-specific functionality, automatically providing access to SparkContext and SparkSession objects.

## Package Information

- **Package Name**: spark-repl_2.11
- **Package Type**: maven
- **Language**: Scala
- **Group ID**: org.apache.spark
- **Artifact ID**: spark-repl_2.11
- **Version**: 2.4.8
- **Installation**: Add the dependency to your Maven/SBT project, or use it as part of the Apache Spark distribution
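
Using the coordinates above, the package can be pulled into an SBT build with a single line (the `%%` operator appends the Scala binary version, yielding `spark-repl_2.11`):

```scala
// build.sbt - resolves to org.apache.spark:spark-repl_2.11:2.4.8
libraryDependencies += "org.apache.spark" %% "spark-repl" % "2.4.8"
```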

## Core Imports

```scala
import org.apache.spark.repl.Main
import org.apache.spark.repl.SparkILoop
import org.apache.spark.repl.ExecutorClassLoader
```

## Basic Usage

### Starting the REPL

```scala
// Command line usage - launches the interactive shell
org.apache.spark.repl.Main.main(Array.empty)

// Programmatic usage with custom configuration
import org.apache.spark.repl.SparkILoop
import scala.tools.nsc.Settings

val settings = new Settings()
val interp = new SparkILoop()
interp.process(settings)
```

### Running Code in REPL

```scala
import org.apache.spark.repl.SparkILoop

// Run code and capture output
val output = SparkILoop.run("val x = 1 + 1; println(x)")

// Run multiple lines of code
val lines = List(
  "val data = spark.range(1000)",
  "val squares = data.map(x => x * x)",
  "squares.count()"
)
val result = SparkILoop.run(lines)
```

## Architecture

The Spark REPL consists of several key components:

- **Main Entry Point**: `Main` object provides the application entry point and SparkSession management
- **Interactive Loop**: `SparkILoop` extends Scala's standard REPL with Spark-specific initialization and features
- **Dynamic Class Loading**: `ExecutorClassLoader` enables distribution of REPL-generated classes to cluster executors
- **Signal Handling**: `Signaling` provides graceful job cancellation on interrupt signals
- **Interpreter Components**: Scala 2.11-specific components for advanced import handling and expression typing

## Capabilities

### Main REPL Application

Core REPL application functionality including entry points, SparkSession management, and initialization.

```scala { .api }
object Main {
  def main(args: Array[String]): Unit
  def createSparkSession(): SparkSession
  private[repl] def doMain(args: Array[String], _interp: SparkILoop): Unit

  var sparkContext: SparkContext
  var sparkSession: SparkSession
  var interp: SparkILoop
  val conf: SparkConf
  val outputDir: File
}
```

[Main REPL API](./main-api.md)
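
When embedding the REPL rather than launching it from the command line, the session the shell would create can be obtained directly. A minimal sketch, assuming a Spark distribution on the classpath and a valid `spark.*` configuration:

```scala
import org.apache.spark.repl.Main

// Builds the REPL's SparkSession from Main.conf and caches it in
// Main.sparkSession and Main.sparkContext for the shell to use
val spark = Main.createSparkSession()
val sc = Main.sparkContext
```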

### Interactive Shell Interface

Interactive shell implementation with Spark-specific features and command processing.

```scala { .api }
class SparkILoop(in0: Option[BufferedReader], out: JPrintWriter) extends ILoop {
  def this()
  def this(in0: BufferedReader, out: JPrintWriter)

  def initializeSpark(): Unit
  def printWelcome(): Unit
  def resetCommand(line: String): Unit
  def replay(): Unit
  def process(settings: Settings): Boolean
  def createInterpreter(): Unit

  val initializationCommands: Seq[String]
  val commands: List[LoopCommand]
}

object SparkILoop {
  def run(code: String, sets: Settings = new Settings): String
  def run(lines: List[String]): String
}
```

[Interactive Shell](./interactive-shell.md)

### Dynamic Class Loading

Class loading infrastructure for distributing REPL-generated classes to cluster executors.

```scala { .api }
class ExecutorClassLoader(
  conf: SparkConf,
  env: SparkEnv,
  classUri: String,
  parent: ClassLoader,
  userClassPathFirst: Boolean
) extends ClassLoader {

  def findClass(name: String): Class[_]
  def findClassLocally(name: String): Option[Class[_]]
  def readAndTransformClass(name: String, in: InputStream): Array[Byte]
  def urlEncode(str: String): String

  val uri: URI
  val directory: String
  val parentLoader: ParentClassLoader
}
```

[Class Loading](./class-loading.md)
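
`ExecutorClassLoader`'s central pattern is fetch-then-define: resolve a class name to a path, read the raw class bytes, and define the class locally. The JDK-only sketch below illustrates that pattern; the class name and the classpath-based byte source are illustrative, not Spark's actual transport (Spark fetches the bytes from `classUri`):

```scala
import java.io.InputStream

// Illustrative fetch-then-define loader: findClass resolves the class name
// to a resource path and defines the class from raw bytes. ExecutorClassLoader
// follows the same shape, but fetches bytes from classUri instead of the classpath.
class ByteFetchingClassLoader(parent: ClassLoader) extends ClassLoader(parent) {
  override def findClass(name: String): Class[_] = {
    val path = name.replace('.', '/') + ".class"
    val in: InputStream = getResourceAsStream(path)
    if (in == null) throw new ClassNotFoundException(name)
    try {
      val bytes = Iterator.continually(in.read()).takeWhile(_ != -1).map(_.toByte).toArray
      defineClass(name, bytes, 0, bytes.length)
    } finally in.close()
  }
}
```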

### Signal Handling

Signal handling utilities for graceful job cancellation and interrupt management.

```scala { .api }
object Signaling {
  def cancelOnInterrupt(): Unit
}
```
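
The handler is installed once, typically during shell initialization; after that, an interrupt (Ctrl-C) cancels running Spark jobs instead of terminating the REPL process. A minimal sketch, assuming an active SparkContext:

```scala
import org.apache.spark.repl.Signaling

// Register a SIGINT handler that cancels active Spark jobs
// rather than killing the shell
Signaling.cancelOnInterrupt()
```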

### Scala 2.11 Interpreter Components

Scala 2.11-specific interpreter components that provide enhanced import handling and expression typing capabilities.

```scala { .api }
class SparkILoopInterpreter(settings: Settings, out: JPrintWriter) extends IMain {
  def chooseHandler(member: Tree): MemberHandler

  class SparkImportHandler(imp: Import) extends ImportHandler {
    def targetType: Type
  }
}

trait SparkExprTyper extends ExprTyper {
  def doInterpret(code: String): IR.Result
}
```

## Types

```scala { .api }
// Core Spark types used throughout the API
type SparkContext = org.apache.spark.SparkContext
type SparkSession = org.apache.spark.sql.SparkSession
type SparkConf = org.apache.spark.SparkConf
type SparkEnv = org.apache.spark.SparkEnv

// Scala REPL types
type Settings = scala.tools.nsc.Settings
type GenericRunnerSettings = scala.tools.nsc.GenericRunnerSettings
type ILoop = scala.tools.nsc.interpreter.ILoop
type IMain = scala.tools.nsc.interpreter.IMain
type LoopCommand = scala.tools.nsc.interpreter.LoopCommand
type JPrintWriter = scala.tools.nsc.interpreter.JPrintWriter
type MemberHandler = scala.tools.nsc.interpreter.MemberHandler
type ImportHandler = scala.tools.nsc.interpreter.ImportHandler
type ExprTyper = scala.tools.nsc.interpreter.ExprTyper

// Scala compiler types
type Tree = scala.tools.nsc.ast.Trees#Tree
type Import = scala.tools.nsc.ast.Trees#Import
type Type = scala.tools.nsc.Global#Type
type IR = scala.tools.nsc.interpreter.IR

// Java I/O types
type BufferedReader = java.io.BufferedReader
type InputStream = java.io.InputStream
type ByteArrayOutputStream = java.io.ByteArrayOutputStream
type FilterInputStream = java.io.FilterInputStream
type File = java.io.File
type URI = java.net.URI
type URL = java.net.URL

// Class loading types
type ClassLoader = java.lang.ClassLoader
type ParentClassLoader = org.apache.spark.util.ParentClassLoader
type ClassVisitor = org.apache.xbean.asm6.ClassVisitor
type ClassWriter = org.apache.xbean.asm6.ClassWriter
type ClassReader = org.apache.xbean.asm6.ClassReader
type MethodVisitor = org.apache.xbean.asm6.MethodVisitor

// Hadoop FileSystem types
type FileSystem = org.apache.hadoop.fs.FileSystem
type Path = org.apache.hadoop.fs.Path
```