Interactive Scala Shell for Apache Spark distributed computing
npx @tessl/cli install tessl/maven-org-apache-spark--spark-repl_2-11@2.4.8
# Apache Spark REPL

The Apache Spark REPL is an interactive Scala shell designed for Apache Spark distributed computing. It provides a command-line interface for interactively executing Spark code, exploring datasets, and prototyping distributed data processing workflows in real time. The REPL extends the standard Scala interpreter with Spark-specific functionality, automatically providing access to the SparkContext and SparkSession objects (bound as `sc` and `spark` in the shell).

## Package Information

- **Package Name**: spark-repl_2.11
- **Package Type**: maven
- **Language**: Scala
- **Group ID**: org.apache.spark
- **Artifact ID**: spark-repl_2.11
- **Version**: 2.4.8
- **Installation**: Add the dependency to your Maven/SBT project, or use it as part of an Apache Spark distribution

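If you consume the artifact directly rather than through a Spark distribution, the coordinates above translate into a build definition along these lines (a sketch assuming sbt; the `%%` operator appends the `_2.11` suffix because of the pinned `scalaVersion`):

```scala
// build.sbt (sketch) - pull in the Scala 2.11 build of the Spark 2.4.8 REPL
scalaVersion := "2.11.12"

libraryDependencies += "org.apache.spark" %% "spark-repl" % "2.4.8"
```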
## Core Imports

```scala
import org.apache.spark.repl.Main
import org.apache.spark.repl.SparkILoop
import org.apache.spark.repl.ExecutorClassLoader
```

## Basic Usage

### Starting the REPL

```scala
// Command-line usage - launches the interactive shell
org.apache.spark.repl.Main.main(Array.empty)

// Programmatic usage with custom configuration
import org.apache.spark.repl.SparkILoop
import scala.tools.nsc.Settings

val settings = new Settings()
settings.usejavacp.value = true // let the embedded compiler see the JVM classpath
val interp = new SparkILoop()
interp.process(settings)
```

### Running Code in REPL

```scala
import org.apache.spark.repl.SparkILoop

// Run code and capture the printed output
val output = SparkILoop.run("val x = 1 + 1; println(x)")

// Run multiple lines of code
val lines = List(
  "val data = spark.range(1000)",
  "val squares = data.map(x => x * x)",
  "squares.count()"
)
val result = SparkILoop.run(lines)
```

## Architecture

The Spark REPL consists of several key components:

- **Main Entry Point**: the `Main` object provides the application entry point and SparkSession management
- **Interactive Loop**: `SparkILoop` extends Scala's standard REPL with Spark-specific initialization and features
- **Dynamic Class Loading**: `ExecutorClassLoader` enables distribution of REPL-generated classes to cluster executors
- **Signal Handling**: `Signaling` provides graceful job cancellation on interrupt signals
- **Interpreter Components**: Scala 2.11-specific components for advanced import handling and expression typing

## Capabilities

### Main REPL Application

Core REPL application functionality, including the application entry point, SparkSession management, and initialization.

```scala { .api }
object Main {
  def main(args: Array[String]): Unit
  def createSparkSession(): SparkSession
  private[repl] def doMain(args: Array[String], _interp: SparkILoop): Unit

  var sparkContext: SparkContext
  var sparkSession: SparkSession
  var interp: SparkILoop
  val conf: SparkConf
  val outputDir: File
}
```

[Main REPL API](./main-api.md)
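
In a binary Spark distribution this entry point is rarely invoked by hand: the bundled `spark-shell` script submits `org.apache.spark.repl.Main` through `spark-submit`, and `Main.createSparkSession()` then builds the session from the submitted configuration. A typical invocation (paths assume an unpacked Spark 2.4.x distribution; the options shown are illustrative):

```shell
# spark-shell wraps org.apache.spark.repl.Main via spark-submit
./bin/spark-shell --master "local[2]" --conf spark.ui.enabled=false
```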

### Interactive Shell Interface

Interactive shell implementation with Spark-specific features and command processing.

```scala { .api }
class SparkILoop(in0: Option[BufferedReader], out: JPrintWriter) extends ILoop {
  def this()
  def this(in0: BufferedReader, out: JPrintWriter)

  def initializeSpark(): Unit
  def printWelcome(): Unit
  def resetCommand(line: String): Unit
  def replay(): Unit
  def process(settings: Settings): Boolean
  def createInterpreter(): Unit

  val initializationCommands: Seq[String]
  val commands: List[LoopCommand]
}

object SparkILoop {
  def run(code: String, sets: Settings = new Settings): String
  def run(lines: List[String]): String
}
```

[Interactive Shell](./interactive-shell.md)

### Dynamic Class Loading

Class loading infrastructure for distributing REPL-generated classes to cluster executors.

```scala { .api }
class ExecutorClassLoader(
  conf: SparkConf,
  env: SparkEnv,
  classUri: String,
  parent: ClassLoader,
  userClassPathFirst: Boolean
) extends ClassLoader {

  def findClass(name: String): Class[_]
  def findClassLocally(name: String): Option[Class[_]]
  def readAndTransformClass(name: String, in: InputStream): Array[Byte]
  def urlEncode(str: String): String

  val uri: URI
  val directory: String
  val parentLoader: ParentClassLoader
}
```

[Class Loading](./class-loading.md)
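
The `userClassPathFirst` flag controls whether the loader consults the REPL's class URI before or after its parent. The child-first delegation pattern behind the `true` case can be sketched with plain JDK class loaders, with no Spark dependency; the names below are illustrative, not Spark's own:

```scala
import java.net.{URL, URLClassLoader}

// Sketch of child-first delegation (the userClassPathFirst = true case):
// consult our own URLs before asking the parent, reversing the JVM's
// default parent-first order.
class ChildFirstLoader(urls: Array[URL], parent: ClassLoader)
    extends URLClassLoader(urls, parent) {

  override def loadClass(name: String, resolve: Boolean): Class[_] = {
    val c = Option(findLoadedClass(name)).getOrElse {
      try findClass(name) // our own URLs first
      catch { case _: ClassNotFoundException => super.loadClass(name, resolve) }
    }
    if (resolve) resolveClass(c)
    c
  }
}

object ChildFirstDemo {
  def main(args: Array[String]): Unit = {
    // With no URLs of its own, every lookup falls through to the parent,
    // mirroring the parent fallback ExecutorClassLoader also performs.
    val loader = new ChildFirstLoader(Array.empty, getClass.getClassLoader)
    println(loader.loadClass("java.lang.String") eq classOf[String]) // true
  }
}
```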

### Signal Handling

Signal handling utilities for graceful job cancellation and interrupt management.

```scala { .api }
object Signaling {
  def cancelOnInterrupt(): Unit
}
```

### Scala 2.11 Interpreter Components

Scala 2.11-specific interpreter components that provide enhanced import handling and expression typing capabilities.

```scala { .api }
class SparkILoopInterpreter(settings: Settings, out: JPrintWriter) extends IMain {
  def chooseHandler(member: Tree): MemberHandler

  class SparkImportHandler(imp: Import) extends ImportHandler {
    def targetType: Type
  }
}

trait SparkExprTyper extends ExprTyper {
  def doInterpret(code: String): IR.Result
}
```

## Types

```scala { .api }
// Core Spark types used throughout the API
type SparkContext = org.apache.spark.SparkContext
type SparkSession = org.apache.spark.sql.SparkSession
type SparkConf = org.apache.spark.SparkConf
type SparkEnv = org.apache.spark.SparkEnv

// Scala REPL types
type Settings = scala.tools.nsc.Settings
type GenericRunnerSettings = scala.tools.nsc.GenericRunnerSettings
type ILoop = scala.tools.nsc.interpreter.ILoop
type IMain = scala.tools.nsc.interpreter.IMain
type LoopCommand = scala.tools.nsc.interpreter.LoopCommand
type JPrintWriter = scala.tools.nsc.interpreter.JPrintWriter
type MemberHandler = scala.tools.nsc.interpreter.MemberHandler
type ImportHandler = scala.tools.nsc.interpreter.ImportHandler
type ExprTyper = scala.tools.nsc.interpreter.ExprTyper

// Scala compiler types
type Tree = scala.tools.nsc.ast.Trees#Tree
type Import = scala.tools.nsc.ast.Trees#Import
type Type = scala.tools.nsc.Global#Type
type IR = scala.tools.nsc.interpreter.IR

// Java I/O types
type BufferedReader = java.io.BufferedReader
type InputStream = java.io.InputStream
type ByteArrayOutputStream = java.io.ByteArrayOutputStream
type FilterInputStream = java.io.FilterInputStream
type File = java.io.File
type URI = java.net.URI
type URL = java.net.URL

// Class loading types
type ClassLoader = java.lang.ClassLoader
type ParentClassLoader = org.apache.spark.util.ParentClassLoader
type ClassVisitor = org.apache.xbean.asm6.ClassVisitor
type ClassWriter = org.apache.xbean.asm6.ClassWriter
type ClassReader = org.apache.xbean.asm6.ClassReader
type MethodVisitor = org.apache.xbean.asm6.MethodVisitor

// Hadoop FileSystem types
type FileSystem = org.apache.hadoop.fs.FileSystem
type Path = org.apache.hadoop.fs.Path
```