# Apache Spark REPL

Apache Spark REPL is an interactive Scala shell that provides a command-line interface for Apache Spark. It allows users to interactively execute Spark code, explore data, run SQL queries, and perform distributed computations. The REPL extends the standard Scala interpreter with Spark-specific functionality, automatically creating a SparkSession and SparkContext and providing seamless access to Spark's core APIs, including RDDs, DataFrames, and Datasets.

## Package Information

- **Package Name**: spark-repl_2.11
- **Package Type**: maven
- **Language**: Scala
- **Installation**: `<dependency><groupId>org.apache.spark</groupId><artifactId>spark-repl_2.11</artifactId><version>2.4.8</version></dependency>`

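For sbt builds, the equivalent coordinate (assuming the same 2.4.8 / Scala 2.11 artifact) would be:

```scala
libraryDependencies += "org.apache.spark" % "spark-repl_2.11" % "2.4.8"
```
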
## Core Imports

```scala
import org.apache.spark.repl._
```

For the main entry point:
```scala
import org.apache.spark.repl.Main
```

For the interactive loop:
```scala
import org.apache.spark.repl.SparkILoop
```

For custom class loading:
```scala
import org.apache.spark.repl.ExecutorClassLoader
```

## Basic Usage

### Command Line Usage

```bash
# Start Spark REPL
spark-shell

# Or via main class
scala -cp <spark-classpath> org.apache.spark.repl.Main
```

### Programmatic Usage

```scala
import org.apache.spark.repl.SparkILoop
import scala.tools.nsc.Settings

// Execute code in a throwaway REPL and capture its transcript
val code = """
val data = sc.parallelize(1 to 10)
data.sum()
"""
val result = SparkILoop.run(code)

// Create a custom REPL instance
val settings = new Settings
settings.usejavacp.value = true  // let the interpreter see the JVM classpath
val repl = new SparkILoop()
repl.process(settings)
```

## Architecture

Apache Spark REPL is built around several key components:

- **Main Entry Point**: The `Main` object provides the application entry point and SparkSession/SparkContext creation
- **Interactive Shell**: `SparkILoop` extends Scala's standard REPL with Spark-specific initialization and commands
- **Distributed Class Loading**: `ExecutorClassLoader` enables loading of REPL-compiled classes on remote executors
- **Signal Handling**: Integration with Spark's job cancellation system for interactive interruption
- **Scala Version Support**: Special handling for Scala 2.11 compatibility issues with imports and type inference

## Capabilities

### REPL Entry Point and Session Management

Main application entry point and SparkSession/SparkContext lifecycle management for the interactive shell.

```scala { .api }
object Main extends Logging {
  var sparkContext: SparkContext
  var sparkSession: SparkSession
  var interp: SparkILoop
  val conf: SparkConf

  def main(args: Array[String]): Unit
  def createSparkSession(): SparkSession
}
```

[REPL Entry Point](./main-entry.md)

### Interactive Shell Loop

Core interactive shell functionality with Spark-specific initialization, commands, and REPL processing.

```scala { .api }
class SparkILoop(in0: Option[BufferedReader], out: JPrintWriter) extends ILoop {
  def this(in0: BufferedReader, out: JPrintWriter)
  def this()

  def initializeSpark(): Unit
  def process(settings: Settings): Boolean
  override def createInterpreter(): Unit
  override def printWelcome(): Unit
  override def commands: List[LoopCommand]
  override def resetCommand(line: String): Unit
  override def replay(): Unit
}

object SparkILoop {
  def run(code: String, sets: Settings = new Settings): String
  def run(lines: List[String]): String
}
```
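
As a quick illustration, `SparkILoop.run` evaluates lines in a fresh shell and returns the transcript as a string. This is a sketch, not a test fixture: it assumes the Spark REPL jars are on the classpath and will spin up a local SparkContext, with `sc` pre-bound by the REPL's Spark initialization.

```scala
import org.apache.spark.repl.SparkILoop

// Each line is interpreted in order; the returned string contains the
// inputs interleaved with their evaluated results.
val transcript: String = SparkILoop.run(List(
  "val rdd = sc.parallelize(1 to 100)",
  "rdd.filter(_ % 2 == 0).count()"
))
println(transcript)
```
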

[Interactive Shell](./interactive-shell.md)

### Distributed Class Loading

Custom class loader system for loading REPL-compiled classes on remote Spark executors with support for RPC and Hadoop filesystem access.

```scala { .api }
class ExecutorClassLoader(
  conf: SparkConf,
  env: SparkEnv,
  classUri: String,
  parent: ClassLoader,
  userClassPathFirst: Boolean
) extends ClassLoader with Logging {

  override def findClass(name: String): Class[_]
  def findClassLocally(name: String): Option[Class[_]]
  def readAndTransformClass(name: String, in: InputStream): Array[Byte]
  def urlEncode(str: String): String
  override def getResource(name: String): URL
  override def getResources(name: String): java.util.Enumeration[URL]
  override def getResourceAsStream(name: String): InputStream
}
```
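
The constructor parameters map onto how Spark wires the loader up on each executor. The sketch below is purely hypothetical: the URI and the `$line3.$read` class name are made-up illustrations, and in practice Spark instantiates this loader itself from the `spark.repl.class.uri` configuration rather than user code.

```scala
import org.apache.spark.{SparkConf, SparkEnv}
import org.apache.spark.repl.ExecutorClassLoader

// Hypothetical values for illustration only
val loader = new ExecutorClassLoader(
  new SparkConf(),
  SparkEnv.get,                                // executor-side environment
  "spark://driver-host:7078/classes",          // where REPL classes are served
  Thread.currentThread.getContextClassLoader,  // fallback parent loader
  false                                        // parent loader wins conflicts
)

// REPL-generated wrapper classes are fetched remotely on first use
val cls = loader.loadClass("$line3.$read")
```
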

[Distributed Class Loading](./class-loading.md)

### Signal Handling

Signal handling utilities for interactive job cancellation and REPL interrupt management.

```scala { .api }
object Signaling extends Logging {
  def cancelOnInterrupt(): Unit
}
```
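
In the shell this handler is installed once at startup; after that, Ctrl-C cancels running Spark jobs instead of killing the process. A minimal sketch, assuming a SparkContext is already running:

```scala
import org.apache.spark.repl.Signaling

// Install the interrupt handler: subsequent Ctrl-C presses cancel active
// jobs rather than terminating the REPL session.
Signaling.cancelOnInterrupt()
```
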

[Signal Handling](./signal-handling.md)

### Scala 2.11 Compatibility Components

Specialized interpreter and expression-typing components for Scala 2.11 compatibility fixes.

```scala { .api }
class SparkILoopInterpreter(settings: Settings, out: JPrintWriter) extends IMain {
  def symbolOfLine(code: String): global.Symbol
  def typeOfExpression(expr: String, silent: Boolean): global.Type
  def importsCode(wanted: Set[Name], wrapper: Request#Wrapper,
      definesClass: Boolean, generousImports: Boolean): ComputedImports
}

trait SparkExprTyper extends ExprTyper {
  def doInterpret(code: String): IR.Result
  def symbolOfLine(code: String): Symbol
}
```

[Scala 2.11 Compatibility](./scala-compatibility.md)

## Types

### Core Configuration Types

```scala { .api }
// From Spark Core
class SparkConf
class SparkContext
class SparkEnv

// From Spark SQL
class SparkSession

// From the Scala compiler and Java standard library
type Settings = scala.tools.nsc.Settings
type BufferedReader = java.io.BufferedReader
type JPrintWriter = scala.tools.nsc.interpreter.JPrintWriter
```

### REPL-Specific Types

```scala { .api }
// REPL interpreter result types
// (scala.tools.nsc.interpreter.Results, aliased as IR)
object IR {
  sealed abstract class Result
  case object Success extends Result
  case object Error extends Result
  case object Incomplete extends Result
}

// Class loading types
// ExecutorClassLoader extends the standard java.lang.ClassLoader
trait Logging // Spark's internal logging trait, mixed into REPL components
```
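
A REPL driver loop typically dispatches on the result of each interpreted line. The sketch below uses a self-contained stand-in ADT (the `Repl*` names are illustrative; the real type lives in `scala.tools.nsc.interpreter.Results`) to show the pattern:

```scala
// Stand-in ADT mirroring the interpreter result type, for illustration only
sealed abstract class ReplResult
case object ReplSuccess extends ReplResult
case object ReplError extends ReplResult
case object ReplIncomplete extends ReplResult

// A driver loop dispatches on the outcome of each interpreted line:
// keep reading on Incomplete, report on Error, continue on Success.
def describe(r: ReplResult): String = r match {
  case ReplSuccess    => "line evaluated"
  case ReplError      => "compilation or runtime error"
  case ReplIncomplete => "awaiting more input"
}
```

The `Incomplete` case is what lets the shell accept multi-line definitions: the loop buffers input and re-interprets until the line parses.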
```