# Apache Spark REPL

The Apache Spark REPL (Read-Eval-Print Loop) provides an interactive Scala shell built specifically for Apache Spark. It lets users execute Spark operations, explore datasets, and prototype distributed computing solutions interactively, with automatic SparkContext initialization and seamless integration with Spark's core APIs.

## Package Information

- **Package Name**: spark-repl_2.10
- **Package Type**: maven
- **Language**: Scala (with Java interoperability)
- **Installation**: Add the dependency to your Maven `pom.xml`:

```xml
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-repl_2.10</artifactId>
  <version>1.6.3</version>
</dependency>
```

For SBT:

```scala
libraryDependencies += "org.apache.spark" %% "spark-repl" % "1.6.3"
```

## Core Imports

```scala
import org.apache.spark.repl.Main
import org.apache.spark.repl.SparkILoop
import org.apache.spark.repl.SparkIMain
```

For advanced usage:

```scala
import org.apache.spark.repl.{SparkCommandLine, ExecutorClassLoader}
import org.apache.spark.repl.SparkJLineCompletion
```

## Basic Usage

### Starting the REPL

```scala
import org.apache.spark.repl.Main

// Start the interactive REPL
Main.main(Array.empty)

// Access the current interpreter instance
val interpreter = Main.interp
```

### Programmatic Code Execution

```scala
import org.apache.spark.repl.SparkIMain
import scala.tools.nsc.interpreter.Results

// Create and initialize the interpreter
val interpreter = new SparkIMain()
interpreter.initializeSynchronous()

// Execute Scala code and inspect the result
val result = interpreter.interpret("val x = 42")
result match {
  case Results.Success    => println("Code executed successfully")
  case Results.Error      => println("Execution failed")
  case Results.Incomplete => println("Code incomplete")
}

// Bind a value from the host program into the REPL session
interpreter.bind("myValue", "String", "Hello World")

// Add imports to the session
interpreter.addImports("scala.collection.mutable._")
```

### Custom REPL Loop

```scala
import org.apache.spark.repl.SparkILoop
import java.io.{BufferedReader, InputStreamReader, PrintWriter}

// Create a custom REPL wired to standard input/output
val in = new BufferedReader(new InputStreamReader(System.in))
val out = new PrintWriter(System.out, true)
val repl = new SparkILoop(in, out)

// Run the loop, preloading an init script via the -i option
repl.process(Array("-i", "init.scala"))
```

## Architecture

The Spark REPL is built around several key components:

- **Main Entry Point**: the `Main` object provides the primary application entry point and manages the global interpreter instance
- **Interactive Loop**: `SparkILoop` handles user interaction, command processing, and session management
- **Code Interpreter**: `SparkIMain` compiles and executes Scala code with Spark integration
- **Distributed Class Loading**: `ExecutorClassLoader` enables loading of REPL-defined classes across Spark clusters
- **Command Line Processing**: `SparkCommandLine` handles Spark-specific command line options and settings
- **Auto-completion**: `SparkJLineCompletion` provides context-aware tab completion for Scala code
## Capabilities

### Interactive Shell Management

Core REPL loop functionality for interactive Scala development with Spark integration. Provides command processing, prompt customization, and session management.

```scala { .api }
class SparkILoop(
  in0: Option[BufferedReader],
  out: JPrintWriter,
  master: Option[String]
)

def process(args: Array[String]): Boolean
def setPrompt(prompt: String): Unit
def prompt: String
def commands: List[LoopCommand]
```

[Interactive Shell](./interactive-shell.md)
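
As a rough sketch of how these pieces fit together (the auxiliary constructor taking a master URL string, the `"local[2]"` master, and the `-usejavacp` flag are illustrative assumptions, not documented guarantees):

```scala
import java.io.{BufferedReader, InputStreamReader, PrintWriter}
import org.apache.spark.repl.SparkILoop

// Sketch only: constructor overloads may vary between Spark versions.
val in = new BufferedReader(new InputStreamReader(System.in))
val out = new PrintWriter(System.out, true)
val repl = new SparkILoop(in, out, "local[2]") // auxiliary constructor taking a master URL

repl.setPrompt("spark> ")                      // customize the prompt (per the API above)
val exitedCleanly = repl.process(Array("-usejavacp"))
```

`process` blocks until the session ends and returns whether the loop exited cleanly, so it is usually the last call in an embedding application.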

### Code Interpretation and Execution

Scala code compilation and execution engine with Spark context integration. Handles code parsing, compilation, binding, and result evaluation.

```scala { .api }
class SparkIMain(
  initialSettings: Settings,
  out: JPrintWriter,
  propagateExceptions: Boolean = false
)

def interpret(line: String): Results.Result
def bind(name: String, boundType: String, value: Any, modifiers: List[String] = Nil): Results.Result
def addImports(ids: String*): Results.Result
def compileString(code: String): Boolean
```

[Code Interpretation](./code-interpretation.md)
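
`compileString` compiles code without evaluating it, which can be useful for validating user input up front. A minimal sketch, assuming a default-initialized interpreter (the `Probe` object and `seed` binding are illustrative names):

```scala
import org.apache.spark.repl.SparkIMain
import scala.tools.nsc.interpreter.Results

val intp = new SparkIMain()
intp.initializeSynchronous()

// Compile only: returns true if the code type-checks, without running it
val compiles = intp.compileString("object Probe { def twice(n: Int) = n * 2 }")

// Bind a host value, then reference it from interpreted code
intp.bind("seed", "Int", 21)
val outcome = intp.interpret("val answer = seed * 2")
if (outcome == Results.Success) println("answer is now defined in the session")
```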

### Distributed Class Loading

ClassLoader implementation for loading REPL-defined classes from Hadoop FileSystem or HTTP URIs, enabling distributed execution of user-defined code across Spark clusters.

```scala { .api }
class ExecutorClassLoader(
  conf: SparkConf,
  classUri: String,
  parent: ClassLoader,
  userClassPathFirst: Boolean
)

def findClass(name: String): Class[_]
def findClassLocally(name: String): Option[Class[_]]
```

[Distributed Class Loading](./class-loading.md)
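
On an executor, the class URI normally comes from configuration published by the driver. The sketch below shows the wiring; the `"spark.repl.class.uri"` property name, the fallback URI, and the REPL-generated class name are all assumptions for illustration:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.repl.ExecutorClassLoader

// Sketch: the property name and placeholder URI below are assumptions
val conf = new SparkConf()
val classUri = conf.get("spark.repl.class.uri", "http://driver-host:50123")

val loader = new ExecutorClassLoader(conf, classUri, getClass.getClassLoader, userClassPathFirst = false)

// REPL-generated classes use $lineN wrapper names; this one is hypothetical
val cls = loader.findClass("$line3.$read")
```

Setting `userClassPathFirst` to `true` would make REPL-defined classes shadow same-named classes on the parent classpath.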

### Command Line Configuration

Command line option handling and settings management for Spark-specific REPL configurations and compiler settings.

```scala { .api }
class SparkCommandLine(
  args: List[String],
  override val settings: Settings
)

val settings: Settings
```

[Command Line Configuration](./command-line.md)
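
A short sketch of parsing scalac-style options into a `Settings` instance, assuming a single-argument auxiliary constructor is available (the flags shown are standard scalac options, used purely for illustration):

```scala
import org.apache.spark.repl.SparkCommandLine

// Parse compiler-style arguments into Settings for the REPL
val cmdLine = new SparkCommandLine(List("-deprecation", "-feature"))
val settings = cmdLine.settings

// The resulting Settings can then be used to construct a SparkIMain
```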

### Auto-completion System

Intelligent tab completion system for Scala code within the REPL environment, providing context-aware suggestions for methods, variables, and types.

```scala { .api }
class SparkJLineCompletion(val intp: SparkIMain)

def completer(): ScalaCompleter
var verbosity: Int
def resetVerbosity(): Unit
```

[Auto-completion](./auto-completion.md)
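
Completion candidates can also be queried programmatically through the `ScalaCompleter` returned by `completer()`. The buffer/cursor calling convention below follows the Scala 2.10 interpreter API and should be treated as a sketch:

```scala
import org.apache.spark.repl.{SparkIMain, SparkJLineCompletion}

val intp = new SparkIMain()
intp.initializeSynchronous()

val completion = new SparkJLineCompletion(intp)
val completer = completion.completer()

// Ask for candidates at the end of the partial expression "List."
val candidates = completer.complete("List.", 5)
println(candidates.candidates.take(5))
```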