Interactive Scala Shell for Apache Spark distributed computing
npx @tessl/cli install tessl/maven-org-apache-spark--spark-repl_2-11@2.4.8
# Apache Spark REPL

The Apache Spark REPL is an interactive Scala shell designed for Apache Spark distributed computing. It provides a command-line interface for interactively executing Spark code, exploring datasets, and prototyping distributed data processing workflows in real time. The REPL extends the standard Scala interpreter with Spark-specific functionality, automatically providing access to the SparkContext and SparkSession objects (bound as `sc` and `spark` in the shell).

## Package Information

- **Package Name**: spark-repl_2.11
- **Package Type**: maven
- **Language**: Scala
- **Group ID**: org.apache.spark
- **Artifact ID**: spark-repl_2.11
- **Version**: 2.4.8
- **Installation**: Add the dependency to your Maven/SBT project, or use it as part of an Apache Spark distribution

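If you consume the artifact directly rather than through a Spark distribution, the coordinates above translate into a build definition along these lines (a sketch assuming sbt; the `%%` operator appends the `_2.11` suffix because of the pinned `scalaVersion`):

```scala
// build.sbt (sketch) - pull in the Scala 2.11 build of the Spark 2.4.8 REPL
scalaVersion := "2.11.12"

libraryDependencies += "org.apache.spark" %% "spark-repl" % "2.4.8"
```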
## Core Imports

```scala
import org.apache.spark.repl.Main
import org.apache.spark.repl.SparkILoop
import org.apache.spark.repl.ExecutorClassLoader
```

## Basic Usage

### Starting the REPL

```scala
// Command-line usage - launches the interactive shell
org.apache.spark.repl.Main.main(Array.empty)

// Programmatic usage with custom configuration
import org.apache.spark.repl.SparkILoop
import scala.tools.nsc.Settings

val settings = new Settings()
settings.usejavacp.value = true // let the embedded compiler see the JVM classpath
val interp = new SparkILoop()
interp.process(settings)
```

### Running Code in REPL

```scala
import org.apache.spark.repl.SparkILoop

// Run code and capture the printed output
val output = SparkILoop.run("val x = 1 + 1; println(x)")

// Run multiple lines of code
val lines = List(
  "val data = spark.range(1000)",
  "val squares = data.map(x => x * x)",
  "squares.count()"
)
val result = SparkILoop.run(lines)
```

## Architecture

The Spark REPL consists of several key components:

- **Main Entry Point**: the `Main` object provides the application entry point and SparkSession management
- **Interactive Loop**: `SparkILoop` extends Scala's standard REPL with Spark-specific initialization and features
- **Dynamic Class Loading**: `ExecutorClassLoader` enables distribution of REPL-generated classes to cluster executors
- **Signal Handling**: `Signaling` provides graceful job cancellation on interrupt signals
- **Interpreter Components**: Scala 2.11-specific components for advanced import handling and expression typing

## Capabilities

### Main REPL Application

Core REPL application functionality, including the application entry point, SparkSession management, and initialization.

```scala { .api }
object Main {
  def main(args: Array[String]): Unit
  def createSparkSession(): SparkSession
  private[repl] def doMain(args: Array[String], _interp: SparkILoop): Unit

  var sparkContext: SparkContext
  var sparkSession: SparkSession
  var interp: SparkILoop
  val conf: SparkConf
  val outputDir: File
}
```

[Main REPL API](./main-api.md)
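
In a binary Spark distribution this entry point is rarely invoked by hand: the bundled `spark-shell` script submits `org.apache.spark.repl.Main` through `spark-submit`, and `Main.createSparkSession()` then builds the session from the submitted configuration. A typical invocation (paths assume an unpacked Spark 2.4.x distribution; the options shown are illustrative):

```shell
# spark-shell wraps org.apache.spark.repl.Main via spark-submit
./bin/spark-shell --master "local[2]" --conf spark.ui.enabled=false
```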

### Interactive Shell Interface

Interactive shell implementation with Spark-specific features and command processing.

```scala { .api }
class SparkILoop(in0: Option[BufferedReader], out: JPrintWriter) extends ILoop {
  def this()
  def this(in0: BufferedReader, out: JPrintWriter)

  def initializeSpark(): Unit
  def printWelcome(): Unit
  def resetCommand(line: String): Unit
  def replay(): Unit
  def process(settings: Settings): Boolean
  def createInterpreter(): Unit

  val initializationCommands: Seq[String]
  val commands: List[LoopCommand]
}

object SparkILoop {
  def run(code: String, sets: Settings = new Settings): String
  def run(lines: List[String]): String
}
```

[Interactive Shell](./interactive-shell.md)

### Dynamic Class Loading

Class loading infrastructure for distributing REPL-generated classes to cluster executors.

```scala { .api }
class ExecutorClassLoader(
  conf: SparkConf,
  env: SparkEnv,
  classUri: String,
  parent: ClassLoader,
  userClassPathFirst: Boolean
) extends ClassLoader {

  def findClass(name: String): Class[_]
  def findClassLocally(name: String): Option[Class[_]]
  def readAndTransformClass(name: String, in: InputStream): Array[Byte]
  def urlEncode(str: String): String

  val uri: URI
  val directory: String
  val parentLoader: ParentClassLoader
}
```

[Class Loading](./class-loading.md)
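
The `userClassPathFirst` flag controls whether the loader consults the REPL's class URI before or after its parent. The child-first delegation pattern behind the `true` case can be sketched with plain JDK class loaders, with no Spark dependency; the names below are illustrative, not Spark's own:

```scala
import java.net.{URL, URLClassLoader}

// Sketch of child-first delegation (the userClassPathFirst = true case):
// consult our own URLs before asking the parent, reversing the JVM's
// default parent-first order.
class ChildFirstLoader(urls: Array[URL], parent: ClassLoader)
    extends URLClassLoader(urls, parent) {

  override def loadClass(name: String, resolve: Boolean): Class[_] = {
    val c = Option(findLoadedClass(name)).getOrElse {
      try findClass(name) // our own URLs first
      catch { case _: ClassNotFoundException => super.loadClass(name, resolve) }
    }
    if (resolve) resolveClass(c)
    c
  }
}

object ChildFirstDemo {
  def main(args: Array[String]): Unit = {
    // With no URLs of its own, every lookup falls through to the parent,
    // mirroring the parent fallback ExecutorClassLoader also performs.
    val loader = new ChildFirstLoader(Array.empty, getClass.getClassLoader)
    println(loader.loadClass("java.lang.String") eq classOf[String]) // true
  }
}
```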

### Signal Handling

Signal handling utilities for graceful job cancellation and interrupt management.

```scala { .api }
object Signaling {
  def cancelOnInterrupt(): Unit
}
```

### Scala 2.11 Interpreter Components

Scala 2.11-specific interpreter components that provide enhanced import handling and expression typing capabilities.

```scala { .api }
class SparkILoopInterpreter(settings: Settings, out: JPrintWriter) extends IMain {
  def chooseHandler(member: Tree): MemberHandler

  class SparkImportHandler(imp: Import) extends ImportHandler {
    def targetType: Type
  }
}

trait SparkExprTyper extends ExprTyper {
  def doInterpret(code: String): IR.Result
}
```

## Types

```scala { .api }
// Core Spark types used throughout the API
type SparkContext = org.apache.spark.SparkContext
type SparkSession = org.apache.spark.sql.SparkSession
type SparkConf = org.apache.spark.SparkConf
type SparkEnv = org.apache.spark.SparkEnv

// Scala REPL types
type Settings = scala.tools.nsc.Settings
type GenericRunnerSettings = scala.tools.nsc.GenericRunnerSettings
type ILoop = scala.tools.nsc.interpreter.ILoop
type IMain = scala.tools.nsc.interpreter.IMain
type LoopCommand = scala.tools.nsc.interpreter.LoopCommand
type JPrintWriter = scala.tools.nsc.interpreter.JPrintWriter
type MemberHandler = scala.tools.nsc.interpreter.MemberHandler
type ImportHandler = scala.tools.nsc.interpreter.ImportHandler
type ExprTyper = scala.tools.nsc.interpreter.ExprTyper

// Scala compiler types
type Tree = scala.tools.nsc.ast.Trees#Tree
type Import = scala.tools.nsc.ast.Trees#Import
type Type = scala.tools.nsc.Global#Type
type IR = scala.tools.nsc.interpreter.IR

// Java I/O types
type BufferedReader = java.io.BufferedReader
type InputStream = java.io.InputStream
type ByteArrayOutputStream = java.io.ByteArrayOutputStream
type FilterInputStream = java.io.FilterInputStream
type File = java.io.File
type URI = java.net.URI
type URL = java.net.URL

// Class loading types
type ClassLoader = java.lang.ClassLoader
type ParentClassLoader = org.apache.spark.util.ParentClassLoader
type ClassVisitor = org.apache.xbean.asm6.ClassVisitor
type ClassWriter = org.apache.xbean.asm6.ClassWriter
type ClassReader = org.apache.xbean.asm6.ClassReader
type MethodVisitor = org.apache.xbean.asm6.MethodVisitor

// Hadoop FileSystem types
type FileSystem = org.apache.hadoop.fs.FileSystem
type Path = org.apache.hadoop.fs.Path
```