# Main REPL API

Core REPL application functionality, including entry points, SparkSession management, and signal handling.

## Capabilities

### Main Application Entry Point

The `Main` object serves as the primary entry point for the Spark REPL application and manages the global SparkSession instance.

```scala { .api }
/**
 * Main entry point for the Spark REPL application
 * @param args Command line arguments for REPL configuration
 */
def main(args: Array[String]): Unit

/**
 * Creates and configures a SparkSession for the REPL with appropriate defaults
 * @return Configured SparkSession instance with Hive support if available
 */
def createSparkSession(): SparkSession

/**
 * Internal main method used for testing and custom REPL configurations.
 * Package-private for testing purposes.
 * @param args Command line arguments
 * @param _interp Custom SparkILoop interpreter instance
 */
private[repl] def doMain(args: Array[String], _interp: SparkILoop): Unit
```

**Usage Examples:**

```scala
import org.apache.spark.repl.Main

// Start an interactive REPL from the command line
// (Array.empty needs the explicit [String] because Scala arrays are invariant)
Main.main(Array.empty[String])

// Start the REPL with custom arguments
Main.main(Array("-classpath", "/path/to/jars"))

// Create a SparkSession programmatically
val spark = Main.createSparkSession()
println(s"Spark version: ${spark.version}")
```

### Global State Management

The `Main` object maintains global state for the REPL session, including the SparkContext, SparkSession, and interpreter instances.

```scala { .api }
/**
 * Current SparkContext instance, created by createSparkSession().
 * This is a mutable variable so that it can be reset for testing.
 */
var sparkContext: SparkContext

/**
 * Current SparkSession instance, created by createSparkSession().
 * This is a mutable variable so that it can be reset for testing.
 */
var sparkSession: SparkSession

/**
 * Current SparkILoop interpreter instance.
 * This is a public variable because tests need to reset it.
 */
var interp: SparkILoop

/**
 * Spark configuration instance used for creating the SparkContext.
 * Initialized with default REPL-specific settings.
 */
val conf: SparkConf

/**
 * Output directory for REPL-generated class files.
 * Created as a temporary directory under spark.repl.classdir or the local dir.
 */
val outputDir: File
```

**Usage Examples:**

```scala
import org.apache.spark.repl.Main

// Access the current SparkContext
if (Main.sparkContext != null) {
  println(s"Master: ${Main.sparkContext.master}")
  println(s"App ID: ${Main.sparkContext.applicationId}")
}

// Access the current SparkSession
if (Main.sparkSession != null) {
  val df = Main.sparkSession.range(10)
  df.show()
}

// Check REPL configuration
println(s"Output directory: ${Main.outputDir.getAbsolutePath}")
val appName = Main.conf.get("spark.app.name", "Unknown")
println(s"App name: $appName")
```

### Configuration and Initialization

The `Main` object handles REPL-specific configuration, including class output directories, executor URIs, and Spark home detection.

**Configuration Behavior:**

- Sets `spark.app.name` to "Spark shell" if not specified
- Configures `spark.repl.class.outputDir` for class distribution
- Detects and sets `SPARK_HOME` from environment variables
- Handles `SPARK_EXECUTOR_URI` for custom executor configurations
- Enables Hive support automatically if the Hive classes are present
- Falls back to the in-memory catalog if Hive is not available
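
The defaulting behavior above can be illustrated with a small sketch. This is a simplified model, not the actual Spark implementation: the helper name `applyReplDefaults` and the use of a plain mutable `Map` in place of a real `SparkConf` are assumptions for illustration; only the configuration keys come from the list above.

```scala
import scala.collection.mutable

// Sketch: apply REPL-style configuration defaults to a Map standing in
// for a SparkConf. User-supplied values take precedence over defaults.
def applyReplDefaults(
    conf: mutable.Map[String, String],
    env: Map[String, String],
    outputDirPath: String): mutable.Map[String, String] = {
  // Set the app name only if the user has not specified one
  if (!conf.contains("spark.app.name")) {
    conf("spark.app.name") = "Spark shell"
  }
  // Point executors at the directory holding REPL-compiled classes
  conf("spark.repl.class.outputDir") = outputDirPath
  // Propagate an executor URI from the environment, if present
  env.get("SPARK_EXECUTOR_URI").foreach { uri =>
    conf("spark.executor.uri") = uri
  }
  conf
}

val conf = applyReplDefaults(
  mutable.Map.empty[String, String],
  env = Map("SPARK_EXECUTOR_URI" -> "hdfs:///spark/spark.tgz"),
  outputDirPath = "/tmp/repl-classes")
println(conf("spark.app.name"))  // Spark shell
```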

**Class Output Directory:**

The REPL creates a temporary directory for compiled classes that need to be distributed to executors:

```scala
// Directory creation logic
val rootDir = conf.getOption("spark.repl.classdir").getOrElse(Utils.getLocalDir(conf))
val outputDir = Utils.createTempDir(root = rootDir, namePrefix = "repl")
```
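
The same pattern can be reproduced with just the JDK. A minimal sketch, assuming nothing Spark-specific: `Utils.createTempDir` is internal to Spark, so `java.nio.file` stands in for it here, and a system property stands in for the config lookup.

```scala
import java.nio.file.{Files, Paths}

// Sketch of the directory-creation pattern using the JDK instead of Spark's
// internal Utils: choose a root (configured value or the system temp dir),
// then create a uniquely named, "repl"-prefixed directory beneath it.
val rootDir = sys.props.getOrElse("spark.repl.classdir", System.getProperty("java.io.tmpdir"))
val outputDir = Files.createTempDirectory(Paths.get(rootDir), "repl").toFile
outputDir.deleteOnExit()
```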

### Signal Handling

Graceful interrupt handling for canceling running Spark jobs.

```scala { .api }
object Signaling {
  /**
   * Registers a SIGINT handler that cancels all active Spark jobs.
   * Allows users to interrupt long-running operations with Ctrl+C.
   * If no jobs are running, the signal terminates the REPL.
   */
  def cancelOnInterrupt(): Unit
}
```

**Usage Examples:**

```scala
import org.apache.spark.repl.Signaling

// Enable interrupt handling (called automatically by Main)
Signaling.cancelOnInterrupt()

// After this, users can press Ctrl+C to:
// 1. Cancel running Spark jobs if any are active
// 2. Exit the REPL if no jobs are running
```

**Signal Handling Behavior:**

1. When Ctrl+C is pressed and Spark jobs are running:
   - Displays the warning: "Cancelling all active jobs, this can take a while. Press Ctrl+C again to exit now."
   - Calls `SparkContext.cancelAllJobs()`
   - Returns control to the REPL prompt

2. When Ctrl+C is pressed and no jobs are running:
   - Terminates the REPL session immediately
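
This two-branch behavior can be sketched as follows, with the decision logic factored out so it can be exercised without a real SparkContext. This is an illustrative sketch, not Spark's actual code: `hasActiveJobs` and `cancelAllJobs` are stand-ins for the SparkContext calls, and `sun.misc.Signal` is a JDK-internal API.

```scala
import sun.misc.{Signal, SignalHandler}

// Decision logic, separated from signal plumbing:
// cancel active jobs if any exist, otherwise exit.
def onInterrupt(hasActiveJobs: Boolean,
                cancelAllJobs: () => Unit,
                exit: () => Unit): Unit =
  if (hasActiveJobs) {
    Console.err.println(
      "Cancelling all active jobs, this can take a while. " +
        "Press Ctrl+C again to exit now.")
    cancelAllJobs()
  } else {
    exit()
  }

// Wire the decision to SIGINT, roughly what cancelOnInterrupt does
// against the real SparkContext job-cancellation API.
def installInterruptHandler(hasActiveJobs: () => Boolean,
                            cancelAllJobs: () => Unit): Unit =
  Signal.handle(new Signal("INT"), new SignalHandler {
    def handle(sig: Signal): Unit =
      onInterrupt(hasActiveJobs(), cancelAllJobs, () => System.exit(0))
  })
```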

## Error Handling

The `Main` object includes error handling for common REPL initialization scenarios:

- **Scala Option Errors**: Command-line argument parsing errors are written to stderr
- **SparkSession Creation Failures**: In shell sessions, initialization errors cause `sys.exit(1)`
- **Non-shell Sessions**: Exceptions are propagated to the caller for custom handling
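
The shell vs. non-shell split can be sketched as follows. The helper name `initializeSparkSession` and the `isShellSession` flag are illustrative assumptions, not the exact Spark code.

```scala
// Hedged sketch of the error-handling split described above: shell sessions
// report the failure and exit hard; non-shell callers get the exception back
// (the unguarded case simply does not catch it, so it propagates).
def initializeSparkSession[T](isShellSession: Boolean)(create: () => T): T =
  try {
    create()
  } catch {
    case e: Exception if isShellSession =>
      Console.err.println(s"Failed to initialize Spark session: ${e.getMessage}")
      sys.exit(1)
  }
```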

## Thread Safety Notes

- Global variables (`sparkContext`, `sparkSession`, `interp`) are not thread-safe
- These variables are designed for single-threaded REPL usage
- Tests may reset these variables between test cases
- The `conf` and `outputDir` values are immutable after initialization