# Apache Spark Launcher
Apache Spark Launcher provides a programmatic API for launching and monitoring Spark applications from Java code. It offers two launch modes: child-process execution with full monitoring capabilities, and in-process execution intended for cluster deploy mode. The library manages the Spark application lifecycle (configuration, execution, and state monitoring) and exposes control interfaces for running applications.
## Package Information

- **Package Name**: org.apache.spark:spark-launcher_2.11
- **Package Type**: Maven (Java)
- **Language**: Java
- **Installation**: Add to Maven dependencies:

```xml
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-launcher_2.11</artifactId>
    <version>2.4.8</version>
</dependency>
```

## Core Imports

```java
import org.apache.spark.launcher.SparkLauncher;
import org.apache.spark.launcher.InProcessLauncher;
import org.apache.spark.launcher.SparkAppHandle;
```

## Basic Usage

### Child Process Launch with Monitoring

```java
import org.apache.spark.launcher.SparkLauncher;
import org.apache.spark.launcher.SparkAppHandle;

// Configure and launch the Spark application as a child process
SparkAppHandle handle = new SparkLauncher()
    .setAppResource("/path/to/my-app.jar")
    .setMainClass("com.example.MySparkApp")
    .setMaster("local[*]")
    .setConf(SparkLauncher.DRIVER_MEMORY, "2g")
    .setConf(SparkLauncher.EXECUTOR_MEMORY, "1g")
    .setAppName("My Spark Application")
    .startApplication();

// Monitor application state
handle.addListener(new SparkAppHandle.Listener() {
    @Override
    public void stateChanged(SparkAppHandle handle) {
        System.out.println("State: " + handle.getState());
        if (handle.getState().isFinal()) {
            System.out.println("Application finished with ID: " + handle.getAppId());
        }
    }

    @Override
    public void infoChanged(SparkAppHandle handle) {
        System.out.println("Info updated for app: " + handle.getAppId());
    }
});

// Stop or kill the application if needed
if (handle.getState() == SparkAppHandle.State.RUNNING) {
    handle.stop();   // Graceful shutdown
    // handle.kill(); // Force kill if stop() is not honored
}
```

### Raw Process Launch

```java
import org.apache.spark.launcher.SparkLauncher;

// Launch as a raw process (no handle; manual process management required)
Process sparkProcess = new SparkLauncher()
    .setAppResource("/path/to/my-app.jar")
    .setMainClass("com.example.MySparkApp")
    .setMaster("yarn")
    .setDeployMode("cluster")
    .setConf(SparkLauncher.DRIVER_MEMORY, "4g")
    .launch();

// Manual process management (waitFor() throws InterruptedException)
int exitCode = sparkProcess.waitFor();
System.out.println("Spark application exited with code: " + exitCode);
```

### In-Process Launch (Cluster Mode)

```java
import org.apache.spark.launcher.InProcessLauncher;
import org.apache.spark.launcher.SparkAppHandle;

// Launch application in same JVM (cluster mode recommended)
SparkAppHandle handle = new InProcessLauncher()
    .setAppResource("/path/to/my-app.jar")
    .setMainClass("com.example.MySparkApp")
    .setMaster("yarn")
    .setDeployMode("cluster")
    .setConf("spark.sql.adaptive.enabled", "true")
    .startApplication();
```

## Architecture

The Spark Launcher library is built around several key components:

- **Launcher Classes**: `SparkLauncher` and `InProcessLauncher` provide fluent configuration APIs for different launch modes
- **Abstract Base**: `AbstractLauncher` provides common configuration methods shared by both launcher implementations
- **Handle Interface**: `SparkAppHandle` provides runtime application control and monitoring with state-based lifecycle management
- **State Management**: Comprehensive state tracking through `SparkAppHandle.State` enum with final state detection
- **Event System**: Listener-based callbacks for real-time application state and information updates
- **Configuration System**: Extensive configuration options through constants and fluent methods
- **Process Management**: Robust child process handling with output redirection and logging capabilities
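
The output-redirection capabilities mentioned above can be sketched as follows; the paths and class name are illustrative placeholders, not part of the library:

```java
import java.io.File;
import org.apache.spark.launcher.SparkLauncher;

public class OutputRedirectionExample {
    public static void main(String[] args) throws Exception {
        // Merge the child process's stderr into stdout, then redirect
        // stdout to a log file; all paths here are placeholders.
        Process spark = new SparkLauncher()
            .setAppResource("/path/to/my-app.jar")
            .setMainClass("com.example.MySparkApp")
            .setMaster("local[*]")
            .redirectError()                               // merge stderr into stdout
            .redirectOutput(new File("/tmp/spark-app.out")) // stdout -> file
            .launch();
        System.exit(spark.waitFor());
    }
}
```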

## Capabilities

### Application Launchers

Primary interfaces for launching Spark applications with comprehensive configuration options. Supports both child process and in-process execution modes.

```java { .api }
// Child process launcher with monitoring
public class SparkLauncher extends AbstractLauncher<SparkLauncher> {
    public SparkLauncher();
    public SparkLauncher(Map<String, String> env);
    public SparkAppHandle startApplication(SparkAppHandle.Listener... listeners);
    public Process launch();
}

// In-process launcher (cluster mode recommended)
public class InProcessLauncher extends AbstractLauncher<InProcessLauncher> {
    public SparkAppHandle startApplication(SparkAppHandle.Listener... listeners);
}
```

[Application Launchers](./launchers.md)

### Application Handles

Runtime control and monitoring interface for launched Spark applications. Provides state tracking, application control, and event notifications.

```java { .api }
public interface SparkAppHandle {
    void addListener(Listener l);
    State getState();
    String getAppId();
    void stop();
    void kill();
    void disconnect();

    enum State {
        UNKNOWN(false), CONNECTED(false), SUBMITTED(false), RUNNING(false),
        FINISHED(true), FAILED(true), KILLED(true), LOST(true);

        public boolean isFinal();
    }

    interface Listener {
        void stateChanged(SparkAppHandle handle);
        void infoChanged(SparkAppHandle handle);
    }
}
```
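
A common control pattern is to escalate from `stop()` to `kill()` when an application does not exit within a grace period. This is a minimal sketch, assuming a handle obtained from `startApplication()`; the fixed sleep is illustrative only:

```java
import java.util.concurrent.TimeUnit;
import org.apache.spark.launcher.SparkAppHandle;

public final class ShutdownHelper {
    private ShutdownHelper() {}

    // Ask the application to exit gracefully, then force-kill it if it
    // has not reached a final state within the grace period.
    public static void shutdown(SparkAppHandle handle, long graceSeconds)
            throws InterruptedException {
        handle.stop();                         // request a graceful exit
        TimeUnit.SECONDS.sleep(graceSeconds);  // simple illustrative wait
        if (!handle.getState().isFinal()) {
            handle.kill();                     // terminate forcibly
        }
    }
}
```

By contrast, `disconnect()` only detaches the handle from the launcher server and leaves the running application untouched.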

[Application Handles](./application-handles.md)

### Configuration Management

Comprehensive configuration system with predefined constants for common Spark settings and fluent configuration methods.

```java { .api }
public abstract class AbstractLauncher<T extends AbstractLauncher<T>> {
    public T setPropertiesFile(String path);
    public T setConf(String key, String value);
    public T setAppName(String appName);
    public T setMaster(String master);
    public T setDeployMode(String mode);
    public T setAppResource(String resource);
    public T setMainClass(String mainClass);
    public T addJar(String jar);
    public T addFile(String file);
    public T addPyFile(String file);
    public T addAppArgs(String... args);
    public T setVerbose(boolean verbose);
}

// Configuration constants in SparkLauncher
public static final String DRIVER_MEMORY = "spark.driver.memory";
public static final String EXECUTOR_MEMORY = "spark.executor.memory";
public static final String EXECUTOR_CORES = "spark.executor.cores";
// ... additional constants
```
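
The fluent methods and constants compose naturally. The following sketch is purely illustrative (the paths, class name, property file, and argument values are hypothetical):

```java
import org.apache.spark.launcher.SparkLauncher;

public class ConfigurationExample {
    public static void main(String[] args) throws Exception {
        // Combine a properties file, typed constants, extra jars, and
        // application arguments; every value below is a placeholder.
        SparkLauncher launcher = new SparkLauncher()
            .setAppResource("/path/to/etl-job.jar")
            .setMainClass("com.example.EtlJob")
            .setMaster("yarn")
            .setPropertiesFile("/etc/spark/job.properties")
            .setConf(SparkLauncher.EXECUTOR_CORES, "4")
            .setConf(SparkLauncher.EXECUTOR_MEMORY, "8g")
            .addJar("/path/to/dependency.jar")
            .addAppArgs("--date", "2021-01-01")
            .setVerbose(true);
        launcher.startApplication();
    }
}
```

As with `spark-submit`, values passed through `setConf()` are expected to take precedence over those loaded from the properties file.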

[Configuration Management](./configuration.md)

## Common Use Cases

### Batch Job Orchestration
Use `SparkLauncher` with monitoring to manage batch processing pipelines, track job completion, and handle failures gracefully.
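
One way to block an orchestration thread until a job completes is to bridge the listener callback with a latch. This is a minimal sketch, assuming a handle obtained from `startApplication()`:

```java
import java.util.concurrent.CountDownLatch;
import org.apache.spark.launcher.SparkAppHandle;

public final class BatchWait {
    private BatchWait() {}

    // Block until the application reaches a final state
    // (FINISHED, FAILED, KILLED, or LOST), then return it.
    public static SparkAppHandle.State awaitFinalState(SparkAppHandle handle)
            throws InterruptedException {
        CountDownLatch done = new CountDownLatch(1);
        handle.addListener(new SparkAppHandle.Listener() {
            @Override
            public void stateChanged(SparkAppHandle h) {
                if (h.getState().isFinal()) {
                    done.countDown();
                }
            }

            @Override
            public void infoChanged(SparkAppHandle h) { /* not needed here */ }
        });
        // Guard against the app finishing before the listener was registered
        if (!handle.getState().isFinal()) {
            done.await();
        }
        return handle.getState();
    }
}
```

A caller can then branch on the returned state, for example retrying the job when it comes back as `FAILED`.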
### Interactive Application Management
Leverage `SparkAppHandle` state notifications to build interactive dashboards that display real-time Spark application status.
### Cluster Resource Management
Deploy applications to YARN, Mesos, or Kubernetes clusters using cluster mode with proper resource allocation through configuration constants.
### Development and Testing
Use local mode execution for development and testing with simplified configuration and immediate feedback.

## Environment Requirements

- **Spark Installation**: Child process launches require the `SPARK_HOME` environment variable or an explicit `setSparkHome()` configuration
- **Java Runtime**: A custom `JAVA_HOME` can be set via the `setJavaHome()` method
- **Classpath**: In-process launches require Spark dependencies on the application classpath
- **Cluster Integration**: Supports YARN, Mesos, Kubernetes, and Standalone cluster managers
- **Platform Support**: Cross-platform, with Windows-specific command handling
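
When the environment variables are not set, the locations can be supplied programmatically. A sketch with illustrative installation paths:

```java
import org.apache.spark.launcher.SparkLauncher;

public class EnvironmentExample {
    public static void main(String[] args) throws Exception {
        // Point the launcher at explicit Spark and Java installations
        // instead of relying on SPARK_HOME/JAVA_HOME; paths are placeholders.
        Process spark = new SparkLauncher()
            .setSparkHome("/opt/spark-2.4.8")
            .setJavaHome("/usr/lib/jvm/java-8-openjdk")
            .setAppResource("/path/to/my-app.jar")
            .setMainClass("com.example.MySparkApp")
            .setMaster("local[*]")
            .launch();
        spark.waitFor();
    }
}
```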

## Error Handling

The library provides multiple layers of error handling:
- **Configuration Validation**: Parameter validation with descriptive error messages
- **Launch Failures**: `IOException` handling for process creation failures
- **Runtime Monitoring**: State-based error detection through `SparkAppHandle.State.FAILED`
- **Connection Issues**: Timeout handling for launcher server communication
- **Process Management**: Robust child process lifecycle management with cleanup
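
These layers can be combined in calling code. A minimal sketch, with placeholder paths and class names:

```java
import java.io.IOException;
import org.apache.spark.launcher.SparkAppHandle;
import org.apache.spark.launcher.SparkLauncher;

public class ErrorHandlingExample {
    public static void main(String[] args) {
        try {
            // Launch failures surface as IOException; runtime failures
            // surface as the FAILED state on the handle.
            new SparkLauncher()
                .setAppResource("/path/to/my-app.jar")
                .setMainClass("com.example.MySparkApp")
                .setMaster("local[*]")
                .startApplication(new SparkAppHandle.Listener() {
                    @Override
                    public void stateChanged(SparkAppHandle h) {
                        if (h.getState() == SparkAppHandle.State.FAILED) {
                            System.err.println("Application failed: " + h.getAppId());
                        }
                    }

                    @Override
                    public void infoChanged(SparkAppHandle h) {}
                });
        } catch (IOException e) {
            // e.g. missing SPARK_HOME or the child process could not start
            System.err.println("Failed to launch Spark: " + e.getMessage());
        }
    }
}
```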