# Apache Spark Launcher

Apache Spark Launcher provides a programmatic API for launching and monitoring Spark applications from Java. It offers two launch modes: child-process execution with full monitoring capabilities, and in-process execution for cluster deployments. The library manages the Spark application lifecycle (configuration, execution, and state monitoring) and exposes control interfaces for running applications.

## Package Information

- **Package Name**: org.apache.spark:spark-launcher_2.11
- **Package Type**: Maven (Java)
- **Language**: Java
- **Installation**: Add to Maven dependencies:

```xml
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-launcher_2.11</artifactId>
  <version>2.4.8</version>
</dependency>
```

## Core Imports

```java
import org.apache.spark.launcher.SparkLauncher;
import org.apache.spark.launcher.InProcessLauncher;
import org.apache.spark.launcher.SparkAppHandle;
```

## Basic Usage

### Child Process Launch with Monitoring

```java
import org.apache.spark.launcher.SparkLauncher;
import org.apache.spark.launcher.SparkAppHandle;

// Configure and launch the Spark application as a child process
SparkAppHandle handle = new SparkLauncher()
    .setAppResource("/path/to/my-app.jar")
    .setMainClass("com.example.MySparkApp")
    .setMaster("local[*]")
    .setConf(SparkLauncher.DRIVER_MEMORY, "2g")
    .setConf(SparkLauncher.EXECUTOR_MEMORY, "1g")
    .setAppName("My Spark Application")
    .startApplication();

// Monitor application state
handle.addListener(new SparkAppHandle.Listener() {
    @Override
    public void stateChanged(SparkAppHandle handle) {
        System.out.println("State: " + handle.getState());
        if (handle.getState().isFinal()) {
            System.out.println("Application finished with ID: " + handle.getAppId());
        }
    }

    @Override
    public void infoChanged(SparkAppHandle handle) {
        System.out.println("Info updated for app: " + handle.getAppId());
    }
});

// Stop or kill the application if it is still running
if (handle.getState() == SparkAppHandle.State.RUNNING) {
    handle.stop(); // Graceful shutdown
    // handle.kill(); // Force kill if needed
}
```

### Raw Process Launch

```java
import org.apache.spark.launcher.SparkLauncher;

// Launch as a raw process (manual management required)
Process sparkProcess = new SparkLauncher()
    .setAppResource("/path/to/my-app.jar")
    .setMainClass("com.example.MySparkApp")
    .setMaster("yarn")
    .setDeployMode("cluster")
    .setConf(SparkLauncher.DRIVER_MEMORY, "4g")
    .launch();

// Manual process management
int exitCode = sparkProcess.waitFor();
System.out.println("Spark application exited with code: " + exitCode);
```

### In-Process Launch (Cluster Mode)

```java
import org.apache.spark.launcher.InProcessLauncher;
import org.apache.spark.launcher.SparkAppHandle;

// Launch the application in the same JVM (cluster deploy mode recommended)
SparkAppHandle handle = new InProcessLauncher()
    .setAppResource("/path/to/my-app.jar")
    .setMainClass("com.example.MySparkApp")
    .setMaster("yarn")
    .setDeployMode("cluster")
    .setConf("spark.sql.adaptive.enabled", "true")
    .startApplication();
```

## Architecture

The Spark Launcher library is built around several key components:

- **Launcher Classes**: `SparkLauncher` and `InProcessLauncher` provide fluent configuration APIs for the two launch modes
- **Abstract Base**: `AbstractLauncher` provides the common configuration methods shared by both launcher implementations
- **Handle Interface**: `SparkAppHandle` provides runtime application control and monitoring with state-based lifecycle management
- **State Management**: State tracking through the `SparkAppHandle.State` enum, with final-state detection via `isFinal()`
- **Event System**: Listener-based callbacks for application state and information updates
- **Configuration System**: Configuration through predefined constants and fluent methods
- **Process Management**: Child process handling with output redirection and logging

## Capabilities

### Application Launchers

Primary interfaces for launching Spark applications with comprehensive configuration options. Supports both child process and in-process execution modes.

```java { .api }
// Child process launcher with monitoring
public class SparkLauncher extends AbstractLauncher<SparkLauncher> {
    public SparkLauncher();
    public SparkLauncher(Map<String, String> env);
    public SparkAppHandle startApplication(SparkAppHandle.Listener... listeners);
    public Process launch();
}

// In-process launcher (cluster mode recommended)
public class InProcessLauncher extends AbstractLauncher<InProcessLauncher> {
    public SparkAppHandle startApplication(SparkAppHandle.Listener... listeners);
}
```

[Application Launchers](./launchers.md)

### Application Handles

Runtime control and monitoring interface for launched Spark applications. Provides state tracking, application control, and event notifications.

```java { .api }
public interface SparkAppHandle {
    void addListener(Listener l);
    State getState();
    String getAppId();
    void stop();
    void kill();
    void disconnect();

    public enum State {
        UNKNOWN(false), CONNECTED(false), SUBMITTED(false), RUNNING(false),
        FINISHED(true), FAILED(true), KILLED(true), LOST(true);

        public boolean isFinal();
    }

    public interface Listener {
        void stateChanged(SparkAppHandle handle);
        void infoChanged(SparkAppHandle handle);
    }
}
```

[Application Handles](./application-handles.md)
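
The three control methods serve distinct purposes. A minimal sketch (the `HandleControl` helper and its `force` flag are hypothetical, not part of the library):

```java
import org.apache.spark.launcher.SparkAppHandle;

final class HandleControl {
    // stop() requests a graceful exit, kill() forces termination,
    // and disconnect() stops monitoring while leaving the application running.
    static void shutdown(SparkAppHandle handle, boolean force) {
        if (!handle.getState().isFinal()) {
            if (force) {
                handle.kill();
            } else {
                handle.stop();
            }
        }
        handle.disconnect();
    }
}
```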

### Configuration Management

Comprehensive configuration system with predefined constants for common Spark settings and fluent configuration methods.

```java { .api }
public abstract class AbstractLauncher<T extends AbstractLauncher<T>> {
    public T setPropertiesFile(String path);
    public T setConf(String key, String value);
    public T setAppName(String appName);
    public T setMaster(String master);
    public T setDeployMode(String mode);
    public T setAppResource(String resource);
    public T setMainClass(String mainClass);
    public T addJar(String jar);
    public T addFile(String file);
    public T addPyFile(String file);
    public T addAppArgs(String... args);
    public T setVerbose(boolean verbose);
}

// Configuration constants in SparkLauncher
public static final String DRIVER_MEMORY = "spark.driver.memory";
public static final String EXECUTOR_MEMORY = "spark.executor.memory";
public static final String EXECUTOR_CORES = "spark.executor.cores";
// ... additional constants
```

[Configuration Management](./configuration.md)
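
The fluent methods above compose into a single launcher definition. A sketch (all paths, class names, and arguments are hypothetical):

```java
import org.apache.spark.launcher.SparkLauncher;

SparkLauncher launcher = new SparkLauncher()
    .setAppResource("/path/to/etl-job.jar")      // hypothetical application jar
    .setMainClass("com.example.EtlJob")          // hypothetical main class
    .setMaster("yarn")
    .setDeployMode("cluster")
    .setPropertiesFile("/etc/spark/job.conf")    // base config; explicit setConf calls win
    .setConf(SparkLauncher.EXECUTOR_CORES, "4")
    .addJar("/path/to/dependency.jar")           // extra jar shipped with the app
    .addFile("/path/to/lookup.csv")              // file placed in the working directory
    .addAppArgs("--date", "2021-01-01")          // arguments passed to the app's main()
    .setVerbose(true);                           // log the full spark-submit invocation
```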

## Common Use Cases

### Batch Job Orchestration

Use `SparkLauncher` with monitoring to manage batch processing pipelines, track job completion, and handle failures gracefully.
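
One common way to do this is to block until the application reaches a final state by releasing a `CountDownLatch` from the state listener. A sketch, assuming a hypothetical job jar and main class:

```java
import java.util.concurrent.CountDownLatch;
import org.apache.spark.launcher.SparkLauncher;
import org.apache.spark.launcher.SparkAppHandle;

public class BatchRunner {
    public static void main(String[] args) throws Exception {
        CountDownLatch done = new CountDownLatch(1);

        SparkAppHandle handle = new SparkLauncher()
            .setAppResource("/path/to/my-app.jar")   // hypothetical jar path
            .setMainClass("com.example.MySparkApp")  // hypothetical main class
            .setMaster("yarn")
            .startApplication(new SparkAppHandle.Listener() {
                @Override
                public void stateChanged(SparkAppHandle h) {
                    // Release the latch once the app reaches a terminal state
                    if (h.getState().isFinal()) {
                        done.countDown();
                    }
                }

                @Override
                public void infoChanged(SparkAppHandle h) { }
            });

        done.await(); // Blocks until FINISHED, FAILED, KILLED, or LOST
        if (handle.getState() != SparkAppHandle.State.FINISHED) {
            throw new IllegalStateException("Job ended in state " + handle.getState());
        }
    }
}
```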

### Interactive Application Management

Leverage `SparkAppHandle` state notifications to build interactive dashboards that display real-time Spark application status.

### Cluster Resource Management

Deploy applications to YARN, Mesos, or Kubernetes clusters using cluster mode with proper resource allocation through configuration constants.

### Development and Testing

Use local mode execution for development and testing with simplified configuration and immediate feedback.

## Environment Requirements

- **Spark Installation**: Child process launches require the SPARK_HOME environment variable or explicit `setSparkHome()` configuration
- **Java Runtime**: A custom JAVA_HOME can be set via the `setJavaHome()` method
- **Classpath**: In-process launches require Spark dependencies on the application classpath
- **Cluster Integration**: Supports YARN, Mesos, Kubernetes, and Standalone cluster managers
- **Platform Support**: Cross-platform, with Windows-specific command handling
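
When the environment variables are absent, both locations can be set explicitly on `SparkLauncher`. A sketch (all paths are hypothetical):

```java
import org.apache.spark.launcher.SparkLauncher;

Process p = new SparkLauncher()
    .setSparkHome("/opt/spark-2.4.8")        // used instead of the SPARK_HOME env var
    .setJavaHome("/usr/lib/jvm/java-8")      // JVM used for the child process
    .setAppResource("/path/to/my-app.jar")
    .setMainClass("com.example.MySparkApp")
    .setMaster("local[*]")
    .launch();
```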

## Error Handling

The library provides multiple layers of error handling:

- **Configuration Validation**: Parameter validation with descriptive error messages
- **Launch Failures**: `IOException` handling for process creation failures
- **Runtime Monitoring**: State-based error detection through `SparkAppHandle.State.FAILED`
- **Connection Issues**: Timeout handling for launcher server communication
- **Process Management**: Robust child process lifecycle management with cleanup
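
These layers combine in practice. A sketch of catching launch failures and detecting runtime failure through the handle state (the jar path and main class are hypothetical):

```java
import java.io.IOException;
import org.apache.spark.launcher.SparkLauncher;
import org.apache.spark.launcher.SparkAppHandle;

try {
    SparkAppHandle handle = new SparkLauncher()
        .setAppResource("/path/to/my-app.jar")   // hypothetical path
        .setMainClass("com.example.MySparkApp")  // hypothetical main class
        .setMaster("local[*]")
        .startApplication();

    handle.addListener(new SparkAppHandle.Listener() {
        @Override
        public void stateChanged(SparkAppHandle h) {
            if (h.getState() == SparkAppHandle.State.FAILED) {
                System.err.println("Application failed: " + h.getAppId());
            }
        }

        @Override
        public void infoChanged(SparkAppHandle h) { }
    });
} catch (IOException e) {
    // Thrown when the child process cannot be created
    // (e.g. missing SPARK_HOME or an invalid configuration)
    System.err.println("Launch failed: " + e.getMessage());
}
```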