Library for launching Spark applications programmatically with monitoring and control capabilities.
```bash
npx @tessl/cli install tessl/maven-org-apache-spark--spark-launcher_2-11@2.4.0
```

Apache Spark Launcher provides a programmatic API for launching and monitoring Spark applications from Java. It offers two primary launch modes: child-process execution with full monitoring capabilities, and in-process execution for cluster deployments. The library handles the Spark application lifecycle, including configuration, execution, and state monitoring, and provides comprehensive control interfaces for running applications.
```xml
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-launcher_2.11</artifactId>
    <version>2.4.8</version>
</dependency>
```

```java
import org.apache.spark.launcher.SparkLauncher;
import org.apache.spark.launcher.InProcessLauncher;
import org.apache.spark.launcher.SparkAppHandle;
```

```java
import org.apache.spark.launcher.SparkLauncher;
import org.apache.spark.launcher.SparkAppHandle;

// Configure and launch a Spark application as a child process
SparkAppHandle handle = new SparkLauncher()
    .setAppResource("/path/to/my-app.jar")
    .setMainClass("com.example.MySparkApp")
    .setMaster("local[*]")
    .setConf(SparkLauncher.DRIVER_MEMORY, "2g")
    .setConf(SparkLauncher.EXECUTOR_MEMORY, "1g")
    .setAppName("My Spark Application")
    .startApplication();

// Monitor application state
handle.addListener(new SparkAppHandle.Listener() {
    @Override
    public void stateChanged(SparkAppHandle handle) {
        System.out.println("State: " + handle.getState());
        if (handle.getState().isFinal()) {
            System.out.println("Application finished with ID: " + handle.getAppId());
        }
    }

    @Override
    public void infoChanged(SparkAppHandle handle) {
        System.out.println("Info updated for app: " + handle.getAppId());
    }
});

// Wait for completion, or stop/kill if needed
if (handle.getState() == SparkAppHandle.State.RUNNING) {
    handle.stop(); // Graceful shutdown
    // handle.kill(); // Force kill if needed
}
```
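SparkAppHandle has no blocking wait, so callers that need to wait for completion typically pair a listener with a synchronizer. A minimal sketch using java.util.concurrent.CountDownLatch; the latch wiring is illustrative, not part of the library:

```java
import java.util.concurrent.CountDownLatch;

import org.apache.spark.launcher.SparkAppHandle;
import org.apache.spark.launcher.SparkLauncher;

// Block the calling thread until the application reaches a final state.
CountDownLatch done = new CountDownLatch(1);
SparkAppHandle handle = new SparkLauncher()
    .setAppResource("/path/to/my-app.jar")
    .setMainClass("com.example.MySparkApp")
    .setMaster("local[*]")
    .startApplication(new SparkAppHandle.Listener() {
        @Override
        public void stateChanged(SparkAppHandle h) {
            if (h.getState().isFinal()) {
                done.countDown(); // FINISHED, FAILED, KILLED, or LOST
            }
        }

        @Override
        public void infoChanged(SparkAppHandle h) {
            // No-op; only the terminal state matters here.
        }
    });
done.await();
System.out.println("Final state: " + handle.getState());
```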
```java
import org.apache.spark.launcher.SparkLauncher;

// Launch as a raw process (manual management required)
Process sparkProcess = new SparkLauncher()
    .setAppResource("/path/to/my-app.jar")
    .setMainClass("com.example.MySparkApp")
    .setMaster("yarn")
    .setDeployMode("cluster")
    .setConf(SparkLauncher.DRIVER_MEMORY, "4g")
    .launch();

// Manual process management
int exitCode = sparkProcess.waitFor();
System.out.println("Spark application exited with code: " + exitCode);
```
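When using launch(), the child's stdout and stderr must be consumed, or the process can stall once the pipe buffers fill. SparkLauncher provides redirect methods for this; a short sketch in which the log file location is a placeholder:

```java
import java.io.File;

import org.apache.spark.launcher.SparkLauncher;

// Redirect child output so the raw process cannot block on full pipes.
Process sparkProcess = new SparkLauncher()
    .setAppResource("/path/to/my-app.jar")
    .setMainClass("com.example.MySparkApp")
    .setMaster("yarn")
    .setDeployMode("cluster")
    .redirectError()                                 // merge stderr into stdout
    .redirectOutput(new File("/tmp/spark-app.out"))  // placeholder path
    .launch();

int exitCode = sparkProcess.waitFor();
```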
```java
import org.apache.spark.launcher.InProcessLauncher;
import org.apache.spark.launcher.SparkAppHandle;

// Launch the application in the same JVM (cluster deploy mode recommended)
SparkAppHandle handle = new InProcessLauncher()
    .setAppResource("/path/to/my-app.jar")
    .setMainClass("com.example.MySparkApp")
    .setMaster("yarn")
    .setDeployMode("cluster")
    .setConf("spark.sql.adaptive.enabled", "true")
    .startApplication();
```
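Listeners can also be handed to startApplication() directly through its varargs parameter, so no early state change slips past registration. A sketch reusing the placeholder application from above:

```java
import org.apache.spark.launcher.InProcessLauncher;
import org.apache.spark.launcher.SparkAppHandle;

// Register the listener at launch time instead of after the fact.
SparkAppHandle handle = new InProcessLauncher()
    .setAppResource("/path/to/my-app.jar")
    .setMainClass("com.example.MySparkApp")
    .setMaster("yarn")
    .setDeployMode("cluster")
    .startApplication(new SparkAppHandle.Listener() {
        @Override
        public void stateChanged(SparkAppHandle h) {
            System.out.println("State: " + h.getState());
        }

        @Override
        public void infoChanged(SparkAppHandle h) {
            System.out.println("App ID: " + h.getAppId());
        }
    });
```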
The Spark Launcher library is built around several key components:

- SparkLauncher and InProcessLauncher provide fluent configuration APIs for the two launch modes
- AbstractLauncher provides common configuration methods shared by both launcher implementations
- SparkAppHandle provides runtime application control and monitoring with state-based lifecycle management
- SparkAppHandle.State is an enum with final-state detection via isFinal()

SparkLauncher and InProcessLauncher are the primary interfaces for launching Spark applications, offering comprehensive configuration options and supporting child-process and in-process execution modes, respectively.
```java
// Child-process launcher with monitoring
public class SparkLauncher extends AbstractLauncher<SparkLauncher> {
    public SparkLauncher();
    public SparkLauncher(Map<String, String> env);
    public SparkAppHandle startApplication(SparkAppHandle.Listener... listeners) throws IOException;
    public Process launch() throws IOException;
}

// In-process launcher (cluster deploy mode recommended)
public class InProcessLauncher extends AbstractLauncher<InProcessLauncher> {
    public SparkAppHandle startApplication(SparkAppHandle.Listener... listeners) throws IOException;
}
```
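The Map-based constructor controls the environment of the spawned spark-submit process, which is handy for variables such as HADOOP_CONF_DIR. A brief sketch; the configuration directory is an assumed path:

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.spark.launcher.SparkAppHandle;
import org.apache.spark.launcher.SparkLauncher;

// Supply a custom child-process environment at construction time.
Map<String, String> env = new HashMap<>();
env.put("HADOOP_CONF_DIR", "/etc/hadoop/conf"); // assumed location

SparkAppHandle handle = new SparkLauncher(env)
    .setAppResource("/path/to/my-app.jar")
    .setMainClass("com.example.MySparkApp")
    .setMaster("yarn")
    .startApplication();
```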
Runtime control and monitoring of a launched application goes through SparkAppHandle, which provides state tracking, application control, and event notifications.

```java
public interface SparkAppHandle {
    void addListener(Listener l);
    State getState();
    String getAppId();
    void stop();
    void kill();
    void disconnect();

    enum State {
        UNKNOWN(false), CONNECTED(false), SUBMITTED(false), RUNNING(false),
        FINISHED(true), FAILED(true), KILLED(true), LOST(true);

        public boolean isFinal();
    }

    interface Listener {
        void stateChanged(SparkAppHandle handle);
        void infoChanged(SparkAppHandle handle);
    }
}
```
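Since disconnect() detaches the handle without stopping the application, it supports a fire-and-forget pattern: submit, record the application ID, detach. A sketch in which the polling loop is illustrative:

```java
import org.apache.spark.launcher.SparkAppHandle;
import org.apache.spark.launcher.SparkLauncher;

SparkAppHandle handle = new SparkLauncher()
    .setAppResource("/path/to/my-app.jar")
    .setMainClass("com.example.MySparkApp")
    .setMaster("yarn")
    .setDeployMode("cluster")
    .startApplication();

// Poll until the cluster assigns an application ID (or the app dies early).
while (handle.getAppId() == null && !handle.getState().isFinal()) {
    Thread.sleep(100);
}
System.out.println("Submitted as: " + handle.getAppId());

handle.disconnect(); // the application keeps running; the handle stops reporting
```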
Configuration is handled through a comprehensive system of fluent methods on AbstractLauncher plus predefined constants for common Spark settings.

```java
public abstract class AbstractLauncher<T extends AbstractLauncher<T>> {
    public T setPropertiesFile(String path);
    public T setConf(String key, String value);
    public T setAppName(String appName);
    public T setMaster(String master);
    public T setDeployMode(String mode);
    public T setAppResource(String resource);
    public T setMainClass(String mainClass);
    public T addJar(String jar);
    public T addFile(String file);
    public T addPyFile(String file);
    public T addAppArgs(String... args);
    public T setVerbose(boolean verbose);
}

// Configuration constants in SparkLauncher
public static final String DRIVER_MEMORY = "spark.driver.memory";
public static final String EXECUTOR_MEMORY = "spark.executor.memory";
public static final String EXECUTOR_CORES = "spark.executor.cores";
// ... additional constants
```
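The constants combine naturally with the fluent configuration methods. A configuration-heavy sketch; the extra jar and application arguments are placeholders:

```java
import org.apache.spark.launcher.SparkAppHandle;
import org.apache.spark.launcher.SparkLauncher;

// Resource sizing via constants, plus dependencies and program arguments.
SparkAppHandle handle = new SparkLauncher()
    .setAppResource("/path/to/my-app.jar")
    .setMainClass("com.example.MySparkApp")
    .setMaster("yarn")
    .setConf(SparkLauncher.DRIVER_MEMORY, "4g")
    .setConf(SparkLauncher.EXECUTOR_MEMORY, "2g")
    .setConf(SparkLauncher.EXECUTOR_CORES, "4")
    .addJar("/path/to/dependency.jar")    // placeholder dependency
    .addAppArgs("--input", "/data/input") // placeholder arguments
    .setVerbose(true)
    .startApplication();
```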
Common use cases:

- Use SparkLauncher with monitoring to manage batch processing pipelines, track job completion, and handle failures gracefully.
- Leverage SparkAppHandle state notifications to build interactive dashboards that display real-time Spark application status.
- Deploy applications to YARN, Mesos, or Kubernetes clusters using cluster deploy mode, with resource allocation set through the configuration constants.
- Use local mode for development and testing, with simplified configuration and immediate feedback.
For child-process launches, the runtime environment can be controlled through SparkLauncher's setSparkHome() configuration and setJavaHome() method.
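A brief sketch of explicit environment setup for a child-process launch; both install locations are assumptions:

```java
import org.apache.spark.launcher.SparkLauncher;

// Point the launcher at specific Spark and JVM installations instead of
// relying on SPARK_HOME / JAVA_HOME from the parent environment.
Process process = new SparkLauncher()
    .setSparkHome("/opt/spark")         // assumed Spark distribution
    .setJavaHome("/usr/lib/jvm/java-8") // assumed JDK location
    .setAppResource("/path/to/my-app.jar")
    .setMainClass("com.example.MySparkApp")
    .setMaster("local[*]")
    .launch();
```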
The library provides multiple layers of error handling: invalid configuration and launch failures surface as exceptions from launch() and startApplication(), while runtime failures are reported through the terminal SparkAppHandle.State.FAILED state (with KILLED and LOST covering forced termination and lost launcher connections).