tessl/maven-org-apache-spark--spark-launcher_2-11

Library for launching Spark applications programmatically with monitoring and control capabilities.

Describes: pkg:maven/org.apache.spark/spark-launcher_2.11@2.4.x

To install, run

npx @tessl/cli install tessl/maven-org-apache-spark--spark-launcher_2-11@2.4.0


Apache Spark Launcher

Apache Spark Launcher provides a programmatic API for launching and monitoring Spark applications from Java applications. It offers two primary launch modes: child process execution with full monitoring capabilities, and in-process execution for cluster deployments. The library manages the Spark application lifecycle (configuration, execution, and state monitoring) and provides control interfaces for running applications.

Package Information

  • Package Name: org.apache.spark:spark-launcher_2.11
  • Package Type: Maven (Java)
  • Language: Java
  • Installation: Add to Maven dependencies:
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-launcher_2.11</artifactId>
      <version>2.4.8</version>
    </dependency>

Core Imports

import org.apache.spark.launcher.SparkLauncher;
import org.apache.spark.launcher.InProcessLauncher;
import org.apache.spark.launcher.SparkAppHandle;

Basic Usage

Child Process Launch with Monitoring

import org.apache.spark.launcher.SparkLauncher;
import org.apache.spark.launcher.SparkAppHandle;

// Configure and launch Spark application as child process
SparkAppHandle handle = new SparkLauncher()
    .setAppResource("/path/to/my-app.jar")
    .setMainClass("com.example.MySparkApp")
    .setMaster("local[*]")
    .setConf(SparkLauncher.DRIVER_MEMORY, "2g")
    .setConf(SparkLauncher.EXECUTOR_MEMORY, "1g")
    .setAppName("My Spark Application")
    .startApplication();

// Monitor application state
handle.addListener(new SparkAppHandle.Listener() {
    public void stateChanged(SparkAppHandle handle) {
        System.out.println("State: " + handle.getState());
        if (handle.getState().isFinal()) {
            System.out.println("Application finished with ID: " + handle.getAppId());
        }
    }
    
    public void infoChanged(SparkAppHandle handle) {
        System.out.println("Info updated for app: " + handle.getAppId());
    }
});

// Wait for completion or stop/kill if needed
if (handle.getState() == SparkAppHandle.State.RUNNING) {
    handle.stop(); // Graceful shutdown
    // handle.kill(); // Force kill if needed
}

Raw Process Launch

import org.apache.spark.launcher.SparkLauncher;

// Launch as raw process (manual management required)
Process sparkProcess = new SparkLauncher()
    .setAppResource("/path/to/my-app.jar")
    .setMainClass("com.example.MySparkApp")
    .setMaster("yarn")
    .setDeployMode("cluster")
    .setConf(SparkLauncher.DRIVER_MEMORY, "4g")
    .launch();

// Manual process management
int exitCode = sparkProcess.waitFor();
System.out.println("Spark application exited with code: " + exitCode);

In-Process Launch (Cluster Mode)

import org.apache.spark.launcher.InProcessLauncher;
import org.apache.spark.launcher.SparkAppHandle;

// Launch application in same JVM (cluster mode recommended)
SparkAppHandle handle = new InProcessLauncher()
    .setAppResource("/path/to/my-app.jar")
    .setMainClass("com.example.MySparkApp")
    .setMaster("yarn")
    .setDeployMode("cluster")
    .setConf("spark.sql.adaptive.enabled", "true")
    .startApplication();

Architecture

The Spark Launcher library is built around several key components:

  • Launcher Classes: SparkLauncher and InProcessLauncher provide fluent configuration APIs for different launch modes
  • Abstract Base: AbstractLauncher provides common configuration methods shared by both launcher implementations
  • Handle Interface: SparkAppHandle provides runtime application control and monitoring with state-based lifecycle management
  • State Management: Comprehensive state tracking through SparkAppHandle.State enum with final state detection
  • Event System: Listener-based callbacks for real-time application state and information updates
  • Configuration System: Extensive configuration options through constants and fluent methods
  • Process Management: Robust child process handling with output redirection and logging capabilities
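As a sketch of the output-redirection capability mentioned above, the child process's streams can be sent to files rather than inherited from the parent. The jar path, main class, and log file locations below are placeholders:

```java
import org.apache.spark.launcher.SparkLauncher;

import java.io.File;
import java.io.IOException;

public class RedirectedLaunch {
    public static void main(String[] args) throws IOException, InterruptedException {
        // Redirect the Spark child process's stdout/stderr to files
        // instead of inheriting the parent JVM's streams.
        Process process = new SparkLauncher()
                .setAppResource("/path/to/my-app.jar")          // placeholder path
                .setMainClass("com.example.MySparkApp")          // placeholder class
                .setMaster("local[*]")
                .redirectOutput(new File("/tmp/spark-app.out"))
                .redirectError(new File("/tmp/spark-app.err"))
                .launch();

        // launch() returns a plain java.lang.Process, so waiting and
        // exit-code handling are the caller's responsibility.
        int exitCode = process.waitFor();
        System.out.println("Spark application exited with code: " + exitCode);
    }
}
```

`redirectToLog(String loggerName)` is also available when output should flow into the application's logging framework instead of files.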

Capabilities

Application Launchers

Primary interfaces for launching Spark applications with comprehensive configuration options. Supports both child process and in-process execution modes.

// Child process launcher with monitoring
public class SparkLauncher extends AbstractLauncher<SparkLauncher> {
    public SparkLauncher();
    public SparkLauncher(Map<String, String> env);
    public SparkAppHandle startApplication(SparkAppHandle.Listener... listeners) throws IOException;
    public Process launch() throws IOException;
}

// In-process launcher (cluster mode recommended)
public class InProcessLauncher extends AbstractLauncher<InProcessLauncher> {
    public SparkAppHandle startApplication(SparkAppHandle.Listener... listeners) throws IOException;
}

Application Handles

Runtime control and monitoring interface for launched Spark applications. Provides state tracking, application control, and event notifications.

public interface SparkAppHandle {
    void addListener(Listener l);
    State getState();
    String getAppId();
    void stop();
    void kill();
    void disconnect();
    
    public enum State {
        UNKNOWN(false), CONNECTED(false), SUBMITTED(false), RUNNING(false),
        FINISHED(true), FAILED(true), KILLED(true), LOST(true);
        
        public boolean isFinal();
    }
    
    public interface Listener {
        void stateChanged(SparkAppHandle handle);
        void infoChanged(SparkAppHandle handle);
    }
}

Configuration Management

Comprehensive configuration system with predefined constants for common Spark settings and fluent configuration methods.

public abstract class AbstractLauncher<T extends AbstractLauncher<T>> {
    public T setPropertiesFile(String path);
    public T setConf(String key, String value);
    public T setAppName(String appName);
    public T setMaster(String master);
    public T setDeployMode(String mode);
    public T setAppResource(String resource);
    public T setMainClass(String mainClass);
    public T addJar(String jar);
    public T addFile(String file);
    public T addPyFile(String file);
    public T addAppArgs(String... args);
    public T setVerbose(boolean verbose);
}

// Configuration constants in SparkLauncher
public static final String DRIVER_MEMORY = "spark.driver.memory";
public static final String EXECUTOR_MEMORY = "spark.executor.memory";
public static final String EXECUTOR_CORES = "spark.executor.cores";
// ... additional constants
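A hedged sketch of how the fluent methods and constants combine in practice: defaults are loaded from a properties file, then overridden per launch. The file path, jar, main class, and application arguments are illustrative placeholders:

```java
import org.apache.spark.launcher.SparkAppHandle;
import org.apache.spark.launcher.SparkLauncher;

public class ConfiguredLaunch {
    public static void main(String[] args) throws Exception {
        SparkAppHandle handle = new SparkLauncher()
                .setAppResource("/path/to/my-app.jar")           // placeholder path
                .setMainClass("com.example.MySparkApp")           // placeholder class
                .setMaster("yarn")
                .setDeployMode("cluster")
                // Load baseline settings from a file, then override selectively
                // with setConf() using the predefined constants.
                .setPropertiesFile("/path/to/spark-defaults.conf") // placeholder path
                .setConf(SparkLauncher.DRIVER_MEMORY, "2g")
                .setConf(SparkLauncher.EXECUTOR_MEMORY, "4g")
                .setConf(SparkLauncher.EXECUTOR_CORES, "2")
                .addAppArgs("--input", "hdfs:///data/in")          // placeholder args
                .setVerbose(true)
                .startApplication();

        System.out.println("Submitted, initial state: " + handle.getState());
    }
}
```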

Common Use Cases

Batch Job Orchestration

Use SparkLauncher with monitoring to manage batch processing pipelines, track job completion, and handle failures gracefully.
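One common pattern for this, sketched below with a hypothetical job jar and class name, is to block the orchestrating thread on a `CountDownLatch` that a listener releases when the application reaches a final state:

```java
import org.apache.spark.launcher.SparkAppHandle;
import org.apache.spark.launcher.SparkLauncher;

import java.util.concurrent.CountDownLatch;

public class BatchRunner {
    public static void main(String[] args) throws Exception {
        CountDownLatch done = new CountDownLatch(1);

        SparkAppHandle handle = new SparkLauncher()
                .setAppResource("/path/to/batch-job.jar")    // hypothetical path
                .setMainClass("com.example.BatchJob")         // hypothetical class
                .setMaster("yarn")
                .startApplication(new SparkAppHandle.Listener() {
                    @Override
                    public void stateChanged(SparkAppHandle h) {
                        if (h.getState().isFinal()) {
                            done.countDown();  // release the waiting thread
                        }
                    }

                    @Override
                    public void infoChanged(SparkAppHandle h) { }
                });

        done.await();  // block until the job reaches a final state
        if (handle.getState() != SparkAppHandle.State.FINISHED) {
            // Retry or alerting logic would go here.
            System.err.println("Job ended in state " + handle.getState());
        }
    }
}
```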

Interactive Application Management

Leverage SparkAppHandle state notifications to build interactive dashboards that display real-time Spark application status.

Cluster Resource Management

Deploy applications to YARN, Mesos, or Kubernetes clusters using cluster mode with proper resource allocation through configuration constants.

Development and Testing

Use local mode execution for development and testing with simplified configuration and immediate feedback.

Environment Requirements

  • Spark Installation: Child process launches require SPARK_HOME environment variable or explicit setSparkHome() configuration
  • Java Runtime: Custom JAVA_HOME can be set via setJavaHome() method
  • Classpath: In-process launches require Spark dependencies in application classpath
  • Cluster Integration: Supports YARN, Mesos, Kubernetes, and Standalone cluster managers
  • Platform Support: Cross-platform with Windows-specific command handling
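When the `SPARK_HOME` and `JAVA_HOME` environment variables are unavailable or should be overridden, the installations can be pointed to explicitly; the paths below are examples, not defaults:

```java
import org.apache.spark.launcher.SparkLauncher;

public class ExplicitEnvironmentLaunch {
    public static void main(String[] args) throws Exception {
        // Point the launcher at specific Spark and Java installations
        // instead of relying on SPARK_HOME / JAVA_HOME.
        Process process = new SparkLauncher()
                .setSparkHome("/opt/spark-2.4.8")                 // example path
                .setJavaHome("/usr/lib/jvm/java-8-openjdk")        // example path
                .setAppResource("/path/to/my-app.jar")             // placeholder path
                .setMainClass("com.example.MySparkApp")            // placeholder class
                .setMaster("local[*]")
                .launch();

        System.out.println("Launched, exit code: " + process.waitFor());
    }
}
```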

Error Handling

The library provides multiple layers of error handling:

  • Configuration Validation: Parameter validation with descriptive error messages
  • Launch Failures: IOException handling for process creation failures
  • Runtime Monitoring: State-based error detection through SparkAppHandle.State.FAILED
  • Connection Issues: Timeout handling for launcher server communication
  • Process Management: Robust child process lifecycle management with cleanup
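Combining two of these layers, a minimal sketch: `startApplication()` throws `IOException` when the child process cannot be created, and a listener catches runtime failures via the `FAILED` state. The state-to-exit-code helper is an illustrative convention, not part of the library:

```java
import org.apache.spark.launcher.SparkAppHandle;
import org.apache.spark.launcher.SparkLauncher;

import java.io.IOException;

public class SafeLaunch {
    // Illustrative convention: map a final state to a process-style exit code.
    static int exitCodeFor(SparkAppHandle.State state) {
        return state == SparkAppHandle.State.FINISHED ? 0 : 1;
    }

    public static void main(String[] args) {
        try {
            SparkAppHandle handle = new SparkLauncher()
                    .setAppResource("/path/to/my-app.jar")    // placeholder path
                    .setMainClass("com.example.MySparkApp")    // placeholder class
                    .setMaster("local[*]")
                    .startApplication();

            handle.addListener(new SparkAppHandle.Listener() {
                @Override
                public void stateChanged(SparkAppHandle h) {
                    if (h.getState() == SparkAppHandle.State.FAILED) {
                        System.err.println("Application failed: " + h.getAppId());
                    }
                }

                @Override
                public void infoChanged(SparkAppHandle h) { }
            });
        } catch (IOException e) {
            // Thrown when the child process cannot be started,
            // e.g. SPARK_HOME is not set and no Spark home was configured.
            System.err.println("Failed to launch: " + e.getMessage());
        }
    }
}
```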