Core Spark 1.x integration component for the Cask Data Application Platform providing runtime services and execution context for CDAP applications
—
Distributed execution framework built on Apache Twill that provides scalable, fault-tolerant Spark application deployment across YARN clusters with proper resource management, lifecycle control, and integration with CDAP's distributed application infrastructure.
Service for managing distributed Spark execution with full lifecycle management and resource allocation across cluster nodes.
/**
 * Service for managing distributed Spark execution.
 * Provides scalable deployment and management of Spark applications across
 * clusters. Method signatures only; bodies are provided by the library.
 */
public class SparkExecutionService {
/**
 * Submits a Spark program for distributed execution. Submission is
 * asynchronous; failures are surfaced through the returned future.
 * @param programRunId Unique identifier for the program run
 * @param programOptions Configuration options for program execution
 * @return Future containing the program controller for managing execution
 * @throws ExecutionException if submission fails (reported when the future is resolved)
 */
public ListenableFuture<ProgramController> submit(ProgramRunId programRunId, ProgramOptions programOptions);
/**
 * Stops the execution service and all running programs.
 * Gracefully shuts down all managed Spark applications.
 */
public void stop();
/**
 * Gets the current state of the execution service.
 * @return ServiceState indicating current service status
 */
public ServiceState getState();
/**
 * Gets information about running programs.
 * @return Set of ProgramRunId for currently running programs
 */
public Set<ProgramRunId> getRunningPrograms();
/**
 * Gets the program controller for a specific run.
 * @param programRunId Program run identifier
 * @return ProgramController for the specified run, or {@code null} if not found
 */
public ProgramController getProgramController(ProgramRunId programRunId);
}

Twill runnable implementation that enables Spark applications to run as distributed applications with proper resource management and fault tolerance.
/**
 * Twill runnable for distributed Spark execution.
 * Enables a Spark application to run as a distributed service with fault
 * tolerance. Per the TwillRunnable contract, initialize() is called before
 * run(), and destroy() after run() returns.
 */
public class SparkTwillRunnable implements TwillRunnable {
/**
 * Main execution method for the runnable; blocks for the lifetime of the
 * Spark application and manages its lifecycle.
 */
public void run();
/**
 * Stops the running Spark application gracefully.
 * Ensures proper cleanup of resources and state.
 */
public void stop();
/**
 * Handles commands sent to the running application.
 * @param command Command to execute
 * @throws Exception if command execution fails
 */
public void handleCommand(Command command) throws Exception;
/**
 * Initializes the runnable with its runtime context; invoked before run().
 * @param context Twill runtime context
 */
public void initialize(TwillContext context);
/**
 * Destroys the runnable and cleans up resources; invoked after run() returns.
 */
public void destroy();
/**
 * Gets the Twill context supplied at initialization.
 * @return TwillContext for accessing runtime information
 */
protected TwillContext getContext();
}

Program controller implementation for managing distributed Spark execution through the Twill framework.
/**
 * Program controller for distributed Spark execution via Twill.
 * Provides lifecycle management and a command interface for distributed
 * Spark programs by delegating to an underlying {@link TwillController}.
 */
public class SparkTwillProgramController implements ProgramController {
/**
 * Sends a command to the distributed Spark program.
 * @param command Command name to execute
 * @param args Command arguments (interpretation depends on the command)
 * @return Future representing the command execution result
 * @throws Exception if command execution fails
 */
public ListenableFuture<ProgramController> command(String command, Object... args) throws Exception;
/**
 * Stops the distributed Spark program gracefully.
 * @return Future that completes when the stop operation finishes
 * @throws Exception if stop operation fails
 */
public ListenableFuture<ProgramController> stop() throws Exception;
/**
 * Kills the distributed Spark program forcefully, without graceful shutdown.
 * @return Future representing the kill operation
 */
public ListenableFuture<ProgramController> kill();
/**
 * Gets the current state of the program.
 * @return Current program state
 */
public State getState();
/**
 * Gets the program run ID.
 * @return ProgramRunId identifying this program run
 */
public ProgramRunId getProgramRunId();
/**
 * Gets the Twill controller for low-level operations.
 * @return TwillController for direct Twill operations
 */
public TwillController getTwillController();
/**
 * Gets the resource report for the running program.
 * @return ResourceReport containing resource usage information
 */
public ResourceReport getResourceReport();
/**
 * Adds a listener for program state changes.
 * @param listener Listener to be notified of state changes
 */
public void addListener(Listener listener);
}

Context for distributed execution that provides access to cluster information and resource management.
/**
 * Context for distributed Spark execution.
 * Provides access to cluster information and distributed resources:
 * executor/driver allocation, YARN identifiers, and executor placement.
 */
public class DistributedExecutionContext {
/**
 * Gets the number of executor instances.
 * @return Number of Spark executor instances
 */
public int getExecutorInstances();
/**
 * Gets the per-executor resource allocation.
 * @return Resources allocated to each executor
 */
public Resources getExecutorResources();
/**
 * Gets the driver resource allocation.
 * @return Resources allocated to the driver
 */
public Resources getDriverResources();
/**
 * Gets the cluster configuration.
 * @return Configuration for the target cluster
 */
public Configuration getClusterConfiguration();
/**
 * Gets the YARN application ID (if running on YARN).
 * @return Application ID, or {@code null} if not running on YARN
 */
public ApplicationId getYarnApplicationId();
/**
 * Gets the set of hosts currently running executors.
 * @return Set of hostnames running executors
 */
public Set<String> getExecutorHosts();
/**
 * Scales the number of executors asynchronously.
 * @param targetExecutors Desired number of executors
 * @return Future indicating completion (and success) of the scaling operation
 */
public ListenableFuture<Boolean> scaleExecutors(int targetExecutors);
}

Basic Distributed Execution:
import co.cask.cdap.app.runtime.spark.distributed.SparkExecutionService;
import co.cask.cdap.app.runtime.ProgramController;
import co.cask.cdap.proto.id.ProgramRunId;
// Create the execution service (CDAP config, location factory, and
// discovery client are assumed to be available in the surrounding code)
SparkExecutionService executionService = new SparkExecutionService(
cConf, locationFactory, discoveryServiceClient
);
// Submit a Spark program for distributed execution
ProgramRunId runId = new ProgramRunId("namespace", "app", ProgramType.SPARK, "program", "run-1");
ListenableFuture<ProgramController> future = executionService.submit(runId, programOptions);
// Block until submission completes and a controller is available
ProgramController controller = future.get();
// Monitor program state
System.out.println("Program state: " + controller.getState());
// Send a command to the running program (here: scale to 10 executors)
controller.command("scale-executors", 10).get();
// Stop program gracefully
controller.stop().get();

Twill Runnable Implementation:
import co.cask.cdap.app.runtime.spark.distributed.SparkTwillRunnable;
import org.apache.twill.api.TwillContext;
import org.apache.twill.api.Command;
public class MySparkTwillRunnable extends SparkTwillRunnable {
@Override
public void initialize(TwillContext context) {
super.initialize(context);
// Get instance information
int instanceId = context.getInstanceId();
int instanceCount = context.getInstanceCount();
System.out.println(String.format(
"Initializing instance %d of %d", instanceId, instanceCount
));
}
@Override
public void run() {
try {
// Initialize Spark context
SparkContext sparkContext = createSparkContext();
// Run Spark application
runSparkApplication(sparkContext);
// Keep running until stopped
while (!Thread.currentThread().isInterrupted()) {
Thread.sleep(1000);
}
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
} finally {
cleanup();
}
}
@Override
public void handleCommand(Command command) throws Exception {
String commandName = command.getCommand();
switch (commandName) {
case "scale-executors":
int targetCount = Integer.parseInt(command.getOptions().get("count"));
scaleExecutors(targetCount);
break;
case "checkpoint":
checkpointApplication();
break;
default:
super.handleCommand(command);
}
}
}

Program Controller Usage:
import co.cask.cdap.app.runtime.spark.distributed.SparkTwillProgramController;
import co.cask.cdap.app.runtime.ProgramController.Listener;
// Create a program controller wrapping an existing TwillController
SparkTwillProgramController controller = new SparkTwillProgramController(
twillController, programRunId
);
// Register a listener to observe lifecycle transitions
controller.addListener(new Listener() {
@Override
public void init(State currentState, Throwable cause) {
System.out.println("Program initialized with state: " + currentState);
}
@Override
public void stateChanged(State newState, Throwable cause) {
System.out.println("Program state changed to: " + newState);
// cause is non-null only when the transition was triggered by an error
if (cause != null) {
System.err.println("State change caused by error: " + cause.getMessage());
}
}
});
// Inspect per-instance resource usage from the resource report
ResourceReport report = controller.getResourceReport();
for (TwillRunResources resources : report.getResources()) {
System.out.println(String.format(
"Instance %d: %d cores, %d MB memory",
resources.getInstanceId(),
resources.getVirtualCores(),
resources.getMemoryMB()
));
}
// Send custom commands (command name followed by key/value arguments)
controller.command("scale-executors", "count", "20").get();
controller.command("checkpoint").get();

Cluster Resource Management:
import co.cask.cdap.app.runtime.spark.distributed.DistributedExecutionContext;
// Create the execution context (Spark config plus YARN client handles)
DistributedExecutionContext context = new DistributedExecutionContext(
sparkConf, yarnClient, resourceManager
);
// Read the current resource allocation
int executors = context.getExecutorInstances();
Resources executorResources = context.getExecutorResources();
Resources driverResources = context.getDriverResources();
System.out.println(String.format(
"Current allocation: %d executors, %d MB memory each, %d cores each",
executors,
executorResources.getMemoryMB(),
executorResources.getVirtualCores()
));
// Scale out when the workload exceeds the threshold, capped at maxExecutors;
// get() blocks until the scaling operation completes
if (workloadSize > threshold) {
int targetExecutors = Math.min(workloadSize / batchSize, maxExecutors);
context.scaleExecutors(targetExecutors).get();
}
// Monitor executor distribution across hosts
Set<String> executorHosts = context.getExecutorHosts();
System.out.println("Executors running on hosts: " + executorHosts);

/**
* Service state enumeration for execution services
*/
public enum ServiceState {
STARTING, // Service is starting up and not yet accepting requests
RUNNING, // Service is running and accepting requests
STOPPING, // Service is shutting down gracefully
STOPPED, // Service has stopped normally
FAILED // Service encountered a fatal error and cannot continue
}
/**
 * Resource report containing information about the distributed resources
 * of a running application.
 */
public interface ResourceReport {
/**
 * Gets resources for all instances of the application.
 * @return Collection of TwillRunResources, one per instance
 */
Collection<TwillRunResources> getResources();
/**
 * Gets the application master resources.
 * @return TwillRunResources for the application master
 */
TwillRunResources getAppMasterResources();
/**
 * Gets per-service resource information.
 * @return Map of service names to the resource information of their instances
 */
Map<String, Collection<TwillRunResources>> getServices();
}
/**
 * Resources allocated to a single Twill run instance.
 */
public interface TwillRunResources {
/**
 * Gets the instance ID.
 * @return Instance identifier
 */
int getInstanceId();
/**
 * Gets the number of allocated virtual cores.
 * @return Number of virtual cores
 */
int getVirtualCores();
/**
 * Gets the allocated memory in MB.
 * @return Memory allocation in megabytes
 */
int getMemoryMB();
/**
 * Gets the host name.
 * @return Host where this instance is running
 */
String getHost();
/**
 * Gets the container ID.
 * @return Container identifier assigned by the resource manager (e.g. YARN)
 */
String getContainerId();
}
/**
 * Twill controller interface for low-level operations on a Twill application.
 * All lifecycle operations are asynchronous and report completion via futures.
 */
public interface TwillController {
/**
 * Starts the Twill application.
 * @return Future that completes when the application has started
 */
ListenableFuture<TwillController> start();
/**
 * Stops the Twill application gracefully.
 * @return Future that completes when the application has terminated
 */
ListenableFuture<TwillController> terminate();
/**
 * Kills the Twill application forcefully.
 * @return Future indicating kill completion
 */
ListenableFuture<TwillController> kill();
/**
 * Sends a command to the application.
 * @param command Command to send
 * @return Future indicating command completion
 */
ListenableFuture<TwillController> sendCommand(Command command);
/**
 * Gets the current resource report.
 * @return ResourceReport containing current resource usage
 */
ResourceReport getResourceReport();
}
/**
 * Command sent to a running Twill application, consisting of a name and
 * a set of string options.
 */
public interface Command {
/**
 * Gets the command name.
 * @return Command identifier
 */
String getCommand();
/**
 * Gets the command options.
 * @return Map of option key-value pairs
 */
Map<String, String> getOptions();
}
/**
 * Twill context providing runtime information to a running instance.
 */
public interface TwillContext {
/**
 * Gets the instance ID.
 * @return Instance identifier (0-based)
 */
int getInstanceId();
/**
 * Gets the total instance count.
 * @return Total number of instances
 */
int getInstanceCount();
/**
 * Gets the host information.
 * @return Host where this instance is running
 */
String getHost();
/**
 * Gets the resources allocated to this instance.
 * @return TwillRunResources for this instance
 */
TwillRunResources getResourceAllocation();
/**
 * Announces a service endpoint so other runnables can discover it.
 * @param serviceName Name of the service
 * @param port Port number
 */
void announce(String serviceName, int port);
}
/**
 * YARN application ID wrapper. A YARN application is identified by the
 * ResourceManager's cluster start timestamp plus a sequence number.
 */
public class ApplicationId {
/**
 * Gets the cluster timestamp component.
 * @return Cluster timestamp component
 */
public long getClusterTimestamp();
/**
 * Gets the sequence-number component.
 * @return Application ID component
 */
public int getId();
/**
 * Gets the string representation (YARN format, e.g. "application_&lt;timestamp&gt;_&lt;id&gt;").
 * @return Full application ID string
 */
@Override
public String toString();
}

Install with Tessl CLI
npx tessl i tessl/maven-co-cask-cdap--cdap-spark-core