YARN integration support for Apache Spark cluster computing, enabling Spark applications to run on Hadoop YARN clusters
ApplicationMaster functionality for managing Spark applications running on YARN. The ApplicationMaster serves as the central coordinator for Spark applications, handling resource negotiation with the YARN ResourceManager and managing executor lifecycle.
Core ApplicationMaster implementation that manages the Spark application lifecycle on YARN clusters.
/**
* ApplicationMaster for managing Spark applications on YARN
* Handles resource negotiation with ResourceManager and executor management
*/
private[spark] class ApplicationMaster(
args: ApplicationMasterArguments,
client: YarnRMClient
) {
// Application lifecycle management
// Resource negotiation with YARN ResourceManager
// Executor management and monitoring
// Integration with Spark driver (cluster mode) or client (client mode)
}

Usage Examples:
import org.apache.spark.deploy.yarn.{ApplicationMaster, ApplicationMasterArguments, YarnRMClient}
// ApplicationMaster is typically instantiated by the YARN runtime
val args = new ApplicationMasterArguments(Array("--class", "MyMainClass"))
val rmClient: YarnRMClient = ??? // supplied by the YARN container runtime
val appMaster = new ApplicationMaster(args, rmClient)
// The ApplicationMaster lifecycle is managed by the YARN container runtime

Argument parsing and configuration for ApplicationMaster operations.
/**
* Argument parsing for ApplicationMaster
* Handles command-line arguments passed to the ApplicationMaster container
*/
class ApplicationMasterArguments(val args: Array[String]) {
/** User application JAR file */
var userJar: String = null
/** User application main class */
var userClass: String = null
/** Arguments to pass to user application */
var userArgs: Seq[String] = Seq[String]()
/** Executor memory in MB (default: 1024) */
var executorMemory: Int = 1024
/** Number of cores per executor (default: 1) */
var executorCores: Int = 1
/** Total number of executors to request (default: DEFAULT_NUMBER_EXECUTORS) */
var numExecutors: Int = DEFAULT_NUMBER_EXECUTORS
/**
* Print usage information and exit with specified exit code
* @param exitCode Exit code to use
* @param unknownParam Optional unknown parameter that caused the error
*/
def printUsageAndExit(exitCode: Int, unknownParam: Any = null): Unit
}
/**
* Companion object for ApplicationMasterArguments
*/
object ApplicationMasterArguments {
val DEFAULT_NUMBER_EXECUTORS = 2
}

Usage Examples:
import org.apache.spark.deploy.yarn.ApplicationMasterArguments
// Parse ApplicationMaster arguments (typically from YARN container launch)
val args = Array(
"--class", "com.example.SparkApp",
"--jar", "/path/to/app.jar",
"--executor-memory", "2g",
"--executor-cores", "2"
)
val amArgs = new ApplicationMasterArguments(args)

Command-line entry points for ApplicationMaster and ExecutorLauncher operations.
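The option handling above can be pictured as a small recursive matcher over the argument list. The sketch below is illustrative only: the names `ArgSketch` and `ParsedArgs` are hypothetical, and it keeps executor memory as a plain MB integer rather than accepting suffixed strings such as "2g" the way Spark's real parser does.

```scala
// Illustrative sketch of recursive command-line argument parsing;
// not Spark's actual ApplicationMasterArguments implementation.
object ArgSketch {
  case class ParsedArgs(
      userJar: String = null,
      userClass: String = null,
      executorMemoryMb: Int = 1024,
      executorCores: Int = 1)

  def parse(args: List[String], acc: ParsedArgs = ParsedArgs()): ParsedArgs =
    args match {
      case "--jar" :: value :: tail =>
        parse(tail, acc.copy(userJar = value))
      case "--class" :: value :: tail =>
        parse(tail, acc.copy(userClass = value))
      case "--executor-memory" :: value :: tail =>
        // Real Spark accepts memory strings like "2g"; this sketch takes plain MB
        parse(tail, acc.copy(executorMemoryMb = value.toInt))
      case "--executor-cores" :: value :: tail =>
        parse(tail, acc.copy(executorCores = value.toInt))
      case Nil => acc
      case unknown :: _ =>
        throw new IllegalArgumentException(s"Unknown argument: $unknown")
    }
}
```

Unrecognized options fail fast, mirroring how the real parser prints usage and exits on an unknown parameter.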
/**
* Main entry point for ApplicationMaster
* Invoked by YARN when starting the ApplicationMaster container
*/
object ApplicationMaster {
def main(args: Array[String]): Unit
}
/**
* Entry point for executor launcher functionality
* Used in client mode when ApplicationMaster only manages executors
*/
object ExecutorLauncher {
def main(args: Array[String]): Unit
}

The ApplicationMaster negotiates with the YARN ResourceManager to:
- Register and unregister the application
- Request containers for executors and release them when no longer needed
- Report application progress and final status

In cluster mode, the ApplicationMaster:
- Runs the Spark driver inside its own JVM
- Launches and monitors executors on behalf of that driver

In client mode, the ApplicationMaster:
- Connects to the Spark driver running in the client process
- Manages executor containers only, acting as an ExecutorLauncher
// Example of ApplicationMaster integration patterns
// (Internal implementation details - shown for understanding)
class ApplicationMaster(args: ApplicationMasterArguments, client: YarnRMClient) {
// Resource negotiation loop
private def allocateExecutors(): Unit = {
// Request containers from ResourceManager
// Launch executor processes in allocated containers
// Monitor executor health and handle failures
}
// Driver integration (cluster mode)
private def runDriver(): Unit = {
// Start Spark driver within ApplicationMaster JVM
// Handle driver completion and cleanup
}
// Client communication (client mode)
private def connectToDriver(): Unit = {
// Establish connection to remote Spark driver
// Report executor status and handle commands
}
}

The ApplicationMaster integrates with Spark configuration for:
// Key configuration properties used by ApplicationMaster
spark.yarn.am.memory // ApplicationMaster memory
spark.yarn.am.cores // ApplicationMaster CPU cores
spark.yarn.am.waitTime // Max wait time for SparkContext
spark.yarn.containerLauncherMaxThreads // Executor launch parallelism
spark.yarn.executor.memoryOverhead // Extra off-heap memory per executor

Integration with Hadoop YARN configuration:
// YARN properties affecting ApplicationMaster behavior
yarn.scheduler.maximum-allocation-mb // Upper bound on container memory requests
yarn.resourcemanager.am.max-attempts // Maximum ApplicationMaster attempts
yarn.nodemanager.aux-services // Required auxiliary services (e.g. spark_shuffle)

// In cluster mode, ApplicationMaster hosts the driver
object ApplicationMaster {
def main(args: Array[String]): Unit = {
// Parse arguments
val amArgs = new ApplicationMasterArguments(args)
// Create ApplicationMaster instance
val appMaster = new ApplicationMaster(amArgs, yarnRMClient)
// Run driver within ApplicationMaster
// Handle application completion
}
}

// In client mode, ApplicationMaster manages only executors
object ExecutorLauncher {
def main(args: Array[String]): Unit = {
// ApplicationMaster acts as executor launcher
// Connects to remote driver
// Manages executor containers only
}
}

The ApplicationMaster handles various failure scenarios:
- Executor failures: failed containers are tracked and replacement containers are requested, up to a configurable failure limit
- ApplicationMaster failures: YARN can restart the ApplicationMaster up to the configured maximum number of attempts
- Driver failures (cluster mode): the application is marked failed and its resources are released
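One way to picture bounded executor-failure handling is a counter that gives up after a fixed number of failures. The sketch below is illustrative, not Spark's implementation; in real deployments the limit is governed by the `spark.yarn.max.executor.failures` setting.

```scala
// Illustrative sketch of bounded executor-failure tracking; not Spark's code.
class FailureTracker(maxFailures: Int) {
  private var failures = 0

  /** Record one executor failure; returns true if the app should keep running. */
  def recordFailure(): Boolean = {
    failures += 1
    failures <= maxFailures
  }

  def failureCount: Int = failures
}
```

When `recordFailure()` returns false, a real ApplicationMaster would unregister with a failed final status instead of requesting more containers.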
ApplicationMaster provides monitoring capabilities:
- Periodic heartbeats and progress reports to the ResourceManager
- A tracking URL registered with YARN that points at the Spark web UI
- Executor status reporting for failure detection
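The periodic heartbeat can be sketched as a scheduled task that repeatedly invokes a reporting callback. This is a hypothetical sketch (the `HeartbeatSketch` name and `report` callback are not Spark APIs); in Spark the actual interval is controlled by `spark.yarn.scheduler.heartbeat.interval-ms`.

```scala
import java.util.concurrent.{Executors, TimeUnit}

// Illustrative heartbeat scheduler; not Spark's actual reporter thread.
class HeartbeatSketch(intervalMs: Long)(report: () => Unit) {
  private val scheduler = Executors.newSingleThreadScheduledExecutor()

  /** Begin invoking the report callback at a fixed interval. */
  def start(): Unit = {
    val task = new Runnable { def run(): Unit = report() }
    scheduler.scheduleAtFixedRate(task, 0L, intervalMs, TimeUnit.MILLISECONDS)
  }

  /** Stop scheduling further heartbeats. */
  def stop(): Unit = scheduler.shutdown()
}
```

A real ApplicationMaster heartbeat does double duty: it keeps the application alive from the ResourceManager's point of view and carries new container requests and releases.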
Install with Tessl CLI
npx tessl i tessl/maven-org-apache-spark--yarn-parent-2-10