tessl/maven-org-apache-spark--spark-yarn-2-12

Apache Spark YARN resource manager integration module that enables Spark applications to run on YARN clusters.

- Workspace: tessl
- Visibility: Public
- Describes: mavenpkg:maven/org.apache.spark/spark-yarn_2.12@3.5.x

To install, run:

```
npx @tessl/cli install tessl/maven-org-apache-spark--spark-yarn-2-12@3.5.0
```

# Apache Spark YARN Resource Manager

Apache Spark YARN Resource Manager provides integration between Apache Spark and YARN (Yet Another Resource Negotiator) for running Spark applications on Hadoop clusters. This module enables Spark to leverage YARN's resource management and scheduling capabilities, supporting both client and cluster deployment modes with comprehensive resource allocation, security, and monitoring features.

## Package Information

- **Package Name**: org.apache.spark:spark-yarn_2.12
- **Package Type**: maven
- **Language**: Scala
- **Installation**: Add the dependency to `pom.xml` or include it in the Spark distribution
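For sbt-based builds, the equivalent dependency can be declared as below. This is a sketch: the version shown (3.5.0) is an assumption and should match the Spark distribution on your cluster.

```scala
// build.sbt -- hedged sketch; 3.5.0 is an assumed version, match your cluster's Spark.
// "provided" scope because the YARN module already ships with the Spark distribution
// at runtime, so it should not be bundled into the application jar.
libraryDependencies += "org.apache.spark" %% "spark-yarn" % "3.5.0" % "provided"
```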

## Core Imports

```scala
import org.apache.spark.deploy.yarn.{Client, ApplicationMaster}
import org.apache.spark.scheduler.cluster.{YarnClusterManager, YarnSchedulerBackend}
import org.apache.spark.SparkConf
```

## Basic Usage

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Configure Spark for YARN
val conf = new SparkConf()
  .setAppName("MySparkApp")
  .setMaster("yarn")
  .set("spark.yarn.queue", "default")
  .set("spark.yarn.am.memory", "1g")
  .set("spark.executor.memory", "2g")
  .set("spark.executor.cores", "2")

// Create SparkContext - YARN integration is handled automatically
val sc = new SparkContext(conf)

// Your Spark application code here
val rdd = sc.parallelize(1 to 100)
val result = rdd.map(_ * 2).collect()

sc.stop()
```

## Architecture

The Apache Spark YARN integration consists of several key components:

- **Application Management**: `Client` for submitting applications, `ApplicationMaster` for managing the application lifecycle
- **Scheduler Integration**: `YarnClusterManager` for cluster management, scheduler backends for resource requests
- **Resource Management**: `YarnAllocator` for container allocation, placement strategies for optimal resource utilization
- **Executor Integration**: YARN-specific executor backend with container management
- **Configuration System**: Comprehensive YARN-specific configuration options
- **Security Integration**: Delegation token management and Kerberos authentication support

## Capabilities

### Application Management

Core components for submitting and managing Spark applications on YARN clusters. Handles application submission, monitoring, and lifecycle management.

```scala { .api }
class Client(
  args: ClientArguments,
  sparkConf: SparkConf,
  rpcEnv: RpcEnv
)

class ApplicationMaster(
  args: ApplicationMasterArguments,
  sparkConf: SparkConf,
  yarnConf: YarnConfiguration
)
```

[Application Management](./application-management.md)

### Scheduler Integration

Integration components that connect Spark's task scheduling system with YARN's resource management. Provides a cluster manager and scheduler backends for both client and cluster modes.

```scala { .api }
class YarnClusterManager extends ExternalClusterManager

abstract class YarnSchedulerBackend(
  scheduler: TaskSchedulerImpl,
  sc: SparkContext
) extends CoarseGrainedSchedulerBackend

class YarnClientSchedulerBackend(
  scheduler: TaskSchedulerImpl,
  sc: SparkContext
) extends YarnSchedulerBackend

class YarnClusterSchedulerBackend(
  scheduler: TaskSchedulerImpl,
  sc: SparkContext
) extends YarnSchedulerBackend
```

[Scheduler Integration](./scheduler-integration.md)

### Resource Management

Components responsible for allocating and managing YARN containers for Spark executors. Includes allocation strategies, placement policies, and resource request management.

```scala { .api }
class YarnAllocator

class YarnRMClient

object ResourceRequestHelper

class LocalityPreferredContainerPlacementStrategy
```

[Resource Management](./resource-management.md)

### Configuration System

Comprehensive configuration system for YARN-specific settings, including resource allocation, security, and deployment options.

```scala { .api }
package object config {
  val APPLICATION_TAGS: ConfigEntry[Set[String]]
  val QUEUE_NAME: ConfigEntry[String]
  val AM_MEMORY: ConfigEntry[Long]
  val AM_CORES: ConfigEntry[Int]
  val EXECUTOR_NODE_LABEL_EXPRESSION: OptionalConfigEntry[String]
  // ... and many more configuration options
}

class ClientArguments(args: Array[String])
class ApplicationMasterArguments(args: Array[String])
```

[Configuration System](./configuration.md)
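Each `ConfigEntry` above is backed by a `spark.yarn.*` string key that can be set directly on a `SparkConf`. As a sketch in plain Scala (no Spark dependency), the entries listed map to keys roughly as follows; the values are illustrative only:

```scala
// String keys behind the ConfigEntry constants listed above, with illustrative
// (hypothetical) values. In an application these pairs would be applied via
// SparkConf.set(key, value) before creating the SparkContext.
val yarnSettings: Map[String, String] = Map(
  "spark.yarn.tags"      -> "etl,nightly",  // APPLICATION_TAGS
  "spark.yarn.queue"     -> "default",      // QUEUE_NAME
  "spark.yarn.am.memory" -> "1g",           // AM_MEMORY
  "spark.yarn.am.cores"  -> "1",            // AM_CORES
  "spark.yarn.executor.nodeLabelExpression" -> "spark" // EXECUTOR_NODE_LABEL_EXPRESSION
)
```

With a real `SparkConf` in scope, applying them would be `yarnSettings.foreach { case (k, v) => conf.set(k, v) }`.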

## Types

### Core Application Types

```scala { .api }
case class YarnAppReport(
  appState: YarnApplicationState,
  finalState: FinalApplicationStatus,
  diagnostics: Option[String]
)

class YarnClusterApplication extends SparkApplication {
  def start(args: Array[String], conf: SparkConf): Unit
}
```

### Scheduler Types

```scala { .api }
class YarnScheduler(sc: SparkContext) extends TaskSchedulerImpl
class YarnClusterScheduler(sc: SparkContext) extends YarnScheduler
```

### Executor Types

```scala { .api }
class YarnCoarseGrainedExecutorBackend extends CoarseGrainedExecutorBackend {
  def getUserClassPath: Seq[URL]
  def extractLogUrls: Map[String, String]
  def extractAttributes: Map[String, String]
}

class ExecutorRunnable {
  def run(): Unit
  def launchContextDebugInfo(): String
}
```

## Entry Points

### Primary Integration Points

- **yarn-client mode**: The driver runs on the local machine while executors run on the YARN cluster
- **yarn-cluster mode**: Both the driver and executors run on the YARN cluster
- **Programmatic submission**: Use the `Client` class for custom application submission
- **SparkSubmit integration**: Transparent integration when using `--master yarn`
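In practice, the deployment mode is usually selected through `spark-submit`. A minimal sketch of a cluster-mode invocation, assembled here as a plain argument list (the application class and jar names are hypothetical):

```scala
// spark-submit argument list selecting yarn-cluster mode.
// com.example.MyApp and myapp.jar are hypothetical placeholders.
val submitArgs: Seq[String] = Seq(
  "spark-submit",
  "--master", "yarn",
  "--deploy-mode", "cluster", // "client" would keep the driver on the local machine
  "--queue", "default",
  "--class", "com.example.MyApp",
  "myapp.jar"
)
```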

### Main Classes

- `ApplicationMaster.main()` - Entry point for cluster mode ApplicationMaster
- `YarnCoarseGrainedExecutorBackend.main()` - Entry point for executor processes
- `YarnClusterApplication.start()` - Entry point for programmatic cluster mode submission
- `ExecutorLauncher.main()` - Entry point for client mode executor launcher