tessl/maven-org-apache-spark--yarn-parent-2-10

YARN integration support for Apache Spark cluster computing, enabling Spark applications to run on Hadoop YARN clusters

Workspace: tessl
Visibility: Public
Describes: mavenpkg:maven/org.apache.spark/yarn-parent_2.10@1.2.x

To install, run

npx @tessl/cli install tessl/maven-org-apache-spark--yarn-parent-2-10@1.2.0

# Apache Spark YARN Integration

Apache Spark YARN Integration (`org.apache.spark:yarn-parent_2.10`) provides YARN (Yet Another Resource Negotiator) integration for Apache Spark, enabling Spark applications to run on Hadoop YARN clusters. It bundles the client, ApplicationMaster, scheduler backend, and resource management components used to deploy, manage, and monitor Spark applications in Hadoop ecosystems.

## Package Information

- **Package Name**: yarn-parent_2.10
- **Package Type**: maven
- **Language**: Scala
- **Installation**: Include as a Maven dependency: `org.apache.spark:yarn-parent_2.10:1.2.2`
- **Namespaces**: `org.apache.spark.deploy.yarn`, `org.apache.spark.scheduler.cluster`
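The Maven coordinates above can be declared in a `pom.xml` using standard Maven dependency syntax (the version shown matches the coordinates listed here):

```xml
<!-- Declares this package under its Maven coordinates -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>yarn-parent_2.10</artifactId>
  <version>1.2.2</version>
</dependency>
```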

## Core Imports

```scala
import org.apache.spark.deploy.yarn.{Client, ClientArguments, ApplicationMaster, ApplicationMasterArguments}
import org.apache.spark.deploy.yarn.{YarnRMClient, YarnAllocator, AllocationType}
import org.apache.spark.scheduler.cluster.{YarnClientSchedulerBackend, YarnClusterSchedulerBackend}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.yarn.api.records._
```

## Basic Usage

```scala
import org.apache.spark.deploy.yarn.{Client, ClientArguments}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.hadoop.conf.Configuration

// Create Spark configuration
val sparkConf = new SparkConf()
  .setAppName("My Spark App")
  .set("spark.executor.memory", "2g")
  .set("spark.executor.cores", "2")

// Configure YARN client
val hadoopConf = new Configuration()
val args = Array("--jar", "my-app.jar", "--class", "MyMainClass")
val clientArgs = new ClientArguments(args, sparkConf)

// Submit application to YARN
val client = new Client(clientArgs, hadoopConf, sparkConf)
// Application submission is handled by the Spark runtime
```

## Architecture

The Spark YARN integration is built around several key architectural components:

- **Multi-version Support**: Common abstractions with version-specific implementations for the YARN alpha (deprecated) and stable APIs
- **Client-Server Model**: The YARN client handles application submission; the ApplicationMaster manages the application lifecycle
- **Resource Management**: Centralized allocation and monitoring through dedicated resource management classes
- **Scheduler Integration**: Custom scheduler backends for both client and cluster deployment modes
- **Configuration-driven**: Extensive configuration options through `ClientArguments` and `SparkConf` integration

## Capabilities

### YARN Client Management

Core client functionality for submitting and managing Spark applications on YARN clusters. Handles the application lifecycle, resource negotiation, and monitoring.

```scala { .api }
class Client(
  args: ClientArguments,
  hadoopConf: Configuration,
  sparkConf: SparkConf
) extends ClientBase {
  def stop(): Unit
}

private[spark] class ClientArguments(args: Array[String], sparkConf: SparkConf) {
  var addJars: String
  var files: String
  var archives: String
  var userJar: String
  var userClass: String
  var userArgs: Seq[String]
  var executorMemory: Int
  var executorCores: Int
  var numExecutors: Int
  var amQueue: String
  var amMemory: Int
  var appName: String
  var priority: Int
  val amMemoryOverhead: Int
  val executorMemoryOverhead: Int
}
```

[YARN Client Management](./yarn-client.md)
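
`ClientArguments` is populated from a command-line-style flag array. As a hedged illustration, the array can be assembled with a helper like the one below; `clientFlags` is a hypothetical function (not part of this package), and the flag names `--jar`, `--class`, `--num-executors`, `--executor-memory` follow the Spark 1.x YARN client convention:

```scala
// Hypothetical helper (not part of this package) assembling the flag
// array that ClientArguments parses. Flag names follow the Spark 1.x
// YARN client convention.
object ClientFlagsDemo {
  def clientFlags(
      jar: String,
      mainClass: String,
      numExecutors: Int = 2,
      executorMemory: String = "1g"): Array[String] =
    Array(
      "--jar", jar,
      "--class", mainClass,
      "--num-executors", numExecutors.toString,
      "--executor-memory", executorMemory)

  def main(args: Array[String]): Unit =
    // Print the flags exactly as ClientArguments would receive them
    println(clientFlags("my-app.jar", "MyMainClass").mkString(" "))
}
```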

### Application Master

ApplicationMaster functionality for managing Spark applications running on YARN. Handles resource negotiation with the ResourceManager and executor lifecycle management.

```scala { .api }
class ApplicationMaster(
  args: ApplicationMasterArguments,
  client: YarnRMClient
) {
  // Application lifecycle management
  // Resource negotiation with the YARN ResourceManager
  // Executor management and monitoring
}

class ApplicationMasterArguments(val args: Array[String]) {
  var userJar: String
  var userClass: String
  var userArgs: Seq[String]
  var executorMemory: Int
  var executorCores: Int
  var numExecutors: Int

  def printUsageAndExit(exitCode: Int, unknownParam: Any = null): Unit
}
```

[Application Master](./application-master.md)
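
`ApplicationMasterArguments` walks a flag/value array of this shape. A minimal sketch of that parsing style (`parseFlags` is hypothetical, not the actual implementation):

```scala
// Minimal sketch (not the actual implementation) of the flag/value
// parsing style used by ApplicationMasterArguments.
object AmArgsSketch {
  @annotation.tailrec
  def parseFlags(args: List[String], acc: Map[String, String] = Map.empty): Map[String, String] =
    args match {
      case flag :: value :: rest if flag.startsWith("--") =>
        parseFlags(rest, acc + (flag -> value)) // consume one --flag value pair
      case Nil =>
        acc                                     // all arguments consumed
      case other :: _ =>
        sys.error(s"Unknown/unparsed argument: $other")
    }

  def main(args: Array[String]): Unit = {
    val parsed = parseFlags(List("--jar", "app.jar", "--num-executors", "4"))
    println(parsed("--num-executors")) // prints "4"
  }
}
```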

### Scheduler Backends

Scheduler backend implementations for integrating Spark's TaskScheduler with YARN resource management, supporting both client and cluster deployment modes.

```scala { .api }
class YarnClientSchedulerBackend(
  scheduler: TaskSchedulerImpl,
  sc: SparkContext
) extends YarnSchedulerBackend {
  def start(): Unit
}

class YarnClusterSchedulerBackend extends YarnSchedulerBackend

class YarnClientClusterScheduler(sc: SparkContext) extends TaskSchedulerImpl

class YarnClusterScheduler(sc: SparkContext) extends TaskSchedulerImpl
```

[Scheduler Backends](./scheduler-backends.md)

### Resource Management

Resource allocation and management components for negotiating and monitoring YARN cluster resources for Spark executors.

```scala { .api }
trait YarnRMClient {
  // ResourceManager client interface
}

abstract class YarnAllocator {
  // Abstract base class for YARN resource allocation logic
}

object AllocationType extends Enumeration {
  // Enumeration for YARN allocation types
}
```

[Resource Management](./resource-management.md)

### Utilities and Configuration

Utility classes and configuration management for YARN-specific operations, distributed cache management, and executor container handling.

```scala { .api }
class YarnSparkHadoopUtil extends SparkHadoopUtil {
  // YARN-specific Hadoop utilities
}

class ClientDistributedCacheManager {
  // Manages the distributed cache for YARN applications
}

trait ExecutorRunnableUtil {
  // Utility trait for executor container management
}
```

[Utilities and Configuration](./utilities.md)

## Types

```scala { .api }
// Core argument and configuration types
private[spark] class ClientArguments(args: Array[String], sparkConf: SparkConf)
class ApplicationMasterArguments(val args: Array[String])

// Resource management interfaces
trait YarnRMClient
class YarnRMClientImpl(args: ApplicationMasterArguments) extends YarnRMClient
private[yarn] abstract class YarnAllocator(
  conf: Configuration,
  sparkConf: SparkConf,
  appAttemptId: ApplicationAttemptId,
  args: ApplicationMasterArguments,
  preferredNodes: collection.Map[String, collection.Set[SplitInfo]],
  securityMgr: SecurityManager
) extends Logging

// Allocation strategy enumeration
object AllocationType extends Enumeration {
  type AllocationType = Value
  val HOST, RACK, ANY = Value
}

// Scheduler and backend types
private[spark] abstract class YarnSchedulerBackend extends CoarseGrainedSchedulerBackend
private[spark] class TaskSchedulerImpl extends TaskScheduler

// Client and utility traits
private[spark] trait ClientBase
trait ExecutorRunnableUtil
```
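
The `AllocationType` enumeration above is a plain Scala `Enumeration`. A self-contained copy shows the locality-preference values it exposes:

```scala
// Self-contained copy of the AllocationType pattern: a Scala Enumeration
// of locality preference levels for container requests.
object AllocationTypeDemo extends Enumeration {
  type AllocationType = Value
  val HOST, RACK, ANY = Value // declaration order fixes the ids 0, 1, 2
}

object AllocationMain {
  def main(args: Array[String]): Unit =
    // values is sorted by id, so this prints HOST,RACK,ANY
    println(AllocationTypeDemo.values.mkString(","))
}
```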

## Deployment Modes

The YARN integration supports two primary deployment modes:

- **Client Mode**: The driver runs on the client machine; the ApplicationMaster manages only the executors
- **Cluster Mode**: The driver runs inside the ApplicationMaster on the YARN cluster

Both modes are handled transparently through selection of the appropriate scheduler backend.
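
As a hedged sketch of how mode maps to backend: the `yarn-client` / `yarn-cluster` master strings are the Spark 1.x convention, and `backendFor` below is a hypothetical helper, not this package's actual selection logic:

```scala
// Hypothetical helper (not the package's actual selection logic) mapping
// the Spark 1.x master string to the scheduler backend named above.
object ModeSketch {
  def backendFor(master: String): String = master match {
    case "yarn-client"  => "org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend"
    case "yarn-cluster" => "org.apache.spark.scheduler.cluster.YarnClusterSchedulerBackend"
    case other          => sys.error(s"Not a YARN master: $other")
  }

  def main(args: Array[String]): Unit =
    println(backendFor("yarn-cluster"))
}
```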

## Version Compatibility

This package supports multiple Hadoop YARN API versions:

- **Alpha API** (deprecated): Hadoop 0.23 and 2.0.x; marked deprecated in Spark 1.2
- **Stable API** (recommended): Hadoop 2.2+; the current implementation

The build system automatically selects the appropriate implementation based on Maven profiles.