tessl/maven-org-apache-spark--yarn-parent-2-10

YARN integration support for Apache Spark cluster computing, enabling Spark applications to run on Hadoop YARN clusters

Workspace: tessl
Visibility: Public
Describes: mavenpkg:maven/org.apache.spark/yarn-parent_2.10@1.2.x

To install, run

npx @tessl/cli install tessl/maven-org-apache-spark--yarn-parent-2-10@1.2.0

# Apache Spark YARN Integration

Apache Spark YARN Integration (`org.apache.spark:yarn-parent_2.10`) provides YARN (Yet Another Resource Negotiator) integration for Apache Spark, enabling Spark applications to run on Hadoop YARN clusters. It bundles the client, ApplicationMaster, scheduler backend, and resource management components used to deploy, manage, and monitor Spark applications in Hadoop ecosystems.

## Package Information

- **Package Name**: yarn-parent_2.10
- **Package Type**: maven
- **Language**: Scala
- **Installation**: Include as a Maven dependency: `org.apache.spark:yarn-parent_2.10:1.2.2`
- **Namespaces**: `org.apache.spark.deploy.yarn`, `org.apache.spark.scheduler.cluster`
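The Maven coordinates above can be declared in a `pom.xml` using standard Maven dependency syntax (the version shown matches the coordinates listed here):

```xml
<!-- Declares this package under its Maven coordinates -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>yarn-parent_2.10</artifactId>
  <version>1.2.2</version>
</dependency>
```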

## Core Imports

```scala
import org.apache.spark.deploy.yarn.{Client, ClientArguments, ApplicationMaster, ApplicationMasterArguments}
import org.apache.spark.deploy.yarn.{YarnRMClient, YarnAllocator, AllocationType}
import org.apache.spark.scheduler.cluster.{YarnClientSchedulerBackend, YarnClusterSchedulerBackend}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.yarn.api.records._
```

## Basic Usage

```scala
import org.apache.spark.deploy.yarn.{Client, ClientArguments}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.hadoop.conf.Configuration

// Create Spark configuration
val sparkConf = new SparkConf()
  .setAppName("My Spark App")
  .set("spark.executor.memory", "2g")
  .set("spark.executor.cores", "2")

// Configure YARN client
val hadoopConf = new Configuration()
val args = Array("--jar", "my-app.jar", "--class", "MyMainClass")
val clientArgs = new ClientArguments(args, sparkConf)

// Submit application to YARN
val client = new Client(clientArgs, hadoopConf, sparkConf)
// Application submission is handled by the Spark runtime
```

## Architecture

The Spark YARN integration is built around several key architectural components:

- **Multi-version Support**: Common abstractions with version-specific implementations for the YARN alpha (deprecated) and stable APIs
- **Client-Server Model**: The YARN client handles application submission; the ApplicationMaster manages the application lifecycle
- **Resource Management**: Centralized allocation and monitoring through dedicated resource management classes
- **Scheduler Integration**: Custom scheduler backends for both client and cluster deployment modes
- **Configuration-driven**: Extensive configuration options through `ClientArguments` and `SparkConf` integration

## Capabilities

### YARN Client Management

Core client functionality for submitting and managing Spark applications on YARN clusters. Handles the application lifecycle, resource negotiation, and monitoring.

```scala { .api }
class Client(
  args: ClientArguments,
  hadoopConf: Configuration,
  sparkConf: SparkConf
) extends ClientBase {
  def stop(): Unit
}

private[spark] class ClientArguments(args: Array[String], sparkConf: SparkConf) {
  var addJars: String
  var files: String
  var archives: String
  var userJar: String
  var userClass: String
  var userArgs: Seq[String]
  var executorMemory: Int
  var executorCores: Int
  var numExecutors: Int
  var amQueue: String
  var amMemory: Int
  var appName: String
  var priority: Int
  val amMemoryOverhead: Int
  val executorMemoryOverhead: Int
}
```

[YARN Client Management](./yarn-client.md)
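
`ClientArguments` is populated from a command-line-style flag array. As a hedged illustration, the array can be assembled with a helper like the one below; `clientFlags` is a hypothetical function (not part of this package), and the flag names `--jar`, `--class`, `--num-executors`, `--executor-memory` follow the Spark 1.x YARN client convention:

```scala
// Hypothetical helper (not part of this package) assembling the flag
// array that ClientArguments parses. Flag names follow the Spark 1.x
// YARN client convention.
object ClientFlagsDemo {
  def clientFlags(
      jar: String,
      mainClass: String,
      numExecutors: Int = 2,
      executorMemory: String = "1g"): Array[String] =
    Array(
      "--jar", jar,
      "--class", mainClass,
      "--num-executors", numExecutors.toString,
      "--executor-memory", executorMemory)

  def main(args: Array[String]): Unit =
    // Print the flags exactly as ClientArguments would receive them
    println(clientFlags("my-app.jar", "MyMainClass").mkString(" "))
}
```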

### Application Master

ApplicationMaster functionality for managing Spark applications running on YARN. Handles resource negotiation with the ResourceManager and executor lifecycle management.

```scala { .api }
class ApplicationMaster(
  args: ApplicationMasterArguments,
  client: YarnRMClient
) {
  // Application lifecycle management
  // Resource negotiation with the YARN ResourceManager
  // Executor management and monitoring
}

class ApplicationMasterArguments(val args: Array[String]) {
  var userJar: String
  var userClass: String
  var userArgs: Seq[String]
  var executorMemory: Int
  var executorCores: Int
  var numExecutors: Int

  def printUsageAndExit(exitCode: Int, unknownParam: Any = null): Unit
}
```

[Application Master](./application-master.md)
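
`ApplicationMasterArguments` walks a flag/value array of this shape. A minimal sketch of that parsing style (`parseFlags` is hypothetical, not the actual implementation):

```scala
// Minimal sketch (not the actual implementation) of the flag/value
// parsing style used by ApplicationMasterArguments.
object AmArgsSketch {
  @annotation.tailrec
  def parseFlags(args: List[String], acc: Map[String, String] = Map.empty): Map[String, String] =
    args match {
      case flag :: value :: rest if flag.startsWith("--") =>
        parseFlags(rest, acc + (flag -> value)) // consume one --flag value pair
      case Nil =>
        acc                                     // all arguments consumed
      case other :: _ =>
        sys.error(s"Unknown/unparsed argument: $other")
    }

  def main(args: Array[String]): Unit = {
    val parsed = parseFlags(List("--jar", "app.jar", "--num-executors", "4"))
    println(parsed("--num-executors")) // prints "4"
  }
}
```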

### Scheduler Backends

Scheduler backend implementations for integrating Spark's TaskScheduler with YARN resource management, supporting both client and cluster deployment modes.

```scala { .api }
class YarnClientSchedulerBackend(
  scheduler: TaskSchedulerImpl,
  sc: SparkContext
) extends YarnSchedulerBackend {
  def start(): Unit
}

class YarnClusterSchedulerBackend extends YarnSchedulerBackend

class YarnClientClusterScheduler(sc: SparkContext) extends TaskSchedulerImpl

class YarnClusterScheduler(sc: SparkContext) extends TaskSchedulerImpl
```

[Scheduler Backends](./scheduler-backends.md)

### Resource Management

Resource allocation and management components for negotiating and monitoring YARN cluster resources for Spark executors.

```scala { .api }
trait YarnRMClient {
  // ResourceManager client interface
}

abstract class YarnAllocator {
  // Abstract base class for YARN resource allocation logic
}

object AllocationType extends Enumeration {
  // Enumeration for YARN allocation types
}
```

[Resource Management](./resource-management.md)

### Utilities and Configuration

Utility classes and configuration management for YARN-specific operations, distributed cache management, and executor container handling.

```scala { .api }
class YarnSparkHadoopUtil extends SparkHadoopUtil {
  // YARN-specific Hadoop utilities
}

class ClientDistributedCacheManager {
  // Manages the distributed cache for YARN applications
}

trait ExecutorRunnableUtil {
  // Utility trait for executor container management
}
```

[Utilities and Configuration](./utilities.md)

## Types

```scala { .api }
// Core argument and configuration types
private[spark] class ClientArguments(args: Array[String], sparkConf: SparkConf)
class ApplicationMasterArguments(val args: Array[String])

// Resource management interfaces
trait YarnRMClient
class YarnRMClientImpl(args: ApplicationMasterArguments) extends YarnRMClient
private[yarn] abstract class YarnAllocator(
  conf: Configuration,
  sparkConf: SparkConf,
  appAttemptId: ApplicationAttemptId,
  args: ApplicationMasterArguments,
  preferredNodes: collection.Map[String, collection.Set[SplitInfo]],
  securityMgr: SecurityManager
) extends Logging

// Allocation strategy enumeration
object AllocationType extends Enumeration {
  type AllocationType = Value
  val HOST, RACK, ANY = Value
}

// Scheduler and backend types
private[spark] abstract class YarnSchedulerBackend extends CoarseGrainedSchedulerBackend
private[spark] class TaskSchedulerImpl extends TaskScheduler

// Client and utility traits
private[spark] trait ClientBase
trait ExecutorRunnableUtil
```
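
The `AllocationType` enumeration above is a plain Scala `Enumeration`. A self-contained copy shows the locality-preference values it exposes:

```scala
// Self-contained copy of the AllocationType pattern: a Scala Enumeration
// of locality preference levels for container requests.
object AllocationTypeDemo extends Enumeration {
  type AllocationType = Value
  val HOST, RACK, ANY = Value // declaration order fixes the ids 0, 1, 2
}

object AllocationMain {
  def main(args: Array[String]): Unit =
    // values is sorted by id, so this prints HOST,RACK,ANY
    println(AllocationTypeDemo.values.mkString(","))
}
```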

## Deployment Modes

The YARN integration supports two primary deployment modes:

- **Client Mode**: The driver runs on the client machine; the ApplicationMaster manages only the executors
- **Cluster Mode**: The driver runs inside the ApplicationMaster on the YARN cluster

Both modes are handled transparently through selection of the appropriate scheduler backend.
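
As a hedged sketch of how mode maps to backend: the `yarn-client` / `yarn-cluster` master strings are the Spark 1.x convention, and `backendFor` below is a hypothetical helper, not this package's actual selection logic:

```scala
// Hypothetical helper (not the package's actual selection logic) mapping
// the Spark 1.x master string to the scheduler backend named above.
object ModeSketch {
  def backendFor(master: String): String = master match {
    case "yarn-client"  => "org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend"
    case "yarn-cluster" => "org.apache.spark.scheduler.cluster.YarnClusterSchedulerBackend"
    case other          => sys.error(s"Not a YARN master: $other")
  }

  def main(args: Array[String]): Unit =
    println(backendFor("yarn-cluster"))
}
```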

## Version Compatibility

This package supports multiple Hadoop YARN API versions:

- **Alpha API** (deprecated): Hadoop 0.23 and 2.0.x; marked deprecated in Spark 1.2
- **Stable API** (recommended): Hadoop 2.2+; the current implementation

The build system automatically selects the appropriate implementation based on Maven profiles.