YARN integration support for Apache Spark cluster computing, enabling Spark applications to run on Hadoop YARN clusters
npx @tessl/cli install tessl/maven-org-apache-spark--yarn-parent-2-10@1.2.00
# Apache Spark YARN Integration

Apache Spark YARN Integration (`org.apache.spark:yarn-parent_2.10`) integrates Apache Spark with YARN (Yet Another Resource Negotiator) so that Spark applications can run on Hadoop YARN clusters. The package covers deploying, managing, and monitoring Spark applications on Hadoop clusters.

## Package Information

- **Package Name**: yarn-parent_2.10
- **Package Type**: maven
- **Language**: Scala
- **Installation**: Include as Maven dependency: `org.apache.spark:yarn-parent_2.10:1.2.2`
- **Namespace**: `org.apache.spark.deploy.yarn`, `org.apache.spark.scheduler.cluster`

## Core Imports

```scala
import org.apache.spark.deploy.yarn.{Client, ClientArguments, ApplicationMaster, ApplicationMasterArguments}
import org.apache.spark.deploy.yarn.{YarnRMClient, YarnAllocator, AllocationType}
import org.apache.spark.scheduler.cluster.{YarnClientSchedulerBackend, YarnClusterSchedulerBackend}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.yarn.api.records._
```

## Basic Usage

```scala
import org.apache.spark.deploy.yarn.{Client, ClientArguments}
import org.apache.spark.SparkConf
import org.apache.hadoop.conf.Configuration

// Note: ClientArguments is declared private[spark] (see Types below), so this
// snippet compiles only from code placed in the org.apache.spark package
// or one of its subpackages.

// Create Spark configuration
val sparkConf = new SparkConf()
  .setAppName("My Spark App")
  .set("spark.executor.memory", "2g")
  .set("spark.executor.cores", "2")

// Configure YARN client
val hadoopConf = new Configuration()
val args = Array("--jar", "my-app.jar", "--class", "MyMainClass")
val clientArgs = new ClientArguments(args, sparkConf)

// Submit application to YARN
val client = new Client(clientArgs, hadoopConf, sparkConf)
// Application submission handled by Spark runtime
```

## Architecture

The Spark YARN integration is built around several key architectural components:

- **Multi-version Support**: Common abstractions with version-specific implementations for the YARN alpha (deprecated) and stable APIs
- **Client-Server Model**: The YARN client handles application submission; the ApplicationMaster manages the application lifecycle
- **Resource Management**: Centralized allocation and monitoring through dedicated resource management classes
- **Scheduler Integration**: Custom scheduler backends for both client and cluster deployment modes
- **Configuration-driven**: Extensive configuration options through `ClientArguments` and `SparkConf`

## Capabilities

### YARN Client Management

Core client functionality for submitting and managing Spark applications on YARN clusters. Handles application lifecycle, resource negotiation, and monitoring.

```scala { .api }
class Client(
  args: ClientArguments,
  hadoopConf: Configuration,
  sparkConf: SparkConf
) extends ClientBase {
  def stop(): Unit
}

private[spark] class ClientArguments(args: Array[String], sparkConf: SparkConf) {
  var addJars: String
  var files: String
  var archives: String
  var userJar: String
  var userClass: String
  var userArgs: Seq[String]
  var executorMemory: Int
  var executorCores: Int
  var numExecutors: Int
  var amQueue: String
  var amMemory: Int
  var appName: String
  var priority: Int
  val amMemoryOverhead: Int
  val executorMemoryOverhead: Int
}
```

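`ClientArguments` exposes `executorMemory` alongside `executorMemoryOverhead`, which suggests that each executor container is sized as heap plus overhead. The following is a minimal, self-contained sketch of that arithmetic and does not call the package's API. The object `ContainerSizing`, its method names, and the constants (a 7% factor with a 384 MB floor) are assumptions chosen for illustration; check the defaults of your Spark version before relying on them. Under these assumed constants, a 2048 MB executor would request 2432 MB from YARN.

```scala
// Illustrative sketch only: not part of the yarn-parent API.
object ContainerSizing {
  // Assumed defaults (hypothetical here): overhead is a fraction of the
  // executor memory, with a fixed floor. Both values are in megabytes.
  val OverheadFactor: Double = 0.07
  val OverheadMinMb: Int = 384

  /** Overhead in MB: the larger of the fractional overhead and the floor. */
  def memoryOverheadMb(executorMemoryMb: Int): Int =
    math.max((OverheadFactor * executorMemoryMb).toInt, OverheadMinMb)

  /** Total memory requested from YARN for one executor container, in MB. */
  def containerMemoryMb(executorMemoryMb: Int): Int =
    executorMemoryMb + memoryOverheadMb(executorMemoryMb)
}
```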
[YARN Client Management](./yarn-client.md)

### Application Master

ApplicationMaster functionality for managing Spark applications running on YARN. Handles resource negotiation with ResourceManager and executor lifecycle management.

```scala { .api }
class ApplicationMaster(
  args: ApplicationMasterArguments,
  client: YarnRMClient
) {
  // Application lifecycle management
  // Resource negotiation with YARN ResourceManager
  // Executor management and monitoring
}

class ApplicationMasterArguments(val args: Array[String]) {
  var userJar: String
  var userClass: String
  var userArgs: Seq[String]
  var executorMemory: Int
  var executorCores: Int
  var numExecutors: Int
  def printUsageAndExit(exitCode: Int, unknownParam: Any = null): Unit
}
```

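The argument classes turn a flat `Array[String]` into named fields such as `userJar`, `userClass`, and `userArgs`. The sketch below shows one way such parsing could be written with a recursive pattern match; it is not the package's implementation. `ArgSketch` and `ParsedArgs` are hypothetical names, the `--jar` and `--class` flags come from the Basic Usage example, and `--arg` is an assumed flag for user arguments. Where the listed API has `printUsageAndExit`, this sketch throws an exception instead so that it stays self-contained.

```scala
// Illustrative sketch only: not the actual ApplicationMasterArguments parser.
object ArgSketch {
  // Hypothetical holder; field names follow the API listing above.
  final case class ParsedArgs(
    userJar: Option[String] = None,
    userClass: Option[String] = None,
    userArgs: Seq[String] = Seq.empty
  )

  // Consume flag/value pairs from the front of the list until it is empty.
  @scala.annotation.tailrec
  def parse(args: List[String], acc: ParsedArgs = ParsedArgs()): ParsedArgs =
    args match {
      case "--jar" :: value :: tail =>
        parse(tail, acc.copy(userJar = Some(value)))
      case "--class" :: value :: tail =>
        parse(tail, acc.copy(userClass = Some(value)))
      case "--arg" :: value :: tail => // assumed flag, repeated once per argument
        parse(tail, acc.copy(userArgs = acc.userArgs :+ value))
      case Nil =>
        acc
      case unknown :: _ =>
        // Reached for unrecognised flags and for flags missing their value.
        throw new IllegalArgumentException(
          s"Unknown or incomplete parameter: $unknown")
    }
}
```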
[Application Master](./application-master.md)

### Scheduler Backends

Scheduler backend implementations for integrating Spark's TaskScheduler with YARN resource management, supporting both client and cluster deployment modes.

```scala { .api }
class YarnClientSchedulerBackend(
  scheduler: TaskSchedulerImpl,
  sc: SparkContext
) extends YarnSchedulerBackend {
  def start(): Unit
}

class YarnClusterSchedulerBackend extends YarnSchedulerBackend

class YarnClientClusterScheduler(sc: SparkContext) extends TaskSchedulerImpl

class YarnClusterScheduler(sc: SparkContext) extends TaskSchedulerImpl
```

[Scheduler Backends](./scheduler-backends.md)

### Resource Management

Resource allocation and management components for negotiating and monitoring YARN cluster resources for Spark executors.

```scala { .api }
trait YarnRMClient {
  // ResourceManager client interface
}

abstract class YarnAllocator {
  // Abstract base class for YARN resource allocation logic
}

object AllocationType extends Enumeration {
  // Enumeration for YARN allocation types
}
```

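The Types section of this document defines `AllocationType` with the values `HOST`, `RACK`, and `ANY`. As a rough illustration of how such an enumeration could drive locality decisions, the sketch below redeclares it locally so that the example is self-contained, then picks the narrowest level that has a candidate. `LocalitySketch` and `narrowest` are hypothetical names, and the fallback order from host to rack to any is an assumption, not something the package documents here.

```scala
// Illustrative sketch only: not the YarnAllocator logic.
object LocalitySketch {
  // Local copy of the enumeration so the example runs without Spark on the classpath.
  object AllocationType extends Enumeration {
    type AllocationType = Value
    val HOST, RACK, ANY = Value
  }
  import AllocationType._

  /** Hypothetical helper: prefer a host-local container, then rack-local, then any node. */
  def narrowest(hostAvailable: Boolean, rackAvailable: Boolean): AllocationType =
    if (hostAvailable) HOST
    else if (rackAvailable) RACK
    else ANY

  /** Enumeration values iterate in declaration order. */
  def levels: List[AllocationType] = AllocationType.values.toList
}
```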
[Resource Management](./resource-management.md)

### Utilities and Configuration

Utility classes and configuration management for YARN-specific operations, distributed cache management, and executor container handling.

```scala { .api }
class YarnSparkHadoopUtil extends SparkHadoopUtil {
  // YARN-specific Hadoop utilities
}

class ClientDistributedCacheManager {
  // Manages distributed cache for YARN applications
}

trait ExecutorRunnableUtil {
  // Utility trait for executor container management
}
```

[Utilities and Configuration](./utilities.md)

## Types

```scala { .api }
// Core argument and configuration types
private[spark] class ClientArguments(args: Array[String], sparkConf: SparkConf)
class ApplicationMasterArguments(val args: Array[String])

// Resource management interfaces
trait YarnRMClient
class YarnRMClientImpl(args: ApplicationMasterArguments) extends YarnRMClient
private[yarn] abstract class YarnAllocator(
  conf: Configuration,
  sparkConf: SparkConf,
  appAttemptId: ApplicationAttemptId,
  args: ApplicationMasterArguments,
  preferredNodes: collection.Map[String, collection.Set[SplitInfo]],
  securityMgr: SecurityManager
) extends Logging

// Allocation strategy enumeration
object AllocationType extends Enumeration {
  type AllocationType = Value
  val HOST, RACK, ANY = Value
}

// Scheduler and backend types
private[spark] abstract class YarnSchedulerBackend extends CoarseGrainedSchedulerBackend
private[spark] class TaskSchedulerImpl extends TaskScheduler

// Client and utility traits
private[spark] trait ClientBase
trait ExecutorRunnableUtil
```

## Deployment Modes

The YARN integration supports two primary deployment modes:

- **Client Mode**: The driver runs on the client machine; the ApplicationMaster manages only executors
- **Cluster Mode**: The driver runs inside the ApplicationMaster on the YARN cluster

Both modes are handled transparently: Spark selects the scheduler backend that matches the mode.

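As a rough illustration of that selection, the sketch below maps a master string to a deployment mode and names the backend class listed under Scheduler Backends for it. It is plain Scala and does not touch Spark itself. `DeployModeSketch` and its members are hypothetical, and the master strings `yarn-client` and `yarn-cluster` are assumptions not stated in this document.

```scala
// Illustrative sketch only: not the selection code inside Spark.
object DeployModeSketch {
  sealed trait YarnMode
  case object ClientMode extends YarnMode
  case object ClusterMode extends YarnMode

  /** Assumed master strings; anything else is treated as a non-YARN master. */
  def modeFor(master: String): Option[YarnMode] = master match {
    case "yarn-client"  => Some(ClientMode)
    case "yarn-cluster" => Some(ClusterMode)
    case _              => None
  }

  /** Backend class name for each mode, as listed under Scheduler Backends. */
  def backendFor(mode: YarnMode): String = mode match {
    case ClientMode  => "YarnClientSchedulerBackend"
    case ClusterMode => "YarnClusterSchedulerBackend"
  }

  /** Where the driver runs in each mode, per the list above. */
  def driverLocation(mode: YarnMode): String = mode match {
    case ClientMode  => "client machine"
    case ClusterMode => "ApplicationMaster"
  }
}
```

Modelling the mode as a sealed trait lets the compiler warn when a match leaves out one of the two modes.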
## Version Compatibility

This package provides support for multiple Hadoop YARN API versions:

- **Alpha API** (Deprecated): Hadoop 0.23 and 2.0.x, marked deprecated in Spark 1.2
- **Stable API** (Recommended): Hadoop 2.2+, the current implementation

The build system automatically selects the appropriate implementation based on Maven profiles.
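
The version ranges listed above can be written down as a small classifier. This is only a sketch of the mapping, not the build's selection mechanism, which works through Maven profiles. `YarnApiSketch` and `apiFor` are hypothetical names, and versions the list does not mention (for example 2.1.x or anything outside the 0.23 and 2.x lines) are deliberately left unclassified.

```scala
// Illustrative sketch only: the real choice is made by Maven profiles at build time.
object YarnApiSketch {
  sealed trait YarnApi
  case object Alpha extends YarnApi
  case object Stable extends YarnApi

  // Captures the major and minor components of a version such as "2.4.0".
  private val Version = """^(\d+)\.(\d+).*""".r

  /** Classify a Hadoop version string using only the ranges stated in this document. */
  def apiFor(hadoopVersion: String): Option[YarnApi] = hadoopVersion match {
    case Version(major, minor) =>
      (major.toInt, minor.toInt) match {
        case (0, 23) | (2, 0) => Some(Alpha)  // Hadoop 0.23 and 2.0.x
        case (2, m) if m >= 2 => Some(Stable) // Hadoop 2.2+
        case _                => None         // not covered by the list above
      }
    case _ => None
  }
}
```

For instance, `YarnApiSketch.apiFor("2.4.0")` returns `Some(Stable)`, while `"2.0.5-alpha"` maps to `Some(Alpha)`.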