# Context and Configuration

## SparkContext

The main entry point for Spark functionality. A SparkContext represents the connection to a Spark cluster and is used to create RDDs, accumulators, and broadcast variables. Only one SparkContext should be active per JVM; call `stop()` on the active context before creating a new one.
```scala { .api }
class SparkContext(config: SparkConf) {
  // RDD Creation
  def parallelize[T: ClassTag](seq: Seq[T], numSlices: Int = defaultParallelism): RDD[T]
  def textFile(path: String, minPartitions: Int = defaultMinPartitions): RDD[String]
  def wholeTextFiles(path: String, minPartitions: Int = defaultMinPartitions): RDD[(String, String)]

  // Hadoop Integration
  def newAPIHadoopFile[K, V, F <: NewInputFormat[K, V]](
      path: String,
      fClass: Class[F],
      kClass: Class[K],
      vClass: Class[V],
      conf: Configuration = hadoopConfiguration
  ): RDD[(K, V)]

  def hadoopFile[K, V](
      path: String,
      inputFormatClass: Class[_ <: InputFormat[K, V]],
      keyClass: Class[K],
      valueClass: Class[V],
      minPartitions: Int = defaultMinPartitions
  ): RDD[(K, V)]

  // Shared Variables
  def broadcast[T: ClassTag](value: T): Broadcast[T]
  def longAccumulator(): LongAccumulator
  def longAccumulator(name: String): LongAccumulator
  def doubleAccumulator(): DoubleAccumulator
  def doubleAccumulator(name: String): DoubleAccumulator
  def collectionAccumulator[T](): CollectionAccumulator[T]
  def collectionAccumulator[T](name: String): CollectionAccumulator[T]

  // Configuration and Properties
  def getConf: SparkConf
  def master: String
  def appName: String
  def applicationId: String
  def defaultParallelism: Int
  val startTime: Long
  def version: String

  // File and Jar Management
  def addFile(path: String): Unit
  def addFile(path: String, recursive: Boolean): Unit
  def addJar(path: String): Unit

  // Job Management
  def setJobGroup(groupId: String, description: String, interruptOnCancel: Boolean = false): Unit
  def clearJobGroup(): Unit
  def cancelJobGroup(groupId: String): Unit

  // Additional RDD Creation
  def binaryFiles(path: String, minPartitions: Int = defaultMinPartitions): RDD[(String, PortableDataStream)]
  def union[T: ClassTag](rdds: Seq[RDD[T]]): RDD[T]
  def union[T: ClassTag](first: RDD[T], rest: RDD[T]*): RDD[T]

  // Lifecycle Management
  def stop(): Unit
  def statusTracker: SparkStatusTracker
}
```

### Usage Examples

Creating a SparkContext:
```scala
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("My Application")
  .setMaster("local[4]") // Use 4 local threads

val sc = new SparkContext(conf)

// Use SparkContext...

sc.stop() // Always stop when done
```

Reading data:
```scala
// From a local collection
val numbers = sc.parallelize(1 to 1000, 8) // 8 partitions

// From a text file
val lines = sc.textFile("hdfs://path/to/file.txt")

// Multiple text files as (filename, content) pairs
val files = sc.wholeTextFiles("hdfs://path/to/directory/")
```
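
The shared-variable APIs (broadcast variables and accumulators) can be sketched as follows; the lookup data and accumulator name are illustrative:

```scala
// Ship a read-only lookup table to every executor once
val lookup = sc.broadcast(Map("a" -> 1, "b" -> 2))

// Named accumulator, visible in the web UI
val badRecords = sc.longAccumulator("badRecords")

val mapped = sc.parallelize(Seq("a", "b", "x")).map { key =>
  if (!lookup.value.contains(key)) badRecords.add(1)
  lookup.value.getOrElse(key, 0)
}

mapped.collect()          // Array(1, 2, 0)
println(badRecords.value) // 1
```

Note that accumulator updates made inside tasks only become visible on the driver after an action has executed, since transformations like `map` are lazy.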
## SparkConf

Configuration object for Spark applications. Settings are key-value pairs, and the setters return the `SparkConf` itself, so calls can be chained.

```scala { .api }
class SparkConf(loadDefaults: Boolean = true) {
  // Core Configuration
  def set(key: String, value: String): SparkConf
  def setMaster(master: String): SparkConf
  def setAppName(name: String): SparkConf
  def setJars(jars: Seq[String]): SparkConf
  def setExecutorEnv(variables: Seq[(String, String)]): SparkConf
  def setExecutorEnv(variable: String, value: String): SparkConf

  // Retrieval
  def get(key: String): String
  def get(key: String, defaultValue: String): String
  def getAll: Array[(String, String)]
  def contains(key: String): Boolean
  def getOption(key: String): Option[String]

  // Spark-specific Settings
  def setSparkHome(home: String): SparkConf
  def setExecutorMemory(mem: String): SparkConf
  def setAll(settings: Traversable[(String, String)]): SparkConf

  // Management
  def clone(): SparkConf
  def remove(key: String): SparkConf
}
```
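
A short sketch of the retrieval methods (the unset keys are illustrative):

```scala
val conf = new SparkConf()
  .setAppName("Lookup Demo")
  .set("spark.executor.memory", "4g")

conf.get("spark.executor.memory")        // "4g"
conf.get("spark.executor.cores", "2")    // "2" -- the supplied default, if the key is unset
conf.getOption("spark.eventLog.enabled") // None, unless set elsewhere (e.g. system properties)
conf.contains("spark.app.name")          // true -- setAppName sets this key
```

`get(key)` with no default throws `NoSuchElementException` for a missing key, so prefer `getOption` when a key may be absent.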

### Configuration Examples

Basic configuration:
```scala
val conf = new SparkConf()
  .setAppName("Data Processing Job")
  .setMaster("spark://master:7077")
  .set("spark.executor.memory", "4g")
  .set("spark.executor.cores", "4")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
```

Loading from properties file:
```scala
// With loadDefaults = true (the default), any spark.* Java system
// properties are picked up; spark-submit populates these from
// spark-defaults.conf
val conf = new SparkConf()
  .setAppName("My App")
```

Resource configuration:
```scala
val conf = new SparkConf()
  .setAppName("Resource Heavy Job")
  .set("spark.executor.memory", "8g")
  .set("spark.executor.cores", "8")
  .set("spark.executor.instances", "10")
  .set("spark.sql.adaptive.enabled", "true")
  .set("spark.sql.shuffle.partitions", "400")
```

## Common Configuration Properties

### Execution Properties
- `spark.master` - Master URL (`local[*]`, `spark://host:port`, `yarn`, etc.)
- `spark.app.name` - Application name
- `spark.executor.memory` - Memory per executor (e.g., "4g", "2048m")
- `spark.executor.cores` - CPU cores per executor
- `spark.executor.instances` - Number of executors (for YARN/Kubernetes)
- `spark.default.parallelism` - Default number of partitions for RDD operations

### Serialization Properties
- `spark.serializer` - Serializer class (Java or Kryo)
- `spark.kryo.registrator` - Kryo registrator class
- `spark.kryo.classesToRegister` - Comma-separated list of classes to register with Kryo
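
These properties can also be applied programmatically: `SparkConf.registerKryoClasses` is equivalent to setting `spark.kryo.classesToRegister` (the `MyRecord` class below is illustrative):

```scala
case class MyRecord(id: Long, payload: String) // illustrative domain class

val conf = new SparkConf()
  .setAppName("Kryo Demo")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .registerKryoClasses(Array(classOf[MyRecord]))
```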

### Storage Properties
- `spark.storage.level` - Default storage level for RDDs
- `spark.storage.memoryFraction` - Fraction of JVM heap for storage (legacy; superseded by unified memory management and `spark.memory.fraction` in Spark 1.6+)
- `spark.storage.safetyFraction` - Safety fraction for storage memory (legacy)

### Shuffle Properties
- `spark.shuffle.service.enabled` - Enable the external shuffle service
- `spark.shuffle.compress` - Compress map output files
- `spark.shuffle.spill.compress` - Compress data spilled during shuffles
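
The shuffle properties are ordinary `set` calls on a SparkConf; a minimal sketch (note the external shuffle service must also be running on each worker node for the first setting to take effect):

```scala
val conf = new SparkConf()
  .setAppName("Shuffle Tuning")
  .set("spark.shuffle.service.enabled", "true")
  .set("spark.shuffle.compress", "true")       // compress map output files
  .set("spark.shuffle.spill.compress", "true") // compress data spilled during shuffles
```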