Core utility classes and functions for Apache Spark including exception handling, logging, storage configuration, and Java API integration
npx @tessl/cli install tessl/maven-spark-common-utils@3.5.00
# Apache Spark Common Utils

Apache Spark Common Utils provides essential foundational components for the Apache Spark ecosystem. This library contains core utilities for exception handling, storage configuration, logging, Java API integration, and various utility functions that serve as building blocks across all Spark modules.

## Package Information

- **Package Name**: spark-common-utils_2.13
- **Package Type**: Maven
- **Language**: Scala (with Java integration)
- **Group ID**: org.apache.spark
- **Artifact ID**: spark-common-utils_2.13
- **Version**: 3.5.6
- **Installation**:

```xml
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-common-utils_2.13</artifactId>
  <version>3.5.6</version>
</dependency>
```

## Core Imports

```scala
import org.apache.spark.{SparkException, SparkThrowable}
import org.apache.spark.storage.StorageLevel
import org.apache.spark.internal.Logging
import org.apache.spark.api.java.function._
```

For Java users:

```java
import org.apache.spark.SparkException;
import org.apache.spark.SparkThrowable;
import org.apache.spark.QueryContext;
import org.apache.spark.storage.StorageLevel;
import org.apache.spark.api.java.function.*;
```

## Basic Usage

```scala
import org.apache.spark.{SparkException, SparkThrowable}
import org.apache.spark.storage.StorageLevel
import org.apache.spark.internal.Logging

// Exception handling
try {
  // Some Spark operation
} catch {
  case ex: SparkException =>
    println(s"Error class: ${ex.getErrorClass}")
    println(s"Parameters: ${ex.getMessageParameters}")
}

// Storage level configuration
val storageLevel = StorageLevel.MEMORY_AND_DISK_SER_2
println(s"Uses disk: ${storageLevel.useDisk}")
println(s"Uses memory: ${storageLevel.useMemory}")
println(s"Replication: ${storageLevel.replication}")

// Logging in a class
class MySparkComponent extends Logging {
  def processData(): Unit = {
    logInfo("Starting data processing")
    logWarning("This is a warning message")
  }
}
```

## Architecture

Apache Spark Common Utils is structured around several key architectural components:

- **Exception System**: Standardized error handling with structured error classes, message parameters, and query context
- **Storage Configuration**: Comprehensive storage level definitions for RDD and Dataset persistence strategies
- **Logging Infrastructure**: Centralized logging using SLF4J with consistent formatting across Spark components
- **Java Integration Layer**: Functional interfaces enabling type-safe lambda expressions in Spark's Java API
- **Utility Framework**: Internal utilities for file operations, serialization, threading, and class loading

## Capabilities

### Exception Handling and Error Management

Comprehensive exception handling system with structured error reporting, error classes, and detailed context information.

```scala { .api }
class SparkException(
  message: String,
  cause: Throwable = null,
  errorClass: Option[String] = None,
  messageParameters: Map[String, String] = Map.empty,
  context: Array[QueryContext] = Array.empty
) extends Exception(message, cause) with SparkThrowable

trait SparkThrowable {
  def getErrorClass(): String
  def getSqlState(): String
  def isInternalError(): Boolean
  def getMessageParameters(): java.util.Map[String, String]
  def getQueryContext(): Array[QueryContext]
}

trait QueryContext {
  def objectType(): String
  def objectName(): String
  def startIndex(): Int
  def stopIndex(): Int
  def fragment(): String
}
```

[Exception Handling](./exception-handling.md)
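
The value of the `SparkThrowable` contract is that callers can branch on a stable error class instead of parsing message text. The standalone sketch below re-declares a minimal slice of the contract so it compiles without the Spark jar; `MiniSparkThrowable`, `ErrorDemo`, and the error-class name are illustrative stand-ins, not Spark code:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Minimal stand-in for SparkThrowable, re-declared here so the example
// compiles without the spark-common-utils jar.
interface MiniSparkThrowable {
    String getErrorClass();
    Map<String, String> getMessageParameters();
}

public class ErrorDemo extends Exception implements MiniSparkThrowable {
    private final String errorClass;
    private final Map<String, String> params;

    public ErrorDemo(String message, String errorClass, Map<String, String> params) {
        super(message);
        this.errorClass = errorClass;
        this.params = params;
    }

    @Override public String getErrorClass() { return errorClass; }

    @Override public Map<String, String> getMessageParameters() {
        return Collections.unmodifiableMap(params);
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        params.put("relationName", "`missing_table`");
        try {
            throw new ErrorDemo("Table not found", "TABLE_OR_VIEW_NOT_FOUND", params);
        } catch (ErrorDemo e) {
            // Handlers match on the stable error class, not the message string.
            System.out.println(e.getErrorClass() + ": " + e.getMessageParameters());
        }
    }
}
```
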
### Storage Level Configuration

Storage level definitions for controlling RDD and Dataset persistence, including memory, disk, serialization, and replication options.

```scala { .api }
class StorageLevel(
  useDisk: Boolean,
  useMemory: Boolean,
  useOffHeap: Boolean,
  deserialized: Boolean,
  replication: Int
) {
  def isValid: Boolean
  def clone(): StorageLevel
  def description: String
}

object StorageLevel {
  val NONE: StorageLevel
  val DISK_ONLY: StorageLevel
  val DISK_ONLY_2: StorageLevel
  val DISK_ONLY_3: StorageLevel
  val MEMORY_ONLY: StorageLevel
  val MEMORY_ONLY_2: StorageLevel
  val MEMORY_ONLY_SER: StorageLevel
  val MEMORY_ONLY_SER_2: StorageLevel
  val MEMORY_AND_DISK: StorageLevel
  val MEMORY_AND_DISK_2: StorageLevel
  val MEMORY_AND_DISK_SER: StorageLevel
  val MEMORY_AND_DISK_SER_2: StorageLevel
  val OFF_HEAP: StorageLevel
}
```

[Storage Configuration](./storage-configuration.md)
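
How the five constructor flags combine into a named level can be sketched in a standalone class. `StorageLevelDemo` and its description format are an illustrative mirror of the flag semantics, not the Spark implementation:

```java
// Standalone mirror of StorageLevel's flag semantics; illustrative only.
public class StorageLevelDemo {
    final boolean useDisk, useMemory, useOffHeap, deserialized;
    final int replication;

    StorageLevelDemo(boolean useDisk, boolean useMemory, boolean useOffHeap,
                     boolean deserialized, int replication) {
        this.useDisk = useDisk;
        this.useMemory = useMemory;
        this.useOffHeap = useOffHeap;
        this.deserialized = deserialized;
        this.replication = replication;
    }

    // A level is useful only if it stores data somewhere with replication >= 1.
    boolean isValid() {
        return (useMemory || useDisk) && replication > 0;
    }

    String description() {
        StringBuilder sb = new StringBuilder();
        if (useDisk) sb.append("Disk ");
        if (useMemory) sb.append("Memory ");
        if (useOffHeap) sb.append("OffHeap ");
        sb.append(deserialized ? "Deserialized " : "Serialized ");
        sb.append(replication).append("x Replicated");
        return sb.toString();
    }

    public static void main(String[] args) {
        // Analogue of MEMORY_AND_DISK_SER_2: memory + disk, serialized, 2 replicas.
        StorageLevelDemo level = new StorageLevelDemo(true, true, false, false, 2);
        System.out.println(level.description());
    }
}
```
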
### Logging Infrastructure

SLF4J-based logging trait providing consistent logging methods across Spark components. The messages are by-name parameters (`msg: => String`), so they are only evaluated when the corresponding log level is enabled.

```scala { .api }
trait Logging {
  protected def logInfo(msg: => String): Unit
  protected def logDebug(msg: => String): Unit
  protected def logTrace(msg: => String): Unit
  protected def logWarning(msg: => String): Unit
  protected def logError(msg: => String): Unit
  protected def logWarning(msg: => String, throwable: Throwable): Unit
  protected def logError(msg: => String, throwable: Throwable): Unit
  protected def isTraceEnabled(): Boolean
  protected def initializeLogIfNecessary(isInterpreter: Boolean): Unit
}
```

[Logging](./logging.md)
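
The by-name `msg: => String` parameters exist so that expensive message strings are never built when the level is disabled. A rough Java analogue uses `Supplier<String>`; `LazyLogDemo` is an illustrative sketch, not Spark code:

```java
import java.util.function.Supplier;

// Sketch of the lazy-message idea behind the Logging trait: the Supplier is
// only invoked when the level is enabled, so disabled levels never pay for
// string construction. Illustrative only.
public class LazyLogDemo {
    static boolean debugEnabled = false;
    static int messagesBuilt = 0;

    static void logDebug(Supplier<String> msg) {
        if (debugEnabled) {
            System.out.println("DEBUG " + msg.get());
        }
    }

    static String expensiveMessage() {
        messagesBuilt++; // counts how often the message was actually built
        return "state dump: ...";
    }

    public static void main(String[] args) {
        logDebug(() -> expensiveMessage()); // debug disabled: supplier never runs
        System.out.println("messages built: " + messagesBuilt);
    }
}
```
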
### Java API Integration

Comprehensive functional interfaces for Spark's Java API, enabling type-safe lambda expressions and functional programming patterns.

```java { .api }
@FunctionalInterface
public interface Function<T1, R> extends Serializable {
  R call(T1 v1) throws Exception;
}

@FunctionalInterface
public interface Function2<T1, T2, R> extends Serializable {
  R call(T1 v1, T2 v2) throws Exception;
}

@FunctionalInterface
public interface VoidFunction<T> extends Serializable {
  void call(T t) throws Exception;
}

@FunctionalInterface
public interface PairFunction<T, K, V> extends Serializable {
  Tuple2<K, V> call(T t) throws Exception;
}

@FunctionalInterface
public interface FlatMapFunction<T, R> extends Serializable {
  Iterator<R> call(T t) throws Exception;
}
```

[Java API Functions](./java-api-functions.md)
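
Because each interface has a single abstract method, plain Java lambdas satisfy them directly, which is what makes calls like `rdd.map(x -> x * 2)` type-safe in Spark's Java API. The sketch below re-declares two of the interfaces (`Fn`, `Fn2` are stand-in names) so it compiles without the Spark jar:

```java
import java.io.Serializable;

// Stand-ins for Function and Function2, re-declared so this example compiles
// without the Spark jar; in real code use org.apache.spark.api.java.function.
interface Fn<T1, R> extends Serializable {
    R call(T1 v1) throws Exception;
}

interface Fn2<T1, T2, R> extends Serializable {
    R call(T1 v1, T2 v2) throws Exception;
}

public class FunctionDemo {
    // A lambda satisfies the single abstract method, like rdd.map(x -> x * 2).
    static int doubleIt(int x) {
        Fn<Integer, Integer> doubler = v -> v * 2;
        try {
            return doubler.call(x);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Two-argument form, like rdd.reduce((a, b) -> a + b).
    static int add(int a, int b) {
        Fn2<Integer, Integer, Integer> sum = (x, y) -> x + y;
        try {
            return sum.call(a, b);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(doubleIt(21)); // 42
        System.out.println(add(40, 2));   // 42
    }
}
```
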
199
200
### Network Utilities

Essential utilities for network operations and common Java tasks.

```java { .api }
public class JavaUtils {
  public static final long DEFAULT_DRIVER_MEM_MB = 1024;

  public static void closeQuietly(Closeable closeable);
  public static int nonNegativeHash(Object obj);
  public static ByteBuffer stringToBytes(String s);
  public static String bytesToString(ByteBuffer b);
  public static void deleteRecursively(File file);
}

public enum MemoryMode {
  ON_HEAP, OFF_HEAP
}
```

[Network Utilities](./network-utilities.md)
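
The string/byte-buffer helpers can be mimicked with standard JDK calls. The standalone sketch below shows the expected round-trip behavior, assuming UTF-8 encoding; it is an illustrative stand-in, not the actual `JavaUtils` implementation:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Standalone mimic of JavaUtils.stringToBytes / bytesToString /
// nonNegativeHash, assuming UTF-8; illustrative, not the Spark code.
public class ByteRoundTrip {
    static ByteBuffer stringToBytes(String s) {
        return ByteBuffer.wrap(s.getBytes(StandardCharsets.UTF_8));
    }

    static String bytesToString(ByteBuffer b) {
        return StandardCharsets.UTF_8.decode(b).toString();
    }

    // Clamp Object.hashCode into the non-negative range (Math.abs cannot
    // handle Integer.MIN_VALUE, so that case is mapped to 0).
    static int nonNegativeHash(Object obj) {
        if (obj == null) return 0;
        int hash = obj.hashCode();
        return hash != Integer.MIN_VALUE ? Math.abs(hash) : 0;
    }

    public static void main(String[] args) {
        String original = "spark";
        String back = bytesToString(stringToBytes(original));
        System.out.println(original.equals(back)); // round trip preserves the string
        System.out.println(nonNegativeHash("spark") >= 0);
    }
}
```
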
## Error Classes and Message Templates

The exception system supports structured error reporting with error classes and parameterized messages:

```scala { .api }
class ErrorClassesJsonReader(jsonFileURLs: Seq[URL]) {
  def getErrorMessage(errorClass: String, messageParameters: Map[String, String]): String
  def getMessageTemplate(errorClass: String): String
  def getSqlState(errorClass: String): String
}
```
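
The template mechanism amounts to looking up a message template by error class and substituting named placeholders. The `<param>` placeholder syntax below follows the convention of Spark's error-classes JSON files, but the code itself is an illustrative in-memory stand-in for `getErrorMessage`, not the reader implementation:

```java
import java.util.HashMap;
import java.util.Map;

// In-memory stand-in for the getErrorMessage lookup: a template table plus
// <param> substitution, instead of reading error-classes JSON files.
public class ErrorTemplateDemo {
    static final Map<String, String> TEMPLATES = new HashMap<>();
    static {
        TEMPLATES.put("TABLE_OR_VIEW_NOT_FOUND",
            "The table or view <relationName> cannot be found.");
    }

    static String getErrorMessage(String errorClass, Map<String, String> params) {
        String template = TEMPLATES.get(errorClass);
        if (template == null) {
            throw new IllegalArgumentException("Unknown error class: " + errorClass);
        }
        // Replace each <name> placeholder with its parameter value.
        for (Map.Entry<String, String> e : params.entrySet()) {
            template = template.replace("<" + e.getKey() + ">", e.getValue());
        }
        return template;
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        params.put("relationName", "`users`");
        System.out.println(getErrorMessage("TABLE_OR_VIEW_NOT_FOUND", params));
    }
}
```
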
## Type Definitions

```scala { .api }
// Storage level configuration
class StorageLevel(
  useDisk: Boolean,
  useMemory: Boolean,
  useOffHeap: Boolean,
  deserialized: Boolean,
  replication: Int
)

// Exception context information
trait QueryContext {
  def objectType(): String
  def objectName(): String
  def startIndex(): Int
  def stopIndex(): Int
  def fragment(): String
}
```

```java { .api }
// Memory allocation modes
public enum MemoryMode {
  ON_HEAP,
  OFF_HEAP
}

// Java functional interface base types
public interface Function<T1, R> extends Serializable {
  R call(T1 v1) throws Exception;
}

public interface VoidFunction<T> extends Serializable {
  void call(T t) throws Exception;
}
```