
tessl/maven-spark-common-utils

Core utility classes and functions for Apache Spark, including exception handling, logging, storage configuration, and Java API integration

Describes
mavenpkg:maven/org.apache.spark/spark-common-utils_2.13@3.5.x

To install, run

npx @tessl/cli install tessl/maven-spark-common-utils@3.5.0

# Apache Spark Common Utils

Apache Spark Common Utils provides essential foundational components for the Apache Spark ecosystem. This library contains core utilities for exception handling, storage configuration, logging, Java API integration, and various utility functions that serve as building blocks across all Spark modules.

## Package Information

- **Package Name**: spark-common-utils_2.13
- **Package Type**: Maven
- **Language**: Scala (with Java integration)
- **Group ID**: org.apache.spark
- **Artifact ID**: spark-common-utils_2.13
- **Version**: 3.5.6
- **Installation**:

```xml
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-common-utils_2.13</artifactId>
  <version>3.5.6</version>
</dependency>
```

## Core Imports

```scala
import org.apache.spark.{SparkException, SparkThrowable}
import org.apache.spark.storage.StorageLevel
import org.apache.spark.internal.Logging
import org.apache.spark.api.java.function._
```

For Java users:

```java
import org.apache.spark.SparkException;
import org.apache.spark.SparkThrowable;
import org.apache.spark.QueryContext;
import org.apache.spark.storage.StorageLevel;
import org.apache.spark.api.java.function.*;
```

## Basic Usage

```scala
import org.apache.spark.{SparkException, SparkThrowable}
import org.apache.spark.storage.StorageLevel
import org.apache.spark.internal.Logging

// Exception handling
try {
  // Some Spark operation
} catch {
  case ex: SparkException =>
    println(s"Error class: ${ex.getErrorClass}")
    println(s"Parameters: ${ex.getMessageParameters}")
}

// Storage level configuration
val storageLevel = StorageLevel.MEMORY_AND_DISK_SER_2
println(s"Uses disk: ${storageLevel.useDisk}")
println(s"Uses memory: ${storageLevel.useMemory}")
println(s"Replication: ${storageLevel.replication}")

// Logging in a class
class MySparkComponent extends Logging {
  def processData(): Unit = {
    logInfo("Starting data processing")
    logWarning("This is a warning message")
  }
}
```

## Architecture

Apache Spark Common Utils is structured around several key architectural components:

- **Exception System**: Standardized error handling with structured error classes, message parameters, and query context
- **Storage Configuration**: Comprehensive storage level definitions for RDD and Dataset persistence strategies
- **Logging Infrastructure**: Centralized logging using SLF4J with consistent formatting across Spark components
- **Java Integration Layer**: Functional interfaces enabling type-safe lambda expressions in Spark's Java API
- **Utility Framework**: Internal utilities for file operations, serialization, threading, and class loading

## Capabilities

### Exception Handling and Error Management

Comprehensive exception handling system with structured error reporting, error classes, and detailed context information.

```scala { .api }
class SparkException(
  message: String,
  cause: Throwable = null,
  errorClass: Option[String] = None,
  messageParameters: Map[String, String] = Map.empty,
  context: Array[QueryContext] = Array.empty
) extends Exception(message, cause) with SparkThrowable

trait SparkThrowable {
  def getErrorClass(): String
  def getSqlState(): String
  def isInternalError(): Boolean
  def getMessageParameters(): java.util.Map[String, String]
  def getQueryContext(): Array[QueryContext]
}

trait QueryContext {
  def objectType(): String
  def objectName(): String
  def startIndex(): Int
  def stopIndex(): Int
  def fragment(): String
}
```
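The structured-error pattern can be sketched without Spark on the classpath. The `StructuredException` class below is a hypothetical stand-in for `SparkException`, showing how a machine-readable error class and named message parameters travel with the exception so callers can branch on them instead of parsing message text:

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for SparkException: carries an error class and
// named message parameters alongside the human-readable message.
public class StructuredException extends Exception {
    private final String errorClass;
    private final Map<String, String> messageParameters;

    public StructuredException(String message, String errorClass,
                               Map<String, String> messageParameters) {
        super(message);
        this.errorClass = errorClass;
        this.messageParameters = messageParameters;
    }

    public String getErrorClass() { return errorClass; }

    public Map<String, String> getMessageParameters() {
        return Collections.unmodifiableMap(messageParameters);
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("objectName", "`missing_col`");
        try {
            throw new StructuredException(
                "A column with name `missing_col` cannot be resolved.",
                "UNRESOLVED_COLUMN", params);
        } catch (StructuredException ex) {
            // Branch on the error class, not on the message text.
            System.out.println(ex.getErrorClass());
            System.out.println(ex.getMessageParameters().get("objectName"));
        }
    }
}
```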

[Exception Handling](./exception-handling.md)

### Storage Level Configuration

Storage level definitions for controlling RDD and Dataset persistence, including memory, disk, serialization, and replication options.

```scala { .api }
class StorageLevel(
  useDisk: Boolean,
  useMemory: Boolean,
  useOffHeap: Boolean,
  deserialized: Boolean,
  replication: Int
) {
  def isValid: Boolean
  def clone(): StorageLevel
  def description: String
}

object StorageLevel {
  val NONE: StorageLevel
  val DISK_ONLY: StorageLevel
  val DISK_ONLY_2: StorageLevel
  val MEMORY_ONLY: StorageLevel
  val MEMORY_ONLY_2: StorageLevel
  val MEMORY_ONLY_SER: StorageLevel
  val MEMORY_ONLY_SER_2: StorageLevel
  val MEMORY_AND_DISK: StorageLevel
  val MEMORY_AND_DISK_2: StorageLevel
  val MEMORY_AND_DISK_SER: StorageLevel
  val MEMORY_AND_DISK_SER_2: StorageLevel
  val OFF_HEAP: StorageLevel
}
```
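Each predefined constant is just a fixed combination of the five constructor flags. The sketch below models that in plain Java (the `Level` class and its `description()` format are illustrative, not Spark's implementation; the constant names mirror the object above):

```java
public class StorageLevels {
    // Minimal model of a storage level: a bundle of persistence flags.
    public static final class Level {
        public final boolean useDisk, useMemory, useOffHeap, deserialized;
        public final int replication;

        Level(boolean useDisk, boolean useMemory, boolean useOffHeap,
              boolean deserialized, int replication) {
            this.useDisk = useDisk;
            this.useMemory = useMemory;
            this.useOffHeap = useOffHeap;
            this.deserialized = deserialized;
            this.replication = replication;
        }

        // Human-readable summary of the flag combination (illustrative format).
        public String description() {
            return (useDisk ? "disk, " : "")
                 + (useMemory ? "memory, " : "")
                 + (useOffHeap ? "off-heap, " : "")
                 + (deserialized ? "deserialized, " : "serialized, ")
                 + replication + "x replicated";
        }
    }

    // Illustrative subset of the constants listed above.
    public static final Level MEMORY_ONLY = new Level(false, true, false, true, 1);
    public static final Level MEMORY_AND_DISK = new Level(true, true, false, true, 1);
    public static final Level MEMORY_AND_DISK_SER_2 = new Level(true, true, false, false, 2);

    public static void main(String[] args) {
        System.out.println(MEMORY_AND_DISK_SER_2.description());
        // → disk, memory, serialized, 2x replicated
    }
}
```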

[Storage Configuration](./storage-configuration.md)

### Logging Infrastructure

SLF4J-based logging trait providing consistent logging methods across Spark components.

```scala { .api }
trait Logging {
  protected def logInfo(msg: => String): Unit
  protected def logDebug(msg: => String): Unit
  protected def logTrace(msg: => String): Unit
  protected def logWarning(msg: => String): Unit
  protected def logError(msg: => String): Unit
  protected def logWarning(msg: => String, throwable: Throwable): Unit
  protected def logError(msg: => String, throwable: Throwable): Unit
  protected def isTraceEnabled(): Boolean
  protected def initializeLogIfNecessary(isInterpreter: Boolean): Unit
}
```
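The `msg: => String` by-name parameters mean a message string is only built when its level is actually enabled. In Java the same idea is usually expressed with a `Supplier<String>`; the `LazyLogger` below is a hypothetical illustration of that pattern, not Spark's `Logging` trait:

```java
import java.util.function.Supplier;

// Illustrates why Logging takes messages by name: when the level is
// disabled, the supplier is never invoked and the string is never built.
public class LazyLogger {
    private final boolean debugEnabled;

    public LazyLogger(boolean debugEnabled) { this.debugEnabled = debugEnabled; }

    public void logDebug(Supplier<String> msg) {
        if (debugEnabled) {
            System.out.println("DEBUG " + msg.get());
        }
    }

    public static void main(String[] args) {
        int[] built = {0};
        LazyLogger quiet = new LazyLogger(false);
        // The lambda body (and its counter increment) is skipped entirely.
        quiet.logDebug(() -> { built[0]++; return "expensive report"; });
        System.out.println("messages built: " + built[0]); // messages built: 0
    }
}
```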

[Logging](./logging.md)

### Java API Integration

Comprehensive functional interfaces for Spark's Java API, enabling type-safe lambda expressions and functional programming patterns.

```java { .api }
@FunctionalInterface
public interface Function<T1, R> extends Serializable {
  R call(T1 v1) throws Exception;
}

@FunctionalInterface
public interface Function2<T1, T2, R> extends Serializable {
  R call(T1 v1, T2 v2) throws Exception;
}

@FunctionalInterface
public interface VoidFunction<T> extends Serializable {
  void call(T t) throws Exception;
}

@FunctionalInterface
public interface PairFunction<T, K, V> extends Serializable {
  Tuple2<K, V> call(T t) throws Exception;
}

@FunctionalInterface
public interface FlatMapFunction<T, R> extends Serializable {
  Iterator<R> call(T t) throws Exception;
}
```
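Because these are single-method interfaces, any lambda of the right shape satisfies them. The self-contained sketch below re-declares minimal `Function` and `FlatMapFunction` interfaces locally (so it compiles without the Spark jar) and applies them the way a Spark transformation would:

```java
import java.io.Serializable;
import java.util.Arrays;
import java.util.Iterator;

public class FunctionDemo {
    // Local re-declarations mirroring the shapes above, so the example
    // runs without spark-common-utils on the classpath.
    @FunctionalInterface
    interface Function<T1, R> extends Serializable {
        R call(T1 v1) throws Exception;
    }

    @FunctionalInterface
    interface FlatMapFunction<T, R> extends Serializable {
        Iterator<R> call(T t) throws Exception;
    }

    // Split a line into words, map each to 1, and sum — the classic
    // flatMap/map pipeline, driven here by plain lambdas.
    public static int wordCount(String line) throws Exception {
        FlatMapFunction<String, String> words =
            l -> Arrays.asList(l.split(" ")).iterator();
        Function<String, Integer> one = w -> 1;

        int total = 0;
        Iterator<String> it = words.call(line);
        while (it.hasNext()) {
            total += one.call(it.next());
        }
        return total;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(wordCount("hello spark world")); // 3
    }
}
```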

[Java API Functions](./java-api-functions.md)

### Network Utilities

Essential utilities for network operations and common Java tasks.

```java { .api }
public class JavaUtils {
  public static final long DEFAULT_DRIVER_MEM_MB = 1024;

  public static void closeQuietly(Closeable closeable);
  public static int nonNegativeHash(Object obj);
  public static ByteBuffer stringToBytes(String s);
  public static String bytesToString(ByteBuffer b);
  public static void deleteRecursively(File file);
}

public enum MemoryMode {
  ON_HEAP, OFF_HEAP
}
```
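Plausible JDK-only implementations of the string/byte helpers above (the method bodies are sketches under the assumption of UTF-8 encoding; Spark's actual `JavaUtils` may differ in details):

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class JavaUtilsSketch {
    // Encode a String as a ByteBuffer (UTF-8 assumed).
    public static ByteBuffer stringToBytes(String s) {
        return ByteBuffer.wrap(s.getBytes(StandardCharsets.UTF_8));
    }

    // Decode a ByteBuffer back into a String (UTF-8 assumed).
    public static String bytesToString(ByteBuffer b) {
        return StandardCharsets.UTF_8.decode(b).toString();
    }

    // Map any hashCode onto a non-negative int, guarding Integer.MIN_VALUE
    // (whose absolute value overflows Math.abs).
    public static int nonNegativeHash(Object obj) {
        if (obj == null) return 0;
        int hash = obj.hashCode();
        return hash != Integer.MIN_VALUE ? Math.abs(hash) : 0;
    }

    public static void main(String[] args) {
        String roundTrip = bytesToString(stringToBytes("hello"));
        System.out.println(roundTrip);                     // hello
        System.out.println(nonNegativeHash("spark") >= 0); // true
    }
}
```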

[Network Utilities](./network-utilities.md)

## Error Classes and Message Templates

The exception system supports structured error reporting with error classes and parameterized messages:

```scala { .api }
class ErrorClassesJsonReader(jsonFileURLs: Seq[URL]) {
  def getErrorMessage(errorClass: String, messageParameters: Map[String, String]): String
  def getMessageTemplate(errorClass: String): String
  def getSqlState(errorClass: String): String
}
```
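A sketch of the substitution step that `getErrorMessage` implies: look up a message template for the error class, then fill in the named parameters. The `<name>` placeholder syntax and the `render` helper are assumptions for illustration, not the library's API:

```java
import java.util.Map;

public class ErrorTemplateSketch {
    // Substitute <name> placeholders in a message template with the
    // supplied parameters (placeholder syntax assumed; illustrative only).
    public static String render(String template, Map<String, String> params) {
        String result = template;
        for (Map.Entry<String, String> e : params.entrySet()) {
            result = result.replace("<" + e.getKey() + ">", e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        String template = "Cannot resolve column <objectName> in <context>.";
        String msg = render(template, Map.of("objectName", "`id`", "context", "the query"));
        System.out.println(msg); // Cannot resolve column `id` in the query.
    }
}
```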

## Type Definitions

```scala { .api }
// Storage level configuration
case class StorageLevel(
  useDisk: Boolean,
  useMemory: Boolean,
  useOffHeap: Boolean,
  deserialized: Boolean,
  replication: Int
)

// Exception context information
trait QueryContext {
  def objectType(): String
  def objectName(): String
  def startIndex(): Int
  def stopIndex(): Int
  def fragment(): String
}
```

```java { .api }
// Memory allocation modes
public enum MemoryMode {
  ON_HEAP,
  OFF_HEAP
}

// Java functional interface base types
public interface Function<T1, R> extends Serializable {
  R call(T1 v1) throws Exception;
}

public interface VoidFunction<T> extends Serializable {
  void call(T t) throws Exception;
}
```