# Spark Tools

Spark Tools is a development utility for Apache Spark that generates MIMA (Migration Manager for Scala) exclusion files. It analyzes compiled Spark classes to identify package-private APIs that should be excluded from binary compatibility checks, supporting Spark's release engineering process.

## Package Information

- **Package Name**: spark-tools_2.12
- **Package Type**: maven
- **Language**: Scala
- **Installation**: Part of the Apache Spark distribution
- **Maven Coordinates**: `org.apache.spark:spark-tools_2.12:3.0.1`

## Core Imports

```scala
import org.apache.spark.tools.GenerateMIMAIgnore

// For direct API usage (advanced scenarios)
import scala.reflect.runtime.{universe => unv}
import scala.reflect.runtime.universe.runtimeMirror
import org.clapper.classutil.ClassFinder
```

## Basic Usage

This tool is designed to be executed via Apache Spark's `spark-class` script:

```bash
./spark-class org.apache.spark.tools.GenerateMIMAIgnore
```

The tool will:

1. Scan all classes in the `org.apache.spark` package
2. Identify package-private classes and members
3. Generate two exclusion files in the current directory:
   - `.generated-mima-class-excludes`
   - `.generated-mima-member-excludes`

## Architecture

The tool operates through Scala reflection to analyze compiled bytecode:

- **Class Discovery**: Uses `org.clapper.classutil.ClassFinder` to locate all Spark classes on the classpath
- **Reflection Analysis**: Leverages Scala's runtime reflection API (`scala.reflect.runtime.universe`) to examine visibility modifiers and package privacy
- **Privacy Detection**: Implements both direct privacy checking (class-level modifiers) and indirect privacy checking (nesting inside package-private outer classes)
- **Filtering Logic**: Applies heuristics to exclude JVM-generated classes, anonymous functions, and compiler artifacts
- **Inner Function Detection**: Uses Java reflection to discover Scala-generated inner functions (methods with `$$` patterns)
- **File Generation**: Outputs exclusion patterns in MIMA-compatible format with safe file I/O using `scala.util.Try`

## Processing Algorithm

1. **Class Scanning**: Discovers all classes in the `org.apache.spark` package using `ClassFinder`
2. **Privacy Analysis**: For each class, checks direct privacy (`private[spark]` modifiers) and indirect privacy (nesting within private classes)
3. **Member Analysis**: Examines class members for package-private methods and fields
4. **Inner Function Detection**: Uses Java reflection to find Scala compiler-generated inner functions
5. **Exclusion Generation**: Creates MIMA exclusion patterns and appends them to existing exclusion files
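The steps above can be sketched at the name level, with the scanning and reflection stages stubbed out as plain data. All identifiers below are hypothetical, not the tool's actual internals:

```scala
object MimaIgnoreSketch {
  // Hypothetical, name-level sketch of the pipeline: filter the
  // discovered classes with a privacy predicate, then format each
  // survivor as MIMA exclusion patterns (the plain name plus the
  // object-style `$` -> `#` variant).
  def generate(allClasses: Set[String],
               isPackagePrivate: String => Boolean): Set[String] =
    allClasses
      .filter(isPackagePrivate)                  // steps 2-3
      .flatMap(n => Set(n, n.replace("$", "#"))) // step 5

  def main(args: Array[String]): Unit = {
    val classes = Set("org.apache.spark.Pub", "org.apache.spark.Priv$")
    generate(classes, _.contains("Priv")).toList.sorted.foreach(println)
  }
}
```

The real tool derives the privacy predicate from runtime reflection; here it is injected as a function so the shape of the pipeline stands on its own.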

## Capabilities

### Main Application Entry Point

Executes the complete MIMA exclusion generation process for Apache Spark classes.

```scala { .api }
def main(args: Array[String]): Unit
```

**Parameters:**
- `args: Array[String]` - Command line arguments (currently unused)

**Side Effects:**
- Creates `.generated-mima-class-excludes` file containing class exclusion patterns
- Creates `.generated-mima-member-excludes` file containing member exclusion patterns
- Prints progress messages to stdout

**Usage Example:**
```scala
// Typically invoked via the spark-class script
object MyApp {
  def main(args: Array[String]): Unit = {
    org.apache.spark.tools.GenerateMIMAIgnore.main(Array.empty)
  }
}
```

### Package Privacy Analysis

Analyzes all classes in a given package to identify package-private classes and members that should be excluded from MIMA binary compatibility checks.

```scala { .api }
def privateWithin(packageName: String): (Set[String], Set[String])
```

**Parameters:**
- `packageName: String` - The package name to analyze (typically `"org.apache.spark"`)

**Returns:**
- `(Set[String], Set[String])` - Tuple containing:
  - First element: Set of package-private class names with MIMA-compatible patterns
  - Second element: Set of package-private member names

**Usage Example:**
```scala
val (privateClasses, privateMembers) = GenerateMIMAIgnore.privateWithin("org.apache.spark")
// privateClasses contains: Set("org.apache.spark.internal.SomeClass", "org.apache.spark.internal.SomeClass#")
// privateMembers contains: Set("org.apache.spark.SomeClass.privateMethod", ...)
```
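The direct privacy check can be illustrated with plain runtime reflection. `PrivacyProbe` and `isPackagePrivate` below are hypothetical helpers, not part of the tool's API; the underlying observation is that a `private[pkg]` declaration surfaces through reflection as a non-empty `privateWithin` symbol:

```scala
import scala.reflect.runtime.{universe => unv}

object PrivacyProbe {
  // Hypothetical sketch: `private[pkg]` shows up via reflection as a
  // non-empty `privateWithin` on the symbol; public symbols report
  // NoSymbol.
  def isPackagePrivate(sym: unv.Symbol): Boolean =
    sym.privateWithin != unv.NoSymbol

  def main(args: Array[String]): Unit = {
    val mirror = unv.runtimeMirror(getClass.getClassLoader)
    // scala.collection.immutable.List is public, so this prints false
    println(isPackagePrivate(mirror.classSymbol(classOf[List[_]])))
  }
}
```

Running this requires `scala-reflect` on the classpath, which the standard Scala distribution provides.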

### Class Discovery

Scans all classes accessible from the context class loader that belong to the given package and its subpackages, filtering out JVM-generated artifacts.

```scala { .api }
def getClasses(packageName: String): Set[String]
```

**Parameters:**
- `packageName: String` - The package name to scan for classes

**Returns:**
- `Set[String]` - Set of fully qualified class names found in the package

**Implementation Details:**
- Uses `org.clapper.classutil.ClassFinder` for efficient class discovery
- Applies filtering heuristics via `shouldExclude` to remove JVM-generated artifacts:
  - Classes containing "anon" (anonymous classes)
  - Classes ending with "$class" (Scala trait implementations)
  - Classes containing "$sp" (specialized generic classes)
  - Classes containing "hive" or "Hive" (Hive-related components)
- Scans both directory-based and JAR-based classes on the classpath
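The filtering heuristics listed above can be sketched as a standalone predicate. This is a simplified stand-in for the tool's internal `shouldExclude`, so the exact rules may differ:

```scala
object ClassFilter {
  // Hypothetical restatement of the documented heuristics: drop
  // anonymous classes, trait-implementation classes, specialized
  // variants, and Hive-related components.
  def shouldExclude(name: String): Boolean =
    name.contains("anon") ||
    name.endsWith("$class") ||
    name.contains("$sp") ||
    name.contains("hive") ||
    name.contains("Hive")

  def main(args: Array[String]): Unit = {
    val candidates = Seq(
      "org.apache.spark.rdd.RDD",                    // kept
      "org.apache.spark.util.Utils$$anonfun$1",      // anonymous, dropped
      "org.apache.spark.api.java.JavaRDDLike$class", // trait impl, dropped
      "org.apache.spark.sql.hive.HiveContext"        // Hive, dropped
    )
    candidates.filterNot(shouldExclude).foreach(println)
  }
}
```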

### Inner Function Analysis

Extracts inner functions from a class using Java reflection, identifying methods with `$$` patterns that Scala generates for inner functions.

```scala { .api }
def getInnerFunctions(classSymbol: unv.ClassSymbol): Seq[String]
```

**Parameters:**
- `classSymbol: unv.ClassSymbol` - Scala reflection symbol representing the class to analyze

**Returns:**
- `Seq[String]` - Sequence of fully qualified inner function names found in the class

**Implementation Details:**
- Falls back to Java reflection when Scala reflection cannot detect inner functions
- Specifically looks for methods containing `$$`, which indicate Scala compiler-generated functions
- Gracefully handles class loading failures with warning messages

**Usage Example:**
```scala
import scala.reflect.runtime.universe._
import scala.reflect.runtime.{universe => unv}

val mirror = runtimeMirror(getClass.getClassLoader)
val classSymbol = mirror.classSymbol(classOf[SomeSparkClass])
val innerFunctions = GenerateMIMAIgnore.getInnerFunctions(classSymbol)
// Returns: Seq("com.example.SomeSparkClass.$$anonfun$method$1", ...)
```
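The `$$` heuristic itself is easy to isolate. The helper below is a hypothetical reduction that works on method names alone (for example, as obtained from `Class.getDeclaredMethods`):

```scala
object InnerFunctionNames {
  // Hypothetical sketch: keep only compiler-generated names carrying
  // the `$$` marker and qualify each with the owning class name.
  def innerFunctions(owner: String, methodNames: Seq[String]): Seq[String] =
    methodNames.filter(_.contains("$$")).map(n => s"$owner.$n")

  def main(args: Array[String]): Unit = {
    val names = Seq("compute", "org$apache$spark$Demo$$loop", "toString")
    innerFunctions("org.apache.spark.Demo", names).foreach(println)
  }
}
```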

## Types

```scala { .api }
// Scala reflection universe import alias
import scala.reflect.runtime.{universe => unv}

// Core Scala reflection types used by the API
type ClassSymbol = scala.reflect.runtime.universe.ClassSymbol
type ModuleSymbol = scala.reflect.runtime.universe.ModuleSymbol
type Symbol = scala.reflect.runtime.universe.Symbol
type RuntimeMirror = scala.reflect.runtime.universe.Mirror

// ClassFinder from external library
type ClassFinder = org.clapper.classutil.ClassFinder

// Scala compiler file I/O utilities
type File = scala.tools.nsc.io.File
```

## Dependencies

The tool requires these runtime dependencies:

- `scala-reflect` - Scala reflection API
- `scala-compiler` - Scala compiler utilities for file I/O
- `org.clapper.classutil` - Third-party library for class discovery

## Error Handling

The tool includes comprehensive defensive error handling:

### Class Loading and Reflection Errors

- **Exception Catching**: Wraps class reflection operations in try-catch blocks to handle `ClassNotFoundException` and reflection failures
- **Error Logging**: Prints descriptive error messages with class names when instrumentation fails: `"Error instrumenting class:" + className`
- **Graceful Degradation**: Continues processing other classes when individual class analysis fails

### Inner Function Detection Errors

- **Fallback Strategy**: When Scala reflection fails to detect inner functions, falls back to Java reflection
- **Warning Messages**: Logs warnings for classes where inner function detection fails: `"[WARN] Unable to detect inner functions for class:" + classSymbol.fullName`
- **Empty Results**: Returns empty sequences rather than failing when inner function detection encounters errors

### File I/O Error Handling

- **Safe File Operations**: Uses `scala.util.Try` for reading existing exclusion files to handle cases where files don't exist
- **Append-Only Strategy**: Reads existing file contents before writing to preserve previous exclusions
- **Iterator Fallback**: Provides empty iterators when file reading fails: `Try(File(".generated-mima-class-excludes").lines()).getOrElse(Iterator.empty)`
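The `Try`-with-fallback pattern can be sketched with standard library I/O. `SafeExcludesFile` and `appendExcludes` are hypothetical helper names, and plain `java.io` is used here where the tool itself uses `scala.tools.nsc.io.File`:

```scala
import java.io.{File => JFile, PrintWriter}
import scala.io.Source
import scala.util.Try

object SafeExcludesFile {
  // Hypothetical sketch of the append-only strategy: read whatever
  // exclusions already exist (empty when the file is missing or
  // unreadable), merge in the new ones, and rewrite the file.
  def appendExcludes(path: String, newExcludes: Seq[String]): Unit = {
    val existing = Try(Source.fromFile(path).getLines().toList).getOrElse(Nil)
    val writer = new PrintWriter(new JFile(path))
    try {
      (existing ++ newExcludes).foreach(line => writer.println(line))
    } finally writer.close()
  }

  def main(args: Array[String]): Unit = {
    val tmp = JFile.createTempFile("mima-excludes", ".txt").getPath
    appendExcludes(tmp, Seq("org.apache.spark.A"))
    appendExcludes(tmp, Seq("org.apache.spark.B")) // A is preserved
    Source.fromFile(tmp).getLines().foreach(println)
  }
}
```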

## Output Format

The tool generates two exclusion files with specific formatting patterns:

### `.generated-mima-class-excludes`

Contains package-private class exclusions with MIMA-compatible patterns:

- **Class Names**: Direct fully qualified class names
- **Object Names**: Class names with `$` replaced by `#` for Scala objects
- **Append Strategy**: New exclusions are appended to existing file contents

```
org.apache.spark.internal.SomePrivateClass
org.apache.spark.internal.SomePrivateClass#
org.apache.spark.scheduler.cluster.mesos.MesosTaskLaunchData
org.apache.spark.scheduler.cluster.mesos.MesosTaskLaunchData#
```

### `.generated-mima-member-excludes`

Contains package-private member exclusions including methods, fields, and inner functions:

- **Methods and Fields**: Fully qualified member names
- **Inner Functions**: Scala compiler-generated functions with `$$` in the name
- **Append Strategy**: New exclusions are appended to existing file contents

```
org.apache.spark.SomeClass.privateMethod
org.apache.spark.SomeClass.privateField
org.apache.spark.SomeClass.$$anonfun$someMethod$1
org.apache.spark.util.Utils.$$anonfun$tryOrIOException$1
```

### File Generation Process

1. **Read Existing**: Attempts to read existing exclusion files using `scala.util.Try`
2. **Append New**: Concatenates new exclusions with existing content
3. **Write Complete**: Writes the combined content to the exclusion files
4. **Progress Logging**: Prints confirmation messages when files are created
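The `$` to `#` rewrite behind the paired class-excludes lines can be sketched as follows. `ClassExcludeFormatter` is a hypothetical helper; the real formatting lives inside `GenerateMIMAIgnore`:

```scala
object ClassExcludeFormatter {
  // Hypothetical sketch: a Scala object compiles to a class named
  // `Foo$`; the excludes file records it as `Foo#`, which produces
  // the plain-name / `#`-suffixed pairs shown above.
  def format(compiledName: String): String = compiledName.replace("$", "#")

  def main(args: Array[String]): Unit = {
    Seq(
      "org.apache.spark.internal.SomePrivateClass",
      "org.apache.spark.internal.SomePrivateClass$"
    ).map(format).foreach(println)
  }
}
```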