Development tool for generating MIMA exclusion files to support binary compatibility checking in Apache Spark builds
npx @tessl/cli install tessl/maven-org-apache-spark--spark-tools_2.12@3.0.1
# Spark Tools

Spark Tools is a development utility for Apache Spark that generates MIMA (Migration Manager for Scala) exclusion files. It analyzes compiled Spark classes to identify package-private APIs that should be excluded from binary compatibility checks, supporting Spark's release engineering process.

## Package Information

- **Package Name**: spark-tools_2.12
- **Package Type**: maven
- **Language**: Scala
- **Installation**: Part of the Apache Spark distribution
- **Maven Coordinates**: `org.apache.spark:spark-tools_2.12:3.0.1`

## Core Imports

```scala
import org.apache.spark.tools.GenerateMIMAIgnore

// For direct API usage (advanced scenarios)
import scala.reflect.runtime.{universe => unv}
import scala.reflect.runtime.universe.runtimeMirror
import org.clapper.classutil.ClassFinder
```

## Basic Usage

This tool is designed to be executed via Apache Spark's `spark-class` script:

```bash
./spark-class org.apache.spark.tools.GenerateMIMAIgnore
```

The tool will:
1. Scan all classes in the `org.apache.spark` package
2. Identify package-private classes and members
3. Generate two exclusion files in the current directory:
   - `.generated-mima-class-excludes`
   - `.generated-mima-member-excludes`

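Both files are plain text with one exclusion pattern per line (examples appear under Output Format below); a quick way to inspect a fresh run:

```bash
# Peek at the first few generated exclusion patterns
head .generated-mima-class-excludes
head .generated-mima-member-excludes
```
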
## Architecture

The tool operates through Scala reflection to analyze compiled bytecode:

- **Class Discovery**: Uses `org.clapper.classutil.ClassFinder` to locate all Spark classes on the classpath
- **Reflection Analysis**: Leverages Scala's runtime reflection API (`scala.reflect.runtime.universe`) to examine visibility modifiers and package privacy
- **Privacy Detection**: Implements both direct privacy checking (class-level modifiers) and indirect privacy checking (nesting within package-private outer classes)
- **Filtering Logic**: Applies heuristics to exclude JVM-generated classes, anonymous functions, and compiler artifacts
- **Inner Function Detection**: Uses Java reflection to discover Scala-generated inner functions (methods with `$$` patterns)
- **File Generation**: Outputs exclusion patterns in MIMA-compatible format with safe file I/O using `scala.util.Try`

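A minimal sketch of the reflection-based privacy check described above, using only the standard scala-reflect API; the class name is an arbitrary illustrative choice:

```scala
import scala.reflect.runtime.{universe => unv}
import scala.reflect.runtime.universe.runtimeMirror

// Mirror backed by the context class loader, as the tool itself uses
val mirror = runtimeMirror(Thread.currentThread().getContextClassLoader)

// Reflect a compiled class by name (any Spark class on the classpath works)
val sym: unv.ClassSymbol = mirror.classSymbol(Class.forName("org.apache.spark.SparkConf"))

// privateWithin is NoSymbol for public definitions; anything else means the
// definition is qualified-private (e.g. private[spark])
val isPackagePrivate = sym.privateWithin != unv.NoSymbol
println(s"${sym.fullName} package-private: $isPackagePrivate")
```
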
## Processing Algorithm

1. **Class Scanning**: Discovers all classes in the `org.apache.spark` package using `ClassFinder`
2. **Privacy Analysis**: For each class, checks direct privacy (`private[spark]` annotations) and indirect privacy (nesting within private classes)
3. **Member Analysis**: Examines class members for package-private methods and fields (steps 2 and 3 are sketched after this list)
4. **Inner Function Detection**: Uses Java reflection to find Scala compiler-generated inner functions
5. **Exclusion Generation**: Creates MIMA exclusion patterns and appends them to existing exclusion files

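A hedged sketch of steps 2 and 3 for a single class; `org.apache.spark.SparkConf` is again just an illustrative stand-in:

```scala
import scala.reflect.runtime.{universe => unv}
import scala.reflect.runtime.universe.runtimeMirror

val mirror = runtimeMirror(Thread.currentThread().getContextClassLoader)
val sym = mirror.classSymbol(Class.forName("org.apache.spark.SparkConf"))

// Step 2: direct privacy of the class itself
val classIsPrivate = sym.privateWithin != unv.NoSymbol

// Step 3: members carrying a qualified-private modifier, rendered in the
// fully qualified style used by the member excludes file
val privateMembers = sym.info.members
  .filter(_.privateWithin != unv.NoSymbol)
  .map(m => s"${sym.fullName}.${m.name.decodedName}")

privateMembers.foreach(println)
```
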
## Capabilities

### Main Application Entry Point

Executes the complete MIMA exclusion generation process for Apache Spark classes.

```scala { .api }
def main(args: Array[String]): Unit
```

**Parameters:**
- `args: Array[String]` - Command line arguments (currently unused)

**Side Effects:**
- Creates `.generated-mima-class-excludes` file containing class exclusion patterns
- Creates `.generated-mima-member-excludes` file containing member exclusion patterns
- Prints progress messages to stdout

**Usage Example:**
```scala
// Typically invoked via spark-class script
object MyApp {
  def main(args: Array[String]): Unit = {
    org.apache.spark.tools.GenerateMIMAIgnore.main(Array.empty)
  }
}
```

### Package Privacy Analysis

Analyzes all classes in a given package to identify package-private classes and members that should be excluded from MIMA binary compatibility checks.

```scala { .api }
def privateWithin(packageName: String): (Set[String], Set[String])
```

**Parameters:**
- `packageName: String` - The package name to analyze (typically `"org.apache.spark"`)

**Returns:**
- `(Set[String], Set[String])` - Tuple containing:
  - First element: Set of package-private class names with MIMA-compatible patterns
  - Second element: Set of package-private member names

**Usage Example:**
```scala
val (privateClasses, privateMembers) = GenerateMIMAIgnore.privateWithin("org.apache.spark")
// privateClasses contains: Set("org.apache.spark.internal.SomeClass", "org.apache.spark.internal.SomeClass#")
// privateMembers contains: Set("org.apache.spark.SomeClass.privateMethod", ...)
```

### Class Discovery

Scans all classes accessible from the context class loader that belong to the given package and its subpackages, filtering out JVM-generated artifacts.

```scala { .api }
def getClasses(packageName: String): Set[String]
```

**Parameters:**
- `packageName: String` - The package name to scan for classes

**Returns:**
- `Set[String]` - Set of fully qualified class names found in the package

**Implementation Details:**
- Uses `org.clapper.classutil.ClassFinder` for efficient class discovery
- Applies filtering heuristics via `shouldExclude` to remove JVM-generated artifacts:
  - Classes containing "anon" (anonymous classes)
  - Classes ending with "$class" (Scala trait implementations)
  - Classes containing "$sp" (specialized generic classes)
  - Classes containing "hive" or "Hive" (Hive-related components)
- Scans both directory-based and JAR-based classes on the classpath

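A short usage sketch; the helper below merely restates the `shouldExclude` heuristics for illustration, since `getClasses` already applies them internally:

```scala
import org.apache.spark.tools.GenerateMIMAIgnore

// Discover Spark classes visible on the classpath
val classes: Set[String] = GenerateMIMAIgnore.getClasses("org.apache.spark")

// Illustrative restatement of the shouldExclude rules listed above;
// redundant here because getClasses has already filtered its results
def looksGenerated(name: String): Boolean =
  name.contains("anon") ||
  name.endsWith("$class") ||
  name.contains("$sp") ||
  name.contains("hive") || name.contains("Hive")

classes.filterNot(looksGenerated).take(10).foreach(println)
```
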
### Inner Function Analysis

Extracts inner functions from a class using Java reflection, identifying methods with `$$` patterns that Scala generates for inner functions.

```scala { .api }
def getInnerFunctions(classSymbol: unv.ClassSymbol): Seq[String]
```

**Parameters:**
- `classSymbol: unv.ClassSymbol` - Scala reflection symbol representing the class to analyze

**Returns:**
- `Seq[String]` - Sequence of fully qualified inner function names found in the class

**Implementation Details:**
- Falls back to Java reflection when Scala reflection cannot detect inner functions
- Specifically looks for methods containing `$$` which indicate Scala compiler-generated functions
- Gracefully handles class loading failures with warning messages

**Usage Example:**
```scala
import scala.reflect.runtime.universe._
import scala.reflect.runtime.{universe => unv}

val mirror = runtimeMirror(getClass.getClassLoader)
val classSymbol = mirror.classSymbol(classOf[SomeSparkClass])
val innerFunctions = GenerateMIMAIgnore.getInnerFunctions(classSymbol)
// Returns: Seq("com.example.SomeSparkClass.$$anonfun$method$1", ...)
```

## Types

```scala { .api }
// Scala reflection universe import alias
import scala.reflect.runtime.{universe => unv}

// Core Scala reflection types used by the API
type ClassSymbol = scala.reflect.runtime.universe.ClassSymbol
type ModuleSymbol = scala.reflect.runtime.universe.ModuleSymbol
type Symbol = scala.reflect.runtime.universe.Symbol
type RuntimeMirror = scala.reflect.runtime.universe.Mirror

// ClassFinder from external library
type ClassFinder = org.clapper.classutil.ClassFinder

// Scala compiler file I/O utilities
type File = scala.tools.nsc.io.File
```

## Dependencies

The tool requires these runtime dependencies:
- `scala-reflect` - Scala reflection API
- `scala-compiler` - Scala compiler utilities for file I/O
- `org.clapper.classutil` - Third-party library for class discovery

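For standalone experimentation outside the Spark build, roughly equivalent sbt coordinates would look like this (the `classutil` version is an assumption; consult the Spark POM for the exact one):

```scala
// build.sbt sketch; versions are assumptions, not taken from the Spark build
libraryDependencies ++= Seq(
  "org.scala-lang" %  "scala-reflect"  % scalaVersion.value,
  "org.scala-lang" %  "scala-compiler" % scalaVersion.value,
  "org.clapper"    %% "classutil"      % "1.5.1"
)
```
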
## Error Handling

The tool includes comprehensive defensive error handling:

### Class Loading and Reflection Errors
- **Exception Catching**: Wraps class reflection operations in try-catch blocks to handle `ClassNotFoundException` and reflection failures
- **Error Logging**: Prints descriptive error messages with class names when instrumentation fails: `"Error instrumenting class:" + className`
- **Graceful Degradation**: Continues processing other classes when individual class analysis fails

### Inner Function Detection Errors
- **Fallback Strategy**: When Scala reflection fails to detect inner functions, falls back to Java reflection
- **Warning Messages**: Logs warnings for classes where inner function detection fails: `"[WARN] Unable to detect inner functions for class:" + classSymbol.fullName`
- **Empty Results**: Returns empty sequences rather than failing when inner function detection encounters errors

### File I/O Error Handling
- **Safe File Operations**: Uses `scala.util.Try` for reading existing exclusion files to handle cases where files don't exist
- **Append-Only Strategy**: Reads existing file contents before writing to preserve previous exclusions
- **Iterator Fallback**: Provides empty iterators when file reading fails: `Try(File(".generated-mima-class-excludes").lines()).getOrElse(Iterator.empty)`

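A minimal sketch of the `Try`-based fallback quoted above; a missing file yields an empty iterator rather than a thrown exception:

```scala
import scala.util.Try
import scala.tools.nsc.io.File

// If .generated-mima-class-excludes does not exist yet, start from nothing
val previous: Iterator[String] =
  Try(File(".generated-mima-class-excludes").lines()).getOrElse(Iterator.empty)
```
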
## Output Format

The tool generates two exclusion files with specific formatting patterns:

### `.generated-mima-class-excludes`
Contains package-private class exclusions with MIMA-compatible patterns:
- **Class Names**: Direct fully qualified class names
- **Object Names**: Class names with `$` replaced by `#` for Scala objects
- **Append Strategy**: New exclusions are appended to existing file contents

```
org.apache.spark.internal.SomePrivateClass
org.apache.spark.internal.SomePrivateClass#
org.apache.spark.scheduler.cluster.mesos.MesosTaskLaunchData
org.apache.spark.scheduler.cluster.mesos.MesosTaskLaunchData#
```

### `.generated-mima-member-excludes`
Contains package-private member exclusions including methods, fields, and inner functions:
- **Methods and Fields**: Fully qualified member names
- **Inner Functions**: Scala compiler-generated functions with `$$` in the name
- **Append Strategy**: New exclusions are appended to existing file contents

```
org.apache.spark.SomeClass.privateMethod
org.apache.spark.SomeClass.privateField
org.apache.spark.SomeClass.$$anonfun$someMethod$1
org.apache.spark.util.Utils.$$anonfun$tryOrIOException$1
```

### File Generation Process
1. **Read Existing**: Attempts to read existing exclusion files using `scala.util.Try`
2. **Append New**: Concatenates new exclusions with existing content
3. **Write Complete**: Writes the combined content to the exclusion files
4. **Progress Logging**: Prints confirmation messages when files are created
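
Putting the four steps together, a hedged sketch of the write path; `newExcludes` is illustrative, and the real tool derives the exclusions via `privateWithin`:

```scala
import scala.util.Try
import scala.tools.nsc.io.File

val newExcludes = Set("org.apache.spark.internal.SomePrivateClass") // illustrative

// 1. Read existing exclusions, tolerating a missing file
val previous = Try(File(".generated-mima-class-excludes").lines()).getOrElse(Iterator.empty)

// 2-3. Append the new exclusions and rewrite the file
val combined = (previous ++ newExcludes.iterator).mkString("\n")
File(".generated-mima-class-excludes").writeAll(combined + "\n")

// 4. Report progress
println("Created .generated-mima-class-excludes in current directory.")
```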