tessl/maven-org-apache-spark--spark-tools-2-12

Development tool for generating MIMA exclusion files to support binary compatibility checking in Apache Spark builds

Workspace: tessl
Visibility: Public
Describes: mavenpkg:maven/org.apache.spark/spark-tools_2.12@3.0.x

To install, run

npx @tessl/cli install tessl/maven-org-apache-spark--spark-tools-2-12@3.0.0

Spark Tools

Spark Tools is a development utility for Apache Spark that generates MIMA (Migration Manager for Scala) exclusion files. It analyzes compiled Spark classes to identify package-private APIs that should be excluded from binary compatibility checks, supporting Spark's release engineering process.

Package Information

  • Package Name: spark-tools_2.12
  • Package Type: maven
  • Language: Scala
  • Installation: Part of Apache Spark distribution
  • Maven Coordinates: org.apache.spark:spark-tools_2.12:3.0.1

Core Imports

import org.apache.spark.tools.GenerateMIMAIgnore

// For direct API usage (advanced scenarios)
import scala.reflect.runtime.{universe => unv}
import scala.reflect.runtime.universe.runtimeMirror
import org.clapper.classutil.ClassFinder

Basic Usage

This tool is designed to be executed via Apache Spark's spark-class script:

./spark-class org.apache.spark.tools.GenerateMIMAIgnore

The tool will:

  1. Scan all classes in the org.apache.spark package
  2. Identify package-private classes and members
  3. Generate two exclusion files in the current directory:
    • .generated-mima-class-excludes
    • .generated-mima-member-excludes

Architecture

The tool operates through Scala reflection to analyze compiled bytecode:

  • Class Discovery: Uses org.clapper.classutil.ClassFinder to locate all Spark classes on the classpath
  • Reflection Analysis: Leverages Scala's runtime reflection API (scala.reflect.runtime.universe) to examine visibility modifiers and package privacy
  • Privacy Detection: Implements both direct privacy checking (class-level modifiers) and indirect privacy checking (classes nested inside package-private outer classes); a reflection sketch follows this list
  • Filtering Logic: Applies heuristics to exclude JVM-generated classes, anonymous functions, and compiler artifacts
  • Inner Function Detection: Uses Java reflection to discover Scala-generated inner functions (methods with $$ patterns)
  • File Generation: Outputs exclusion patterns in MIMA-compatible format with safe file I/O using scala.util.Try
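
The privacy detection described above can be illustrated with a short reflection sketch. This is a minimal approximation, not the tool's actual implementation: org.apache.spark.SomeClass is a hypothetical class name, and the real code applies additional filtering.

import scala.reflect.runtime.{universe => unv}

// Sketch: detect package-private visibility via Scala runtime reflection.
val mirror = unv.runtimeMirror(getClass.getClassLoader)

// A symbol is package-private when its privateWithin scope is a real symbol.
def isPackagePrivate(sym: unv.Symbol): Boolean =
  sym != unv.NoSymbol && sym.privateWithin != unv.NoSymbol

val className = "org.apache.spark.SomeClass" // hypothetical class
val classSymbol = mirror.classSymbol(Class.forName(className))
if (isPackagePrivate(classSymbol)) {
  println(s"$className is package-private; exclude it from MIMA checks")
}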

Processing Algorithm

  1. Class Scanning: Discovers all classes in org.apache.spark package using ClassFinder
  2. Privacy Analysis: For each class, checks direct privacy (private[spark] annotations) and indirect privacy (nested within private classes)
  3. Member Analysis: Examines class members for package-private methods and fields
  4. Inner Function Detection: Uses Java reflection to find Scala compiler-generated inner functions
  5. Exclusion Generation: Creates MIMA exclusion patterns and appends to existing exclusion files (the overall flow is sketched below)
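
Put together, the pipeline composes the capabilities documented below. A simplified sketch (the real main() also handles inner-function detection and file output):

// Simplified flow of the algorithm above.
val classes = GenerateMIMAIgnore.getClasses("org.apache.spark")  // step 1
val (privateClasses, privateMembers) =
  GenerateMIMAIgnore.privateWithin("org.apache.spark")           // steps 2-4
// Step 5 then writes privateClasses and privateMembers to the two
// .generated-mima-*-excludes files, appending to any existing content.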

Capabilities

Main Application Entry Point

Executes the complete MIMA exclusion generation process for Apache Spark classes.

def main(args: Array[String]): Unit

Parameters:

  • args: Array[String] - Command line arguments (currently unused)

Side Effects:

  • Creates .generated-mima-class-excludes file containing class exclusion patterns
  • Creates .generated-mima-member-excludes file containing member exclusion patterns
  • Prints progress messages to stdout

Usage Example:

// Typically invoked via spark-class script
object MyApp {
  def main(args: Array[String]): Unit = {
    org.apache.spark.tools.GenerateMIMAIgnore.main(Array.empty)
  }
}

Package Privacy Analysis

Analyzes all classes in a given package to identify package-private classes and members that should be excluded from MIMA binary compatibility checks.

def privateWithin(packageName: String): (Set[String], Set[String])

Parameters:

  • packageName: String - The package name to analyze (typically "org.apache.spark")

Returns:

  • (Set[String], Set[String]) - Tuple containing:
    • First element: Set of package-private class names with MIMA-compatible patterns
    • Second element: Set of package-private member names

Usage Example:

val (privateClasses, privateMembers) = GenerateMIMAIgnore.privateWithin("org.apache.spark")
// privateClasses contains: Set("org.apache.spark.internal.SomeClass", "org.apache.spark.internal.SomeClass#")
// privateMembers contains: Set("org.apache.spark.SomeClass.privateMethod", ...)

Class Discovery

Scans all classes accessible from the context class loader that belong to the given package and its subpackages, filtering out JVM-generated artifacts; a usage sketch follows the implementation details below.

def getClasses(packageName: String): Set[String]

Parameters:

  • packageName: String - The package name to scan for classes

Returns:

  • Set[String] - Set of fully qualified class names found in the package

Implementation Details:

  • Uses org.clapper.classutil.ClassFinder for efficient class discovery
  • Applies filtering heuristics via shouldExclude to remove JVM-generated artifacts:
    • Classes containing "anon" (anonymous classes)
    • Classes ending with "$class" (Scala trait implementations)
    • Classes containing "$sp" (specialized generic classes)
    • Classes containing "hive" or "Hive" (Hive-related components)
  • Scans both directory-based and JAR-based classes on the classpath
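
Usage Example:

A hedged sketch: the result contents shown are illustrative, and shouldExclude is an approximation of the heuristics listed above rather than the verbatim source.

val sparkClasses: Set[String] = GenerateMIMAIgnore.getClasses("org.apache.spark")
// e.g. Set("org.apache.spark.SparkContext", "org.apache.spark.rdd.RDD", ...)

// Approximation of the shouldExclude filtering heuristics described above.
def shouldExclude(name: String): Boolean =
  name.contains("anon") ||
  name.endsWith("$class") ||
  name.contains("$sp") ||
  name.contains("hive") ||
  name.contains("Hive")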

Inner Function Analysis

Extracts inner functions from a class using Java reflection, identifying methods with $$ patterns that Scala generates for inner functions.

def getInnerFunctions(classSymbol: unv.ClassSymbol): Seq[String]

Parameters:

  • classSymbol: unv.ClassSymbol - Scala reflection symbol representing the class to analyze

Returns:

  • Seq[String] - Sequence of fully qualified inner function names found in the class

Implementation Details:

  • Falls back to Java reflection when Scala reflection cannot detect inner functions
  • Specifically looks for methods containing $$ which indicate Scala compiler-generated functions
  • Gracefully handles class loading failures with warning messages

Usage Example:

import scala.reflect.runtime.universe._
import scala.reflect.runtime.{universe => unv}

val mirror = runtimeMirror(getClass.getClassLoader)
val classSymbol = mirror.classSymbol(classOf[SomeSparkClass])
val innerFunctions = GenerateMIMAIgnore.getInnerFunctions(classSymbol)
// Returns: Seq("com.example.SomeSparkClass.$$anonfun$method$1", ...)

Types

// Scala reflection universe import alias
import scala.reflect.runtime.{universe => unv}

// Core Scala reflection types used by the API
type ClassSymbol = scala.reflect.runtime.universe.ClassSymbol
type ModuleSymbol = scala.reflect.runtime.universe.ModuleSymbol  
type Symbol = scala.reflect.runtime.universe.Symbol
type RuntimeMirror = scala.reflect.runtime.universe.Mirror

// ClassFinder from external library
type ClassFinder = org.clapper.classutil.ClassFinder

// Scala compiler file I/O utilities
type File = scala.tools.nsc.io.File

Dependencies

The tool requires these runtime dependencies (an illustrative sbt snippet follows the list):

  • scala-reflect - Scala reflection API
  • scala-compiler - Scala compiler utilities for file I/O
  • org.clapper.classutil - Third-party library for class discovery
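
For reference, these correspond to roughly the following sbt coordinates. The versions are illustrative assumptions, not pinned by this package:

libraryDependencies ++= Seq(
  "org.scala-lang" %  "scala-reflect"  % scalaVersion.value,
  "org.scala-lang" %  "scala-compiler" % scalaVersion.value,
  "org.clapper"    %% "classutil"      % "1.5.1" // illustrative version
)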

Error Handling

The tool includes comprehensive defensive error handling:

Class Loading and Reflection Errors

  • Exception Catching: Wraps class reflection operations in try-catch blocks to handle ClassNotFoundException and reflection failures
  • Error Logging: Prints descriptive error messages with class names when instrumentation fails: "Error instrumenting class:" + className
  • Graceful Degradation: Continues processing other classes when individual class analysis fails (see the sketch below)
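
A sketch of this defensive pattern, built around the error message quoted above; mirror and className are assumed to be in scope, and the control flow is an approximation rather than the verbatim source:

try {
  val classSymbol = mirror.classSymbol(Class.forName(className))
  // ... analyze visibility modifiers ...
} catch {
  // Swallow ClassNotFoundException and reflection failures, keep processing.
  case _: Throwable => println("Error instrumenting class:" + className)
}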

Inner Function Detection Errors

  • Fallback Strategy: When Scala reflection fails to detect inner functions, falls back to Java reflection
  • Warning Messages: Logs warnings for classes where inner function detection fails: "[WARN] Unable to detect inner functions for class:" + classSymbol.fullName
  • Empty Results: Returns empty sequences rather than failing when inner function detection encounters errors (see the fallback sketch below)
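
A sketch of the fallback, using the unv alias from Core Imports and the warning message quoted above; the exact control flow is an assumption:

// Fallback: use Java reflection to find $$-pattern methods.
def innerFunctionsFallback(classSymbol: unv.ClassSymbol): Seq[String] =
  try {
    Class.forName(classSymbol.fullName).getMethods.toSeq
      .map(_.getName)
      .filter(_.contains("$$")) // compiler-generated inner functions
      .map(name => classSymbol.fullName + "." + name)
  } catch {
    case _: Throwable =>
      println("[WARN] Unable to detect inner functions for class:" + classSymbol.fullName)
      Seq.empty
  }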

File I/O Error Handling

  • Safe File Operations: Uses scala.util.Try for reading existing exclusion files to handle cases where files don't exist
  • Append-Only Strategy: Reads existing file contents before writing to preserve previous exclusions
  • Iterator Fallback: Provides empty iterators when file reading fails: Try(File(".generated-mima-class-excludes").lines()).getOrElse(Iterator.empty)

Output Format

The tool generates two exclusion files with specific formatting patterns:

.generated-mima-class-excludes

Contains package-private class exclusions with MIMA-compatible patterns:

  • Class Names: Direct fully qualified class names
  • Object Names: Class names with $ replaced by # for Scala objects
  • Append Strategy: New exclusions are appended to existing file contents

org.apache.spark.internal.SomePrivateClass
org.apache.spark.internal.SomePrivateClass#
org.apache.spark.scheduler.cluster.mesos.MesosTaskLaunchData
org.apache.spark.scheduler.cluster.mesos.MesosTaskLaunchData#

.generated-mima-member-excludes

Contains package-private member exclusions including methods, fields, and inner functions:

  • Methods and Fields: Fully qualified member names
  • Inner Functions: Scala compiler-generated functions with $$ in the name
  • Append Strategy: New exclusions are appended to existing file contents

org.apache.spark.SomeClass.privateMethod
org.apache.spark.SomeClass.privateField
org.apache.spark.SomeClass.$$anonfun$someMethod$1
org.apache.spark.util.Utils.$$anonfun$tryOrIOException$1

File Generation Process

  1. Read Existing: Attempts to read existing exclusion files using scala.util.Try
  2. Append New: Concatenates new exclusions with existing content
  3. Write Complete: Writes the combined content to the exclusion files
  4. Progress Logging: Prints confirmation messages when files are created (the full pattern is sketched below)
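
A minimal sketch of this process, assuming the class-excludes file name used above; newClassExcludes is a hypothetical collection of freshly generated patterns, and the confirmation message is illustrative:

import scala.util.Try
import scala.tools.nsc.io.File

val path = ".generated-mima-class-excludes"
val newClassExcludes: Set[String] = Set.empty // hypothetical: freshly generated patterns
val existing = Try(File(path).lines()).getOrElse(Iterator.empty)      // 1. read existing
val combined = (existing ++ newClassExcludes.iterator).mkString("\n") // 2. append new
File(path).writeAll(combined)                                         // 3. write complete
println("Created : " + path + " in current directory.")               // 4. progress logging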