tessl/maven-org-apache-spark--spark-tools-2-12

Development tool for generating MIMA exclusion files to support binary compatibility checking in Apache Spark builds


Spark Tools

Spark Tools is a development utility for Apache Spark that generates MIMA (Migration Manager for Scala) exclusion files. It analyzes compiled Spark classes to identify package-private APIs that should be excluded from binary compatibility checks, supporting Spark's release engineering process.

Package Information

  • Package Name: spark-tools_2.12
  • Package Type: maven
  • Language: Scala
  • Installation: Part of Apache Spark distribution
  • Maven Coordinates: org.apache.spark:spark-tools_2.12:3.0.1

Core Imports

import org.apache.spark.tools.GenerateMIMAIgnore

// For direct API usage (advanced scenarios)
import scala.reflect.runtime.{universe => unv}
import scala.reflect.runtime.universe.runtimeMirror
import org.clapper.classutil.ClassFinder

Basic Usage

This tool is designed to be executed via Apache Spark's spark-class script:

./spark-class org.apache.spark.tools.GenerateMIMAIgnore

The tool will:

  1. Scan all classes in the org.apache.spark package
  2. Identify package-private classes and members
  3. Generate two exclusion files in the current directory:
    • .generated-mima-class-excludes
    • .generated-mima-member-excludes

Architecture

The tool operates through Scala reflection to analyze compiled bytecode:

  • Class Discovery: Uses org.clapper.classutil.ClassFinder to locate all Spark classes on the classpath
  • Reflection Analysis: Leverages Scala's runtime reflection API (scala.reflect.runtime.universe) to examine visibility modifiers and package privacy
  • Privacy Detection: Implements both direct privacy checking (class-level modifiers) and indirect privacy checking (inheritance from package-private outer classes)
  • Filtering Logic: Applies heuristics to exclude JVM-generated classes, anonymous functions, and compiler artifacts
  • Inner Function Detection: Uses Java reflection to discover Scala-generated inner functions (methods with $$ patterns)
  • File Generation: Outputs exclusion patterns in MIMA-compatible format with safe file I/O using scala.util.Try
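The privacy check at the heart of the reflection analysis can be sketched with the `scala.reflect` API: a symbol counts as package-private when it is `private` or carries a `private[scope]` qualifier. This is a minimal illustration, not the verbatim Spark source; `PublicDemo` is a stand-in class defined only for the example.

```scala
import scala.reflect.runtime.{universe => unv}
import scala.reflect.runtime.universe.runtimeMirror

object PrivacySketch {
  // A symbol is treated as package-private when it is private outright,
  // or when its visibility is restricted via a `private[scope]` qualifier.
  def isPackagePrivate(sym: unv.Symbol): Boolean =
    sym.isPrivate || sym.privateWithin != unv.NoSymbol

  // Public stand-in class used only for this demo.
  class PublicDemo

  def checkDemo(): Boolean = {
    val mirror = runtimeMirror(getClass.getClassLoader)
    isPackagePrivate(mirror.classSymbol(classOf[PublicDemo]))
  }
}
```

A public class like `PublicDemo` fails the check, so it would not be added to the exclusion set.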

Processing Algorithm

  1. Class Scanning: Discovers all classes in org.apache.spark package using ClassFinder
  2. Privacy Analysis: For each class, checks direct privacy (private[spark] annotations) and indirect privacy (nested within private classes)
  3. Member Analysis: Examines class members for package-private methods and fields
  4. Inner Function Detection: Uses Java reflection to find Scala compiler-generated inner functions
  5. Exclusion Generation: Creates MIMA exclusion patterns and appends to existing exclusion files

Capabilities

Main Application Entry Point

Executes the complete MIMA exclusion generation process for Apache Spark classes.

def main(args: Array[String]): Unit

Parameters:

  • args: Array[String] - Command line arguments (currently unused)

Side Effects:

  • Creates .generated-mima-class-excludes file containing class exclusion patterns
  • Creates .generated-mima-member-excludes file containing member exclusion patterns
  • Prints progress messages to stdout

Usage Example:

// Typically invoked via spark-class script
object MyApp {
  def main(args: Array[String]): Unit = {
    org.apache.spark.tools.GenerateMIMAIgnore.main(Array.empty)
  }
}

Package Privacy Analysis

Analyzes all classes in a given package to identify package-private classes and members that should be excluded from MIMA binary compatibility checks.

def privateWithin(packageName: String): (Set[String], Set[String])

Parameters:

  • packageName: String - The package name to analyze (typically "org.apache.spark")

Returns:

  • (Set[String], Set[String]) - Tuple containing:
    • First element: Set of package-private class names with MIMA-compatible patterns
    • Second element: Set of package-private member names

Usage Example:

val (privateClasses, privateMembers) = GenerateMIMAIgnore.privateWithin("org.apache.spark")
// privateClasses contains: Set("org.apache.spark.internal.SomeClass", "org.apache.spark.internal.SomeClass#")
// privateMembers contains: Set("org.apache.spark.SomeClass.privateMethod", ...)

Class Discovery

Scans all classes accessible from the context class loader that belong to the given package and its subpackages, filtering out JVM-generated artifacts.

def getClasses(packageName: String): Set[String]

Parameters:

  • packageName: String - The package name to scan for classes

Returns:

  • Set[String] - Set of fully qualified class names found in the package

Implementation Details:

  • Uses org.clapper.classutil.ClassFinder for efficient class discovery
  • Applies filtering heuristics via shouldExclude to remove JVM-generated artifacts:
    • Classes containing "anon" (anonymous classes)
    • Classes ending with "$class" (Scala trait implementations)
    • Classes containing "$sp" (specialized generic classes)
    • Classes containing "hive" or "Hive" (Hive-related components)
  • Scans both directory-based and JAR-based classes on the classpath

Inner Function Analysis

Extracts inner functions from a class using Java reflection, identifying methods with $$ patterns that Scala generates for inner functions.

def getInnerFunctions(classSymbol: unv.ClassSymbol): Seq[String]

Parameters:

  • classSymbol: unv.ClassSymbol - Scala reflection symbol representing the class to analyze

Returns:

  • Seq[String] - Sequence of fully qualified inner function names found in the class

Implementation Details:

  • Falls back to Java reflection when Scala reflection cannot detect inner functions
  • Specifically looks for methods containing $$ which indicate Scala compiler-generated functions
  • Gracefully handles class loading failures with warning messages
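The `$$`-pattern filter can be sketched as below. The method names in the usage example model what `java.lang.Class#getMethods` might return for a compiled Spark class; they are illustrative, not read from a real class file.

```scala
object InnerFunctionSketch {
  // Keep only method names containing "$$" (Scala compiler-generated
  // inner functions) and qualify them with the enclosing class name.
  def innerFunctions(className: String, methodNames: Seq[String]): Seq[String] =
    methodNames.filter(_.contains("$$")).map(m => s"$className.$m")
}
```

Given `Seq("$$anonfun$tryOrIOException$1", "toString")` for class `org.apache.spark.util.Utils`, only the `$$anonfun` entry survives the filter.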

Usage Example:

import scala.reflect.runtime.universe._
import scala.reflect.runtime.{universe => unv}

val mirror = runtimeMirror(getClass.getClassLoader)
val classSymbol = mirror.classSymbol(classOf[SomeSparkClass])
val innerFunctions = GenerateMIMAIgnore.getInnerFunctions(classSymbol)
// Returns: Seq("com.example.SomeSparkClass.$$anonfun$method$1", ...)

Types

// Scala reflection universe import alias
import scala.reflect.runtime.{universe => unv}

// Core Scala reflection types used by the API
type ClassSymbol = scala.reflect.runtime.universe.ClassSymbol
type ModuleSymbol = scala.reflect.runtime.universe.ModuleSymbol  
type Symbol = scala.reflect.runtime.universe.Symbol
type RuntimeMirror = scala.reflect.runtime.universe.Mirror

// ClassFinder from external library
type ClassFinder = org.clapper.classutil.ClassFinder

// Scala compiler file I/O utilities
type File = scala.tools.nsc.io.File

Dependencies

The tool requires these runtime dependencies:

  • scala-reflect - Scala reflection API
  • scala-compiler - Scala compiler utilities for file I/O
  • org.clapper.classutil - Third-party library for class discovery

Error Handling

The tool handles failures defensively at several points:

Class Loading and Reflection Errors

  • Exception Catching: Wraps class reflection operations in try-catch blocks to handle ClassNotFoundException and reflection failures
  • Error Logging: Prints descriptive error messages with class names when instrumentation fails: "Error instrumenting class:" + className
  • Graceful Degradation: Continues processing other classes when individual class analysis fails
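The catch-log-continue pattern can be sketched with `scala.util.Try`: each class is analyzed independently, a failure is logged with the documented message prefix, and processing moves on. The `analyze` function here is a hypothetical stand-in for the real reflection step.

```scala
import scala.util.{Failure, Success, Try}

object DegradationSketch {
  // Analyze each class; on failure, print the error message and skip
  // that class instead of aborting the whole run.
  def analyzeAll(classNames: Seq[String], analyze: String => String): Seq[String] =
    classNames.flatMap { name =>
      Try(analyze(name)) match {
        case Success(result) => Some(result)
        case Failure(_) =>
          println("Error instrumenting class:" + name)
          None
      }
    }
}
```

If analysis of one class throws a `ClassNotFoundException`, the results for the remaining classes are still produced.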

Inner Function Detection Errors

  • Fallback Strategy: When Scala reflection fails to detect inner functions, falls back to Java reflection
  • Warning Messages: Logs warnings for classes where inner function detection fails: "[WARN] Unable to detect inner functions for class:" + classSymbol.fullName
  • Empty Results: Returns empty sequences rather than failing when inner function detection encounters errors

File I/O Error Handling

  • Safe File Operations: Uses scala.util.Try for reading existing exclusion files to handle cases where files don't exist
  • Append-Only Strategy: Reads existing file contents before writing to preserve previous exclusions
  • Iterator Fallback: Provides empty iterators when file reading fails: Try(File(".generated-mima-class-excludes").lines()).getOrElse(Iterator.empty)

Output Format

The tool generates two exclusion files with specific formatting patterns:

.generated-mima-class-excludes

Contains package-private class exclusions with MIMA-compatible patterns:

  • Class Names: Direct fully qualified class names
  • Object Names: Class names with $ replaced by # for Scala objects
  • Append Strategy: New exclusions are appended to existing file contents

org.apache.spark.internal.SomePrivateClass
org.apache.spark.internal.SomePrivateClass#
org.apache.spark.scheduler.cluster.mesos.MesosTaskLaunchData
org.apache.spark.scheduler.cluster.mesos.MesosTaskLaunchData#
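The object-name rewrite described above (`$` replaced by `#`) can be sketched as a one-line mapping over the compiled name of a Scala object (which ends in `$`):

```scala
object ObjectNameSketch {
  // Rewrite the compiled object name (e.g. "Foo$") into the
  // MIMA exclusion pattern form (e.g. "Foo#").
  def objectExclude(compiledName: String): String =
    compiledName.replace("$", "#")
}
```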

.generated-mima-member-excludes

Contains package-private member exclusions including methods, fields, and inner functions:

  • Methods and Fields: Fully qualified member names
  • Inner Functions: Scala compiler-generated functions with $$ in the name
  • Append Strategy: New exclusions are appended to existing file contents

org.apache.spark.SomeClass.privateMethod
org.apache.spark.SomeClass.privateField
org.apache.spark.SomeClass.$$anonfun$someMethod$1
org.apache.spark.util.Utils.$$anonfun$tryOrIOException$1

File Generation Process

  1. Read Existing: Attempts to read existing exclusion files using scala.util.Try
  2. Append New: Concatenates new exclusions with existing content
  3. Write Complete: Writes the combined content to the exclusion files
  4. Progress Logging: Prints confirmation messages when files are created
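The four steps above can be sketched as follows. The real tool uses `scala.tools.nsc.io.File`; this sketch substitutes `java.nio` so the example is self-contained, and keeps the `Try`-based fallback for a missing file.

```scala
import java.nio.file.{Files, Paths}
import scala.collection.JavaConverters._
import scala.util.Try

object AppendSketch {
  def writeExcludes(path: String, newExcludes: Seq[String]): Unit = {
    // 1. Read existing lines, falling back to empty when the file is missing.
    val existing = Try(Files.readAllLines(Paths.get(path)).asScala.toSeq)
      .getOrElse(Seq.empty)
    // 2-3. Concatenate new exclusions with existing content and write back.
    val combined = existing ++ newExcludes
    Files.write(Paths.get(path), combined.asJava)
    // 4. Progress message.
    println(s"Created : $path in current directory.")
  }
}
```

Running `writeExcludes` twice against the same path preserves the first run's exclusions and appends the second run's entries after them.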
