tessl/maven-org-apache-spark--spark-tools-2-12

Development tool for generating MIMA exclusion files to support binary compatibility checking in Apache Spark builds


Spark Tools

Spark Tools is a development utility for Apache Spark that generates MIMA (Migration Manager for Scala) exclusion files. It analyzes compiled Spark classes to identify package-private APIs that should be excluded from binary compatibility checks, supporting Spark's release engineering process.

Package Information

  • Package Name: spark-tools_2.12
  • Package Type: maven
  • Language: Scala
  • Installation: Part of Apache Spark distribution
  • Maven Coordinates: org.apache.spark:spark-tools_2.12:3.0.1

Core Imports

import org.apache.spark.tools.GenerateMIMAIgnore

// For direct API usage (advanced scenarios)
import scala.reflect.runtime.{universe => unv}
import scala.reflect.runtime.universe.runtimeMirror
import org.clapper.classutil.ClassFinder

Basic Usage

This tool is designed to be executed via Apache Spark's spark-class script:

./spark-class org.apache.spark.tools.GenerateMIMAIgnore

The tool will:

  1. Scan all classes in the org.apache.spark package
  2. Identify package-private classes and members
  3. Generate two exclusion files in the current directory:
    • .generated-mima-class-excludes
    • .generated-mima-member-excludes

Architecture

The tool operates through Scala reflection to analyze compiled bytecode:

  • Class Discovery: Uses org.clapper.classutil.ClassFinder to locate all Spark classes on the classpath
  • Reflection Analysis: Leverages Scala's runtime reflection API (scala.reflect.runtime.universe) to examine visibility modifiers and package privacy
  • Privacy Detection: Implements both direct privacy checking (class-level modifiers) and indirect privacy checking (inheritance from package-private outer classes)
  • Filtering Logic: Applies heuristics to exclude JVM-generated classes, anonymous functions, and compiler artifacts
  • Inner Function Detection: Uses Java reflection to discover Scala-generated inner functions (methods with $$ patterns)
  • File Generation: Outputs exclusion patterns in MIMA-compatible format with safe file I/O using scala.util.Try
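The privacy check at the heart of the reflection analysis can be sketched with the `scala.reflect` API: a symbol counts as package-private when it is `private` or carries a `private[scope]` qualifier. This is a minimal illustration, not the verbatim Spark source; `PublicDemo` is a stand-in class defined only for the example.

```scala
import scala.reflect.runtime.{universe => unv}
import scala.reflect.runtime.universe.runtimeMirror

object PrivacySketch {
  // A symbol is treated as package-private when it is private outright,
  // or when its visibility is restricted via a `private[scope]` qualifier.
  def isPackagePrivate(sym: unv.Symbol): Boolean =
    sym.isPrivate || sym.privateWithin != unv.NoSymbol

  // Public stand-in class used only for this demo.
  class PublicDemo

  def checkDemo(): Boolean = {
    val mirror = runtimeMirror(getClass.getClassLoader)
    isPackagePrivate(mirror.classSymbol(classOf[PublicDemo]))
  }
}
```

A public class like `PublicDemo` fails the check, so it would not be added to the exclusion set.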

Processing Algorithm

  1. Class Scanning: Discovers all classes in org.apache.spark package using ClassFinder
  2. Privacy Analysis: For each class, checks direct privacy (private[spark] annotations) and indirect privacy (nested within private classes)
  3. Member Analysis: Examines class members for package-private methods and fields
  4. Inner Function Detection: Uses Java reflection to find Scala compiler-generated inner functions
  5. Exclusion Generation: Creates MIMA exclusion patterns and appends to existing exclusion files

Capabilities

Main Application Entry Point

Executes the complete MIMA exclusion generation process for Apache Spark classes.

def main(args: Array[String]): Unit

Parameters:

  • args: Array[String] - Command line arguments (currently unused)

Side Effects:

  • Creates .generated-mima-class-excludes file containing class exclusion patterns
  • Creates .generated-mima-member-excludes file containing member exclusion patterns
  • Prints progress messages to stdout

Usage Example:

// Typically invoked via spark-class script
object MyApp {
  def main(args: Array[String]): Unit = {
    org.apache.spark.tools.GenerateMIMAIgnore.main(Array.empty)
  }
}

Package Privacy Analysis

Analyzes all classes in a given package to identify package-private classes and members that should be excluded from MIMA binary compatibility checks.

def privateWithin(packageName: String): (Set[String], Set[String])

Parameters:

  • packageName: String - The package name to analyze (typically "org.apache.spark")

Returns:

  • (Set[String], Set[String]) - Tuple containing:
    • First element: Set of package-private class names with MIMA-compatible patterns
    • Second element: Set of package-private member names

Usage Example:

val (privateClasses, privateMembers) = GenerateMIMAIgnore.privateWithin("org.apache.spark")
// privateClasses contains: Set("org.apache.spark.internal.SomeClass", "org.apache.spark.internal.SomeClass#")
// privateMembers contains: Set("org.apache.spark.SomeClass.privateMethod", ...)

Class Discovery

Scans all classes accessible from the context class loader that belong to the given package and its subpackages, filtering out JVM-generated artifacts.

def getClasses(packageName: String): Set[String]

Parameters:

  • packageName: String - The package name to scan for classes

Returns:

  • Set[String] - Set of fully qualified class names found in the package

Implementation Details:

  • Uses org.clapper.classutil.ClassFinder for efficient class discovery
  • Applies filtering heuristics via shouldExclude to remove JVM-generated artifacts:
    • Classes containing "anon" (anonymous classes)
    • Classes ending with "$class" (Scala trait implementations)
    • Classes containing "$sp" (specialized generic classes)
    • Classes containing "hive" or "Hive" (Hive-related components)
  • Scans both directory-based and JAR-based classes on the classpath

Inner Function Analysis

Extracts inner functions from a class using Java reflection, identifying methods with $$ patterns that Scala generates for inner functions.

def getInnerFunctions(classSymbol: unv.ClassSymbol): Seq[String]

Parameters:

  • classSymbol: unv.ClassSymbol - Scala reflection symbol representing the class to analyze

Returns:

  • Seq[String] - Sequence of fully qualified inner function names found in the class

Implementation Details:

  • Falls back to Java reflection when Scala reflection cannot detect inner functions
  • Specifically looks for methods containing $$ which indicate Scala compiler-generated functions
  • Gracefully handles class loading failures with warning messages
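The `$$`-pattern filter can be sketched as below. The method names in the usage example model what `java.lang.Class#getMethods` might return for a compiled Spark class; they are illustrative, not read from a real class file.

```scala
object InnerFunctionSketch {
  // Keep only method names containing "$$" (Scala compiler-generated
  // inner functions) and qualify them with the enclosing class name.
  def innerFunctions(className: String, methodNames: Seq[String]): Seq[String] =
    methodNames.filter(_.contains("$$")).map(m => s"$className.$m")
}
```

Given `Seq("$$anonfun$tryOrIOException$1", "toString")` for class `org.apache.spark.util.Utils`, only the `$$anonfun` entry survives the filter.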

Usage Example:

import scala.reflect.runtime.universe._
import scala.reflect.runtime.{universe => unv}

val mirror = runtimeMirror(getClass.getClassLoader)
val classSymbol = mirror.classSymbol(classOf[SomeSparkClass])
val innerFunctions = GenerateMIMAIgnore.getInnerFunctions(classSymbol)
// Returns: Seq("com.example.SomeSparkClass.$$anonfun$method$1", ...)

Types

// Scala reflection universe import alias
import scala.reflect.runtime.{universe => unv}

// Core Scala reflection types used by the API
type ClassSymbol = scala.reflect.runtime.universe.ClassSymbol
type ModuleSymbol = scala.reflect.runtime.universe.ModuleSymbol  
type Symbol = scala.reflect.runtime.universe.Symbol
type RuntimeMirror = scala.reflect.runtime.universe.Mirror

// ClassFinder from external library
type ClassFinder = org.clapper.classutil.ClassFinder

// Scala compiler file I/O utilities
type File = scala.tools.nsc.io.File

Dependencies

The tool requires these runtime dependencies:

  • scala-reflect - Scala reflection API
  • scala-compiler - Scala compiler utilities for file I/O
  • org.clapper.classutil - Third-party library for class discovery

Error Handling

The tool handles failures defensively at several points:

Class Loading and Reflection Errors

  • Exception Catching: Wraps class reflection operations in try-catch blocks to handle ClassNotFoundException and reflection failures
  • Error Logging: Prints descriptive error messages with class names when instrumentation fails: "Error instrumenting class:" + className
  • Graceful Degradation: Continues processing other classes when individual class analysis fails
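The catch-log-continue pattern can be sketched with `scala.util.Try`: each class is analyzed independently, a failure is logged with the documented message prefix, and processing moves on. The `analyze` function here is a hypothetical stand-in for the real reflection step.

```scala
import scala.util.{Failure, Success, Try}

object DegradationSketch {
  // Analyze each class; on failure, print the error message and skip
  // that class instead of aborting the whole run.
  def analyzeAll(classNames: Seq[String], analyze: String => String): Seq[String] =
    classNames.flatMap { name =>
      Try(analyze(name)) match {
        case Success(result) => Some(result)
        case Failure(_) =>
          println("Error instrumenting class:" + name)
          None
      }
    }
}
```

If analysis of one class throws a `ClassNotFoundException`, the results for the remaining classes are still produced.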

Inner Function Detection Errors

  • Fallback Strategy: When Scala reflection fails to detect inner functions, falls back to Java reflection
  • Warning Messages: Logs warnings for classes where inner function detection fails: "[WARN] Unable to detect inner functions for class:" + classSymbol.fullName
  • Empty Results: Returns empty sequences rather than failing when inner function detection encounters errors

File I/O Error Handling

  • Safe File Operations: Uses scala.util.Try for reading existing exclusion files to handle cases where files don't exist
  • Append-Only Strategy: Reads existing file contents before writing to preserve previous exclusions
  • Iterator Fallback: Provides empty iterators when file reading fails: Try(File(".generated-mima-class-excludes").lines()).getOrElse(Iterator.empty)

Output Format

The tool generates two exclusion files with specific formatting patterns:

.generated-mima-class-excludes

Contains package-private class exclusions with MIMA-compatible patterns:

  • Class Names: Direct fully qualified class names
  • Object Names: Class names with $ replaced by # for Scala objects
  • Append Strategy: New exclusions are appended to existing file contents

org.apache.spark.internal.SomePrivateClass
org.apache.spark.internal.SomePrivateClass#
org.apache.spark.scheduler.cluster.mesos.MesosTaskLaunchData
org.apache.spark.scheduler.cluster.mesos.MesosTaskLaunchData#
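The object-name rewrite described above (`$` replaced by `#`) can be sketched as a one-line mapping over the compiled name of a Scala object (which ends in `$`):

```scala
object ObjectNameSketch {
  // Rewrite the compiled object name (e.g. "Foo$") into the
  // MIMA exclusion pattern form (e.g. "Foo#").
  def objectExclude(compiledName: String): String =
    compiledName.replace("$", "#")
}
```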

.generated-mima-member-excludes

Contains package-private member exclusions including methods, fields, and inner functions:

  • Methods and Fields: Fully qualified member names
  • Inner Functions: Scala compiler-generated functions with $$ in the name
  • Append Strategy: New exclusions are appended to existing file contents

org.apache.spark.SomeClass.privateMethod
org.apache.spark.SomeClass.privateField
org.apache.spark.SomeClass.$$anonfun$someMethod$1
org.apache.spark.util.Utils.$$anonfun$tryOrIOException$1

File Generation Process

  1. Read Existing: Attempts to read existing exclusion files using scala.util.Try
  2. Append New: Concatenates new exclusions with existing content
  3. Write Complete: Writes the combined content to the exclusion files
  4. Progress Logging: Prints confirmation messages when files are created
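The four steps above can be sketched as follows. The real tool uses `scala.tools.nsc.io.File`; this sketch substitutes `java.nio` so the example is self-contained, and keeps the `Try`-based fallback for a missing file.

```scala
import java.nio.file.{Files, Paths}
import scala.collection.JavaConverters._
import scala.util.Try

object AppendSketch {
  def writeExcludes(path: String, newExcludes: Seq[String]): Unit = {
    // 1. Read existing lines, falling back to empty when the file is missing.
    val existing = Try(Files.readAllLines(Paths.get(path)).asScala.toSeq)
      .getOrElse(Seq.empty)
    // 2-3. Concatenate new exclusions with existing content and write back.
    val combined = existing ++ newExcludes
    Files.write(Paths.get(path), combined.asJava)
    // 4. Progress message.
    println(s"Created : $path in current directory.")
  }
}
```

Running `writeExcludes` twice against the same path preserves the first run's exclusions and appends the second run's entries after them.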
