tessl/maven-org-apache-spark--spark-tools-2-12

Development tool for generating MIMA exclusion files to support binary compatibility checking in Apache Spark builds

Workspace: tessl
Visibility: Public
Describes: mavenpkg:maven/org.apache.spark/spark-tools_2.12@3.0.x

To install, run

npx @tessl/cli install tessl/maven-org-apache-spark--spark-tools-2-12@3.0.0

Spark Tools

Spark Tools is a development utility for Apache Spark that generates MIMA (Migration Manager for Scala) exclusion files. It analyzes compiled Spark classes to identify package-private APIs that should be excluded from binary compatibility checks, supporting Spark's release engineering process.

Package Information

  • Package Name: spark-tools_2.12
  • Package Type: maven
  • Language: Scala
  • Installation: Part of Apache Spark distribution
  • Maven Coordinates: org.apache.spark:spark-tools_2.12:3.0.1

Core Imports

import org.apache.spark.tools.GenerateMIMAIgnore

// For direct API usage (advanced scenarios)
import scala.reflect.runtime.{universe => unv}
import scala.reflect.runtime.universe.runtimeMirror
import org.clapper.classutil.ClassFinder

Basic Usage

This tool is designed to be executed via Apache Spark's spark-class script:

./spark-class org.apache.spark.tools.GenerateMIMAIgnore

The tool will:

  1. Scan all classes in the org.apache.spark package
  2. Identify package-private classes and members
  3. Generate two exclusion files in the current directory:
    • .generated-mima-class-excludes
    • .generated-mima-member-excludes

Architecture

The tool operates through Scala reflection to analyze compiled bytecode:

  • Class Discovery: Uses org.clapper.classutil.ClassFinder to locate all Spark classes on the classpath
  • Reflection Analysis: Leverages Scala's runtime reflection API (scala.reflect.runtime.universe) to examine visibility modifiers and package privacy
  • Privacy Detection: Implements both direct privacy checking (class-level modifiers) and indirect privacy checking (classes nested inside package-private outer classes); a reflection sketch follows this list
  • Filtering Logic: Applies heuristics to exclude JVM-generated classes, anonymous functions, and compiler artifacts
  • Inner Function Detection: Uses Java reflection to discover Scala-generated inner functions (methods with $$ patterns)
  • File Generation: Outputs exclusion patterns in MIMA-compatible format with safe file I/O using scala.util.Try
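
The privacy detection described above can be illustrated with a short reflection sketch. This is a minimal approximation, not the tool's actual implementation: org.apache.spark.SomeClass is a hypothetical class name, and the real code applies additional filtering.

import scala.reflect.runtime.{universe => unv}

// Sketch: detect package-private visibility via Scala runtime reflection.
val mirror = unv.runtimeMirror(getClass.getClassLoader)

// A symbol is package-private when its privateWithin scope is a real symbol.
def isPackagePrivate(sym: unv.Symbol): Boolean =
  sym != unv.NoSymbol && sym.privateWithin != unv.NoSymbol

val className = "org.apache.spark.SomeClass" // hypothetical class
val classSymbol = mirror.classSymbol(Class.forName(className))
if (isPackagePrivate(classSymbol)) {
  println(s"$className is package-private; exclude it from MIMA checks")
}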

Processing Algorithm

  1. Class Scanning: Discovers all classes in org.apache.spark package using ClassFinder
  2. Privacy Analysis: For each class, checks direct privacy (private[spark] annotations) and indirect privacy (nested within private classes)
  3. Member Analysis: Examines class members for package-private methods and fields
  4. Inner Function Detection: Uses Java reflection to find Scala compiler-generated inner functions
  5. Exclusion Generation: Creates MIMA exclusion patterns and appends to existing exclusion files (the overall flow is sketched below)
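
Put together, the pipeline composes the capabilities documented below. A simplified sketch (the real main() also handles inner-function detection and file output):

// Simplified flow of the algorithm above.
val classes = GenerateMIMAIgnore.getClasses("org.apache.spark")  // step 1
val (privateClasses, privateMembers) =
  GenerateMIMAIgnore.privateWithin("org.apache.spark")           // steps 2-4
// Step 5 then writes privateClasses and privateMembers to the two
// .generated-mima-*-excludes files, appending to any existing content.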

Capabilities

Main Application Entry Point

Executes the complete MIMA exclusion generation process for Apache Spark classes.

def main(args: Array[String]): Unit

Parameters:

  • args: Array[String] - Command line arguments (currently unused)

Side Effects:

  • Creates .generated-mima-class-excludes file containing class exclusion patterns
  • Creates .generated-mima-member-excludes file containing member exclusion patterns
  • Prints progress messages to stdout

Usage Example:

// Typically invoked via spark-class script
object MyApp {
  def main(args: Array[String]): Unit = {
    org.apache.spark.tools.GenerateMIMAIgnore.main(Array.empty)
  }
}

Package Privacy Analysis

Analyzes all classes in a given package to identify package-private classes and members that should be excluded from MIMA binary compatibility checks.

def privateWithin(packageName: String): (Set[String], Set[String])

Parameters:

  • packageName: String - The package name to analyze (typically "org.apache.spark")

Returns:

  • (Set[String], Set[String]) - Tuple containing:
    • First element: Set of package-private class names with MIMA-compatible patterns
    • Second element: Set of package-private member names

Usage Example:

val (privateClasses, privateMembers) = GenerateMIMAIgnore.privateWithin("org.apache.spark")
// privateClasses contains: Set("org.apache.spark.internal.SomeClass", "org.apache.spark.internal.SomeClass#")
// privateMembers contains: Set("org.apache.spark.SomeClass.privateMethod", ...)

Class Discovery

Scans all classes accessible from the context class loader that belong to the given package and its subpackages, filtering out JVM-generated artifacts; a usage sketch follows the implementation details below.

def getClasses(packageName: String): Set[String]

Parameters:

  • packageName: String - The package name to scan for classes

Returns:

  • Set[String] - Set of fully qualified class names found in the package

Implementation Details:

  • Uses org.clapper.classutil.ClassFinder for efficient class discovery
  • Applies filtering heuristics via shouldExclude to remove JVM-generated artifacts:
    • Classes containing "anon" (anonymous classes)
    • Classes ending with "$class" (Scala trait implementations)
    • Classes containing "$sp" (specialized generic classes)
    • Classes containing "hive" or "Hive" (Hive-related components)
  • Scans both directory-based and JAR-based classes on the classpath
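
Usage Example:

A hedged sketch: the result contents shown are illustrative, and shouldExclude is an approximation of the heuristics listed above rather than the verbatim source.

val sparkClasses: Set[String] = GenerateMIMAIgnore.getClasses("org.apache.spark")
// e.g. Set("org.apache.spark.SparkContext", "org.apache.spark.rdd.RDD", ...)

// Approximation of the shouldExclude filtering heuristics described above.
def shouldExclude(name: String): Boolean =
  name.contains("anon") ||
  name.endsWith("$class") ||
  name.contains("$sp") ||
  name.contains("hive") ||
  name.contains("Hive")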

Inner Function Analysis

Extracts inner functions from a class using Java reflection, identifying methods with $$ patterns that Scala generates for inner functions.

def getInnerFunctions(classSymbol: unv.ClassSymbol): Seq[String]

Parameters:

  • classSymbol: unv.ClassSymbol - Scala reflection symbol representing the class to analyze

Returns:

  • Seq[String] - Sequence of fully qualified inner function names found in the class

Implementation Details:

  • Falls back to Java reflection when Scala reflection cannot detect inner functions
  • Specifically looks for methods containing $$ which indicate Scala compiler-generated functions
  • Gracefully handles class loading failures with warning messages

Usage Example:

import scala.reflect.runtime.universe._
import scala.reflect.runtime.{universe => unv}

val mirror = runtimeMirror(getClass.getClassLoader)
val classSymbol = mirror.classSymbol(classOf[SomeSparkClass])
val innerFunctions = GenerateMIMAIgnore.getInnerFunctions(classSymbol)
// Returns: Seq("com.example.SomeSparkClass.$$anonfun$method$1", ...)

Types

// Scala reflection universe import alias
import scala.reflect.runtime.{universe => unv}

// Core Scala reflection types used by the API
type ClassSymbol = scala.reflect.runtime.universe.ClassSymbol
type ModuleSymbol = scala.reflect.runtime.universe.ModuleSymbol  
type Symbol = scala.reflect.runtime.universe.Symbol
type RuntimeMirror = scala.reflect.runtime.universe.Mirror

// ClassFinder from external library
type ClassFinder = org.clapper.classutil.ClassFinder

// Scala compiler file I/O utilities
type File = scala.tools.nsc.io.File

Dependencies

The tool requires these runtime dependencies (an illustrative sbt snippet follows the list):

  • scala-reflect - Scala reflection API
  • scala-compiler - Scala compiler utilities for file I/O
  • org.clapper.classutil - Third-party library for class discovery
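
For reference, these correspond to roughly the following sbt coordinates. The versions are illustrative assumptions, not pinned by this package:

libraryDependencies ++= Seq(
  "org.scala-lang" %  "scala-reflect"  % scalaVersion.value,
  "org.scala-lang" %  "scala-compiler" % scalaVersion.value,
  "org.clapper"    %% "classutil"      % "1.5.1" // illustrative version
)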

Error Handling

The tool includes comprehensive defensive error handling:

Class Loading and Reflection Errors

  • Exception Catching: Wraps class reflection operations in try-catch blocks to handle ClassNotFoundException and reflection failures
  • Error Logging: Prints descriptive error messages with class names when instrumentation fails: "Error instrumenting class:" + className
  • Graceful Degradation: Continues processing other classes when individual class analysis fails (see the sketch below)
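
A sketch of this defensive pattern, built around the error message quoted above; mirror and className are assumed to be in scope, and the control flow is an approximation rather than the verbatim source:

try {
  val classSymbol = mirror.classSymbol(Class.forName(className))
  // ... analyze visibility modifiers ...
} catch {
  // Swallow ClassNotFoundException and reflection failures, keep processing.
  case _: Throwable => println("Error instrumenting class:" + className)
}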

Inner Function Detection Errors

  • Fallback Strategy: When Scala reflection fails to detect inner functions, falls back to Java reflection
  • Warning Messages: Logs warnings for classes where inner function detection fails: "[WARN] Unable to detect inner functions for class:" + classSymbol.fullName
  • Empty Results: Returns empty sequences rather than failing when inner function detection encounters errors (see the fallback sketch below)
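
A sketch of the fallback, using the unv alias from Core Imports and the warning message quoted above; the exact control flow is an assumption:

// Fallback: use Java reflection to find $$-pattern methods.
def innerFunctionsFallback(classSymbol: unv.ClassSymbol): Seq[String] =
  try {
    Class.forName(classSymbol.fullName).getMethods.toSeq
      .map(_.getName)
      .filter(_.contains("$$")) // compiler-generated inner functions
      .map(name => classSymbol.fullName + "." + name)
  } catch {
    case _: Throwable =>
      println("[WARN] Unable to detect inner functions for class:" + classSymbol.fullName)
      Seq.empty
  }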

File I/O Error Handling

  • Safe File Operations: Uses scala.util.Try for reading existing exclusion files to handle cases where files don't exist
  • Append-Only Strategy: Reads existing file contents before writing to preserve previous exclusions
  • Iterator Fallback: Provides empty iterators when file reading fails: Try(File(".generated-mima-class-excludes").lines()).getOrElse(Iterator.empty)

Output Format

The tool generates two exclusion files with specific formatting patterns:

.generated-mima-class-excludes

Contains package-private class exclusions with MIMA-compatible patterns:

  • Class Names: Direct fully qualified class names
  • Object Names: Class names with $ replaced by # for Scala objects
  • Append Strategy: New exclusions are appended to existing file contents

org.apache.spark.internal.SomePrivateClass
org.apache.spark.internal.SomePrivateClass#
org.apache.spark.scheduler.cluster.mesos.MesosTaskLaunchData
org.apache.spark.scheduler.cluster.mesos.MesosTaskLaunchData#

.generated-mima-member-excludes

Contains package-private member exclusions including methods, fields, and inner functions:

  • Methods and Fields: Fully qualified member names
  • Inner Functions: Scala compiler-generated functions with $$ in the name
  • Append Strategy: New exclusions are appended to existing file contents

org.apache.spark.SomeClass.privateMethod
org.apache.spark.SomeClass.privateField
org.apache.spark.SomeClass.$$anonfun$someMethod$1
org.apache.spark.util.Utils.$$anonfun$tryOrIOException$1

File Generation Process

  1. Read Existing: Attempts to read existing exclusion files using scala.util.Try
  2. Append New: Concatenates new exclusions with existing content
  3. Write Complete: Writes the combined content to the exclusion files
  4. Progress Logging: Prints confirmation messages when files are created (the full pattern is sketched below)
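
A minimal sketch of this process, assuming the class-excludes file name used above; newClassExcludes is a hypothetical collection of freshly generated patterns, and the confirmation message is illustrative:

import scala.util.Try
import scala.tools.nsc.io.File

val path = ".generated-mima-class-excludes"
val newClassExcludes: Set[String] = Set.empty // hypothetical: freshly generated patterns
val existing = Try(File(path).lines()).getOrElse(Iterator.empty)      // 1. read existing
val combined = (existing ++ newClassExcludes.iterator).mkString("\n") // 2. append new
File(path).writeAll(combined)                                         // 3. write complete
println("Created : " + path + " in current directory.")               // 4. progress logging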