Numerical utility functions and mathematical helpers for linear algebra and statistical computations. These utilities provide robust, numerically stable implementations of common mathematical operations.
Machine precision and numerical stability constants.
object Utils {
/**
* Machine epsilon value for numerical tolerance calculations
* The smallest positive value such that (1.0 + EPSILON) != 1.0,
* computed by halving until (1.0 + epsilon / 2.0) == 1.0
* @return Machine epsilon for Double precision
*/
lazy val EPSILON: Double
}

Utilities for working with packed triangular matrix storage formats.
object Utils {
/**
* Convert upper triangular packed matrix to full symmetric matrix
* @param n Order of the n x n matrix
* @param triangularValues Upper triangular part in packed array (column major)
* @return The full symmetric matrix as a dense array (column major)
*/
def unpackUpperTriangular(n: Int, triangularValues: Array[Double]): Array[Double]
/**
* Get index in packed upper triangular matrix format
* @param n Order of the n x n matrix
* @param i Row index (0-based)
* @param j Column index (0-based)
* @return Index in packed triangular array
*/
def indexUpperTriangular(n: Int, i: Int, j: Int): Int
}

Usage Examples:
import org.apache.spark.ml.impl.Utils
import org.apache.spark.ml.linalg._
// Machine epsilon for numerical comparisons
val tolerance = Utils.EPSILON * 1000
val isZero = math.abs(someValue) < tolerance
// Working with packed triangular matrices
val n = 3
val packedMatrix = Array(1.0, 2.0, 3.0, 4.0, 5.0, 6.0) // Packed upper triangle, column major: (0,0), (0,1), (1,1), (0,2), (1,2), (2,2)
// Convert to full symmetric matrix
val fullMatrix = Utils.unpackUpperTriangular(n, packedMatrix)
val matrix = new DenseMatrix(n, n, fullMatrix)
// Access specific element in packed format
val index = Utils.indexUpperTriangular(n, 1, 2) // Index for element (1,2)
val value = packedMatrix(index)

Functions for numerically stable mathematical operations.
object Utils {
/**
* Numerically stable computation of log(1 + exp(x))
* Prevents arithmetic overflow for large positive x values
* @param x Input value
* @return log(1 + exp(x)) computed in numerically stable way
*/
def log1pExp(x: Double): Double
}

Usage Examples:
import org.apache.spark.ml.impl.Utils
// Safe computation avoiding overflow
val largeX = 800.0
val result = Utils.log1pExp(largeX) // Would overflow with naive math.log(1 + math.exp(x))
val smallX = -10.0
val result2 = Utils.log1pExp(smallX) // Handles negative values correctly

Numerically stable softmax computations for probability distributions.
object Utils {
/**
* Perform in-place softmax conversion on array
* @param array Array to convert (modified in-place)
*/
def softmax(array: Array[Double]): Unit
/**
* Perform softmax conversion with flexible indexing
* @param input Input array
* @param n Number of elements to process
* @param offset Starting offset in input array
* @param step Step size between elements
* @param output Output array for results
*/
def softmax(
    input: Array[Double],
    n: Int,
    offset: Int,
    step: Int,
    output: Array[Double]
): Unit
}

Usage Examples:
import org.apache.spark.ml.impl.Utils
// Simple in-place softmax
val logits = Array(2.0, 1.0, 0.1)
Utils.softmax(logits) // logits now contains probabilities that sum to 1.0
// Advanced softmax with custom indexing
val input = Array(1.0, 5.0, 2.0, 3.0, 1.0, 2.0)
val output = Array.ofDim[Double](6)
// Process every other element starting from index 1
Utils.softmax(
  input = input,
  n = 3,      // Process 3 elements
  offset = 1, // Start at index 1
  step = 2,   // Skip every other element
  output = output
)
// output(1), output(3), output(5) contain softmax probabilities

The utility functions implement several numerical stability techniques. log1pExp uses conditional logic to prevent arithmetic overflow (see the derivation at the end of this section), and a softmax over raw scores is conventionally kept stable by subtracting the maximum input before exponentiation, as sketched below.
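As an illustration of that max-shift idea, here is a minimal sketch of a stabilized in-place softmax; this shows the general technique and is not the actual body of Utils.softmax:

// Sketch: in-place softmax stabilized by subtracting the maximum before exponentiation
def softmaxSketch(values: Array[Double]): Unit = {
  val max = values.max // shift so the largest exponent is exp(0) = 1
  var sum = 0.0
  var i = 0
  while (i < values.length) {
    values(i) = math.exp(values(i) - max) // exponent is never positive, so no overflow
    sum += values(i)
    i += 1
  }
  i = 0
  while (i < values.length) {
    values(i) /= sum // normalize so the entries sum to 1.0
    i += 1
  }
}

The shift leaves the result unchanged mathematically (the common factor exp(-max) cancels during normalization) but keeps every intermediate value in range.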
These operations validate inputs and throw exceptions for invalid parameters:

// Index validation in triangular matrix operations
Utils.indexUpperTriangular(-1, 0, 0) // throws IllegalArgumentException
Utils.indexUpperTriangular(3, 5, 0) // throws IllegalArgumentException
// Array bounds checking in softmax operations
val tooSmall = Array(1.0)
Utils.softmax(tooSmall, 5, 0, 1, Array.ofDim[Double](5)) // may throw an exception

These utilities are used internally throughout Spark MLlib.
Machine epsilon (ε) is the smallest positive number such that 1 + ε ≠ 1 in floating-point arithmetic. It's computed iteratively:
lazy val EPSILON: Double = {
  var eps = 1.0
  while ((1.0 + (eps / 2.0)) != 1.0) {
    eps /= 2.0
  }
  eps // for IEEE 754 Double this converges to 2^-52 ≈ 2.22e-16
}

Symmetric matrices can be stored compactly by keeping only the upper triangle:
Matrix:       Packed storage (column major):
[a b c]   →   [a b d c e f]
[0 d e]
[0 0 f]

The index mapping is index = j * (j + 1) / 2 + i for i ≤ j. For example, element (1, 2) of a 3 x 3 matrix maps to index 2 * 3 / 2 + 1 = 4.
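A minimal sketch of this mapping and the corresponding unpacking step, written as standalone helpers with hypothetical names (not necessarily the exact code behind Utils.indexUpperTriangular and Utils.unpackUpperTriangular):

// Sketch: index into the column-major packed upper triangle; mirrored for i > j
def packedIndex(n: Int, i: Int, j: Int): Int = {
  require(i >= 0 && i < n && j >= 0 && j < n, s"Indices ($i, $j) out of range for order $n")
  if (i <= j) j * (j + 1) / 2 + i else i * (i + 1) / 2 + j
}

// Sketch: expand the packed upper triangle into a full n x n symmetric matrix (column major)
def unpackToSymmetric(n: Int, packed: Array[Double]): Array[Double] = {
  val full = Array.ofDim[Double](n * n)
  for (j <- 0 until n; i <- 0 to j) {
    val v = packed(j * (j + 1) / 2 + i)
    full(j * n + i) = v // element (i, j)
    full(i * n + j) = v // mirrored element (j, i)
  }
  full
}

With the 3 x 3 example above, packedIndex(3, 1, 2) evaluates to 4, and unpackToSymmetric(3, packedMatrix) yields Array(1.0, 2.0, 4.0, 2.0, 3.0, 5.0, 4.0, 5.0, 6.0), i.e. the symmetric matrix with columns (1, 2, 4), (2, 3, 5), (4, 5, 6).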
For large x, computing log(1 + exp(x)) directly causes overflow. The stable version uses:
log1pExp(x) = x + log1p(exp(-x))   for x > 0
            = log1p(exp(x))        for x ≤ 0
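
A minimal sketch of that piecewise computation (illustrative only; the actual Utils.log1pExp may differ in details):

// Sketch: numerically stable log(1 + exp(x))
def log1pExpSketch(x: Double): Double = {
  if (x > 0) {
    x + math.log1p(math.exp(-x)) // exp(-x) <= 1 here, so nothing overflows
  } else {
    math.log1p(math.exp(x)) // exp(x) <= 1 here; log1p stays accurate for very negative x
  }
}

log1pExpSketch(800.0) returns 800.0, whereas the naive math.log(1.0 + math.exp(800.0)) overflows to infinity.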