0
# Utility Functions
1
2
Numerical utility functions and mathematical helpers for linear algebra operations and statistical computations. These utilities provide robust implementations for common mathematical operations with numerical stability considerations.
3
4
## Capabilities
5
6
### Mathematical Constants
7
8
Machine precision and numerical stability constants.
9
10
```scala { .api }
11
object Utils {
12
/**
13
* Machine epsilon value for numerical tolerance calculations
14
* Computed as the smallest value where (1.0 + epsilon/2) != 1.0
15
* @return Machine epsilon for Double precision
16
*/
17
lazy val EPSILON: Double
18
}
19
```
20
21
### Matrix Utilities
22
23
Utilities for working with packed triangular matrix storage formats.
24
25
```scala { .api }
26
object Utils {
27
/**
28
* Convert upper triangular packed matrix to full symmetric matrix
29
* @param n Order of the n x n matrix
30
* @param triangularValues Upper triangular part in packed array (column major)
31
* @return Dense matrix representing the full symmetric matrix (column major)
32
*/
33
def unpackUpperTriangular(n: Int, triangularValues: Array[Double]): Array[Double]
34
35
/**
36
* Get index in packed upper triangular matrix format
37
* @param n Order of the n x n matrix
38
* @param i Row index (0-based)
39
* @param j Column index (0-based)
40
* @return Index in packed triangular array
41
*/
42
def indexUpperTriangular(n: Int, i: Int, j: Int): Int
43
}
44
```
45
46
**Usage Examples:**
47
48
```scala
49
import org.apache.spark.ml.impl.Utils
50
import org.apache.spark.ml.linalg._
51
52
// Machine epsilon for numerical comparisons
53
val tolerance = Utils.EPSILON * 1000
54
val isZero = math.abs(someValue) < tolerance
55
56
// Working with packed triangular matrices
57
val n = 3
58
val packedMatrix = Array(1.0, 2.0, 3.0, 4.0, 5.0, 6.0) // Upper triangular part
59
60
// Convert to full symmetric matrix
61
val fullMatrix = Utils.unpackUpperTriangular(n, packedMatrix)
62
val matrix = new DenseMatrix(n, n, fullMatrix)
63
64
// Access specific element in packed format
65
val index = Utils.indexUpperTriangular(n, 1, 2) // Index for element (1,2)
66
val value = packedMatrix(index)
67
```
68
69
### Numerical Stability Functions
70
71
Functions for numerically stable mathematical operations.
72
73
```scala { .api }
74
object Utils {
75
/**
76
* Numerically stable computation of log(1 + exp(x))
77
* Prevents arithmetic overflow for large positive x values
78
* @param x Input value
79
* @return log(1 + exp(x)) computed in numerically stable way
80
*/
81
def log1pExp(x: Double): Double
82
}
83
```
84
85
**Usage Examples:**
86
87
```scala
88
import org.apache.spark.ml.impl.Utils
89
90
// Safe computation avoiding overflow
91
val largeX = 800.0
92
val result = Utils.log1pExp(largeX) // Would overflow with naive math.log(1 + math.exp(x))
93
94
val smallX = -10.0
95
val result2 = Utils.log1pExp(smallX) // Handles negative values correctly
96
```
97
98
### Softmax Operations
99
100
Numerically stable softmax computations for probability distributions.
101
102
```scala { .api }
103
object Utils {
104
/**
105
* Perform in-place softmax conversion on array
106
* @param array Array to convert (modified in-place)
107
*/
108
def softmax(array: Array[Double]): Unit
109
110
/**
111
* Perform softmax conversion with flexible indexing
112
* @param input Input array
113
* @param n Number of elements to process
114
* @param offset Starting offset in input array
115
* @param step Step size between elements
116
* @param output Output array for results
117
*/
118
def softmax(
119
input: Array[Double],
120
n: Int,
121
offset: Int,
122
step: Int,
123
output: Array[Double]
124
): Unit
125
}
126
```
127
128
**Usage Examples:**
129
130
```scala
131
import org.apache.spark.ml.impl.Utils
132
133
// Simple in-place softmax
134
val logits = Array(2.0, 1.0, 0.1)
135
Utils.softmax(logits) // logits now contains probabilities that sum to 1.0
136
137
// Advanced softmax with custom indexing
138
val input = Array(1.0, 5.0, 2.0, 3.0, 1.0, 2.0)
139
val output = Array.ofDim[Double](6)
140
141
// Process every other element starting from index 1
142
Utils.softmax(
143
input = input,
144
n = 3, // Process 3 elements
145
offset = 1, // Start at index 1
146
step = 2, // Skip every other element
147
output = output
148
)
149
// output(1), output(3), output(5) contain softmax probabilities
150
```
151
152
## Implementation Details
153
154
### Numerical Stability
155
156
The utility functions implement several numerical stability techniques:
157
158
1. **Machine Epsilon**: Computed dynamically to match the current system's floating-point precision
159
2. **Overflow Prevention**: `log1pExp` uses conditional logic to prevent arithmetic overflow
160
3. **Softmax Stability**: Subtracts maximum value before exponentiation to prevent overflow
161
4. **Infinity Handling**: Special cases for positive infinity in softmax operations
162
163
### Memory Efficiency
164
165
- **In-place Operations**: Softmax can operate in-place to minimize memory allocation
166
- **Flexible Indexing**: Advanced softmax supports strided access patterns for efficient memory usage
167
- **Lazy Evaluation**: EPSILON is computed only once using lazy initialization
168
169
### Error Handling
170
171
```scala
172
// These operations validate inputs and throw exceptions for invalid parameters:
173
174
// Index validation in triangular matrix operations
175
Utils.indexUpperTriangular(-1, 0, 0) // throws IllegalArgumentException
176
Utils.indexUpperTriangular(3, 5, 0) // throws IllegalArgumentException
177
178
// Array bounds checking in softmax operations
179
val tooSmall = Array(1.0)
180
Utils.softmax(tooSmall, 5, 0, 1, Array.ofDim[Double](5)) // may throw exception
181
```
182
183
## Integration with Other Components
184
185
These utilities are used internally throughout Spark MLlib:
186
187
- **EPSILON**: Used in MultivariateGaussian for singular value tolerance calculations
188
- **Triangular Operations**: Used in symmetric matrix operations and BLAS routines
189
- **log1pExp**: Used in logistic regression and other probabilistic models
190
- **Softmax**: Used in neural networks and multi-class classification algorithms
191
192
## Mathematical Background
193
194
### Machine Epsilon
195
196
Machine epsilon (ε) is the smallest positive number such that 1 + ε ≠ 1 in floating-point arithmetic. It's computed iteratively:
197
198
```scala
199
var eps = 1.0
200
while ((1.0 + (eps / 2.0)) != 1.0) {
201
eps /= 2.0
202
}
203
```
204
205
### Packed Triangular Storage
206
207
Upper triangular matrices can be stored compactly by storing only the upper triangle:
208
209
```
210
Matrix: Packed Storage:
211
[a b c] → [a d b e c f]
212
[0 d e]
213
[0 0 f]
214
```
215
216
The index mapping is: `index = j * (j + 1) / 2 + i` for i ≤ j.
217
218
### Log1pExp Stability
219
220
For large x, computing `log(1 + exp(x))` directly causes overflow. The stable version uses:
221
222
```
223
log1pExp(x) = x + log1p(exp(-x)) for x > 0
224
= log1p(exp(x)) for x ≤ 0
225
```