
# Utility Functions

Numerical utility functions and mathematical helpers for linear algebra operations and statistical computations. These utilities provide robust implementations of common mathematical operations with attention to numerical stability.

## Capabilities

### Mathematical Constants

Machine precision and numerical stability constants.

```scala { .api }
object Utils {
  /**
   * Machine epsilon value for numerical tolerance calculations
   * Computed as the smallest value such that (1.0 + EPSILON) != 1.0 in Double arithmetic
   * @return Machine epsilon for Double precision
   */
  lazy val EPSILON: Double
}
```
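
**Usage Example:**

A minimal sketch of using EPSILON for a relative floating-point comparison; the helper `approxEqual` and the scaling factor are illustrative, not part of the API.

```scala
import org.apache.spark.ml.impl.Utils

// Hypothetical helper: treat two doubles as equal when their difference is
// below a small multiple of machine epsilon scaled by their magnitude.
def approxEqual(a: Double, b: Double, scale: Double = 1000.0): Boolean = {
  val magnitude = math.max(math.abs(a), math.abs(b)).max(1.0)
  math.abs(a - b) < Utils.EPSILON * scale * magnitude
}

approxEqual(0.1 + 0.2, 0.3) // true, despite the rounding error in 0.1 + 0.2
```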

### Matrix Utilities

Utilities for working with packed triangular matrix storage formats.

```scala { .api }
object Utils {
  /**
   * Convert upper triangular packed matrix to full symmetric matrix
   * @param n Order of the n x n matrix
   * @param triangularValues Upper triangular part in packed array (column major)
   * @return Dense matrix representing the full symmetric matrix (column major)
   */
  def unpackUpperTriangular(n: Int, triangularValues: Array[Double]): Array[Double]

  /**
   * Get index in packed upper triangular matrix format
   * @param n Order of the n x n matrix
   * @param i Row index (0-based)
   * @param j Column index (0-based)
   * @return Index in packed triangular array
   */
  def indexUpperTriangular(n: Int, i: Int, j: Int): Int
}
```

**Usage Examples:**

```scala
import org.apache.spark.ml.impl.Utils
import org.apache.spark.ml.linalg._

// Machine epsilon for numerical comparisons
val someValue = 1.0e-14
val tolerance = Utils.EPSILON * 1000
val isZero = math.abs(someValue) < tolerance

// Working with packed triangular matrices
val n = 3
val packedMatrix = Array(1.0, 2.0, 3.0, 4.0, 5.0, 6.0) // Upper triangular part (column major)

// Convert to full symmetric matrix
val fullMatrix = Utils.unpackUpperTriangular(n, packedMatrix)
val matrix = new DenseMatrix(n, n, fullMatrix)

// Access a specific element in packed format
val index = Utils.indexUpperTriangular(n, 1, 2) // Index for element (1,2)
val value = packedMatrix(index)
```

### Numerical Stability Functions

Functions for numerically stable mathematical operations.

```scala { .api }
object Utils {
  /**
   * Numerically stable computation of log(1 + exp(x))
   * Prevents arithmetic overflow for large positive x values
   * @param x Input value
   * @return log(1 + exp(x)) computed in a numerically stable way
   */
  def log1pExp(x: Double): Double
}
```

**Usage Examples:**

```scala
import org.apache.spark.ml.impl.Utils

// Safe computation avoiding overflow
val largeX = 800.0
val result = Utils.log1pExp(largeX) // Naive math.log(1 + math.exp(x)) would overflow to infinity

val smallX = -10.0
val result2 = Utils.log1pExp(smallX) // Handles negative values correctly
```

### Softmax Operations

Numerically stable softmax computations for probability distributions.

```scala { .api }
object Utils {
  /**
   * Perform in-place softmax conversion on array
   * @param array Array to convert (modified in-place)
   */
  def softmax(array: Array[Double]): Unit

  /**
   * Perform softmax conversion with flexible indexing
   * @param input Input array
   * @param n Number of elements to process
   * @param offset Starting offset in input array
   * @param step Step size between elements
   * @param output Output array for results
   */
  def softmax(
      input: Array[Double],
      n: Int,
      offset: Int,
      step: Int,
      output: Array[Double]): Unit
}
```

**Usage Examples:**

```scala
import org.apache.spark.ml.impl.Utils

// Simple in-place softmax
val logits = Array(2.0, 1.0, 0.1)
Utils.softmax(logits) // logits now contains probabilities that sum to 1.0

// Advanced softmax with custom indexing
val input = Array(1.0, 5.0, 2.0, 3.0, 1.0, 2.0)
val output = Array.ofDim[Double](6)

// Process every other element starting from index 1
Utils.softmax(
  input = input,
  n = 3,      // Process 3 elements
  offset = 1, // Start at index 1
  step = 2,   // Skip every other element
  output = output
)
// output(1), output(3), output(5) contain softmax probabilities
```

## Implementation Details

### Numerical Stability

The utility functions implement several numerical stability techniques:

1. **Machine Epsilon**: Computed dynamically to match the current system's floating-point precision
2. **Overflow Prevention**: `log1pExp` uses conditional logic to prevent arithmetic overflow
3. **Softmax Stability**: Subtracts the maximum value before exponentiation to prevent overflow (see the sketch after this list)
4. **Infinity Handling**: Special cases for positive infinity in softmax operations
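
A minimal sketch of the max-subtraction idea behind points 3 and 4; this is illustrative only, not the library's exact implementation.

```scala
// Illustrative stable softmax: subtracting the maximum keeps every exponent
// non-positive, so exp(...) never overflows; +Infinity inputs get all the mass.
def stableSoftmax(values: Array[Double]): Array[Double] = {
  val max = values.max
  if (max.isPosInfinity) {
    // Special case: probability mass is split among the infinite entries
    val hits = values.map(v => if (v.isPosInfinity) 1.0 else 0.0)
    hits.map(_ / hits.sum)
  } else {
    val exps = values.map(v => math.exp(v - max)) // each term is in (0, 1]
    exps.map(_ / exps.sum)
  }
}

stableSoftmax(Array(1000.0, 1001.0, 1002.0)) // no overflow despite large inputs
```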

### Memory Efficiency

- **In-place Operations**: Softmax can operate in-place to minimize memory allocation
- **Flexible Indexing**: The advanced softmax overload supports strided access patterns for efficient memory usage
- **Lazy Evaluation**: EPSILON is computed only once using lazy initialization

### Error Handling

```scala
import org.apache.spark.ml.impl.Utils

// These operations validate inputs and throw exceptions for invalid parameters:

// Index validation in triangular matrix operations
Utils.indexUpperTriangular(-1, 0, 0) // throws IllegalArgumentException
Utils.indexUpperTriangular(3, 5, 0)  // throws IllegalArgumentException

// Array bounds checking in softmax operations
val tooSmall = Array(1.0)
Utils.softmax(tooSmall, 5, 0, 1, Array.ofDim[Double](5)) // may throw an exception
```

## Integration with Other Components

These utilities are used internally throughout Spark MLlib:

- **EPSILON**: Used in MultivariateGaussian for singular value tolerance calculations
- **Triangular Operations**: Used in symmetric matrix operations and BLAS routines
- **log1pExp**: Used in logistic regression and other probabilistic models
- **Softmax**: Used in neural networks and multi-class classification algorithms

## Mathematical Background

### Machine Epsilon

Machine epsilon (ε) is the smallest positive number such that 1 + ε ≠ 1 in floating-point arithmetic. It is computed iteratively:

```scala
var eps = 1.0
while ((1.0 + (eps / 2.0)) != 1.0) {
  eps /= 2.0
}
// eps is now the machine epsilon (about 2.22e-16 for Double)
```

### Packed Triangular Storage

Upper triangular matrices can be stored compactly by keeping only the upper triangle, column by column:

```
Matrix:        Packed storage (column major):
[a b c]
[0 d e]   →    [a b d c e f]
[0 0 f]
```

The index mapping is `index = j * (j + 1) / 2 + i` for i ≤ j.
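
For a 3 × 3 matrix this mapping can be checked against `indexUpperTriangular`; a small illustrative check, assuming the formula above matches the implementation:

```scala
import org.apache.spark.ml.impl.Utils

val n = 3
// Enumerate the upper triangle column by column and compare the formula
// j * (j + 1) / 2 + i with the library's index computation.
for (j <- 0 until n; i <- 0 to j) {
  val byFormula = j * (j + 1) / 2 + i
  assert(byFormula == Utils.indexUpperTriangular(n, i, j))
}
// Positions: (0,0) -> 0, (0,1) -> 1, (1,1) -> 2, (0,2) -> 3, (1,2) -> 4, (2,2) -> 5
```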

### Log1pExp Stability

For large x, computing `log(1 + exp(x))` directly overflows because `exp(x)` exceeds the Double range. The stable version uses:

```
log1pExp(x) = x + log1p(exp(-x))   for x > 0
            = log1p(exp(x))        for x ≤ 0
```
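
A direct transcription of this piecewise formula as a sketch; the library's `log1pExp` may differ in minor details:

```scala
// Stable log(1 + exp(x)): for positive x, factor out x so the remaining
// exponent is negative and exp never overflows; for x <= 0 the naive form is safe.
def log1pExpSketch(x: Double): Double =
  if (x > 0) x + math.log1p(math.exp(-x))
  else math.log1p(math.exp(x))

log1pExpSketch(800.0)  // ≈ 800.0, whereas math.log(1 + math.exp(800.0)) is Infinity
log1pExpSketch(-10.0)  // ≈ 4.54e-5
```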