# Statistical Distributions

Multivariate statistical distributions for probabilistic modeling and machine learning applications. Provides robust implementations that handle edge cases like singular covariance matrices.

## Capabilities

### Multivariate Gaussian Distribution

`MultivariateGaussian` implements the multivariate normal distribution. It supports singular covariance matrices by computing a pseudo-inverse instead of a true inverse.

```scala { .api }
class MultivariateGaussian(val mean: Vector, val cov: Matrix) extends Serializable {
  /** Returns density of this multivariate Gaussian at given point */
  def pdf(x: Vector): Double

  /** Returns the log-density of this multivariate Gaussian at given point */
  def logpdf(x: Vector): Double
}
```

Usage examples:

```scala
import org.apache.spark.ml.linalg._
import org.apache.spark.ml.stat.distribution.MultivariateGaussian

// Create 2D Gaussian distribution
val mean = Vectors.dense(0.0, 0.0)
val cov = Matrices.dense(2, 2, Array(
  1.0, 0.5, // Covariance matrix: [[1.0, 0.5],
  0.5, 1.0  //                     [0.5, 1.0]]
))

val mvGaussian = new MultivariateGaussian(mean, cov)

// Evaluate probability density
val point1 = Vectors.dense(0.0, 0.0) // At mean
val point2 = Vectors.dense(1.0, 1.0) // Away from mean

val density1 = mvGaussian.pdf(point1) // Higher density at mean
val density2 = mvGaussian.pdf(point2) // Lower density away from mean

val logDensity1 = mvGaussian.logpdf(point1) // Log-density (numerically stable)
val logDensity2 = mvGaussian.logpdf(point2)
```
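
As a sanity check on the example above, the 2D density can also be worked out by hand from the closed-form 2x2 inverse and determinant. The sketch below uses plain Scala only; `density2d` is a hypothetical helper written for this illustration and is not part of Spark. It assumes a positive definite covariance `[[a, b], [b, c]]`.

```scala
// Closed-form density of a 2D Gaussian with covariance [[a, b], [b, c]].
// `density2d` is a hypothetical helper for this sketch, not a Spark API.
def density2d(x: (Double, Double), mu: (Double, Double),
              a: Double, b: Double, c: Double): Double = {
  val det = a * c - b * b // determinant of the 2x2 covariance
  require(det > 0, "covariance must be positive definite")
  val (dx, dy) = (x._1 - mu._1, x._2 - mu._2)
  // Quadratic form (x-μ)ᵀ Σ⁻¹ (x-μ), using the inverse (1/det) * [[c, -b], [-b, a]]
  val q = (c * dx * dx - 2 * b * dx * dy + a * dy * dy) / det
  math.exp(-0.5 * q) / (2 * math.Pi * math.sqrt(det))
}

// Same mean and covariance as the usage example
val atMean = density2d((0.0, 0.0), (0.0, 0.0), 1.0, 0.5, 1.0)  // ≈ 0.1838
val offMean = density2d((1.0, 1.0), (0.0, 0.0), 1.0, 0.5, 1.0) // ≈ 0.0944
```

For a non-singular covariance, `pdf` should agree with these values up to floating-point error.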

### Advanced Usage

#### Singular Covariance Matrices

The implementation handles singular (non-invertible) covariance matrices by computing the pseudo-inverse and working in the reduced-dimensional subspace where the distribution is supported.

```scala
import org.apache.spark.ml.linalg._
import org.apache.spark.ml.stat.distribution.MultivariateGaussian

// Singular covariance matrix (rank deficient)
val singularCov = Matrices.dense(3, 3, Array(
  1.0, 1.0, 0.0, // Rows 1 and 2 are identical -> rank = 2
  1.0, 1.0, 0.0,
  0.0, 0.0, 1.0
))

val mean = Vectors.dense(0.0, 0.0, 0.0)
val mvGaussian = new MultivariateGaussian(mean, singularCov)

// Still works correctly with singular covariance
val point = Vectors.dense(1.0, 1.0, 0.5)
val density = mvGaussian.pdf(point)
val logDensity = mvGaussian.logpdf(point)
```

#### High-Dimensional Distributions

The same API applies to high-dimensional Gaussians. Density values shrink rapidly as the dimension grows, so prefer `logpdf` over `pdf` for numerical stability.

```scala
import org.apache.spark.ml.linalg._
import org.apache.spark.ml.stat.distribution.MultivariateGaussian
import java.util.Random

val dim = 100
val rng = new Random(42)

// Create high-dimensional Gaussian
val mean = Vectors.dense(Array.fill(dim)(0.0))

// Diagonal covariance, stored as a dense matrix, with variances close to 1.0
val covValues = Array.fill(dim * dim)(0.0)
for (i <- 0 until dim) {
  covValues(i * dim + i) = 1.0 + rng.nextGaussian() * 0.1 // Diagonal elements
}
val cov = Matrices.dense(dim, dim, covValues)

val mvGaussian = new MultivariateGaussian(mean, cov)

// Evaluate at a random point
val testPoint = Vectors.dense(Array.fill(dim)(rng.nextGaussian()))
val density = mvGaussian.pdf(testPoint)
val logDensity = mvGaussian.logpdf(testPoint) // Preferred for numerical stability
```
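
To illustrate why log-densities are the safer choice, the following sketch combines several very small densities, as one might when summing weighted component densities. It is plain Scala and does not call Spark; the log-density values are made up for illustration, and `logSumExp` is a hypothetical helper, not a library function.

```scala
// Made-up log-densities of the magnitude that can arise in high dimensions
val logDensities = Seq(-800.0, -802.0, -805.0)

// Exponentiating first underflows: math.exp(-800.0) is exactly 0.0 as a Double,
// so the sum is 0.0 and its logarithm is -Infinity.
val naive = math.log(logDensities.map(math.exp).sum)

// Log-sum-exp: subtract the maximum before exponentiating, then add it back.
// `logSumExp` is a hypothetical helper for this sketch.
def logSumExp(xs: Seq[Double]): Double = {
  val m = xs.max
  m + math.log(xs.map(x => math.exp(x - m)).sum)
}

val stable = logSumExp(logDensities) // finite, slightly above -800
```

Working from `logpdf` values and only exponentiating differences keeps the computation in a representable range.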

#### Integration with Breeze

The public constructor takes `ml.linalg` types. A constructor that accepts Breeze vectors and matrices exists but is `private[ml]`, so Breeze data has to be converted before use.

```scala
import org.apache.spark.ml.linalg._
import org.apache.spark.ml.stat.distribution.MultivariateGaussian
import breeze.linalg.{DenseVector => BDV, DenseMatrix => BDM}

// Data held in Breeze types
val breezeMean = BDV(1.0, 2.0)
val breezeCov = BDM((1.0, 0.3), (0.3, 1.0))

// Note: This constructor is private[ml], shown for completeness
// val mvGaussian = new MultivariateGaussian(breezeMean, breezeCov)

// Convert Breeze values to ml.linalg types by copying through arrays.
// (Vectors.fromBreeze and Matrices.fromBreeze are package-private in Spark.)
// Breeze's toArray and Matrices.dense both use column-major order.
val mean = Vectors.dense(breezeMean.toArray)
val cov = Matrices.dense(breezeCov.rows, breezeCov.cols, breezeCov.toArray)
val mvGaussian = new MultivariateGaussian(mean, cov)
```

### Mathematical Background

The multivariate Gaussian distribution has the probability density function:

```
pdf(x) = (2π)^(-k/2) * |Σ|^(-1/2) * exp(-1/2 * (x-μ)ᵀ * Σ⁻¹ * (x-μ))
```

Where:
- `k` is the dimensionality
- `μ` is the mean vector
- `Σ` is the covariance matrix
- `|Σ|` is the determinant of the covariance matrix
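
When `Σ` is singular, `Σ⁻¹` does not exist and `|Σ|` is zero, so the formula above cannot be evaluated directly. A common generalization, given here as background and not as a transcription of the Spark source, works in log space with the Moore-Penrose pseudo-inverse and the pseudo-determinant. The symbols `λ_i` (eigenvalues of `Σ`), `u_i` (matching unit eigenvectors) and `Σ⁺` (pseudo-inverse) are notation introduced for this sketch:

```latex
\log \mathrm{pdf}(x) = -\tfrac{1}{2}\Bigl[\, k \log(2\pi)
  + \sum_{i:\,\lambda_i > 0} \log \lambda_i
  + (x-\mu)^{\mathsf T}\, \Sigma^{+} (x-\mu) \Bigr],
\qquad
\Sigma^{+} = \sum_{i:\,\lambda_i > 0} \lambda_i^{-1}\, u_i u_i^{\mathsf T}
```

Conventions differ on whether the leading term uses the full dimension `k` or the rank of `Σ`; consult the library source if the normalizing constant matters for your use.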

The implementation:
- Uses eigendecomposition for numerical stability
- Computes pseudo-determinant and pseudo-inverse for singular matrices
- Applies tolerance-based filtering of singular values
- Supports both PDF and log-PDF computation
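
The steps in this list can be sketched for the special case of a diagonal covariance, where the eigenvalues are simply the diagonal entries and no decomposition is needed. This is an illustration in plain Scala, not Spark's code: `diagLogPdf`, the relative tolerance rule, and the use of the full dimension in the normalizing term are all assumptions made for the sketch.

```scala
// Log-density for a diagonal covariance given as an array of variances.
// Hypothetical helper for illustration; not a Spark API.
def diagLogPdf(x: Array[Double], mu: Array[Double], variances: Array[Double]): Double = {
  require(x.length == mu.length && mu.length == variances.length, "dimension mismatch")
  val eps = math.ulp(1.0)                          // machine epsilon for Double
  val tol = eps * variances.max * variances.length // assumed relative cutoff
  // Tolerance-based filtering: keep only eigenvalues treated as non-zero
  val kept = variances.indices.filter(i => variances(i) > tol)
  require(kept.nonEmpty, "covariance has no non-zero eigenvalues")
  // Log pseudo-determinant: sum of logs of the retained eigenvalues
  val logPseudoDet = kept.map(i => math.log(variances(i))).sum
  // Quadratic form with the pseudo-inverse: zero-variance directions are skipped
  val q = kept.map { i => val d = x(i) - mu(i); d * d / variances(i) }.sum
  -0.5 * (x.length * math.log(2 * math.Pi) + logPseudoDet + q)
}

// The second variance is zero, so only the first coordinate contributes.
val lp = diagLogPdf(Array(0.0, 3.0), Array(0.0, 0.0), Array(1.0, 0.0))
```

For a general covariance the same steps would be applied to the eigenvalues and eigenvectors returned by a symmetric eigendecomposition.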

## Error Handling

Invalid inputs are rejected with `IllegalArgumentException`. The cases below cover the common ones:

```scala
import org.apache.spark.ml.linalg._
import org.apache.spark.ml.stat.distribution.MultivariateGaussian

// These will throw appropriate exceptions:

// Mismatched dimensions
val mean = Vectors.dense(1.0, 2.0)
val wrongCov = Matrices.dense(3, 3, Array.fill(9)(1.0))
// val mvGaussian = new MultivariateGaussian(mean, wrongCov) // IllegalArgumentException

// Non-square covariance matrix
val nonSquareCov = Matrices.dense(2, 3, Array.fill(6)(1.0))
// val mvGaussian = new MultivariateGaussian(mean, nonSquareCov) // IllegalArgumentException

// Zero covariance matrix (all eigenvalues are zero)
val zeroCov = Matrices.zeros(2, 2)
// val mvGaussian = new MultivariateGaussian(mean, zeroCov) // IllegalArgumentException
```
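
One way to handle these failures without crashing the caller is to wrap construction in `scala.util.Try`. The sketch below does this for the mismatched-dimension case; the names `mean2`, `cov3x3` and `attempt` are chosen for this example only.

```scala
import scala.util.{Failure, Success, Try}
import org.apache.spark.ml.linalg.{Matrices, Vectors}
import org.apache.spark.ml.stat.distribution.MultivariateGaussian

val mean2 = Vectors.dense(1.0, 2.0)                   // length 2
val cov3x3 = Matrices.dense(3, 3, Array.fill(9)(1.0)) // 3x3, does not match the mean

// Try captures the exception thrown by the failed precondition as a value
val attempt = Try(new MultivariateGaussian(mean2, cov3x3))

attempt match {
  case Success(g) => println(s"constructed, density at mean: ${g.pdf(mean2)}")
  case Failure(e: IllegalArgumentException) => println(s"rejected: ${e.getMessage}")
  case Failure(e) => throw e // anything else is unexpected, so rethrow
}
```

Matching on `IllegalArgumentException` specifically keeps unrelated failures from being silently swallowed.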

## Type Definitions

```scala { .api }
class MultivariateGaussian(val mean: Vector, val cov: Matrix) extends Serializable {
  require(cov.numCols == cov.numRows, "Covariance matrix must be square")
  require(mean.size == cov.numCols, "Mean vector length must match covariance matrix size")

  def pdf(x: Vector): Double
  def logpdf(x: Vector): Double
}
```