tessl/maven-org-apache-spark--spark-mllib-local_2-13

Spark Project ML Local Library provides local (non-distributed) linear algebra utilities and basic machine learning components.

- **Workspace**: tessl
- **Visibility**: Public
- **Describes**: mavenpkg:maven/org.apache.spark/spark-mllib-local_2.13@4.0.x

To install, run

npx @tessl/cli install tessl/maven-org-apache-spark--spark-mllib-local_2-13@4.0.0

# Spark MLlib Local

Apache Spark's MLlib Local library provides essential linear algebra operations, vector and matrix manipulations, and basic statistical utilities designed for local (non-distributed) computation. This library serves as the mathematical foundation for Spark's distributed machine learning algorithms by offering optimized implementations of dense and sparse vector/matrix representations, multivariate Gaussian distributions, and utility functions for numerical computations.

## Package Information

- **Package Name**: spark-mllib-local_2.13
- **Package Type**: Maven
- **Language**: Scala
- **Installation**: `org.apache.spark:spark-mllib-local_2.13:4.0.0`

## Core Imports

```scala
import org.apache.spark.ml.linalg._
import org.apache.spark.ml.stat.distribution.MultivariateGaussian
```

## Basic Usage

```scala
import org.apache.spark.ml.linalg._
import org.apache.spark.ml.stat.distribution.MultivariateGaussian

// Create vectors
val denseVec = Vectors.dense(1.0, 2.0, 3.0, 4.0)
val sparseVec = Vectors.sparse(4, Array(0, 2), Array(1.0, 3.0))

// Vector operations
val dot = denseVec.dot(sparseVec)
val norm = Vectors.norm(denseVec, 2.0)

// Create matrices
val denseMatrix = DenseMatrix.zeros(3, 3)
val sparseMatrix = SparseMatrix.speye(3)

// Matrix operations
// `multiply` takes a DenseMatrix (or a Vector) as its argument,
// so the sparse matrix goes on the left-hand side.
val result = sparseMatrix.multiply(denseMatrix)
val transposed = denseMatrix.transpose

// Statistical distribution
val mean = Vectors.dense(0.0, 0.0)
val cov = DenseMatrix.eye(2)
val gaussian = new MultivariateGaussian(mean, cov)
val density = gaussian.pdf(Vectors.dense(1.0, 1.0))
```

## Architecture

The library is organized around three key components:

- **Vector Types**: Dense and sparse vector implementations; `compressed` returns whichever representation uses less storage
- **Matrix Types**: Dense and sparse matrices, stored column-major by default with an `isTransposed` flag for row-major data; `compressed` returns the more compact format
- **Statistical Components**: A multivariate Gaussian distribution that stays numerically stable for singular covariance matrices

The library builds on Breeze (a numerical computing library for Scala) and optimized BLAS implementations, so local mathematical operations run efficiently and plug into larger distributed machine learning pipelines.
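As a small illustration of the storage selection described above, the sketch below stores a mostly-zero vector densely and asks for its `compressed` form. The variable names are made up for the example, and the exact threshold at which `compressed` switches formats is an implementation detail of the library, so the comments only describe the outcome expected for these clear-cut inputs.

```scala
import org.apache.spark.ml.linalg.{DenseVector, SparseVector, Vector, Vectors}

// Eight entries, only one of them non-zero, stored densely.
val mostlyZeros: Vector = Vectors.dense(0.0, 0.0, 0.0, 5.0, 0.0, 0.0, 0.0, 0.0)

// `compressed` returns the representation that needs less storage.
// With one non-zero out of eight, a sparse vector is expected here.
val compact: Vector = mostlyZeros.compressed

// A vector without zeros has nothing to gain from sparse storage.
val full: Vector = Vectors.dense(1.0, 2.0, 3.0)
val fullCompact: Vector = full.compressed

// Helper for this example only: name the concrete representation.
def describe(v: Vector): String = v match {
  case _: SparseVector => "sparse"
  case _: DenseVector  => "dense"
}

println(s"mostlyZeros.compressed is ${describe(compact)}")
println(s"full.compressed is ${describe(fullCompact)}")
```

Whichever format is chosen, the logical contents do not change: `compact.toArray` holds the same eight values as `mostlyZeros.toArray`.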

## Capabilities

### Linear Algebra Operations

Core vector and matrix operations: creation, manipulation, arithmetic, and conversion between formats. Both dense and sparse implementations are provided, and `compressed` returns the more storage-efficient of the two.

```scala { .api }
// Vector creation and operations (members of the Vectors factory object)
def dense(values: Array[Double]): Vector
def sparse(size: Int, indices: Array[Int], values: Array[Double]): Vector
def zeros(size: Int): Vector
def norm(vector: Vector, p: Double): Double
def sqdist(v1: Vector, v2: Vector): Double

// Matrix creation and operations (members of the Matrices factory object)
def dense(numRows: Int, numCols: Int, values: Array[Double]): Matrix
def sparse(numRows: Int, numCols: Int, colPtrs: Array[Int], rowIndices: Array[Int], values: Array[Double]): Matrix
def zeros(numRows: Int, numCols: Int): Matrix
def eye(n: Int): Matrix
```

[Linear Algebra](./linear-algebra.md)
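The following sketch exercises a few of the factory methods listed above with small inputs that can be checked by hand. The variable names are illustrative; the values passed to `Matrices.dense` are read column by column.

```scala
import org.apache.spark.ml.linalg.{Matrices, Vectors}

val a = Vectors.dense(1.0, 2.0, 3.0)
// Sparse form of (4.0, 0.0, 1.0): size 3, non-zeros at indices 0 and 2.
val b = Vectors.sparse(3, Array(0, 2), Array(4.0, 1.0))

// L1 norm of a: |1| + |2| + |3| = 6
val l1 = Vectors.norm(a, 1.0)

// Squared Euclidean distance: (1-4)^2 + (2-0)^2 + (3-1)^2 = 17
val squaredDistance = Vectors.sqdist(a, b)

// 2 x 3 matrix given in column-major order:
//   | 1  3  5 |
//   | 2  4  6 |
val m = Matrices.dense(2, 3, Array(1.0, 2.0, 3.0, 4.0, 5.0, 6.0))

// Matrix-vector product: (1*1 + 3*2 + 5*3, 2*1 + 4*2 + 6*3) = (22, 28)
val product = m.multiply(a)

println(s"l1 = $l1, sqdist = $squaredDistance, m * a = $product")
```

Mixing dense and sparse arguments, as `sqdist(a, b)` does here, is fine: both are `Vector` values and the library dispatches on the concrete type.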

### Statistical Distributions

A multivariate Gaussian distribution whose implementation handles edge cases such as singular covariance matrices. It evaluates the probability density (`pdf`) and its logarithm (`logpdf`); the log form stays finite where the density itself would underflow to zero.

```scala { .api }
class MultivariateGaussian(mean: Vector, cov: Matrix) {
  def pdf(x: Vector): Double
  def logpdf(x: Vector): Double
}
```

[Statistical Distributions](./statistical-distributions.md)
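To make the difference between `pdf` and `logpdf` concrete, here is a minimal sketch using a standard two-dimensional Gaussian (zero mean, identity covariance). The evaluation points are chosen for the example: one close to the mean, and one far enough away that the density is smaller than the smallest positive `Double`.

```scala
import org.apache.spark.ml.linalg.{DenseMatrix, Vectors}
import org.apache.spark.ml.stat.distribution.MultivariateGaussian

val standard = new MultivariateGaussian(Vectors.dense(0.0, 0.0), DenseMatrix.eye(2))

// Near the mean: the density is exp(-(1 + 1) / 2) / (2 * Pi).
val near = Vectors.dense(1.0, 1.0)
val density = standard.pdf(near)
val logDensity = standard.logpdf(near)

// Far from the mean: exp(-1600) is not representable as a Double,
// so the density underflows while the log density stays finite.
val far = Vectors.dense(40.0, 40.0)
val farDensity = standard.pdf(far)
val farLogDensity = standard.logpdf(far)

println(s"pdf(near) = $density, logpdf(near) = $logDensity")
println(s"pdf(far) = $farDensity, logpdf(far) = $farLogDensity")
```

When comparing likelihoods of points that may lie deep in the tails, working with `logpdf` avoids comparing values that have all collapsed to zero.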

### BLAS Operations

Basic Linear Algebra Subprograms (BLAS) routines providing optimized vector and matrix operations: Level 1 (vector-vector), Level 2 (matrix-vector), and Level 3 (matrix-matrix). The routines write their result into the last argument (`y`, `A`, or `C`) and return `Unit`; `scal` scales `x` in place. Note that the Spark sources declare this object `private[spark]`, so it is reachable only from code inside the `org.apache.spark` package; application code typically uses the `Vector` and `Matrix` methods instead.

```scala { .api }
object BLAS {
  // Level 1: Vector operations
  def axpy(a: Double, x: Vector, y: Vector): Unit
  def dot(x: Vector, y: Vector): Double
  def copy(x: Vector, y: Vector): Unit
  def scal(a: Double, x: Vector): Unit

  // Level 2: Matrix-vector operations
  def gemv(alpha: Double, A: Matrix, x: Vector, beta: Double, y: DenseVector): Unit
  def syr(alpha: Double, x: Vector, A: DenseMatrix): Unit

  // Level 3: Matrix-matrix operations
  def gemm(alpha: Double, A: Matrix, B: DenseMatrix, beta: Double, C: DenseMatrix): Unit
}
```
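For readers unfamiliar with the BLAS naming scheme, the sketch below spells out the arithmetic behind three of these routines on plain arrays. It does not call Spark: `axpyRef`, `dotRef`, and `gemvRef` are hypothetical reference functions written for this example, following the standard BLAS definitions rather than Spark's optimized implementation.

```scala
// axpy: y := a * x + y, updating y in place.
def axpyRef(a: Double, x: Array[Double], y: Array[Double]): Unit = {
  require(x.length == y.length, "x and y must have the same length")
  var i = 0
  while (i < x.length) {
    y(i) += a * x(i)
    i += 1
  }
}

// dot: sum of element-wise products.
def dotRef(x: Array[Double], y: Array[Double]): Double = {
  require(x.length == y.length, "x and y must have the same length")
  var sum = 0.0
  var i = 0
  while (i < x.length) {
    sum += x(i) * y(i)
    i += 1
  }
  sum
}

// gemv: y := alpha * A * x + beta * y, with A stored column-major
// as a flat array of numRows * numCols values.
def gemvRef(alpha: Double, numRows: Int, numCols: Int, a: Array[Double],
            x: Array[Double], beta: Double, y: Array[Double]): Unit = {
  require(a.length == numRows * numCols, "A has the wrong number of values")
  require(x.length == numCols && y.length == numRows, "dimension mismatch")
  var i = 0
  while (i < numRows) {
    var acc = 0.0
    var j = 0
    while (j < numCols) {
      acc += a(i + j * numRows) * x(j) // element (i, j) in column-major order
      j += 1
    }
    y(i) = alpha * acc + beta * y(i)
    i += 1
  }
}

val y = Array(1.0, 1.0)
axpyRef(2.0, Array(3.0, 4.0), y) // y becomes (2*3 + 1, 2*4 + 1) = (7, 9)

val d = dotRef(Array(1.0, 2.0, 3.0), Array(4.0, 5.0, 6.0)) // 4 + 10 + 18 = 32

val out = Array(0.0, 0.0)
// A = | 1 3 5 |, x = (1, 2, 3)  =>  A * x = (22, 28)
//     | 2 4 6 |
gemvRef(1.0, 2, 3, Array(1.0, 2.0, 3.0, 4.0, 5.0, 6.0), Array(1.0, 2.0, 3.0), 0.0, out)
```

The in-place update of the last argument mirrors the `Unit` return types in the signatures above.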

## Types

### Core Vector Types

```scala { .api }
sealed trait Vector extends Serializable {
  def size: Int
  def toArray: Array[Double]
  def apply(i: Int): Double
  def copy: Vector
  def foreachActive(f: (Int, Double) => Unit): Unit
  def numActives: Int
  def numNonzeros: Int
  def toSparse: SparseVector
  def toDense: DenseVector
  def compressed: Vector
  def argmax: Int
  def dot(v: Vector): Double
}

class DenseVector(val values: Array[Double]) extends Vector

class SparseVector(
  override val size: Int,
  val indices: Array[Int],
  val values: Array[Double]
) extends Vector
```
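The distinction between `numActives` and `numNonzeros` is easy to miss, so here is a short sketch with a sparse vector that stores one explicit zero. The vector contents and the `visited` buffer are invented for the example.

```scala
import scala.collection.mutable.ArrayBuffer
import org.apache.spark.ml.linalg.Vectors

// Logical contents: (0.0, 2.0, 0.0, 0.0, 7.0).
// Index 3 is stored explicitly even though its value is zero.
val v = Vectors.sparse(5, Array(1, 3, 4), Array(2.0, 0.0, 7.0))

val stored = v.numActives    // counts stored entries, including the explicit zero
val nonZero = v.numNonzeros  // counts only entries whose value is not zero

// foreachActive visits the stored entries as (index, value) pairs.
val visited = ArrayBuffer.empty[(Int, Double)]
v.foreachActive { (index, value) => visited += ((index, value)) }

// Random access and argmax work on the logical vector, not the stored arrays.
val unstored = v(0)
val largestAt = v.argmax

println(s"stored = $stored, nonZero = $nonZero, visited = $visited")
println(s"v(0) = $unstored, argmax = $largestAt")
```

Calling `v.toSparse` or `v.compressed` is one way to drop explicitly stored zeros before serializing or comparing storage sizes.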

### Core Matrix Types

```scala { .api }
sealed trait Matrix extends Serializable {
  def numRows: Int
  def numCols: Int
  val isTransposed: Boolean
  def toArray: Array[Double]
  def colIter: Iterator[Vector]
  def rowIter: Iterator[Vector]
  def apply(i: Int, j: Int): Double
  def copy: Matrix
  def transpose: Matrix
  def multiply(y: DenseMatrix): DenseMatrix
  def multiply(y: Vector): DenseVector
  def foreachActive(f: (Int, Int, Double) => Unit): Unit
  def numNonzeros: Int
  def numActives: Int
  def toSparse: SparseMatrix
  def toDense: DenseMatrix
  def compressed: Matrix
}

class DenseMatrix(
  val numRows: Int,
  val numCols: Int,
  val values: Array[Double],
  override val isTransposed: Boolean = false
) extends Matrix

class SparseMatrix(
  val numRows: Int,
  val numCols: Int,
  val colPtrs: Array[Int],
  val rowIndices: Array[Int],
  val values: Array[Double],
  override val isTransposed: Boolean = false
) extends Matrix
```
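The constructor arguments are easier to read with a concrete case. The sketch below builds one dense and one sparse matrix by hand, assuming the default column-major layout for `DenseMatrix` and the compressed sparse column (CSC) reading of `colPtrs` and `rowIndices`; the matrices themselves are made up for illustration.

```scala
import org.apache.spark.ml.linalg.{DenseMatrix, Matrices}

// Dense 2 x 3 matrix; values are listed column by column:
//   | 1  3  5 |
//   | 2  4  6 |
val m = new DenseMatrix(2, 3, Array(1.0, 2.0, 3.0, 4.0, 5.0, 6.0))
val topMiddle = m(0, 1)    // row 0, column 1
val bottomRight = m(1, 2)  // row 1, column 2

// The transpose is a 3 x 2 matrix with t(i, j) == m(j, i).
val t = m.transpose
val tColumnMajor = t.toArray // column-major values of the 3 x 2 result

// Sparse 3 x 2 matrix in CSC form:
//   | 9  0 |
//   | 0  8 |
//   | 0  7 |
// colPtrs(j) until colPtrs(j + 1) selects the entries of column j,
// and rowIndices gives the row of each stored value.
val s = Matrices.sparse(3, 2, Array(0, 1, 3), Array(0, 1, 2), Array(9.0, 8.0, 7.0))
val sDense = s.toDense

println(s"m(0,1) = $topMiddle, m(1,2) = $bottomRight")
println(s"transpose is ${t.numRows} x ${t.numCols}")
println(s"sparse as dense:\n$sDense")
```

Reading `colPtrs` as offsets into `values` is the key step: column 0 owns the slice `[0, 1)` and column 1 owns `[1, 3)`.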