# Spark MLlib Local Library

Spark MLlib Local is a lightweight package with minimal dependencies. It provides basic linear algebra types (vectors and matrices) and statistical utilities for machine learning. It runs entirely in the local JVM and does not require a distributed Spark cluster.

## Package Information

- **Package Name**: spark-mllib-local_2.11
- **Package Type**: maven
- **Language**: Scala 2.11
- **Group ID**: org.apache.spark
- **Artifact ID**: spark-mllib-local_2.11
- **Installation** (Maven):

```xml
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-mllib-local_2.11</artifactId>
  <version>2.2.3</version>
</dependency>
```

For SBT, where `%%` appends the Scala binary version (so this resolves to `spark-mllib-local_2.11` when `scalaVersion` is 2.11.x):

```scala
libraryDependencies += "org.apache.spark" %% "spark-mllib-local" % "2.2.3"
```

## Core Imports

```scala
import org.apache.spark.ml.linalg.{Vector, DenseVector, SparseVector, Vectors}
import org.apache.spark.ml.linalg.{Matrix, DenseMatrix, SparseMatrix, Matrices}
import org.apache.spark.ml.stat.distribution.MultivariateGaussian
```

## Basic Usage

```scala
import org.apache.spark.ml.linalg.{Vectors, Matrices}
import org.apache.spark.ml.stat.distribution.MultivariateGaussian

// Create vectors (both of length 3, since sqdist requires equal sizes)
val dense = Vectors.dense(1.0, 2.0, 3.0)
val sparse = Vectors.sparse(3, Array(0, 2), Array(1.0, 3.0))

// Create matrices
val denseMatrix = Matrices.dense(2, 3, Array(1.0, 2.0, 3.0, 4.0, 5.0, 6.0)) // 2x3, column-major
val sparseMatrix = Matrices.sparse(3, 3, Array(0, 2, 3, 6), Array(0, 2, 1, 0, 1, 2), Array(1.0, 2.0, 3.0, 4.0, 5.0, 6.0)) // 3x3, CSC

// Vector operations
val vectorCopy = dense.copy
val dotProduct = sparse.toArray.zip(dense.toArray).map { case (a, b) => a * b }.sum // 10.0
val l2Norm = Vectors.norm(dense, 2.0)        // sqrt(14)
val distance = Vectors.sqdist(dense, sparse) // squared Euclidean distance: 4.0

// Matrix operations (multiply expects a DenseMatrix argument)
val matrixProduct = denseMatrix.multiply(sparseMatrix.transpose.toDense) // 2x3
val transposed = denseMatrix.transpose
val compressed = sparseMatrix.compressed

// Statistical distribution: standard bivariate normal
val mean = Vectors.dense(0.0, 0.0)
val cov = Matrices.eye(2)
val gaussian = new MultivariateGaussian(mean, cov)
val density = gaussian.pdf(Vectors.dense(1.0, 1.0)) // exp(-1) / (2 * Pi), about 0.0585
```

## Architecture

Spark MLlib Local is organized around four components:

- **Vector System**: Dense and sparse vector implementations behind a common `Vector` interface
- **Matrix System**: Dense matrices (column-major, or row-major when transposed) and sparse matrices in compressed sparse column (CSC) format
- **Statistical Distributions**: Probability distributions used by machine learning algorithms
- **Testing Utilities**: Tolerance-based comparison utilities for numerical testing

The library emphasizes performance through:

- **Storage Selection**: `compressed` converts a vector or matrix to whichever of the dense or sparse forms needs less memory; the conversion happens only when `compressed` is called, not automatically
- **Cheap Transposes**: `transpose` returns a matrix that shares the original value arrays and flips the `isTransposed` flag instead of copying data
- **Memory Efficiency**: Compact storage formats for both dense and sparse data structures
- **Native Integration**: BLAS operations such as matrix multiplication are delegated to optimized routines that can use a native BLAS library when one is available
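
A minimal sketch of the first two points, assuming `spark-mllib-local` 2.2.x is on the classpath (the object name `StorageDemo` is illustrative): a mostly-zero dense vector becomes sparse after `compressed`, a fully populated sparse vector becomes dense, and a transposed dense matrix reuses the original `values` array.

```scala
import org.apache.spark.ml.linalg.{DenseMatrix, DenseVector, Matrices, SparseVector, Vectors}

object StorageDemo {
  // 10 entries stored densely, but only one of them is non-zero.
  val mostlyZeros = Vectors.dense(Array(0.0, 0.0, 0.0, 7.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0))
  // compressed picks the cheaper representation, here sparse.
  val compact = mostlyZeros.compressed

  // Every entry is populated, so the dense form is cheaper.
  val full = Vectors.sparse(3, Array(0, 1, 2), Array(1.0, 2.0, 3.0))
  val fullCompact = full.compressed

  // Transposing flips the layout flag; the values array is shared, not copied.
  val m: DenseMatrix = Matrices.dense(2, 3, Array(1.0, 2.0, 3.0, 4.0, 5.0, 6.0)).asInstanceOf[DenseMatrix]
  val mt: DenseMatrix = m.transpose

  def main(args: Array[String]): Unit = {
    println(s"compact is sparse: ${compact.isInstanceOf[SparseVector]}")
    println(s"fullCompact is dense: ${fullCompact.isInstanceOf[DenseVector]}")
    println(s"mt is ${mt.numRows}x${mt.numCols}, isTransposed = ${mt.isTransposed}")
    println(s"shares storage: ${mt.values eq m.values}")
  }
}
```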

## Capabilities

### Vector Operations

Dense and sparse vectors share the `Vector` interface, so code that consumes feature vectors works with either representation. The `Vectors` object provides factory methods plus `norm` and `sqdist`.

```scala { .api }
// Factory methods
object Vectors {
  def dense(firstValue: Double, otherValues: Double*): Vector
  def dense(values: Array[Double]): Vector
  def sparse(size: Int, indices: Array[Int], values: Array[Double]): Vector
  def sparse(size: Int, elements: Seq[(Int, Double)]): Vector
  def zeros(size: Int): Vector
  def norm(vector: Vector, p: Double): Double
  def sqdist(v1: Vector, v2: Vector): Double
}

// Vector trait
trait Vector {
  def size: Int
  def toArray: Array[Double]
  def apply(i: Int): Double
  def copy: Vector
  def foreachActive(f: (Int, Double) => Unit): Unit
  def numActives: Int
  def numNonzeros: Int
  def toSparse: SparseVector
  def toDense: DenseVector
  def compressed: Vector
  def argmax: Int
}
```

[Vector Operations](./vectors.md)
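
The sketch below exercises several of these `Vector` methods on one sparse vector, assuming `spark-mllib-local` 2.2.x; `VectorOpsDemo` is an illustrative name. It highlights that `numActives` counts stored entries while `numNonzeros` counts only non-zero values, so an explicitly stored zero makes them differ.

```scala
import org.apache.spark.ml.linalg.{DenseVector, SparseVector, Vector, Vectors}
import scala.collection.mutable.ArrayBuffer

object VectorOpsDemo {
  // Length-6 sparse vector from (index, value) pairs; index 4 stores an explicit zero.
  val v: Vector = Vectors.sparse(6, Seq((1, -2.0), (3, 5.0), (4, 0.0)))

  // Stored entries vs. entries that are actually non-zero.
  val actives: Int = v.numActives
  val nonzeros: Int = v.numNonzeros

  // foreachActive visits stored (index, value) pairs only, skipping implicit zeros.
  val visited: List[(Int, Double)] = {
    val buf = ArrayBuffer.empty[(Int, Double)]
    v.foreachActive((i, x) => buf += ((i, x)))
    buf.toList
  }

  // Index of the largest element; implicit zeros are taken into account.
  val maxIndex: Int = v.argmax

  // Representation conversions; toSparse keeps only non-zero values.
  val asDense: DenseVector = v.toDense
  val asSparse: SparseVector = v.toSparse

  // p-norms through the Vectors object.
  val l1: Double = Vectors.norm(v, 1.0)
  val lInf: Double = Vectors.norm(v, Double.PositiveInfinity)

  def main(args: Array[String]): Unit = {
    println(s"actives = $actives, nonzeros = $nonzeros, visited = $visited")
    println(s"argmax = $maxIndex, dense = ${asDense.toArray.mkString(", ")}")
    println(s"l1 = $l1, lInf = $lInf")
  }
}
```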

### Matrix Operations

Dense and sparse matrices share the `Matrix` interface. Dense matrices store values in column-major order (row-major when `isTransposed` is true); sparse matrices use compressed sparse column (CSC) storage.

```scala { .api }
// Factory methods
object Matrices {
  def dense(numRows: Int, numCols: Int, values: Array[Double]): Matrix
  def sparse(numRows: Int, numCols: Int, colPtrs: Array[Int], rowIndices: Array[Int], values: Array[Double]): Matrix
  def zeros(numRows: Int, numCols: Int): Matrix
  def ones(numRows: Int, numCols: Int): Matrix
  def eye(n: Int): Matrix
  def speye(n: Int): Matrix
  def diag(vector: Vector): Matrix
  def horzcat(matrices: Array[Matrix]): Matrix
  def vertcat(matrices: Array[Matrix]): Matrix
}

// Matrix trait
trait Matrix {
  def numRows: Int
  def numCols: Int
  def apply(i: Int, j: Int): Double
  def transpose: Matrix
  def multiply(y: DenseMatrix): DenseMatrix // the argument must be dense
  def multiply(y: Vector): DenseVector
  def toSparse: SparseMatrix
  def toDense: DenseMatrix
  def compressed: Matrix
}
```

[Matrix Operations](./matrices.md)
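
A short sketch combining the factory methods with multiplication, assuming `spark-mllib-local` 2.2.x (`MatrixOpsDemo` is an illustrative name). Because `multiply` takes a `DenseMatrix` argument, a general `Matrix` is converted with `toDense` first; the last value shows the CSC arrays behind `Matrices.speye`.

```scala
import org.apache.spark.ml.linalg.{DenseMatrix, DenseVector, Matrices, Matrix, SparseMatrix, Vectors}

object MatrixOpsDemo {
  // 2x2 identity and the 2x2 diagonal matrix diag(2, 3), both dense.
  val eye2: Matrix = Matrices.eye(2)
  val diag23: Matrix = Matrices.diag(Vectors.dense(2.0, 3.0))

  // [eye2 | diag23] is 2x4; [eye2 ; diag23] is 4x2.
  val wide: Matrix = Matrices.horzcat(Array(eye2, diag23))
  val tall: Matrix = Matrices.vertcat(Array(eye2, diag23))

  // Matrix-vector product: diag(2, 3) * (1, 1) = (2, 3).
  val mv: DenseVector = diag23.multiply(Vectors.dense(1.0, 1.0))

  // Matrix-matrix product: wide * tall = I * I + D * D = diag(5, 10).
  val mm: DenseMatrix = wide.multiply(tall.toDense)

  // CSC: column j's entries sit at positions colPtrs(j) until colPtrs(j + 1).
  val sparseEye: SparseMatrix = Matrices.speye(3).asInstanceOf[SparseMatrix]

  def main(args: Array[String]): Unit = {
    println(s"wide: ${wide.numRows}x${wide.numCols}, tall: ${tall.numRows}x${tall.numCols}")
    println(s"mv = ${mv.toArray.mkString(", ")}")
    println(mm)
    println(s"colPtrs = ${sparseEye.colPtrs.mkString(", ")}, rowIndices = ${sparseEye.rowIndices.mkString(", ")}")
  }
}
```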

### Statistical Distributions

The `org.apache.spark.ml.stat.distribution` package provides `MultivariateGaussian`, a multivariate normal distribution that evaluates probability densities (`pdf`) and log-densities (`logpdf`). If the covariance matrix is singular, the density is computed in the lower-dimensional subspace on which the distribution is supported.

```scala { .api }
class MultivariateGaussian(val mean: Vector, val cov: Matrix) {
  def pdf(x: Vector): Double
  def logpdf(x: Vector): Double
}
```

[Statistical Distributions](./distributions.md)
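
As a sanity check, for a diagonal covariance the multivariate density should equal the product of one-dimensional normal densities. The sketch below compares the two and confirms that `pdf` and `logpdf` agree, assuming `spark-mllib-local` 2.2.x; `GaussianDemo` and the helper `normal1d` are illustrative, not part of the library.

```scala
import org.apache.spark.ml.linalg.{Matrices, Vector, Vectors}
import org.apache.spark.ml.stat.distribution.MultivariateGaussian

object GaussianDemo {
  // Mean (1, -1); independent coordinates with variances 4 and 1.
  val mean: Vector = Vectors.dense(1.0, -1.0)
  val dist = new MultivariateGaussian(mean, Matrices.diag(Vectors.dense(4.0, 1.0)))

  val x: Vector = Vectors.dense(2.0, 0.0)
  val density: Double = dist.pdf(x)
  val logDensity: Double = dist.logpdf(x)

  // Hypothetical helper: 1-D normal density with mean mu and variance v.
  def normal1d(z: Double, mu: Double, v: Double): Double =
    math.exp(-(z - mu) * (z - mu) / (2.0 * v)) / math.sqrt(2.0 * math.Pi * v)

  // With a diagonal covariance, the joint density factorizes across coordinates.
  val expected: Double = normal1d(2.0, 1.0, 4.0) * normal1d(0.0, -1.0, 1.0)

  def main(args: Array[String]): Unit =
    println(s"pdf = $density, product of 1-D densities = $expected, exp(logpdf) = ${math.exp(logDensity)}")
}
```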

### Testing Utilities

`TestingUtils` provides tolerance-based comparisons for doubles, vectors, and matrices, since floating-point results from numerical algorithms rarely match exactly. `x ~= (y absTol eps)` and `x ~= (y relTol eps)` return a `Boolean`, while `~==` fails with a descriptive error when the values are not within tolerance. `TestingUtils` (`org.apache.spark.ml.util.TestingUtils`) lives in the module's test sources rather than in the main artifact.

```scala { .api }
object TestingUtils {
  implicit class DoubleWithAlmostEquals(val x: Double) {
    def ~=(r: CompareDoubleRightSide): Boolean
    def ~==(r: CompareDoubleRightSide): Boolean
    def absTol(eps: Double): CompareDoubleRightSide
    def relTol(eps: Double): CompareDoubleRightSide
  }

  implicit class VectorWithAlmostEquals(val x: Vector) {
    def ~=(r: CompareVectorRightSide): Boolean
    def ~==(r: CompareVectorRightSide): Boolean
    def absTol(eps: Double): CompareVectorRightSide
    def relTol(eps: Double): CompareVectorRightSide
  }
}
```

[Testing Utilities](./testing.md)
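
Because `TestingUtils` is only available from the test sources, the sketch below is a hypothetical, dependency-free re-creation of its comparison DSL for doubles, meant to show the semantics rather than replace the real class. The names `ToleranceDemo`, `Tol`, and `ApproxDouble` are invented for illustration, and the comparison rules (absolute: `|x - y| < eps`; relative: `|x - y| < eps * min(|x|, |y|)`) are assumptions to check against the actual source, especially for values near zero.

```scala
// Hypothetical sketch of a TestingUtils-style DSL; NOT the library's code.
// The tolerance rules below are assumptions.
object ToleranceDemo {
  // Stand-in for CompareDoubleRightSide: expected value plus comparison mode.
  final case class Tol(y: Double, eps: Double, relative: Boolean)

  implicit class ApproxDouble(val x: Double) {
    // Right-hand-side builders, used as `y absTol eps` or `y relTol eps`.
    def absTol(eps: Double): Tol = Tol(x, eps, relative = false)
    def relTol(eps: Double): Tol = Tol(x, eps, relative = true)

    // Assumed rules: absolute |x - y| < eps; relative |x - y| < eps * min(|x|, |y|).
    def ~=(r: Tol): Boolean = {
      val diff = math.abs(x - r.y)
      if (r.relative) diff < r.eps * math.min(math.abs(x), math.abs(r.y))
      else diff < r.eps
    }
  }

  // `~=` ends in '=', so Scala gives it assignment-level (lowest) precedence:
  // `a ~= b absTol e` parses as `a ~= (b absTol e)`.
  val closeAbs: Boolean = 1.0 ~= 1.0000001 absTol 1e-6
  val farAbs: Boolean = 1.0 ~= 1.001 absTol 1e-6
  val closeRel: Boolean = 1.0e6 ~= 1.0000005e6 relTol 1e-6
  // Tiny values: close in absolute terms, but 100% apart in relative terms.
  val tinyAbs: Boolean = 1.0e-6 ~= 2.0e-6 absTol 1e-5
  val tinyRel: Boolean = 1.0e-6 ~= 2.0e-6 relTol 1e-3

  def main(args: Array[String]): Unit =
    println(s"closeAbs=$closeAbs farAbs=$farAbs closeRel=$closeRel tinyAbs=$tinyAbs tinyRel=$tinyRel")
}
```

The last two values show why both modes are useful: an absolute tolerance accepts tiny values that differ by 100%, while a relative tolerance rejects them.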

## Types

### Core Types

```scala { .api }
// Vector hierarchy
sealed trait Vector extends Serializable
class DenseVector(val values: Array[Double]) extends Vector
class SparseVector(override val size: Int, val indices: Array[Int], val values: Array[Double]) extends Vector

// Matrix hierarchy
sealed trait Matrix extends Serializable
class DenseMatrix(val numRows: Int, val numCols: Int, val values: Array[Double], override val isTransposed: Boolean) extends Matrix
class SparseMatrix(val numRows: Int, val numCols: Int, val colPtrs: Array[Int], val rowIndices: Array[Int], val values: Array[Double], override val isTransposed: Boolean) extends Matrix
```
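
Because `Vector` and `Matrix` are sealed, code can branch on the concrete representation. The sketch below assumes `spark-mllib-local` 2.2.x and the `DenseVector`/`SparseVector` extractor objects in `ml.linalg`. It matches on a vector's representation and builds a row-major `DenseMatrix` by passing `isTransposed = true`. `TypesDemo` is an illustrative name.

```scala
import org.apache.spark.ml.linalg.{DenseMatrix, DenseVector, SparseVector, Vector, Vectors}

object TypesDemo {
  // Vector is sealed, so DenseVector and SparseVector are the only cases to handle.
  def describe(v: Vector): String = v match {
    case DenseVector(values) => s"dense(${values.length} stored)"
    case SparseVector(size, indices, _) => s"sparse(size=$size, ${indices.length} stored)"
  }

  val denseDesc: String = describe(Vectors.dense(1.0, 2.0, 3.0))
  val sparseDesc: String = describe(Vectors.sparse(10, Array(2, 7), Array(4.0, 5.0)))

  // isTransposed = true: `values` is read in row-major order, giving
  // [[1, 2, 3],
  //  [4, 5, 6]]
  val rowMajor = new DenseMatrix(2, 3, Array(1.0, 2.0, 3.0, 4.0, 5.0, 6.0), true)

  def main(args: Array[String]): Unit = {
    println(denseDesc)
    println(sparseDesc)
    println(rowMajor)
  }
}
```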

### Testing Types

```scala { .api }
case class CompareDoubleRightSide(fun: (Double, Double, Double) => Boolean, y: Double, eps: Double, method: String)
case class CompareVectorRightSide(fun: (Vector, Vector, Double) => Boolean, y: Vector, eps: Double, method: String)
case class CompareMatrixRightSide(fun: (Matrix, Matrix, Double) => Boolean, y: Matrix, eps: Double, method: String)
```