# Apache Spark MLlib Local Library

Apache Spark MLlib Local Library provides local implementations of linear algebra data structures and utilities for machine learning without requiring a Spark cluster. It includes core linear algebra components such as Vector and Matrix implementations, BLAS operations, and statistical distributions like MultivariateGaussian.

## Package Information

- **Package Name**: spark-mllib-local_2.12
- **Package Type**: maven
- **Language**: Scala
- **Installation**: `org.apache.spark:spark-mllib-local_2.12:2.4.8`
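
As a minimal sketch, assuming an sbt build on Scala 2.12 (the build tool is an assumption; the coordinates are the ones listed above), the dependency could be declared like this:

```scala
// build.sbt (hypothetical project file; coordinates taken from the list above)
libraryDependencies += "org.apache.spark" % "spark-mllib-local_2.12" % "2.4.8"
```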

## Core Imports

```scala
import org.apache.spark.ml.linalg._
```

For statistical distributions:

```scala
import org.apache.spark.ml.stat.distribution.MultivariateGaussian
```

## Basic Usage

```scala
import org.apache.spark.ml.linalg._

// Create vectors
val denseVec = Vectors.dense(1.0, 2.0, 3.0)
val sparseVec = Vectors.sparse(4, Array(0, 2), Array(1.0, 3.0))

// Create matrices (dense values are given in column-major order)
val denseMatrix = Matrices.dense(2, 2, Array(1.0, 3.0, 2.0, 4.0))
val sparseMatrix = Matrices.sparse(2, 2, Array(0, 1, 2), Array(0, 1), Array(1.0, 4.0))

// Vector operations
println(s"Dense vector size: ${denseVec.size}")
println(s"Sparse vector nonzeros: ${sparseVec.numNonzeros}")

// Matrix operations: the vector length must equal the matrix's numCols (2 here)
val result = denseMatrix.multiply(Vectors.dense(1.0, 2.0))
println(s"Matrix-vector product: ${result.toArray.mkString(",")}")

// BLAS operations
// Note: BLAS is declared private[spark] in the Spark 2.4.x sources, so this call
// compiles only from Scala code placed inside an org.apache.spark package.
val dotProduct = BLAS.dot(denseVec, denseVec)
val norm = Vectors.norm(denseVec, 2.0)
```

## Architecture

The library is built around a few core abstractions:

- **Vector**: Sealed trait with DenseVector/SparseVector implementations for efficient storage of 1D data
- **Matrix**: Sealed trait with DenseMatrix/SparseMatrix implementations supporting both column-major and row-major layouts
- **BLAS**: Object providing optimized linear algebra routines compatible with netlib-java
- **Factory Objects**: Vectors, Matrices objects providing convenient creation methods
- **Statistical Distributions**: MultivariateGaussian for probabilistic modeling

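The design emphasizes performance through integration with the Breeze linear algebra library. Vectors and matrices can be converted between dense and sparse formats explicitly (`toDense`, `toSparse`) or automatically via `compressed`, which picks the representation that needs less storage for the given number of nonzeros.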
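
The conversion between formats can be sketched as follows. This is a minimal illustration rather than a recommended pattern; the value names (`mostlyZero`, `asSparse`, `backToDense`, `compact`) are made up for the example, while `toSparse`, `toDense`, and `compressed` are methods on `Vector`.

```scala
import org.apache.spark.ml.linalg.{DenseVector, SparseVector, Vector, Vectors}

// A mostly-zero vector stored densely: 6 entries, 1 nonzero.
val mostlyZero: Vector = Vectors.dense(0.0, 0.0, 5.0, 0.0, 0.0, 0.0)

// Explicit conversions between the two storage formats.
val asSparse: SparseVector = mostlyZero.toSparse
val backToDense: DenseVector = asSparse.toDense

// `compressed` returns whichever representation needs less storage;
// with a single nonzero out of six entries that is the sparse one.
val compact: Vector = mostlyZero.compressed

println(s"Sparse indices: ${asSparse.indices.mkString(",")}")
println(s"Compressed is sparse: ${compact.isInstanceOf[SparseVector]}")
```

The values are unchanged by any of these conversions; only the storage layout differs.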

## Capabilities

### Vector Operations

Core vector data structures and operations including dense and sparse vectors, with comprehensive linear algebra functionality, norms, distances, and format conversions.

```scala { .api }
sealed trait Vector {
  def size: Int
  def toArray: Array[Double]
  def apply(i: Int): Double
  def copy: Vector
  def numNonzeros: Int
  def toSparse: SparseVector
  def toDense: DenseVector
}

object Vectors {
  def dense(values: Array[Double]): Vector
  def sparse(size: Int, indices: Array[Int], values: Array[Double]): Vector
  def zeros(size: Int): Vector
  def norm(vector: Vector, p: Double): Double
  def sqdist(v1: Vector, v2: Vector): Double
}
```

[Vector Operations](./vectors.md)
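
A short sketch of the norm and distance helpers listed above, mixing a dense and a sparse vector. The value names are illustrative only; the numbers in the comments follow from the definitions of the L1 norm, the L2 norm, and the squared Euclidean distance.

```scala
import org.apache.spark.ml.linalg.{Vector, Vectors}

val a: Vector = Vectors.dense(3.0, 0.0, 4.0)
val b: Vector = Vectors.sparse(3, Array(1), Array(2.0)) // (0.0, 2.0, 0.0)

val l1 = Vectors.norm(a, 1.0) // |3| + |0| + |4| = 7
val l2 = Vectors.norm(a, 2.0) // sqrt(9 + 16) = 5
val d2 = Vectors.sqdist(a, b) // 3^2 + 2^2 + 4^2 = 29

// Element access and nonzero counts work the same way on both formats.
println(s"a(2) = ${a(2)}, nonzeros in b = ${b.numNonzeros}")
```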

### Matrix Operations

Matrix data structures and operations including dense and sparse matrices with support for various layouts, linear algebra operations, and efficient storage format conversions.

```scala { .api }
sealed trait Matrix {
  def numRows: Int
  def numCols: Int
  def apply(i: Int, j: Int): Double
  def transpose: Matrix
  def multiply(y: DenseMatrix): DenseMatrix
  def multiply(y: Vector): DenseVector
  def toSparse: SparseMatrix
  def toDense: DenseMatrix
}

object Matrices {
  def dense(numRows: Int, numCols: Int, values: Array[Double]): Matrix
  def sparse(numRows: Int, numCols: Int, colPtrs: Array[Int], rowIndices: Array[Int], values: Array[Double]): Matrix
  def zeros(numRows: Int, numCols: Int): Matrix
  def eye(n: Int): Matrix
}
```

[Matrix Operations](./matrices.md)
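
The following sketch illustrates how the column-major `values` array and the `colPtrs`/`rowIndices` arrays map to matrix entries, using only the methods listed above. Value names are illustrative.

```scala
import org.apache.spark.ml.linalg.{DenseMatrix, Matrices, Matrix, Vectors}

// Dense values are read column by column, so this is
//   1.0  2.0
//   3.0  4.0
val m: Matrix = Matrices.dense(2, 2, Array(1.0, 3.0, 2.0, 4.0))

val topRight = m(0, 1)                       // 2.0
val mt = m.transpose                         // rows and columns swapped
val mv = m.multiply(Vectors.dense(1.0, 1.0)) // row sums: (3.0, 7.0)

// multiply(y: DenseMatrix) needs a DenseMatrix argument, so convert the identity first.
val identity: DenseMatrix = Matrices.eye(2).toDense
val same = m.multiply(identity)

// Sparse (compressed sparse column) layout: colPtrs(j) until colPtrs(j + 1)
// selects the slice of rowIndices/values that belongs to column j.
// Column 0 holds 1.0 at row 0; column 1 holds 4.0 at row 1.
val s: Matrix = Matrices.sparse(2, 2, Array(0, 1, 2), Array(0, 1), Array(1.0, 4.0))
println(s"s(1, 1) = ${s(1, 1)}, s(0, 1) = ${s(0, 1)}")
```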

### BLAS Operations

High-performance Basic Linear Algebra Subprograms (BLAS) routines for vectors and matrices, providing optimized implementations of common linear algebra operations.

```scala { .api }
object BLAS {
  def axpy(a: Double, x: Vector, y: Vector): Unit
  def dot(x: Vector, y: Vector): Double
  def copy(x: Vector, y: Vector): Unit
  def scal(a: Double, x: Vector): Unit
  def gemv(alpha: Double, A: Matrix, x: Vector, beta: Double, y: DenseVector): Unit
  def gemm(alpha: Double, A: Matrix, B: DenseMatrix, beta: Double, C: DenseMatrix): Unit
}
```

129

130

[BLAS Operations](./blas.md)
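
The routines above follow the standard BLAS conventions: `dot` returns the inner product, `axpy` updates `y := a * x + y`, and `gemv` updates `y := alpha * A * x + beta * y`, with the `Unit`-returning routines writing into their last argument. Because `BLAS` is declared `private[spark]` in the Spark sources and may not be callable from application code, the sketch below restates those semantics using only the public `Vector` and `Matrix` accessors. The helpers `dotRef`, `axpyRef`, and `gemvRef` are hypothetical names introduced here; they are not part of the library and return new arrays instead of mutating `y`.

```scala
import org.apache.spark.ml.linalg.{Matrices, Matrix, Vector, Vectors}

// Hypothetical reference helpers (not library API): they restate what the
// BLAS routines compute, without any of the library's optimizations.

// dot: sum of x(i) * y(i)
def dotRef(x: Vector, y: Vector): Double = {
  require(x.size == y.size, "vectors must have the same size")
  x.toArray.zip(y.toArray).map { case (xi, yi) => xi * yi }.sum
}

// axpy: y := a * x + y (returned as a new array here)
def axpyRef(a: Double, x: Vector, y: Vector): Array[Double] = {
  require(x.size == y.size, "vectors must have the same size")
  x.toArray.zip(y.toArray).map { case (xi, yi) => a * xi + yi }
}

// gemv: y := alpha * A * x + beta * y (returned as a new array here)
def gemvRef(alpha: Double, A: Matrix, x: Vector, beta: Double, y: Vector): Array[Double] = {
  require(A.numCols == x.size && A.numRows == y.size, "dimension mismatch")
  Array.tabulate(A.numRows) { i =>
    val rowDot = (0 until A.numCols).map(j => A(i, j) * x(j)).sum
    alpha * rowDot + beta * y(i)
  }
}

val x = Vectors.dense(1.0, 2.0)
val y = Vectors.dense(10.0, 20.0)
val A = Matrices.dense(2, 2, Array(1.0, 3.0, 2.0, 4.0)) // [[1, 2], [3, 4]]

val dotOut = dotRef(x, y)                  // 1*10 + 2*20 = 50
val axpyOut = axpyRef(2.0, x, y)           // (12, 24)
val gemvOut = gemvRef(1.0, A, x, 1.0, y)   // A*x = (5, 11), plus y = (15, 31)
```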

131

132

### Statistical Distributions

133

134

Multivariate statistical distributions for probabilistic modeling and machine learning applications, with support for probability density calculations.

135

136

```scala { .api }
class MultivariateGaussian(mean: Vector, cov: Matrix) {
  def pdf(x: Vector): Double
  def logpdf(x: Vector): Double
}
```

[Statistical Distributions](./distributions.md)
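
A minimal usage sketch, assuming a standard two-dimensional Gaussian (zero mean, identity covariance) chosen purely for illustration. For this distribution the density at the mean is 1 / (2π), which gives a simple sanity check.

```scala
import org.apache.spark.ml.linalg.{Matrices, Vectors}
import org.apache.spark.ml.stat.distribution.MultivariateGaussian

// Standard 2-D Gaussian: zero mean, identity covariance.
val gaussian = new MultivariateGaussian(Vectors.zeros(2), Matrices.eye(2))

val origin = Vectors.dense(0.0, 0.0)
val atMean = gaussian.pdf(origin)       // 1 / (2 * pi), about 0.159
val logAtMean = gaussian.logpdf(origin) // log of the value above

// The density falls off away from the mean.
val farAway = gaussian.pdf(Vectors.dense(3.0, 3.0))
println(s"pdf at mean: $atMean, pdf at (3, 3): $farAway")
```

`logpdf` is usually the better choice when multiplying many densities together, since sums of logs avoid floating-point underflow.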