# Spark MLlib Local

Apache Spark's MLlib Local library provides linear algebra operations, vector and matrix types, and basic statistical utilities for local (non-distributed) computation. Spark's distributed machine learning algorithms are built on these types. The library supplies dense and sparse vector and matrix representations, a multivariate Gaussian distribution, and BLAS-backed routines for numerical computation.

## Package Information

- **Package Name**: spark-mllib-local_2.13
- **Package Type**: Maven
- **Language**: Scala
- **Installation**: `org.apache.spark:spark-mllib-local_2.13:4.0.0`

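For sbt builds, the same artifact can be declared as below. This is a sketch that assumes the project's `scalaVersion` is a 2.13.x release, because `%%` appends the project's Scala binary version to the artifact name.

```scala
// build.sbt: with a Scala 2.13 project, %% resolves to spark-mllib-local_2.13
libraryDependencies += "org.apache.spark" %% "spark-mllib-local" % "4.0.0"
```
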
## Core Imports

```scala
import org.apache.spark.ml.linalg._
import org.apache.spark.ml.stat.distribution.MultivariateGaussian
```

## Basic Usage

```scala
import org.apache.spark.ml.linalg._
import org.apache.spark.ml.stat.distribution.MultivariateGaussian

// Create vectors
val denseVec = Vectors.dense(1.0, 2.0, 3.0, 4.0)
val sparseVec = Vectors.sparse(4, Array(0, 2), Array(1.0, 3.0))

// Vector operations
val dot = denseVec.dot(sparseVec)
val norm = Vectors.norm(denseVec, 2.0)

// Create matrices
val denseMatrix = DenseMatrix.zeros(3, 3)
val sparseMatrix = SparseMatrix.speye(3)

// Matrix operations (multiply accepts a DenseMatrix or a Vector argument)
val result = sparseMatrix.multiply(denseMatrix)
val transposed = denseMatrix.transpose

// Statistical distribution
val mean = Vectors.dense(0.0, 0.0)
val cov = DenseMatrix.eye(2)
val gaussian = new MultivariateGaussian(mean, cov)
val density = gaussian.pdf(Vectors.dense(1.0, 1.0))
```

## Architecture

The library is organized around several key components:

- **Vector Types**: `DenseVector` and `SparseVector` implementations of the `Vector` trait; `compressed` converts a vector to whichever representation needs less storage
- **Matrix Types**: `DenseMatrix` and `SparseMatrix` (compressed sparse column) implementations of the `Matrix` trait; storage is column-major by default, an `isTransposed` flag represents row-major layouts, and `compressed` picks the smallest of the dense, column-major sparse, and row-major sparse forms
- **Statistical Components**: `MultivariateGaussian`, which handles singular covariance matrices by evaluating the density in the reduced-dimensional subspace on which the distribution is supported

The library builds on Breeze (a Scala numerical computing library) and optimized BLAS implementations, so its local vector and matrix operations can be reused inside Spark's distributed machine learning pipelines.

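The format selection described above can be observed through `compressed`. This sketch assumes only the public `org.apache.spark.ml.linalg` API. The concrete class returned depends on Spark's internal storage-size estimates, so the inputs were chosen to fall clearly on one side of that threshold.

```scala
import org.apache.spark.ml.linalg.{DenseMatrix, DenseVector, SparseMatrix, SparseVector, Vectors}

// Dense storage for a vector with a single non-zero entry out of 8
val mostlyZero = Vectors.dense(0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 7.0, 0.0)
val compactVec = mostlyZero.compressed // expected: a SparseVector

// Sparse storage for a vector whose entries are all non-zero
val allFilled = Vectors.sparse(3, Array(0, 1, 2), Array(1.0, 2.0, 3.0))
val compactFilled = allFilled.compressed // expected: a DenseVector

// A 10x10 identity held densely has 90 zeros out of 100 entries
val compactEye = DenseMatrix.eye(10).compressed // expected: a SparseMatrix

// Vector equality compares values, so the storage change is invisible here
val sameValues = compactVec == mostlyZero
```
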
## Capabilities

### Linear Algebra Operations

Core vector and matrix operations: creation through the `Vectors` and `Matrices` factory objects, element access, dot and matrix products, norms and distances, and conversion between dense and sparse formats (`toDense`, `toSparse`, `compressed`).

```scala { .api }
// Vector creation and operations
object Vectors {
  def dense(values: Array[Double]): Vector
  def sparse(size: Int, indices: Array[Int], values: Array[Double]): Vector
  def zeros(size: Int): Vector
  def norm(vector: Vector, p: Double): Double
  def sqdist(v1: Vector, v2: Vector): Double
}

// Matrix creation and operations
object Matrices {
  def dense(numRows: Int, numCols: Int, values: Array[Double]): Matrix
  def sparse(numRows: Int, numCols: Int, colPtrs: Array[Int], rowIndices: Array[Int], values: Array[Double]): Matrix
  def zeros(numRows: Int, numCols: Int): Matrix
  def eye(n: Int): Matrix
}
```

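Here is a short sketch of the factory methods listed above. It assumes the `Vectors` and `Matrices` objects from `org.apache.spark.ml.linalg` and shows that `Matrices.dense` reads its value array in column-major order.

```scala
import org.apache.spark.ml.linalg.{Matrices, Vectors}

val v1 = Vectors.dense(3.0, 4.0)
val origin = Vectors.zeros(2)

// L2 norm of (3, 4) is 5; L1 norm is 7
val l2 = Vectors.norm(v1, 2.0)
val l1 = Vectors.norm(v1, 1.0)

// Squared Euclidean distance to the origin: 3^2 + 4^2 = 25
val sq = Vectors.sqdist(v1, origin)

// Values (1, 2, 3, 4, 5, 6) fill columns first, giving the 2x3 matrix
//   1.0  3.0  5.0
//   2.0  4.0  6.0
val m = Matrices.dense(2, 3, Array(1.0, 2.0, 3.0, 4.0, 5.0, 6.0))
val topRight = m(0, 2)

val identity = Matrices.eye(3)
```
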
[Linear Algebra](./linear-algebra.md)
79
80
81
### Statistical Distributions
82
83
Multivariate statistical distributions with robust numerical implementations that handle edge cases like singular covariance matrices. Provides probability density function calculations with numerical stability features.
84
85
```scala { .api }
class MultivariateGaussian(mean: Vector, cov: Matrix) {
  def pdf(x: Vector): Double
  def logpdf(x: Vector): Double
}
```

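The sketch below exercises the class on a standard bivariate normal and on a rank-deficient covariance. It assumes only the constructor and methods shown above. The point `(40, 40)` is an arbitrary choice, far enough from the mean that `pdf` underflows while `logpdf` stays finite.

```scala
import org.apache.spark.ml.linalg.{Matrices, Vectors}
import org.apache.spark.ml.stat.distribution.MultivariateGaussian

// Standard bivariate normal: zero mean, identity covariance
val standard = new MultivariateGaussian(Vectors.dense(0.0, 0.0), Matrices.eye(2))

// At the mean, a 2-D standard normal has density 1 / (2 * Pi)
val peak = standard.pdf(Vectors.dense(0.0, 0.0))

// Far from the mean, pdf underflows to 0.0 but logpdf remains usable
val far = Vectors.dense(40.0, 40.0)
val farPdf = standard.pdf(far)
val farLog = standard.logpdf(far)

// Singular (rank-1) covariance: the two coordinates are perfectly correlated.
// Construction still succeeds because the degenerate case is handled.
val singular = new MultivariateGaussian(
  Vectors.dense(0.0, 0.0),
  Matrices.dense(2, 2, Array(1.0, 1.0, 1.0, 1.0)))
val singularPeak = singular.pdf(Vectors.dense(0.0, 0.0))
```
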
[Statistical Distributions](./statistical-distributions.md)
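
### BLAS Operations

The `BLAS` object wraps Basic Linear Algebra Subprograms routines for vectors and matrices: Level 1 (vector-vector), Level 2 (matrix-vector), and Level 3 (matrix-matrix). Except for `dot`, the routines return `Unit` and update one of their arguments in place. For example, `gemv` computes `y := alpha * A * x + beta * y`. In the Spark source this object is declared `private[spark]`, so only code inside the `org.apache.spark` package can call it. Application code reaches the same computations through public methods such as `Vector.dot` and `Matrix.multiply`.
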
```scala { .api }
private[spark] object BLAS {
  // Level 1: Vector operations
  def axpy(a: Double, x: Vector, y: Vector): Unit // y := a * x + y
  def dot(x: Vector, y: Vector): Double           // returns x^T y
  def copy(x: Vector, y: Vector): Unit            // y := x
  def scal(a: Double, x: Vector): Unit            // x := a * x

  // Level 2: Matrix-vector operations
  def gemv(alpha: Double, A: Matrix, x: Vector, beta: Double, y: DenseVector): Unit // y := alpha * A * x + beta * y
  def syr(alpha: Double, x: Vector, A: DenseMatrix): Unit                            // A := alpha * x * x^T + A

  // Level 3: Matrix-matrix operations
  def gemm(alpha: Double, A: Matrix, B: DenseMatrix, beta: Double, C: DenseMatrix): Unit // C := alpha * A * B + beta * C
}
```

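Because `BLAS` is `private[spark]`, application code cannot call `BLAS.gemv` directly. The sketch below reproduces the `gemv` update `y := alpha * A * x + beta * y` using only public methods. `gemvPublic` is a hypothetical helper written for this example, and unlike the BLAS routine it returns a new vector instead of overwriting `y`.

```scala
import org.apache.spark.ml.linalg.{DenseVector, Matrices, Matrix, Vector, Vectors}

// Hypothetical helper (not part of Spark): alpha * A * x + beta * y via public API
def gemvPublic(alpha: Double, a: Matrix, x: Vector, beta: Double, y: DenseVector): DenseVector = {
  val ax = a.multiply(x) // public matrix-vector product, returns a new DenseVector
  new DenseVector(Array.tabulate(y.size)(i => alpha * ax(i) + beta * y(i)))
}

val a = Matrices.dense(2, 2, Array(1.0, 0.0, 0.0, 2.0)) // diag(1, 2), column-major
val x = Vectors.dense(3.0, 4.0)
val y = new DenseVector(Array(1.0, 1.0))

// 2 * (3, 8) + 1 * (1, 1) = (7, 17)
val updated = gemvPublic(2.0, a, x, 1.0, y)

// Level 1 dot product through the public Vector API: 3 + 4 = 7
val d = x.dot(Vectors.dense(1.0, 1.0))
```
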
## Types

### Core Vector Types

```scala { .api }
sealed trait Vector extends Serializable {
  def size: Int
  def toArray: Array[Double]
  def apply(i: Int): Double
  def copy: Vector
  def foreachActive(f: (Int, Double) => Unit): Unit
  def numActives: Int
  def numNonzeros: Int
  def toSparse: SparseVector
  def toDense: DenseVector
  def compressed: Vector
  def argmax: Int
  def dot(v: Vector): Double
}

class DenseVector(val values: Array[Double]) extends Vector

class SparseVector(
  override val size: Int,
  val indices: Array[Int],
  val values: Array[Double]
) extends Vector
```

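A sparse vector may store explicit zeros. The sketch below uses a hand-picked example vector to show how `numActives`, `numNonzeros`, `foreachActive`, `argmax`, and `toSparse` treat them.

```scala
import org.apache.spark.ml.linalg.{SparseVector, Vectors}
import scala.collection.mutable.ArrayBuffer

// Size-6 sparse vector with an explicitly stored zero at index 3
val sv = Vectors.sparse(6, Array(1, 3, 4), Array(-2.0, 0.0, 5.0))

// numActives counts stored entries; numNonzeros counts only non-zero ones
val actives = sv.numActives
val nonzeros = sv.numNonzeros

// foreachActive visits stored (index, value) pairs and skips implicit zeros
val visited = ArrayBuffer.empty[(Int, Double)]
sv.foreachActive((i, v) => visited += ((i, v)))

// argmax considers every position; the maximum is 5.0 at index 4
val best = sv.argmax

// toSparse drops the explicitly stored zero
val trimmed: SparseVector = sv.toSparse
```
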
### Core Matrix Types

```scala { .api }
sealed trait Matrix extends Serializable {
  def numRows: Int
  def numCols: Int
  val isTransposed: Boolean
  def toArray: Array[Double]
  def colIter: Iterator[Vector]
  def rowIter: Iterator[Vector]
  def apply(i: Int, j: Int): Double
  def copy: Matrix
  def transpose: Matrix
  def multiply(y: DenseMatrix): DenseMatrix
  def multiply(y: Vector): DenseVector
  def foreachActive(f: (Int, Int, Double) => Unit): Unit
  def numNonzeros: Int
  def numActives: Int
  def toSparse: SparseMatrix
  def toDense: DenseMatrix
  def compressed: Matrix
}

class DenseMatrix(
  val numRows: Int,
  val numCols: Int,
  val values: Array[Double],
  override val isTransposed: Boolean = false
) extends Matrix

class SparseMatrix(
  val numRows: Int,
  val numCols: Int,
  val colPtrs: Array[Int],
  val rowIndices: Array[Int],
  val values: Array[Double],
  override val isTransposed: Boolean = false
) extends Matrix
```
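
The `SparseMatrix` fields follow the compressed sparse column (CSC) convention: the entries of column `j` occupy positions `colPtrs(j)` until `colPtrs(j + 1)` of `rowIndices` and `values`. The sketch below builds a small example matrix by hand and shows that `transpose` flips `isTransposed` instead of rearranging the data. The matrix values are arbitrary.

```scala
import org.apache.spark.ml.linalg.{DenseMatrix, SparseMatrix}

// 3x3 matrix in CSC form:
//   1.0  0.0  4.0
//   0.0  3.0  5.0
//   2.0  0.0  6.0
val csc = new SparseMatrix(
  3, 3,
  Array(0, 2, 3, 6),                   // colPtrs: columns hold 2, 1 and 3 entries
  Array(0, 2, 1, 0, 1, 2),             // rowIndices of each stored value
  Array(1.0, 2.0, 3.0, 4.0, 5.0, 6.0)) // stored values, column by column

val corner = csc(2, 0)

// transpose reuses the same arrays and flips the isTransposed flag
val t = csc.transpose
val flipped = t.isTransposed

// toArray returns the entries in column-major order
val dense: DenseMatrix = csc.toDense
val colMajor = dense.toArray
```
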