# Apache Spark MLlib Local Library

Apache Spark MLlib Local Library provides local implementations of linear algebra data structures and utilities for machine learning without requiring a Spark cluster. It includes core linear algebra components such as Vector and Matrix implementations, BLAS operations, and statistical distributions like MultivariateGaussian.

## Package Information

- **Package Name**: spark-mllib-local_2.12
- **Package Type**: maven
- **Language**: Scala
- **Installation**: `org.apache.spark:spark-mllib-local_2.12:2.4.8`

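As a sketch of how the coordinates above might be declared, assuming the project is built with sbt (the source does not name a build tool) and compiles against Scala 2.12 so that it matches the `_2.12` artifact suffix:

```scala
// build.sbt fragment (assumption: an sbt project on Scala 2.12.x)
// The artifact name is spelled out with its _2.12 suffix, exactly as in the coordinates above.
libraryDependencies += "org.apache.spark" % "spark-mllib-local_2.12" % "2.4.8"
```
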
## Core Imports

```scala
import org.apache.spark.ml.linalg._
```

For statistical distributions:

```scala
import org.apache.spark.ml.stat.distribution.MultivariateGaussian
```

## Basic Usage

```scala
import org.apache.spark.ml.linalg._

// Create vectors
val denseVec = Vectors.dense(1.0, 2.0, 3.0)
val sparseVec = Vectors.sparse(4, Array(0, 2), Array(1.0, 3.0))

// Create matrices (dense values are given in column-major order)
val denseMatrix = Matrices.dense(2, 2, Array(1.0, 3.0, 2.0, 4.0))
val sparseMatrix = Matrices.sparse(2, 2, Array(0, 1, 2), Array(0, 1), Array(1.0, 4.0))

// Vector operations
println(s"Dense vector size: ${denseVec.size}")
println(s"Sparse vector nonzeros: ${sparseVec.numNonzeros}")

// Matrix operations: the vector length must equal the matrix's column count,
// so a 2-element vector is used with the 2x2 matrix
val result = denseMatrix.multiply(Vectors.dense(1.0, 2.0))
println(s"Matrix-vector product: ${result.toArray.mkString(",")}")

// BLAS operations
// Note: BLAS is declared private[spark], so this call compiles only from
// Scala code placed inside the org.apache.spark package
val dotProduct = BLAS.dot(denseVec, denseVec)
val norm = Vectors.norm(denseVec, 2.0)
```

## Architecture

The library is built around a few core abstractions:

- **Vector**: Sealed trait with DenseVector/SparseVector implementations for efficient storage of 1D data
- **Matrix**: Sealed trait with DenseMatrix/SparseMatrix implementations supporting both column-major and row-major layouts
- **BLAS**: Object providing optimized linear algebra routines compatible with netlib-java; it is declared `private[spark]`, so it serves mainly as an internal building block
- **Factory Objects**: The `Vectors` and `Matrices` objects, which provide convenient creation methods
- **Statistical Distributions**: MultivariateGaussian for probabilistic modeling

The design emphasizes performance through integration with the Breeze linear algebra library. Vectors and matrices can be converted between dense and sparse formats with `toDense` and `toSparse`, and the `compressed` method picks whichever representation needs less storage for the given sparsity.

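The dense/sparse conversion described above can be illustrated with a minimal sketch. The vector contents and the names `mostlyZero`, `asSparse`, `backToDense`, and `compact` are made up for illustration; the calls are `toSparse`, `toDense`, and `compressed` on `Vector`.

```scala
import org.apache.spark.ml.linalg.{DenseVector, SparseVector, Vector, Vectors}

// A mostly-zero vector stored densely: 8 entries, 2 of them nonzero.
val mostlyZero: Vector = Vectors.dense(0.0, 0.0, 5.0, 0.0, 0.0, 0.0, 7.0, 0.0)

// Explicit conversions keep the values and change only the storage format.
val asSparse: SparseVector = mostlyZero.toSparse
val backToDense: DenseVector = asSparse.toDense

// `compressed` returns whichever representation needs less storage.
val compact: Vector = mostlyZero.compressed

println(s"Nonzero positions: ${asSparse.indices.mkString(",")}")
println(s"Compressed representation: ${compact.getClass.getSimpleName}")
```

With only two nonzeros out of eight entries, the sparse form is expected to be the smaller one here; for a vector with few zeros, `compressed` would keep the dense form instead.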
## Capabilities

### Vector Operations

Core vector data structures and operations including dense and sparse vectors, with comprehensive linear algebra functionality, norms, distances, and format conversions.

```scala { .api }
sealed trait Vector {
  def size: Int
  def toArray: Array[Double]
  def apply(i: Int): Double
  def copy: Vector
  def numNonzeros: Int
  def toSparse: SparseVector
  def toDense: DenseVector
}

object Vectors {
  def dense(values: Array[Double]): Vector
  def sparse(size: Int, indices: Array[Int], values: Array[Double]): Vector
  def zeros(size: Int): Vector
  def norm(vector: Vector, p: Double): Double
  def sqdist(v1: Vector, v2: Vector): Double
}
```

[Vector Operations](./vectors.md)
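A short sketch of the norm and distance helpers listed above, using made-up values; the names `a`, `b`, `l1`, `l2`, and `squaredDistance` are illustrative only.

```scala
import org.apache.spark.ml.linalg.{Vector, Vectors}

// One dense and one sparse vector of the same size.
val a: Vector = Vectors.dense(3.0, 0.0, 4.0)
val b: Vector = Vectors.sparse(3, Array(0, 2), Array(3.0, 1.0))

// p-norms: sum of absolute values (p = 1) and Euclidean length (p = 2).
val l1 = Vectors.norm(a, 1.0)
val l2 = Vectors.norm(a, 2.0)

// Squared Euclidean distance; dense and sparse arguments can be mixed.
val squaredDistance = Vectors.sqdist(a, b)

println(s"L1 = $l1, L2 = $l2, squared distance = $squaredDistance")
```

Since `a - b` is `(0, 0, 3)`, the squared distance works out to 9, while the two norms of `a` are 7 and 5.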

### Matrix Operations

Matrix data structures and operations including dense and sparse matrices with support for various layouts, linear algebra operations, and efficient storage format conversions.

```scala { .api }
sealed trait Matrix {
  def numRows: Int
  def numCols: Int
  def apply(i: Int, j: Int): Double
  def transpose: Matrix
  def multiply(y: DenseMatrix): DenseMatrix
  def multiply(y: Vector): DenseVector
  def toSparse: SparseMatrix
  def toDense: DenseMatrix
}

object Matrices {
  def dense(numRows: Int, numCols: Int, values: Array[Double]): Matrix
  def sparse(numRows: Int, numCols: Int, colPtrs: Array[Int], rowIndices: Array[Int], values: Array[Double]): Matrix
  def zeros(numRows: Int, numCols: Int): Matrix
  def eye(n: Int): Matrix
}
```

[Matrix Operations](./matrices.md)
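The following sketch exercises a few of the signatures above on a small made-up matrix. It assumes `DenseMatrix.eye` from the same package to obtain an identity matrix typed as `DenseMatrix`, because `Matrices.eye` is declared to return the more general `Matrix`.

```scala
import org.apache.spark.ml.linalg.{DenseMatrix, Matrices, Matrix, Vectors}

// Values are column-major: columns (1, 3) and (2, 4) give the rows [1 2] and [3 4].
val m: Matrix = Matrices.dense(2, 2, Array(1.0, 3.0, 2.0, 4.0))

// Element access is (row, column); transposing swaps the two indices.
val topRight = m(0, 1)
val topRightOfTranspose = m.transpose(0, 1)

// Matrix-vector product: the vector length must match numCols.
val rowSums = m.multiply(Vectors.dense(1.0, 1.0))

// Matrix-matrix product with the identity leaves the values unchanged.
val sameValues = m.multiply(DenseMatrix.eye(2))

println(s"m(0,1) = $topRight, transpose(0,1) = $topRightOfTranspose")
println(s"Row sums: ${rowSums.toArray.mkString(",")}")
```

Multiplying by a vector of ones is a quick way to check the layout: the result holds the row sums, which are 3 and 7 for this matrix.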

### BLAS Operations

High-performance Basic Linear Algebra Subprograms (BLAS) routines for vectors and matrices, providing optimized implementations of common linear algebra operations. In Spark 2.4 the `BLAS` object is declared `private[spark]`, so Scala code outside the `org.apache.spark` package cannot call it directly.

```scala { .api }
private[spark] object BLAS {
  def axpy(a: Double, x: Vector, y: Vector): Unit
  def dot(x: Vector, y: Vector): Double
  def copy(x: Vector, y: Vector): Unit
  def scal(a: Double, x: Vector): Unit
  def gemv(alpha: Double, A: Matrix, x: Vector, beta: Double, y: DenseVector): Unit
  def gemm(alpha: Double, A: Matrix, B: DenseMatrix, beta: Double, C: DenseMatrix): Unit
}
```

[BLAS Operations](./blas.md)
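The conventional BLAS meaning of three of these routines can be sketched in plain Scala over arrays. The helpers `axpyRef`, `dotRef`, and `scalRef` are hypothetical names introduced here to show the arithmetic; they are not the library's implementation and do not handle sparse inputs.

```scala
// Reference semantics only: hypothetical helpers, not Spark's BLAS object.

// axpy: y := a * x + y (updates y in place)
def axpyRef(a: Double, x: Array[Double], y: Array[Double]): Unit = {
  require(x.length == y.length, "x and y must have the same length")
  var i = 0
  while (i < x.length) {
    y(i) += a * x(i)
    i += 1
  }
}

// dot: sum over i of x(i) * y(i)
def dotRef(x: Array[Double], y: Array[Double]): Double = {
  require(x.length == y.length, "x and y must have the same length")
  var sum = 0.0
  var i = 0
  while (i < x.length) {
    sum += x(i) * y(i)
    i += 1
  }
  sum
}

// scal: x := a * x (updates x in place)
def scalRef(a: Double, x: Array[Double]): Unit = {
  var i = 0
  while (i < x.length) {
    x(i) *= a
    i += 1
  }
}

val x = Array(1.0, 2.0, 3.0)
val y = Array(10.0, 20.0, 30.0)

axpyRef(2.0, x, y)           // y becomes (12, 24, 36)
val xDotX = dotRef(x, x)     // 1 + 4 + 9 = 14
val scaled = x.clone()
scalRef(0.5, scaled)         // scaled becomes (0.5, 1.0, 1.5); x is untouched
```

In the same convention, `gemv` computes `y := alpha * A * x + beta * y` and `gemm` computes `C := alpha * A * B + beta * C`, which is why both take a mutable dense output argument and return `Unit`.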

### Statistical Distributions

Multivariate statistical distributions for probabilistic modeling and machine learning applications, with support for probability density calculations.

```scala { .api }
class MultivariateGaussian(mean: Vector, cov: Matrix) {
  def pdf(x: Vector): Double
  def logpdf(x: Vector): Double
}
```

[Statistical Distributions](./distributions.md)
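As a minimal sketch of the density calls above, the example below builds a standard two-dimensional Gaussian (zero mean, identity covariance). The parameter values and the name `expectedAtMean` are chosen purely for illustration.

```scala
import org.apache.spark.ml.linalg.{Matrices, Vectors}
import org.apache.spark.ml.stat.distribution.MultivariateGaussian

// Standard 2D Gaussian: zero mean and identity covariance.
val mean = Vectors.dense(0.0, 0.0)
val cov = Matrices.eye(2)
val gaussian = new MultivariateGaussian(mean, cov)

// Density and log-density evaluated at the mean.
val densityAtMean = gaussian.pdf(mean)
val logDensityAtMean = gaussian.logpdf(mean)

// For this distribution the closed-form density at the mean is 1 / (2 * pi).
val expectedAtMean = 1.0 / (2.0 * math.Pi)

println(s"pdf(mean) = $densityAtMean, expected about $expectedAtMean")
println(s"logpdf(mean) = $logDensityAtMean")
```

For points far from the mean, `logpdf` is usually the safer call, since `pdf` values can underflow to zero in higher dimensions.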