Spark Project ML Local Library provides local linear algebra operations for machine learning without requiring a distributed Spark context
npx @tessl/cli install tessl/maven-org-apache-spark--spark-mllib-local-2-13@3.5.00
# Spark MLlib Local

Spark MLlib Local provides local linear algebra operations for machine learning without requiring a distributed Spark context. It includes dense and sparse vector and matrix data structures, statistical distributions, and utility functions.

## Package Information

- **Package Name**: spark-mllib-local_2.13
- **Package Type**: maven
- **Language**: Scala
- **Installation**: Add to `pom.xml`: `<groupId>org.apache.spark</groupId><artifactId>spark-mllib-local_2.13</artifactId><version>3.5.6</version>`
- **Gradle**: `implementation 'org.apache.spark:spark-mllib-local_2.13:3.5.6'`

## Core Imports

```scala
import org.apache.spark.ml.linalg.{Vector, DenseVector, SparseVector, Vectors}
import org.apache.spark.ml.linalg.{Matrix, DenseMatrix, SparseMatrix, Matrices}
import org.apache.spark.ml.stat.distribution.MultivariateGaussian
```

## Basic Usage

```scala
import org.apache.spark.ml.linalg.{Vectors, Matrices}

// Create vectors
val denseVec = Vectors.dense(1.0, 2.0, 3.0)
// Sparse vector of size 3 with stored entries at indices 0 and 2
val sparseVec = Vectors.sparse(3, Array(0, 2), Array(1.0, 3.0))

// Create matrices
val denseMatrix = Matrices.dense(2, 3, Array(1.0, 2.0, 3.0, 4.0, 5.0, 6.0))
val sparseMatrix = Matrices.sparse(3, 3, Array(0, 1, 2, 3), Array(0, 1, 2), Array(1.0, 2.0, 3.0))

// Basic operations
val dotProduct = denseVec.dot(sparseVec) // both operands must have the same size
val matVecProduct = denseMatrix.multiply(denseVec) // 2x3 matrix times a size-3 vector
val norm = Vectors.norm(denseVec, 2.0) // L2 norm
```

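
The factory calls above take flat arrays, so it helps to see how those arrays map to matrix entries. The following is a minimal sketch, assuming the library's documented storage conventions (column-major order for dense values, compressed sparse column layout for sparse ones); the object name `MatrixLayoutSketch` is made up for illustration.

```scala
import org.apache.spark.ml.linalg.Matrices

// Hypothetical wrapper object, used only to keep the example self-contained.
object MatrixLayoutSketch {
  // Dense values are read column by column: the columns are (1, 2), (3, 4), (5, 6),
  // so the rows of this 2 x 3 matrix are (1, 3, 5) and (2, 4, 6).
  val dense = Matrices.dense(2, 3, Array(1.0, 2.0, 3.0, 4.0, 5.0, 6.0))

  // CSC layout: column j owns the stored entries at positions
  // colPtrs(j) until colPtrs(j + 1). Here every column holds one entry,
  // and rowIndices places it on the diagonal.
  val sparse = Matrices.sparse(3, 3, Array(0, 1, 2, 3), Array(0, 1, 2), Array(1.0, 2.0, 3.0))

  def main(args: Array[String]): Unit = {
    println(dense(0, 1))  // entry at row 0, column 1
    println(sparse(1, 1)) // a stored diagonal entry
    println(sparse(0, 1)) // an entry that is not stored
  }
}
```
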
## Architecture

Spark MLlib Local is built around several key components:

- **Vector API**: A common `Vector` trait over dense and sparse implementations; `compressed` returns whichever representation uses less storage
- **Matrix API**: A common `Matrix` trait over dense (column-major) and sparse (compressed sparse column) implementations
- **Statistical Distributions**: Multivariate probability distributions for machine learning algorithms, currently exposed through `MultivariateGaussian`
- **Type Safety**: `Vector` and `Matrix` are sealed traits, so the set of implementations is closed and pattern matches over them can be checked by the compiler

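
The format selection mentioned for the Vector API can be observed directly. This is a small sketch rather than a description of the exact selection rule: it only assumes that `compressed` favours the representation with the smaller storage footprint, and the object name `CompressedSketch` is hypothetical.

```scala
import org.apache.spark.ml.linalg.{DenseVector, SparseVector, Vectors}

// Hypothetical wrapper object, used only to keep the example self-contained.
object CompressedSketch {
  // 100 entries, only one of them non-zero, but stored densely
  val mostlyZero = Vectors.dense(Array.tabulate(100)(i => if (i == 7) 1.0 else 0.0))

  // Three entries, all non-zero
  val full = Vectors.dense(1.0, 2.0, 3.0)

  // compressed returns a Vector; the concrete class depends on the contents
  val compactMostlyZero = mostlyZero.compressed
  val compactFull = full.compressed

  def main(args: Array[String]): Unit = {
    println(compactMostlyZero.getClass.getSimpleName)
    println(compactFull.getClass.getSimpleName)
  }
}
```
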
## Capabilities
### Vector Operations

Dense and sparse vectors share a common `Vector` trait, with conversions between the two representations (`toDense`, `toSparse`) and `compressed`, which picks the representation that uses less storage.

```scala { .api }
// Vector trait and factory methods
sealed trait Vector extends Serializable {
  def size: Int
  def apply(i: Int): Double
  def toArray: Array[Double]
  def dot(v: Vector): Double
}

object Vectors {
  def dense(firstValue: Double, otherValues: Double*): Vector
  def dense(values: Array[Double]): Vector
  def sparse(size: Int, indices: Array[Int], values: Array[Double]): Vector
  def zeros(size: Int): Vector
  def norm(vector: Vector, p: Double): Double
}
```

[Vector Operations](./vectors.md)

### Matrix Operations

Dense and sparse matrices share a common `Matrix` trait, with matrix-vector multiplication, transposition, and conversion between formats.

```scala { .api }
// Matrix trait and factory methods
sealed trait Matrix extends Serializable {
  def numRows: Int
  def numCols: Int
  def apply(i: Int, j: Int): Double
  def multiply(y: Vector): DenseVector
  def transpose: Matrix
}

object Matrices {
  // values are given in column-major order
  def dense(numRows: Int, numCols: Int, values: Array[Double]): Matrix
  // compressed sparse column (CSC) format
  def sparse(numRows: Int, numCols: Int, colPtrs: Array[Int], rowIndices: Array[Int], values: Array[Double]): Matrix
  def zeros(numRows: Int, numCols: Int): Matrix
}
```

[Matrix Operations](./matrices.md)

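
To make the shapes involved in multiplication and transposition concrete, here is a short sketch on a 2 x 3 matrix. It additionally uses `toDense` and the `multiply(y: DenseMatrix)` overload from the Types section; the object name `MatrixOpsSketch` is hypothetical.

```scala
import org.apache.spark.ml.linalg.{Matrices, Vectors}

// Hypothetical wrapper object, used only to keep the example self-contained.
object MatrixOpsSketch {
  // 2 x 3 matrix in column-major order: rows are (1, 3, 5) and (2, 4, 6)
  val a = Matrices.dense(2, 3, Array(1.0, 2.0, 3.0, 4.0, 5.0, 6.0))
  val ones = Vectors.dense(1.0, 1.0, 1.0)

  // Multiplying by a vector of ones yields the row sums
  val rowSums = a.multiply(ones)

  // Transposing swaps the dimensions: the result is 3 x 2
  val at = a.transpose

  // Matrix-matrix product A * A^T, a 2 x 2 matrix of row inner products
  val gram = a.multiply(at.toDense)

  def main(args: Array[String]): Unit = {
    println(rowSums)
    println(s"${at.numRows} x ${at.numCols}")
    println(gram)
  }
}
```
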
### Statistical Distributions

A multivariate Gaussian (normal) distribution for machine learning applications, with evaluation of the probability density (`pdf`) and its logarithm (`logpdf`).

```scala { .api }
// Multivariate Gaussian distribution
class MultivariateGaussian(val mean: Vector, val cov: Matrix) extends Serializable {
  def pdf(x: Vector): Double
  def logpdf(x: Vector): Double
}
```

[Statistical Distributions](./distributions.md)

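
A quick sanity check for the density functions is the two-dimensional standard normal, whose density at the mean is known in closed form: 1 / (2π). The sketch below evaluates `pdf` and `logpdf` at that point; the object name `GaussianSketch` is hypothetical, and the comments give the analytic values rather than printed output.

```scala
import org.apache.spark.ml.linalg.{Matrices, Vectors}
import org.apache.spark.ml.stat.distribution.MultivariateGaussian

// Hypothetical wrapper object, used only to keep the example self-contained.
object GaussianSketch {
  // Zero mean and identity covariance (2 x 2, column-major)
  val standardNormal = new MultivariateGaussian(
    Vectors.dense(0.0, 0.0),
    Matrices.dense(2, 2, Array(1.0, 0.0, 0.0, 1.0)))

  val origin = Vectors.dense(0.0, 0.0)

  val densityAtMean = standardNormal.pdf(origin)       // analytically 1 / (2 * pi)
  val logDensityAtMean = standardNormal.logpdf(origin) // analytically -log(2 * pi)

  def main(args: Array[String]): Unit = {
    println(s"pdf=$densityAtMean logpdf=$logDensityAtMean")
  }
}
```
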
## Types

```scala { .api }
// Core vector types
sealed trait Vector extends Serializable {
  def size: Int
  def toArray: Array[Double]
  def apply(i: Int): Double
  def copy: Vector
  def foreachActive(f: (Int, Double) => Unit): Unit
  def numActives: Int   // number of stored entries, including explicit zeros
  def numNonzeros: Int  // number of entries whose value is non-zero
  def toSparse: SparseVector
  def toDense: DenseVector
  def compressed: Vector
  def argmax: Int
  def dot(v: Vector): Double
}

class DenseVector(val values: Array[Double]) extends Vector
class SparseVector(override val size: Int, val indices: Array[Int], val values: Array[Double]) extends Vector

// Core matrix types
sealed trait Matrix extends Serializable {
  def numRows: Int
  def numCols: Int
  def apply(i: Int, j: Int): Double
  def copy: Matrix
  def transpose: Matrix
  def multiply(y: DenseMatrix): DenseMatrix
  def multiply(y: Vector): DenseVector
  def foreachActive(f: (Int, Int, Double) => Unit): Unit
  def numNonzeros: Int
  def numActives: Int
  def toSparse: SparseMatrix
  def toDense: DenseMatrix
}

class DenseMatrix(val numRows: Int, val numCols: Int, val values: Array[Double], override val isTransposed: Boolean) extends Matrix
class SparseMatrix(val numRows: Int, val numCols: Int, val colPtrs: Array[Int], val rowIndices: Array[Int], val values: Array[Double], override val isTransposed: Boolean) extends Matrix
```

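
The types above can be handled with an ordinary pattern match over the two vector implementations. The sketch below also contrasts `numActives` with `numNonzeros` using a sparse vector that stores an explicit zero; the helper `describe` and the object name `VectorTypesSketch` are hypothetical, and the expectation that `toSparse` drops explicitly stored zeros is an assumption about the library's behaviour.

```scala
import org.apache.spark.ml.linalg.{DenseVector, SparseVector, Vector, Vectors}

// Hypothetical wrapper object, used only to keep the example self-contained.
object VectorTypesSketch {
  // Hypothetical helper: summarises a vector by its concrete representation
  def describe(v: Vector): String = v match {
    case d: DenseVector  => s"dense(${d.size})"
    case s: SparseVector => s"sparse(${s.size}, stored=${s.numActives})"
  }

  // Size 4, with a stored 1.0 at index 0 and an explicitly stored 0.0 at index 2
  val sv = Vectors.sparse(4, Array(0, 2), Array(1.0, 0.0))

  def main(args: Array[String]): Unit = {
    println(describe(sv))
    println(describe(sv.toDense))
    println(s"actives=${sv.numActives} nonzeros=${sv.numNonzeros}")
  }
}
```
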