Local linear algebra operations and utilities for Apache Spark's MLlib machine learning library
npx @tessl/cli install tessl/maven-org-apache-spark--spark-mllib-local_2-12@3.5.00
# Spark MLlib Local

Spark MLlib Local provides local linear algebra operations and utilities for Apache Spark's machine learning library. It implements the core data structures, including the Vector and Matrix types, along with optimized BLAS (Basic Linear Algebra Subprograms) operations. These serve as the numerical building blocks for distributed machine learning applications.

## Package Information

- **Package Name**: spark-mllib-local_2.12
- **Package Type**: maven
- **Language**: Scala
- **Installation**: Add to your Maven/SBT dependencies:

Maven:
```xml
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-mllib-local_2.12</artifactId>
  <version>3.5.6</version>
</dependency>
```

SBT:
```scala
libraryDependencies += "org.apache.spark" %% "spark-mllib-local" % "3.5.6"
```

## Core Imports

```scala
import org.apache.spark.ml.linalg._
import org.apache.spark.ml.stat.distribution.MultivariateGaussian
// Utils is declared private[spark]: usable only from code inside the org.apache.spark package
import org.apache.spark.ml.impl.Utils
```

## Basic Usage

```scala
import org.apache.spark.ml.linalg._
import org.apache.spark.ml.stat.distribution.MultivariateGaussian

// Create vectors
val denseVec = Vectors.dense(1.0, 2.0, 3.0)
val sparseVec = Vectors.sparse(5, Array(0, 2, 4), Array(1.0, 3.0, 5.0))

// Create matrices
val denseMatrix = DenseMatrix.zeros(3, 3)
val sparseMatrix = SparseMatrix.speye(3) // Identity matrix

// Linear algebra operations
val dotProduct = denseVec.dot(denseVec) // 1.0 + 4.0 + 9.0 = 14.0

// The BLAS object is declared private[spark], so direct calls like the
// following compile only from code inside the org.apache.spark package.
// copy yields an independent target vector for the in-place update.
// val target = denseVec.copy
// BLAS.axpy(2.0, denseVec, target) // target += 2.0 * denseVec

// Statistical distributions
val mean = Vectors.dense(0.0, 0.0)
val cov = DenseMatrix.eye(2)
val mvn = new MultivariateGaussian(mean, cov)
val density = mvn.pdf(Vectors.dense(1.0, 1.0))
```

## Architecture

Spark MLlib Local is built around several key components:

- **Vector Types**: Dense and sparse vector implementations with a unified API
- **Matrix Types**: Dense and sparse matrix implementations with lazy transposition
- **BLAS Operations**: Optimized linear algebra routines with native acceleration
- **Statistical Distributions**: Multivariate probability distributions with numerical stability
- **Type Safety**: Sealed trait hierarchies, so the compiler can check pattern matches for exhaustiveness
- **Native Integration**: Automatic fallback from native BLAS to pure JVM implementations

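
As a rough illustration of the sealed hierarchy listed above, the sketch below pattern matches on a `Vector`. Since `DenseVector` and `SparseVector` are its only subtypes, the compiler can check the match for exhaustiveness. The helper name `describe` is hypothetical and exists only for this example.

```scala
import org.apache.spark.ml.linalg.{DenseVector, SparseVector, Vector, Vectors}

// Hypothetical helper (not part of the library): reports how a vector is stored.
def describe(v: Vector): String = v match {
  case dv: DenseVector  => s"dense with ${dv.values.length} stored values"
  case sv: SparseVector => s"sparse with ${sv.indices.length} stored values out of ${sv.size}"
}

val denseDescription  = describe(Vectors.dense(1.0, 0.0, 3.0))
val sparseDescription = describe(Vectors.sparse(5, Array(0, 2), Array(1.0, 3.0)))
```
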
## Capabilities

### Vector Operations

Core vector data structures and operations for numerical computing. Both dense and sparse representations are supported, and `compressed` returns whichever of the two uses less storage.

```scala { .api }
sealed trait Vector extends Serializable {
  def size: Int
  def toArray: Array[Double]
  def apply(i: Int): Double
  def copy: Vector
  def dot(v: Vector): Double
  def numNonzeros: Int
  def compressed: Vector
}

object Vectors {
  def dense(values: Array[Double]): Vector
  def sparse(size: Int, indices: Array[Int], values: Array[Double]): Vector
  def zeros(size: Int): Vector
  def norm(vector: Vector, p: Double): Double
  def sqdist(v1: Vector, v2: Vector): Double
}
```

[Vector Operations](./vectors.md)

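
The following is a minimal usage sketch of the vector API listed above, assuming the `spark-mllib-local` dependency is on the classpath. It builds the same logical vector in both representations and applies the listed operations; all value names are illustrative.

```scala
import org.apache.spark.ml.linalg.{SparseVector, Vector, Vectors}

// The same logical vector (1.0, 0.0, 3.0) in both representations.
val dense: Vector  = Vectors.dense(1.0, 0.0, 3.0)
val sparse: Vector = Vectors.sparse(3, Array(0, 2), Array(1.0, 3.0))

val dot      = dense.dot(sparse)             // 1*1 + 0*0 + 3*3
val l2Norm   = Vectors.norm(dense, 2.0)      // sqrt(1 + 9)
val distance = Vectors.sqdist(dense, sparse) // squared Euclidean distance
val nonzeros = dense.numNonzeros             // counts entries that are not 0.0

// A mostly-zero vector of size 100: compressed is expected to pick the sparse form.
val mostlyZero = Vectors.dense(Array.fill(100)(0.0).updated(7, 1.0))
val isSparse   = mostlyZero.compressed.isInstanceOf[SparseVector]
```
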
### Matrix Operations

Matrix data structures and operations for linear algebra computations. Provides both dense and sparse implementations with format conversion capabilities.

```scala { .api }
sealed trait Matrix extends Serializable {
  def numRows: Int
  def numCols: Int
  def apply(i: Int, j: Int): Double
  def transpose: Matrix
  def multiply(y: Vector): DenseVector
  def multiply(y: DenseMatrix): DenseMatrix
  def compressed: Matrix
}

object Matrices {
  def dense(numRows: Int, numCols: Int, values: Array[Double]): Matrix
  def sparse(numRows: Int, numCols: Int, colPtrs: Array[Int], rowIndices: Array[Int], values: Array[Double]): Matrix
  def zeros(numRows: Int, numCols: Int): Matrix
  def eye(n: Int): Matrix
}
```

[Matrix Operations](./matrices.md)

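
As a short sketch of the matrix factories above, the example below shows the column-major layout expected by `Matrices.dense` and the compressed sparse column (CSC) arrays expected by `Matrices.sparse`. It assumes the `spark-mllib-local` dependency is available; the value names are illustrative.

```scala
import org.apache.spark.ml.linalg.{Matrices, Matrix}

// Values are given in column-major order, so this is the 2 x 3 matrix
//   1.0 2.0 3.0
//   4.0 5.0 6.0
val m: Matrix = Matrices.dense(2, 3, Array(1.0, 4.0, 2.0, 5.0, 3.0, 6.0))

val bottomRight = m(1, 2)     // row 1, column 2 (zero-based)
val t           = m.transpose // 3 x 2 matrix with rows and columns swapped

// 3 x 3 identity in CSC form: colPtrs has numCols + 1 entries, and
// column j owns the stored entries colPtrs(j) until colPtrs(j + 1).
val identity: Matrix =
  Matrices.sparse(3, 3, Array(0, 1, 2, 3), Array(0, 1, 2), Array(1.0, 1.0, 1.0))

// Format conversion: expand the sparse matrix into a dense one.
val identityDense = identity.toDense
```
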
### Linear Algebra Operations

Optimized linear algebra operations are accessible through the Vector and Matrix APIs. The underlying BLAS implementation uses native acceleration when it is available.

```scala { .api }
// Vector operations (accessing optimized BLAS internally)
val dotProduct = vector1.dot(vector2)

// Matrix operations (accessing optimized BLAS internally)
val result = matrix.multiply(vector)
val product = matrix.multiply(otherMatrix)
```

[Linear Algebra Operations](./blas.md)

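
To make the calls above concrete, here is a small worked sketch with explicit values, assuming the `spark-mllib-local` dependency is on the classpath. The names `a`, `b`, `x`, and `y` are placeholders chosen for this example.

```scala
import org.apache.spark.ml.linalg.{DenseMatrix, Matrices, Vectors}

val x = Vectors.dense(1.0, 2.0, 3.0)
val y = Vectors.dense(4.0, 5.0, 6.0)
val xDotY = x.dot(y) // 4 + 10 + 18

// 2 x 3 matrix in column-major order:
//   1.0 2.0 3.0
//   4.0 5.0 6.0
val a = Matrices.dense(2, 3, Array(1.0, 4.0, 2.0, 5.0, 3.0, 6.0))

// 3 x 2 matrix that selects the first two columns of a.
val b = new DenseMatrix(3, 2, Array(1.0, 0.0, 0.0, 0.0, 1.0, 0.0))

val ax = a.multiply(x) // matrix-vector product, a DenseVector of size 2
val ab = a.multiply(b) // matrix-matrix product, a 2 x 2 DenseMatrix
```

Note that the matrix-matrix overload takes a `DenseMatrix` argument, which is why `b` is constructed directly rather than through `Matrices.dense`.
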
### Statistical Distributions

Multivariate probability distributions with numerical stability and support for singular covariance matrices.

```scala { .api }
class MultivariateGaussian(mean: Vector, cov: Matrix) extends Serializable {
  def pdf(x: Vector): Double
  def logpdf(x: Vector): Double
}
```

[Statistical Distributions](./distributions.md)

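
The sketch below evaluates a standard bivariate normal, for which the density at the mean is analytically 1 / (2π). It assumes the `spark-mllib-local` dependency is available and is meant only to show how `pdf` and `logpdf` relate.

```scala
import org.apache.spark.ml.linalg.{Matrices, Vectors}
import org.apache.spark.ml.stat.distribution.MultivariateGaussian

// Standard bivariate normal: zero mean, identity covariance.
val gaussian = new MultivariateGaussian(Vectors.dense(0.0, 0.0), Matrices.eye(2))

val origin     = Vectors.dense(0.0, 0.0)
val density    = gaussian.pdf(origin)    // analytically 1 / (2 * pi) at the mean
val logDensity = gaussian.logpdf(origin) // analytically -log(2 * pi)

// Far from the mean the density is too small for a Double,
// while the log-density remains a finite number.
val farLogDensity = gaussian.logpdf(Vectors.dense(50.0, 50.0))
```

Working with `logpdf` is usually the safer choice when densities from many points are multiplied together, since sums of log-densities do not underflow the way products of tiny densities do.
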
### Utility Functions

Numerical utility functions and mathematical helpers for robust computations with numerical stability considerations.

```scala { .api }
object Utils {
  lazy val EPSILON: Double
  def unpackUpperTriangular(n: Int, triangularValues: Array[Double]): Array[Double]
  def indexUpperTriangular(n: Int, i: Int, j: Int): Int
  def log1pExp(x: Double): Double
  def softmax(array: Array[Double]): Unit
}
```

[Utility Functions](./utils.md)

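
To illustrate the kind of numerical-stability concern these helpers address, here is a standalone sketch in plain Scala. It is not the library's code: `log1pExpSketch` and `softmaxSketch` are hypothetical names, and the bodies show one common way to compute these quantities without overflow.

```scala
// Standalone sketch (not the library implementation): stable log(1 + exp(x)).
def log1pExpSketch(x: Double): Double =
  if (x > 0) x + math.log1p(math.exp(-x)) // exp(-x) cannot overflow for x > 0
  else math.log1p(math.exp(x))

// Standalone sketch: in-place softmax that subtracts the maximum before
// exponentiating, so the largest exponent evaluated is exp(0).
def softmaxSketch(values: Array[Double]): Unit = {
  require(values.nonEmpty, "values must not be empty")
  val max = values.max
  var sum = 0.0
  var i = 0
  while (i < values.length) {
    values(i) = math.exp(values(i) - max)
    sum += values(i)
    i += 1
  }
  i = 0
  while (i < values.length) {
    values(i) /= sum
    i += 1
  }
}

// The naive form math.log(1 + math.exp(1000.0)) evaluates to Infinity.
val large = log1pExpSketch(1000.0)

val scores = Array(1000.0, 1000.0)
softmaxSketch(scores) // scores now holds probabilities that sum to 1
```
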
## Error Handling

The library signals errors with standard JVM exceptions:

- `IllegalArgumentException`: Invalid parameters or dimension mismatches
- `UnsupportedOperationException`: Operations not supported for specific vector/matrix types
- `IndexOutOfBoundsException`: Invalid indices
- `NoSuchElementException`: Attempting to update zero elements in sparse matrices

Operations validate input dimensions and throw descriptive exceptions for invalid operations.

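
A minimal sketch of how these failures surface in practice, assuming the `spark-mllib-local` dependency is on the classpath. Wrapping calls in `scala.util.Try` is just one way to capture the exceptions; the value names are illustrative.

```scala
import scala.util.Try
import org.apache.spark.ml.linalg.{Matrices, Vectors}

val twoDims   = Vectors.dense(1.0, 2.0)
val threeDims = Vectors.dense(1.0, 2.0, 3.0)

// Mismatched vector sizes: expected to fail with IllegalArgumentException.
val dotAttempt = Try(twoDims.dot(threeDims))

// A 2 x 2 matrix cannot multiply a vector of size 3.
val multiplyAttempt = Try(Matrices.eye(2).multiply(threeDims))

// Reading an index beyond the declared size of a sparse vector.
val sparseVec    = Vectors.sparse(3, Array(0), Array(1.0))
val indexAttempt = Try(sparseVec(5))

// Each failed Try carries the exception and its descriptive message.
val messages = Seq(dotAttempt, multiplyAttempt, indexAttempt)
  .flatMap(_.failed.toOption)
  .map(e => s"${e.getClass.getSimpleName}: ${e.getMessage}")
```
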