Spark ML Local Library: linear algebra and statistical utilities for local machine learning operations that do not require a distributed Spark cluster.
npx @tessl/cli install tessl/maven-org-apache-spark--spark-mllib-local-2-11@2.2.00
# Spark MLlib Local Library

Spark MLlib Local is a lightweight package with minimal dependencies. It provides fundamental linear algebra operations and basic statistical utilities for machine learning tasks that run locally, without a distributed Spark cluster.

## Package Information

- **Package Name**: spark-mllib-local_2.11
- **Package Type**: maven
- **Language**: Scala 2.11
- **Group ID**: org.apache.spark
- **Artifact ID**: spark-mllib-local_2.11
- **Installation**:
```xml
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-mllib-local_2.11</artifactId>
  <version>2.2.3</version>
</dependency>
```

For SBT:
```scala
libraryDependencies += "org.apache.spark" %% "spark-mllib-local" % "2.2.3"
```

## Core Imports

```scala
import org.apache.spark.ml.linalg.{Vector, DenseVector, SparseVector, Vectors}
import org.apache.spark.ml.linalg.{Matrix, DenseMatrix, SparseMatrix, Matrices}
import org.apache.spark.ml.stat.distribution.MultivariateGaussian
```

## Basic Usage

```scala
import org.apache.spark.ml.linalg.{Vectors, Matrices}
import org.apache.spark.ml.stat.distribution.MultivariateGaussian

// Create vectors (both of size 3 so they can be combined below)
val dense = Vectors.dense(1.0, 2.0, 3.0)
val sparse = Vectors.sparse(3, Array(0, 2), Array(1.0, 3.0))

// Create matrices (dense values are given in column-major order)
val denseMatrix = Matrices.dense(2, 3, Array(1.0, 2.0, 3.0, 4.0, 5.0, 6.0))
val sparseMatrix = Matrices.sparse(3, 3, Array(0, 2, 3, 6), Array(0, 2, 1, 0, 1, 2), Array(1.0, 2.0, 3.0, 4.0, 5.0, 6.0))

// Vector operations
val vectorCopy = dense.copy
val dotProduct = sparse.toArray.zip(dense.toArray).map { case (a, b) => a * b }.sum
val l2Norm = Vectors.norm(dense, 2.0)
// Squared Euclidean distance; both vectors must have the same size
val distance = Vectors.sqdist(dense, sparse)

// Matrix operations
// multiply takes a DenseMatrix, so the sparse operand is converted first
val matrixProduct = denseMatrix.multiply(sparseMatrix.transpose.toDense)
val transposed = denseMatrix.transpose
val compressed = sparseMatrix.compressed

// Statistical distribution
val mean = Vectors.dense(0.0, 0.0)
val cov = Matrices.eye(2)
val gaussian = new MultivariateGaussian(mean, cov)
val density = gaussian.pdf(Vectors.dense(1.0, 1.0))
```

## Architecture

Spark MLlib Local is organized around several key components:

- **Vector System**: Dense and sparse vector implementations with unified operations
- **Matrix System**: Dense and sparse matrix implementations supporting various storage formats
- **Statistical Distributions**: Probability distributions for machine learning algorithms
- **Testing Utilities**: Tolerance-based comparison utilities for numerical testing

The library emphasizes performance through:

- **Optimized Storage**: `compressed` selects between the dense and sparse representation based on the number of nonzero entries
- **On-Demand Conversion**: Operations like compression run only when called, which avoids unnecessary computation
- **Memory Efficiency**: Compact storage formats for both dense and sparse data structures
- **Native Integration**: Internal integration with optimized mathematical libraries

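
The storage selection described above can be observed directly. The following is a minimal sketch, assuming the library is on the classpath of a Scala REPL; the variable names are illustrative, and the sketch does not rely on the exact threshold at which the library switches representation.

```scala
import org.apache.spark.ml.linalg.{DenseVector, SparseVector, Vectors}

// A dense vector of size 100 with a single nonzero entry
val mostlyZero = Vectors.dense(Array.tabulate(100)(i => if (i == 7) 1.0 else 0.0))

// A small dense vector with no zero entries
val full = Vectors.dense(1.0, 2.0, 3.0)

// compressed returns the representation that needs less storage
val compactSparse = mostlyZero.compressed
val compactDense = full.compressed

println(compactSparse.getClass.getSimpleName)
println(compactDense.getClass.getSimpleName)

// The values are unchanged; only the storage format differs
println(compactSparse.toArray.sameElements(mostlyZero.toArray))
```

Calling `compressed` once after building a vector is usually enough; the result can then be reused wherever a `Vector` is expected.
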
## Capabilities

### Vector Operations

Core vector functionality providing dense and sparse representations with unified operations. Essential for feature representations and mathematical computations.

```scala { .api }
// Factory methods
object Vectors {
  def dense(firstValue: Double, otherValues: Double*): Vector
  def dense(values: Array[Double]): Vector
  def sparse(size: Int, indices: Array[Int], values: Array[Double]): Vector
  def sparse(size: Int, elements: Seq[(Int, Double)]): Vector
  def zeros(size: Int): Vector
  def norm(vector: Vector, p: Double): Double
  def sqdist(v1: Vector, v2: Vector): Double
}

// Vector trait
trait Vector {
  def size: Int
  def toArray: Array[Double]
  def apply(i: Int): Double
  def copy: Vector
  def foreachActive(f: (Int, Double) => Unit): Unit
  def numActives: Int
  def numNonzeros: Int
  def toSparse: SparseVector
  def toDense: DenseVector
  def compressed: Vector
  def argmax: Int
}
```

[Vector Operations](./vectors.md)

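
The difference between `numActives` and `numNonzeros` is easiest to see on a sparse vector that stores an explicit zero. The sketch below uses only methods listed in the API above; the variable names are illustrative.

```scala
import org.apache.spark.ml.linalg.Vectors

// Sparse vector of size 4 that stores two entries, one of them an explicit zero
val v = Vectors.sparse(4, Array(1, 3), Array(0.0, 5.0))

val stored = v.numActives    // counts stored entries, including the explicit zero
val nonzero = v.numNonzeros  // counts only entries whose value is not 0.0
val maxIndex = v.argmax      // index of the largest value

// foreachActive visits only the stored (index, value) pairs
var activeSum = 0.0
v.foreachActive { (_, value) => activeSum += value }

// Converting between representations does not change the values
val roundTrip = v.toDense.toSparse

println(s"stored=$stored nonzero=$nonzero argmax=$maxIndex sum=$activeSum")
```
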
### Matrix Operations

Comprehensive matrix functionality supporting both dense and sparse matrices with efficient storage formats and mathematical operations.

```scala { .api }
// Factory methods
object Matrices {
  def dense(numRows: Int, numCols: Int, values: Array[Double]): Matrix
  def sparse(numRows: Int, numCols: Int, colPtrs: Array[Int], rowIndices: Array[Int], values: Array[Double]): Matrix
  def zeros(numRows: Int, numCols: Int): Matrix
  def ones(numRows: Int, numCols: Int): Matrix
  def eye(n: Int): Matrix
  def speye(n: Int): Matrix
  def diag(vector: Vector): Matrix
  def horzcat(matrices: Array[Matrix]): Matrix
  def vertcat(matrices: Array[Matrix]): Matrix
}

// Matrix trait
trait Matrix {
  def numRows: Int
  def numCols: Int
  def apply(i: Int, j: Int): Double
  def transpose: Matrix
  def multiply(y: DenseMatrix): DenseMatrix
  def multiply(y: Vector): DenseVector
  def toSparse: SparseMatrix
  def toDense: DenseMatrix
  def compressed: Matrix
}
```

[Matrix Operations](./matrices.md)

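
Both factory methods take their values in column order, which is easy to get wrong. The following sketch, with illustrative variable names, shows how the `values`, `colPtrs`, and `rowIndices` arrays map to matrix entries.

```scala
import org.apache.spark.ml.linalg.{Matrices, Vectors}

// 2 x 3 dense matrix; values fill column 0 first, then column 1, then column 2:
//   1.0  3.0  5.0
//   2.0  4.0  6.0
val m = Matrices.dense(2, 3, Array(1.0, 2.0, 3.0, 4.0, 5.0, 6.0))
val topMiddle = m(0, 1)

// Matrix-vector product: each row of m dotted with (1, 1, 1)
val rowSums = m.multiply(Vectors.dense(1.0, 1.0, 1.0))

// 3 x 3 sparse matrix in compressed sparse column form:
// entries colPtrs(j) until colPtrs(j + 1) of rowIndices and values belong to column j
//   1.0  0.0  4.0
//   0.0  3.0  5.0
//   2.0  0.0  6.0
val s = Matrices.sparse(3, 3, Array(0, 2, 3, 6), Array(0, 2, 1, 0, 1, 2), Array(1.0, 2.0, 3.0, 4.0, 5.0, 6.0))
val bottomLeft = s(2, 0)

// Transposing swaps the row and column index of every entry
val t = s.transpose
val topRight = t(0, 2)

println(s"topMiddle=$topMiddle bottomLeft=$bottomLeft topRight=$topRight")
println(rowSums)
```
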
### Statistical Distributions

Statistical distribution implementations for probability computations and machine learning algorithms.

```scala { .api }
class MultivariateGaussian(mean: Vector, cov: Matrix) {
  val mean: Vector
  val cov: Matrix
  def pdf(x: Vector): Double
  def logpdf(x: Vector): Double
}
```

[Statistical Distributions](./distributions.md)

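
As a sanity check on `pdf` and `logpdf`, a standard two-dimensional Gaussian (zero mean, identity covariance) has density 1 / (2π) at its mean, and `logpdf` should be the logarithm of `pdf`. The sketch below compares against those closed-form values with a small tolerance; the tolerance and variable names are illustrative choices.

```scala
import org.apache.spark.ml.linalg.{Matrices, Vectors}
import org.apache.spark.ml.stat.distribution.MultivariateGaussian

// Standard 2-D Gaussian: zero mean, identity covariance
val standard = new MultivariateGaussian(Vectors.dense(0.0, 0.0), Matrices.eye(2))

// Closed-form density at the mean for this special case
val expectedAtMean = 1.0 / (2.0 * math.Pi)
val atMean = standard.pdf(Vectors.dense(0.0, 0.0))

// pdf and logpdf evaluated at the same point should agree
val x = Vectors.dense(1.0, 1.0)
val p = standard.pdf(x)
val logP = standard.logpdf(x)

println(math.abs(atMean - expectedAtMean) < 1e-9)
println(math.abs(math.exp(logP) - p) < 1e-9)
```

For points far from the mean, `logpdf` is the safer choice because `pdf` can underflow to zero.
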
### Testing Utilities

Numerical testing utilities with tolerance-based comparisons for vectors, matrices, and doubles, essential for testing numerical algorithms.

```scala { .api }
object TestingUtils {
  implicit class DoubleWithAlmostEquals(val x: Double) {
    def ~=(r: CompareDoubleRightSide): Boolean
    def ~==(r: CompareDoubleRightSide): Boolean
    def absTol(eps: Double): CompareDoubleRightSide
    def relTol(eps: Double): CompareDoubleRightSide
  }

  implicit class VectorWithAlmostEquals(val x: Vector) {
    def ~=(r: CompareVectorRightSide): Boolean
    def ~==(r: CompareVectorRightSide): Boolean
    def absTol(eps: Double): CompareVectorRightSide
    def relTol(eps: Double): CompareVectorRightSide
  }
}
```

[Testing Utilities](./testing.md)

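
The implicit classes above are meant to be used in infix form. The following is a rough sketch under two assumptions that are not stated in this document: that `TestingUtils` lives in the `org.apache.spark.ml.util` package, and that it is available on the test classpath (it may ship only with the library's test sources). The tolerances are illustrative.

```scala
import org.apache.spark.ml.linalg.Vectors
// Assumed package name; adjust the import if TestingUtils is located elsewhere
import org.apache.spark.ml.util.TestingUtils._

// Floating-point arithmetic rarely gives exactly 0.3 here
val computed = 0.1 + 0.2

// Because ~= ends in '=', it binds last: the right side parses as (0.3 absTol 1e-9)
val closeAbs = (computed ~= 0.3 absTol 1e-9)
val closeRel = (computed ~= 0.3 relTol 1e-9)

// The same comparison on vectors
val closeVec = (Vectors.dense(1.0, 2.0) ~= Vectors.dense(1.0, 2.0 + 1e-12) absTol 1e-9)

println(s"closeAbs=$closeAbs closeRel=$closeRel closeVec=$closeVec")
```
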
## Types

### Core Types

```scala { .api }
// Vector hierarchy
sealed trait Vector extends Serializable
class DenseVector(val values: Array[Double]) extends Vector
class SparseVector(override val size: Int, val indices: Array[Int], val values: Array[Double]) extends Vector

// Matrix hierarchy
sealed trait Matrix extends Serializable
class DenseMatrix(val numRows: Int, val numCols: Int, val values: Array[Double], override val isTransposed: Boolean) extends Matrix
class SparseMatrix(val numRows: Int, val numCols: Int, val colPtrs: Array[Int], val rowIndices: Array[Int], val values: Array[Double], override val isTransposed: Boolean) extends Matrix
```

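
Because `Vector` is a sealed trait with two implementations, code that needs the underlying arrays can pattern match on the concrete type. The sketch below uses a hypothetical helper, `storedValues`, that is not part of the library.

```scala
import org.apache.spark.ml.linalg.{DenseVector, SparseVector, Vector, Vectors}

// Hypothetical helper: reports how many Double values a vector physically stores
def storedValues(v: Vector): Int = v match {
  case dv: DenseVector  => dv.values.length // one slot per element
  case sv: SparseVector => sv.values.length // one slot per stored entry
}

val denseCount = storedValues(Vectors.dense(1.0, 0.0, 0.0, 0.0))
val sparseCount = storedValues(Vectors.sparse(4, Array(0), Array(1.0)))

println(s"dense stores $denseCount values, sparse stores $sparseCount")
```
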
### Testing Types

```scala { .api }
case class CompareDoubleRightSide(fun: (Double, Double, Double) => Boolean, y: Double, eps: Double, method: String)
case class CompareVectorRightSide(fun: (Vector, Vector, Double) => Boolean, y: Vector, eps: Double, method: String)
case class CompareMatrixRightSide(fun: (Matrix, Matrix, Double) => Boolean, y: Matrix, eps: Double, method: String)
```