Spark Project ML Local Library provides local (non-distributed) linear algebra utilities and basic machine learning components.
npx @tessl/cli install tessl/maven-org-apache-spark--spark-mllib-local_2-13@4.0.0

Apache Spark's MLlib Local library provides essential linear algebra operations, vector and matrix manipulations, and basic statistical utilities designed for local (non-distributed) computation. It serves as the mathematical foundation for Spark's distributed machine learning algorithms by offering optimized implementations of dense and sparse vector/matrix representations, multivariate Gaussian distributions, and utility functions for numerical computations.
Maven coordinates: org.apache.spark:spark-mllib-local_2.13:4.0.0

import org.apache.spark.ml.linalg._
import org.apache.spark.ml.stat.distribution.MultivariateGaussian

// Create vectors
val denseVec = Vectors.dense(1.0, 2.0, 3.0, 4.0)
val sparseVec = Vectors.sparse(4, Array(0, 2), Array(1.0, 3.0))
// Vector operations
val dot = denseVec.dot(sparseVec)
val norm = Vectors.norm(denseVec, 2.0)
// Create matrices
val denseMatrix = DenseMatrix.zeros(3, 3)
val sparseMatrix = SparseMatrix.speye(3)
// Matrix operations
// multiply accepts a DenseMatrix (or a Vector) argument, so the sparse matrix goes on the left
val result = sparseMatrix.multiply(denseMatrix)
val transposed = denseMatrix.transpose
// Statistical distribution
val mean = Vectors.dense(0.0, 0.0)
val cov = DenseMatrix.eye(2)
val gaussian = new MultivariateGaussian(mean, cov)
val density = gaussian.pdf(Vectors.dense(1.0, 1.0))

The library is organized around several key components: linear algebra types and operations, statistical distributions, and BLAS routines. Built on top of Breeze (Scala's scientific computing library) and optimized BLAS implementations, it enables efficient local mathematical operations that integrate into larger distributed machine learning pipelines.

Core vector and matrix operations including creation, manipulation, arithmetic operations, and format conversions. Both dense and sparse implementations are provided, and the compressed method returns whichever of the two representations uses less storage.
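
As a rough illustration of the format conversions mentioned above, the following sketch builds a mostly-zero vector and converts it between representations. The object name VectorFormatsSketch and the sample values are made up for illustration; the calls themselves (Vectors.dense, toSparse, compressed, Vectors.norm, Vectors.sqdist) are the ones listed in this document.

```scala
import org.apache.spark.ml.linalg.{SparseVector, Vector, Vectors}

// Hypothetical example object; not part of the library.
object VectorFormatsSketch {
  // A mostly-zero vector stored densely: 8 entries, 2 of them non-zero.
  val dense: Vector = Vectors.dense(0.0, 0.0, 5.0, 0.0, 0.0, 0.0, 7.0, 0.0)

  // Explicit conversion keeps only the (index, value) pairs of non-zero entries.
  val sparse: SparseVector = dense.toSparse

  // compressed picks the representation that needs less storage;
  // with 2 non-zeros out of 8 entries that is expected to be the sparse form.
  val compact: Vector = dense.compressed

  // Both representations describe the same vector, so their squared distance is 0.
  val distance: Double = Vectors.sqdist(dense, sparse)

  // L1 norm: |5| + |7|.
  val l1: Double = Vectors.norm(dense, 1.0)

  def main(args: Array[String]): Unit = {
    println(s"indices = ${sparse.indices.mkString(",")}")
    println(s"values  = ${sparse.values.mkString(",")}")
    println(s"compressed is sparse: ${compact.isInstanceOf[SparseVector]}")
    println(s"sqdist = $distance, L1 norm = $l1")
  }
}
```

Calling toSparse or toDense explicitly is useful when a downstream step needs a specific representation; compressed is the better choice when only storage size matters.
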
// Vector creation and operations (object Vectors)
def dense(values: Array[Double]): Vector
def sparse(size: Int, indices: Array[Int], values: Array[Double]): Vector
def zeros(size: Int): Vector
def norm(vector: Vector, p: Double): Double
def sqdist(v1: Vector, v2: Vector): Double
// Matrix creation and operations (object Matrices)
def dense(numRows: Int, numCols: Int, values: Array[Double]): Matrix
def sparse(numRows: Int, numCols: Int, colPtrs: Array[Int], rowIndices: Array[Int], values: Array[Double]): Matrix
def zeros(numRows: Int, numCols: Int): Matrix
def eye(n: Int): Matrix

Multivariate statistical distributions with numerically robust implementations that handle edge cases such as singular covariance matrices. The density can be evaluated directly (pdf) or in log space (logpdf), which avoids underflow for points far from the mean.
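
To make the numerical-stability point concrete, here is a minimal sketch that compares pdf and logpdf for a standard bivariate normal. The object name GaussianDensitySketch and the sample points are illustrative assumptions; only the constructor and the two methods shown in this document are used.

```scala
import org.apache.spark.ml.linalg.{Matrices, Vectors}
import org.apache.spark.ml.stat.distribution.MultivariateGaussian

// Hypothetical example object; not part of the library.
object GaussianDensitySketch {
  // Standard bivariate normal: zero mean, identity covariance.
  val gaussian = new MultivariateGaussian(Vectors.dense(0.0, 0.0), Matrices.eye(2))

  val near = Vectors.dense(1.0, 1.0)
  val far = Vectors.dense(40.0, 40.0)

  // Close to the mean both forms are usable; analytically
  // logpdf(1, 1) = -ln(2 * pi) - 1 and pdf = exp(logpdf).
  val pdfNear: Double = gaussian.pdf(near)
  val logNear: Double = gaussian.logpdf(near)

  // Far from the mean the density is smaller than the smallest positive Double,
  // so pdf underflows to 0.0 while logpdf stays finite (about -1600 - ln(2 * pi)).
  val pdfFar: Double = gaussian.pdf(far)
  val logFar: Double = gaussian.logpdf(far)

  def main(args: Array[String]): Unit = {
    println(s"near: pdf = $pdfNear, logpdf = $logNear")
    println(s"far:  pdf = $pdfFar, logpdf = $logFar")
  }
}
```

When densities are multiplied or compared across many points, summing logpdf values is usually the safer choice.
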
class MultivariateGaussian(mean: Vector, cov: Matrix) {
  def pdf(x: Vector): Double
  def logpdf(x: Vector): Double
}

Basic Linear Algebra Subprograms (BLAS) routines provide optimized implementations of vector and matrix operations. They include Level 1 (vector-vector), Level 2 (matrix-vector), and Level 3 (matrix-matrix) operations.
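
The three levels can be illustrated with a small worked example. This sketch assumes the BLAS object itself is not callable from application code (in the Spark sources it appears to be declared private[spark]), so it uses only public Vector and Matrix methods to show what each level computes. The object BlasLevelsSketch and the helper axpyCopy are hypothetical names introduced here.

```scala
import org.apache.spark.ml.linalg.{DenseMatrix, DenseVector, Vectors}

// Hypothetical example object; not part of the library.
object BlasLevelsSketch {
  val x = Vectors.dense(1.0, 2.0)
  val y = Vectors.dense(3.0, 4.0)

  // Level 1 (vector-vector): dot product, 1*3 + 2*4 = 11.
  val xDotY: Double = x.dot(y)

  // Level 1: axpy computes a*x + y. Written by hand on plain arrays here;
  // unlike the in-place BLAS routine, this hypothetical helper returns a new array.
  def axpyCopy(a: Double, xs: Array[Double], ys: Array[Double]): Array[Double] =
    xs.zip(ys).map { case (xi, yi) => a * xi + yi }

  val axpyResult: Array[Double] = axpyCopy(2.0, x.toArray, y.toArray) // 2*x + y = [5, 8]

  // Values are column-major, so this is A = [[1, 2], [3, 4]].
  val a = new DenseMatrix(2, 2, Array(1.0, 3.0, 2.0, 4.0))

  // Level 2 (matrix-vector): A*x = [1*1 + 2*2, 3*1 + 4*2] = [5, 11].
  val ax: DenseVector = a.multiply(x)

  // Level 3 (matrix-matrix): A*A = [[7, 10], [15, 22]].
  val aa: DenseMatrix = a.multiply(a)

  def main(args: Array[String]): Unit = {
    println(s"dot  = $xDotY")
    println(s"axpy = ${axpyResult.mkString(",")}")
    println(s"A*x  = $ax")
    println(s"A*A  =\n$aa")
  }
}
```
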
object BLAS {
  // Level 1: Vector operations
  def axpy(a: Double, x: Vector, y: Vector): Unit
  def dot(x: Vector, y: Vector): Double
  def copy(x: Vector, y: Vector): Unit
  def scal(a: Double, x: Vector): Unit

  // Level 2: Matrix-vector operations
  def gemv(alpha: Double, A: Matrix, x: Vector, beta: Double, y: DenseVector): Unit
  def syr(alpha: Double, x: Vector, A: DenseMatrix): Unit

  // Level 3: Matrix-matrix operations
  def gemm(alpha: Double, A: Matrix, B: DenseMatrix, beta: Double, C: DenseMatrix): Unit
}

sealed trait Vector extends Serializable {
  def size: Int
  def toArray: Array[Double]
  def apply(i: Int): Double
  def copy: Vector
  def foreachActive(f: (Int, Double) => Unit): Unit
  def numActives: Int
  def numNonzeros: Int
  def toSparse: SparseVector
  def toDense: DenseVector
  def compressed: Vector
  def argmax: Int
  def dot(v: Vector): Double
}

class DenseVector(val values: Array[Double]) extends Vector

class SparseVector(
  override val size: Int,
  val indices: Array[Int],
  val values: Array[Double]
) extends Vector

sealed trait Matrix extends Serializable {
  def numRows: Int
  def numCols: Int
  val isTransposed: Boolean
  def toArray: Array[Double]
  def colIter: Iterator[Vector]
  def rowIter: Iterator[Vector]
  def apply(i: Int, j: Int): Double
  def copy: Matrix
  def transpose: Matrix
  def multiply(y: DenseMatrix): DenseMatrix
  def multiply(y: Vector): DenseVector
  def foreachActive(f: (Int, Int, Double) => Unit): Unit
  def numNonzeros: Int
  def numActives: Int
  def toSparse: SparseMatrix
  def toDense: DenseMatrix
  def compressed: Matrix
}

class DenseMatrix(
  val numRows: Int,
  val numCols: Int,
  val values: Array[Double],
  override val isTransposed: Boolean = false
) extends Matrix

class SparseMatrix(
  val numRows: Int,
  val numCols: Int,
  val colPtrs: Array[Int],
  val rowIndices: Array[Int],
  val values: Array[Double],
  override val isTransposed: Boolean = false
) extends Matrix
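
A short sketch of how the Matrix members above fit together, assuming column-major storage for the values array. The object name MatrixAccessSketch and the sample matrix are illustrative; every method called appears in the Matrix trait listed above or in the Matrices factory signatures earlier in this document.

```scala
import org.apache.spark.ml.linalg.{Matrices, Matrix, SparseMatrix}

// Hypothetical example object; not part of the library.
object MatrixAccessSketch {
  // 2 x 3 matrix given column by column: [[1, 0, 2], [0, 3, 0]].
  val m: Matrix = Matrices.dense(2, 3, Array(1.0, 0.0, 0.0, 3.0, 2.0, 0.0))

  // Row and column views, each materialized as plain lists for easy inspection.
  val rows: List[List[Double]] = m.rowIter.map(_.toArray.toList).toList
  val cols: List[List[Double]] = m.colIter.map(_.toArray.toList).toList

  // transpose swaps the dimensions and flips the isTransposed flag.
  val t: Matrix = m.transpose

  // A dense matrix counts every stored entry as active, zeros included;
  // converting to sparse keeps only the non-zero entries.
  val sp: SparseMatrix = m.toSparse

  // foreachActive visits (row, column, value) triples of the stored entries.
  var total = 0.0
  sp.foreachActive((_, _, v) => total += v)

  def main(args: Array[String]): Unit = {
    println(s"rows = $rows")
    println(s"cols = $cols")
    println(s"transpose: ${t.numRows} x ${t.numCols}, isTransposed = ${t.isTransposed}")
    println(s"dense actives = ${m.numActives}, sparse actives = ${sp.numActives}, sum = $total")
  }
}
```
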