Apache Spark MLlib Local Library provides local implementations of linear algebra data structures and utilities for machine learning without requiring a Spark cluster.
```
npx @tessl/cli install tessl/maven-org-apache-spark--spark-mllib-local@2.4.0
```

The library includes core linear algebra components such as Vector and Matrix implementations, BLAS operations, and statistical distributions like MultivariateGaussian.
Maven coordinate:

```
org.apache.spark:spark-mllib-local_2.12:2.4.8
```

Import the core linear algebra types:

```scala
import org.apache.spark.ml.linalg._
```

For statistical distributions:
```scala
import org.apache.spark.ml.stat.distribution.MultivariateGaussian
```

A basic usage example:

```scala
import org.apache.spark.ml.linalg._

// Create vectors
val denseVec = Vectors.dense(1.0, 2.0, 3.0)
val sparseVec = Vectors.sparse(4, Array(0, 2), Array(1.0, 3.0))
// Create matrices
val denseMatrix = Matrices.dense(2, 2, Array(1.0, 3.0, 2.0, 4.0))
val sparseMatrix = Matrices.sparse(2, 2, Array(0, 1, 2), Array(0, 1), Array(1.0, 4.0))
// Vector operations
println(s"Dense vector size: ${denseVec.size}")
println(s"Sparse vector nonzeros: ${sparseVec.numNonzeros}")
// Matrix operations: the 2x2 matrix must be multiplied by a length-2 vector
val result = denseMatrix.multiply(Vectors.dense(1.0, 2.0))
println(s"Matrix-vector product: ${result.toArray.mkString(",")}")
// BLAS operations
val dotProduct = BLAS.dot(denseVec, denseVec)
val norm = Vectors.norm(denseVec, 2.0)
```

The library is built around a few core abstractions, described in the sections below. The design emphasizes performance through integration with the Breeze linear algebra library and supports seamless conversion between dense and sparse formats based on sparsity patterns.
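As a small sketch of how the dense/sparse conversion methods fit together (the values here are purely illustrative):

```scala
import org.apache.spark.ml.linalg._

// A mostly-zero vector stored densely keeps every zero explicitly.
val mostlyZeros = Vectors.dense(0.0, 0.0, 5.0, 0.0, 0.0, 0.0, 7.0, 0.0)

// Sparse storage keeps only the nonzero indices (2 and 6) and their values.
val asSparse = mostlyZeros.toSparse
println(s"Nonzero entries: ${asSparse.numNonzeros}")   // 2

// Convert back to dense storage when the vector is expected to fill up.
val asDense = asSparse.toDense
println(s"Dense length: ${asDense.size}")              // 8
```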
Core vector data structures and operations including dense and sparse vectors, with comprehensive linear algebra functionality, norms, distances, and format conversions.

```scala
sealed trait Vector {
  def size: Int
  def toArray: Array[Double]
  def apply(i: Int): Double
  def copy: Vector
  def numNonzeros: Int
  def toSparse: SparseVector
  def toDense: DenseVector
}

object Vectors {
  def dense(values: Array[Double]): Vector
  def sparse(size: Int, indices: Array[Int], values: Array[Double]): Vector
  def zeros(size: Int): Vector
  def norm(vector: Vector, p: Double): Double
  def sqdist(v1: Vector, v2: Vector): Double
}
```
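A brief, illustrative use of the Vectors factory and utility methods listed above (input values are arbitrary):

```scala
import org.apache.spark.ml.linalg._

val a = Vectors.dense(Array(1.0, 0.0, 3.0))
val b = Vectors.sparse(3, Array(0, 2), Array(2.0, 1.0))   // (2.0, 0.0, 1.0)

// Euclidean norm of a: sqrt(1 + 0 + 9) ≈ 3.162
val l2 = Vectors.norm(a, 2.0)

// Squared Euclidean distance: (1-2)^2 + (0-0)^2 + (3-1)^2 = 5.0
val d2 = Vectors.sqdist(a, b)

// All-zero vector of a given size
val z = Vectors.zeros(3)

println(s"norm = $l2, sqdist = $d2, zeros size = ${z.size}")
```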
Matrix data structures and operations including dense and sparse matrices with support for various layouts, linear algebra operations, and efficient storage format conversions.

```scala
sealed trait Matrix {
  def numRows: Int
  def numCols: Int
  def apply(i: Int, j: Int): Double
  def transpose: Matrix
  def multiply(y: DenseMatrix): DenseMatrix
  def multiply(y: Vector): DenseVector
  def toSparse: SparseMatrix
  def toDense: DenseMatrix
}

object Matrices {
  def dense(numRows: Int, numCols: Int, values: Array[Double]): Matrix
  def sparse(numRows: Int, numCols: Int, colPtrs: Array[Int], rowIndices: Array[Int], values: Array[Double]): Matrix
  def zeros(numRows: Int, numCols: Int): Matrix
  def eye(n: Int): Matrix
}
```
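A short sketch of the Matrices factories and Matrix operations above. It assumes the column-major layout used in the earlier example, where Matrices.dense(2, 2, Array(1.0, 3.0, 2.0, 4.0)) lays its values out column by column:

```scala
import org.apache.spark.ml.linalg._

// Column-major layout: the first numRows values form column 0, and so on.
val m = Matrices.dense(2, 3, Array(1.0, 4.0, 2.0, 5.0, 3.0, 6.0))
// m = [[1.0, 2.0, 3.0],
//      [4.0, 5.0, 6.0]]

println(m(1, 2))        // 6.0 (element at row 1, column 2)

val mt = m.transpose    // 3 x 2 transpose

// Matrix-vector product: the vector length must equal numCols (3)
val mv = m.multiply(Vectors.dense(1.0, 1.0, 1.0))
println(mv.toArray.mkString(", "))   // 6.0, 15.0

// 3 x 3 identity matrix
val id = Matrices.eye(3)
println(s"${id.numRows} x ${id.numCols}")
```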
High-performance Basic Linear Algebra Subprograms (BLAS) routines for vectors and matrices, providing optimized implementations of common linear algebra operations.

```scala
object BLAS {
  def axpy(a: Double, x: Vector, y: Vector): Unit
  def dot(x: Vector, y: Vector): Double
  def copy(x: Vector, y: Vector): Unit
  def scal(a: Double, x: Vector): Unit
  def gemv(alpha: Double, A: Matrix, x: Vector, beta: Double, y: DenseVector): Unit
  def gemm(alpha: Double, A: Matrix, B: DenseMatrix, beta: Double, C: DenseMatrix): Unit
}
```
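A sketch of the in-place BLAS conventions implied by the signatures above (axpy computes y := a*x + y, gemv computes y := alpha*A*x + beta*y); it assumes these routines are callable as listed:

```scala
import org.apache.spark.ml.linalg._

val x = Vectors.dense(1.0, 2.0, 3.0)
val y = Vectors.dense(10.0, 20.0, 30.0)

// Dot product: 1*10 + 2*20 + 3*30 = 140.0
val d = BLAS.dot(x, y)

// axpy updates y in place: y := 2.0 * x + y  ->  (12.0, 24.0, 36.0)
BLAS.axpy(2.0, x, y)

// gemv writes into the output vector in place: out := 1.0 * A * x + 0.0 * out
val A = Matrices.dense(2, 3, Array(1.0, 0.0, 0.0, 1.0, 1.0, 1.0))   // [[1, 0, 1], [0, 1, 1]]
val out = Vectors.zeros(2).toDense
BLAS.gemv(1.0, A, x, 0.0, out)
println(out.toArray.mkString(", "))   // 4.0, 5.0
```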
Multivariate statistical distributions for probabilistic modeling and machine learning applications, with support for probability density calculations.

```scala
class MultivariateGaussian(mean: Vector, cov: Matrix) {
  def pdf(x: Vector): Double
  def logpdf(x: Vector): Double
}
```
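A minimal sketch of evaluating densities with MultivariateGaussian; the zero mean and identity covariance are illustrative values only:

```scala
import org.apache.spark.ml.linalg.{Matrices, Vectors}
import org.apache.spark.ml.stat.distribution.MultivariateGaussian

// Standard bivariate normal: zero mean, identity covariance
val gaussian = new MultivariateGaussian(
  Vectors.dense(0.0, 0.0),
  Matrices.dense(2, 2, Array(1.0, 0.0, 0.0, 1.0))
)

// Density at the mean of a standard bivariate normal is 1 / (2 * Pi) ≈ 0.159
println(gaussian.pdf(Vectors.dense(0.0, 0.0)))

// Log density is preferable when probabilities become very small
println(gaussian.logpdf(Vectors.dense(1.0, 1.0)))
```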