tessl/maven-org-apache-spark--spark-mllib-local_2-13

Spark Project ML Local Library provides local (non-distributed) linear algebra utilities and basic machine learning components.

Workspace: tessl
Visibility: Public
Describes: pkg:maven/org.apache.spark/spark-mllib-local_2.13@4.0.x

To install, run

npx @tessl/cli install tessl/maven-org-apache-spark--spark-mllib-local_2-13@4.0.0


Spark MLlib Local

Apache Spark's MLlib Local library provides essential linear algebra operations, vector and matrix manipulations, and basic statistical utilities designed for local (non-distributed) computation. This library serves as the mathematical foundation for Spark's distributed machine learning algorithms by offering optimized implementations of dense and sparse vector/matrix representations, multivariate Gaussian distributions, and utility functions for numerical computations.

Package Information

  • Package Name: spark-mllib-local_2.13
  • Package Type: Maven
  • Language: Scala
  • Installation: org.apache.spark:spark-mllib-local_2.13:4.0.0

Core Imports

import org.apache.spark.ml.linalg._
import org.apache.spark.ml.stat.distribution.MultivariateGaussian

Basic Usage

import org.apache.spark.ml.linalg._
import org.apache.spark.ml.stat.distribution.MultivariateGaussian

// Create vectors
val denseVec = Vectors.dense(1.0, 2.0, 3.0, 4.0)
val sparseVec = Vectors.sparse(4, Array(0, 2), Array(1.0, 3.0))

// Vector operations
val dot = denseVec.dot(sparseVec)
val norm = Vectors.norm(denseVec, 2.0)

// Create matrices
val denseMatrix = DenseMatrix.zeros(3, 3)
val sparseMatrix = SparseMatrix.speye(3)

// Matrix operations (multiply accepts a DenseMatrix or a Vector argument,
// so the sparse matrix goes on the left)
val result = sparseMatrix.multiply(denseMatrix)
val transposed = denseMatrix.transpose

// Statistical distribution
val mean = Vectors.dense(0.0, 0.0)
val cov = DenseMatrix.eye(2)
val gaussian = new MultivariateGaussian(mean, cov)
val density = gaussian.pdf(Vectors.dense(1.0, 1.0))

Architecture

The library is organized around several key components:

  • Vector Types: Dense and sparse vector implementations; compressed returns whichever representation needs less storage
  • Matrix Types: Dense and sparse matrices stored column-major by default, with row-major layout represented through the isTransposed flag; compressed likewise picks the smaller format
  • Statistical Components: A multivariate Gaussian distribution that remains well defined for singular covariance matrices

Built on top of Breeze (Scala's scientific computing library) and optimized BLAS implementations, the library enables efficient local mathematical operations that seamlessly integrate into larger distributed machine learning pipelines.
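
As a rough illustration of the format selection described above, the sketch below calls compressed on two vectors. The values are made up for illustration, and the snippet assumes the library is on the classpath of a Scala REPL or spark-shell session.

```scala
import org.apache.spark.ml.linalg.{DenseVector, SparseVector, Vectors}

// Ten entries, only one of them nonzero: sparse storage is smaller.
val mostlyZero = Vectors.dense(0.0, 0.0, 0.0, 0.0, 5.0, 0.0, 0.0, 0.0, 0.0, 0.0)

// Four entries, all nonzero: dense storage is smaller.
val allNonzero = Vectors.sparse(4, Array(0, 1, 2, 3), Array(1.0, 2.0, 3.0, 4.0))

// compressed keeps the values and only changes the representation.
val compactA = mostlyZero.compressed
val compactB = allNonzero.compressed

println(compactA.isInstanceOf[SparseVector])
println(compactB.isInstanceOf[DenseVector])
```

Calling compressed once, after a vector has been built, is usually enough; the result can be passed anywhere a Vector is expected.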

Capabilities

Linear Algebra Operations

Core vector and matrix operations: creation, manipulation, arithmetic, and conversion between formats. Vectors are built through the Vectors factory object and matrices through the Matrices factory object; both offer dense and sparse variants. Sparse matrices use compressed sparse column (CSC) layout, described by colPtrs, rowIndices, and values.

// Vector creation and operations
def dense(values: Array[Double]): Vector
def sparse(size: Int, indices: Array[Int], values: Array[Double]): Vector
def zeros(size: Int): Vector
def norm(vector: Vector, p: Double): Double
def sqdist(v1: Vector, v2: Vector): Double

// Matrix creation and operations  
def dense(numRows: Int, numCols: Int, values: Array[Double]): Matrix
def sparse(numRows: Int, numCols: Int, colPtrs: Array[Int], rowIndices: Array[Int], values: Array[Double]): Matrix
def zeros(numRows: Int, numCols: Int): Matrix
def eye(n: Int): Matrix
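
A minimal usage sketch for the factory methods listed above, assuming they are called on the Vectors and Matrices objects. The numbers are arbitrary; note that Matrices.dense reads its values in column-major order.

```scala
import org.apache.spark.ml.linalg.{Matrices, Vectors}

val a = Vectors.dense(1.0, 2.0, 3.0)
// Sparse form of (1.0, 0.0, 5.0): size, active indices, active values.
val b = Vectors.sparse(3, Array(0, 2), Array(1.0, 5.0))

val l1 = Vectors.norm(a, 1.0)       // |1| + |2| + |3| = 6.0
val sqDist = Vectors.sqdist(a, b)   // 0^2 + 2^2 + (-2)^2 = 8.0

// 2 x 3 dense matrix; the columns are (1, 2), (3, 4), (5, 6).
val dense = Matrices.dense(2, 3, Array(1.0, 2.0, 3.0, 4.0, 5.0, 6.0))

// 3 x 3 identity in CSC form: colPtrs marks where each column starts in rowIndices/values.
val sparseEye = Matrices.sparse(3, 3, Array(0, 1, 2, 3), Array(0, 1, 2), Array(1.0, 1.0, 1.0))

println(dense(0, 1))      // row 0, column 1 -> 3.0
println(sparseEye(2, 2))  // 1.0
```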

Full reference: linear-algebra.md

Statistical Distributions

The MultivariateGaussian class implements the multivariate normal distribution and exposes its probability density (pdf) and log-density (logpdf). When the covariance matrix is singular, the density is computed in the reduced-dimensional subspace on which the distribution is supported.

class MultivariateGaussian(mean: Vector, cov: Matrix) {
  def pdf(x: Vector): Double
  def logpdf(x: Vector): Double
}
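
As a sanity check on pdf and logpdf, the sketch below evaluates a two-dimensional standard normal at its mean, where the density is known analytically to be 1 / (2π). The variable names are illustrative only.

```scala
import org.apache.spark.ml.linalg.{DenseMatrix, Vectors}
import org.apache.spark.ml.stat.distribution.MultivariateGaussian

// Zero mean, identity covariance: the 2-D standard normal.
val standardNormal2d =
  new MultivariateGaussian(Vectors.dense(0.0, 0.0), DenseMatrix.eye(2))

val origin = Vectors.dense(0.0, 0.0)
val densityAtOrigin = standardNormal2d.pdf(origin)        // analytically 1 / (2 * Pi)
val logDensityAtOrigin = standardNormal2d.logpdf(origin)  // analytically -log(2 * Pi)

// Differences from the closed-form values should be at rounding level.
println(math.abs(densityAtOrigin - 1.0 / (2.0 * math.Pi)))
println(math.abs(logDensityAtOrigin + math.log(2.0 * math.Pi)))
```

Working with logpdf is generally the safer choice when many densities are multiplied together, since the raw densities can underflow.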

Full reference: statistical-distributions.md

BLAS Operations

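Basic Linear Algebra Subprograms (BLAS) routines for vectors and matrices, covering Level 1 (vector-vector), Level 2 (matrix-vector), and Level 3 (matrix-matrix) operations. In the Spark source this object is declared private[spark], so application code normally reaches these routines through public methods such as Vector.dot and Matrix.multiply rather than calling BLAS directly. Most routines update their last argument in place (for example, axpy computes y += a * x).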

object BLAS {
  // Level 1: Vector operations
  def axpy(a: Double, x: Vector, y: Vector): Unit
  def dot(x: Vector, y: Vector): Double  
  def copy(x: Vector, y: Vector): Unit
  def scal(a: Double, x: Vector): Unit
  
  // Level 2: Matrix-vector operations
  def gemv(alpha: Double, A: Matrix, x: Vector, beta: Double, y: DenseVector): Unit
  def syr(alpha: Double, x: Vector, A: DenseMatrix): Unit
  
  // Level 3: Matrix-matrix operations
  def gemm(alpha: Double, A: Matrix, B: DenseMatrix, beta: Double, C: DenseMatrix): Unit
}
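
The sketch below sticks to public methods on Vector and Matrix, which is how code outside the org.apache.spark package would normally reach the dot, gemv, and gemm routines listed above. The matrix and vector values are illustrative.

```scala
import org.apache.spark.ml.linalg.{DenseMatrix, Matrices, Vectors}

// Column-major values, so a = [[1, 3], [2, 4]].
val a = Matrices.dense(2, 2, Array(1.0, 2.0, 3.0, 4.0))
val x = Vectors.dense(1.0, 1.0)
val y = Vectors.dense(2.0, 0.5)

// Level 1 equivalent: dot product, 1 * 2 + 1 * 0.5 = 2.5.
val xDotY = x.dot(y)

// Level 2 equivalent: matrix-vector product, a * x = (1 + 3, 2 + 4) = (4, 6).
val ax = a.multiply(x)

// Level 3 equivalent: matrix-matrix product; multiplying by the identity returns a's values.
val aTimesIdentity = a.multiply(DenseMatrix.eye(2))

println(xDotY)
println(ax)
println(aTimesIdentity)
```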

Types

Core Vector Types

sealed trait Vector extends Serializable {
  def size: Int
  def toArray: Array[Double]
  def apply(i: Int): Double
  def copy: Vector
  def foreachActive(f: (Int, Double) => Unit): Unit
  def numActives: Int
  def numNonzeros: Int
  def toSparse: SparseVector
  def toDense: DenseVector
  def compressed: Vector
  def argmax: Int
  def dot(v: Vector): Double
}

class DenseVector(val values: Array[Double]) extends Vector

class SparseVector(
  override val size: Int,
  val indices: Array[Int], 
  val values: Array[Double]
) extends Vector
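
To show how the members above behave on a sparse vector, the following sketch stores an explicit zero so that numActives and numNonzeros differ. The describe helper is hypothetical and exists only to illustrate matching on the two concrete subtypes.

```scala
import org.apache.spark.ml.linalg.{DenseVector, SparseVector, Vector, Vectors}
import scala.collection.mutable.ArrayBuffer

// Size 5 with two stored entries, one of which is an explicit zero.
val v: Vector = Vectors.sparse(5, Array(1, 3), Array(0.0, 7.0))

// foreachActive visits stored entries only, including the explicit zero.
val stored = ArrayBuffer.empty[(Int, Double)]
v.foreachActive { (i, value) => stored += ((i, value)) }

// Hypothetical helper: Vector is sealed, so these two cases cover every vector.
def describe(vec: Vector): String = vec match {
  case d: DenseVector  => s"dense, ${d.values.length} values"
  case s: SparseVector => s"sparse, ${s.indices.length} of ${s.size} stored"
}

println(v.numActives)         // stored entries: 2
println(v.numNonzeros)        // entries that are actually nonzero: 1
println(v.argmax)             // index of the largest value: 3
println(describe(v))
println(describe(v.toDense))
```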

Core Matrix Types

sealed trait Matrix extends Serializable {
  def numRows: Int
  def numCols: Int
  val isTransposed: Boolean
  def toArray: Array[Double]
  def colIter: Iterator[Vector]
  def rowIter: Iterator[Vector]
  def apply(i: Int, j: Int): Double
  def copy: Matrix
  def transpose: Matrix
  def multiply(y: DenseMatrix): DenseMatrix
  def multiply(y: Vector): DenseVector
  def foreachActive(f: (Int, Int, Double) => Unit): Unit
  def numNonzeros: Int
  def numActives: Int
  def toSparse: SparseMatrix
  def toDense: DenseMatrix
  def compressed: Matrix
}

class DenseMatrix(
  val numRows: Int,
  val numCols: Int, 
  val values: Array[Double],
  override val isTransposed: Boolean = false
) extends Matrix

class SparseMatrix(
  val numRows: Int,
  val numCols: Int,
  val colPtrs: Array[Int],
  val rowIndices: Array[Int], 
  val values: Array[Double],
  override val isTransposed: Boolean = false
) extends Matrix
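
A short sketch of the matrix members above, assuming column-major input values. It shows that transpose flips isTransposed rather than changing the element order seen through apply, and that toSparse keeps only nonzero entries. All values are illustrative.

```scala
import org.apache.spark.ml.linalg.{DenseMatrix, Matrices}

// 2 x 3 matrix; values are column-major, so the columns are (1, 2), (3, 4), (5, 6).
val m = new DenseMatrix(2, 3, Array(1.0, 2.0, 3.0, 4.0, 5.0, 6.0))

// The transpose is 3 x 2 and reports isTransposed = true.
val t = m.transpose

// colIter yields one Vector per column.
val columns = m.colIter.map(_.toArray.toList).toList

// A dense diagonal matrix stores all four entries; its sparse form stores two.
val diag = Matrices.dense(2, 2, Array(1.0, 0.0, 0.0, 2.0))
val sparseDiag = diag.toSparse

println(t.isTransposed)
println(t(1, 0))                // same element as m(0, 1): 3.0
println(columns)
println(diag.numActives)        // 4
println(sparseDiag.numActives)  // 2
```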