Spark MLlib Local

Spark MLlib Local provides the local (single-JVM) linear algebra types and utilities that underpin Apache Spark's machine learning library. It implements the core Vector and Matrix data structures, along with BLAS (Basic Linear Algebra Subprograms) routines for the numerical computations used in Spark's distributed machine learning algorithms.

Package Information

  • Package Name: spark-mllib-local_2.12
  • Package Type: maven
  • Language: Scala
  • Installation: Add to your Maven/SBT dependencies:

Maven:

<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-mllib-local_2.12</artifactId>
  <version>3.5.6</version>
</dependency>

SBT:

libraryDependencies += "org.apache.spark" %% "spark-mllib-local" % "3.5.6"

Core Imports

import org.apache.spark.ml.linalg._
import org.apache.spark.ml.stat.distribution.MultivariateGaussian
// Utils is declared private[spark]: this import compiles only inside the org.apache.spark package
import org.apache.spark.ml.impl.Utils

Basic Usage

import org.apache.spark.ml.linalg._

// Create vectors
val denseVec = Vectors.dense(1.0, 2.0, 3.0)
val sparseVec = Vectors.sparse(5, Array(0, 2, 4), Array(1.0, 3.0, 5.0))

// Create matrices
val denseMatrix = DenseMatrix.zeros(3, 3)
val sparseMatrix = SparseMatrix.speye(3) // Identity matrix

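// Linear algebra operations
val dotProduct = denseVec.dot(denseVec)
// BLAS is declared private[spark]: the axpy call compiles only inside the org.apache.spark package
val denseCopy = denseVec.copy.toDense // copy first; toDense on a DenseVector reuses its backing array
BLAS.axpy(2.0, denseVec, denseCopy) // denseCopy += 2.0 * denseVec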

// Statistical distributions
val mean = Vectors.dense(0.0, 0.0)
val cov = DenseMatrix.eye(2)
val mvn = new MultivariateGaussian(mean, cov)
val density = mvn.pdf(Vectors.dense(1.0, 1.0))

Architecture

Spark MLlib Local is built around several key components:

  • Vector Types: Dense and sparse vector implementations behind a common Vector trait
  • Matrix Types: Dense and sparse matrix implementations; transpose returns a view over the same data instead of copying it
  • BLAS Operations: Linear algebra routines that use native BLAS libraries when they are available
  • Statistical Distributions: The multivariate Gaussian distribution, including support for singular covariance matrices
  • Type Safety: Sealed trait hierarchies, so pattern matches over Vector and Matrix are checked for exhaustiveness at compile time
  • Native Integration: Automatic fallback from native BLAS to pure JVM implementations
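Two of the points above, the sealed hierarchy and lazy transposition, can be seen in a short sketch. The helper name `describe` and all sample values are illustrative assumptions, not part of the library.

```scala
import org.apache.spark.ml.linalg.{DenseMatrix, DenseVector, SparseVector, Vector, Vectors}

// Vector is sealed, so these two cases cover every implementation.
// `describe` is a hypothetical helper written for this example.
def describe(v: Vector): String = v match {
  case dv: DenseVector  => s"dense, size ${dv.size}"
  case sv: SparseVector => s"sparse, size ${sv.size}, ${sv.indices.length} stored entries"
}

val d = describe(Vectors.dense(1.0, 2.0, 3.0))
val s = describe(Vectors.sparse(5, Array(0, 2), Array(1.0, 3.0)))

// Values are column-major, so this is the 2x3 matrix [[1, 3, 5], [2, 4, 6]].
val m = new DenseMatrix(2, 3, Array(1.0, 2.0, 3.0, 4.0, 5.0, 6.0))

// transpose yields a 3x2 matrix; the isTransposed flag records the flipped layout.
val t = m.transpose
```

Reading `t(1, 0)` gives the same entry as `m(0, 1)`, which is what makes the transposed matrix usable without rearranging the values.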

Capabilities

Vector Operations

Core vector data structures and operations for numerical computing. Both dense and sparse representations are supported, and compressed returns whichever of the two uses less storage.

sealed trait Vector extends Serializable {
  def size: Int
  def toArray: Array[Double]
  def apply(i: Int): Double
  def copy: Vector
  def dot(v: Vector): Double
  def numNonzeros: Int
  def compressed: Vector
}

object Vectors {
  def dense(values: Array[Double]): Vector
  def dense(firstValue: Double, otherValues: Double*): Vector
  def sparse(size: Int, indices: Array[Int], values: Array[Double]): Vector
  def zeros(size: Int): Vector
  def norm(vector: Vector, p: Double): Double
  def sqdist(v1: Vector, v2: Vector): Double
}
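As a rough illustration of the factory and helper methods listed above, the sketch below computes norms and distances on a small vector and lets `compressed` choose a representation. The sample values are assumptions chosen to make the arithmetic easy to check by hand.

```scala
import org.apache.spark.ml.linalg.{SparseVector, Vectors}

val a = Vectors.dense(3.0, 4.0)
val origin = Vectors.zeros(2)

val l2 = Vectors.norm(a, 2.0)           // sqrt(3^2 + 4^2) = 5.0
val l1 = Vectors.norm(a, 1.0)           // |3| + |4| = 7.0
val distSq = Vectors.sqdist(a, origin)  // squared Euclidean distance = 25.0

// One non-zero among eight entries: the sparse form needs less storage.
val mostlyZero = Vectors.dense(0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 9.0)
val compact = mostlyZero.compressed
```

Here `compact` still reports size 8 and a single non-zero, so callers can treat it like the original vector.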

Matrix Operations

Matrix data structures and operations for linear algebra computations. Provides both dense and sparse implementations with format conversion capabilities.

sealed trait Matrix extends Serializable {
  def numRows: Int
  def numCols: Int
  def apply(i: Int, j: Int): Double
  def transpose: Matrix
  def multiply(y: Vector): DenseVector
  def multiply(y: DenseMatrix): DenseMatrix
  def compressed: Matrix
}

object Matrices {
  def dense(numRows: Int, numCols: Int, values: Array[Double]): Matrix
  def sparse(numRows: Int, numCols: Int, colPtrs: Array[Int], rowIndices: Array[Int], values: Array[Double]): Matrix
  def zeros(numRows: Int, numCols: Int): Matrix
  def eye(n: Int): Matrix
}
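The argument layout of the two factory methods is easy to get wrong, so here is a minimal sketch with hand-picked values (all of them assumptions for illustration). Dense values are given column by column, and the sparse factory takes compressed sparse column (CSC) arrays.

```scala
import org.apache.spark.ml.linalg.Matrices

// Column-major values: this is the 2x3 matrix [[1, 3, 5], [2, 4, 6]].
val dense = Matrices.dense(2, 3, Array(1.0, 2.0, 3.0, 4.0, 5.0, 6.0))

// CSC layout for a 3x3 matrix with four stored entries:
//   column 0 holds rows 0 and 2, column 1 holds row 1, column 2 holds row 2.
// colPtrs(j) until colPtrs(j + 1) is the slice of rowIndices/values for column j.
val sparse = Matrices.sparse(
  3, 3,
  Array(0, 2, 3, 4),          // colPtrs
  Array(0, 2, 1, 2),          // rowIndices
  Array(1.0, 2.0, 3.0, 4.0))  // values

val identity = Matrices.eye(2)
```

Entries that are not stored in the sparse matrix, such as `sparse(0, 1)`, read back as 0.0.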

Linear Algebra Operations

Optimized linear algebra operations are accessible through the Vector and Matrix APIs. The underlying BLAS implementation uses native acceleration automatically when it is available.

// Vector operations (accessing optimized BLAS internally)
val dotProduct = vector1.dot(vector2)

// Matrix operations (accessing optimized BLAS internally)  
val result = matrix.multiply(vector)
val product = matrix.multiply(otherMatrix)
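The calls above can be checked against small hand-computed values. This is a sketch with assumed inputs; it uses only the public `dot` and `multiply` methods rather than calling the BLAS object directly.

```scala
import org.apache.spark.ml.linalg.{DenseMatrix, Vectors}

val v1 = Vectors.dense(1.0, 2.0)
val v2 = Vectors.sparse(2, Array(1), Array(4.0)) // the vector (0.0, 4.0)

// Dense and sparse operands can be mixed: 1*0 + 2*4 = 8.0
val dot = v1.dot(v2)

// Column-major values: the matrix [[1, 2], [3, 4]].
val a = new DenseMatrix(2, 2, Array(1.0, 3.0, 2.0, 4.0))

// Multiplying by a vector of ones gives the row sums: (3.0, 7.0)
val rowSums = a.multiply(Vectors.dense(1.0, 1.0))

// Multiplying by the identity returns a matrix with the same entries as `a`.
val same = a.multiply(DenseMatrix.eye(2))
```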

Statistical Distributions

A multivariate Gaussian (normal) distribution. Density evaluation is designed for numerical stability and supports singular covariance matrices.

class MultivariateGaussian(mean: Vector, cov: Matrix) extends Serializable {
  def pdf(x: Vector): Double
  def logpdf(x: Vector): Double
}
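A quick sanity check for `pdf` and `logpdf` is the standard two-dimensional Gaussian, whose density at the mean is 1/(2π) analytically. The sketch below assumes that setup; the variable names are illustrative.

```scala
import org.apache.spark.ml.linalg.{DenseMatrix, Vectors}
import org.apache.spark.ml.stat.distribution.MultivariateGaussian

// Zero mean and identity covariance in two dimensions.
val standard = new MultivariateGaussian(Vectors.dense(0.0, 0.0), DenseMatrix.eye(2))

val atMean = standard.pdf(Vectors.dense(0.0, 0.0))
val logAtMean = standard.logpdf(Vectors.dense(0.0, 0.0))

// Analytic value to compare against.
val expected = 1.0 / (2.0 * math.Pi)
```

When multiplying many densities together, summing `logpdf` values is usually the safer choice, since products of small densities underflow quickly.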

Utility Functions

Numerical helper functions written with floating-point stability in mind, such as an overflow-safe log(1 + exp(x)) and an in-place softmax.

object Utils {
  lazy val EPSILON: Double
  def unpackUpperTriangular(n: Int, triangularValues: Array[Double]): Array[Double]
  def indexUpperTriangular(n: Int, i: Int, j: Int): Int
  def log1pExp(x: Double): Double
  def softmax(array: Array[Double]): Unit
}
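To my knowledge Utils is declared `private[spark]`, so application code outside the `org.apache.spark` package cannot call it directly. The sketch below shows the usual stability tricks behind two of these helpers in plain Scala. The object `StableMath` and its method bodies are assumptions written for illustration, not the library's implementation.

```scala
// Hypothetical stand-in for illustration; not part of spark-mllib-local.
object StableMath {
  // log(1 + exp(x)) without overflow: for x > 0, factor out x so that
  // exp is only ever applied to a non-positive argument.
  def log1pExp(x: Double): Double =
    if (x > 0) x + math.log1p(math.exp(-x)) else math.log1p(math.exp(x))

  // In-place softmax: subtracting the maximum keeps every exponent <= 0,
  // so exp cannot overflow even for very large inputs.
  def softmaxInPlace(values: Array[Double]): Unit = {
    if (values.nonEmpty) {
      val max = values.max
      var sum = 0.0
      var i = 0
      while (i < values.length) {
        values(i) = math.exp(values(i) - max)
        sum += values(i)
        i += 1
      }
      i = 0
      while (i < values.length) {
        values(i) /= sum
        i += 1
      }
    }
  }
}

val large = StableMath.log1pExp(1000.0) // a naive log(1 + exp(1000)) would overflow to Infinity
val scores = Array(1000.0, 1000.0)
StableMath.softmaxInPlace(scores)       // equal inputs yield equal probabilities
```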

Error Handling

The library signals errors with standard JVM exceptions:

  • IllegalArgumentException: Invalid parameters or dimension mismatches
  • UnsupportedOperationException: Operations not supported for specific vector/matrix types
  • IndexOutOfBoundsException: Invalid indices
  • NoSuchElementException: Attempting to update zero elements in sparse matrices

Operations validate input dimensions and throw descriptive exceptions for invalid operations.
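A dimension mismatch is the most common of these in practice. The following sketch wraps a mismatched multiplication in `Try` and reports the failure instead of letting it propagate. The inputs and the `outcome` string format are assumptions for illustration.

```scala
import scala.util.{Failure, Success, Try}
import org.apache.spark.ml.linalg.{DenseMatrix, Vectors}

val m = DenseMatrix.eye(3)              // 3x3
val wrongSize = Vectors.dense(1.0, 2.0) // size 2, so m * wrongSize is undefined

val outcome = Try(m.multiply(wrongSize)) match {
  case Success(v)                           => s"ok: $v"
  case Failure(e: IllegalArgumentException) => s"rejected: ${e.getMessage}"
  case Failure(other)                       => throw other // anything else is unexpected here
}
```

Checking sizes up front (for example `m.numCols == v.size`) avoids the exception entirely when the inputs come from untrusted sources.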