CtrlK
BlogDocsLog inGet started
Tessl Logo

tessl/maven-org-apache-spark--spark-mllib-local-2-11

Spark ML Local Library providing linear algebra and statistical utilities for local machine learning operations without requiring a distributed Spark cluster

Pending
Quality

Pending

Does it follow best practices?

Impact

Pending

No eval scenarios have been run

SecuritybySnyk

Pending

The risk profile of this skill

Overview
Eval results
Files

Spark MLlib Local Library

Spark MLlib Local is a lightweight, dependency-minimal package that provides fundamental linear algebra operations and basic statistical utilities for machine learning tasks that can be executed locally without requiring a distributed Spark cluster framework.

Package Information

  • Package Name: spark-mllib-local_2.11
  • Package Type: maven
  • Language: Scala 2.11
  • Group ID: org.apache.spark
  • Artifact ID: spark-mllib-local_2.11
  • Installation:
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-mllib-local_2.11</artifactId>
      <version>2.2.3</version>
    </dependency>

For SBT:

libraryDependencies += "org.apache.spark" %% "spark-mllib-local" % "2.2.3"

Core Imports

import org.apache.spark.ml.linalg.{Vector, DenseVector, SparseVector, Vectors}
import org.apache.spark.ml.linalg.{Matrix, DenseMatrix, SparseMatrix, Matrices}
import org.apache.spark.ml.stat.distribution.MultivariateGaussian

Basic Usage

import org.apache.spark.ml.linalg.{Vectors, Matrices}
import org.apache.spark.ml.stat.distribution.MultivariateGaussian

// Create vectors
val dense = Vectors.dense(1.0, 2.0, 3.0)
val sparse = Vectors.sparse(5, Array(0, 2, 4), Array(1.0, 3.0, 5.0))

// Create matrices
val denseMatrix = Matrices.dense(2, 3, Array(1.0, 2.0, 3.0, 4.0, 5.0, 6.0))
val sparseMatrix = Matrices.sparse(3, 3, Array(0, 2, 3, 6), Array(0, 2, 1, 0, 1, 2), Array(1.0, 2.0, 3.0, 4.0, 5.0, 6.0))

// Vector operations
val vectorCopy = dense.copy
val dotProduct = sparse.toArray.zip(dense.toArray).map{ case (a, b) => a * b }.sum
val l2Norm = Vectors.norm(dense, 2.0)
val distance = Vectors.sqdist(dense, sparse)

// Matrix operations
val matrixProduct = denseMatrix.multiply(sparseMatrix.transpose)
val transposed = denseMatrix.transpose
val compressed = sparseMatrix.compressed

// Statistical distribution  
val mean = Vectors.dense(0.0, 0.0)
val cov = Matrices.eye(2)
val gaussian = new MultivariateGaussian(mean, cov)
val density = gaussian.pdf(Vectors.dense(1.0, 1.0))

Architecture

Spark MLlib Local is organized around several key components:

  • Vector System: Dense and sparse vector implementations with unified operations
  • Matrix System: Dense and sparse matrix implementations supporting various storage formats
  • Statistical Distributions: Probability distributions for machine learning algorithms
  • Testing Utilities: Tolerance-based comparison utilities for numerical testing

The library emphasizes performance through:

  • Optimized Storage: Automatic selection between dense/sparse representations based on data characteristics
  • Lazy Evaluation: Operations like compression are performed on-demand to avoid unnecessary computation
  • Memory Efficiency: Compact storage formats for both dense and sparse data structures
  • Native Integration: Internal integration with optimized mathematical libraries

Capabilities

Vector Operations

Core vector functionality providing dense and sparse representations with unified operations. Essential for feature representations and mathematical computations.

// Factory methods
object Vectors {
  def dense(firstValue: Double, otherValues: Double*): Vector
  def dense(values: Array[Double]): Vector
  def sparse(size: Int, indices: Array[Int], values: Array[Double]): Vector
  def sparse(size: Int, elements: Seq[(Int, Double)]): Vector
  def zeros(size: Int): Vector
  def norm(vector: Vector, p: Double): Double
  def sqdist(v1: Vector, v2: Vector): Double
}

// Vector trait
trait Vector {
  def size: Int
  def toArray: Array[Double]
  def apply(i: Int): Double
  def copy: Vector
  def foreachActive(f: (Int, Double) => Unit): Unit
  def numActives: Int
  def numNonzeros: Int
  def toSparse: SparseVector
  def toDense: DenseVector
  def compressed: Vector
  def argmax: Int
}

Vector Operations

Matrix Operations

Comprehensive matrix functionality supporting both dense and sparse matrices with efficient storage formats and mathematical operations.

// Factory methods
object Matrices {
  def dense(numRows: Int, numCols: Int, values: Array[Double]): Matrix
  def sparse(numRows: Int, numCols: Int, colPtrs: Array[Int], rowIndices: Array[Int], values: Array[Double]): Matrix
  def zeros(numRows: Int, numCols: Int): Matrix
  def ones(numRows: Int, numCols: Int): Matrix
  def eye(n: Int): Matrix
  def speye(n: Int): Matrix
  def diag(vector: Vector): Matrix
  def horzcat(matrices: Array[Matrix]): Matrix
  def vertcat(matrices: Array[Matrix]): Matrix
}

// Matrix trait
trait Matrix {
  def numRows: Int
  def numCols: Int
  def apply(i: Int, j: Int): Double
  def transpose: Matrix
  def multiply(y: Matrix): DenseMatrix
  def multiply(y: Vector): DenseVector
  def toSparse: SparseMatrix
  def toDense: DenseMatrix
  def compressed: Matrix
}

Matrix Operations

Statistical Distributions

Statistical distribution implementations for probability computations and machine learning algorithms.

class MultivariateGaussian(mean: Vector, cov: Matrix) {
  val mean: Vector
  val cov: Matrix
  def pdf(x: Vector): Double
  def logpdf(x: Vector): Double
}

Statistical Distributions

Testing Utilities

Numerical testing utilities with tolerance-based comparisons for vectors, matrices, and doubles, essential for testing numerical algorithms.

object TestingUtils {
  implicit class DoubleWithAlmostEquals(val x: Double) {
    def ~=(r: CompareDoubleRightSide): Boolean
    def ~==(r: CompareDoubleRightSide): Boolean
    def absTol(eps: Double): CompareDoubleRightSide
    def relTol(eps: Double): CompareDoubleRightSide
  }
  
  implicit class VectorWithAlmostEquals(val x: Vector) {
    def ~=(r: CompareVectorRightSide): Boolean
    def ~==(r: CompareVectorRightSide): Boolean
    def absTol(eps: Double): CompareVectorRightSide
    def relTol(eps: Double): CompareVectorRightSide
  }
}

Testing Utilities

Types

Core Types

// Vector hierarchy
sealed trait Vector extends Serializable
class DenseVector(val values: Array[Double]) extends Vector
class SparseVector(override val size: Int, val indices: Array[Int], val values: Array[Double]) extends Vector

// Matrix hierarchy  
sealed trait Matrix extends Serializable
class DenseMatrix(val numRows: Int, val numCols: Int, val values: Array[Double], override val isTransposed: Boolean) extends Matrix
class SparseMatrix(val numRows: Int, val numCols: Int, val colPtrs: Array[Int], val rowIndices: Array[Int], val values: Array[Double], override val isTransposed: Boolean) extends Matrix

Testing Types

case class CompareDoubleRightSide(fun: (Double, Double, Double) => Boolean, y: Double, eps: Double, method: String)
case class CompareVectorRightSide(fun: (Vector, Vector, Double) => Boolean, y: Vector, eps: Double, method: String)
case class CompareMatrixRightSide(fun: (Matrix, Matrix, Double) => Boolean, y: Matrix, eps: Double, method: String)
Workspace
tessl
Visibility
Public
Created
Last updated
Describes
mavenpkg:maven/org.apache.spark/spark-mllib-local_2.11@2.2.x
Publish Source
CLI
Badge
tessl/maven-org-apache-spark--spark-mllib-local-2-11 badge