tessl/maven-org-apache-spark--spark-graphx_2-12

GraphX is Apache Spark's API for graphs and graph-parallel computation

Workspace: tessl
Visibility: Public
Describes: pkg:maven/org.apache.spark/spark-graphx_2.12@3.5.x

To install, run

npx @tessl/cli install tessl/maven-org-apache-spark--spark-graphx_2-12@3.5.0

Apache Spark GraphX

GraphX is Apache Spark's API for graphs and graph-parallel computation. It is a distributed graph processing framework built on Spark RDDs that exposes both graph-parallel and data-parallel views of the same physical data. Because both views share that data, users can switch between graph operations and ordinary RDD (tabular) operations, which suits ETL, exploratory analysis, and iterative graph computation.

Package Information

  • Package Name: org.apache.spark/spark-graphx_2.12
  • Package Type: maven
  • Language: Scala
  • Installation: Add to build.sbt: libraryDependencies += "org.apache.spark" %% "spark-graphx" % "3.5.6"

Core Imports

import org.apache.spark.graphx._
import org.apache.spark.graphx.lib._
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkContext, SparkConf}
import org.apache.spark.storage.StorageLevel

For specific imports:

import org.apache.spark.graphx.{Graph, VertexId, Edge, EdgeTriplet}
import org.apache.spark.graphx.{VertexRDD, EdgeRDD, GraphOps}
import org.apache.spark.graphx.{PartitionStrategy, EdgeDirection}

Basic Usage

import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD

// `sc` is an existing SparkContext (for example, the one provided by spark-shell)
// Create vertices RDD: (VertexId, VertexAttribute)
val vertices: RDD[(VertexId, String)] = sc.parallelize(Array(
  (1L, "Alice"),
  (2L, "Bob"),
  (3L, "Charlie")
))

// Create edges RDD
val edges: RDD[Edge[String]] = sc.parallelize(Array(
  Edge(1L, 2L, "friend"),
  Edge(2L, 3L, "friend"),
  Edge(3L, 1L, "friend")
))

// Build the graph
val graph: Graph[String, String] = Graph(vertices, edges)

// Basic operations
println(s"Vertices: ${graph.numVertices}")
println(s"Edges: ${graph.numEdges}")

// Transform vertex attributes
val transformedGraph = graph.mapVertices((id, attr) => attr.toUpperCase)

// Run PageRank until the ranks converge within a tolerance of 0.0001
val ranks = graph.pageRank(0.0001).vertices

Architecture

GraphX is built around several key components:

  • Graph Abstraction: Immutable, distributed graphs with typed vertex and edge attributes
  • Specialized RDDs: VertexRDD and EdgeRDD provide efficient graph-specific operations
  • Triplet View: EdgeTriplet joins edges with adjacent vertex attributes for message passing
  • Partitioning Strategies: Optimize data locality and minimize communication overhead
  • Pregel API: Vertex-centric programming model for iterative graph algorithms
  • Algorithm Library: Pre-implemented graph algorithms like PageRank and Connected Components
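To make the triplet view concrete, here is a minimal sketch that renders each edge together with the attributes of its two endpoints. The object name `TripletViewSketch`, the sample data, and the `local[2]` master are illustrative assumptions, not part of the GraphX API.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph}

// Hypothetical helper object for illustration only.
object TripletViewSketch {
  // Builds a three-vertex graph and renders each triplet as "src -[attr]-> dst".
  def describe(sc: SparkContext): Seq[String] = {
    val vertices = sc.parallelize(Seq((1L, "Alice"), (2L, "Bob"), (3L, "Charlie")))
    val edges = sc.parallelize(Seq(Edge(1L, 2L, "follows"), Edge(2L, 3L, "follows")))
    val graph = Graph(vertices, edges)

    // Each triplet carries the edge attribute plus both endpoint attributes.
    graph.triplets
      .map(t => s"${t.srcAttr} -[${t.attr}]-> ${t.dstAttr}")
      .collect()
      .sorted
      .toSeq
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local master is enough for this small example.
    val sc = SparkContext.getOrCreate(
      new SparkConf().setMaster("local[2]").setAppName("triplet-view-sketch"))
    try describe(sc).foreach(println)
    finally sc.stop()
  }
}
```

Mapping each triplet to a plain value before calling `collect()` keeps the result independent of how GraphX manages triplet objects internally.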

Capabilities

Core Graph Operations

Fundamental graph construction, transformation, and analysis operations for building and manipulating graph structures.

// Graph construction
def Graph.apply[VD: ClassTag, ED: ClassTag](
  vertices: RDD[(VertexId, VD)], 
  edges: RDD[Edge[ED]]
): Graph[VD, ED]

def Graph.fromEdges[VD: ClassTag, ED: ClassTag](
  edges: RDD[Edge[ED]], 
  defaultValue: VD
): Graph[VD, ED]

// Graph transformations
def mapVertices[VD2: ClassTag](map: (VertexId, VD) => VD2): Graph[VD2, ED]
def mapEdges[ED2: ClassTag](map: Edge[ED] => ED2): Graph[VD, ED2]

See: Core Graph API (docs/core-graph-api.md)
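The construction and transformation signatures above can be combined as in the following sketch. It assumes a small hand-written edge list; `CoreOpsSketch` and the sample values are hypothetical names chosen for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph}

// Hypothetical helper object for illustration only.
object CoreOpsSketch {
  def build(sc: SparkContext): Graph[String, Double] = {
    val edges = sc.parallelize(Seq(Edge(1L, 2L, 3), Edge(2L, 3L, 5)))

    // fromEdges creates every vertex referenced by an edge and gives it the default attribute.
    val base: Graph[Int, Int] = Graph.fromEdges(edges, defaultValue = 0)

    base
      .mapVertices((id, _) => s"v$id") // replace the default attribute with a label
      .mapEdges(e => e.attr * 0.5)     // rescale edge weights; the structure is unchanged
  }

  def main(args: Array[String]): Unit = {
    val sc = SparkContext.getOrCreate(
      new SparkConf().setMaster("local[2]").setAppName("core-ops-sketch"))
    try {
      val graph = build(sc)
      graph.vertices.collect().sortBy(_._1).foreach(println)
      graph.edges.map(e => (e.srcId, e.dstId, e.attr)).collect().foreach(println)
    } finally sc.stop()
  }
}
```

Both transformations return a new graph, since graphs are immutable; the original `base` graph stays usable afterwards.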

Graph Algorithms

Comprehensive collection of graph algorithms including PageRank, Connected Components, Triangle Counting, and community detection.

// PageRank algorithms
def pageRank(tol: Double, resetProb: Double = 0.15): Graph[Double, Double]
def staticPageRank(numIter: Int, resetProb: Double = 0.15): Graph[Double, Double]

// Component algorithms  
def connectedComponents(): Graph[VertexId, ED]
def stronglyConnectedComponents(numIter: Int): Graph[VertexId, ED]

// Triangle counting
def triangleCount(): Graph[Int, ED]

See: Graph Algorithms (docs/graph-algorithms.md)
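As a rough illustration of the component and triangle-counting calls listed above, the sketch below runs both on a five-vertex graph made of one triangle and one separate edge. The object name and data are assumptions. The edges are written with `srcId < dstId` and the graph is repartitioned before counting triangles, which follows the conservative advice in the GraphX programming guide and may not be strictly required in recent Spark versions.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, PartitionStrategy, VertexId}

// Hypothetical helper object for illustration only.
object AlgorithmsSketch {
  // Returns (component id per vertex, triangle count per vertex).
  def run(sc: SparkContext): (Map[VertexId, VertexId], Map[VertexId, Int]) = {
    val edges = sc.parallelize(Seq(
      Edge(1L, 2L, 1), Edge(2L, 3L, 1), Edge(1L, 3L, 1), // a triangle
      Edge(4L, 5L, 1)                                    // a separate component
    ))
    val graph = Graph.fromEdges(edges, defaultValue = 0)
      .partitionBy(PartitionStrategy.RandomVertexCut)

    // Each vertex is labelled with the smallest vertex id in its component.
    val components = graph.connectedComponents().vertices.collect().toMap
    // Each vertex is labelled with the number of triangles it belongs to.
    val triangles = graph.triangleCount().vertices.collect().toMap
    (components, triangles)
  }

  def main(args: Array[String]): Unit = {
    val sc = SparkContext.getOrCreate(
      new SparkConf().setMaster("local[2]").setAppName("algorithms-sketch"))
    try {
      val (components, triangles) = run(sc)
      println(s"components: $components")
      println(s"triangles:  $triangles")
    } finally sc.stop()
  }
}
```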

Pregel Message-Passing API

Vertex-centric programming framework for implementing custom iterative graph algorithms using the Pregel computational model.

def pregel[A: ClassTag](
  initialMsg: A,
  maxIterations: Int = Int.MaxValue,
  activeDirection: EdgeDirection = EdgeDirection.Either
)(
  vprog: (VertexId, VD, A) => VD,
  sendMsg: EdgeTriplet[VD, ED] => Iterator[(VertexId, A)],
  mergeMsg: (A, A) => A
): Graph[VD, ED]

See: Pregel API (docs/pregel-api.md)
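One common use of the `pregel` operator is single-source shortest paths. The sketch below follows the pattern from the GraphX programming guide on a tiny weighted graph; the object name `ShortestPathsSketch`, the edge weights, and the choice of source vertex are illustrative assumptions.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, VertexId}

// Hypothetical helper object for illustration only.
object ShortestPathsSketch {
  def distancesFrom(sc: SparkContext, sourceId: VertexId): Map[VertexId, Double] = {
    val edges = sc.parallelize(Seq(
      Edge(1L, 2L, 1.0), Edge(2L, 3L, 2.0), Edge(1L, 3L, 5.0)))
    val graph: Graph[Double, Double] = Graph.fromEdges(edges, defaultValue = 0.0)

    // Start with distance 0 at the source and infinity everywhere else.
    val initial = graph.mapVertices((id, _) =>
      if (id == sourceId) 0.0 else Double.PositiveInfinity)

    val result = initial.pregel(Double.PositiveInfinity)(
      // vprog: keep the shorter of the current and the incoming distance
      (_, dist, incoming) => math.min(dist, incoming),
      // sendMsg: offer the destination a shorter path, if this edge provides one
      triplet => {
        val candidate = triplet.srcAttr + triplet.attr
        if (candidate < triplet.dstAttr) Iterator((triplet.dstId, candidate))
        else Iterator.empty
      },
      // mergeMsg: several offers to the same vertex collapse to the smallest
      (a, b) => math.min(a, b)
    )
    result.vertices.collect().toMap
  }

  def main(args: Array[String]): Unit = {
    val sc = SparkContext.getOrCreate(
      new SparkConf().setMaster("local[2]").setAppName("sssp-sketch"))
    try println(distancesFrom(sc, 1L))
    finally sc.stop()
  }
}
```

The computation stops on its own once no vertex receives a message, so `maxIterations` can stay at its default here.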

Utilities and Graph Generation

Graph loading, generation, and utility functions for creating test graphs, importing data, and performance optimization.

// Graph loading
def GraphLoader.edgeListFile(
  sc: SparkContext,
  path: String,
  canonicalOrientation: Boolean = false,
  numEdgePartitions: Int = -1
): Graph[Int, Int]

// Graph generation
def GraphGenerators.logNormalGraph(
  sc: SparkContext,
  numVertices: Int,
  numEParts: Int = 0,
  mu: Double = 4.0,
  sigma: Double = 1.3,
  seed: Long = -1
): Graph[Long, Int]

See: Utilities (docs/utilities.md)
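The loader and generators can be tried without any external data, as in this sketch. It writes a small edge list to a temporary file (an assumption made so the example is self-contained) and also builds a star graph with `GraphGenerators.starGraph`. The object name `UtilitiesSketch` is hypothetical.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.GraphLoader
import org.apache.spark.graphx.util.GraphGenerators

// Hypothetical helper object for illustration only.
object UtilitiesSketch {
  // Loads a three-edge cycle from a temporary edge-list file; returns (vertices, edges).
  def loadedSizes(sc: SparkContext): (Long, Long) = {
    val file = Files.createTempDirectory("graphx-edges").resolve("edges.txt")
    // One "srcId dstId" pair per line; lines starting with '#' are skipped by the loader.
    Files.write(file, "# src dst\n1 2\n2 3\n3 1\n".getBytes(StandardCharsets.UTF_8))
    val graph = GraphLoader.edgeListFile(sc, file.toUri.toString)
    (graph.numVertices, graph.numEdges)
  }

  // Generates a star graph: vertex 0 is the centre, every other vertex points at it.
  def starSizes(sc: SparkContext, numVertices: Int): (Long, Long) = {
    val star = GraphGenerators.starGraph(sc, numVertices)
    (star.numVertices, star.numEdges)
  }

  def main(args: Array[String]): Unit = {
    val sc = SparkContext.getOrCreate(
      new SparkConf().setMaster("local[2]").setAppName("utilities-sketch"))
    try {
      println(s"loaded: ${loadedSizes(sc)}")
      println(s"star:   ${starSizes(sc, 5)}")
    } finally sc.stop()
  }
}
```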

Core Types

// Type aliases
type VertexId = Long
type PartitionID = Int

// Core data structures
case class Edge[ED](srcId: VertexId, dstId: VertexId, attr: ED)

class EdgeTriplet[VD, ED] extends Edge[ED] {
  val srcAttr: VD
  val dstAttr: VD
}

abstract class Graph[VD: ClassTag, ED: ClassTag] {
  val vertices: VertexRDD[VD]
  val edges: EdgeRDD[ED]  
  val triplets: RDD[EdgeTriplet[VD, ED]]
}

abstract class VertexRDD[VD] extends RDD[(VertexId, VD)]
abstract class EdgeRDD[ED] extends RDD[Edge[ED]]
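Because `VertexRDD` and `EdgeRDD` extend ordinary RDDs, the usual RDD operations apply to them directly. The following sketch filters both views of a small graph; the object name `CoreTypesSketch`, the vertex attribute (an age), and the thresholds are made-up values for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, VertexId}

// Hypothetical helper object for illustration only.
object CoreTypesSketch {
  // Returns (number of vertices with age >= 18, edges whose weight exceeds 1).
  def summarize(sc: SparkContext): (Long, Seq[(VertexId, VertexId)]) = {
    val vertices = sc.parallelize(Seq((1L, 34), (2L, 15), (3L, 52)))
    val edges = sc.parallelize(Seq(Edge(1L, 2L, 1), Edge(2L, 3L, 4), Edge(3L, 1L, 2)))
    val graph: Graph[Int, Int] = Graph(vertices, edges)

    // graph.vertices is a VertexRDD[Int], i.e. an RDD[(VertexId, Int)].
    val adults = graph.vertices.filter { case (_, age) => age >= 18 }.count()

    // graph.edges is an EdgeRDD[Int], i.e. an RDD[Edge[Int]].
    val heavyEdges = graph.edges
      .filter(_.attr > 1)
      .map(e => (e.srcId, e.dstId))
      .collect()
      .sorted
      .toSeq

    (adults, heavyEdges)
  }

  def main(args: Array[String]): Unit = {
    val sc = SparkContext.getOrCreate(
      new SparkConf().setMaster("local[2]").setAppName("core-types-sketch"))
    try println(summarize(sc))
    finally sc.stop()
  }
}
```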

Common Patterns

Graph Construction from Data:

// From edge list file
val graphFromFile = GraphLoader.edgeListFile(sc, "path/to/edges.txt")

// From existing RDDs
val graphFromRdds = Graph(verticesRDD, edgesRDD)

// From edge tuples (an RDD[(VertexId, VertexId)]) with default vertex values
val graphFromTuples = Graph.fromEdgeTuples(edgeTuples, defaultValue = "Unknown")

Performance Optimization:

// Cache for iterative algorithms
val cachedGraph = graph.cache()

// Partition for better locality
val partitionedGraph = graph.partitionBy(PartitionStrategy.EdgePartition2D)

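// Checkpoint for fault tolerance and to truncate long lineage chains
// (a checkpoint directory must be set first with sc.setCheckpointDir)
graph.checkpoint()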
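The three optimizations can be chained, as in this sketch. It assumes a temporary local directory is acceptable as the checkpoint location (a real cluster would typically use a shared filesystem) and uses a generated star graph as stand-in data; `PerformanceSketch` is a hypothetical name.

```scala
import java.nio.file.Files

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Graph, PartitionStrategy}
import org.apache.spark.graphx.util.GraphGenerators

// Hypothetical helper object for illustration only.
object PerformanceSketch {
  def prepare(sc: SparkContext): Graph[Int, Int] = {
    // checkpoint() needs a checkpoint directory; a temp directory is an assumption here.
    sc.setCheckpointDir(Files.createTempDirectory("graphx-checkpoint").toUri.toString)

    val graph = GraphGenerators.starGraph(sc, 100)
      .partitionBy(PartitionStrategy.EdgePartition2D) // repartition edges for locality
      .cache()                                        // keep the graph in memory for reuse

    // Marks the graph for checkpointing; data is written when an action first runs.
    graph.checkpoint()
    graph.numVertices // action: materializes the vertices
    graph.numEdges    // action: materializes the edges
    graph
  }

  def main(args: Array[String]): Unit = {
    val sc = SparkContext.getOrCreate(
      new SparkConf().setMaster("local[2]").setAppName("performance-sketch"))
    try {
      val graph = prepare(sc)
      println(s"vertices=${graph.numVertices}, edges=${graph.numEdges}")
    } finally sc.stop()
  }
}
```

Calling `cache()` before `checkpoint()` avoids recomputing the graph when the checkpoint data is written.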