CtrlK
BlogDocsLog inGet started
Tessl Logo

tessl/maven-org-apache-flink--flink-gelly-2-10

Gelly: Flink Graph API - A comprehensive graph processing library for Apache Flink

Pending
Overview
Eval results
Files

algorithms.mddocs/

Graph Algorithms

Gelly provides a comprehensive library of pre-implemented graph algorithms optimized for distributed execution on Flink clusters. These algorithms cover common graph analysis tasks including path finding, centrality analysis, clustering, and community detection.

Capabilities

Algorithm Execution Interface

All algorithms implement the GraphAlgorithm interface and are executed using the graph's run method.

public interface GraphAlgorithm<K, VV, EV, T> {
    T run(Graph<K, VV, EV> input) throws Exception;
}

// Execute algorithm on graph
public <T> T run(GraphAlgorithm<K, VV, EV, T> algorithm) throws Exception

Connected Components

Find connected components in undirected graphs using iterative label propagation.

public class ConnectedComponents<K, VV, EV> implements GraphAlgorithm<K, VV, EV, DataSet<Vertex<K, K>>> {
    public ConnectedComponents(int maxIterations)
}

Usage Example:

// Find connected components (max 10 iterations)
ConnectedComponents<Long, String, Double> cc = new ConnectedComponents<>(10);
DataSet<Vertex<Long, Long>> components = graph.run(cc);

// Each vertex will have the minimum vertex ID in its component as its value
components.print();

Single Source Shortest Paths

Compute shortest paths from a source vertex to all reachable vertices using the Bellman-Ford algorithm.

public class SingleSourceShortestPaths<K, VV> implements GraphAlgorithm<K, VV, Double, DataSet<Vertex<K, Double>>> {
    public SingleSourceShortestPaths(K srcVertexId, Integer maxIterations)
}

Usage Example:

// Find shortest paths from vertex 1 (max 10 iterations)
SingleSourceShortestPaths<Long, String> sssp = new SingleSourceShortestPaths<>(1L, 10);
DataSet<Vertex<Long, Double>> distances = graph.run(sssp);

// Each vertex will contain its shortest distance from source vertex 1
distances.print();

GSA Single Source Shortest Paths

Alternative SSSP implementation using the Gather-Sum-Apply iteration model.

public class GSASingleSourceShortestPaths<K, VV> implements GraphAlgorithm<K, VV, Double, DataSet<Vertex<K, Double>>> {
    public GSASingleSourceShortestPaths(K srcVertexId, Integer maxIterations)
}

PageRank

Compute PageRank centrality scores using the iterative power method.

public class PageRank<K, VV, EV> implements GraphAlgorithm<K, VV, EV, DataSet<Vertex<K, Double>>> {
    public PageRank(double dampingFactor, int maxIterations)
    public PageRank(double dampingFactor, double convergenceThreshold)
}

Usage Example:

// Run PageRank with damping factor 0.85 for 10 iterations
PageRank<Long, String, Double> pageRank = new PageRank<>(0.85, 10);
DataSet<Vertex<Long, Double>> ranks = graph.run(pageRank);

// Each vertex will contain its PageRank score
ranks.print();

// Run until convergence (threshold 0.0001)
PageRank<Long, String, Double> convergentPR = new PageRank<>(0.85, 0.0001);
DataSet<Vertex<Long, Double>> convergedRanks = graph.run(convergentPR);

HITS Algorithm

Compute Hub and Authority scores using the Hyperlink-Induced Topic Search algorithm.

public class HITS<K, VV, EV> implements GraphAlgorithm<K, VV, EV, DataSet<Vertex<K, Tuple2<Double, Double>>>> {
    public HITS(int maxIterations)
    public HITS(double convergenceThreshold)
}

Usage Example:

// Run HITS for 10 iterations
HITS<Long, String, Double> hits = new HITS<>(10);
DataSet<Vertex<Long, Tuple2<Double, Double>>> scores = graph.run(hits);

// Each vertex contains (hub_score, authority_score)
scores.print();

Community Detection

Detect communities using label propagation algorithm.

public class CommunityDetection<K, VV, EV> implements GraphAlgorithm<K, VV, EV, DataSet<Vertex<K, K>>> {
    public CommunityDetection(double deltaThreshold, int maxIterations)
}

Usage Example:

// Detect communities with delta threshold 0.5 and max 20 iterations
CommunityDetection<Long, String, Double> cd = new CommunityDetection<>(0.5, 20);
DataSet<Vertex<Long, Long>> communities = graph.run(cd);

// Each vertex will contain its community ID
communities.print();

Triangle Enumeration

Enumerate all triangles in the graph.

public class TriangleEnumerator<K, VV, EV> implements GraphAlgorithm<K, VV, EV, DataSet<Tuple3<K, K, K>>> {
    public TriangleEnumerator()
}

Usage Example:

TriangleEnumerator<Long, String, Double> triangles = new TriangleEnumerator<>();
DataSet<Tuple3<Long, Long, Long>> allTriangles = graph.run(triangles);

// Each tuple contains three vertex IDs forming a triangle
allTriangles.print();

Link Analysis Algorithms

Specialized algorithms for link analysis and web graph processing.

PageRank Variants

// Standard PageRank
public class PageRank<K, VV, EV> implements GraphAlgorithm<K, VV, EV, DataSet<Vertex<K, Double>>>

// GSA-based PageRank implementation
public class GSAPageRank<K, VV, EV> implements GraphAlgorithm<K, VV, EV, DataSet<Vertex<K, Double>>>

Link Analysis Utilities

Helper functions and utilities for link analysis algorithms.

// Link analysis helper functions in org.apache.flink.graph.library.link_analysis.Functions
public class Functions {
    // Utility functions for PageRank and HITS algorithms
}

Clustering Algorithms

Algorithms for computing clustering coefficients and community structure.

Directed Graph Clustering

Clustering algorithms specifically designed for directed graphs.

// Directed clustering algorithms in org.apache.flink.graph.library.clustering.directed
public class LocalClusteringCoefficient<K, VV, EV> implements GraphAlgorithm<K, VV, EV, DataSet<Vertex<K, Double>>>
public class GlobalClusteringCoefficient<K, VV, EV> implements GraphAlgorithm<K, VV, EV, Double>
public class TriadicCensus<K, VV, EV> implements GraphAlgorithm<K, VV, EV, DataSet<Tuple4<Integer, Integer, Integer, LongValue>>>

Undirected Graph Clustering

Clustering algorithms for undirected graphs.

// Undirected clustering algorithms in org.apache.flink.graph.library.clustering.undirected
public class LocalClusteringCoefficient<K, VV, EV> implements GraphAlgorithm<K, VV, EV, DataSet<Vertex<K, Double>>>
public class GlobalClusteringCoefficient<K, VV, EV> implements GraphAlgorithm<K, VV, EV, Double>
public class TriangleListing<K, VV, EV> implements GraphAlgorithm<K, VV, EV, DataSet<Tuple3<K, K, K>>>

Usage Example:

// Compute local clustering coefficient for each vertex
LocalClusteringCoefficient<Long, String, Double> lcc = 
    new org.apache.flink.graph.library.clustering.undirected.LocalClusteringCoefficient<>();
DataSet<Vertex<Long, Double>> coefficients = graph.run(lcc);

// Compute global clustering coefficient for the entire graph
GlobalClusteringCoefficient<Long, String, Double> gcc = 
    new org.apache.flink.graph.library.clustering.undirected.GlobalClusteringCoefficient<>();
Double globalCoefficient = graph.run(gcc);

Metric Algorithms

Algorithms for computing various graph metrics and statistics.

Directed Graph Metrics

// Directed graph metrics in org.apache.flink.graph.library.metric.directed
public class EdgeMetrics<K, VV, EV> implements GraphAnalytic<K, VV, EV, EdgeMetrics.Result>
public class VertexMetrics<K, VV, EV> implements GraphAnalytic<K, VV, EV, VertexMetrics.Result>

Undirected Graph Metrics

// Undirected graph metrics in org.apache.flink.graph.library.metric.undirected
public class EdgeMetrics<K, VV, EV> implements GraphAnalytic<K, VV, EV, EdgeMetrics.Result>
public class VertexMetrics<K, VV, EV> implements GraphAnalytic<K, VV, EV, VertexMetrics.Result>

Usage Example:

// Compute vertex metrics for undirected graph
VertexMetrics<Long, String, Double> vertexMetrics = 
    new org.apache.flink.graph.library.metric.undirected.VertexMetrics<>();
VertexMetrics.Result result = graph.run(vertexMetrics);

// Access various metrics
System.out.println("Number of vertices: " + result.getNumberOfVertices());
System.out.println("Number of edges: " + result.getNumberOfEdges());
System.out.println("Average degree: " + result.getAverageDegree());
System.out.println("Density: " + result.getDensity());

Similarity Algorithms

Algorithms for computing graph and vertex similarity measures.

// Similarity algorithms in org.apache.flink.graph.library.similarity
// Various similarity measures and algorithms

Custom Algorithm Development

Creating Custom Algorithms

Implement the GraphAlgorithm interface to create custom algorithms:

public class CustomAlgorithm<K, VV, EV> implements GraphAlgorithm<K, VV, EV, DataSet<MyResult>> {
    
    private final int parameter;
    
    public CustomAlgorithm(int parameter) {
        this.parameter = parameter;
    }
    
    @Override
    public DataSet<MyResult> run(Graph<K, VV, EV> input) throws Exception {
        // Algorithm implementation
        return input.getVertices()
            .map(new MyMapFunction(parameter))
            .returns(MyResult.class);
    }
}

// Usage
CustomAlgorithm<Long, String, Double> custom = new CustomAlgorithm<>(42);
DataSet<MyResult> result = graph.run(custom);

Algorithm Configuration

Many algorithms support configuration options:

// Algorithms with convergence thresholds
PageRank<Long, String, Double> pr = new PageRank<>(0.85, 0.0001); // threshold-based

// Algorithms with iteration limits
ConnectedComponents<Long, String, Double> cc = new ConnectedComponents<>(100); // max iterations

// Algorithms with multiple parameters
CommunityDetection<Long, String, Double> cd = new CommunityDetection<>(0.5, 50); // threshold + iterations

Performance Considerations

  • Graph Preprocessing: Some algorithms benefit from graph preprocessing (e.g., removing self-loops)
  • Data Types: Use appropriate data types for vertex/edge values (primitive wrappers vs objects)
  • Memory Management: Configure Flink memory settings for large graphs
  • Parallelism: Set appropriate parallelism levels for algorithm execution
  • Checkpointing: Enable checkpointing for fault tolerance in long-running algorithms

Install with Tessl CLI

npx tessl i tessl/maven-org-apache-flink--flink-gelly-2-10

docs

algorithms.md

analytics.md

data-access.md

generators.md

graph-creation.md

index.md

iterative-processing.md

tile.json