0
# Graph Algorithms
1
2
Gelly provides a comprehensive library of pre-implemented graph algorithms optimized for distributed execution on Flink clusters. These algorithms cover common graph analysis tasks including path finding, centrality analysis, clustering, and community detection.
3
4
## Capabilities
5
6
### Algorithm Execution Interface
7
8
All algorithms implement the `GraphAlgorithm` interface and are executed using the graph's `run` method.
9
10
```java { .api }
11
public interface GraphAlgorithm<K, VV, EV, T> {
12
T run(Graph<K, VV, EV> input) throws Exception;
13
}
14
15
// Execute algorithm on graph
16
public <T> T run(GraphAlgorithm<K, VV, EV, T> algorithm) throws Exception
17
```
18
19
### Connected Components
20
21
Find connected components in undirected graphs using iterative label propagation.
22
23
```java { .api }
24
public class ConnectedComponents<K, VV, EV> implements GraphAlgorithm<K, VV, EV, DataSet<Vertex<K, K>>> {
25
public ConnectedComponents(int maxIterations)
26
}
27
```
28
29
**Usage Example:**
30
31
```java
32
// Find connected components (max 10 iterations)
33
ConnectedComponents<Long, String, Double> cc = new ConnectedComponents<>(10);
34
DataSet<Vertex<Long, Long>> components = graph.run(cc);
35
36
// Each vertex will have the minimum vertex ID in its component as its value
37
components.print();
38
```
39
40
### Single Source Shortest Paths
41
42
Compute shortest paths from a source vertex to all reachable vertices using the Bellman-Ford algorithm.
43
44
```java { .api }
45
public class SingleSourceShortestPaths<K, VV> implements GraphAlgorithm<K, VV, Double, DataSet<Vertex<K, Double>>> {
46
public SingleSourceShortestPaths(K srcVertexId, Integer maxIterations)
47
}
48
```
49
50
**Usage Example:**
51
52
```java
53
// Find shortest paths from vertex 1 (max 10 iterations)
54
SingleSourceShortestPaths<Long, String> sssp = new SingleSourceShortestPaths<>(1L, 10);
55
DataSet<Vertex<Long, Double>> distances = graph.run(sssp);
56
57
// Each vertex will contain its shortest distance from source vertex 1
58
distances.print();
59
```
60
61
### GSA Single Source Shortest Paths
62
63
Alternative SSSP implementation using the Gather-Sum-Apply iteration model.
64
65
```java { .api }
66
public class GSASingleSourceShortestPaths<K, VV> implements GraphAlgorithm<K, VV, Double, DataSet<Vertex<K, Double>>> {
67
public GSASingleSourceShortestPaths(K srcVertexId, Integer maxIterations)
68
}
69
```
70
71
### PageRank
72
73
Compute PageRank centrality scores using the iterative power method.
74
75
```java { .api }
76
public class PageRank<K, VV, EV> implements GraphAlgorithm<K, VV, EV, DataSet<Vertex<K, Double>>> {
77
public PageRank(double dampingFactor, int maxIterations)
78
public PageRank(double dampingFactor, double convergenceThreshold)
79
}
80
```
81
82
**Usage Example:**
83
84
```java
85
// Run PageRank with damping factor 0.85 for 10 iterations
86
PageRank<Long, String, Double> pageRank = new PageRank<>(0.85, 10);
87
DataSet<Vertex<Long, Double>> ranks = graph.run(pageRank);
88
89
// Each vertex will contain its PageRank score
90
ranks.print();
91
92
// Run until convergence (threshold 0.0001)
93
PageRank<Long, String, Double> convergentPR = new PageRank<>(0.85, 0.0001);
94
DataSet<Vertex<Long, Double>> convergedRanks = graph.run(convergentPR);
95
```
96
97
### HITS Algorithm
98
99
Compute Hub and Authority scores using the Hyperlink-Induced Topic Search algorithm.
100
101
```java { .api }
102
public class HITS<K, VV, EV> implements GraphAlgorithm<K, VV, EV, DataSet<Vertex<K, Tuple2<Double, Double>>>> {
103
public HITS(int maxIterations)
104
public HITS(double convergenceThreshold)
105
}
106
```
107
108
**Usage Example:**
109
110
```java
111
// Run HITS for 10 iterations
112
HITS<Long, String, Double> hits = new HITS<>(10);
113
DataSet<Vertex<Long, Tuple2<Double, Double>>> scores = graph.run(hits);
114
115
// Each vertex contains (hub_score, authority_score)
116
scores.print();
117
```
118
119
### Community Detection
120
121
Detect communities using label propagation algorithm.
122
123
```java { .api }
124
public class CommunityDetection<K, VV, EV> implements GraphAlgorithm<K, VV, EV, DataSet<Vertex<K, K>>> {
125
public CommunityDetection(double deltaThreshold, int maxIterations)
126
}
127
```
128
129
**Usage Example:**
130
131
```java
132
// Detect communities with delta threshold 0.5 and max 20 iterations
133
CommunityDetection<Long, String, Double> cd = new CommunityDetection<>(0.5, 20);
134
DataSet<Vertex<Long, Long>> communities = graph.run(cd);
135
136
// Each vertex will contain its community ID
137
communities.print();
138
```
139
140
### Triangle Enumeration
141
142
Enumerate all triangles in the graph.
143
144
```java { .api }
145
public class TriangleEnumerator<K, VV, EV> implements GraphAlgorithm<K, VV, EV, DataSet<Tuple3<K, K, K>>> {
146
public TriangleEnumerator()
147
}
148
```
149
150
**Usage Example:**
151
152
```java
153
TriangleEnumerator<Long, String, Double> triangles = new TriangleEnumerator<>();
154
DataSet<Tuple3<Long, Long, Long>> allTriangles = graph.run(triangles);
155
156
// Each tuple contains three vertex IDs forming a triangle
157
allTriangles.print();
158
```
159
160
## Link Analysis Algorithms
161
162
Specialized algorithms for link analysis and web graph processing.
163
164
### PageRank Variants
165
166
```java { .api }
167
// Standard PageRank
168
public class PageRank<K, VV, EV> implements GraphAlgorithm<K, VV, EV, DataSet<Vertex<K, Double>>>
169
170
// GSA-based PageRank implementation
171
public class GSAPageRank<K, VV, EV> implements GraphAlgorithm<K, VV, EV, DataSet<Vertex<K, Double>>>
172
```
173
174
### Link Analysis Utilities
175
176
Helper functions and utilities for link analysis algorithms.
177
178
```java { .api }
179
// Link analysis helper functions in org.apache.flink.graph.library.link_analysis.Functions
180
public class Functions {
181
// Utility functions for PageRank and HITS algorithms
182
}
183
```
184
185
## Clustering Algorithms
186
187
Algorithms for computing clustering coefficients and community structure.
188
189
### Directed Graph Clustering
190
191
Clustering algorithms specifically designed for directed graphs.
192
193
```java { .api }
194
// Directed clustering algorithms in org.apache.flink.graph.library.clustering.directed
195
public class LocalClusteringCoefficient<K, VV, EV> implements GraphAlgorithm<K, VV, EV, DataSet<Vertex<K, Double>>>
196
public class GlobalClusteringCoefficient<K, VV, EV> implements GraphAlgorithm<K, VV, EV, Double>
197
public class TriadicCensus<K, VV, EV> implements GraphAlgorithm<K, VV, EV, DataSet<Tuple4<Integer, Integer, Integer, LongValue>>>
198
```
199
200
### Undirected Graph Clustering
201
202
Clustering algorithms for undirected graphs.
203
204
```java { .api }
205
// Undirected clustering algorithms in org.apache.flink.graph.library.clustering.undirected
206
public class LocalClusteringCoefficient<K, VV, EV> implements GraphAlgorithm<K, VV, EV, DataSet<Vertex<K, Double>>>
207
public class GlobalClusteringCoefficient<K, VV, EV> implements GraphAlgorithm<K, VV, EV, Double>
208
public class TriangleListing<K, VV, EV> implements GraphAlgorithm<K, VV, EV, DataSet<Tuple3<K, K, K>>>
209
```
210
211
**Usage Example:**
212
213
```java
214
// Compute local clustering coefficient for each vertex
215
LocalClusteringCoefficient<Long, String, Double> lcc =
216
new org.apache.flink.graph.library.clustering.undirected.LocalClusteringCoefficient<>();
217
DataSet<Vertex<Long, Double>> coefficients = graph.run(lcc);
218
219
// Compute global clustering coefficient for the entire graph
220
GlobalClusteringCoefficient<Long, String, Double> gcc =
221
new org.apache.flink.graph.library.clustering.undirected.GlobalClusteringCoefficient<>();
222
Double globalCoefficient = graph.run(gcc);
223
```
224
225
## Metric Algorithms
226
227
Algorithms for computing various graph metrics and statistics.
228
229
### Directed Graph Metrics
230
231
```java { .api }
232
// Directed graph metrics in org.apache.flink.graph.library.metric.directed
233
public class EdgeMetrics<K, VV, EV> implements GraphAnalytic<K, VV, EV, EdgeMetrics.Result>
234
public class VertexMetrics<K, VV, EV> implements GraphAnalytic<K, VV, EV, VertexMetrics.Result>
235
```
236
237
### Undirected Graph Metrics
238
239
```java { .api }
240
// Undirected graph metrics in org.apache.flink.graph.library.metric.undirected
241
public class EdgeMetrics<K, VV, EV> implements GraphAnalytic<K, VV, EV, EdgeMetrics.Result>
242
public class VertexMetrics<K, VV, EV> implements GraphAnalytic<K, VV, EV, VertexMetrics.Result>
243
```
244
245
**Usage Example:**
246
247
```java
248
// Compute vertex metrics for undirected graph
249
VertexMetrics<Long, String, Double> vertexMetrics =
250
new org.apache.flink.graph.library.metric.undirected.VertexMetrics<>();
251
VertexMetrics.Result result = graph.run(vertexMetrics);
252
253
// Access various metrics
254
System.out.println("Number of vertices: " + result.getNumberOfVertices());
255
System.out.println("Number of edges: " + result.getNumberOfEdges());
256
System.out.println("Average degree: " + result.getAverageDegree());
257
System.out.println("Density: " + result.getDensity());
258
```
259
260
## Similarity Algorithms
261
262
Algorithms for computing graph and vertex similarity measures.
263
264
```java { .api }
265
// Similarity algorithms in org.apache.flink.graph.library.similarity
266
// Various similarity measures and algorithms
267
```
268
269
## Custom Algorithm Development
270
271
### Creating Custom Algorithms
272
273
Implement the `GraphAlgorithm` interface to create custom algorithms:
274
275
```java
276
public class CustomAlgorithm<K, VV, EV> implements GraphAlgorithm<K, VV, EV, DataSet<MyResult>> {
277
278
private final int parameter;
279
280
public CustomAlgorithm(int parameter) {
281
this.parameter = parameter;
282
}
283
284
@Override
285
public DataSet<MyResult> run(Graph<K, VV, EV> input) throws Exception {
286
// Algorithm implementation
287
return input.getVertices()
288
.map(new MyMapFunction(parameter))
289
.returns(MyResult.class);
290
}
291
}
292
293
// Usage
294
CustomAlgorithm<Long, String, Double> custom = new CustomAlgorithm<>(42);
295
DataSet<MyResult> result = graph.run(custom);
296
```
297
298
### Algorithm Configuration
299
300
Many algorithms support configuration options:
301
302
```java
303
// Algorithms with convergence thresholds
304
PageRank<Long, String, Double> pr = new PageRank<>(0.85, 0.0001); // threshold-based
305
306
// Algorithms with iteration limits
307
ConnectedComponents<Long, String, Double> cc = new ConnectedComponents<>(100); // max iterations
308
309
// Algorithms with multiple parameters
310
CommunityDetection<Long, String, Double> cd = new CommunityDetection<>(0.5, 50); // threshold + iterations
311
```
312
313
### Performance Considerations
314
315
- **Graph Preprocessing**: Some algorithms benefit from graph preprocessing (e.g., removing self-loops)
316
- **Data Types**: Use appropriate data types for vertex/edge values (primitive wrappers vs objects)
317
- **Memory Management**: Configure Flink memory settings for large graphs
318
- **Parallelism**: Set appropriate parallelism levels for algorithm execution
319
- **Checkpointing**: Enable checkpointing for fault tolerance in long-running algorithms