or run

npx @tessl/cli init
Log in

Version

Tile

Overview

Evals

Files

Files

docs

algorithms.mdanalytics.mddata-access.mdgenerators.mdgraph-creation.mdindex.mditerative-processing.md

algorithms.mddocs/

0

# Graph Algorithms

1

2

Gelly provides a comprehensive library of pre-implemented graph algorithms optimized for distributed execution on Flink clusters. These algorithms cover common graph analysis tasks including path finding, centrality analysis, clustering, and community detection.

3

4

## Capabilities

5

6

### Algorithm Execution Interface

7

8

All algorithms implement the `GraphAlgorithm` interface and are executed using the graph's `run` method.

9

10

```java { .api }

11

public interface GraphAlgorithm<K, VV, EV, T> {

12

T run(Graph<K, VV, EV> input) throws Exception;

13

}

14

15

// Execute algorithm on graph

16

public <T> T run(GraphAlgorithm<K, VV, EV, T> algorithm) throws Exception

17

```

18

19

### Connected Components

20

21

Find connected components in undirected graphs using iterative label propagation.

22

23

```java { .api }

24

public class ConnectedComponents<K, VV, EV> implements GraphAlgorithm<K, VV, EV, DataSet<Vertex<K, K>>> {

25

public ConnectedComponents(int maxIterations)

26

}

27

```

28

29

**Usage Example:**

30

31

```java

32

// Find connected components (max 10 iterations)

33

ConnectedComponents<Long, String, Double> cc = new ConnectedComponents<>(10);

34

DataSet<Vertex<Long, Long>> components = graph.run(cc);

35

36

// Each vertex will have the minimum vertex ID in its component as its value

37

components.print();

38

```

39

40

### Single Source Shortest Paths

41

42

Compute shortest paths from a source vertex to all reachable vertices using the Bellman-Ford algorithm.

43

44

```java { .api }

45

public class SingleSourceShortestPaths<K, VV> implements GraphAlgorithm<K, VV, Double, DataSet<Vertex<K, Double>>> {

46

public SingleSourceShortestPaths(K srcVertexId, Integer maxIterations)

47

}

48

```

49

50

**Usage Example:**

51

52

```java

53

// Find shortest paths from vertex 1 (max 10 iterations)

54

SingleSourceShortestPaths<Long, String> sssp = new SingleSourceShortestPaths<>(1L, 10);

55

DataSet<Vertex<Long, Double>> distances = graph.run(sssp);

56

57

// Each vertex will contain its shortest distance from source vertex 1

58

distances.print();

59

```

60

61

### GSA Single Source Shortest Paths

62

63

Alternative SSSP implementation using the Gather-Sum-Apply iteration model.

64

65

```java { .api }

66

public class GSASingleSourceShortestPaths<K, VV> implements GraphAlgorithm<K, VV, Double, DataSet<Vertex<K, Double>>> {

67

public GSASingleSourceShortestPaths(K srcVertexId, Integer maxIterations)

68

}

69

```

70

71

### PageRank

72

73

Compute PageRank centrality scores using the iterative power method.

74

75

```java { .api }

76

public class PageRank<K, VV, EV> implements GraphAlgorithm<K, VV, EV, DataSet<Vertex<K, Double>>> {

77

public PageRank(double dampingFactor, int maxIterations)

78

public PageRank(double dampingFactor, double convergenceThreshold)

79

}

80

```

81

82

**Usage Example:**

83

84

```java

85

// Run PageRank with damping factor 0.85 for 10 iterations

86

PageRank<Long, String, Double> pageRank = new PageRank<>(0.85, 10);

87

DataSet<Vertex<Long, Double>> ranks = graph.run(pageRank);

88

89

// Each vertex will contain its PageRank score

90

ranks.print();

91

92

// Run until convergence (threshold 0.0001)

93

PageRank<Long, String, Double> convergentPR = new PageRank<>(0.85, 0.0001);

94

DataSet<Vertex<Long, Double>> convergedRanks = graph.run(convergentPR);

95

```

96

97

### HITS Algorithm

98

99

Compute Hub and Authority scores using the Hyperlink-Induced Topic Search algorithm.

100

101

```java { .api }

102

public class HITS<K, VV, EV> implements GraphAlgorithm<K, VV, EV, DataSet<Vertex<K, Tuple2<Double, Double>>>> {

103

public HITS(int maxIterations)

104

public HITS(double convergenceThreshold)

105

}

106

```

107

108

**Usage Example:**

109

110

```java

111

// Run HITS for 10 iterations

112

HITS<Long, String, Double> hits = new HITS<>(10);

113

DataSet<Vertex<Long, Tuple2<Double, Double>>> scores = graph.run(hits);

114

115

// Each vertex contains (hub_score, authority_score)

116

scores.print();

117

```

118

119

### Community Detection

120

121

Detect communities using label propagation algorithm.

122

123

```java { .api }

124

public class CommunityDetection<K, VV, EV> implements GraphAlgorithm<K, VV, EV, DataSet<Vertex<K, K>>> {

125

public CommunityDetection(double deltaThreshold, int maxIterations)

126

}

127

```

128

129

**Usage Example:**

130

131

```java

132

// Detect communities with delta threshold 0.5 and max 20 iterations

133

CommunityDetection<Long, String, Double> cd = new CommunityDetection<>(0.5, 20);

134

DataSet<Vertex<Long, Long>> communities = graph.run(cd);

135

136

// Each vertex will contain its community ID

137

communities.print();

138

```

139

140

### Triangle Enumeration

141

142

Enumerate all triangles in the graph.

143

144

```java { .api }

145

public class TriangleEnumerator<K, VV, EV> implements GraphAlgorithm<K, VV, EV, DataSet<Tuple3<K, K, K>>> {

146

public TriangleEnumerator()

147

}

148

```

149

150

**Usage Example:**

151

152

```java

153

TriangleEnumerator<Long, String, Double> triangles = new TriangleEnumerator<>();

154

DataSet<Tuple3<Long, Long, Long>> allTriangles = graph.run(triangles);

155

156

// Each tuple contains three vertex IDs forming a triangle

157

allTriangles.print();

158

```

159

160

## Link Analysis Algorithms

161

162

Specialized algorithms for link analysis and web graph processing.

163

164

### PageRank Variants

165

166

```java { .api }

167

// Standard PageRank

168

public class PageRank<K, VV, EV> implements GraphAlgorithm<K, VV, EV, DataSet<Vertex<K, Double>>>

169

170

// GSA-based PageRank implementation

171

public class GSAPageRank<K, VV, EV> implements GraphAlgorithm<K, VV, EV, DataSet<Vertex<K, Double>>>

172

```

173

174

### Link Analysis Utilities

175

176

Helper functions and utilities for link analysis algorithms.

177

178

```java { .api }

179

// Link analysis helper functions in org.apache.flink.graph.library.link_analysis.Functions

180

public class Functions {

181

// Utility functions for PageRank and HITS algorithms

182

}

183

```

184

185

## Clustering Algorithms

186

187

Algorithms for computing clustering coefficients and community structure.

188

189

### Directed Graph Clustering

190

191

Clustering algorithms specifically designed for directed graphs.

192

193

```java { .api }

194

// Directed clustering algorithms in org.apache.flink.graph.library.clustering.directed

195

public class LocalClusteringCoefficient<K, VV, EV> implements GraphAlgorithm<K, VV, EV, DataSet<Vertex<K, Double>>>

196

public class GlobalClusteringCoefficient<K, VV, EV> implements GraphAlgorithm<K, VV, EV, Double>

197

public class TriadicCensus<K, VV, EV> implements GraphAlgorithm<K, VV, EV, DataSet<Tuple4<Integer, Integer, Integer, LongValue>>>

198

```

199

200

### Undirected Graph Clustering

201

202

Clustering algorithms for undirected graphs.

203

204

```java { .api }

205

// Undirected clustering algorithms in org.apache.flink.graph.library.clustering.undirected

206

public class LocalClusteringCoefficient<K, VV, EV> implements GraphAlgorithm<K, VV, EV, DataSet<Vertex<K, Double>>>

207

public class GlobalClusteringCoefficient<K, VV, EV> implements GraphAlgorithm<K, VV, EV, Double>

208

public class TriangleListing<K, VV, EV> implements GraphAlgorithm<K, VV, EV, DataSet<Tuple3<K, K, K>>>

209

```

210

211

**Usage Example:**

212

213

```java

214

// Compute local clustering coefficient for each vertex

215

LocalClusteringCoefficient<Long, String, Double> lcc =

216

new org.apache.flink.graph.library.clustering.undirected.LocalClusteringCoefficient<>();

217

DataSet<Vertex<Long, Double>> coefficients = graph.run(lcc);

218

219

// Compute global clustering coefficient for the entire graph

220

GlobalClusteringCoefficient<Long, String, Double> gcc =

221

new org.apache.flink.graph.library.clustering.undirected.GlobalClusteringCoefficient<>();

222

Double globalCoefficient = graph.run(gcc);

223

```

224

225

## Metric Algorithms

226

227

Algorithms for computing various graph metrics and statistics.

228

229

### Directed Graph Metrics

230

231

```java { .api }

232

// Directed graph metrics in org.apache.flink.graph.library.metric.directed

233

public class EdgeMetrics<K, VV, EV> implements GraphAnalytic<K, VV, EV, EdgeMetrics.Result>

234

public class VertexMetrics<K, VV, EV> implements GraphAnalytic<K, VV, EV, VertexMetrics.Result>

235

```

236

237

### Undirected Graph Metrics

238

239

```java { .api }

240

// Undirected graph metrics in org.apache.flink.graph.library.metric.undirected

241

public class EdgeMetrics<K, VV, EV> implements GraphAnalytic<K, VV, EV, EdgeMetrics.Result>

242

public class VertexMetrics<K, VV, EV> implements GraphAnalytic<K, VV, EV, VertexMetrics.Result>

243

```

244

245

**Usage Example:**

246

247

```java

248

// Compute vertex metrics for undirected graph

249

VertexMetrics<Long, String, Double> vertexMetrics =

250

new org.apache.flink.graph.library.metric.undirected.VertexMetrics<>();

251

VertexMetrics.Result result = graph.run(vertexMetrics);

252

253

// Access various metrics

254

System.out.println("Number of vertices: " + result.getNumberOfVertices());

255

System.out.println("Number of edges: " + result.getNumberOfEdges());

256

System.out.println("Average degree: " + result.getAverageDegree());

257

System.out.println("Density: " + result.getDensity());

258

```

259

260

## Similarity Algorithms

261

262

Algorithms for computing graph and vertex similarity measures.

263

264

```java { .api }

265

// Similarity algorithms in org.apache.flink.graph.library.similarity

266

// Various similarity measures and algorithms

267

```

268

269

## Custom Algorithm Development

270

271

### Creating Custom Algorithms

272

273

Implement the `GraphAlgorithm` interface to create custom algorithms:

274

275

```java

276

public class CustomAlgorithm<K, VV, EV> implements GraphAlgorithm<K, VV, EV, DataSet<MyResult>> {

277

278

private final int parameter;

279

280

public CustomAlgorithm(int parameter) {

281

this.parameter = parameter;

282

}

283

284

@Override

285

public DataSet<MyResult> run(Graph<K, VV, EV> input) throws Exception {

286

// Algorithm implementation

287

return input.getVertices()

288

.map(new MyMapFunction(parameter))

289

.returns(MyResult.class);

290

}

291

}

292

293

// Usage

294

CustomAlgorithm<Long, String, Double> custom = new CustomAlgorithm<>(42);

295

DataSet<MyResult> result = graph.run(custom);

296

```

297

298

### Algorithm Configuration

299

300

Many algorithms support configuration options:

301

302

```java

303

// Algorithms with convergence thresholds

304

PageRank<Long, String, Double> pr = new PageRank<>(0.85, 0.0001); // threshold-based

305

306

// Algorithms with iteration limits

307

ConnectedComponents<Long, String, Double> cc = new ConnectedComponents<>(100); // max iterations

308

309

// Algorithms with multiple parameters

310

CommunityDetection<Long, String, Double> cd = new CommunityDetection<>(0.5, 50); // threshold + iterations

311

```

312

313

### Performance Considerations

314

315

- **Graph Preprocessing**: Some algorithms benefit from graph preprocessing (e.g., removing self-loops)

316

- **Data Types**: Use appropriate data types for vertex/edge values (primitive wrappers vs objects)

317

- **Memory Management**: Configure Flink memory settings for large graphs

318

- **Parallelism**: Set appropriate parallelism levels for algorithm execution

319

- **Checkpointing**: Enable checkpointing for fault tolerance in long-running algorithms