or run

npx @tessl/cli init
Log in

Version

Tile

Overview

Evals

Files

tessl/maven-org-apache-flink--flink-gelly-2-10

Gelly: Flink Graph API - A comprehensive graph processing library for Apache Flink

Workspace
tessl
Visibility
Public
Created
Last updated
Describes
mavenpkg:maven/org.apache.flink/flink-gelly_2.10@1.3.x

To install, run

npx @tessl/cli install tessl/maven-org-apache-flink--flink-gelly-2-10@1.3.0

0

# Apache Flink Gelly

1

2

Gelly is Apache Flink's Graph API library that provides a comprehensive set of methods and utilities for developing graph analysis applications on the Flink platform. It enables developers to create, transform, and modify graphs using high-level functions similar to batch processing APIs, supporting both vertex-centric and edge-centric operations.

3

4

## Package Information

5

6

- **Package Name**: flink-gelly_2.10

7

- **Package Type**: maven

8

- **Language**: Java (Scala 2.10)

9

- **Group ID**: org.apache.flink

10

- **Artifact ID**: flink-gelly_2.10

11

- **Version**: 1.3.3

12

- **Installation**: Add to your Maven dependencies:

13

14

```xml

15

<dependency>

16

<groupId>org.apache.flink</groupId>

17

<artifactId>flink-gelly_2.10</artifactId>

18

<version>1.3.3</version>

19

</dependency>

20

```

21

22

## Core Imports

23

24

```java

25

import org.apache.flink.graph.Graph;

26

import org.apache.flink.graph.Vertex;

27

import org.apache.flink.graph.Edge;

28

import org.apache.flink.api.java.ExecutionEnvironment;

29

```

30

31

For specific functionality:

32

33

```java

34

// Iterative processing models

35

import org.apache.flink.graph.pregel.ComputeFunction;

36

import org.apache.flink.graph.gsa.GatherFunction;

37

import org.apache.flink.graph.spargel.ScatterFunction;

38

39

// Algorithms

40

import org.apache.flink.graph.library.ConnectedComponents;

41

import org.apache.flink.graph.library.SingleSourceShortestPaths;

42

import org.apache.flink.graph.library.link_analysis.PageRank;

43

44

// Analytics

45

import org.apache.flink.graph.asm.degree.annotate.directed.VertexDegrees;

46

```

47

48

## Basic Usage

49

50

```java

51

import org.apache.flink.api.java.ExecutionEnvironment;

52

import org.apache.flink.graph.Graph;

53

import org.apache.flink.graph.Vertex;

54

import org.apache.flink.graph.Edge;

55

import org.apache.flink.types.NullValue;

56

import java.util.Arrays;

57

58

// Set up execution environment

59

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

60

61

// Create vertices

62

List<Vertex<Long, Double>> vertices = Arrays.asList(

63

new Vertex<>(1L, 1.0),

64

new Vertex<>(2L, 2.0),

65

new Vertex<>(3L, 3.0)

66

);

67

68

// Create edges

69

List<Edge<Long, NullValue>> edges = Arrays.asList(

70

new Edge<>(1L, 2L, NullValue.getInstance()),

71

new Edge<>(2L, 3L, NullValue.getInstance()),

72

new Edge<>(3L, 1L, NullValue.getInstance())

73

);

74

75

// Create graph

76

Graph<Long, Double, NullValue> graph = Graph.fromCollection(vertices, edges, env);

77

78

// Basic operations

79

DataSet<Vertex<Long, Double>> filteredVertices = graph

80

.filterOnVertices(vertex -> vertex.getValue() > 1.5)

81

.getVertices();

82

83

// Execute

84

env.execute("Basic Graph Example");

85

```

86

87

## Architecture

88

89

Gelly follows a layered architecture built on Flink's DataSet API:

90

91

- **Graph Representation**: Core `Graph<K, VV, EV>` class with generic types for keys (K), vertex values (VV), and edge values (EV)

92

- **Data Types**: `Vertex<K, V>` and `Edge<K, V>` classes extending Flink tuples for efficient serialization

93

- **Iterative Processing**: Three computation models (Pregel, Scatter-Gather, GSA) for distributed graph algorithms

94

- **Algorithm Library**: Pre-implemented graph algorithms (shortest paths, PageRank, clustering, etc.)

95

- **Analytics Framework**: ASM (Analytics, Statistics, Metrics) components for graph analysis

96

- **Type System**: Rich translation capabilities supporting custom vertex/edge types and ID transformations

97

98

This design enables scalable graph processing on distributed Flink clusters while maintaining type safety and integration with Flink's broader ecosystem.

99

100

## Capabilities

101

102

### Graph Creation and Transformation

103

104

Core functionality for creating graphs from various data sources (collections, DataSets, CSV files) and transforming graph structure and data. Includes vertex and edge filtering, subgraph extraction, and graph set operations.

105

106

```java { .api }

107

// Static factory methods

108

public static <K, VV, EV> Graph<K, VV, EV> fromCollection(

109

Collection<Vertex<K, VV>> vertices,

110

Collection<Edge<K, EV>> edges,

111

ExecutionEnvironment context)

112

113

public static <K, VV, EV> Graph<K, VV, EV> fromDataSet(

114

DataSet<Vertex<K, VV>> vertices,

115

DataSet<Edge<K, EV>> edges,

116

ExecutionEnvironment context)

117

118

public static <K, EV> Graph<K, NullValue, EV> fromDataSet(

119

DataSet<Edge<K, EV>> edges,

120

ExecutionEnvironment context)

121

122

// Transformation methods

123

public <NV> Graph<K, NV, EV> mapVertices(MapFunction<Vertex<K, VV>, NV> mapper)

124

public <NE> Graph<K, VV, NE> mapEdges(MapFunction<Edge<K, EV>, NE> mapper)

125

public Graph<K, VV, EV> filterOnVertices(FilterFunction<Vertex<K, VV>> vertexFilter)

126

public Graph<K, VV, EV> filterOnEdges(FilterFunction<Edge<K, EV>> edgeFilter)

127

public Graph<K, VV, EV> subgraph(

128

FilterFunction<Vertex<K, VV>> vertexFilter,

129

FilterFunction<Edge<K, EV>> edgeFilter)

130

```

131

132

[Graph Creation and Transformation](./graph-creation.md)

133

134

### Iterative Processing Models

135

136

Three distributed iterative computation models for implementing graph algorithms: Vertex-centric (Pregel), Scatter-Gather, and Gather-Sum-Apply. Each model provides different abstractions for message-passing graph computations.

137

138

```java { .api }

139

// Vertex-centric (Pregel) iteration

140

public <M> Graph<K, VV, EV> runVertexCentricIteration(

141

ComputeFunction<K, VV, EV, M> computeFunction,

142

MessageCombiner<K, M> combiner,

143

int maxIterations)

144

145

// Scatter-Gather iteration

146

public <M> Graph<K, VV, EV> runScatterGatherIteration(

147

ScatterFunction<K, VV, M, EV> scatterFunction,

148

GatherFunction<K, VV, M> gatherFunction,

149

int maxIterations)

150

151

// Gather-Sum-Apply iteration

152

public <M> Graph<K, VV, EV> runGatherSumApplyIteration(

153

GatherFunction<VV, EV, M> gatherFunction,

154

SumFunction<VV, EV, M> sumFunction,

155

ApplyFunction<K, VV, M> applyFunction,

156

int maxIterations)

157

```

158

159

[Iterative Processing Models](./iterative-processing.md)

160

161

### Graph Algorithms

162

163

Pre-implemented algorithms for common graph analysis tasks including shortest paths, PageRank, connected components, community detection, and clustering. Algorithms are optimized for distributed execution on Flink clusters.

164

165

```java { .api }

166

// Algorithm execution

167

public <T> T run(GraphAlgorithm<K, VV, EV, T> algorithm)

168

169

// Example algorithm constructors

170

public ConnectedComponents(int maxIterations)

171

public SingleSourceShortestPaths(K srcVertexId, Integer maxIterations)

172

public PageRank(double dampingFactor, int maxIterations)

173

public GSAConnectedComponents(int maxIterations)

174

public CommunityDetection(double deltaThreshold, int maxIterations)

175

```

176

177

[Graph Algorithms](./algorithms.md)

178

179

### Graph Analytics and Metrics

180

181

Comprehensive analytics framework (ASM) for computing graph statistics, degree distributions, and structural metrics. Includes both directed and undirected graph analytics with efficient accumulator-based result collection.

182

183

```java { .api }

184

// Analytics execution

185

public <T> T run(GraphAnalytic<K, VV, EV, T> analytic)

186

187

// Example analytics

188

public DataSet<Tuple2<K, LongValue>> inDegrees()

189

public DataSet<Tuple2<K, LongValue>> outDegrees()

190

public DataSet<Tuple2<K, LongValue>> getDegrees()

191

public long numberOfVertices()

192

public long numberOfEdges()

193

194

// ASM analytics interface

195

public interface DataSetAnalytic<T, R> {

196

R getResult();

197

R execute();

198

DataSetAnalytic<T, R> run(DataSet<T> input);

199

}

200

```

201

202

[Graph Analytics and Metrics](./analytics.md)

203

204

### Graph Generators

205

206

Utilities for creating synthetic graphs for testing, benchmarking, and algorithm development. Includes complete graphs, random graphs, paths, hypercubes, and configurable generators with various probability distributions.

207

208

```java { .api }

209

// Generator base class

210

public abstract class AbstractGraphGenerator<K, VV, EV> {

211

public Graph<K, VV, EV> generate();

212

public AbstractGraphGenerator<K, VV, EV> setParallelism(int parallelism);

213

}

214

215

// Specific generators

216

public CompleteGraph(ExecutionEnvironment env, long numVertices)

217

public EchoGraph(ExecutionEnvironment env, long numVertices)

218

public EmptyGraph(ExecutionEnvironment env, long numVertices)

219

public PathGraph(ExecutionEnvironment env, long numVertices)

220

public HypercubeGraph(ExecutionEnvironment env, long dimensions)

221

```

222

223

[Graph Generators](./generators.md)

224

225

### Data Access and Utilities

226

227

Methods for accessing graph data, computing neighborhoods, performing joins, and integrating with Flink's DataSet API. Includes graph validation, format conversion, and utility functions for common operations.

228

229

```java { .api }

230

// Data access

231

public DataSet<Vertex<K, VV>> getVertices()

232

public DataSet<Edge<K, EV>> getEdges()

233

public DataSet<Triplet<K, VV, EV>> getTriplets()

234

public DataSet<K> getVertexIds()

235

public DataSet<Tuple2<K, K>> getEdgeIds()

236

237

// Neighborhood operations

238

public <T> DataSet<T> groupReduceOnEdges(

239

EdgesFunction<K, EV, T> edgesFunction,

240

EdgeDirection direction)

241

public <T> DataSet<T> groupReduceOnNeighbors(

242

NeighborsFunction<K, VV, EV, T> neighborsFunction,

243

EdgeDirection direction)

244

245

// Graph validation

246

public Graph<K, VV, EV> validate(GraphValidator<K, VV, EV> validator)

247

```

248

249

[Data Access and Utilities](./data-access.md)