Gelly: Flink Graph API - A comprehensive graph processing library for Apache Flink
npx @tessl/cli install tessl/maven-org-apache-flink--flink-gelly-2-10@1.3.00
# Apache Flink Gelly
1
2
Gelly is Apache Flink's Graph API library that provides a comprehensive set of methods and utilities for developing graph analysis applications on the Flink platform. It enables developers to create, transform, and modify graphs using high-level functions similar to batch processing APIs, supporting both vertex-centric and edge-centric operations.
3
4
## Package Information
5
6
- **Package Name**: flink-gelly_2.10
7
- **Package Type**: maven
8
- **Language**: Java (Scala 2.10)
9
- **Group ID**: org.apache.flink
10
- **Artifact ID**: flink-gelly_2.10
11
- **Version**: 1.3.3
12
- **Installation**: Add to your Maven dependencies:
13
14
```xml
15
<dependency>
16
<groupId>org.apache.flink</groupId>
17
<artifactId>flink-gelly_2.10</artifactId>
18
<version>1.3.3</version>
19
</dependency>
20
```
21
22
## Core Imports
23
24
```java
25
import org.apache.flink.graph.Graph;
26
import org.apache.flink.graph.Vertex;
27
import org.apache.flink.graph.Edge;
28
import org.apache.flink.api.java.ExecutionEnvironment;
29
```
30
31
For specific functionality:
32
33
```java
34
// Iterative processing models
35
import org.apache.flink.graph.pregel.ComputeFunction;
36
import org.apache.flink.graph.gsa.GatherFunction;
37
import org.apache.flink.graph.spargel.ScatterFunction;
38
39
// Algorithms
40
import org.apache.flink.graph.library.ConnectedComponents;
41
import org.apache.flink.graph.library.SingleSourceShortestPaths;
42
import org.apache.flink.graph.library.link_analysis.PageRank;
43
44
// Analytics
45
import org.apache.flink.graph.asm.degree.annotate.directed.VertexDegrees;
46
```
47
48
## Basic Usage
49
50
```java
51
import org.apache.flink.api.java.ExecutionEnvironment;
52
import org.apache.flink.graph.Graph;
53
import org.apache.flink.graph.Vertex;
54
import org.apache.flink.graph.Edge;
55
import org.apache.flink.types.NullValue;
56
import java.util.Arrays;
57
58
// Set up execution environment
59
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
60
61
// Create vertices
62
List<Vertex<Long, Double>> vertices = Arrays.asList(
63
new Vertex<>(1L, 1.0),
64
new Vertex<>(2L, 2.0),
65
new Vertex<>(3L, 3.0)
66
);
67
68
// Create edges
69
List<Edge<Long, NullValue>> edges = Arrays.asList(
70
new Edge<>(1L, 2L, NullValue.getInstance()),
71
new Edge<>(2L, 3L, NullValue.getInstance()),
72
new Edge<>(3L, 1L, NullValue.getInstance())
73
);
74
75
// Create graph
76
Graph<Long, Double, NullValue> graph = Graph.fromCollection(vertices, edges, env);
77
78
// Basic operations
79
DataSet<Vertex<Long, Double>> filteredVertices = graph
80
.filterOnVertices(vertex -> vertex.getValue() > 1.5)
81
.getVertices();
82
83
// Execute
84
env.execute("Basic Graph Example");
85
```
86
87
## Architecture
88
89
Gelly follows a layered architecture built on Flink's DataSet API:
90
91
- **Graph Representation**: Core `Graph<K, VV, EV>` class with generic types for keys (K), vertex values (VV), and edge values (EV)
92
- **Data Types**: `Vertex<K, V>` and `Edge<K, V>` classes extending Flink tuples for efficient serialization
93
- **Iterative Processing**: Three computation models (Pregel, Scatter-Gather, GSA) for distributed graph algorithms
94
- **Algorithm Library**: Pre-implemented graph algorithms (shortest paths, PageRank, clustering, etc.)
95
- **Analytics Framework**: ASM (Analytics, Statistics, Metrics) components for graph analysis
96
- **Type System**: Rich translation capabilities supporting custom vertex/edge types and ID transformations
97
98
This design enables scalable graph processing on distributed Flink clusters while maintaining type safety and integration with Flink's broader ecosystem.
99
100
## Capabilities
101
102
### Graph Creation and Transformation
103
104
Core functionality for creating graphs from various data sources (collections, DataSets, CSV files) and transforming graph structure and data. Includes vertex and edge filtering, subgraph extraction, and graph set operations.
105
106
```java { .api }
107
// Static factory methods
108
public static <K, VV, EV> Graph<K, VV, EV> fromCollection(
109
Collection<Vertex<K, VV>> vertices,
110
Collection<Edge<K, EV>> edges,
111
ExecutionEnvironment context)
112
113
public static <K, VV, EV> Graph<K, VV, EV> fromDataSet(
114
DataSet<Vertex<K, VV>> vertices,
115
DataSet<Edge<K, EV>> edges,
116
ExecutionEnvironment context)
117
118
public static <K, EV> Graph<K, NullValue, EV> fromDataSet(
119
DataSet<Edge<K, EV>> edges,
120
ExecutionEnvironment context)
121
122
// Transformation methods
123
public <NV> Graph<K, NV, EV> mapVertices(MapFunction<Vertex<K, VV>, NV> mapper)
124
public <NE> Graph<K, VV, NE> mapEdges(MapFunction<Edge<K, EV>, NE> mapper)
125
public Graph<K, VV, EV> filterOnVertices(FilterFunction<Vertex<K, VV>> vertexFilter)
126
public Graph<K, VV, EV> filterOnEdges(FilterFunction<Edge<K, EV>> edgeFilter)
127
public Graph<K, VV, EV> subgraph(
128
FilterFunction<Vertex<K, VV>> vertexFilter,
129
FilterFunction<Edge<K, EV>> edgeFilter)
130
```
131
132
[Graph Creation and Transformation](./graph-creation.md)
133
134
### Iterative Processing Models
135
136
Three distributed iterative computation models for implementing graph algorithms: Vertex-centric (Pregel), Scatter-Gather, and Gather-Sum-Apply. Each model provides different abstractions for message-passing graph computations.
137
138
```java { .api }
139
// Vertex-centric (Pregel) iteration
140
public <M> Graph<K, VV, EV> runVertexCentricIteration(
141
ComputeFunction<K, VV, EV, M> computeFunction,
142
MessageCombiner<K, M> combiner,
143
int maxIterations)
144
145
// Scatter-Gather iteration
146
public <M> Graph<K, VV, EV> runScatterGatherIteration(
147
ScatterFunction<K, VV, M, EV> scatterFunction,
148
GatherFunction<K, VV, M> gatherFunction,
149
int maxIterations)
150
151
// Gather-Sum-Apply iteration
152
public <M> Graph<K, VV, EV> runGatherSumApplyIteration(
153
GatherFunction<VV, EV, M> gatherFunction,
154
SumFunction<VV, EV, M> sumFunction,
155
ApplyFunction<K, VV, M> applyFunction,
156
int maxIterations)
157
```
158
159
[Iterative Processing Models](./iterative-processing.md)
160
161
### Graph Algorithms
162
163
Pre-implemented algorithms for common graph analysis tasks including shortest paths, PageRank, connected components, community detection, and clustering. Algorithms are optimized for distributed execution on Flink clusters.
164
165
```java { .api }
166
// Algorithm execution
167
public <T> T run(GraphAlgorithm<K, VV, EV, T> algorithm)
168
169
// Example algorithm constructors
170
public ConnectedComponents(int maxIterations)
171
public SingleSourceShortestPaths(K srcVertexId, Integer maxIterations)
172
public PageRank(double dampingFactor, int maxIterations)
173
public GSAConnectedComponents(int maxIterations)
174
public CommunityDetection(double deltaThreshold, int maxIterations)
175
```
176
177
[Graph Algorithms](./algorithms.md)
178
179
### Graph Analytics and Metrics
180
181
Comprehensive analytics framework (ASM) for computing graph statistics, degree distributions, and structural metrics. Includes both directed and undirected graph analytics with efficient accumulator-based result collection.
182
183
```java { .api }
184
// Analytics execution
185
public <T> T run(GraphAnalytic<K, VV, EV, T> analytic)
186
187
// Example analytics
188
public DataSet<Tuple2<K, LongValue>> inDegrees()
189
public DataSet<Tuple2<K, LongValue>> outDegrees()
190
public DataSet<Tuple2<K, LongValue>> getDegrees()
191
public long numberOfVertices()
192
public long numberOfEdges()
193
194
// ASM analytics interface
195
public interface DataSetAnalytic<T, R> {
196
R getResult();
197
R execute();
198
DataSetAnalytic<T, R> run(DataSet<T> input);
199
}
200
```
201
202
[Graph Analytics and Metrics](./analytics.md)
203
204
### Graph Generators
205
206
Utilities for creating synthetic graphs for testing, benchmarking, and algorithm development. Includes complete graphs, random graphs, paths, hypercubes, and configurable generators with various probability distributions.
207
208
```java { .api }
209
// Generator base class
210
public abstract class AbstractGraphGenerator<K, VV, EV> {
211
public Graph<K, VV, EV> generate();
212
public AbstractGraphGenerator<K, VV, EV> setParallelism(int parallelism);
213
}
214
215
// Specific generators
216
public CompleteGraph(ExecutionEnvironment env, long numVertices)
217
public EchoGraph(ExecutionEnvironment env, long numVertices)
218
public EmptyGraph(ExecutionEnvironment env, long numVertices)
219
public PathGraph(ExecutionEnvironment env, long numVertices)
220
public HypercubeGraph(ExecutionEnvironment env, long dimensions)
221
```
222
223
[Graph Generators](./generators.md)
224
225
### Data Access and Utilities
226
227
Methods for accessing graph data, computing neighborhoods, performing joins, and integrating with Flink's DataSet API. Includes graph validation, format conversion, and utility functions for common operations.
228
229
```java { .api }
230
// Data access
231
public DataSet<Vertex<K, VV>> getVertices()
232
public DataSet<Edge<K, EV>> getEdges()
233
public DataSet<Triplet<K, VV, EV>> getTriplets()
234
public DataSet<K> getVertexIds()
235
public DataSet<Tuple2<K, K>> getEdgeIds()
236
237
// Neighborhood operations
238
public <T> DataSet<T> groupReduceOnEdges(
239
EdgesFunction<K, EV, T> edgesFunction,
240
EdgeDirection direction)
241
public <T> DataSet<T> groupReduceOnNeighbors(
242
NeighborsFunction<K, VV, EV, T> neighborsFunction,
243
EdgeDirection direction)
244
245
// Graph validation
246
public Graph<K, VV, EV> validate(GraphValidator<K, VV, EV> validator)
247
```
248
249
[Data Access and Utilities](./data-access.md)