0
# Apache Flink Gelly Examples
1
2
Apache Flink Gelly Examples is a comprehensive collection of graph processing examples and algorithms built on Apache Flink's Gelly graph processing library. It provides both a unified command-line driver framework for executing graph algorithms and individual algorithm implementations for programmatic usage. The library includes implementations of fundamental graph algorithms like PageRank, Single Source Shortest Paths, Connected Components, and various graph generators for testing and benchmarking.
3
4
## Package Information
5
6
- **Package Name**: flink-gelly-examples_2.10
7
- **Package Type**: Maven
8
- **Language**: Java and Scala
9
- **Installation**: Add Maven dependency:
10
```xml
11
<dependency>
12
<groupId>org.apache.flink</groupId>
13
<artifactId>flink-gelly-examples_2.10</artifactId>
14
<version>1.3.3</version>
15
</dependency>
16
```
17
18
## Core Imports
19
20
Main entry point for command-line execution:
21
22
```java
23
import org.apache.flink.graph.Runner;
24
```
25
26
For programmatic usage of drivers:
27
28
```java
29
import org.apache.flink.graph.drivers.PageRank;
30
import org.apache.flink.graph.drivers.ConnectedComponents;
31
import org.apache.flink.graph.drivers.input.RMatGraph;
32
import org.apache.flink.graph.drivers.input.CSV;
33
```
34
35
Legacy example algorithms:
36
37
```java
38
import org.apache.flink.graph.examples.PageRank;
39
import org.apache.flink.graph.examples.SingleSourceShortestPaths;
40
```
41
42
## Basic Usage
43
44
Command-line execution using the unified Runner:
45
46
```bash
47
# Run PageRank on a complete graph with 1000 vertices
48
flink run flink-gelly-examples_2.10-1.3.3.jar \
49
--algorithm PageRank \
50
--input CompleteGraph --vertex_count 1000 \
51
--output print
52
53
# Run Connected Components on CSV data
54
flink run flink-gelly-examples_2.10-1.3.3.jar \
55
--algorithm ConnectedComponents \
56
--input CSV --input_filename graph.csv \
57
--output csv --output_filename results.csv
58
```
59
60
Programmatic usage with drivers:
61
62
```java
63
import org.apache.flink.api.java.ExecutionEnvironment;
64
import org.apache.flink.graph.Graph;
65
import org.apache.flink.graph.drivers.PageRank;
66
import org.apache.flink.graph.drivers.input.CompleteGraph;
67
68
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
69
70
// Create a complete graph with 100 vertices
71
CompleteGraph input = new CompleteGraph();
72
// Configure input parameters...
73
Graph<LongValue, NullValue, NullValue> graph = input.create(env);
74
75
// Run PageRank algorithm
76
PageRank<LongValue, NullValue, NullValue> pagerank = new PageRank<>();
77
// Configure algorithm parameters...
78
pagerank.plan(graph);
79
pagerank.print("PageRank Results");
80
```
81
82
## Architecture
83
84
Flink Gelly Examples is organized around several key architectural components:
85
86
- **Driver Framework**: Unified execution system (`Runner`) that coordinates input sources, algorithms, and output formatters through a consistent command-line interface
87
- **Algorithm Drivers**: Reusable graph algorithm implementations that can be executed individually or through the Runner framework
88
- **Input Generators**: Configurable graph generators (R-MAT, grid, complete graphs, etc.) and file readers for creating test graphs
89
- **Output Formatters**: Pluggable output system supporting CSV files, console output, and hash verification
90
- **Parameter System**: Type-safe parameter configuration framework for command-line argument parsing and validation
91
- **Legacy Examples**: Standalone example implementations demonstrating different Flink Gelly programming models
92
93
## Capabilities
94
95
### Command-Line Execution Framework
96
97
Unified driver system for executing graph algorithms with configurable inputs and outputs. Provides consistent parameter handling and usage help.
98
99
```java { .api }
100
public class Runner {
101
public static void main(String[] args) throws Exception;
102
}
103
```
104
105
[Runner Framework](./runner-framework.md)
106
107
### Graph Algorithm Drivers
108
109
Collection of ready-to-use graph algorithm implementations including PageRank, HITS, Connected Components, clustering coefficients, and similarity measures.
110
111
```java { .api }
112
public interface Driver<K, VV, EV> extends Parameterized {
113
String getShortDescription();
114
String getLongDescription();
115
void plan(Graph<K, VV, EV> graph) throws Exception;
116
}
117
```
118
119
[Algorithm Drivers](./algorithm-drivers.md)
120
121
### Graph Input Generation
122
123
Comprehensive set of graph generators and file readers for creating test graphs, including random generators (R-MAT), regular structures (grids, complete graphs), and CSV file input.
124
125
```java { .api }
126
public interface Input<K, VV, EV> extends Parameterized {
127
String getIdentity();
128
Graph<K, VV, EV> create(ExecutionEnvironment env) throws Exception;
129
}
130
```
131
132
[Input Sources](./input-sources.md)
133
134
### Parameter Configuration System
135
136
Type-safe parameter framework supporting various parameter types (long, double, boolean, choice, string) with validation and command-line parsing.
137
138
```java { .api }
139
public interface Parameterized {
140
String getName();
141
String getUsage();
142
void configure(ParameterTool parameterTool) throws ProgramParametrizationException;
143
}
144
```
145
146
[Parameter System](./parameter-system.md)
147
148
### Legacy Example Implementations
149
150
Standalone example implementations demonstrating different Flink Gelly programming models including scatter-gather, gather-sum-apply (GSA), and vertex-centric (Pregel) approaches.
151
152
```java { .api }
153
public interface GraphAlgorithm<K, VV, EV, T> extends Serializable {
154
DataSet<T> run(Graph<K, VV, EV> input) throws Exception;
155
}
156
```
157
158
[Legacy Examples](./legacy-examples.md)
159
160
## Types
161
162
Core Flink and Gelly types used throughout the API:
163
164
```java { .api }
165
// Flink Graph API
166
class Graph<K, VV, EV> {
167
// Graph operations and transformations
168
}
169
170
class Vertex<K, VV> {
171
public K getId();
172
public VV getValue();
173
}
174
175
class Edge<K, EV> {
176
public K getSource();
177
public K getTarget();
178
public EV getValue();
179
}
180
181
// Flink execution environment
182
class ExecutionEnvironment {
183
public static ExecutionEnvironment getExecutionEnvironment();
184
}
185
186
// Flink value types for serialization efficiency
187
class LongValue implements CopyableValue<LongValue> {
188
public LongValue(long value);
189
public long getValue();
190
public void setValue(long value);
191
}
192
193
class NullValue implements Value {
194
public static final NullValue getInstance();
195
}
196
197
// Additional Flink value types for type translation
198
class IntValue implements CopyableValue<IntValue> {
199
public IntValue(int value);
200
public int getValue();
201
}
202
203
class StringValue implements CopyableValue<StringValue> {
204
public StringValue(String value);
205
public String getValue();
206
}
207
208
class ByteValue implements CopyableValue<ByteValue> {
209
public ByteValue(byte value);
210
public byte getValue();
211
}
212
213
class ShortValue implements CopyableValue<ShortValue> {
214
public ShortValue(short value);
215
public short getValue();
216
}
217
218
// Result wrapper for algorithm outputs
219
interface PrintableResult {
220
String toString();
221
}
222
223
// Result class for algorithm outputs with vertex ID and computed value
224
class Result<K> implements PrintableResult {
225
public K getVertexId0();
226
public K getVertexId1(); // for pairwise results
227
public Double getScore(); // computed result value
228
}
229
230
// Parameter parsing
231
class ParameterTool {
232
public static ParameterTool fromArgs(String[] args);
233
public String get(String key);
234
public String getRequired(String key) throws RuntimeException;
235
public boolean has(String key);
236
}
237
238
// Exception types
239
class ProgramParametrizationException extends Exception {
240
public ProgramParametrizationException(String message);
241
}
242
```