0
# Input Sources
1
2
Input sources provide various ways to create graphs for algorithm execution, including file readers and procedural graph generators. All input sources implement the `Input` interface and support configurable parameters through the unified parameter system.
3
4
## Capabilities
5
6
### Input Interface
7
8
Base interface for all graph input sources.
9
10
```java { .api }
11
/**
12
* Input source for a Graph, such as a file reader or graph generator
13
* Produces graphs for algorithm processing
14
*/
15
public interface Input<K, VV, EV> extends Parameterized {
16
/**
17
* Human-readable identifier summarizing the input and configuration
18
* @return Unique identifier string for this input configuration
19
*/
20
String getIdentity();
21
22
/**
23
* Create the input graph using the specified execution environment
24
* @param env Flink ExecutionEnvironment for graph creation
25
* @return Generated or loaded graph ready for algorithm processing
26
* @throws Exception if graph creation fails
27
*/
28
Graph<K, VV, EV> create(ExecutionEnvironment env) throws Exception;
29
}
30
```
31
32
### File-Based Input Sources
33
34
#### CSV Input
35
36
Reads graphs from CSV files with configurable key types and delimiters.
37
38
```java { .api }
39
/**
40
* CSV file input source with configurable key types and parsing options
41
* Supports integer, long, and string vertex IDs
42
*/
43
public class CSV<K extends Comparable<K>> extends ParameterizedBase
44
implements Input<K, NullValue, NullValue> {
45
46
public String getName(); // "CSV"
47
public String getIdentity(); // includes filename and configuration
48
public Graph<K, NullValue, NullValue> create(ExecutionEnvironment env) throws Exception;
49
}
50
```
51
52
**Configuration Parameters:**
53
- `type` (ChoiceParameter): Vertex ID type - "integer", "long", or "string" (default: "integer")
54
- `input_filename` (StringParameter): Path to CSV file (required)
55
- `comment_prefix` (StringParameter): Comment line prefix to skip (default: "#")
56
- `input_line_delimiter` (StringParameter): Line delimiter (default: system default)
57
- `input_field_delimiter` (StringParameter): Field delimiter (default: system default)
58
59
**CSV Format:**
60
```
61
# Comments start with specified prefix
62
source_vertex,target_vertex
63
1,2
64
2,3
65
3,1
66
```
67
68
**Usage Examples:**
69
```bash
70
# Read integer vertex IDs from CSV
71
--input CSV --input_filename graph.csv --type integer
72
73
# Read string vertex IDs with custom delimiters
74
--input CSV --input_filename graph.txt --type string \
75
--input_field_delimiter ";" --comment_prefix "//"
76
```
77
78
### Graph Generators
79
80
#### Generated Graph Base Class
81
82
Abstract base class for procedural graph generators with type translation support.
83
84
```java { .api }
85
/**
86
* Abstract base class for generated graphs with automatic type translation
87
* Supports output in integer, long, string, and native Flink value types
88
*/
89
public abstract class GeneratedGraph<K> extends ParameterizedBase
90
implements Input<K, NullValue, NullValue> {
91
92
/**
93
* Create a graph with automatic type translation based on configured type parameter
94
* Calls generate() internally and translates vertex IDs to requested type
95
* @param env Flink ExecutionEnvironment for graph creation
96
* @return Graph with vertex IDs in the requested type
97
* @throws Exception if graph generation or translation fails
98
*/
99
public Graph<K, NullValue, NullValue> create(ExecutionEnvironment env) throws Exception;
100
101
/**
102
* Get the human-readable name of the configured type
103
* @return Capitalized type name (e.g., "Integer", "Long", "String")
104
*/
105
public String getTypeName();
106
107
/**
108
* Generate the graph with native LongValue vertex IDs (implemented by subclasses)
109
* @param env Flink ExecutionEnvironment
110
* @return Generated graph with LongValue vertex IDs
111
* @throws Exception if generation fails
112
*/
113
protected abstract Graph<LongValue, NullValue, NullValue> generate(ExecutionEnvironment env) throws Exception;
114
115
/**
116
* Return vertex count for validation and metrics (implemented by subclasses)
117
* @return Number of vertices in the generated graph
118
*/
119
protected abstract long vertexCount();
120
121
/**
122
* Check if vertex count matches expected value and log discrepancies
123
* @param graph Generated graph to validate
124
* @throws Exception if validation fails
125
*/
126
protected void checkVertexCount(Graph<?, ?, ?> graph) throws Exception;
127
}
128
```
129
130
**Configuration Parameters:**
131
- `type` (ChoiceParameter): Output type - "integer", "long", "string", plus native types (default varies by generator)
132
133
#### Generated Multi-Graph Base Class
134
135
Extended base class for graphs supporting simplification and multiple edge handling.
136
137
```java { .api }
138
/**
139
* Extended base class for graphs that may contain multiple edges or self-loops
140
* Provides automatic simplification options to remove parallel edges and self-loops
141
*/
142
public abstract class GeneratedMultiGraph<K extends Comparable<K>> extends GeneratedGraph<K> {
143
144
/**
145
* Create graph with optional simplification based on configured parameters
146
* Can remove parallel edges, self-loops, or both based on simplify parameter
147
* @param env Flink ExecutionEnvironment for graph creation
148
* @return Graph with optional simplification applied
149
* @throws Exception if graph generation or simplification fails
150
*/
151
public Graph<K, NullValue, NullValue> create(ExecutionEnvironment env) throws Exception;
152
153
/**
154
* Get short description of simplification settings for identity string
155
* @return Abbreviated simplification description (e.g., "undirected", "simple")
156
*/
157
protected String getSimplifyShortString();
158
}
159
```
160
161
**Additional Configuration Parameters:**
162
- `simplify` (Simplify): Graph simplification options - remove parallel edges, self-loops, or both (default: no simplification)
163
164
#### R-MAT Graph Generator
165
166
Generates R-MAT (Recursive Matrix) scale-free random graphs with configurable skew parameters.
167
168
```java { .api }
169
/**
170
* R-MAT graph generator for scale-free random graphs
171
* Uses recursive matrix approach with configurable skew parameters
172
*/
173
public class RMatGraph extends GeneratedMultiGraph<LongValue> {
174
175
public String getName(); // "RMatGraph"
176
public String getIdentity(); // includes scale, edge factor, and parameters
177
public Graph<LongValue, NullValue, NullValue> generate(ExecutionEnvironment env) throws Exception;
178
}
179
```
180
181
**Configuration Parameters:**
182
- `scale` (LongParameter): Generate 2^scale vertices (default: 10, minimum: 1)
183
- `edge_factor` (LongParameter): Generate edgeFactor * 2^scale edges (default: 16, minimum: 1)
184
- `a`, `b`, `c` (DoubleParameter): R-MAT matrix skew parameters (defaults from generator)
185
- `noise_enabled` (BooleanParameter): Enable noise perturbation (default: false)
186
- `noise` (DoubleParameter): Noise level when enabled (default from generator, range: 0.0-2.0)
187
- `seed` (LongParameter): Random seed for reproducible generation
188
- `little_parallelism` (LongParameter): Parallelism setting
189
190
**Usage Examples:**
191
```bash
192
# Generate R-MAT graph with 2^15 vertices and edge factor 8
193
--input RMatGraph --scale 15 --edge_factor 8 --seed 42
194
195
# R-MAT with custom skew parameters and noise
196
--input RMatGraph --scale 12 --a 0.6 --b 0.2 --c 0.15 \
197
--noise_enabled true --noise 0.1
198
```
199
200
#### Complete Graph Generator
201
202
Generates complete graphs where every vertex is connected to every other vertex.
203
204
```java { .api }
205
/**
206
* Complete graph generator creating fully connected graphs
207
* Generates N vertices with N*(N-1)/2 edges (undirected) or N*(N-1) edges (directed)
208
*/
209
public class CompleteGraph extends GeneratedGraph<LongValue> {
210
211
public String getName(); // "CompleteGraph"
212
public String getIdentity(); // includes vertex count
213
public Graph<LongValue, NullValue, NullValue> generate(ExecutionEnvironment env) throws Exception;
214
}
215
```
216
217
**Configuration Parameters:**
218
- `vertex_count` (LongParameter): Number of vertices (minimum: implementation-defined)
219
- `little_parallelism` (LongParameter): Parallelism setting (default: system default)
220
221
**Usage Example:**
222
```bash
223
# Generate complete graph with 500 vertices
224
--input CompleteGraph --vertex_count 500
225
```
226
227
#### Grid Graph Generator
228
229
Generates multi-dimensional grid graphs with configurable dimensions and endpoint wrapping.
230
231
```java { .api }
232
/**
233
* Multi-dimensional grid graph generator
234
* Supports arbitrary dimensions with optional endpoint wrapping per dimension
235
*/
236
public class GridGraph extends GeneratedGraph<LongValue> {
237
238
public String getName(); // "GridGraph"
239
public String getIdentity(); // includes dimension configuration
240
public String getUsage(); // custom usage for dynamic dimensions
241
public void configure(ParameterTool parameterTool) throws ProgramParametrizationException;
242
public Graph<LongValue, NullValue, NullValue> generate(ExecutionEnvironment env) throws Exception;
243
}
244
```
245
246
**Configuration Parameters:**
247
- Dynamic dimension parameters: `dim0`, `dim1`, `dim2`, etc. with format "size:wrap_endpoints"
248
- `little_parallelism` (LongParameter): Parallelism setting
249
250
**Dimension Format:** `"<size>:<wrap>"` where:
251
- `size`: Number of vertices in this dimension
252
- `wrap`: Boolean indicating whether endpoints connect (true/false)
253
254
**Usage Examples:**
255
```bash
256
# 2D grid: 10x10 with wrapping on both dimensions (torus)
257
--input GridGraph --dim0 "10:true" --dim1 "10:true"
258
259
# 3D grid: 5x8x4 with no wrapping
260
--input GridGraph --dim0 "5:false" --dim1 "8:false" --dim2 "4:false"
261
```
262
263
#### Circulant Graph Generator
264
265
Generates circulant graphs with configurable offset ranges for neighbor connections.
266
267
```java { .api }
268
/**
269
* Circulant graph generator with configurable offset ranges
270
* Each vertex connects to neighbors at specified offset distances
271
*/
272
public class CirculantGraph extends GeneratedGraph<LongValue> {
273
274
public String getName(); // "CirculantGraph"
275
public String getIdentity(); // includes vertex count and ranges
276
public String getUsage(); // custom usage for dynamic ranges
277
public void configure(ParameterTool parameterTool) throws ProgramParametrizationException;
278
public Graph<LongValue, NullValue, NullValue> generate(ExecutionEnvironment env) throws Exception;
279
}
280
```
281
282
**Configuration Parameters:**
283
- `vertex_count` (LongParameter): Number of vertices in the circulant graph
284
- Dynamic range parameters: `range0`, `range1`, etc. with format "offset:length"
285
- `little_parallelism` (LongParameter): Parallelism setting
286
287
**Range Format:** `"<offset>:<length>"` where:
288
- `offset`: Starting offset distance for connections
289
- `length`: Number of consecutive offsets to include
290
291
**Usage Example:**
292
```bash
293
# Circulant graph with 100 vertices, connecting to neighbors at offsets 1-3 and 10-12
294
--input CirculantGraph --vertex_count 100 --range0 "1:3" --range1 "10:3"
295
```
296
297
#### Regular Structure Generators
298
299
##### Cycle Graph Generator
300
301
Generates cycle/ring graphs where vertices form a circular chain.
302
303
```java { .api }
304
/**
305
* Cycle graph generator creating ring topologies
306
* Each vertex connects to its two neighbors in a circular arrangement
307
*/
308
public class CycleGraph extends GeneratedGraph<LongValue> {
309
310
public String getName(); // "CycleGraph"
311
public String getIdentity(); // includes vertex count
312
public Graph<LongValue, NullValue, NullValue> generate(ExecutionEnvironment env) throws Exception;
313
}
314
```
315
316
**Configuration Parameters:**
317
- `vertex_count` (LongParameter): Number of vertices in the cycle
318
319
**Usage Example:**
320
```bash
321
--input CycleGraph --vertex_count 1000
322
```
323
324
##### Star Graph Generator
325
326
Generates star topology graphs with one central vertex connected to all others.
327
328
```java { .api }
329
/**
330
* Star graph generator creating star topologies
331
* One central vertex connects to all other vertices
332
*/
333
public class StarGraph extends GeneratedGraph<LongValue> {
334
335
public String getName(); // "StarGraph"
336
public String getIdentity(); // includes vertex count
337
public Graph<LongValue, NullValue, NullValue> generate(ExecutionEnvironment env) throws Exception;
338
}
339
```
340
341
**Configuration Parameters:**
342
- `vertex_count` (LongParameter): Total number of vertices (including center)
343
344
**Usage Example:**
345
```bash
346
--input StarGraph --vertex_count 5000
347
```
348
349
##### Path Graph Generator
350
351
Generates linear path graphs where vertices form a straight chain.
352
353
```java { .api }
354
/**
355
* Path graph generator creating linear chains
356
* Vertices connect in a single linear sequence
357
*/
358
public class PathGraph extends GeneratedGraph<LongValue> {
359
360
public String getName(); // "PathGraph"
361
public String getIdentity(); // includes vertex count
362
public Graph<LongValue, NullValue, NullValue> generate(ExecutionEnvironment env) throws Exception;
363
}
364
```
365
366
**Configuration Parameters:**
367
- `vertex_count` (LongParameter): Number of vertices in the path
368
369
**Usage Example:**
370
```bash
371
--input PathGraph --vertex_count 10000
372
```
373
374
##### Hypercube Graph Generator
375
376
Generates N-dimensional hypercube graphs.
377
378
```java { .api }
379
/**
380
* Hypercube graph generator creating N-dimensional hypercubes
381
* Vertices represent N-bit binary strings with edges between strings differing by one bit
382
*/
383
public class HypercubeGraph extends GeneratedGraph<LongValue> {
384
385
public String getName(); // "HypercubeGraph"
386
public String getIdentity(); // includes dimensions
387
public Graph<LongValue, NullValue, NullValue> generate(ExecutionEnvironment env) throws Exception;
388
}
389
```
390
391
**Configuration Parameters:**
392
- `dimensions` (LongParameter): Number of hypercube dimensions (generates 2^dimensions vertices)
393
394
**Usage Example:**
395
```bash
396
# 10-dimensional hypercube (1024 vertices)
397
--input HypercubeGraph --dimensions 10
398
```
399
400
#### Specialized Generators
401
402
##### Empty Graph Generator
403
404
Generates graphs with vertices but no edges.
405
406
```java { .api }
407
/**
408
* Empty graph generator creating graphs with no edges
409
* Useful for testing vertex-only algorithms
410
*/
411
public class EmptyGraph extends GeneratedGraph<LongValue> {
412
413
public String getName(); // "EmptyGraph"
414
public String getIdentity(); // includes vertex count
415
public Graph<LongValue, NullValue, NullValue> generate(ExecutionEnvironment env) throws Exception;
416
}
417
```
418
419
**Configuration Parameters:**
420
- `vertex_count` (LongParameter): Number of isolated vertices
421
422
##### Echo Graph Generator
423
424
Generates echo graphs with specified vertex degree.
425
426
```java { .api }
427
/**
428
* Echo graph generator with configurable vertex degree
429
* Each vertex has exactly the specified number of connections
430
*/
431
public class EchoGraph extends GeneratedGraph<LongValue> {
432
433
public String getName(); // "EchoGraph"
434
public String getIdentity(); // includes vertex count and degree
435
public Graph<LongValue, NullValue, NullValue> generate(ExecutionEnvironment env) throws Exception;
436
}
437
```
438
439
**Configuration Parameters:**
440
- `vertex_count` (LongParameter): Number of vertices
441
- `vertex_degree` (LongParameter): Degree for each vertex
442
443
##### Singleton Edge Graph Generator
444
445
Generates graphs composed only of singleton edges (vertex pairs).
446
447
```java { .api }
448
/**
449
* Singleton edge graph generator creating graphs with only disconnected edges
450
* Each edge connects exactly two vertices with no shared vertices between edges
451
*/
452
public class SingletonEdgeGraph extends GeneratedGraph<LongValue> {
453
454
public String getName(); // "SingletonEdgeGraph"
455
public String getIdentity(); // includes vertex pair count
456
public Graph<LongValue, NullValue, NullValue> generate(ExecutionEnvironment env) throws Exception;
457
}
458
```
459
460
**Configuration Parameters:**
461
- `vertex_pairs` (LongParameter): Number of vertex pairs (creates 2 * vertex_pairs vertices)
462
463
## Types
464
465
```java { .api }
466
// Core graph types
467
class Graph<K, VV, EV> {
468
// Graph structure and operations
469
}
470
471
// Flink value types for efficient serialization
472
class LongValue implements CopyableValue<LongValue> {
473
public LongValue(long value);
474
public long getValue();
475
}
476
477
class NullValue implements Value {
478
public static final NullValue getInstance();
479
}
480
481
// ExecutionEnvironment for graph creation
482
class ExecutionEnvironment {
483
public static ExecutionEnvironment getExecutionEnvironment();
484
// Graph creation and data source methods
485
}
486
487
// Parameter parsing
488
class ParameterTool {
489
public String get(String key);
490
public String getRequired(String key) throws RuntimeException;
491
public boolean has(String key);
492
}
493
```