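# Graph Construction

Complete API for creating graphs from various data sources.

## Factory Methods Overview

The `Graph` companion object provides multiple factory methods for creating graphs from different data sources:

- DataSets of vertices and edges
- Scala collections
- Tuple-based data
- CSV files with extensive configuration options

The context bounds `K: TypeInformation : ClassTag` (and likewise for `VV` and `EV`) in every signature below require implicit `TypeInformation` and `ClassTag` instances for the ID and value types. Importing `org.apache.flink.api.scala._` makes the `TypeInformation` instances available, and the compiler supplies `ClassTag`s for concrete types.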

## DataSet-based Construction

### From Vertex and Edge DataSets

```scala { .api }
def fromDataSet[K: TypeInformation : ClassTag, VV: TypeInformation : ClassTag, EV: TypeInformation : ClassTag](
    vertices: DataSet[Vertex[K, VV]],
    edges: DataSet[Edge[K, EV]],
    env: ExecutionEnvironment
): Graph[K, VV, EV]
```

Creates a graph from separate vertex and edge DataSets.

**Parameters:**

- `vertices` - DataSet of `Vertex` objects, each holding a vertex ID and a vertex value
- `edges` - DataSet of `Edge` objects, each holding a source ID, a target ID, and an edge value
- `env` - Flink execution environment
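
A minimal sketch of this overload, assuming the `flink-scala` and `flink-gelly-scala` artifacts are on the classpath. The vertex and edge data and the variable names are illustrative; the DataSets are built with `env.fromCollection`, and the type parameters `K`, `VV`, `EV` are inferred from the DataSet element types.

```scala
// Assumes flink-scala and flink-gelly-scala are on the classpath
import org.apache.flink.api.scala._
import org.apache.flink.graph.scala._
import org.apache.flink.graph.{Edge, Vertex}

val env = ExecutionEnvironment.getExecutionEnvironment

// Illustrative vertices: Long IDs with String values
val vertices: DataSet[Vertex[Long, String]] = env.fromCollection(Seq(
  new Vertex(1L, "a"),
  new Vertex(2L, "b"),
  new Vertex(3L, "c")
))

// Illustrative edges: Double values (e.g. weights)
val edges: DataSet[Edge[Long, Double]] = env.fromCollection(Seq(
  new Edge(1L, 2L, 0.5),
  new Edge(2L, 3L, 1.5)
))

// K = Long, VV = String, EV = Double are inferred from the DataSets
val graph: Graph[Long, String, Double] = Graph.fromDataSet(vertices, edges, env)
```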

### From Edge DataSet Only

```scala { .api }
def fromDataSet[K: TypeInformation : ClassTag, EV: TypeInformation : ClassTag](
    edges: DataSet[Edge[K, EV]],
    env: ExecutionEnvironment
): Graph[K, NullValue, EV]
```

Creates a graph from edges only. The vertex set is derived from the distinct source and target IDs of the edges, and every vertex gets `NullValue` as its value.

**Parameters:**

- `edges` - DataSet containing graph edges
- `env` - Flink execution environment
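
A short sketch, under the same classpath assumption, showing that the vertices of an edges-only graph are the distinct edge endpoints. The edge data and variable names are illustrative.

```scala
// Assumes flink-scala and flink-gelly-scala are on the classpath
import org.apache.flink.api.scala._
import org.apache.flink.graph.Edge
import org.apache.flink.graph.scala._
import org.apache.flink.types.NullValue

val env = ExecutionEnvironment.getExecutionEnvironment

val edges: DataSet[Edge[Long, Double]] = env.fromCollection(Seq(
  new Edge(1L, 2L, 0.5),
  new Edge(2L, 3L, 1.5),
  new Edge(1L, 3L, 2.0)
))

// Vertices 1, 2 and 3 are derived from the edge endpoints; their values are NullValue
val graph: Graph[Long, NullValue, Double] = Graph.fromDataSet(edges, env)

// Bring the derived vertex IDs back to the client
val vertexIds = graph.getVertices.collect().map(_.getId).toSet
```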

### From Edges with Vertex Initializer

```scala { .api }
def fromDataSet[K: TypeInformation : ClassTag, VV: TypeInformation : ClassTag, EV: TypeInformation : ClassTag](
    edges: DataSet[Edge[K, EV]],
    vertexValueInitializer: MapFunction[K, VV],
    env: ExecutionEnvironment
): Graph[K, VV, EV]
```

Creates a graph from edges and initializes vertex values with a mapping function. Vertex IDs are derived from the edge endpoints, and each vertex value is computed by applying `vertexValueInitializer` to the vertex ID.

**Parameters:**

- `edges` - DataSet containing graph edges
- `vertexValueInitializer` - `MapFunction` that maps a vertex ID to its initial vertex value
- `env` - Flink execution environment
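
A minimal sketch of the initializer overload, assuming the same classpath setup. The rule "value = ten times the ID" is arbitrary and only illustrative; `MapFunction` is Flink's `org.apache.flink.api.common.functions.MapFunction`, implemented here as an anonymous class.

```scala
// Assumes flink-scala and flink-gelly-scala are on the classpath
import org.apache.flink.api.common.functions.MapFunction
import org.apache.flink.api.scala._
import org.apache.flink.graph.Edge
import org.apache.flink.graph.scala._

val env = ExecutionEnvironment.getExecutionEnvironment

val edges: DataSet[Edge[Long, Double]] = env.fromCollection(Seq(
  new Edge(1L, 2L, 0.5),
  new Edge(2L, 3L, 1.5)
))

// Illustrative initializer: each vertex value is ten times its ID
val initializer = new MapFunction[Long, Double] {
  override def map(id: Long): Double = id * 10.0
}

val graph: Graph[Long, Double, Double] = Graph.fromDataSet(edges, initializer, env)

// Collect (vertexId -> vertexValue) to the client
val vertexValues = graph.getVertices.collect().map(v => (v.getId, v.getValue)).toMap
```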

## Collection-based Construction

### From Scala Collections

```scala { .api }
def fromCollection[K: TypeInformation : ClassTag, VV: TypeInformation : ClassTag, EV: TypeInformation : ClassTag](
    vertices: Seq[Vertex[K, VV]],
    edges: Seq[Edge[K, EV]],
    env: ExecutionEnvironment
): Graph[K, VV, EV]
```

Creates a graph from Scala collections (`Seq`) of vertices and edges.

```scala { .api }
def fromCollection[K: TypeInformation : ClassTag, EV: TypeInformation : ClassTag](
    edges: Seq[Edge[K, EV]],
    env: ExecutionEnvironment
): Graph[K, NullValue, EV]
```

Creates a graph from a collection of edges only. Vertices are derived from the edge endpoints and get `NullValue` as their value.

```scala { .api }
def fromCollection[K: TypeInformation : ClassTag, VV: TypeInformation : ClassTag, EV: TypeInformation : ClassTag](
    edges: Seq[Edge[K, EV]],
    vertexValueInitializer: MapFunction[K, VV],
    env: ExecutionEnvironment
): Graph[K, VV, EV]
```

Creates a graph from a collection of edges; vertex values are computed from the vertex IDs by `vertexValueInitializer`.
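
A sketch of the two edges-only collection overloads side by side, assuming the same classpath setup. The edge list, its `"follows"` labels, and the identity initializer are illustrative choices.

```scala
// Assumes flink-scala and flink-gelly-scala are on the classpath
import org.apache.flink.api.common.functions.MapFunction
import org.apache.flink.api.scala._
import org.apache.flink.graph.Edge
import org.apache.flink.graph.scala._
import org.apache.flink.types.NullValue

val env = ExecutionEnvironment.getExecutionEnvironment

// Illustrative edges with String values
val edgeList = List(
  new Edge(1L, 2L, "follows"),
  new Edge(2L, 3L, "follows")
)

// Edges only: every vertex value is NullValue
val plain: Graph[Long, NullValue, String] = Graph.fromCollection(edgeList, env)

// Edges plus initializer: each vertex value starts equal to its ID
val labeled: Graph[Long, Long, String] = Graph.fromCollection(edgeList,
  new MapFunction[Long, Long] {
    override def map(id: Long): Long = id
  }, env)
```

The second argument (a `Seq` of vertices versus a `MapFunction`) is what selects between the three-argument overloads.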

## Tuple-based Construction

### From Tuple DataSets

```scala { .api }
def fromTupleDataSet[K: TypeInformation : ClassTag, VV: TypeInformation : ClassTag, EV: TypeInformation : ClassTag](
    vertices: DataSet[(K, VV)],
    edges: DataSet[(K, K, EV)],
    env: ExecutionEnvironment
): Graph[K, VV, EV]
```

Creates a graph from tuple DataSets where:

- Vertex tuples: `(vertexId, vertexValue)`
- Edge tuples: `(sourceId, targetId, edgeValue)`

```scala { .api }
def fromTupleDataSet[K: TypeInformation : ClassTag, EV: TypeInformation : ClassTag](
    edges: DataSet[(K, K, EV)],
    env: ExecutionEnvironment
): Graph[K, NullValue, EV]
```

Creates a graph from edge tuples only. Vertices are derived from the edge endpoints and get `NullValue` as their value.

```scala { .api }
def fromTupleDataSet[K: TypeInformation : ClassTag, VV: TypeInformation : ClassTag, EV: TypeInformation : ClassTag](
    edges: DataSet[(K, K, EV)],
    vertexValueInitializer: MapFunction[K, VV],
    env: ExecutionEnvironment
): Graph[K, VV, EV]
```

Creates a graph from edge tuples; vertex values are computed from the vertex IDs by `vertexValueInitializer`.
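
A sketch of the two edge-tuple overloads, assuming the same classpath setup. The triples and the constant initial value `0.0` are illustrative; the edges are read back through the Java `Edge` accessors `getSource`, `getTarget`, and `getValue`.

```scala
// Assumes flink-scala and flink-gelly-scala are on the classpath
import org.apache.flink.api.common.functions.MapFunction
import org.apache.flink.api.scala._
import org.apache.flink.graph.scala._
import org.apache.flink.types.NullValue

val env = ExecutionEnvironment.getExecutionEnvironment

// Illustrative (sourceId, targetId, edgeValue) triples
val edgeTuples: DataSet[(Long, Long, Double)] = env.fromCollection(Seq(
  (1L, 2L, 1.0),
  (2L, 3L, 2.0),
  (3L, 1L, 3.0)
))

// Edge tuples only: vertex values are NullValue
val unlabeled: Graph[Long, NullValue, Double] = Graph.fromTupleDataSet(edgeTuples, env)

// Edge tuples plus initializer: every vertex starts with 0.0
val initialized: Graph[Long, Double, Double] = Graph.fromTupleDataSet(edgeTuples,
  new MapFunction[Long, Double] {
    override def map(id: Long): Double = 0.0
  }, env)

// Convert the resulting Edge objects back into plain triples
val edgeTriples = initialized.getEdges.collect()
  .map(e => (e.getSource, e.getTarget, e.getValue))
  .toSet
```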

### From Tuple2 DataSets (No Edge Values)

```scala { .api }
def fromTuple2DataSet[K: TypeInformation : ClassTag](
    edges: DataSet[(K, K)],
    env: ExecutionEnvironment
): Graph[K, NullValue, NullValue]
```

Creates a graph from `(sourceId, targetId)` pairs. Both vertex and edge values are `NullValue`.

```scala { .api }
def fromTuple2DataSet[K: TypeInformation : ClassTag, VV: TypeInformation : ClassTag](
    edges: DataSet[(K, K)],
    vertexValueInitializer: MapFunction[K, VV],
    env: ExecutionEnvironment
): Graph[K, VV, NullValue]
```

Creates a graph from `(sourceId, targetId)` pairs. Edge values are `NullValue`; vertex values are computed from the vertex IDs by `vertexValueInitializer`.
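
A minimal sketch for plain edge pairs, assuming the same classpath setup. The four-vertex cycle is illustrative data.

```scala
// Assumes flink-scala and flink-gelly-scala are on the classpath
import org.apache.flink.api.scala._
import org.apache.flink.graph.scala._
import org.apache.flink.types.NullValue

val env = ExecutionEnvironment.getExecutionEnvironment

// Illustrative unweighted, directed edges as (sourceId, targetId) pairs
val pairs: DataSet[(Long, Long)] = env.fromCollection(Seq(
  (1L, 2L),
  (2L, 3L),
  (3L, 4L),
  (4L, 1L)
))

// Neither vertices nor edges carry a value
val graph: Graph[Long, NullValue, NullValue] = Graph.fromTuple2DataSet(pairs, env)
```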

## CSV File Construction

```scala { .api }
def fromCsvReader[K: TypeInformation : ClassTag, VV: TypeInformation : ClassTag, EV: TypeInformation : ClassTag](
    env: ExecutionEnvironment,
    pathEdges: String,
    pathVertices: String = null,
    lineDelimiterVertices: String = "\n",
    fieldDelimiterVertices: String = ",",
    quoteCharacterVertices: Character = null,
    ignoreFirstLineVertices: Boolean = false,
    ignoreCommentsVertices: String = null,
    lenientVertices: Boolean = false,
    includedFieldsVertices: Array[Int] = null,
    lineDelimiterEdges: String = "\n",
    fieldDelimiterEdges: String = ",",
    quoteCharacterEdges: Character = null,
    ignoreFirstLineEdges: Boolean = false,
    ignoreCommentsEdges: String = null,
    lenientEdges: Boolean = false,
    includedFieldsEdges: Array[Int] = null,
    vertexValueInitializer: MapFunction[K, VV] = null
): Graph[K, VV, EV]
```

Creates a graph from CSV files. Each line of the edges file is parsed as a `(sourceId, targetId, edgeValue)` record, and each line of the optional vertices file as a `(vertexId, vertexValue)` record. Parsing is configured separately for each file through the parameters ending in `Vertices` and `Edges`. Every parameter except `env` and `pathEdges` has a default, so the others are usually passed by name.

**Parameters:**

- `env` - Flink execution environment
- `pathEdges` - Path of the file containing the edges (required)
- `pathVertices` - Path of the file containing the vertices (optional, default: `null`)
- `lineDelimiterVertices` - Line separator for the vertices file (default: `"\n"`)
- `fieldDelimiterVertices` - Field separator for the vertices file (default: `","`)
- `quoteCharacterVertices` - Quote character for the vertices file (default: `null`, quoted strings are not parsed)
- `ignoreFirstLineVertices` - Whether to skip the first line (e.g. a header) of the vertices file (default: `false`)
- `ignoreCommentsVertices` - Prefix marking comment lines to skip in the vertices file (default: `null`)
- `lenientVertices` - Whether to silently skip malformed lines in the vertices file (default: `false`)
- `includedFieldsVertices` - Indices of the fields to read from the vertices file (default: `null`)
- `lineDelimiterEdges` - Line separator for the edges file (default: `"\n"`)
- `fieldDelimiterEdges` - Field separator for the edges file (default: `","`)
- `quoteCharacterEdges` - Quote character for the edges file (default: `null`, quoted strings are not parsed)
- `ignoreFirstLineEdges` - Whether to skip the first line (e.g. a header) of the edges file (default: `false`)
- `ignoreCommentsEdges` - Prefix marking comment lines to skip in the edges file (default: `null`)
- `lenientEdges` - Whether to silently skip malformed lines in the edges file (default: `false`)
- `includedFieldsEdges` - Indices of the fields to read from the edges file (default: `null`)
- `vertexValueInitializer` - Function that computes vertex values from vertex IDs when no vertices file is provided (default: `null`)
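
A self-contained sketch of the "edges file only" case, assuming the same classpath setup. To avoid depending on an existing file, it writes an illustrative semicolon-separated edges file with a header row to a temporary location; the file contents, the constant initial value `1.0`, and the variable names are all assumptions made for the example.

```scala
// Assumes flink-scala and flink-gelly-scala are on the classpath
import java.nio.charset.StandardCharsets
import java.nio.file.Files

import org.apache.flink.api.common.functions.MapFunction
import org.apache.flink.api.scala._
import org.apache.flink.graph.scala._

val env = ExecutionEnvironment.getExecutionEnvironment

// Illustrative edges file: a header row, then "source;target;weight" lines
val edgesFile = Files.createTempFile("edges", ".csv")
edgesFile.toFile.deleteOnExit()
Files.write(edgesFile, "src;trg;weight\n1;2;0.5\n2;3;1.5\n".getBytes(StandardCharsets.UTF_8))

// No vertices file: vertex IDs come from the edges, values from the initializer
val graph = Graph.fromCsvReader[Long, Double, Double](
  env = env,
  pathEdges = edgesFile.toString,
  fieldDelimiterEdges = ";",
  ignoreFirstLineEdges = true,
  vertexValueInitializer = new MapFunction[Long, Double] {
    override def map(id: Long): Double = 1.0
  }
)

// Collect (vertexId -> vertexValue) to the client
val vertexValues = graph.getVertices.collect().map(v => (v.getId, v.getValue)).toMap
```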

## Usage Examples

### Basic Graph Creation

```scala
import org.apache.flink.api.scala._
import org.apache.flink.graph.scala._
import org.apache.flink.graph.{Edge, Vertex}

val env = ExecutionEnvironment.getExecutionEnvironment

// From collections
val vertices = List(
  new Vertex(1L, "Node A"),
  new Vertex(2L, "Node B"),
  new Vertex(3L, "Node C")
)

val edges = List(
  new Edge(1L, 2L, 1.0),
  new Edge(2L, 3L, 2.0),
  new Edge(1L, 3L, 3.0)
)

val graph = Graph.fromCollection(vertices, edges, env)
```

### From Tuples

```scala
// Reuses the imports and `env` from the previous example
val vertexTuples = env.fromCollection(List(
  (1L, "A"),
  (2L, "B"),
  (3L, "C")
))

val edgeTuples = env.fromCollection(List(
  (1L, 2L, 1.0),
  (2L, 3L, 2.0)
))

val graph = Graph.fromTupleDataSet(vertexTuples, edgeTuples, env)
```

### From CSV Files

```scala
// Edges file: tab-separated with a header row.
// Vertices file: default settings (comma-separated, no header).
val graph = Graph.fromCsvReader[Long, String, Double](
  env = env,
  pathEdges = "path/to/edges.csv",
  pathVertices = "path/to/vertices.csv",
  fieldDelimiterEdges = "\t",
  ignoreFirstLineEdges = true
)
```