Apache Flink Java API - Core Java APIs for Apache Flink's batch and stream processing framework
npx @tessl/cli install tessl/maven-org-apache-flink--flink-java@1.20.00
# Apache Flink Java API (flink-java)
1
2
Apache Flink Java API provides core Java classes and interfaces for developing batch processing applications using DataSet transformations. This module offers comprehensive APIs for data transformation, aggregation, I/O operations, and execution environments within the Apache Flink ecosystem.
3
4
**⚠️ Deprecation Notice**: The entire DataSet API has been deprecated since Flink 1.18 and will be removed in a future major version. Users are encouraged to migrate to the DataStream API or Table API for new applications.
5
6
## Package Information
7
8
- **Package Name**: org.apache.flink:flink-java
9
- **Package Type**: maven
10
- **Language**: Java
11
- **Installation**: Add to your Maven pom.xml:
12
13
```xml
14
<dependency>
15
<groupId>org.apache.flink</groupId>
16
<artifactId>flink-java</artifactId>
17
<version>1.20.2</version>
18
</dependency>
19
```
20
21
## Core Imports
22
23
```java
24
// Core execution environment
25
import org.apache.flink.api.java.ExecutionEnvironment;
26
import org.apache.flink.api.java.DataSet;
27
28
// Common transformation functions
29
import org.apache.flink.api.common.functions.MapFunction;
30
import org.apache.flink.api.common.functions.FilterFunction;
31
import org.apache.flink.api.common.functions.ReduceFunction;
32
```
33
34
## Basic Usage
35
36
```java
37
import org.apache.flink.api.java.ExecutionEnvironment;
38
import org.apache.flink.api.java.DataSet;
39
import org.apache.flink.api.common.functions.MapFunction;
40
41
public class FlinkBatchExample {
42
public static void main(String[] args) throws Exception {
43
// Create execution environment
44
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
45
46
// Create a DataSet from a collection
47
DataSet<String> input = env.fromElements("hello", "world", "flink", "java");
48
49
// Apply transformations
50
DataSet<String> result = input
51
.map(new MapFunction<String, String>() {
52
@Override
53
public String map(String value) {
54
return value.toUpperCase();
55
}
56
})
57
.filter(value -> value.length() > 4);
58
59
// Execute and print results
60
result.print();
61
62
// Execute the program
63
env.execute("Batch Processing Example");
64
}
65
}
66
```
67
68
## Architecture
69
70
The Flink Java API is built around several key architectural components:
71
72
- **ExecutionEnvironment**: Central context for program execution, providing data source creation and execution control
73
- **DataSet<T>**: Primary data abstraction representing distributed collections of typed elements
74
- **Operators**: Transformation operations (map, filter, reduce, join) that can be chained together to form processing pipelines
75
- **Data Sources and Sinks**: Input and output connectors for reading from and writing to various data stores
76
- **Type System**: Rich type information system ensuring type safety across transformations
77
- **Execution Graph**: Lazy evaluation model where operations are planned and optimized before execution
78
79
## Capabilities
80
81
### Execution Environments
82
83
Core execution contexts for running Flink batch programs, providing methods to create data sources, configure execution parameters, and trigger program execution.
84
85
```java { .api }
86
// Get execution environment (auto-detects local vs remote)
87
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
88
89
// Create local execution environment
90
LocalEnvironment localEnv = ExecutionEnvironment.createLocalEnvironment();
91
92
// Create remote execution environment
93
RemoteEnvironment remoteEnv = ExecutionEnvironment.createRemoteEnvironment(
94
String host, int port, String... jarFiles);
95
```
96
97
[Execution Environments](./execution-environments.md)
98
99
### DataSet Operations
100
101
The primary data abstraction for batch processing, providing a comprehensive set of transformation operations for distributed data processing.
102
103
```java { .api }
104
// Core transformation methods
105
<R> MapOperator<T, R> map(MapFunction<T, R> mapper);
106
<R> FlatMapOperator<T, R> flatMap(FlatMapFunction<T, R> flatter);
107
FilterOperator<T> filter(FilterFunction<T> filter);
108
ReduceOperator<T> reduce(ReduceFunction<T> reducer);
109
DistinctOperator<T> distinct();
110
111
// Grouping and aggregation
112
UnsortedGrouping<T> groupBy(int... fields);
113
UnsortedGrouping<T> groupBy(String... fields);
114
<K> UnsortedGrouping<T> groupBy(KeySelector<T, K> keyExtractor);
115
```
116
117
[DataSet Operations](./dataset-operations.md)
118
119
### Join and CoGroup Operations
120
121
Advanced operations for combining multiple DataSets using various join strategies and coGroup operations.
122
123
```java { .api }
124
// Join operations
125
<R> JoinOperatorSets<T, R> join(DataSet<R> other);
126
<R> CoGroupOperator.CoGroupOperatorSets<T, R> coGroup(DataSet<R> other);
127
<R> CrossOperator.DefaultCross<T, R> cross(DataSet<R> other);
128
129
// Union operations
130
UnionOperator<T> union(DataSet<T> other);
131
```
132
133
[Join and CoGroup Operations](./join-cogroup-operations.md)
134
135
### Data Input and Output
136
137
Comprehensive I/O capabilities for reading from and writing to various data formats and storage systems.
138
139
```java { .api }
140
// Data source creation
141
<T> DataSet<T> fromCollection(Collection<T> data);
142
<T> DataSet<T> fromElements(T... data);
143
DataSet<String> readTextFile(String filePath);
144
CsvReader readCsvFile(String filePath);
145
146
// Data output operations
147
DataSink<T> writeAsText(String filePath);
148
DataSink<T> writeAsCsv(String filePath);
149
DataSink<T> print();
150
```
151
152
[Data Input and Output](./data-input-output.md)
153
154
### Aggregation and Grouping
155
156
Built-in aggregation functions and grouping operations for statistical computations and data summarization.
157
158
```java { .api }
159
// Grouping operations
160
UnsortedGrouping<T> groupBy(int... fields);
161
SortedGrouping<T> sortGroup(int field, Order order);
162
163
// Aggregation operations
164
AggregateOperator<T> sum(int field);
165
AggregateOperator<T> min(int field);
166
AggregateOperator<T> max(int field);
167
AggregateOperator<T> aggregate(Aggregations agg, int field);
168
```
169
170
[Aggregation and Grouping](./aggregation-grouping.md)
171
172
### Iteration Operations
173
174
Support for iterative algorithms including bulk iterations and delta iterations for machine learning and graph processing algorithms.
175
176
```java { .api }
177
// Bulk iteration
178
IterativeDataSet<T> iterate(int maxIterations);
179
DataSet<T> closeWith(DataSet<T> iterationResult);
180
181
// Delta iteration
182
DeltaIteration<T, ?> iterateDelta(DataSet<?> workset, int maxIterations, int... keyFields);
183
```
184
185
[Iteration Operations](./iteration-operations.md)
186
187
### Utility Functions
188
189
Helper utilities for common operations, parameter handling, and data set manipulation.
190
191
```java { .api }
192
// Parameter handling
193
ParameterTool fromArgs(String[] args);
194
ParameterTool fromPropertiesFile(String path);
195
196
// DataSet utilities
197
DataSet<Tuple2<Long, T>> zipWithIndex(DataSet<T> input);
198
DataSet<T> sample(DataSet<T> input, boolean withReplacement, double fraction);
199
```
200
201
[Utility Functions](./utility-functions.md)