or run

npx @tessl/cli init
Log in

Version

Tile

Overview

Evals

Files

tessl/maven-org-apache-flink--flink-java

Apache Flink Java API - Core Java APIs for Apache Flink's batch and stream processing framework

Workspace
tessl
Visibility
Public
Created
Last updated
Describes
mavenpkg:maven/org.apache.flink/flink-java@1.20.x

To install, run

npx @tessl/cli install tessl/maven-org-apache-flink--flink-java@1.20.0

0

# Apache Flink Java API (flink-java)

1

2

Apache Flink Java API provides core Java classes and interfaces for developing batch processing applications using DataSet transformations. This module offers comprehensive APIs for data transformation, aggregation, I/O operations, and execution environments within the Apache Flink ecosystem.

3

4

**⚠️ Deprecation Notice**: The entire DataSet API has been deprecated since Flink 1.18 and will be removed in a future major version. Users are encouraged to migrate to the DataStream API or Table API for new applications.

5

6

## Package Information

7

8

- **Package Name**: org.apache.flink:flink-java

9

- **Package Type**: maven

10

- **Language**: Java

11

- **Installation**: Add to your Maven pom.xml:

12

13

```xml

14

<dependency>

15

<groupId>org.apache.flink</groupId>

16

<artifactId>flink-java</artifactId>

17

<version>1.20.2</version>

18

</dependency>

19

```

20

21

## Core Imports

22

23

```java

24

// Core execution environment

25

import org.apache.flink.api.java.ExecutionEnvironment;

26

import org.apache.flink.api.java.DataSet;

27

28

// Common transformation functions

29

import org.apache.flink.api.common.functions.MapFunction;

30

import org.apache.flink.api.common.functions.FilterFunction;

31

import org.apache.flink.api.common.functions.ReduceFunction;

32

```

33

34

## Basic Usage

35

36

```java

37

import org.apache.flink.api.java.ExecutionEnvironment;

38

import org.apache.flink.api.java.DataSet;

39

import org.apache.flink.api.common.functions.MapFunction;

40

41

public class FlinkBatchExample {

42

public static void main(String[] args) throws Exception {

43

// Create execution environment

44

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

45

46

// Create a DataSet from a collection

47

DataSet<String> input = env.fromElements("hello", "world", "flink", "java");

48

49

// Apply transformations

50

DataSet<String> result = input

51

.map(new MapFunction<String, String>() {

52

@Override

53

public String map(String value) {

54

return value.toUpperCase();

55

}

56

})

57

.filter(value -> value.length() > 4);

58

59

// Execute and print results

60

result.print();

61

62

// Execute the program

63

env.execute("Batch Processing Example");

64

}

65

}

66

```

67

68

## Architecture

69

70

The Flink Java API is built around several key architectural components:

71

72

- **ExecutionEnvironment**: Central context for program execution, providing data source creation and execution control

73

- **DataSet<T>**: Primary data abstraction representing distributed collections of typed elements

74

- **Operators**: Transformation operations (map, filter, reduce, join) that can be chained together to form processing pipelines

75

- **Data Sources and Sinks**: Input and output connectors for reading from and writing to various data stores

76

- **Type System**: Rich type information system ensuring type safety across transformations

77

- **Execution Graph**: Lazy evaluation model where operations are planned and optimized before execution

78

79

## Capabilities

80

81

### Execution Environments

82

83

Core execution contexts for running Flink batch programs, providing methods to create data sources, configure execution parameters, and trigger program execution.

84

85

```java { .api }

86

// Get execution environment (auto-detects local vs remote)

87

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

88

89

// Create local execution environment

90

LocalEnvironment localEnv = ExecutionEnvironment.createLocalEnvironment();

91

92

// Create remote execution environment

93

RemoteEnvironment remoteEnv = ExecutionEnvironment.createRemoteEnvironment(

94

String host, int port, String... jarFiles);

95

```

96

97

[Execution Environments](./execution-environments.md)

98

99

### DataSet Operations

100

101

The primary data abstraction for batch processing, providing a comprehensive set of transformation operations for distributed data processing.

102

103

```java { .api }

104

// Core transformation methods

105

<R> MapOperator<T, R> map(MapFunction<T, R> mapper);

106

<R> FlatMapOperator<T, R> flatMap(FlatMapFunction<T, R> flatter);

107

FilterOperator<T> filter(FilterFunction<T> filter);

108

ReduceOperator<T> reduce(ReduceFunction<T> reducer);

109

DistinctOperator<T> distinct();

110

111

// Grouping and aggregation

112

UnsortedGrouping<T> groupBy(int... fields);

113

UnsortedGrouping<T> groupBy(String... fields);

114

<K> UnsortedGrouping<T> groupBy(KeySelector<T, K> keyExtractor);

115

```

116

117

[DataSet Operations](./dataset-operations.md)

118

119

### Join and CoGroup Operations

120

121

Advanced operations for combining multiple DataSets using various join strategies and coGroup operations.

122

123

```java { .api }

124

// Join operations

125

<R> JoinOperatorSets<T, R> join(DataSet<R> other);

126

<R> CoGroupOperator.CoGroupOperatorSets<T, R> coGroup(DataSet<R> other);

127

<R> CrossOperator.DefaultCross<T, R> cross(DataSet<R> other);

128

129

// Union operations

130

UnionOperator<T> union(DataSet<T> other);

131

```

132

133

[Join and CoGroup Operations](./join-cogroup-operations.md)

134

135

### Data Input and Output

136

137

Comprehensive I/O capabilities for reading from and writing to various data formats and storage systems.

138

139

```java { .api }

140

// Data source creation

141

<T> DataSet<T> fromCollection(Collection<T> data);

142

<T> DataSet<T> fromElements(T... data);

143

DataSet<String> readTextFile(String filePath);

144

CsvReader readCsvFile(String filePath);

145

146

// Data output operations

147

DataSink<T> writeAsText(String filePath);

148

DataSink<T> writeAsCsv(String filePath);

149

DataSink<T> print();

150

```

151

152

[Data Input and Output](./data-input-output.md)

153

154

### Aggregation and Grouping

155

156

Built-in aggregation functions and grouping operations for statistical computations and data summarization.

157

158

```java { .api }

159

// Grouping operations

160

UnsortedGrouping<T> groupBy(int... fields);

161

SortedGrouping<T> sortGroup(int field, Order order);

162

163

// Aggregation operations

164

AggregateOperator<T> sum(int field);

165

AggregateOperator<T> min(int field);

166

AggregateOperator<T> max(int field);

167

AggregateOperator<T> aggregate(Aggregations agg, int field);

168

```

169

170

[Aggregation and Grouping](./aggregation-grouping.md)

171

172

### Iteration Operations

173

174

Support for iterative algorithms including bulk iterations and delta iterations for machine learning and graph processing algorithms.

175

176

```java { .api }

177

// Bulk iteration

178

IterativeDataSet<T> iterate(int maxIterations);

179

DataSet<T> closeWith(DataSet<T> iterationResult);

180

181

// Delta iteration

182

DeltaIteration<T, ?> iterateDelta(DataSet<?> workset, int maxIterations, int... keyFields);

183

```

184

185

[Iteration Operations](./iteration-operations.md)

186

187

### Utility Functions

188

189

Helper utilities for common operations, parameter handling, and data set manipulation.

190

191

```java { .api }

192

// Parameter handling

193

ParameterTool fromArgs(String[] args);

194

ParameterTool fromPropertiesFile(String path);

195

196

// DataSet utilities

197

DataSet<Tuple2<Long, T>> zipWithIndex(DataSet<T> input);

198

DataSet<T> sample(DataSet<T> input, boolean withReplacement, double fraction);

199

```

200

201

[Utility Functions](./utility-functions.md)