0
# Apache Flink Streaming Scala API
1
2
Apache Flink Streaming Scala API provides elegant and fluent Scala APIs for building high-throughput, low-latency stream processing applications with fault-tolerance and exactly-once processing guarantees. This library wraps Flink's Java DataStream API with Scala-idiomatic interfaces, offering type-safe streaming data processing with functional programming constructs.
3
4
## Package Information
5
6
- **Package Name**: flink-streaming-scala_2.11
7
- **Package Type**: maven
8
- **Language**: Scala
9
- **Installation**: Add to your Maven `pom.xml`:
10
11
```xml
12
<dependency>
13
<groupId>org.apache.flink</groupId>
14
<artifactId>flink-streaming-scala_2.11</artifactId>
15
<version>1.14.6</version>
16
</dependency>
17
```
18
19
For SBT:
20
```scala
21
libraryDependencies += "org.apache.flink" %% "flink-streaming-scala_2.11" % "1.14.6"
22
```
23
24
## Core Imports
25
26
```scala
27
import org.apache.flink.streaming.api.scala._
28
import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment
29
```
30
31
## Basic Usage
32
33
```scala
34
import org.apache.flink.streaming.api.scala._
35
36
// Create execution environment
37
val env = StreamExecutionEnvironment.getExecutionEnvironment
38
39
// Create a data stream from a collection
40
val dataStream = env.fromCollection(List(1, 2, 3, 4, 5))
41
42
// Transform the data
43
val result = dataStream
44
.map(_ * 2)
45
.filter(_ > 5)
46
.keyBy(identity)
47
.sum(0)
48
49
// Add sink and execute
50
result.print()
51
env.execute("Basic Flink Job")
52
```
53
54
## Architecture
55
56
The Flink Streaming Scala API is built around several key components:
57
58
- **Execution Environment**: Entry point for creating and configuring streaming jobs
59
- **DataStream**: Core abstraction for unbounded streams of data with transformation operations
60
- **Keyed Streams**: Partitioned streams enabling stateful operations and aggregations
61
- **Windowing**: Time-based and count-based grouping of stream elements
62
- **Functions**: User-defined functions for custom processing logic
63
- **Type System**: Automatic TypeInformation generation for Scala types
64
65
## Capabilities
66
67
### Stream Execution Environment
68
69
Main entry point for creating streaming applications and configuring execution parameters like parallelism, checkpointing, and state backends.
70
71
```scala { .api }
72
class StreamExecutionEnvironment(javaEnv: JavaEnv) {
73
def getExecutionEnvironment: StreamExecutionEnvironment
74
def setParallelism(parallelism: Int): Unit
75
def enableCheckpointing(interval: Long): StreamExecutionEnvironment
76
}
77
```
78
79
[Stream Execution Environment](./stream-execution-environment.md)
80
81
### Data Stream Operations
82
83
Core streaming operations for transforming, filtering, and processing unbounded data streams with type safety and functional programming patterns.
84
85
```scala { .api }
86
class DataStream[T] {
87
def map[R](mapper: T => R): DataStream[R]
88
def filter(predicate: T => Boolean): DataStream[T]
89
def keyBy[K](keySelector: T => K): KeyedStream[T, K]
90
def union(otherStreams: DataStream[T]*): DataStream[T]
91
}
92
```
93
94
[Data Stream](./data-stream.md)
95
96
### Keyed Stream Operations
97
98
Partitioned stream operations enabling stateful computations, aggregations, and key-based processing with state management.
99
100
```scala { .api }
101
class KeyedStream[T, K] {
102
def sum(field: Int): DataStream[T]
103
def reduce(reducer: (T, T) => T): DataStream[T]
104
def window[W <: Window](assigner: WindowAssigner[T, W]): WindowedStream[T, K, W]
105
def process[R](function: KeyedProcessFunction[K, T, R]): DataStream[R]
106
}
107
```
108
109
[Keyed Stream](./keyed-stream.md)
110
111
### Windowing Operations
112
113
Time-based and count-based grouping of stream elements for aggregations and computations over bounded sets of data.
114
115
```scala { .api }
116
class WindowedStream[T, K, W <: Window] {
117
def reduce(reducer: (T, T) => T): DataStream[T]
118
def aggregate[ACC, R](aggregateFunction: AggregateFunction[T, ACC, R]): DataStream[R]
119
def apply[R](function: WindowFunction[T, R, K, W]): DataStream[R]
120
}
121
```
122
123
[Windowing](./windowing.md)
124
125
### Stream Joining Operations
126
127
Operations for combining multiple data streams based on keys, time windows, or custom join conditions.
128
129
```scala { .api }
130
class JoinedStreams[T1, T2] {
131
def where[KEY](keySelector: T1 => KEY): Where[T1, T2, KEY]
132
def equalTo[KEY](keySelector: T2 => KEY): EqualTo[T1, T2, KEY]
133
def window[W <: Window](assigner: WindowAssigner[TaggedUnion[T1, T2], W]): WithWindow[T1, T2, KEY, W]
134
}
135
```
136
137
[Joining](./joining.md)
138
139
### Async Operations
140
141
Asynchronous I/O operations for non-blocking external system interactions with configurable timeouts and capacity management.
142
143
```scala { .api }
144
object AsyncDataStream {
145
def unorderedWait[IN, OUT](
146
stream: DataStream[IN],
147
function: AsyncFunction[IN, OUT],
148
timeout: Long,
149
timeUnit: TimeUnit
150
): DataStream[OUT]
151
}
152
```
153
154
[Async Operations](./async-operations.md)
155
156
### User-Defined Functions
157
158
Interfaces for creating custom processing functions including window functions, process functions, and rich functions with lifecycle management.
159
160
```scala { .api }
161
trait ProcessFunction[I, O] {
162
def processElement(value: I, ctx: ProcessFunction.Context, out: Collector[O]): Unit
163
}
164
165
trait WindowFunction[IN, OUT, KEY, W <: Window] {
166
def apply(key: KEY, window: W, input: Iterable[IN], out: Collector[OUT]): Unit
167
}
168
```
169
170
[Functions](./functions.md)
171
172
## Types
173
174
```scala { .api }
175
// Core stream types
176
class DataStream[T]
177
class KeyedStream[T, K]
178
class WindowedStream[T, K, W <: Window]
179
class AllWindowedStream[T, W <: Window]
180
class ConnectedStreams[T1, T2]
181
class BroadcastConnectedStream[IN1, IN2]
182
183
// Environment and configuration
184
class StreamExecutionEnvironment
185
class ExecutionConfig
186
class CheckpointConfig
187
188
// Output and utility types
189
class OutputTag[T]
190
trait CloseableIterator[T]
191
```