Apache Flink is an open source stream processing framework with powerful stream- and batch-processing capabilities.
npx @tessl/cli install tessl/maven-org-apache-flink--flink-parent@2.1.0

Apache Flink is a distributed stream processing framework that unifies batch and stream processing with low latency and high throughput. It offers Java APIs for building streaming and batch applications, supports event-time processing with exactly-once guarantees, provides flexible windowing, and includes fault tolerance, natural back-pressure, and custom memory management.
Add the appropriate Flink dependencies to your pom.xml:
DataStream API (Traditional):
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-streaming-java</artifactId>
<version>2.1.0</version>
</dependency>

DataStream API (New v2):
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-datastream-api</artifactId>
<version>2.1.0</version>
</dependency>

Table API & SQL:
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-table-api-java</artifactId>
<version>2.1.0</version>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-table-runtime</artifactId>
<version>2.1.0</version>
</dependency>

Core APIs:
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-core</artifactId>
<version>2.1.0</version>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-core-api</artifactId>
<version>2.1.0</version>
</dependency>

For complete applications, also include:
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-clients</artifactId>
<version>2.1.0</version>
</dependency>

For DataStream API (new v2):
import org.apache.flink.datastream.api.ExecutionEnvironment;
import org.apache.flink.datastream.api.stream.DataStream;
import org.apache.flink.datastream.api.function.OneInputStreamProcessFunction;

For traditional DataStream API:
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.ProcessFunction;

For Table API:
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.api.Table;

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.api.common.functions.MapFunction;
// Create execution environment
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// Create a data stream from a source
DataStream<String> stream = env.socketTextStream("localhost", 9999);
// Transform the data
DataStream<String> transformed = stream
.map(new MapFunction<String, String>() {
@Override
public String map(String value) throws Exception {
return value.toUpperCase();
}
});
// Add a sink
transformed.print();
// Execute the program
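env.execute("Basic Stream Processing");

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.api.Table;
// Create table environment (create() requires settings; streaming mode is used here)
TableEnvironment tableEnv = TableEnvironment.create(EnvironmentSettings.inStreamingMode());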
// Create table from source
tableEnv.executeSql("CREATE TABLE Orders (" +
"order_id BIGINT, " +
"product STRING, " +
"amount DECIMAL(10,2)" +
") WITH (...)");
// Query the table
Table result = tableEnv.sqlQuery(
"SELECT product, SUM(amount) as total_amount " +
"FROM Orders " +
"GROUP BY product"
);
// Execute and print results
result.execute().print();

Apache Flink is built around several key architectural components:
- Execution environments (StreamExecutionEnvironment, TableEnvironment)
- DataStream APIs: traditional (flink-streaming-java) and new v2 (flink-datastream-api)

Fundamental function interfaces and a type system that form the building blocks for all Flink applications. Includes user-defined functions, the tuple system, and core abstractions.
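To make the map and reduce shapes concrete, here is a minimal plain-Java sketch. The nested `MapFunction`, `ReduceFunction`, and `Tuple2` types are local stand-ins that mirror the signatures listed in this section (without the checked exception, for brevity); they are not the Flink classes, and `countOccurrences` is a hypothetical helper that handles a single key only.

```java
import java.util.List;

// Local stand-ins mirroring the shapes in this section; NOT the Flink classes,
// so the sketch compiles and runs without Flink on the classpath.
class FunctionSketch {
    interface MapFunction<T, O> { O map(T value); }
    interface ReduceFunction<T> { T reduce(T value1, T value2); }

    static final class Tuple2<T0, T1> {
        public T0 f0;
        public T1 f1;
        Tuple2(T0 f0, T1 f1) { this.f0 = f0; this.f1 = f1; }
    }

    // Maps every word to (word, 1), then reduces the pairs by summing the counts.
    // Assumes all words share one key; returns null for an empty input.
    static Tuple2<String, Integer> countOccurrences(List<String> words) {
        MapFunction<String, Tuple2<String, Integer>> toPair = w -> new Tuple2<>(w, 1);
        ReduceFunction<Tuple2<String, Integer>> sum =
                (a, b) -> new Tuple2<>(a.f0, a.f1 + b.f1);
        Tuple2<String, Integer> acc = null;
        for (String w : words) {
            Tuple2<String, Integer> next = toPair.map(w);
            acc = (acc == null) ? next : sum.reduce(acc, next);
        }
        return acc;
    }

    public static void main(String[] args) {
        Tuple2<String, Integer> result = countOccurrences(List.of("flink", "flink", "flink"));
        System.out.println(result.f0 + " -> " + result.f1);
    }
}
```

In a real job the framework, not a loop, would invoke these functions, and the reduce would run once per key.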
// Core function interfaces
interface Function {}
interface MapFunction<T, O> extends Function {
O map(T value) throws Exception;
}
interface ReduceFunction<T> extends Function {
T reduce(T value1, T value2) throws Exception;
}
// Tuple system
class Tuple2<T0, T1> {
public T0 f0;
public T1 f1;
public Tuple2(T0 f0, T1 f1);
}

Comprehensive state management API supporting both synchronous and asynchronous operations. Includes value state, list state, map state, and specialized state types for different use cases.
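As a rough illustration of how value state is typically used, the sketch below keeps a per-key running count. It runs without Flink: `ValueState` here is a local stand-in with the same two methods as the interface in this section, and `InMemoryValueState` with its `setCurrentKey` method is a hypothetical in-memory backend invented for this example, not how Flink stores state.

```java
import java.util.HashMap;
import java.util.Map;

class KeyedStateSketch {
    // Stand-in with the same two methods as the ValueState shape in this section.
    interface ValueState<T> {
        T value();
        void update(T value);
    }

    // Hypothetical backend: one stored value per key, scoped by setCurrentKey.
    static final class InMemoryValueState<K, T> implements ValueState<T> {
        private final Map<K, T> store = new HashMap<>();
        private K currentKey;

        void setCurrentKey(K key) { this.currentKey = key; }

        @Override public T value() { return store.get(currentKey); } // null if never set
        @Override public void update(T value) { store.put(currentKey, value); }
    }

    // Increments and returns the count for the given key.
    static long countEvent(InMemoryValueState<String, Long> state, String key) {
        state.setCurrentKey(key);
        Long current = state.value();
        long next = (current == null) ? 1L : current + 1L;
        state.update(next);
        return next;
    }

    public static void main(String[] args) {
        InMemoryValueState<String, Long> state = new InMemoryValueState<>();
        countEvent(state, "a");
        countEvent(state, "b");
        System.out.println(countEvent(state, "a"));
    }
}
```

The point of the sketch is the read, modify, update cycle and the fact that each key sees only its own value.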
interface State {}
interface ValueState<T> extends State {
T value() throws Exception;
void update(T value) throws Exception;
}
interface ListState<T> extends State {
Iterable<T> get() throws Exception;
void add(T value) throws Exception;
}

Next-generation DataStream API with improved type safety, better performance, and enhanced functionality. Provides a streamlined programming model for stream processing applications.
interface ExecutionEnvironment {
<T> DataStream<T> fromSource(Source<T> source);
}
interface DataStream<T> {
<OUT> DataStream<OUT> process(OneInputStreamProcessFunction<T, OUT> function);
<K> KeyedPartitionStream<K, T> keyBy(KeySelector<T, K> keySelector);
}

Traditional DataStream API providing comprehensive stream processing capabilities with windowing, state management, and complex event processing features.
class StreamExecutionEnvironment {
static StreamExecutionEnvironment getExecutionEnvironment();
<T> DataStream<T> addSource(SourceFunction<T> function);
JobExecutionResult execute(String jobName) throws Exception;
}

Declarative programming model for relational data processing with SQL support, catalog integration, and a comprehensive type system for structured data operations.
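For readers new to the declarative style, the following plain-Java sketch computes what the earlier `SELECT product, SUM(amount) ... GROUP BY product` query expresses, over an in-memory list. The `Order` class and `totalByProduct` method are hypothetical names for this illustration; a real Table API job runs the query continuously over a table rather than over a finite list.

```java
import java.math.BigDecimal;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class GroupBySketch {
    // Hypothetical in-memory row with the columns of the Orders table.
    static final class Order {
        final long orderId;
        final String product;
        final BigDecimal amount;

        Order(long orderId, String product, BigDecimal amount) {
            this.orderId = orderId;
            this.product = product;
            this.amount = amount;
        }

        String product() { return product; }
        BigDecimal amount() { return amount; }
    }

    // Groups orders by product and sums the amounts within each group.
    static Map<String, BigDecimal> totalByProduct(List<Order> orders) {
        return orders.stream().collect(Collectors.groupingBy(
                Order::product,
                TreeMap::new,
                Collectors.reducing(BigDecimal.ZERO, Order::amount, BigDecimal::add)));
    }

    public static void main(String[] args) {
        List<Order> orders = List.of(
                new Order(1L, "apple", new BigDecimal("2.50")),
                new Order(2L, "apple", new BigDecimal("1.25")),
                new Order(3L, "pear", new BigDecimal("4.00")));
        System.out.println(totalByProduct(orders));
    }
}
```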
interface TableEnvironment {
Table sqlQuery(String query);
TableResult executeSql(String statement);
Table from(String path);
}
interface Table {
Table select(Expression... fields);
Table where(Expression predicate);
TableResult execute();
}

Complete windowing system for time-based and count-based data aggregation, supporting event time and processing time semantics with customizable triggers and evictors.
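The arithmetic behind time-window assignment can be sketched in plain Java. This is an illustration under stated assumptions (epoch-millisecond timestamps, no window offset, half-open windows `[start, start + size)`), not Flink's assigner code; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class WindowAssignmentSketch {
    // Start of the single tumbling window that contains the timestamp.
    static long tumblingWindowStart(long timestamp, long size) {
        return timestamp - Math.floorMod(timestamp, size);
    }

    // Starts of every sliding window [start, start + size) that contains the
    // timestamp, from the latest window to the earliest.
    static List<Long> slidingWindowStarts(long timestamp, long size, long slide) {
        List<Long> starts = new ArrayList<>();
        long lastStart = timestamp - Math.floorMod(timestamp, slide);
        for (long start = lastStart; start > timestamp - size; start -= slide) {
            starts.add(start);
        }
        return starts;
    }

    public static void main(String[] args) {
        System.out.println(tumblingWindowStart(12_345L, 5_000L));
        System.out.println(slidingWindowStarts(12_345L, 10_000L, 5_000L));
    }
}
```

A tumbling window places each element in exactly one window, while a sliding window with `size` larger than `slide` places it in several overlapping ones.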
class TumblingEventTimeWindows extends WindowAssigner<Object, TimeWindow> {
// Window sizes are java.time.Duration; the legacy Time class is not available in Flink 2.x
static TumblingEventTimeWindows of(Duration size);
}
class SlidingEventTimeWindows extends WindowAssigner<Object, TimeWindow> {
static SlidingEventTimeWindows of(Duration size, Duration slide);
}

Unified connector framework for integrating with external systems, supporting both source and sink operations with exactly-once processing guarantees.
interface Source<T> {
SourceReader<T, ?> createReader(SourceReaderContext readerContext);
}
interface Sink<InputT> {
SinkWriter<InputT> createWriter(WriterInitContext context);
}

Configuration system and utility classes for program execution, memory management, parameter handling, and system integration.
class Configuration {
<T> T get(ConfigOption<T> option);
<T> Configuration set(ConfigOption<T> option, T value);
}
class MemorySize {
static MemorySize parse(String text);
long getBytes();
}
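To show the kind of conversion a memory-size utility performs, here is a simplified plain-Java parser. It is a sketch, not Flink's `MemorySize.parse`: it assumes only the units `b`, `kb`, `mb`, and `gb` with binary (1024-based) multiples, and `parseBytes` is a hypothetical name. The real parser accepts more unit spellings.

```java
import java.util.Locale;

class MemorySizeSketch {
    // Parses "<number><unit>" into bytes; a missing unit means bytes.
    static long parseBytes(String text) {
        String s = text.trim().toLowerCase(Locale.ROOT);
        int i = 0;
        while (i < s.length() && Character.isDigit(s.charAt(i))) {
            i++;
        }
        if (i == 0) {
            throw new IllegalArgumentException("missing number: " + text);
        }
        long value = Long.parseLong(s.substring(0, i));
        String unit = s.substring(i).trim();
        switch (unit) {
            case "":
            case "b":
                return value;
            case "kb":
                return value * 1024L;
            case "mb":
                return value * 1024L * 1024L;
            case "gb":
                return value * 1024L * 1024L * 1024L;
            default:
                throw new IllegalArgumentException("unknown unit: " + unit);
        }
    }

    public static void main(String[] args) {
        System.out.println(parseBytes("64mb"));
    }
}
```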