# Flink Filesystem Connector

The Apache Flink filesystem connector provides fault-tolerant rolling file sinks for streaming data to HDFS and other Hadoop-compatible filesystems. It integrates with Flink's checkpointing mechanism to offer exactly-once processing guarantees, and it supports multiple file formats and bucketing strategies.

## Package Information

- **Package Name**: flink-connector-filesystem_2.10
- **Package Type**: Maven
- **Language**: Java (Scala 2.10 compatibility)
- **Version**: 1.3.3
- **Installation**: Add Maven dependency:

```xml
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-filesystem_2.10</artifactId>
    <version>1.3.3</version>
</dependency>
```

## Core Imports

```java
// Modern BucketingSink (recommended)
import org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink;
import org.apache.flink.streaming.connectors.fs.bucketing.DateTimeBucketer;
import org.apache.flink.streaming.connectors.fs.bucketing.BasePathBucketer;

// Writers
import org.apache.flink.streaming.connectors.fs.StringWriter;
import org.apache.flink.streaming.connectors.fs.SequenceFileWriter;
import org.apache.flink.streaming.connectors.fs.AvroKeyValueSinkWriter;

// Legacy RollingSink (deprecated)
import org.apache.flink.streaming.connectors.fs.RollingSink;
```

## Basic Usage

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.fs.StringWriter;
import org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink;

// Create a sink that writes strings under the given base path (for example, a directory on HDFS)
BucketingSink<String> sink = new BucketingSink<>("/tmp/flink-output");
sink.setWriter(new StringWriter<String>());
sink.setBatchSize(1024L * 1024L * 400L); // roll part files at 400 MB

// Add to streaming job
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
DataStream<String> textStream = env.fromElements("a", "b", "c"); // example source; replace with your data stream
textStream.addSink(sink);
```

## Architecture

The connector provides two main sink implementations:

1. **BucketingSink** (recommended) - Modern implementation supporting multiple concurrent buckets with flexible bucketing strategies
2. **RollingSink** (deprecated) - Legacy implementation with a single active bucket

Key components include:
- **Sinks**: Main entry points for streaming data to filesystems
- **Writers**: Handle actual file I/O for different formats (text, SequenceFile, Avro)
- **Bucketers**: Determine file organization strategies (time-based, custom)
- **File State Management**: Tracks file lifecycle (in-progress → pending → finished)

## Capabilities

### [Sink Implementations](./sinks.md)

Configure and use BucketingSink and RollingSink for fault-tolerant file writing with various batching and bucketing options.

```java { .api }
// BucketingSink - modern implementation
public class BucketingSink<T> extends RichSinkFunction<T>
public BucketingSink(String basePath)
public BucketingSink<T> setBatchSize(long batchSize)
public BucketingSink<T> setBucketer(Bucketer<T> bucketer)
public BucketingSink<T> setWriter(Writer<T> writer)
```
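
To make the batching and fault-tolerance behaviour more concrete, here is a minimal stand-alone model rather than the connector's actual code. It assumes that a part file is rolled once its size exceeds the batch size, and that a rolled (pending) file becomes finished only when a checkpoint completes. The names `PartFileLifecycleSketch`, `PartFile`, `write`, and `onCheckpointComplete` are hypothetical and exist only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified, hypothetical model of the part-file lifecycle; not the connector's code. */
final class PartFileLifecycleSketch {
    enum State { IN_PROGRESS, PENDING, FINISHED }

    /** One part file with its lifecycle state and the bytes written so far. */
    static final class PartFile {
        final int index;
        State state = State.IN_PROGRESS;
        long bytes = 0;

        PartFile(int index) {
            this.index = index;
        }
    }

    private final long batchSize;
    private final List<PartFile> files = new ArrayList<>();
    private PartFile current;

    PartFileLifecycleSketch(long batchSize) {
        this.batchSize = batchSize;
    }

    /** Records a write, rolling to a new part file once the current one exceeds batchSize. */
    void write(long recordBytes) {
        if (current == null || current.bytes > batchSize) {
            if (current != null) {
                current.state = State.PENDING; // closed, waiting for a checkpoint
            }
            current = new PartFile(files.size());
            files.add(current);
        }
        current.bytes += recordBytes;
    }

    /** Assumed checkpoint hook: every pending file becomes finished. */
    void onCheckpointComplete() {
        for (PartFile file : files) {
            if (file.state == State.PENDING) {
                file.state = State.FINISHED;
            }
        }
    }

    List<PartFile> files() {
        return files;
    }

    public static void main(String[] args) {
        PartFileLifecycleSketch sink = new PartFileLifecycleSketch(100);
        sink.write(60);
        sink.write(60); // part 0 now holds 120 bytes, above the batch size
        sink.write(10); // triggers a roll: part 0 becomes PENDING, part 1 is opened
        sink.onCheckpointComplete(); // part 0 becomes FINISHED
        for (PartFile file : sink.files()) {
            System.out.println("part-" + file.index + " " + file.state + " " + file.bytes + " bytes");
        }
    }
}
```

In this model a file that is still being written stays in progress across checkpoints, which is one way to picture why only finished files should be treated as safe to read downstream.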

### [File Writers](./writers.md)

Different writer implementations for various file formats including text, Hadoop SequenceFiles, and Avro.

```java { .api }
// Writer interface and implementations
public interface Writer<T> extends Serializable
public class StringWriter<T> extends StreamWriterBase<T>
public class SequenceFileWriter<K extends Writable, V extends Writable> extends StreamWriterBase<Tuple2<K, V>>
public class AvroKeyValueSinkWriter<K, V> extends StreamWriterBase<Tuple2<K, V>>
```
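
As a rough picture of what a text writer does, the sketch below writes each element's `toString()` followed by a newline to an output stream. It is a plain-Java stand-in and does not use the connector's `Writer` interface; the class name `LineWriterSketch` is hypothetical, and the newline-per-element behaviour and UTF-8 encoding are assumptions made for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for a line-oriented text writer; not the connector's StringWriter. */
final class LineWriterSketch<T> {
    private final OutputStream out;
    private final Charset charset;

    LineWriterSketch(OutputStream out, Charset charset) {
        this.out = out;
        this.charset = charset;
    }

    /** Writes the element's string form followed by a single newline. */
    void write(T element) {
        try {
            out.write(element.toString().getBytes(charset));
            out.write('\n');
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        LineWriterSketch<Object> writer = new LineWriterSketch<>(buffer, StandardCharsets.UTF_8);
        writer.write("hello");
        writer.write(42);
        // Prints two lines: "hello" and "42"
        System.out.print(new String(buffer.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

Formats such as SequenceFile or Avro would replace the body of `write` with their own serialization while the sink keeps control of when files are opened and closed.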

### [Bucketing Strategies](./bucketers.md)

Organize output files into buckets based on time, custom logic, or no bucketing at all.

```java { .api }
// Bucketing interface and implementations
public interface Bucketer<T> extends Serializable
public class DateTimeBucketer<T> implements Bucketer<T>
public class BasePathBucketer<T> implements Bucketer<T>
```
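
Time-based bucketing can be pictured as formatting a timestamp into a directory name under the base path. The following stand-alone sketch shows that idea without the connector's `Bucketer` interface; `DateTimeBucketPathSketch` and `bucketPath` are hypothetical names, and the hourly pattern `yyyy-MM-dd--HH` and the UTC time zone are assumptions chosen for the example.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

/** Hypothetical illustration of time-based bucket paths; not the connector's DateTimeBucketer. */
final class DateTimeBucketPathSketch {
    private final SimpleDateFormat formatter;

    DateTimeBucketPathSketch(String formatString, TimeZone zone) {
        this.formatter = new SimpleDateFormat(formatString, Locale.ROOT);
        this.formatter.setTimeZone(zone);
    }

    /** Returns the base path with a sub-directory derived from the timestamp. */
    String bucketPath(String basePath, long timestampMillis) {
        return basePath + "/" + formatter.format(new Date(timestampMillis));
    }

    public static void main(String[] args) {
        DateTimeBucketPathSketch bucketer =
                new DateTimeBucketPathSketch("yyyy-MM-dd--HH", TimeZone.getTimeZone("UTC"));
        // prints /tmp/flink-output/1970-01-01--00
        System.out.println(bucketer.bucketPath("/tmp/flink-output", 0L));
        // One hour later the element lands in the next bucket:
        // prints /tmp/flink-output/1970-01-01--01
        System.out.println(bucketer.bucketPath("/tmp/flink-output", 3_600_000L));
    }
}
```

With this pattern, all elements processed within the same hour share a bucket. Returning the base path unchanged corresponds to using no bucketing at all.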

### [Utility Classes](./utilities.md)

Supporting interfaces and classes including Clock implementations for time-based operations.

```java { .api }
// Utility interfaces and implementations
public interface Clock
public class SystemClock implements Clock
```
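
A clock abstraction is mainly useful because it lets time-based logic be tested deterministically. The sketch below mirrors that idea in plain Java; the nested `Clock` interface here is a local stand-in whose single method `currentTimeMillis()` is an assumption, and `FixedClock` and `hasElapsed` are hypothetical helpers that are not part of the connector.

```java
/** Hypothetical illustration of a pluggable clock; the connector's own Clock may differ. */
final class ClockSketch {
    /** Local stand-in for a clock abstraction. */
    interface Clock {
        long currentTimeMillis();
    }

    /** Delegates to the wall clock. */
    static final class SystemClock implements Clock {
        @Override
        public long currentTimeMillis() {
            return System.currentTimeMillis();
        }
    }

    /** Manually advanced clock for deterministic tests. */
    static final class FixedClock implements Clock {
        private long now;

        FixedClock(long now) {
            this.now = now;
        }

        void advance(long millis) {
            now += millis;
        }

        @Override
        public long currentTimeMillis() {
            return now;
        }
    }

    /** Returns true once more than thresholdMillis have passed since sinceMillis. */
    static boolean hasElapsed(Clock clock, long sinceMillis, long thresholdMillis) {
        return clock.currentTimeMillis() - sinceMillis > thresholdMillis;
    }

    public static void main(String[] args) {
        FixedClock clock = new FixedClock(1_000L);
        System.out.println(hasElapsed(clock, 1_000L, 500L)); // prints false
        clock.advance(501L);
        System.out.println(hasElapsed(clock, 1_000L, 500L)); // prints true
    }
}
```

Swapping `FixedClock` for `SystemClock` leaves the time-based check unchanged, which is the main design benefit of injecting the clock rather than calling `System.currentTimeMillis()` directly.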