or run

npx @tessl/cli init
Log in

Version

Tile

Overview

Evals

Files

tessl/maven-org-apache-flink--flink-connector-filesystem-2-10

Flink filesystem connector providing fault-tolerant rolling file sinks for streaming data to HDFS and Hadoop-compatible filesystems

Workspace
tessl
Visibility
Public
Created
Last updated
Describes
mavenpkg:maven/org.apache.flink/flink-connector-filesystem_2.10@1.3.x

To install, run

npx @tessl/cli install tessl/maven-org-apache-flink--flink-connector-filesystem-2-10@1.3.0

0

# Flink Filesystem Connector

1

2

Apache Flink filesystem connector provides fault-tolerant rolling file sinks for streaming data to HDFS and other Hadoop-compatible filesystems. It offers exactly-once processing guarantees through integration with Flink's checkpointing mechanism and supports multiple file formats and bucketing strategies.

3

4

## Package Information

5

6

- **Package Name**: flink-connector-filesystem_2.10

7

- **Package Type**: Maven

8

- **Language**: Java (Scala 2.10 compatibility)

9

- **Version**: 1.3.3

10

- **Installation**: Add Maven dependency:

11

12

```xml

13

<dependency>

14

<groupId>org.apache.flink</groupId>

15

<artifactId>flink-connector-filesystem_2.10</artifactId>

16

<version>1.3.3</version>

17

</dependency>

18

```

19

20

## Core Imports

21

22

```java

23

// Modern BucketingSink (recommended)

24

import org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink;

25

import org.apache.flink.streaming.connectors.fs.bucketing.DateTimeBucketer;

26

import org.apache.flink.streaming.connectors.fs.bucketing.BasePathBucketer;

27

28

// Writers

29

import org.apache.flink.streaming.connectors.fs.StringWriter;

30

import org.apache.flink.streaming.connectors.fs.SequenceFileWriter;

31

import org.apache.flink.streaming.connectors.fs.AvroKeyValueSinkWriter;

32

33

// Legacy RollingSink (deprecated)

34

import org.apache.flink.streaming.connectors.fs.RollingSink;

35

```

36

37

## Basic Usage

38

39

```java

40

import org.apache.flink.streaming.api.datastream.DataStream;

41

import org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink;

42

import org.apache.flink.streaming.connectors.fs.StringWriter;

43

44

// Create a sink to write strings to HDFS

45

BucketingSink<String> sink = new BucketingSink<>("/tmp/flink-output");

46

sink.setWriter(new StringWriter<String>());

47

sink.setBatchSize(1024 * 1024 * 400); // 400 MB

48

49

// Add to streaming job

50

DataStream<String> textStream = //... your data stream

51

textStream.addSink(sink);

52

```

53

54

## Architecture

55

56

The connector provides two main sink implementations:

57

58

1. **BucketingSink** (recommended) - Modern implementation supporting multiple concurrent buckets with flexible bucketing strategies

59

2. **RollingSink** (deprecated) - Legacy implementation with single active bucket

60

61

Key components include:

62

- **Sinks**: Main entry points for streaming data to filesystems

63

- **Writers**: Handle actual file I/O for different formats (text, SequenceFile, Avro)

64

- **Bucketers**: Determine file organization strategies (time-based, custom)

65

- **File State Management**: Tracks file lifecycle (in-progress → pending → finished)

66

67

## Capabilities

68

69

### [Sink Implementations](./sinks.md)

70

Configure and use BucketingSink and RollingSink for fault-tolerant file writing with various batching and bucketing options.

71

72

```java { .api }

73

// BucketingSink - modern implementation

74

public class BucketingSink<T> extends RichSinkFunction<T>

75

public BucketingSink(String basePath)

76

public BucketingSink<T> setBatchSize(long batchSize)

77

public BucketingSink<T> setBucketer(Bucketer<T> bucketer)

78

public BucketingSink<T> setWriter(Writer<T> writer)

79

```

80

81

### [File Writers](./writers.md)

82

Different writer implementations for various file formats including text, Hadoop SequenceFiles, and Avro.

83

84

```java { .api }

85

// Writer interface and implementations

86

public interface Writer<T> extends Serializable

87

public class StringWriter<T> extends StreamWriterBase<T>

88

public class SequenceFileWriter<K extends Writable, V extends Writable> extends StreamWriterBase<Tuple2<K, V>>

89

public class AvroKeyValueSinkWriter<K, V> extends StreamWriterBase<Tuple2<K, V>>

90

```

91

92

### [Bucketing Strategies](./bucketers.md)

93

Organize output files into buckets based on time, custom logic, or no bucketing at all.

94

95

```java { .api }

96

// Bucketing interface and implementations

97

public interface Bucketer<T> extends Serializable

98

public class DateTimeBucketer<T> implements Bucketer<T>

99

public class BasePathBucketer<T> implements Bucketer<T>

100

```

101

102

### [Utility Classes](./utilities.md)

103

Supporting interfaces and classes including Clock implementations for time-based operations.

104

105

```java { .api }

106

// Utility interfaces and implementations

107

public interface Clock

108

public class SystemClock implements Clock

109

```