or run

npx @tessl/cli init
Log in

Version

Tile

Overview

Evals

Files

docs

bucketers.mdindex.mdsinks.mdutilities.mdwriters.md
tile.json

tessl/maven-org-apache-flink--flink-connector-filesystem-2-10

Flink filesystem connector providing fault-tolerant rolling file sinks for streaming data to HDFS and Hadoop-compatible filesystems

Workspace
tessl
Visibility
Public
Created
Last updated
Describes
mavenpkg:maven/org.apache.flink/flink-connector-filesystem_2.10@1.3.x

To install, run

npx @tessl/cli install tessl/maven-org-apache-flink--flink-connector-filesystem-2-10@1.3.0

index.mddocs/

Flink Filesystem Connector

Apache Flink filesystem connector provides fault-tolerant rolling file sinks for streaming data to HDFS and other Hadoop-compatible filesystems. It offers exactly-once processing guarantees through integration with Flink's checkpointing mechanism and supports multiple file formats and bucketing strategies.

Package Information

  • Package Name: flink-connector-filesystem_2.10
  • Package Type: Maven
  • Language: Java (Scala 2.10 compatibility)
  • Version: 1.3.3
  • Installation: Add Maven dependency:
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-filesystem_2.10</artifactId>
    <version>1.3.3</version>
</dependency>

Core Imports

// Modern BucketingSink (recommended)
import org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink;
import org.apache.flink.streaming.connectors.fs.bucketing.DateTimeBucketer;
import org.apache.flink.streaming.connectors.fs.bucketing.BasePathBucketer;

// Writers
import org.apache.flink.streaming.connectors.fs.StringWriter;
import org.apache.flink.streaming.connectors.fs.SequenceFileWriter;
import org.apache.flink.streaming.connectors.fs.AvroKeyValueSinkWriter;

// Legacy RollingSink (deprecated)
import org.apache.flink.streaming.connectors.fs.RollingSink;

Basic Usage

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink;
import org.apache.flink.streaming.connectors.fs.StringWriter;

// Create a sink to write strings to HDFS
BucketingSink<String> sink = new BucketingSink<>("/tmp/flink-output");
sink.setWriter(new StringWriter<String>());
sink.setBatchSize(1024 * 1024 * 400); // 400 MB

// Add to streaming job
DataStream<String> textStream = //... your data stream
textStream.addSink(sink);

Architecture

The connector provides two main sink implementations:

  1. BucketingSink (recommended) - Modern implementation supporting multiple concurrent buckets with flexible bucketing strategies
  2. RollingSink (deprecated) - Legacy implementation with single active bucket

Key components include:

  • Sinks: Main entry points for streaming data to filesystems
  • Writers: Handle actual file I/O for different formats (text, SequenceFile, Avro)
  • Bucketers: Determine file organization strategies (time-based, custom)
  • File State Management: Tracks file lifecycle (in-progress → pending → finished)

Capabilities

Sink Implementations

Configure and use BucketingSink and RollingSink for fault-tolerant file writing with various batching and bucketing options.

// BucketingSink - modern implementation
public class BucketingSink<T> extends RichSinkFunction<T>
public BucketingSink(String basePath)
public BucketingSink<T> setBatchSize(long batchSize)
public BucketingSink<T> setBucketer(Bucketer<T> bucketer)
public BucketingSink<T> setWriter(Writer<T> writer)

File Writers

Different writer implementations for various file formats including text, Hadoop SequenceFiles, and Avro.

// Writer interface and implementations
public interface Writer<T> extends Serializable
public class StringWriter<T> extends StreamWriterBase<T>
public class SequenceFileWriter<K extends Writable, V extends Writable> extends StreamWriterBase<Tuple2<K, V>>
public class AvroKeyValueSinkWriter<K, V> extends StreamWriterBase<Tuple2<K, V>>

Bucketing Strategies

Organize output files into buckets based on time, custom logic, or no bucketing at all.

// Bucketing interface and implementations
public interface Bucketer<T> extends Serializable
public class DateTimeBucketer<T> implements Bucketer<T>
public class BasePathBucketer<T> implements Bucketer<T>

Utility Classes

Supporting interfaces and classes including Clock implementations for time-based operations.

// Utility interfaces and implementations
public interface Clock
public class SystemClock implements Clock