CtrlK
BlogDocsLog inGet started
Tessl Logo

tessl/maven-org-apache-flink--flink-statebackend-rocksdb

RocksDB state backend for Apache Flink - provides persistent state storage using RocksDB as the underlying storage engine for stateful stream processing applications

Pending
Overview
Eval results
Files

Apache Flink RocksDB State Backend

The Apache Flink RocksDB State Backend provides persistent, fault-tolerant state storage for stream processing applications using RocksDB as the underlying embedded database. It enables high-throughput stateful computations with support for very large state sizes that exceed available memory, making it essential for production-scale streaming applications.

Package Information

Maven Dependency:

<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-statebackend-rocksdb</artifactId>
    <version>2.1.0</version>
</dependency>

Primary Package: org.apache.flink.state.rocksdb
Deprecated Package: org.apache.flink.contrib.streaming.state (use primary package for new development)

Core Imports

import org.apache.flink.state.rocksdb.EmbeddedRocksDBStateBackend;
import org.apache.flink.state.rocksdb.RocksDBOptionsFactory;
import org.apache.flink.state.rocksdb.ConfigurableRocksDBOptionsFactory;
import org.apache.flink.state.rocksdb.PredefinedOptions;
import org.apache.flink.state.rocksdb.RocksDBMemoryConfiguration;
import org.apache.flink.state.rocksdb.RocksDBNativeMetricOptions;
import org.apache.flink.state.rocksdb.RocksDBOptions;
import org.apache.flink.state.rocksdb.RocksDBConfigurableOptions;

Basic Usage

Simple Setup

// Create RocksDB state backend with default settings
EmbeddedRocksDBStateBackend backend = new EmbeddedRocksDBStateBackend();

// Configure local storage path
backend.setDbStoragePath("/path/to/rocksdb");

// Apply to StreamExecutionEnvironment
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setStateBackend(backend);

Incremental Checkpointing

// Enable incremental checkpointing for better performance
EmbeddedRocksDBStateBackend backend = new EmbeddedRocksDBStateBackend(true);
backend.setDbStoragePath("/path/to/rocksdb");

// Use predefined optimization for your hardware
backend.setPredefinedOptions(PredefinedOptions.SPINNING_DISK_OPTIMIZED);

Memory Management

EmbeddedRocksDBStateBackend backend = new EmbeddedRocksDBStateBackend();

// Configure memory usage
RocksDBMemoryConfiguration memConfig = backend.getMemoryConfiguration();
memConfig.setUseManagedMemory(true);
memConfig.setWriteBufferRatio(0.4);
memConfig.setHighPriorityPoolRatio(0.2);

Architecture Overview

Key Components

The RocksDB State Backend consists of several key components working together:

  1. EmbeddedRocksDBStateBackend - Main entry point and configuration hub
  2. RocksDB Options System - Configurable performance tuning via factories and predefined options
  3. Memory Management - Sophisticated memory allocation and caching strategies
  4. Metrics Integration - Native RocksDB metrics forwarded to Flink's monitoring system
  5. Checkpointing Integration - Incremental and full checkpoint support with the Flink runtime

State Storage Model

┌─────────────────────────────────────┐
│        Flink Application            │
├─────────────────────────────────────┤
│     State Backend Interface         │
├─────────────────────────────────────┤
│   EmbeddedRocksDBStateBackend       │
├─────────────────────────────────────┤
│        RocksDB Engine               │
├─────────────────────────────────────┤
│      Local File System              │
└─────────────────────────────────────┘

State is organized as:

  • Key Groups: Distributed across parallel instances
  • Column Families: Separate RocksDB column families per state
  • Memory Layers: Write buffers, block cache, and file system
  • Checkpoints: Incremental snapshots to external storage

Capability Summaries

Core State Backend Configuration

Main EmbeddedRocksDBStateBackend class with configuration options for storage paths, checkpointing modes, and integration with Flink runtime.

Key APIs:

public class EmbeddedRocksDBStateBackend implements StateBackend {
    public EmbeddedRocksDBStateBackend();
    public EmbeddedRocksDBStateBackend(boolean enableIncrementalCheckpointing);
    public EmbeddedRocksDBStateBackend(TernaryBoolean enableIncrementalCheckpointing);
    public void setDbStoragePath(String path);
    public void setDbStoragePaths(String... paths);
}

RocksDB Options and Factories

Flexible options system using factory patterns for customizing RocksDB behavior, including predefined optimizations and custom configurations.

Key APIs:

public interface RocksDBOptionsFactory extends Serializable {
    DBOptions createDBOptions(DBOptions currentOptions, Collection<AutoCloseable> handlesToClose);
    ColumnFamilyOptions createColumnOptions(ColumnFamilyOptions currentOptions, Collection<AutoCloseable> handlesToClose);
}

public enum PredefinedOptions {
    DEFAULT, SPINNING_DISK_OPTIMIZED, SPINNING_DISK_OPTIMIZED_HIGH_MEM, FLASH_SSD_OPTIMIZED
}

Memory Configuration and Management

Sophisticated memory management with support for managed memory integration, configurable caching strategies, and memory-aware optimizations.

Key APIs:

public final class RocksDBMemoryConfiguration {
    public void setUseManagedMemory(boolean useManagedMemory);
    public void setFixedMemoryPerSlot(MemorySize fixedMemoryPerSlot);
    public void setWriteBufferRatio(double writeBufferRatio);
    public void setHighPriorityPoolRatio(double highPriorityPoolRatio);
    public void validate();
}

Metrics and Monitoring

Comprehensive metrics integration that forwards RocksDB native metrics to Flink's metrics system for monitoring performance, memory usage, and operational health.

Key APIs:

public class RocksDBNativeMetricOptions {
    public static final ConfigOption<Boolean> MONITOR_BLOCK_CACHE_HIT;
    public static final ConfigOption<Boolean> ESTIMATE_NUM_KEYS;
    public static final ConfigOption<Boolean> MONITOR_COMPACTION_READ_BYTES;
    // ... many more monitoring options
}

Advanced Features

Incremental Checkpointing

Enables efficient fault recovery by only checkpointing changes since the last checkpoint, reducing I/O overhead for large state.

TTL Support

Automatic cleanup of expired state entries based on configurable time-to-live policies.

Memory Optimization

Advanced memory management including:

  • Managed memory integration with Flink's memory model
  • Configurable write buffer and block cache sizing
  • Memory-aware compaction strategies

Production Monitoring

Native RocksDB metrics integration providing insights into:

  • Memory usage patterns
  • I/O performance
  • Compaction behavior
  • Cache effectiveness

Migration from Deprecated Package

All classes in org.apache.flink.contrib.streaming.state are deprecated. Update imports to use org.apache.flink.state.rocksdb:

// Old (deprecated)
import org.apache.flink.contrib.streaming.state.EmbeddedRocksDBStateBackend;

// New (recommended)
import org.apache.flink.state.rocksdb.EmbeddedRocksDBStateBackend;

Related Documentation

  • Flink State Backend Documentation
  • RocksDB Configuration Guide

Install with Tessl CLI

npx tessl i tessl/maven-org-apache-flink--flink-statebackend-rocksdb
Workspace
tessl
Visibility
Public
Created
Last updated
Describes
mavenpkg:maven/org.apache.flink/flink-statebackend-rocksdb@2.1.x
Badge
tessl/maven-org-apache-flink--flink-statebackend-rocksdb badge