or run

npx @tessl/cli init
Log in

Version

Tile

Overview

Evals

Files

docs

core-state-backend.mdindex.mdmemory-configuration.mdmetrics-monitoring.mdoptions-and-factories.md
tile.json

tessl/maven-org-apache-flink--flink-statebackend-rocksdb

RocksDB state backend for Apache Flink - provides persistent state storage using RocksDB as the underlying storage engine for stateful stream processing applications

Workspace
tessl
Visibility
Public
Created
Last updated
Describes
mavenpkg:maven/org.apache.flink/flink-statebackend-rocksdb@2.1.x

To install, run

npx @tessl/cli install tessl/maven-org-apache-flink--flink-statebackend-rocksdb@2.1.0

index.mddocs/

Apache Flink RocksDB State Backend

The Apache Flink RocksDB State Backend provides persistent, fault-tolerant state storage for stream processing applications using RocksDB as the underlying embedded database. It enables high-throughput stateful computations with support for very large state sizes that exceed available memory, making it essential for production-scale streaming applications.

Package Information

Maven Dependency:

<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-statebackend-rocksdb</artifactId>
    <version>2.1.0</version>
</dependency>

Primary Package: org.apache.flink.state.rocksdb
Deprecated Package: org.apache.flink.contrib.streaming.state (use primary package for new development)

Core Imports

import org.apache.flink.state.rocksdb.EmbeddedRocksDBStateBackend;
import org.apache.flink.state.rocksdb.RocksDBOptionsFactory;
import org.apache.flink.state.rocksdb.ConfigurableRocksDBOptionsFactory;
import org.apache.flink.state.rocksdb.PredefinedOptions;
import org.apache.flink.state.rocksdb.RocksDBMemoryConfiguration;
import org.apache.flink.state.rocksdb.RocksDBNativeMetricOptions;
import org.apache.flink.state.rocksdb.RocksDBOptions;
import org.apache.flink.state.rocksdb.RocksDBConfigurableOptions;

Basic Usage

Simple Setup

// Create RocksDB state backend with default settings
EmbeddedRocksDBStateBackend backend = new EmbeddedRocksDBStateBackend();

// Configure local storage path
backend.setDbStoragePath("/path/to/rocksdb");

// Apply to StreamExecutionEnvironment
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setStateBackend(backend);

Incremental Checkpointing

// Enable incremental checkpointing for better performance
EmbeddedRocksDBStateBackend backend = new EmbeddedRocksDBStateBackend(true);
backend.setDbStoragePath("/path/to/rocksdb");

// Use predefined optimization for your hardware
backend.setPredefinedOptions(PredefinedOptions.SPINNING_DISK_OPTIMIZED);

Memory Management

EmbeddedRocksDBStateBackend backend = new EmbeddedRocksDBStateBackend();

// Configure memory usage
RocksDBMemoryConfiguration memConfig = backend.getMemoryConfiguration();
memConfig.setUseManagedMemory(true);
memConfig.setWriteBufferRatio(0.4);
memConfig.setHighPriorityPoolRatio(0.2);

Architecture Overview

Key Components

The RocksDB State Backend consists of several key components working together:

  1. EmbeddedRocksDBStateBackend - Main entry point and configuration hub
  2. RocksDB Options System - Configurable performance tuning via factories and predefined options
  3. Memory Management - Sophisticated memory allocation and caching strategies
  4. Metrics Integration - Native RocksDB metrics forwarded to Flink's monitoring system
  5. Checkpointing Integration - Incremental and full checkpoint support with the Flink runtime

State Storage Model

┌─────────────────────────────────────┐
│        Flink Application            │
├─────────────────────────────────────┤
│     State Backend Interface         │
├─────────────────────────────────────┤
│   EmbeddedRocksDBStateBackend       │
├─────────────────────────────────────┤
│        RocksDB Engine               │
├─────────────────────────────────────┤
│      Local File System              │
└─────────────────────────────────────┘

State is organized as:

  • Key Groups: Distributed across parallel instances
  • Column Families: Separate RocksDB column families per state
  • Memory Layers: Write buffers, block cache, and file system
  • Checkpoints: Incremental snapshots to external storage

Capability Summaries

Core State Backend Configuration

Main EmbeddedRocksDBStateBackend class with configuration options for storage paths, checkpointing modes, and integration with Flink runtime.

Key APIs:

public class EmbeddedRocksDBStateBackend implements StateBackend {
    public EmbeddedRocksDBStateBackend();
    public EmbeddedRocksDBStateBackend(boolean enableIncrementalCheckpointing);
    public EmbeddedRocksDBStateBackend(TernaryBoolean enableIncrementalCheckpointing);
    public void setDbStoragePath(String path);
    public void setDbStoragePaths(String... paths);
}

RocksDB Options and Factories

Flexible options system using factory patterns for customizing RocksDB behavior, including predefined optimizations and custom configurations.

Key APIs:

public interface RocksDBOptionsFactory extends Serializable {
    DBOptions createDBOptions(DBOptions currentOptions, Collection<AutoCloseable> handlesToClose);
    ColumnFamilyOptions createColumnOptions(ColumnFamilyOptions currentOptions, Collection<AutoCloseable> handlesToClose);
}

public enum PredefinedOptions {
    DEFAULT, SPINNING_DISK_OPTIMIZED, SPINNING_DISK_OPTIMIZED_HIGH_MEM, FLASH_SSD_OPTIMIZED
}

Memory Configuration and Management

Sophisticated memory management with support for managed memory integration, configurable caching strategies, and memory-aware optimizations.

Key APIs:

public final class RocksDBMemoryConfiguration {
    public void setUseManagedMemory(boolean useManagedMemory);
    public void setFixedMemoryPerSlot(MemorySize fixedMemoryPerSlot);
    public void setWriteBufferRatio(double writeBufferRatio);
    public void setHighPriorityPoolRatio(double highPriorityPoolRatio);
    public void validate();
}

Metrics and Monitoring

Comprehensive metrics integration that forwards RocksDB native metrics to Flink's metrics system for monitoring performance, memory usage, and operational health.

Key APIs:

public class RocksDBNativeMetricOptions {
    public static final ConfigOption<Boolean> MONITOR_BLOCK_CACHE_HIT;
    public static final ConfigOption<Boolean> ESTIMATE_NUM_KEYS;
    public static final ConfigOption<Boolean> MONITOR_COMPACTION_READ_BYTES;
    // ... many more monitoring options
}

Advanced Features

Incremental Checkpointing

Enables efficient fault recovery by only checkpointing changes since the last checkpoint, reducing I/O overhead for large state.

TTL Support

Automatic cleanup of expired state entries based on configurable time-to-live policies.

Memory Optimization

Advanced memory management including:

  • Managed memory integration with Flink's memory model
  • Configurable write buffer and block cache sizing
  • Memory-aware compaction strategies

Production Monitoring

Native RocksDB metrics integration providing insights into:

  • Memory usage patterns
  • I/O performance
  • Compaction behavior
  • Cache effectiveness

Migration from Deprecated Package

All classes in org.apache.flink.contrib.streaming.state are deprecated. Update imports to use org.apache.flink.state.rocksdb:

// Old (deprecated)
import org.apache.flink.contrib.streaming.state.EmbeddedRocksDBStateBackend;

// New (recommended)
import org.apache.flink.state.rocksdb.EmbeddedRocksDBStateBackend;

Related Documentation