RocksDB state backend for Apache Flink - provides persistent state storage using RocksDB as the underlying storage engine for stateful stream processing applications
npx @tessl/cli install tessl/maven-org-apache-flink--flink-statebackend-rocksdb@2.1.0The Apache Flink RocksDB State Backend provides persistent, fault-tolerant state storage for stream processing applications using RocksDB as the underlying embedded database. It enables high-throughput stateful computations with support for very large state sizes that exceed available memory, making it essential for production-scale streaming applications.
Maven Dependency:
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-statebackend-rocksdb</artifactId>
<version>2.1.0</version>
</dependency>Primary Package: org.apache.flink.state.rocksdb
Deprecated Package: org.apache.flink.contrib.streaming.state (use primary package for new development)
import org.apache.flink.state.rocksdb.EmbeddedRocksDBStateBackend;
import org.apache.flink.state.rocksdb.RocksDBOptionsFactory;
import org.apache.flink.state.rocksdb.ConfigurableRocksDBOptionsFactory;
import org.apache.flink.state.rocksdb.PredefinedOptions;
import org.apache.flink.state.rocksdb.RocksDBMemoryConfiguration;
import org.apache.flink.state.rocksdb.RocksDBNativeMetricOptions;
import org.apache.flink.state.rocksdb.RocksDBOptions;
import org.apache.flink.state.rocksdb.RocksDBConfigurableOptions;// Create RocksDB state backend with default settings
EmbeddedRocksDBStateBackend backend = new EmbeddedRocksDBStateBackend();
// Configure local storage path
backend.setDbStoragePath("/path/to/rocksdb");
// Apply to StreamExecutionEnvironment
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setStateBackend(backend);// Enable incremental checkpointing for better performance
EmbeddedRocksDBStateBackend backend = new EmbeddedRocksDBStateBackend(true);
backend.setDbStoragePath("/path/to/rocksdb");
// Use predefined optimization for your hardware
backend.setPredefinedOptions(PredefinedOptions.SPINNING_DISK_OPTIMIZED);EmbeddedRocksDBStateBackend backend = new EmbeddedRocksDBStateBackend();
// Configure memory usage
RocksDBMemoryConfiguration memConfig = backend.getMemoryConfiguration();
memConfig.setUseManagedMemory(true);
memConfig.setWriteBufferRatio(0.4);
memConfig.setHighPriorityPoolRatio(0.2);The RocksDB State Backend consists of several key components working together:
┌─────────────────────────────────────┐
│ Flink Application │
├─────────────────────────────────────┤
│ State Backend Interface │
├─────────────────────────────────────┤
│ EmbeddedRocksDBStateBackend │
├─────────────────────────────────────┤
│ RocksDB Engine │
├─────────────────────────────────────┤
│ Local File System │
└─────────────────────────────────────┘State is organized as:
Main EmbeddedRocksDBStateBackend class with configuration options for storage paths, checkpointing modes, and integration with Flink runtime.
Key APIs:
public class EmbeddedRocksDBStateBackend implements StateBackend {
public EmbeddedRocksDBStateBackend();
public EmbeddedRocksDBStateBackend(boolean enableIncrementalCheckpointing);
public EmbeddedRocksDBStateBackend(TernaryBoolean enableIncrementalCheckpointing);
public void setDbStoragePath(String path);
public void setDbStoragePaths(String... paths);
}Flexible options system using factory patterns for customizing RocksDB behavior, including predefined optimizations and custom configurations.
Key APIs:
public interface RocksDBOptionsFactory extends Serializable {
DBOptions createDBOptions(DBOptions currentOptions, Collection<AutoCloseable> handlesToClose);
ColumnFamilyOptions createColumnOptions(ColumnFamilyOptions currentOptions, Collection<AutoCloseable> handlesToClose);
}
public enum PredefinedOptions {
DEFAULT, SPINNING_DISK_OPTIMIZED, SPINNING_DISK_OPTIMIZED_HIGH_MEM, FLASH_SSD_OPTIMIZED
}Sophisticated memory management with support for managed memory integration, configurable caching strategies, and memory-aware optimizations.
Key APIs:
public final class RocksDBMemoryConfiguration {
public void setUseManagedMemory(boolean useManagedMemory);
public void setFixedMemoryPerSlot(MemorySize fixedMemoryPerSlot);
public void setWriteBufferRatio(double writeBufferRatio);
public void setHighPriorityPoolRatio(double highPriorityPoolRatio);
public void validate();
}Comprehensive metrics integration that forwards RocksDB native metrics to Flink's metrics system for monitoring performance, memory usage, and operational health.
Key APIs:
public class RocksDBNativeMetricOptions {
public static final ConfigOption<Boolean> MONITOR_BLOCK_CACHE_HIT;
public static final ConfigOption<Boolean> ESTIMATE_NUM_KEYS;
public static final ConfigOption<Boolean> MONITOR_COMPACTION_READ_BYTES;
// ... many more monitoring options
}Enables efficient fault recovery by only checkpointing changes since the last checkpoint, reducing I/O overhead for large state.
Automatic cleanup of expired state entries based on configurable time-to-live policies.
Advanced memory management including:
Native RocksDB metrics integration providing insights into:
All classes in org.apache.flink.contrib.streaming.state are deprecated. Update imports to use org.apache.flink.state.rocksdb:
// Old (deprecated)
import org.apache.flink.contrib.streaming.state.EmbeddedRocksDBStateBackend;
// New (recommended)
import org.apache.flink.state.rocksdb.EmbeddedRocksDBStateBackend;