
tessl/maven-org-apache-flink--flink-statebackend-rocksdb-2-12

RocksDB state backend for Apache Flink streaming applications providing persistent, scalable state storage with fault tolerance, comprehensive configuration options, and native metrics monitoring.


Predefined Options

Pre-configured RocksDB options optimized for different hardware profiles and use cases, providing easy setup for common deployment scenarios.

Capabilities

PredefinedOptions Enum

Enumeration of pre-configured RocksDB option sets optimized for different hardware and workload characteristics.

/**
 * Predefined RocksDB options optimized for different hardware profiles.
 * Each option set provides tuned database and column family configurations.
 */
enum PredefinedOptions {
    
    /** Default configuration with basic optimizations */
    DEFAULT,
    
    /** Optimized for spinning disk storage (HDDs) */
    SPINNING_DISK_OPTIMIZED,
    
    /** Optimized for spinning disks with higher memory usage */
    SPINNING_DISK_OPTIMIZED_HIGH_MEM,
    
    /** Optimized for flash SSD storage */
    FLASH_SSD_OPTIMIZED;
    
    /**
     * Creates database options for this predefined configuration.
     * @param handlesToClose collection to register objects that need cleanup
     * @return configured DBOptions instance
     */
    abstract DBOptions createDBOptions(Collection<AutoCloseable> handlesToClose);
    
    /**
     * Creates column family options for this predefined configuration.
     * @param handlesToClose collection to register objects that need cleanup
     * @return configured ColumnFamilyOptions instance
     */
    abstract ColumnFamilyOptions createColumnOptions(Collection<AutoCloseable> handlesToClose);
}
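
The handlesToClose parameter exists because RocksDB option objects can wrap native resources that the caller has to release. The sketch below illustrates that ownership pattern using only the JDK. NativeHandle, createOptions and closeAll are hypothetical names invented for this illustration; they are not part of the Flink or RocksDB API.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

/** Hypothetical stand-in for a native-backed object; not a Flink or RocksDB class. */
final class NativeHandle implements AutoCloseable {
    private final String name;
    private boolean closed;

    NativeHandle(String name) {
        this.name = name;
    }

    boolean isClosed() {
        return closed;
    }

    @Override
    public void close() {
        closed = true;
    }

    @Override
    public String toString() {
        return name;
    }
}

final class HandleCleanupSketch {

    /**
     * Mimics the shape of createColumnOptions: a helper object allocated while
     * building the options is registered so the caller can release it later.
     */
    static NativeHandle createOptions(Collection<AutoCloseable> handlesToClose) {
        NativeHandle filter = new NativeHandle("filter");
        handlesToClose.add(filter);
        return new NativeHandle("options");
    }

    /** Closes every registered handle, newest first, and returns how many were closed. */
    static int closeAll(List<AutoCloseable> handles) {
        int count = 0;
        for (int i = handles.size() - 1; i >= 0; i--) {
            try {
                handles.get(i).close();
            } catch (Exception e) {
                throw new IllegalStateException("failed to close handle", e);
            }
            count++;
        }
        handles.clear();
        return count;
    }

    public static void main(String[] args) {
        List<AutoCloseable> handlesToClose = new ArrayList<>();
        try (NativeHandle options = createOptions(handlesToClose)) {
            System.out.println("using " + options + " with "
                    + handlesToClose.size() + " registered helper(s)");
        } finally {
            closeAll(handlesToClose);
        }
    }
}
```

In a real job the state backend performs this cleanup itself; the pattern matters mainly when you call these methods directly, for example in a test.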

DEFAULT Configuration

Basic configuration suitable for general-purpose workloads with minimal tuning.

Characteristics:

  • Sets use_fsync to false, so RocksDB syncs files with fdatasync rather than the more expensive fsync (Flink recovers state from checkpoints, not from these local files)
  • Sets log level to header-only to reduce log verbosity
  • Disables statistics dump to reduce overhead
  • Uses RocksDB defaults for most other settings

Usage:

EmbeddedRocksDBStateBackend stateBackend = new EmbeddedRocksDBStateBackend();
stateBackend.setPredefinedOptions(PredefinedOptions.DEFAULT);

Configuration Details:

  • setUseFsync(false) - Syncs with fdatasync instead of fsync for performance
  • setInfoLogLevel(InfoLogLevel.HEADER_LEVEL) - Minimal logging
  • setStatsDumpPeriodSec(0) - Disables stats dumping

SPINNING_DISK_OPTIMIZED Configuration

Optimized for traditional hard disk drives (HDDs), where random I/O is slow relative to sequential I/O.

Characteristics:

  • Increases parallelism for background operations
  • Uses level-based compaction with dynamic level sizes
  • Optimizes file sizes and compaction for spinning disk access patterns
  • Reduces random I/O through better file organization

Usage:

EmbeddedRocksDBStateBackend stateBackend = new EmbeddedRocksDBStateBackend();
stateBackend.setPredefinedOptions(PredefinedOptions.SPINNING_DISK_OPTIMIZED);

Configuration Details:

Database Options:

  • setIncreaseParallelism(4) - Sets the background thread count for flushes and compactions to 4
  • setUseFsync(false) - Syncs with fdatasync instead of fsync for performance
  • setMaxOpenFiles(-1) - Unlimited open files
  • setInfoLogLevel(InfoLogLevel.HEADER_LEVEL) - Minimal logging
  • setStatsDumpPeriodSec(0) - Disables stats dumping

Column Family Options:

  • setCompactionStyle(CompactionStyle.LEVEL) - Uses level-based compaction
  • setLevelCompactionDynamicLevelBytes(true) - Enables dynamic level sizing

SPINNING_DISK_OPTIMIZED_HIGH_MEM Configuration

Optimized for spinning disks with higher memory usage to reduce I/O operations.

Characteristics:

  • All optimizations from SPINNING_DISK_OPTIMIZED
  • Larger block cache (256MB) to cache frequently accessed data
  • Larger block size (128KB) for better sequential reads
  • Larger target file size (256MB) for fewer files
  • Larger write buffer (64MB) to batch writes

Usage:

EmbeddedRocksDBStateBackend stateBackend = new EmbeddedRocksDBStateBackend();
stateBackend.setPredefinedOptions(PredefinedOptions.SPINNING_DISK_OPTIMIZED_HIGH_MEM);

Configuration Details:

Includes all SPINNING_DISK_OPTIMIZED settings plus:

Enhanced Memory Usage:

  • Block cache size: 256MB
  • Block size: 128KB
  • Target file size: 256MB
  • Write buffer size: 64MB
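
For reference, the sizes above translate into byte values as follows. This is a small arithmetic sketch that assumes binary units (1 KB = 1024 bytes, 1 MB = 1024 KB). The class and constant names are illustrative only, and blocksPerCache is a rough upper bound that ignores per-block overhead.

```java
final class HighMemSizes {
    static final long KB = 1024L;
    static final long MB = 1024L * KB;

    static final long BLOCK_CACHE_SIZE = 256 * MB;  // 268,435,456 bytes
    static final long BLOCK_SIZE = 128 * KB;        // 131,072 bytes
    static final long TARGET_FILE_SIZE = 256 * MB;  // 268,435,456 bytes
    static final long WRITE_BUFFER_SIZE = 64 * MB;  // 67,108,864 bytes

    /** Upper bound on how many full-size blocks fit into the block cache. */
    static long blocksPerCache() {
        return BLOCK_CACHE_SIZE / BLOCK_SIZE;
    }

    public static void main(String[] args) {
        System.out.println("block cache bytes  = " + BLOCK_CACHE_SIZE);
        System.out.println("block size bytes   = " + BLOCK_SIZE);
        System.out.println("target file bytes  = " + TARGET_FILE_SIZE);
        System.out.println("write buffer bytes = " + WRITE_BUFFER_SIZE);
        System.out.println("blocks per cache   = " + blocksPerCache());
    }
}
```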

FLASH_SSD_OPTIMIZED Configuration

Optimized for flash-based SSD storage with fast random I/O characteristics.

Characteristics:

  • Increases parallelism for background operations
  • Allows an unlimited number of open files
  • Keeps the RocksDB default compaction and column family settings, which already suit fast random access

Usage:

EmbeddedRocksDBStateBackend stateBackend = new EmbeddedRocksDBStateBackend();
stateBackend.setPredefinedOptions(PredefinedOptions.FLASH_SSD_OPTIMIZED);

Configuration Details:

Database Options:

  • setIncreaseParallelism(4) - Sets the background thread count for flushes and compactions to 4
  • setUseFsync(false) - Syncs with fdatasync instead of fsync for performance
  • setMaxOpenFiles(-1) - Unlimited open files
  • setInfoLogLevel(InfoLogLevel.HEADER_LEVEL) - Minimal logging
  • setStatsDumpPeriodSec(0) - Disables stats dumping

Column Family Options:

  • Uses the RocksDB default column family options; no SSD-specific overrides are applied

Configuration Comparison

| Configuration | Use Case | Memory Usage | I/O Pattern | Parallelism |
|---|---|---|---|---|
| DEFAULT | General purpose | Low | Balanced | Default |
| SPINNING_DISK_OPTIMIZED | HDD storage | Moderate | Sequential-optimized | 4 background threads |
| SPINNING_DISK_OPTIMIZED_HIGH_MEM | HDD with more RAM | High | Sequential-optimized | 4 background threads |
| FLASH_SSD_OPTIMIZED | SSD storage | Moderate | Random-optimized | 4 background threads |

Hardware-Specific Recommendations

Traditional Hard Drives (HDDs)

Recommended: SPINNING_DISK_OPTIMIZED or SPINNING_DISK_OPTIMIZED_HIGH_MEM

// For HDDs with limited memory
stateBackend.setPredefinedOptions(PredefinedOptions.SPINNING_DISK_OPTIMIZED);

// For HDDs with abundant memory (>8GB available for Flink)
stateBackend.setPredefinedOptions(PredefinedOptions.SPINNING_DISK_OPTIMIZED_HIGH_MEM);

Benefits:

  • Reduces random I/O through better file organization
  • Uses level-based compaction for better sequential access patterns
  • Dynamic level sizing reduces write amplification

Solid State Drives (SSDs)

Recommended: FLASH_SSD_OPTIMIZED

stateBackend.setPredefinedOptions(PredefinedOptions.FLASH_SSD_OPTIMIZED);

Benefits:

  • Takes advantage of fast random I/O capabilities
  • Raises background parallelism so flushes and compactions keep pace with fast storage
  • Keeps RocksDB's default compaction settings, which work well on SSDs

Cloud Storage (EBS, Persistent Disks)

Recommended: Start with FLASH_SSD_OPTIMIZED, tune based on performance characteristics

// Most cloud storage behaves like SSDs
stateBackend.setPredefinedOptions(PredefinedOptions.FLASH_SSD_OPTIMIZED);

// For HDD-backed (throughput-oriented) volumes with abundant memory
stateBackend.setPredefinedOptions(PredefinedOptions.SPINNING_DISK_OPTIMIZED_HIGH_MEM);
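
The recommendations in this section can be condensed into a small decision helper. The following is a sketch only: PresetChooser, StorageKind and the 8 GB threshold constant are hypothetical names made up for illustration (the threshold is taken from the comment in the HDD example above), and the helper returns the preset name as a string so that it has no dependency on Flink.

```java
final class PresetChooser {

    /** Hypothetical storage categories used only by this sketch. */
    enum StorageKind { HDD, SSD, CLOUD_VOLUME, UNKNOWN }

    /** Memory threshold above which the high-memory HDD preset is suggested. */
    static final long HIGH_MEM_THRESHOLD_BYTES = 8L * 1024 * 1024 * 1024;

    /** Returns the name of the predefined option set suggested for the given hardware. */
    static String choose(StorageKind kind, long memoryForFlinkBytes) {
        switch (kind) {
            case HDD:
                return memoryForFlinkBytes > HIGH_MEM_THRESHOLD_BYTES
                        ? "SPINNING_DISK_OPTIMIZED_HIGH_MEM"
                        : "SPINNING_DISK_OPTIMIZED";
            case SSD:
            case CLOUD_VOLUME:
                // Cloud volumes start from the SSD preset and are tuned from there.
                return "FLASH_SSD_OPTIMIZED";
            default:
                return "DEFAULT";
        }
    }

    public static void main(String[] args) {
        long sixteenGb = 16L * 1024 * 1024 * 1024;
        System.out.println(choose(StorageKind.HDD, sixteenGb));
        System.out.println(choose(StorageKind.CLOUD_VOLUME, sixteenGb));
    }
}
```

In a Flink project the returned name could be turned into the enum constant with PredefinedOptions.valueOf(name) and passed to setPredefinedOptions.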

Complete Configuration Examples

Basic Setup with Predefined Options

import org.apache.flink.contrib.streaming.state.EmbeddedRocksDBStateBackend;
import org.apache.flink.contrib.streaming.state.PredefinedOptions;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// Create state backend with incremental checkpointing
EmbeddedRocksDBStateBackend stateBackend = new EmbeddedRocksDBStateBackend(true);

// Configure for SSD storage
stateBackend.setPredefinedOptions(PredefinedOptions.FLASH_SSD_OPTIMIZED);
stateBackend.setDbStoragePath("/ssd/flink/rocksdb");

env.setStateBackend(stateBackend);

Combining Predefined Options with Custom Settings

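import org.apache.flink.contrib.streaming.state.DefaultConfigurableOptionsFactory;
import org.apache.flink.contrib.streaming.state.EmbeddedRocksDBStateBackend;
import org.apache.flink.contrib.streaming.state.PredefinedOptions;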

// Start with predefined options
EmbeddedRocksDBStateBackend stateBackend = new EmbeddedRocksDBStateBackend(true);
stateBackend.setPredefinedOptions(PredefinedOptions.SPINNING_DISK_OPTIMIZED);

// Add custom optimizations
DefaultConfigurableOptionsFactory customFactory = new DefaultConfigurableOptionsFactory()
    .setWriteBufferSize("128mb")     // Custom write buffer size
    .setBlockCacheSize("512mb")      // Custom block cache size
    .setUseBloomFilter(true)         // Enable Bloom filter
    .setBloomFilterBitsPerKey(10.0); // Configure Bloom filter

stateBackend.setRocksDBOptions(customFactory);
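
The example above relies on the options factory being applied on top of the predefined options, so a custom value wins wherever both set the same option. The sketch below is a deliberately simplified model of that layering using plain maps. The class name, method name and the human-readable keys are hypothetical, and this is not how Flink implements it internally.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class OptionLayeringSketch {

    /**
     * Returns the settings that result from applying custom settings on top of
     * predefined ones. Neither input map is modified.
     */
    static Map<String, String> effectiveSettings(
            Map<String, String> predefined, Map<String, String> custom) {
        Map<String, String> result = new LinkedHashMap<>(predefined);
        result.putAll(custom); // custom values override predefined ones
        return result;
    }

    public static void main(String[] args) {
        // Values taken from the SPINNING_DISK_OPTIMIZED column family options.
        Map<String, String> predefined = new LinkedHashMap<>();
        predefined.put("compaction style", "LEVEL");
        predefined.put("dynamic level bytes", "true");

        // Values taken from the custom factory in the example above.
        Map<String, String> custom = new LinkedHashMap<>();
        custom.put("write buffer size", "128mb");
        custom.put("block cache size", "512mb");

        effectiveSettings(predefined, custom)
                .forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```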

Environment-Specific Configurations

// Development/Testing Environment
EmbeddedRocksDBStateBackend devBackend = new EmbeddedRocksDBStateBackend(false);
devBackend.setPredefinedOptions(PredefinedOptions.DEFAULT);

// Production Environment with HDDs
EmbeddedRocksDBStateBackend prodBackend = new EmbeddedRocksDBStateBackend(true);
prodBackend.setPredefinedOptions(PredefinedOptions.SPINNING_DISK_OPTIMIZED_HIGH_MEM);
prodBackend.setDbStoragePaths("/data1/rocksdb", "/data2/rocksdb", "/data3/rocksdb");

// Production Environment with SSDs
EmbeddedRocksDBStateBackend ssdBackend = new EmbeddedRocksDBStateBackend(true);
ssdBackend.setPredefinedOptions(PredefinedOptions.FLASH_SSD_OPTIMIZED);
ssdBackend.setDbStoragePath("/nvme/flink/rocksdb");

Performance Tuning Guidelines

When to Use Each Option

  1. Start with the predefined option that matches your storage type
  2. Monitor performance metrics (checkpoint duration, state access latency)
  3. If needed, fine-tune with custom options using DefaultConfigurableOptionsFactory
  4. Test under realistic load before deploying to production

Combining with Memory Configuration

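import org.apache.flink.contrib.streaming.state.EmbeddedRocksDBStateBackend;
import org.apache.flink.contrib.streaming.state.PredefinedOptions;
import org.apache.flink.contrib.streaming.state.RocksDBMemoryConfiguration;

// Optimized setup for high-memory SSD environment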
EmbeddedRocksDBStateBackend stateBackend = new EmbeddedRocksDBStateBackend(true);
stateBackend.setPredefinedOptions(PredefinedOptions.FLASH_SSD_OPTIMIZED);

// Configure memory allocation
RocksDBMemoryConfiguration memConfig = stateBackend.getMemoryConfiguration();
memConfig.setUseManagedMemory(true);
memConfig.setWriteBufferRatio(0.3);  // Smaller write-buffer share leaves more of the budget for the block cache
memConfig.setHighPriorityPoolRatio(0.1);

stateBackend.setNumberOfTransferThreads(8);  // More threads for SSD

Install with Tessl CLI

npx tessl i tessl/maven-org-apache-flink--flink-statebackend-rocksdb-2-12
