The Apache Flink HBase 1.4 Connector provides bidirectional data integration between Flink streaming applications and the HBase NoSQL database.
Configuration options for optimizing write performance through buffering, batching, and parallelism control in the Apache Flink HBase 1.4 Connector.
Configuration class that encapsulates all write-related settings for optimal performance tuning of HBase sink operations.
/**
* Options for HBase writing operations
* Provides configuration for buffering, batching, and parallelism
*/
@Internal
public class HBaseWriteOptions implements Serializable {
/**
* Returns the maximum buffer size in bytes before flushing
* @return Buffer size threshold in bytes (default: 2MB from ConnectionConfiguration.WRITE_BUFFER_SIZE_DEFAULT)
*/
public long getBufferFlushMaxSizeInBytes();
/**
* Returns the maximum number of rows to buffer before flushing
* @return Row count threshold (default: 0, disabled)
*/
public long getBufferFlushMaxRows();
/**
* Returns the flush interval in milliseconds
* @return Interval in milliseconds between automatic flushes (default: 0, disabled)
*/
public long getBufferFlushIntervalMillis();
/**
* Returns the configured parallelism for sink operations
* @return Parallelism level, or null for framework default
*/
public Integer getParallelism();
/**
* Creates a new builder for configuring write options
* @return Builder instance for fluent configuration
*/
public static Builder builder();
}
Builder class providing a fluent API for configuring write options with method chaining.
/**
* Builder for HBaseWriteOptions using fluent interface pattern
* Allows step-by-step configuration of all write parameters
*/
public static class Builder {
/**
* Sets the maximum buffer size in bytes for flushing
* @param bufferFlushMaxSizeInBytes Buffer size threshold (default: 2MB)
* @return Builder instance for method chaining
*/
public Builder setBufferFlushMaxSizeInBytes(long bufferFlushMaxSizeInBytes);
/**
* Sets the maximum number of rows to buffer before flushing
* @param bufferFlushMaxRows Row count threshold (default: 0, disabled)
* @return Builder instance for method chaining
*/
public Builder setBufferFlushMaxRows(long bufferFlushMaxRows);
/**
* Sets the flush interval in milliseconds for time-based flushing
* @param bufferFlushIntervalMillis Interval in milliseconds (default: 0, disabled)
* @return Builder instance for method chaining
*/
public Builder setBufferFlushIntervalMillis(long bufferFlushIntervalMillis);
/**
* Sets the parallelism level for sink operations
* @param parallelism Number of parallel sink instances (null for framework default)
* @return Builder instance for method chaining
*/
public Builder setParallelism(Integer parallelism);
/**
* Creates a new HBaseWriteOptions instance with configured settings
* @return Configured HBaseWriteOptions instance
*/
public HBaseWriteOptions build();
}
Usage Example:
// Example: Creating optimized write options for high-throughput scenario
HBaseWriteOptions highThroughputOptions = HBaseWriteOptions.builder()
.setBufferFlushMaxSizeInBytes(10 * 1024 * 1024) // 10MB buffer
.setBufferFlushMaxRows(5000) // 5000 rows per batch
.setBufferFlushIntervalMillis(5000) // 5 second interval
.setParallelism(8) // 8 parallel writers
.build();
// Example: Creating low-latency write options
HBaseWriteOptions lowLatencyOptions = HBaseWriteOptions.builder()
.setBufferFlushMaxSizeInBytes(100 * 1024) // 100KB buffer
.setBufferFlushMaxRows(100) // 100 rows per batch
.setBufferFlushIntervalMillis(500) // 500ms interval
.setParallelism(4) // 4 parallel writers
.build();
Configure buffer flushing based on memory consumption to control memory usage and write batch sizes.
// Memory-efficient configuration
HBaseWriteOptions memoryOptimized = HBaseWriteOptions.builder()
.setBufferFlushMaxSizeInBytes(4 * 1024 * 1024) // 4MB buffer
.setBufferFlushMaxRows(0) // Disable row-based flushing
.setBufferFlushIntervalMillis(0) // Disable time-based flushing
.build();
Buffer Size Guidelines:
Configure buffer flushing based on the number of accumulated rows for predictable batch sizes.
// Row-count based configuration
HBaseWriteOptions rowBased = HBaseWriteOptions.builder()
.setBufferFlushMaxSizeInBytes(0) // Disable size-based flushing
.setBufferFlushMaxRows(2000) // Flush every 2000 rows
.setBufferFlushIntervalMillis(0) // Disable time-based flushing
.build();
Row Count Guidelines:
Configure periodic flushing to ensure low latency even with small data volumes.
// Time-based configuration for low latency
HBaseWriteOptions timeBased = HBaseWriteOptions.builder()
.setBufferFlushMaxSizeInBytes(0) // Disable size-based flushing
.setBufferFlushMaxRows(0) // Disable row-based flushing
.setBufferFlushIntervalMillis(1000) // Flush every 1 second
.build();
Interval Guidelines:
Use multiple flushing triggers simultaneously for optimal performance across varying load patterns.
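How the triggers interact can be sketched in plain Java. The `FlushTriggers` class below is a hypothetical illustration, not the connector's implementation: it assumes a flush fires as soon as any enabled threshold is reached and that a threshold of 0 means "disabled", as the defaults documented above suggest.

```java
/** Hypothetical helper (not part of the connector): models the "any trigger fires" rule. */
final class FlushTriggers {
    private final long maxSizeInBytes; // 0 = size trigger disabled (assumption)
    private final long maxRows;        // 0 = row trigger disabled (assumption)
    private final long intervalMillis; // 0 = time trigger disabled (assumption)

    FlushTriggers(long maxSizeInBytes, long maxRows, long intervalMillis) {
        this.maxSizeInBytes = maxSizeInBytes;
        this.maxRows = maxRows;
        this.intervalMillis = intervalMillis;
    }

    /** Returns true when at least one enabled threshold has been reached. */
    boolean shouldFlush(long bufferedBytes, long bufferedRows, long millisSinceLastFlush) {
        boolean sizeReached = maxSizeInBytes > 0 && bufferedBytes >= maxSizeInBytes;
        boolean rowsReached = maxRows > 0 && bufferedRows >= maxRows;
        boolean timeReached = intervalMillis > 0 && millisSinceLastFlush >= intervalMillis;
        return sizeReached || rowsReached || timeReached;
    }
}
```

With thresholds of 8 MB, 3000 rows, and 2000 ms, a buffer holding only 10 small rows would still be flushed once 2000 ms have passed, which is what keeps latency bounded under light load.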
// Combined strategy for adaptive performance
HBaseWriteOptions combined = HBaseWriteOptions.builder()
.setBufferFlushMaxSizeInBytes(8 * 1024 * 1024) // 8MB size limit
.setBufferFlushMaxRows(3000) // 3000 row limit
.setBufferFlushIntervalMillis(2000) // 2 second time limit
.setParallelism(6) // 6 parallel writers
.build();
// This configuration will flush when ANY condition is met:
// - Buffer reaches 8MB in size, OR
// - Buffer contains 3000 rows, OR
// - 2 seconds have elapsed since last flush
Optimize for maximum data ingestion rates with larger buffers and higher parallelism.
HBaseWriteOptions highThroughput = HBaseWriteOptions.builder()
.setBufferFlushMaxSizeInBytes(16 * 1024 * 1024) // 16MB buffers
.setBufferFlushMaxRows(8000) // Large batches
.setBufferFlushIntervalMillis(10000) // 10s intervals
.setParallelism(12) // High parallelism
.build();
High-Throughput Characteristics:
Optimize for minimal end-to-end latency with smaller buffers and frequent flushing.
HBaseWriteOptions lowLatency = HBaseWriteOptions.builder()
.setBufferFlushMaxSizeInBytes(256 * 1024) // 256KB buffers
.setBufferFlushMaxRows(50) // Small batches
.setBufferFlushIntervalMillis(200) // 200ms intervals
.setParallelism(4) // Moderate parallelism
.build();
Low-Latency Characteristics:
General-purpose configuration balancing throughput and latency for most use cases.
HBaseWriteOptions balanced = HBaseWriteOptions.builder()
.setBufferFlushMaxSizeInBytes(4 * 1024 * 1024) // 4MB buffers
.setBufferFlushMaxRows(2000) // Medium batches
.setBufferFlushIntervalMillis(2000) // 2s intervals
.setParallelism(6) // Balanced parallelism
.build();
Write options can be configured through SQL DDL statements using connector properties.
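The SQL properties take human-readable sizes and durations such as `16mb` or `10s`, whereas the builder takes raw bytes and milliseconds. The sketch below shows the conversion for the value formats used in this document only. `SinkOptionValues` is a hypothetical helper and a simplified stand-in for Flink's own option parsing; it assumes 1024-based size units, matching the `16 * 1024 * 1024` pairing used for 16MB above.

```java
import java.util.Locale;

/** Hypothetical helper: converts option strings as written in this document to builder arguments. */
final class SinkOptionValues {
    private SinkOptionValues() {}

    /** Parses sizes such as "256kb" or "16mb" into bytes (1024-based units assumed). */
    static long sizeToBytes(String text) {
        String s = text.trim().toLowerCase(Locale.ROOT);
        // Check two-letter suffixes first, because "kb" and "mb" also end with "b".
        if (s.endsWith("kb")) return number(s, 2) * 1024L;
        if (s.endsWith("mb")) return number(s, 2) * 1024L * 1024L;
        if (s.endsWith("b")) return number(s, 1);
        return Long.parseLong(s); // a bare number is treated as bytes in this sketch
    }

    /** Parses durations such as "200ms" or "10s" into milliseconds. */
    static long durationToMillis(String text) {
        String s = text.trim().toLowerCase(Locale.ROOT);
        // Check "ms" before "s", because "ms" also ends with "s".
        if (s.endsWith("ms")) return number(s, 2);
        if (s.endsWith("s")) return number(s, 1) * 1000L;
        throw new IllegalArgumentException("Unsupported duration: " + text);
    }

    /** Strips a unit suffix of the given length and parses the remaining digits. */
    private static long number(String s, int suffixLength) {
        return Long.parseLong(s.substring(0, s.length() - suffixLength).trim());
    }
}
```

Under these assumptions, `sizeToBytes("16mb")` and `durationToMillis("10s")` yield the same values as `16 * 1024 * 1024` and `10000` in the high-throughput builder example.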
-- High-throughput SQL configuration
CREATE TABLE high_volume_events (
event_id STRING,
event_data ROW<`timestamp` TIMESTAMP(3), payload STRING, metadata MAP<STRING, STRING>>,
PRIMARY KEY (event_id) NOT ENFORCED
) WITH (
'connector' = 'hbase-1.4',
'table-name' = 'events',
'zookeeper.quorum' = 'zk1:2181,zk2:2181,zk3:2181',
'sink.buffer-flush.max-size' = '16mb', -- setBufferFlushMaxSizeInBytes
'sink.buffer-flush.max-rows' = '8000', -- setBufferFlushMaxRows
'sink.buffer-flush.interval' = '10s', -- setBufferFlushIntervalMillis
'sink.parallelism' = '12' -- setParallelism
);
-- Low-latency SQL configuration
CREATE TABLE realtime_alerts (
alert_id STRING,
alert_info ROW<severity STRING, message STRING, `timestamp` TIMESTAMP(3)>,
PRIMARY KEY (alert_id) NOT ENFORCED
) WITH (
'connector' = 'hbase-1.4',
'table-name' = 'alerts',
'zookeeper.quorum' = 'localhost:2181',
'sink.buffer-flush.max-size' = '256kb', -- Small buffers
'sink.buffer-flush.max-rows' = '50', -- Small batches
'sink.buffer-flush.interval' = '200ms', -- Frequent flushing
'sink.parallelism' = '4' -- Moderate parallelism
);
Monitor these metrics to evaluate write performance and adjust configuration:
Problem: High write latency
Solution: Reduce buffer sizes and flush intervals
Problem: Low write throughput
Solution: Increase buffer sizes and parallelism
Problem: Memory pressure
Solution: Reduce buffer sizes or increase flush frequency
Problem: HBase region hotspots
Solution: Optimize row key distribution and increase parallelism
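When diagnosing memory pressure, a rough upper bound on buffered data can help decide how far to reduce the buffer size. The sketch below assumes that each parallel sink instance holds its own buffer of up to the configured size and ignores per-object overhead. `BufferMemoryEstimate` is a hypothetical helper, and the result is an estimate, not a measured figure.

```java
/** Hypothetical helper: rough upper bound on bytes held in sink write buffers. */
final class BufferMemoryEstimate {
    private BufferMemoryEstimate() {}

    /**
     * Multiplies the per-instance buffer limit by the number of sink instances
     * that share one JVM process. Throws ArithmeticException on overflow.
     */
    static long worstCaseBufferedBytes(long bufferFlushMaxSizeInBytes, int sinkInstancesPerProcess) {
        return Math.multiplyExact(bufferFlushMaxSizeInBytes, (long) sinkInstancesPerProcess);
    }
}
```

For example, if all 12 writers of the high-throughput configuration ran in one process with 16MB buffers, this bound would be 12 × 16MB = 192MB, while the low-latency configuration (4 writers, 256KB) stays at 1MB.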