or run

npx @tessl/cli init
Log in

Version

Tile

Overview

Evals

Files

Files

docs

index.mdlookup-options.mdsink-operations.mdsource-operations.mdtable-factory.mdwrite-options.md

write-options.mddocs/

0

# Write Options and Performance Tuning

1

2

Configuration options for optimizing write performance through buffering, batching, and parallelism control in the Apache Flink HBase 1.4 Connector.

3

4

## Capabilities

5

6

### HBaseWriteOptions

7

8

Configuration class that encapsulates all write-related settings for optimal performance tuning of HBase sink operations.

9

10

```java { .api }

11

/**

12

* Options for HBase writing operations

13

* Provides configuration for buffering, batching, and parallelism

14

*/

15

@Internal

16

public class HBaseWriteOptions implements Serializable {

17

18

/**

19

* Returns the maximum buffer size in bytes before flushing

20

* @return Buffer size threshold in bytes (default: 2MB from ConnectionConfiguration.WRITE_BUFFER_SIZE_DEFAULT)

21

*/

22

public long getBufferFlushMaxSizeInBytes();

23

24

/**

25

* Returns the maximum number of rows to buffer before flushing

26

* @return Row count threshold (default: 0, disabled)

27

*/

28

public long getBufferFlushMaxRows();

29

30

/**

31

* Returns the flush interval in milliseconds

32

* @return Interval in milliseconds between automatic flushes (default: 0, disabled)

33

*/

34

public long getBufferFlushIntervalMillis();

35

36

/**

37

* Returns the configured parallelism for sink operations

38

* @return Parallelism level, or null for framework default

39

*/

40

public Integer getParallelism();

41

42

/**

43

* Creates a new builder for configuring write options

44

* @return Builder instance for fluent configuration

45

*/

46

public static Builder builder();

47

}

48

```

49

50

### HBaseWriteOptions.Builder

51

52

Builder class providing fluent API for configuring write options with method chaining.

53

54

```java { .api }

55

/**

56

* Builder for HBaseWriteOptions using fluent interface pattern

57

* Allows step-by-step configuration of all write parameters

58

*/

59

public static class Builder {

60

61

/**

62

* Sets the maximum buffer size in bytes for flushing

63

* @param bufferFlushMaxSizeInBytes Buffer size threshold (default: 2MB)

64

* @return Builder instance for method chaining

65

*/

66

public Builder setBufferFlushMaxSizeInBytes(long bufferFlushMaxSizeInBytes);

67

68

/**

69

* Sets the maximum number of rows to buffer before flushing

70

* @param bufferFlushMaxRows Row count threshold (default: 0, disabled)

71

* @return Builder instance for method chaining

72

*/

73

public Builder setBufferFlushMaxRows(long bufferFlushMaxRows);

74

75

/**

76

* Sets the flush interval in milliseconds for time-based flushing

77

* @param bufferFlushIntervalMillis Interval in milliseconds (default: 0, disabled)

78

* @return Builder instance for method chaining

79

*/

80

public Builder setBufferFlushIntervalMillis(long bufferFlushIntervalMillis);

81

82

/**

83

* Sets the parallelism level for sink operations

84

* @param parallelism Number of parallel sink instances (null for framework default)

85

* @return Builder instance for method chaining

86

*/

87

public Builder setParallelism(Integer parallelism);

88

89

/**

90

* Creates a new HBaseWriteOptions instance with configured settings

91

* @return Configured HBaseWriteOptions instance

92

*/

93

public HBaseWriteOptions build();

94

}

95

```

96

97

**Usage Example:**

98

99

```java

100

// Example: Creating optimized write options for high-throughput scenario

101

HBaseWriteOptions highThroughputOptions = HBaseWriteOptions.builder()

102

.setBufferFlushMaxSizeInBytes(10 * 1024 * 1024) // 10MB buffer

103

.setBufferFlushMaxRows(5000) // 5000 rows per batch

104

.setBufferFlushIntervalMillis(5000) // 5 second interval

105

.setParallelism(8) // 8 parallel writers

106

.build();

107

108

// Example: Creating low-latency write options

109

HBaseWriteOptions lowLatencyOptions = HBaseWriteOptions.builder()

110

.setBufferFlushMaxSizeInBytes(100 * 1024) // 100KB buffer

111

.setBufferFlushMaxRows(100) // 100 rows per batch

112

.setBufferFlushIntervalMillis(500) // 500ms interval

113

.setParallelism(4) // 4 parallel writers

114

.build();

115

```

116

117

## Buffer Configuration Strategies

118

119

### Size-Based Flushing

120

121

Configure buffer flushing based on memory consumption to control memory usage and write batch sizes.

122

123

```java

124

// Memory-efficient configuration

125

HBaseWriteOptions memoryOptimized = HBaseWriteOptions.builder()

126

.setBufferFlushMaxSizeInBytes(4 * 1024 * 1024) // 4MB buffer

127

.setBufferFlushMaxRows(0) // Disable row-based flushing

128

.setBufferFlushIntervalMillis(0) // Disable time-based flushing

129

.build();

130

```

131

132

**Buffer Size Guidelines:**

133

134

- **Small datasets (< 1000 records/sec)**: 1-2MB buffer size

135

- **Medium datasets (1000-10000 records/sec)**: 4-8MB buffer size

136

- **Large datasets (> 10000 records/sec)**: 10-20MB buffer size

137

- **Memory-constrained environments**: 512KB-1MB buffer size

138

139

### Row-Based Flushing

140

141

Configure buffer flushing based on the number of accumulated rows for predictable batch sizes.

142

143

```java

144

// Row-count based configuration

145

HBaseWriteOptions rowBased = HBaseWriteOptions.builder()

146

.setBufferFlushMaxSizeInBytes(0) // Disable size-based flushing

147

.setBufferFlushMaxRows(2000) // Flush every 2000 rows

148

.setBufferFlushIntervalMillis(0) // Disable time-based flushing

149

.build();

150

```

151

152

**Row Count Guidelines:**

153

154

- **Small records (< 1KB each)**: 1000-5000 rows per batch

155

- **Medium records (1-10KB each)**: 500-2000 rows per batch

156

- **Large records (> 10KB each)**: 100-500 rows per batch

157

158

### Time-Based Flushing

159

160

Configure periodic flushing to ensure low latency even with small data volumes.

161

162

```java

163

// Time-based configuration for low latency

164

HBaseWriteOptions timeBased = HBaseWriteOptions.builder()

165

.setBufferFlushMaxSizeInBytes(0) // Disable size-based flushing

166

.setBufferFlushMaxRows(0) // Disable row-based flushing

167

.setBufferFlushIntervalMillis(1000) // Flush every 1 second

168

.build();

169

```

170

171

**Interval Guidelines:**

172

173

- **Real-time applications**: 100-500ms intervals

174

- **Near real-time applications**: 1-5 second intervals

175

- **Batch processing**: 10-60 second intervals

176

177

### Combined Flushing Strategies

178

179

Use multiple flushing triggers simultaneously for optimal performance across varying load patterns.

180

181

```java

182

// Combined strategy for adaptive performance

183

HBaseWriteOptions combined = HBaseWriteOptions.builder()

184

.setBufferFlushMaxSizeInBytes(8 * 1024 * 1024) // 8MB size limit

185

.setBufferFlushMaxRows(3000) // 3000 row limit

186

.setBufferFlushIntervalMillis(2000) // 2 second time limit

187

.setParallelism(6) // 6 parallel writers

188

.build();

189

190

// This configuration will flush when ANY condition is met:

191

// - Buffer reaches 8MB in size, OR

192

// - Buffer contains 3000 rows, OR

193

// - 2 seconds have elapsed since last flush

194

```

195

196

## Performance Tuning Guidelines

197

198

### High-Throughput Configuration

199

200

Optimize for maximum data ingestion rates with larger buffers and higher parallelism.

201

202

```java

203

HBaseWriteOptions highThroughput = HBaseWriteOptions.builder()

204

.setBufferFlushMaxSizeInBytes(16 * 1024 * 1024) // 16MB buffers

205

.setBufferFlushMaxRows(8000) // Large batches

206

.setBufferFlushIntervalMillis(10000) // 10s intervals

207

.setParallelism(12) // High parallelism

208

.build();

209

```

210

211

**High-Throughput Characteristics:**

212

- Large buffer sizes (10-20MB)

213

- High row counts (5000-10000 rows)

214

- Longer intervals (5-15 seconds)

215

- High parallelism (8-16 instances)

216

217

### Low-Latency Configuration

218

219

Optimize for minimal end-to-end latency with smaller buffers and frequent flushing.

220

221

```java

222

HBaseWriteOptions lowLatency = HBaseWriteOptions.builder()

223

.setBufferFlushMaxSizeInBytes(256 * 1024) // 256KB buffers

224

.setBufferFlushMaxRows(50) // Small batches

225

.setBufferFlushIntervalMillis(200) // 200ms intervals

226

.setParallelism(4) // Moderate parallelism

227

.build();

228

```

229

230

**Low-Latency Characteristics:**

231

- Small buffer sizes (100KB-1MB)

232

- Low row counts (50-500 rows)

233

- Short intervals (100-1000ms)

234

- Moderate parallelism (2-6 instances)

235

236

### Balanced Configuration

237

238

General-purpose configuration balancing throughput and latency for most use cases.

239

240

```java

241

HBaseWriteOptions balanced = HBaseWriteOptions.builder()

242

.setBufferFlushMaxSizeInBytes(4 * 1024 * 1024) // 4MB buffers

243

.setBufferFlushMaxRows(2000) // Medium batches

244

.setBufferFlushIntervalMillis(2000) // 2s intervals

245

.setParallelism(6) // Balanced parallelism

246

.build();

247

```

248

249

## SQL Configuration Mapping

250

251

Write options can be configured through SQL DDL statements using connector properties.

252

253

```sql

254

-- High-throughput SQL configuration

255

CREATE TABLE high_volume_events (

256

event_id STRING,

257

event_data ROW<timestamp TIMESTAMP(3), payload STRING, metadata MAP<STRING, STRING>>,

258

PRIMARY KEY (event_id) NOT ENFORCED

259

) WITH (

260

'connector' = 'hbase-1.4',

261

'table-name' = 'events',

262

'zookeeper.quorum' = 'zk1:2181,zk2:2181,zk3:2181',

263

'sink.buffer-flush.max-size' = '16mb', -- setBufferFlushMaxSizeInBytes

264

'sink.buffer-flush.max-rows' = '8000', -- setBufferFlushMaxRows

265

'sink.buffer-flush.interval' = '10s', -- setBufferFlushIntervalMillis

266

'sink.parallelism' = '12' -- setParallelism

267

);

268

269

-- Low-latency SQL configuration

270

CREATE TABLE realtime_alerts (

271

alert_id STRING,

272

alert_info ROW<severity STRING, message STRING, timestamp TIMESTAMP(3)>,

273

PRIMARY KEY (alert_id) NOT ENFORCED

274

) WITH (

275

'connector' = 'hbase-1.4',

276

'table-name' = 'alerts',

277

'zookeeper.quorum' = 'localhost:2181',

278

'sink.buffer-flush.max-size' = '256kb', -- Small buffers

279

'sink.buffer-flush.max-rows' = '50', -- Small batches

280

'sink.buffer-flush.interval' = '200ms', -- Frequent flushing

281

'sink.parallelism' = '4' -- Moderate parallelism

282

);

283

```

284

285

## Monitoring Write Performance

286

287

### Key Performance Indicators

288

289

Monitor these metrics to evaluate write performance and adjust configuration:

290

291

- **Throughput**: Records/second and bytes/second written to HBase

292

- **Latency**: End-to-end time from Flink to HBase persistence

293

- **Buffer utilization**: Percentage of buffer capacity used

294

- **Flush frequency**: Number of flushes per minute

295

- **Error rate**: Percentage of failed write operations

296

297

### Performance Tuning Process

298

299

1. **Baseline measurement**: Start with default settings and measure performance

300

2. **Identify bottlenecks**: Determine if limited by CPU, memory, network, or HBase

301

3. **Incremental tuning**: Adjust one parameter at a time and measure impact

302

4. **Load testing**: Validate configuration under expected production load

303

5. **Monitoring setup**: Establish alerting for performance degradation

304

305

### Common Performance Issues

306

307

**Problem**: High write latency

308

**Solution**: Reduce buffer sizes and flush intervals

309

310

**Problem**: Low write throughput

311

**Solution**: Increase buffer sizes and parallelism

312

313

**Problem**: Memory pressure

314

**Solution**: Reduce buffer sizes or increase flush frequency

315

316

**Problem**: HBase region hotspots

317

**Solution**: Optimize row key distribution and increase parallelism