
# Apache Flink HBase 2.2 SQL Connector


Apache Flink SQL connector that integrates HBase 2.2.x databases with Flink's Table API and SQL interface. This shaded JAR bundles all necessary HBase client dependencies, relocated to avoid classpath conflicts in Flink deployments.


## Package Information

- **Package Name**: flink-sql-connector-hbase-2.2_2.11
- **Group ID**: org.apache.flink
- **Language**: Java
- **Maven Installation**: Add to your project dependencies:

```xml
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-sql-connector-hbase-2.2_2.11</artifactId>
  <version>1.14.6</version>
</dependency>
```

For runtime usage (when running Flink applications):

```bash
# Copy the connector JAR into Flink's lib directory
cp flink-sql-connector-hbase-2.2_2.11-1.14.6.jar $FLINK_HOME/lib/
```


## Core Usage


The connector integrates with Flink SQL through DDL statements. No explicit Java imports are needed for SQL usage.


## Basic Usage

```sql
-- Create HBase table mapping
CREATE TABLE hbase_table (
  rowkey STRING,
  info ROW<name STRING, age INT>,
  data ROW<`value` DOUBLE, `timestamp` BIGINT>,
  PRIMARY KEY (rowkey) NOT ENFORCED
) WITH (
  'connector' = 'hbase-2.2',
  'table-name' = 'my_hbase_table',
  'zookeeper.quorum' = 'localhost:2181'
);

-- Insert data (UPSERT operation)
INSERT INTO hbase_table VALUES (
  'user_001',
  ROW('Alice', 25),
  ROW(95.5, 1234567890)
);

-- Query data
SELECT rowkey, info.name, info.age, data.`value`
FROM hbase_table
WHERE info.age > 18;

-- Lookup join with another table
SELECT o.order_id, u.info.name, u.info.age
FROM orders o
JOIN hbase_table FOR SYSTEM_TIME AS OF o.proc_time AS u
ON o.user_id = u.rowkey;
```

Note that `value` and `timestamp` are reserved words in Flink SQL and must be escaped with backticks when used as column names.


## Architecture


The connector architecture consists of several key components:

- **Factory System**: `HBase2DynamicTableFactory` provides the main entry point for Flink's connector discovery mechanism
- **Table Sources/Sinks**: Support both batch scanning and streaming lookup operations with UPSERT capabilities (INSERT, UPDATE_AFTER, DELETE)
- **Serialization Layer**: Handles type mapping between Flink's type system and HBase's byte storage format
- **Configuration System**: Comprehensive options for connection, caching, buffering, and performance tuning
- **Shaded Dependencies**: All HBase client libraries are relocated to prevent classpath conflicts

## Capabilities


### SQL DDL and Table Operations


Core SQL operations for creating HBase table mappings, defining column families and qualifiers, and managing table schemas with proper data type mappings.

```sql { .api }
CREATE TABLE table_name (
  rowkey STRING,
  family_name ROW<qualifier_name DATA_TYPE, ...>,
  PRIMARY KEY (rowkey) NOT ENFORCED
) WITH (
  'connector' = 'hbase-2.2',
  'table-name' = 'hbase_table_name'
  -- plus connection and configuration options
);
```


[SQL DDL and Operations](./sql-ddl.md)


### Connection and Configuration


Comprehensive configuration options for HBase connections, Zookeeper settings, performance tuning, and operational parameters.

```java { .api }
// Key configuration options available via WITH clause
'connector' = 'hbase-2.2'                     // Required: connector identifier
'table-name' = 'hbase_table_name'             // Required: HBase table name
'zookeeper.quorum' = 'host1:2181,host2:2181'  // HBase Zookeeper quorum
'zookeeper.znode.parent' = '/hbase'           // Zookeeper root (default: '/hbase')
'null-string-literal' = 'null'                // Null representation (default: 'null')
```
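As a sketch, these options combine in a single DDL statement like the following (the cluster hosts, namespace, and table name are hypothetical):

```sql
-- Hypothetical mapping of a table on a remote HBase cluster
CREATE TABLE remote_hbase_table (
  rowkey STRING,
  cf ROW<col STRING>,
  PRIMARY KEY (rowkey) NOT ENFORCED
) WITH (
  'connector' = 'hbase-2.2',
  'table-name' = 'my_namespace:remote_table',
  'zookeeper.quorum' = 'zk1:2181,zk2:2181,zk3:2181',
  'zookeeper.znode.parent' = '/hbase',
  'null-string-literal' = 'n/a'
);
```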


[Connection and Configuration](./connection-config.md)


### Lookup Operations and Caching


Temporal table join functionality with both synchronous and asynchronous lookup modes, including comprehensive caching strategies for performance optimization.

```java { .api }
// Lookup configuration options
'lookup.async' = 'false'          // Enable async lookup (default: false)
'lookup.cache.max-rows' = '1000'  // Cache size (default: -1, disabled)
'lookup.cache.ttl' = '60s'        // Cache TTL (default: 0)
'lookup.max-retries' = '3'        // Max retry attempts (default: 3)
```
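Put together, a dimension table with async lookup and caching enabled might look like this (table and column names are hypothetical):

```sql
-- Hypothetical dimension table with lookup caching enabled
CREATE TABLE users_dim (
  rowkey STRING,
  info ROW<name STRING>,
  PRIMARY KEY (rowkey) NOT ENFORCED
) WITH (
  'connector' = 'hbase-2.2',
  'table-name' = 'users',
  'zookeeper.quorum' = 'localhost:2181',
  'lookup.async' = 'true',
  'lookup.cache.max-rows' = '10000',
  'lookup.cache.ttl' = '10min'
);

-- Temporal join against the cached dimension table
SELECT o.order_id, d.info.name
FROM orders o
JOIN users_dim FOR SYSTEM_TIME AS OF o.proc_time AS d
ON o.user_id = d.rowkey;
```

Cached rows are served without an HBase round trip until they expire or are evicted, trading some staleness (bounded by the TTL) for lookup throughput.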


[Lookup Operations](./lookup-operations.md)


### Sink Operations and Buffering


UPSERT sink operations with intelligent buffering strategies, exactly-once semantics through checkpointing, and comprehensive error handling.

126

```java { .api }
// Sink configuration options
'sink.buffer-flush.max-size' = '2mb'   // Buffer size threshold (default: 2mb)
'sink.buffer-flush.max-rows' = '1000'  // Buffer row count (default: 1000)
'sink.buffer-flush.interval' = '1s'    // Flush interval (default: 1s)
'sink.parallelism' = '1'               // Sink parallelism
```
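For example, a sink tuned for throughput might raise the buffer thresholds so writes are batched more aggressively (the table and values here are hypothetical, not recommendations):

```sql
-- Hypothetical sink: flush when 5000 rows or 4mb accumulate, or every 2s
CREATE TABLE metrics_sink (
  rowkey STRING,
  m ROW<v DOUBLE>,
  PRIMARY KEY (rowkey) NOT ENFORCED
) WITH (
  'connector' = 'hbase-2.2',
  'table-name' = 'metrics',
  'zookeeper.quorum' = 'localhost:2181',
  'sink.buffer-flush.max-size' = '4mb',
  'sink.buffer-flush.max-rows' = '5000',
  'sink.buffer-flush.interval' = '2s'
);

INSERT INTO metrics_sink VALUES ('host_01', ROW(0.97));
```

Larger buffers reduce the number of HBase RPCs but increase the latency before a row becomes visible, so the flush interval acts as an upper bound on that delay.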


[Sink Operations](./sink-operations.md)


### Data Types and Schema Mapping


Complete data type mapping between Flink's type system and HBase storage format, including support for complex nested types and proper serialization handling.

```java { .api }
// Supported Flink data types mapped to HBase bytes
BOOLEAN, TINYINT, SMALLINT, INTEGER, BIGINT  // Boolean and integer types
FLOAT, DOUBLE, DECIMAL                       // Numeric types
CHAR, VARCHAR                                // String types
BINARY, VARBINARY                            // Binary types
DATE, TIME, TIMESTAMP                        // Temporal types
ROW<field_name field_type, ...>              // Nested row types for column families
```
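A sketch of a table exercising several of these mappings, where each `ROW` field becomes a qualifier in the named column family (the table and field names are hypothetical):

```sql
-- Hypothetical table demonstrating the type mappings above
CREATE TABLE typed_table (
  rowkey STRING,
  f ROW<
    flag BOOLEAN,
    cnt BIGINT,
    ratio DOUBLE,
    price DECIMAL(10, 2),
    label VARCHAR(64),
    born DATE,
    updated TIMESTAMP(3)
  >,
  PRIMARY KEY (rowkey) NOT ENFORCED
) WITH (
  'connector' = 'hbase-2.2',
  'table-name' = 'typed',
  'zookeeper.quorum' = 'localhost:2181'
);
```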


[Data Types and Schema](./data-types.md)


## Error Handling


The connector provides robust error handling with:

- **Connection Retry Logic**: Configurable retry attempts for HBase connection failures
- **Graceful Degradation**: Proper handling of missing tables and region server unavailability
- **Buffer Management**: Overflow protection and timeout handling for sink operations
- **Checkpointing Support**: At-least-once delivery through Flink's checkpoint mechanism, with idempotent UPSERT writes making results effectively exactly-once
- **Validation Errors**: Clear error messages for schema mismatches and configuration issues