# Apache Flink HBase 2.2 SQL Connector

Apache Flink SQL connector that enables seamless integration with HBase 2.2.x databases through Flink's Table API and SQL interface. This shaded JAR bundles all necessary HBase client dependencies, relocated to avoid classpath conflicts in Flink deployments.

## Package Information

- **Package Name**: flink-sql-connector-hbase-2.2_2.11
- **Group ID**: org.apache.flink
- **Language**: Java
- **Maven Installation**: Add to your project dependencies:

```xml
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-sql-connector-hbase-2.2_2.11</artifactId>
    <version>1.14.6</version>
</dependency>
```

For runtime usage (when running Flink applications), copy the JAR into Flink's `lib` directory:

```bash
# Copy the connector JAR to the Flink lib directory
cp flink-sql-connector-hbase-2.2_2.11-1.14.6.jar $FLINK_HOME/lib/
```

## Core Usage

The connector integrates with Flink SQL through DDL statements; no explicit Java imports are needed for SQL usage.

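
The same DDL can also be issued from a Java program through the Table API. The sketch below assembles the statement in plain Java; the `HBaseDdlBuilder` helper is illustrative only, and the commented-out Table API calls assume Flink's `TableEnvironment.executeSql`:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Illustrative helper (not part of the connector) that assembles the WITH
// clause from an options map before the DDL is handed to Flink.
public class HBaseDdlBuilder {

    static String withClause(Map<String, String> options) {
        // Render each option as  'key' = 'value'  and wrap in ) WITH ( ... )
        return options.entrySet().stream()
                .map(e -> String.format("  '%s' = '%s'", e.getKey(), e.getValue()))
                .collect(Collectors.joining(",\n", ") WITH (\n", "\n)"));
    }

    public static void main(String[] args) {
        Map<String, String> options = new LinkedHashMap<>();
        options.put("connector", "hbase-2.2");
        options.put("table-name", "my_hbase_table");
        options.put("zookeeper.quorum", "localhost:2181");

        String ddl = "CREATE TABLE hbase_table (\n"
                + "  rowkey STRING,\n"
                + "  info ROW<name STRING, age INT>,\n"
                + "  PRIMARY KEY (rowkey) NOT ENFORCED\n"
                + withClause(options);

        System.out.println(ddl);
        // In a Flink job you would then run:
        // TableEnvironment tEnv = TableEnvironment.create(
        //         EnvironmentSettings.newInstance().inStreamingMode().build());
        // tEnv.executeSql(ddl);
    }
}
```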
## Basic Usage

```sql
-- Create an HBase table mapping: each ROW field maps to a column family.
-- Note: data, value, and timestamp are reserved words in Flink SQL and must
-- be escaped with backticks.
CREATE TABLE hbase_table (
  rowkey STRING,
  info ROW<name STRING, age INT>,
  `data` ROW<`value` DOUBLE, `timestamp` BIGINT>,
  PRIMARY KEY (rowkey) NOT ENFORCED
) WITH (
  'connector' = 'hbase-2.2',
  'table-name' = 'my_hbase_table',
  'zookeeper.quorum' = 'localhost:2181'
);

-- Insert data (UPSERT operation: rows with the same rowkey are overwritten)
INSERT INTO hbase_table VALUES (
  'user_001',
  ROW('Alice', 25),
  ROW(95.5, 1234567890)
);

-- Query data
SELECT rowkey, info.name, info.age, `data`.`value`
FROM hbase_table
WHERE info.age > 18;

-- Lookup join with another table
SELECT o.order_id, u.info.name, u.info.age
FROM orders o
JOIN hbase_table FOR SYSTEM_TIME AS OF o.proc_time AS u
ON o.user_id = u.rowkey;
```

## Architecture

The connector architecture consists of several key components:

- **Factory System**: `HBase2DynamicTableFactory` provides the main entry point for Flink's connector discovery mechanism
- **Table Sources/Sinks**: Support both batch scanning and streaming lookup operations, with UPSERT changelog handling (INSERT, UPDATE_AFTER, DELETE)
- **Serialization Layer**: Handles type mapping between Flink's type system and HBase's byte-array storage format
- **Configuration System**: Comprehensive options for connection, caching, buffering, and performance tuning
- **Shaded Dependencies**: All HBase client libraries are relocated to prevent classpath conflicts

## Capabilities

### SQL DDL and Table Operations

Core SQL operations for creating HBase table mappings, defining column families and qualifiers, and managing table schemas with proper data type mappings.

```sql { .api }
CREATE TABLE table_name (
  rowkey STRING,
  family_name ROW<qualifier_name DATA_TYPE, ...>,
  PRIMARY KEY (rowkey) NOT ENFORCED
) WITH (
  'connector' = 'hbase-2.2',
  'table-name' = 'hbase_table_name'
  -- plus connection and configuration options
);
```

[SQL DDL and Operations](./sql-ddl.md)

### Connection and Configuration

Comprehensive configuration options for HBase connections, Zookeeper settings, performance tuning, and operational parameters.

```java { .api }
// Key configuration options available via the WITH clause
'connector' = 'hbase-2.2'                     // Required: connector identifier
'table-name' = 'hbase_table_name'             // Required: HBase table name
'zookeeper.quorum' = 'host1:2181,host2:2181'  // Required: HBase Zookeeper quorum
'zookeeper.znode.parent' = '/hbase'           // Zookeeper root node (default: '/hbase')
'null-string-literal' = 'null'                // Null representation for strings (default: 'null')
```

[Connection and Configuration](./connection-config.md)
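
To illustrate how required options are enforced at table-creation time, here is a minimal plain-Java sketch of the kind of validation a dynamic table factory performs; the `OptionsValidator` class is hypothetical and not part of the connector:

```java
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of WITH-clause validation; the real checks live inside
// the connector's factory (HBase2DynamicTableFactory).
public class OptionsValidator {

    // Options the connector cannot work without
    static final Set<String> REQUIRED = Set.of("connector", "table-name", "zookeeper.quorum");

    // Throws if a required option is missing/empty or the identifier is wrong
    static void validate(Map<String, String> options) {
        for (String key : REQUIRED) {
            if (!options.containsKey(key) || options.get(key).isEmpty()) {
                throw new IllegalArgumentException("Missing required option: " + key);
            }
        }
        if (!"hbase-2.2".equals(options.get("connector"))) {
            throw new IllegalArgumentException(
                    "Unknown connector identifier: " + options.get("connector"));
        }
    }
}
```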

### Lookup Operations and Caching

Temporal table join functionality with both synchronous and asynchronous lookup modes, including caching options for performance optimization.

```java { .api }
// Lookup configuration options
'lookup.async' = 'false'          // Enable asynchronous lookup (default: false)
'lookup.cache.max-rows' = '1000'  // Max cached rows (default: -1, cache disabled)
'lookup.cache.ttl' = '60s'        // Cache entry TTL (default: 0)
'lookup.max-retries' = '3'        // Max retries on lookup failure (default: 3)
```

[Lookup Operations](./lookup-operations.md)
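
As a sketch of what the two cache options imply, the following plain-Java class models a size-bounded, TTL-expiring LRU cache; the `LookupCache` class is illustrative, not the connector's internal implementation:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative model of 'lookup.cache.max-rows' + 'lookup.cache.ttl' semantics.
public class LookupCache<K, V> {

    private static final class CacheEntry<V> {
        final V value;
        final long expireAtMillis;
        CacheEntry(V value, long expireAtMillis) {
            this.value = value;
            this.expireAtMillis = expireAtMillis;
        }
    }

    private final long ttlMillis;
    private final Map<K, CacheEntry<V>> cache;

    public LookupCache(int maxRows, long ttlMillis) {
        this.ttlMillis = ttlMillis;
        // Access-ordered LinkedHashMap: evict the least-recently-used entry
        // once the cache grows beyond maxRows.
        this.cache = new LinkedHashMap<K, CacheEntry<V>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, CacheEntry<V>> eldest) {
                return size() > maxRows;
            }
        };
    }

    // Returns the cached value, or null on a miss / expired entry
    // (a real lookup would then query HBase and re-populate the cache).
    public V get(K key) {
        CacheEntry<V> e = cache.get(key);
        if (e == null) {
            return null;
        }
        if (System.currentTimeMillis() >= e.expireAtMillis) {
            cache.remove(key); // TTL elapsed: drop the stale row
            return null;
        }
        return e.value;
    }

    public void put(K key, V value) {
        cache.put(key, new CacheEntry<>(value, System.currentTimeMillis() + ttlMillis));
    }
}
```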

### Sink Operations and Buffering

UPSERT sink operations with configurable buffering, flushing on Flink checkpoints (idempotent upserts make results effectively exactly-once), and comprehensive error handling.

```java { .api }
// Sink configuration options
'sink.buffer-flush.max-size' = '2mb'   // Flush when buffered size exceeds this (default: 2mb)
'sink.buffer-flush.max-rows' = '1000'  // Flush when buffered rows exceed this (default: 1000)
'sink.buffer-flush.interval' = '1s'    // Periodic flush interval (default: 1s)
'sink.parallelism' = '1'               // Sink parallelism (default: upstream parallelism)
```

[Sink Operations](./sink-operations.md)
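
The buffering behavior can be sketched as follows; `BufferedSink` is an illustrative model of the three flush triggers, not the connector's actual sink class, and the periodic timer is modeled as an explicit method call:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative model of the 'sink.buffer-flush.*' triggers; the real sink
// hands buffered mutations to the HBase client instead of counting flushes.
public class BufferedSink {

    private final int maxRows;
    private final long maxBytes;
    private final List<byte[]> buffer = new ArrayList<>();
    private long bufferedBytes = 0;
    int flushCount = 0; // exposed so the example below can observe flushes

    public BufferedSink(int maxRows, long maxBytes) {
        this.maxRows = maxRows;
        this.maxBytes = maxBytes;
    }

    // Buffer one serialized mutation; flush if either size threshold is crossed
    // ('sink.buffer-flush.max-rows' / 'sink.buffer-flush.max-size').
    public void write(byte[] mutation) {
        buffer.add(mutation);
        bufferedBytes += mutation.length;
        if (buffer.size() >= maxRows || bufferedBytes >= maxBytes) {
            flush();
        }
    }

    // Invoked periodically ('sink.buffer-flush.interval') and on checkpoints,
    // so no row waits in the buffer indefinitely.
    public void onTimerOrCheckpoint() {
        if (!buffer.isEmpty()) {
            flush();
        }
    }

    private void flush() {
        // Real sink: send the buffered mutations to HBase in one batch.
        flushCount++;
        buffer.clear();
        bufferedBytes = 0;
    }
}
```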

### Data Types and Schema Mapping

Complete data type mapping between Flink's type system and HBase's byte-array storage format, including support for nested ROW types and proper serialization handling.

```java { .api }
// Supported Flink data types, serialized to HBase byte arrays
BOOLEAN, TINYINT, SMALLINT, INTEGER, BIGINT  // Boolean and integer types
FLOAT, DOUBLE, DECIMAL                       // Approximate and exact numeric types
CHAR, VARCHAR                                // String types
BINARY, VARBINARY                            // Binary types
DATE, TIME, TIMESTAMP                        // Temporal types
ROW<field_name field_type, ...>              // Nested row types for column families
```

[Data Types and Schema](./data-types.md)
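
Since HBase stores every cell as a raw byte array, the mapping above comes down to byte-level encodings. This plain-Java sketch mirrors the big-endian and UTF-8 conventions used by HBase's `Bytes` utility without requiring the HBase dependency; the `HBaseEncoding` class is illustrative only:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Plain-Java illustration of the byte encoding the connector relies on:
// numeric values are written big-endian, strings as UTF-8 bytes.
public class HBaseEncoding {

    // INT -> 4-byte big-endian array (mirrors Bytes.toBytes(int))
    static byte[] encodeInt(int v) {
        return ByteBuffer.allocate(4).putInt(v).array();
    }

    static int decodeInt(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getInt();
    }

    // STRING -> UTF-8 bytes (mirrors Bytes.toBytes(String))
    static byte[] encodeString(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] age = encodeInt(25);          // [0, 0, 0, 25]
        System.out.println(decodeInt(age));  // prints 25
    }
}
```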

## Error Handling

The connector provides robust error handling with:

- **Connection Retry Logic**: Configurable retry attempts for HBase connection failures
- **Graceful Degradation**: Proper handling of missing tables and region server unavailability
- **Buffer Management**: Overflow protection and timeout handling for sink operations
- **Checkpointing Support**: Buffered writes are flushed on checkpoints; combined with idempotent upserts this yields effectively exactly-once results
- **Validation Errors**: Clear error messages for schema mismatches and configuration issues