# Connection and Configuration

Comprehensive configuration options for HBase connections, ZooKeeper settings, performance tuning, and operational parameters.

## Capabilities

### Required Configuration Options

Essential settings that must be provided for the connector to function.

```sql { .api }
WITH (
    'connector' = 'hbase-2.2',          -- Required: Connector identifier
    'table-name' = 'hbase_table_name'   -- Required: HBase table name
)
```

**Usage Examples**:

```sql
CREATE TABLE minimal_config (
    rowkey STRING,
    data ROW<`value` STRING>,
    PRIMARY KEY (rowkey) NOT ENFORCED
) WITH (
    'connector' = 'hbase-2.2',
    'table-name' = 'my_hbase_table'
);
```
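
Once declared, the table can be queried like any other Flink table. The query below is a minimal sketch that assumes the connector's usual mapping, in which a `ROW` column such as `data` corresponds to an HBase column family and its fields to column qualifiers. The field name `value` is wrapped in backticks because it is a reserved keyword in Flink SQL; the alias `data_value` is illustrative.

```sql
-- Read the row key and one qualifier from the `data` column family.
SELECT
    rowkey,
    data.`value` AS data_value
FROM minimal_config;
```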

### Connection Configuration

Settings for establishing connections to the HBase cluster through ZooKeeper.

```sql { .api }
WITH (
    'zookeeper.quorum' = 'host1:port1,host2:port2,...',  -- ZooKeeper ensemble
    'zookeeper.znode.parent' = '/hbase'                  -- ZooKeeper root path (default: '/hbase')
)
```

**Parameters**:
- `zookeeper.quorum`: Comma-separated list of ZooKeeper servers, each with an optional port
- `zookeeper.znode.parent`: Root znode in ZooKeeper under which the HBase cluster stores its metadata

**Usage Examples**:

```sql
-- Single ZooKeeper node (development)
CREATE TABLE dev_table (
    rowkey STRING,
    data ROW<`value` STRING>,
    PRIMARY KEY (rowkey) NOT ENFORCED
) WITH (
    'connector' = 'hbase-2.2',
    'table-name' = 'dev_data',
    'zookeeper.quorum' = 'localhost:2181'
);

-- Production cluster with multiple ZooKeeper nodes
CREATE TABLE prod_table (
    rowkey STRING,
    info ROW<name STRING, `timestamp` BIGINT>,
    PRIMARY KEY (rowkey) NOT ENFORCED
) WITH (
    'connector' = 'hbase-2.2',
    'table-name' = 'production_data',
    'zookeeper.quorum' = 'zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181',
    'zookeeper.znode.parent' = '/hbase-prod'
);
```

### Data Handling Configuration

Options for controlling how data is processed and represented.

```sql { .api }
WITH (
    'null-string-literal' = 'null'  -- Null value representation (default: 'null')
)
```

**Parameters**:
- `null-string-literal`: String representation used for null values in string fields

**Usage Examples**:

```sql
-- Custom null representation
CREATE TABLE custom_nulls (
    rowkey STRING,
    data ROW<optional_field STRING, required_field STRING>,
    PRIMARY KEY (rowkey) NOT ENFORCED
) WITH (
    'connector' = 'hbase-2.2',
    'table-name' = 'nullable_data',
    'zookeeper.quorum' = 'localhost:2181',
    'null-string-literal' = 'N/A'
);
```
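
To make the effect of `null-string-literal` concrete, the sketch below writes two rows into `custom_nulls`, one of which has a NULL `optional_field`. With the table definition above, that cell would be stored as the literal `N/A` and read back as NULL. The view `staging_rows`, its column names, and the sample values are hypothetical and exist only to keep the example self-contained. According to the Flink documentation, the option applies to string fields only; other types encode NULL as empty bytes.

```sql
-- Hypothetical in-memory input: row-001 has no optional note.
CREATE TEMPORARY VIEW staging_rows AS
SELECT * FROM (VALUES
    ('row-001', CAST(NULL AS STRING), 'present'),
    ('row-002', 'filled', 'present')
) AS t(id, optional_note, required_note);

-- Each ROW(...) builds the contents of one column family.
INSERT INTO custom_nulls
SELECT
    id,
    ROW(optional_note, required_note)
FROM staging_rows;
```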

### Sink Configuration Options

Settings for controlling write operations, buffering behavior, and performance tuning.

```sql { .api }
WITH (
    'sink.buffer-flush.max-size' = '2mb',   -- Buffer size threshold (default: 2MB)
    'sink.buffer-flush.max-rows' = '1000',  -- Buffer row count threshold (default: 1000)
    'sink.buffer-flush.interval' = '1s',    -- Time-based flush interval (default: 1s)
    'sink.parallelism' = '1'                -- Sink operator parallelism
)
```

**Parameters**:
- `sink.buffer-flush.max-size`: Maximum memory size for buffered mutations before flushing
- `sink.buffer-flush.max-rows`: Maximum number of rows to buffer before flushing
- `sink.buffer-flush.interval`: Maximum time to wait before flushing buffered operations
- `sink.parallelism`: Number of parallel sink operators (affects write throughput)

**Usage Examples**:

```sql
-- High-throughput sink configuration
CREATE TABLE high_volume_sink (
    rowkey STRING,
    metrics ROW<`value` DOUBLE, `timestamp` BIGINT, source STRING>,
    PRIMARY KEY (rowkey) NOT ENFORCED
) WITH (
    'connector' = 'hbase-2.2',
    'table-name' = 'metrics_data',
    'zookeeper.quorum' = 'localhost:2181',
    'sink.buffer-flush.max-size' = '10mb',
    'sink.buffer-flush.max-rows' = '5000',
    'sink.buffer-flush.interval' = '5s',
    'sink.parallelism' = '4'
);

-- Low-latency sink configuration
CREATE TABLE low_latency_sink (
    rowkey STRING,
    events ROW<event_type STRING, payload STRING>,
    PRIMARY KEY (rowkey) NOT ENFORCED
) WITH (
    'connector' = 'hbase-2.2',
    'table-name' = 'event_stream',
    'zookeeper.quorum' = 'localhost:2181',
    'sink.buffer-flush.max-size' = '100kb',
    'sink.buffer-flush.max-rows' = '50',
    'sink.buffer-flush.interval' = '100ms'
);
```
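
Writing to a sink table uses a regular `INSERT INTO`, with each column family assembled by a `ROW(...)` constructor. The sketch below feeds `high_volume_sink` from a hypothetical source table, `raw_metrics`; the table name, its columns, and the use of the `datagen` connector are assumptions made only so the example is self-contained. Buffered mutations are flushed as soon as any one of the size, row-count, or interval thresholds is reached.

```sql
-- Hypothetical source that generates random rows for demonstration.
CREATE TEMPORARY TABLE raw_metrics (
    host_name STRING,
    metric_value DOUBLE,
    event_ts BIGINT,
    origin STRING
) WITH (
    'connector' = 'datagen',
    'rows-per-second' = '100'
);

-- The ROW fields map positionally onto the `metrics` column family.
INSERT INTO high_volume_sink
SELECT
    host_name,
    ROW(metric_value, event_ts, origin)
FROM raw_metrics;
```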

### Lookup Configuration Options

Settings for temporal table joins, caching behavior, and retry logic.

```sql { .api }
WITH (
    'lookup.async' = 'false',        -- Enable async lookup (default: false)
    'lookup.cache.max-rows' = '-1',  -- Cache size limit (default: -1, caching disabled)
    'lookup.cache.ttl' = '0',        -- Cache time-to-live (default: 0)
    'lookup.max-retries' = '3'       -- Maximum retry attempts (default: 3)
)
```

**Parameters**:
- `lookup.async`: Enable asynchronous lookup operations for better throughput
- `lookup.cache.max-rows`: Maximum number of lookup results to cache (the default of -1 leaves caching disabled)
- `lookup.cache.ttl`: Maximum time a cached entry is kept before it expires (default: 0); the cache takes effect only when this option and `lookup.cache.max-rows` are both set
- `lookup.max-retries`: Number of retry attempts for failed lookup operations

**Usage Examples**:

```sql
-- High-performance async lookup with caching
CREATE TABLE cached_lookup (
    rowkey STRING,
    user_data ROW<name STRING, email STRING, preferences STRING>,
    PRIMARY KEY (rowkey) NOT ENFORCED
) WITH (
    'connector' = 'hbase-2.2',
    'table-name' = 'user_profiles',
    'zookeeper.quorum' = 'localhost:2181',
    'lookup.async' = 'true',
    'lookup.cache.max-rows' = '10000',
    'lookup.cache.ttl' = '300s',  -- 5 minute cache
    'lookup.max-retries' = '5'
);

-- Simple synchronous lookup without caching
CREATE TABLE sync_lookup (
    rowkey STRING,
    reference_data ROW<description STRING, category STRING>,
    PRIMARY KEY (rowkey) NOT ENFORCED
) WITH (
    'connector' = 'hbase-2.2',
    'table-name' = 'reference_table',
    'zookeeper.quorum' = 'localhost:2181',
    'lookup.async' = 'false'
);
```
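
The lookup options take effect when the HBase table is used as the dimension side of a lookup (temporal) join. The following sketch enriches a stream with `cached_lookup` using the `FOR SYSTEM_TIME AS OF` syntax; the probe-side table `user_events`, its columns, and the `datagen` connector are hypothetical and serve only to make the example complete. The join condition is on the HBase row key.

```sql
-- Hypothetical probe-side stream with a processing-time attribute.
CREATE TEMPORARY TABLE user_events (
    user_id STRING,
    event_name STRING,
    proc_time AS PROCTIME()
) WITH (
    'connector' = 'datagen',
    'rows-per-second' = '10'
);

-- Each event looks up the matching profile by row key.
SELECT
    e.user_id,
    e.event_name,
    u.user_data.email AS email
FROM user_events AS e
JOIN cached_lookup FOR SYSTEM_TIME AS OF e.proc_time AS u
    ON e.user_id = u.rowkey;
```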

### Advanced HBase Configuration

Pass-through mechanism for additional HBase client configuration properties using the `properties.` prefix.

```sql { .api }
WITH (
    'properties.*' = 'value'  -- HBase configuration pass-through
)
```

**Parameters**:
- `properties.*`: Any HBase configuration property can be passed by prefixing it with `properties.`
- The connector strips the `properties.` prefix and passes the remaining key-value pairs to the HBase Configuration
- Supports any configuration option documented in the [HBase Configuration Reference](http://hbase.apache.org/2.2/book.html#hbase_default_configurations)

**Usage Examples**:

```sql
-- Kerberos authentication configuration
CREATE TABLE secure_hbase (
    rowkey STRING,
    data ROW<`value` STRING, `timestamp` BIGINT>,
    PRIMARY KEY (rowkey) NOT ENFORCED
) WITH (
    'connector' = 'hbase-2.2',
    'table-name' = 'secure_data',
    'zookeeper.quorum' = 'localhost:2181',
    'properties.hbase.security.authentication' = 'kerberos',
    'properties.hbase.security.authorization' = 'true',
    'properties.hbase.kerberos.regionserver.principal' = 'hbase/_HOST@REALM.COM'
);

-- Performance tuning with HBase client timeouts
CREATE TABLE tuned_hbase (
    rowkey STRING,
    metrics ROW<cpu DOUBLE, memory BIGINT>,
    PRIMARY KEY (rowkey) NOT ENFORCED
) WITH (
    'connector' = 'hbase-2.2',
    'table-name' = 'performance_data',
    'zookeeper.quorum' = 'localhost:2181',
    'properties.hbase.client.scanner.timeout.period' = '120000',  -- 2 minutes
    'properties.hbase.rpc.timeout' = '60000',                     -- 1 minute
    'properties.hbase.regionserver.lease.period' = '120000',      -- 2 minutes
    'properties.hbase.client.operation.timeout' = '90000'         -- 90 seconds
);

-- Limits on concurrent HBase client tasks
CREATE TABLE pooled_hbase (
    rowkey STRING,
    data ROW<payload STRING>,
    PRIMARY KEY (rowkey) NOT ENFORCED
) WITH (
    'connector' = 'hbase-2.2',
    'table-name' = 'pooled_data',
    'zookeeper.quorum' = 'localhost:2181',
    'properties.hbase.client.max.total.tasks' = '100',
    'properties.hbase.client.max.perserver.tasks' = '10',
    'properties.hbase.client.max.perregion.tasks' = '1'
);
```

### Complete Configuration Example

Comprehensive example showing all configuration options together.

```sql
CREATE TABLE comprehensive_config (
    transaction_id STRING,
    transaction_data ROW<
        amount DECIMAL(15,2),
        currency STRING,
        `timestamp` TIMESTAMP(3),
        description STRING
    >,
    customer_info ROW<
        customer_id STRING,
        account_type STRING
    >,
    `metadata` ROW<
        processing_time TIMESTAMP(3),
        source_system STRING,
        batch_id STRING
    >,
    PRIMARY KEY (transaction_id) NOT ENFORCED
) WITH (
    -- Required settings
    'connector' = 'hbase-2.2',
    'table-name' = 'financial_transactions',

    -- Connection settings
    'zookeeper.quorum' = 'zk1.bank.com:2181,zk2.bank.com:2181,zk3.bank.com:2181',
    'zookeeper.znode.parent' = '/hbase-production',

    -- Data handling
    'null-string-literal' = 'NULL',

    -- Sink performance tuning
    'sink.buffer-flush.max-size' = '16mb',
    'sink.buffer-flush.max-rows' = '2000',
    'sink.buffer-flush.interval' = '3s',
    'sink.parallelism' = '8',

    -- Lookup optimization
    'lookup.async' = 'true',
    'lookup.cache.max-rows' = '50000',
    'lookup.cache.ttl' = '600s',  -- 10 minute cache
    'lookup.max-retries' = '5',

    -- Advanced HBase configuration
    'properties.hbase.client.scanner.timeout.period' = '300000',  -- 5 minutes
    'properties.hbase.rpc.timeout' = '120000',                    -- 2 minutes
    'properties.hbase.security.authentication' = 'kerberos'       -- Enable Kerberos
);
```

### Configuration Validation

The connector validates configuration options at table creation time:

**Connection Validation**:
- ZooKeeper quorum accessibility
- HBase table existence verification
- Proper permissions for table access

**Schema Validation**:
- Primary key constraint presence
- Row key column definition
- Data type compatibility with HBase storage

**Performance Validation**:
- Buffer size limits (max 64MB per buffer)
- Reasonable parallelism values
- Cache size limits based on available memory