# Connection and Configuration

Configuration options for HBase connections, ZooKeeper settings, performance tuning, and operational parameters.

## Capabilities

### Required Configuration Options

Essential settings that must be provided for the connector to function.

```sql { .api }
WITH (
  'connector' = 'hbase-2.2',        -- Required: Connector identifier
  'table-name' = 'hbase_table_name' -- Required: HBase table name
)
```

**Usage Examples**:

```sql
CREATE TABLE minimal_config (
  rowkey STRING,
  data ROW<`value` STRING>,  -- backticks required: VALUE is a Flink SQL reserved keyword
  PRIMARY KEY (rowkey) NOT ENFORCED
) WITH (
  'connector' = 'hbase-2.2',
  'table-name' = 'my_hbase_table'
);
```
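Tables outside HBase's `default` namespace are addressed through the same option. According to the upstream Flink HBase connector documentation, `table-name` accepts the form `namespace:table`; the sketch below assumes a hypothetical `analytics` namespace and `page_views` table that already exist in the cluster.

```sql
-- Hypothetical names: the 'analytics' namespace and 'page_views' table must already exist in HBase
CREATE TABLE namespaced_table (
  rowkey STRING,
  data ROW<`value` STRING>,               -- VALUE is a reserved keyword, hence the backticks
  PRIMARY KEY (rowkey) NOT ENFORCED
) WITH (
  'connector' = 'hbase-2.2',
  'table-name' = 'analytics:page_views',  -- namespace:table
  'zookeeper.quorum' = 'localhost:2181'
);
```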

### Connection Configuration

Settings for establishing connections to the HBase cluster through ZooKeeper.

```sql { .api }
WITH (
  'zookeeper.quorum' = 'host1:port1,host2:port2,...', -- ZooKeeper ensemble
  'zookeeper.znode.parent' = '/hbase'                 -- ZooKeeper root path (default: '/hbase')
)
```

**Parameters**:

- `zookeeper.quorum`: Comma-separated list of ZooKeeper servers, each given as `host` or `host:port`
- `zookeeper.znode.parent`: Root znode in ZooKeeper under which the HBase cluster keeps its metadata

**Usage Examples**:

```sql
-- Single ZooKeeper node (development)
CREATE TABLE dev_table (
  rowkey STRING,
  data ROW<`value` STRING>,  -- VALUE is a reserved keyword, hence the backticks
  PRIMARY KEY (rowkey) NOT ENFORCED
) WITH (
  'connector' = 'hbase-2.2',
  'table-name' = 'dev_data',
  'zookeeper.quorum' = 'localhost:2181'
);

-- Production cluster with multiple ZooKeeper nodes
CREATE TABLE prod_table (
  rowkey STRING,
  info ROW<name STRING, `timestamp` BIGINT>,  -- TIMESTAMP is a reserved keyword, hence the backticks
  PRIMARY KEY (rowkey) NOT ENFORCED
) WITH (
  'connector' = 'hbase-2.2',
  'table-name' = 'production_data',
  'zookeeper.quorum' = 'zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181',
  'zookeeper.znode.parent' = '/hbase-prod'
);
```
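When a quorum entry omits the port, the HBase client fills it in from `hbase.zookeeper.property.clientPort`, which defaults to 2181. A sketch for an ensemble listening on a non-default port, assuming that standard HBase client behavior and the `properties.*` pass-through described under Advanced HBase Configuration; the host names and port `2182` are placeholders.

```sql
-- Placeholder hosts listed without ports; the client port comes from the pass-through property
CREATE TABLE custom_port_table (
  rowkey STRING,
  info ROW<name STRING, `timestamp` BIGINT>,
  PRIMARY KEY (rowkey) NOT ENFORCED
) WITH (
  'connector' = 'hbase-2.2',
  'table-name' = 'production_data',
  'zookeeper.quorum' = 'zk1.example.com,zk2.example.com,zk3.example.com',
  'properties.hbase.zookeeper.property.clientPort' = '2182'
);
```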

### Data Handling Configuration

Options for controlling how data is processed and represented.

```sql { .api }
WITH (
  'null-string-literal' = 'null' -- Null value representation (default: 'null')
)
```

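**Parameters**:

- `null-string-literal`: Literal written to HBase in place of a null value in a string field; when reading, a stored value equal to this literal is returned as null. For all other data types, the HBase source and sink encode and decode null as empty bytes.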

**Usage Examples**:

```sql
-- Custom null representation
CREATE TABLE custom_nulls (
  rowkey STRING,
  data ROW<optional_field STRING, required_field STRING>,
  PRIMARY KEY (rowkey) NOT ENFORCED
) WITH (
  'connector' = 'hbase-2.2',
  'table-name' = 'nullable_data',
  'zookeeper.quorum' = 'localhost:2181',
  'null-string-literal' = 'N/A'
);
```
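To make the effect concrete, the statements below write to and read from the `custom_nulls` table above. Per the option's semantics, the null `optional_field` is stored in HBase as the bytes of `'N/A'`, and reading the row back turns that stored value into `NULL` again. One consequence: a real string value of `'N/A'` also reads back as `NULL`. The row key `'row-1'` is illustrative, and `ROW(...)` fills the column family positionally, following the pattern used in the upstream Flink documentation.

```sql
-- NULL in a string field is written to HBase as the literal 'N/A'
INSERT INTO custom_nulls
VALUES ('row-1', ROW(CAST(NULL AS STRING), 'present'));

-- optional_field reads back as NULL rather than the string 'N/A'
SELECT rowkey, data.optional_field, data.required_field
FROM custom_nulls;
```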

### Sink Configuration Options

Settings for controlling write operations, buffering behavior, and performance tuning.

```sql { .api }
WITH (
  'sink.buffer-flush.max-size' = '2mb',  -- Buffer size threshold (default: 2MB)
  'sink.buffer-flush.max-rows' = '1000', -- Buffer row count threshold (default: 1000)
  'sink.buffer-flush.interval' = '1s',   -- Time-based flush interval (default: 1s)
  'sink.parallelism' = '1'               -- Sink operator parallelism (default: determined by the framework)
)
```

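**Parameters**:

- `sink.buffer-flush.max-size`: Maximum in-memory size of buffered mutations before a flush; `'0'` disables this trigger
- `sink.buffer-flush.max-rows`: Maximum number of buffered rows before a flush; `'0'` disables this trigger
- `sink.buffer-flush.interval`: Maximum time buffered mutations wait before a periodic flush; `'0'` disables periodic flushing
- `sink.parallelism`: Number of parallel sink instances, which affects write throughput; when unset, the framework uses the parallelism of the upstream operator

Larger thresholds and longer intervals generally raise write throughput at the cost of higher latency.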
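The upstream Flink documentation notes that both the size and row-count thresholds can be set to `'0'` while a flush interval stays configured, which allows buffered mutations to be processed fully asynchronously. A sketch of that setup; the table and column names are illustrative.

```sql
-- Disable the size and row-count triggers; rely on the 2-second interval for flushing
CREATE TABLE interval_only_sink (
  rowkey STRING,
  events ROW<event_type STRING, payload STRING>,
  PRIMARY KEY (rowkey) NOT ENFORCED
) WITH (
  'connector' = 'hbase-2.2',
  'table-name' = 'event_stream',
  'zookeeper.quorum' = 'localhost:2181',
  'sink.buffer-flush.max-size' = '0',
  'sink.buffer-flush.max-rows' = '0',
  'sink.buffer-flush.interval' = '2s'
);
```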

**Usage Examples**:

```sql
-- High-throughput sink configuration
CREATE TABLE high_volume_sink (
  rowkey STRING,
  metrics ROW<`value` DOUBLE, `timestamp` BIGINT, source STRING>,  -- reserved keywords are backtick-quoted
  PRIMARY KEY (rowkey) NOT ENFORCED
) WITH (
  'connector' = 'hbase-2.2',
  'table-name' = 'metrics_data',
  'zookeeper.quorum' = 'localhost:2181',
  'sink.buffer-flush.max-size' = '10mb',
  'sink.buffer-flush.max-rows' = '5000',
  'sink.buffer-flush.interval' = '5s',
  'sink.parallelism' = '4'
);

-- Low-latency sink configuration
CREATE TABLE low_latency_sink (
  rowkey STRING,
  events ROW<event_type STRING, payload STRING>,
  PRIMARY KEY (rowkey) NOT ENFORCED
) WITH (
  'connector' = 'hbase-2.2',
  'table-name' = 'event_stream',
  'zookeeper.quorum' = 'localhost:2181',
  'sink.buffer-flush.max-size' = '100kb',
  'sink.buffer-flush.max-rows' = '50',
  'sink.buffer-flush.interval' = '100ms'
);
```
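The sink options only take effect once a query writes into the table. A minimal sketch feeding `high_volume_sink`, assuming a hypothetical `metrics_source` table backed by Flink's built-in `datagen` connector so the example is self-contained. `ROW(...)` maps fields positionally onto the declared `metrics` row type, so their order must match `value`, `timestamp`, `source`.

```sql
-- Hypothetical source: random test data from the built-in datagen connector
CREATE TABLE metrics_source (
  host STRING,
  metric_value DOUBLE,
  event_time BIGINT,
  source_system STRING
) WITH (
  'connector' = 'datagen',
  'rows-per-second' = '1000'
);

-- host becomes the row key; ROW(...) fills the metrics column family in declared order
INSERT INTO high_volume_sink
SELECT host, ROW(metric_value, event_time, source_system)
FROM metrics_source;
```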

### Lookup Configuration Options

Settings for temporal table joins, caching behavior, and retry logic.

```sql { .api }
WITH (
  'lookup.async' = 'false',       -- Enable async lookup (default: false)
  'lookup.cache.max-rows' = '-1', -- Cache size limit (default: -1, caching disabled)
  'lookup.cache.ttl' = '0',       -- Time-to-live per cached row (default: 0, caching disabled)
  'lookup.max-retries' = '3'      -- Maximum retry attempts (default: 3)
)
```

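**Parameters**:

- `lookup.async`: Enables asynchronous lookups, which can improve throughput by keeping multiple HBase requests in flight
- `lookup.cache.max-rows`: Maximum number of rows held in the lookup cache; once exceeded, the oldest rows are evicted (-1 disables caching)
- `lookup.cache.ttl`: Maximum time a cached row lives before it expires. Set it together with `lookup.cache.max-rows`: the cache is enabled only when both are positive, so the default of 0 leaves caching disabled
- `lookup.max-retries`: Maximum number of retries when a lookup against HBase fails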

**Usage Examples**:

```sql
-- High-performance async lookup with caching
CREATE TABLE cached_lookup (
  rowkey STRING,
  user_data ROW<name STRING, email STRING, preferences STRING>,
  PRIMARY KEY (rowkey) NOT ENFORCED
) WITH (
  'connector' = 'hbase-2.2',
  'table-name' = 'user_profiles',
  'zookeeper.quorum' = 'localhost:2181',
  'lookup.async' = 'true',
  'lookup.cache.max-rows' = '10000',
  'lookup.cache.ttl' = '300s', -- 5 minute cache
  'lookup.max-retries' = '5'
);

-- Simple synchronous lookup without caching
CREATE TABLE sync_lookup (
  rowkey STRING,
  reference_data ROW<description STRING, category STRING>,
  PRIMARY KEY (rowkey) NOT ENFORCED
) WITH (
  'connector' = 'hbase-2.2',
  'table-name' = 'reference_table',
  'zookeeper.quorum' = 'localhost:2181',
  'lookup.async' = 'false'
);
```
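Lookup options apply only when the HBase table is the dimension side of a lookup join. A sketch against `cached_lookup` above, assuming a hypothetical `clicks` stream (backed by `datagen` for self-containment) that declares a processing-time attribute with `PROCTIME()`. The `FOR SYSTEM_TIME AS OF` clause makes Flink fetch the matching HBase row by row key for each incoming click, consulting the cache first when one is configured.

```sql
-- Hypothetical probe-side stream with a processing-time attribute
CREATE TABLE clicks (
  user_id STRING,
  url STRING,
  proc_time AS PROCTIME()
) WITH (
  'connector' = 'datagen',
  'rows-per-second' = '10'
);

-- Each click is enriched with the profile stored under the matching row key
SELECT c.url, u.user_data.name, u.user_data.email
FROM clicks AS c
JOIN cached_lookup FOR SYSTEM_TIME AS OF c.proc_time AS u
  ON c.user_id = u.rowkey;
```

Because this is an inner join, clicks without a matching row key are dropped; a `LEFT JOIN` keeps them with null profile fields.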

### Advanced HBase Configuration

Pass-through mechanism for additional HBase client configuration properties using the `properties.` prefix.

```sql { .api }
WITH (
  'properties.*' = 'value' -- HBase configuration pass-through
)
```

**Parameters**:

- `properties.*`: Any HBase configuration property can be passed by prefixing its key with `properties.`
- The connector strips the `properties.` prefix and passes the remaining key-value pairs to the HBase `Configuration`
- Supports any configuration option documented in the [HBase Configuration Reference](http://hbase.apache.org/2.2/book.html#hbase_default_configurations)

**Usage Examples**:

```sql
-- Kerberos authentication configuration
CREATE TABLE secure_hbase (
  rowkey STRING,
  data ROW<`value` STRING, `timestamp` BIGINT>,  -- reserved keywords are backtick-quoted
  PRIMARY KEY (rowkey) NOT ENFORCED
) WITH (
  'connector' = 'hbase-2.2',
  'table-name' = 'secure_data',
  'zookeeper.quorum' = 'localhost:2181',
  'properties.hbase.security.authentication' = 'kerberos',
  'properties.hbase.security.authorization' = 'true',
  'properties.hbase.regionserver.kerberos.principal' = 'hbase/_HOST@REALM.COM'
);

-- Performance tuning with HBase client timeouts
CREATE TABLE tuned_hbase (
  rowkey STRING,
  metrics ROW<cpu DOUBLE, memory BIGINT>,
  PRIMARY KEY (rowkey) NOT ENFORCED
) WITH (
  'connector' = 'hbase-2.2',
  'table-name' = 'performance_data',
  'zookeeper.quorum' = 'localhost:2181',
  'properties.hbase.client.scanner.timeout.period' = '120000', -- 2 minutes
  'properties.hbase.rpc.timeout' = '60000',                    -- 1 minute
  'properties.hbase.regionserver.lease.period' = '120000',     -- 2 minutes (legacy key, deprecated in HBase 2.x)
  'properties.hbase.client.operation.timeout' = '90000'        -- 90 seconds
);

-- Limiting concurrent mutation tasks sent by the HBase client
CREATE TABLE pooled_hbase (
  rowkey STRING,
  data ROW<payload STRING>,
  PRIMARY KEY (rowkey) NOT ENFORCED
) WITH (
  'connector' = 'hbase-2.2',
  'table-name' = 'pooled_data',
  'zookeeper.quorum' = 'localhost:2181',
  'properties.hbase.client.max.total.tasks' = '100',    -- across all region servers
  'properties.hbase.client.max.perserver.tasks' = '10', -- per region server
  'properties.hbase.client.max.perregion.tasks' = '1'   -- per region
);
```
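HBase client retry behavior can be tuned through the same prefix; this is separate from the connector's own `lookup.max-retries`. The sketch uses the HBase client settings `hbase.client.retries.number` and `hbase.client.pause` (base back-off between retries, in milliseconds). The table name and values are illustrative, and HBase's defaults apply to anything left unset.

```sql
-- Forwarded to HBase as hbase.client.retries.number and hbase.client.pause
CREATE TABLE retry_tuned_hbase (
  rowkey STRING,
  data ROW<payload STRING>,
  PRIMARY KEY (rowkey) NOT ENFORCED
) WITH (
  'connector' = 'hbase-2.2',
  'table-name' = 'retry_data',
  'zookeeper.quorum' = 'localhost:2181',
  'properties.hbase.client.retries.number' = '5',  -- maximum client retries for retryable operations
  'properties.hbase.client.pause' = '200'          -- base pause between retries, in ms
);
```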

### Complete Configuration Example

Comprehensive example showing all configuration options together.

```sql
CREATE TABLE comprehensive_config (
  transaction_id STRING,
  transaction_data ROW<
    amount DECIMAL(15, 2),
    currency STRING,
    `timestamp` TIMESTAMP(3),  -- quoted: TIMESTAMP is a reserved keyword
    description STRING
  >,
  customer_info ROW<
    customer_id STRING,
    account_type STRING
  >,
  `metadata` ROW<              -- quoted: METADATA is a Flink SQL keyword
    processing_time TIMESTAMP(3),
    source_system STRING,
    batch_id STRING
  >,
  PRIMARY KEY (transaction_id) NOT ENFORCED
) WITH (
  -- Required settings
  'connector' = 'hbase-2.2',
  'table-name' = 'financial_transactions',

  -- Connection settings
  'zookeeper.quorum' = 'zk1.bank.com:2181,zk2.bank.com:2181,zk3.bank.com:2181',
  'zookeeper.znode.parent' = '/hbase-production',

  -- Data handling
  'null-string-literal' = 'NULL',

  -- Sink performance tuning
  'sink.buffer-flush.max-size' = '16mb',
  'sink.buffer-flush.max-rows' = '2000',
  'sink.buffer-flush.interval' = '3s',
  'sink.parallelism' = '8',

  -- Lookup optimization
  'lookup.async' = 'true',
  'lookup.cache.max-rows' = '50000',
  'lookup.cache.ttl' = '600s', -- 10 minute cache
  'lookup.max-retries' = '5',

  -- Advanced HBase configuration
  'properties.hbase.client.scanner.timeout.period' = '300000', -- 5 minutes
  'properties.hbase.rpc.timeout' = '120000',                   -- 2 minutes
  'properties.hbase.security.authentication' = 'kerberos'      -- Enable Kerberos
);
```

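### Configuration Validation

`CREATE TABLE` itself only registers the table definition in the catalog. The connector validates the options and schema when a query that reads from or writes to the table is planned; connection problems surface only at runtime, when the job first contacts the cluster:

**Connection Validation** (at runtime):

- ZooKeeper quorum reachability
- HBase table existence
- Permissions for table access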

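**Schema Validation**:

- Primary key constraint: must be defined on the row key column
- Row key column definition: exactly one atomic-type column (the row key); every other column is a `ROW` type representing a column family
- Data type compatibility with HBase storage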
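The row key rule in practice: per the upstream Flink documentation, every column that is not a `ROW` type is treated as the row key, so only one such column is allowed. A sketch with illustrative names; assuming that documented mapping, the first definition is rejected by the connector because it has two atomic columns, while the second stores `extra` as a qualifier of column family `cf` instead.

```sql
-- Invalid: rowkey and extra are both atomic columns, so the row key is ambiguous
CREATE TABLE ambiguous_rowkey (
  rowkey STRING,
  extra STRING,
  cf ROW<q STRING>
) WITH (
  'connector' = 'hbase-2.2',
  'table-name' = 'example_table',
  'zookeeper.quorum' = 'localhost:2181'
);

-- Valid: a single atomic row key; extra becomes a qualifier in column family cf
CREATE TABLE single_rowkey (
  rowkey STRING,
  cf ROW<q STRING, extra STRING>,
  PRIMARY KEY (rowkey) NOT ENFORCED
) WITH (
  'connector' = 'hbase-2.2',
  'table-name' = 'example_table',
  'zookeeper.quorum' = 'localhost:2181'
);
```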

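**Performance Validation**:

- Values must parse as their declared types: a memory size for `sink.buffer-flush.max-size` (for example `2mb`), durations for `sink.buffer-flush.interval` and `lookup.cache.ttl`, and integers for the row counts, retry count, and parallelism
- No upper bound on buffer size, parallelism, or cache size is documented upstream, so size these against the memory and cluster capacity actually available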
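In addition to these checks, Flink's table factory rejects option keys that the connector does not recognize, which catches typos. A sketch with a deliberately misspelled key (`sink.buffer-flush.max-row` instead of `sink.buffer-flush.max-rows`); assuming standard Flink factory validation, using this table fails with a validation error that lists the unsupported key rather than silently ignoring it.

```sql
-- Deliberate typo: 'max-row' is not a recognized option of the hbase-2.2 connector
CREATE TABLE typo_sink (
  rowkey STRING,
  data ROW<payload STRING>,
  PRIMARY KEY (rowkey) NOT ENFORCED
) WITH (
  'connector' = 'hbase-2.2',
  'table-name' = 'example_table',
  'zookeeper.quorum' = 'localhost:2181',
  'sink.buffer-flush.max-row' = '500'
);
```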