
tessl/maven-org-apache-flink--flink-sql-connector-hbase-2-2-2-11

Apache Flink SQL connector that enables seamless integration with HBase 2.2.x databases through Flink's Table API and SQL interface

Workspace: tessl
Visibility: Public
Describes: pkg:maven/org.apache.flink/flink-sql-connector-hbase-2.2_2.11@1.14.x (Maven)

To install, run

npx @tessl/cli install tessl/maven-org-apache-flink--flink-sql-connector-hbase-2-2-2-11@1.14.0


Apache Flink HBase 2.2 SQL Connector

Apache Flink SQL connector that enables seamless integration with HBase 2.2.x databases through Flink's Table API and SQL interface. This shaded JAR bundles all necessary HBase client dependencies with proper relocation to avoid classpath conflicts in Flink deployments.

Package Information

  • Package Name: flink-sql-connector-hbase-2.2_2.11
  • Group ID: org.apache.flink
  • Language: Java
  • Maven Installation: Add to your project dependencies:
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-sql-connector-hbase-2.2_2.11</artifactId>
    <version>1.14.6</version>
</dependency>

For runtime usage (when running Flink applications):

# Copy to Flink lib directory
cp flink-sql-connector-hbase-2.2_2.11-1.14.6.jar $FLINK_HOME/lib/

Core Usage

The connector integrates with Flink SQL through DDL statements. No explicit Java imports are needed for SQL usage.

Basic Usage

-- Create HBase table mapping
CREATE TABLE hbase_table (
    rowkey STRING,
    info ROW<name STRING, age INT>,
    `data` ROW<`value` DOUBLE, `timestamp` BIGINT>,
    PRIMARY KEY (rowkey) NOT ENFORCED
) WITH (
    'connector' = 'hbase-2.2',
    'table-name' = 'my_hbase_table',
    'zookeeper.quorum' = 'localhost:2181'
);

-- Insert data (UPSERT operation)
INSERT INTO hbase_table VALUES (
    'user_001',
    ROW('Alice', 25),
    ROW(95.5, 1234567890)
);

-- Query data
SELECT rowkey, info.name, info.age, `data`.`value`
FROM hbase_table 
WHERE info.age > 18;

-- Lookup join with another table
SELECT o.order_id, u.info.name, u.info.age
FROM orders o
JOIN hbase_table FOR SYSTEM_TIME AS OF o.proc_time AS u
ON o.user_id = u.rowkey;

Architecture

The connector architecture consists of several key components:

  • Factory System: HBase2DynamicTableFactory provides the main entry point for Flink's connector discovery mechanism
  • Table Sources/Sinks: Support both batch scanning and streaming lookup operations with UPSERT capabilities (INSERT, UPDATE_AFTER, DELETE); see the sketch after this list
  • Serialization Layer: Handles type mapping between Flink's type system and HBase's byte storage format
  • Configuration System: Comprehensive options for connection, caching, buffering, and performance tuning
  • Shaded Dependencies: All HBase client libraries are relocated to prevent classpath conflicts
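
To make the UPSERT behavior concrete, here is a minimal sketch reusing the hbase_table definition from Basic Usage: a second write with the same rowkey is applied as an HBase Put that overwrites the stored cells, and a retracted (DELETE) row is applied as an HBase Delete.

-- Reuses hbase_table from the Basic Usage example above
INSERT INTO hbase_table VALUES (
    'user_001',
    ROW('Alice', 25),
    ROW(95.5, 1234567890)
);

-- A second write with the same rowkey overwrites the existing cells
-- rather than creating a duplicate row
INSERT INTO hbase_table VALUES (
    'user_001',
    ROW('Alice', 26),
    ROW(97.0, 1234567999)
);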

Capabilities

SQL DDL and Table Operations

Core SQL operations for creating HBase table mappings, defining column families and qualifiers, and managing table schemas with proper data type mappings.

CREATE TABLE table_name (
    rowkey STRING,
    family_name ROW<qualifier_name DATA_TYPE, ...>,
    PRIMARY KEY (rowkey) NOT ENFORCED
) WITH (
    'connector' = 'hbase-2.2',
    'table-name' = 'hbase_table_name'
    -- ... additional connection and configuration options
);
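
To make the family/qualifier mapping concrete, here is a minimal sketch (table and column names are illustrative): each top-level ROW column maps to an HBase column family, and each field inside that ROW maps to a qualifier within the family.

-- Maps to HBase table 'profiles' with column families 'info' and 'stats'
CREATE TABLE profiles (
    rowkey STRING,                                     -- HBase row key
    info ROW<name STRING, email STRING>,               -- cells info:name, info:email
    stats ROW<logins BIGINT, last_seen TIMESTAMP(3)>,  -- cells stats:logins, stats:last_seen
    PRIMARY KEY (rowkey) NOT ENFORCED
) WITH (
    'connector' = 'hbase-2.2',
    'table-name' = 'profiles',
    'zookeeper.quorum' = 'localhost:2181'
);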

See: SQL DDL and Operations (docs/sql-ddl.md)

Connection and Configuration

Comprehensive configuration options for HBase connections, Zookeeper settings, performance tuning, and operational parameters.

// Key configuration options available via WITH clause
'connector' = 'hbase-2.2'                           // Required: Connector identifier
'table-name' = 'hbase_table_name'                   // Required: HBase table name
'zookeeper.quorum' = 'host1:2181,host2:2181'       // Required: HBase Zookeeper quorum
'zookeeper.znode.parent' = '/hbase'                 // Zookeeper root (default: '/hbase')
'null-string-literal' = 'null'                     // Null representation (default: 'null')
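
A hedged example pulling these options together (host names, table name, and the znode path are placeholders):

CREATE TABLE hbase_orders (
    rowkey STRING,
    details ROW<amount DECIMAL(10, 2), order_status STRING>,
    PRIMARY KEY (rowkey) NOT ENFORCED
) WITH (
    'connector' = 'hbase-2.2',
    'table-name' = 'orders',
    'zookeeper.quorum' = 'zk1:2181,zk2:2181,zk3:2181',
    'zookeeper.znode.parent' = '/hbase',
    'null-string-literal' = 'NULL'     -- string fields equal to 'NULL' are read/written as SQL NULL
);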

See: Connection and Configuration (docs/connection-config.md)

Lookup Operations and Caching

Temporal table join functionality with both synchronous and asynchronous lookup modes, including comprehensive caching strategies for performance optimization.

// Lookup configuration options
'lookup.async' = 'false'                           // Enable async lookup (default: false)
'lookup.cache.max-rows' = '1000'                   // Cache size (default: -1, disabled)
'lookup.cache.ttl' = '60s'                         // Cache TTL (default: 0)
'lookup.max-retries' = '3'                         // Max retry attempts (default: 3)
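
A sketch of an asynchronous, cached lookup table built from these options (the orders stream and its proc_time processing-time attribute are assumed to be defined elsewhere):

CREATE TABLE user_profiles (
    rowkey STRING,
    info ROW<name STRING, tier STRING>,
    PRIMARY KEY (rowkey) NOT ENFORCED
) WITH (
    'connector' = 'hbase-2.2',
    'table-name' = 'user_profiles',
    'zookeeper.quorum' = 'localhost:2181',
    'lookup.async' = 'true',              -- use asynchronous lookups
    'lookup.cache.max-rows' = '10000',    -- cache up to 10,000 looked-up rows
    'lookup.cache.ttl' = '10min',         -- expire cached rows after 10 minutes
    'lookup.max-retries' = '3'
);

-- Enrich the orders stream against the cached HBase table
SELECT o.order_id, p.info.name, p.info.tier
FROM orders AS o
JOIN user_profiles FOR SYSTEM_TIME AS OF o.proc_time AS p
ON o.user_id = p.rowkey;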

See: Lookup Operations (docs/lookup-operations.md)

Sink Operations and Buffering

UPSERT sink operations with intelligent buffering strategies, exactly-once semantics through checkpointing, and comprehensive error handling.

// Sink configuration options
'sink.buffer-flush.max-size' = '2mb'               // Buffer size threshold (default: 2MB)
'sink.buffer-flush.max-rows' = '1000'              // Buffer row count (default: 1000)
'sink.buffer-flush.interval' = '1s'                // Flush interval (default: 1s)
'sink.parallelism' = '1'                           // Sink parallelism
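
A sketch of a buffered UPSERT sink using these options (source_events, with columns user_id STRING, event_count BIGINT, and event_time TIMESTAMP(3), is an assumed upstream table):

CREATE TABLE hbase_metrics (
    rowkey STRING,
    metrics ROW<total BIGINT, updated_at TIMESTAMP(3)>,
    PRIMARY KEY (rowkey) NOT ENFORCED
) WITH (
    'connector' = 'hbase-2.2',
    'table-name' = 'metrics',
    'zookeeper.quorum' = 'localhost:2181',
    'sink.buffer-flush.max-size' = '2mb',    -- flush when buffered mutations reach 2 MB
    'sink.buffer-flush.max-rows' = '1000',   -- ...or when 1000 rows are buffered
    'sink.buffer-flush.interval' = '1s',     -- ...or at least once per second
    'sink.parallelism' = '2'
);

-- Rows are written as UPSERTs keyed by rowkey
INSERT INTO hbase_metrics
SELECT user_id, ROW(event_count, event_time)
FROM source_events;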

See: Sink Operations (docs/sink-operations.md)

Data Types and Schema Mapping

Complete data type mapping between Flink's type system and HBase storage format, including support for complex nested types and proper serialization handling.

// Supported Flink data types mapping to HBase bytes
BOOLEAN, TINYINT, SMALLINT, INTEGER, BIGINT        // Boolean and integer types
FLOAT, DOUBLE, DECIMAL                             // Floating point and exact decimal types
CHAR, VARCHAR                                      // String types
BINARY, VARBINARY                                  // Binary types
DATE, TIME, TIMESTAMP                              // Temporal types
ROW<field_name field_type, ...>                   // Nested row types for column families
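
A sketch exercising several of these mappings in one definition (names are illustrative; each field is serialized to HBase bytes by the connector's type-specific encoders):

CREATE TABLE typed_table (
    rowkey STRING,
    cf ROW<
        flag BOOLEAN,
        small_num SMALLINT,
        big_num BIGINT,
        ratio DOUBLE,
        price DECIMAL(10, 2),
        text_val VARCHAR(64),
        payload VARBINARY(1024),
        created_date DATE,
        created_ts TIMESTAMP(3)
    >,
    PRIMARY KEY (rowkey) NOT ENFORCED
) WITH (
    'connector' = 'hbase-2.2',
    'table-name' = 'typed_table',
    'zookeeper.quorum' = 'localhost:2181'
);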

See: Data Types and Schema (docs/data-types.md)

Error Handling

The connector provides robust error handling with:

  • Connection Retry Logic: Configurable retry attempts for HBase connection failures
  • Graceful Degradation: Proper handling of table not found and region server unavailability
  • Buffer Management: Overflow protection and timeout handling for sink operations
  • Checkpointing Support: Exactly-once processing guarantees through Flink's checkpoint mechanism (see the sketch after this list)
  • Validation Errors: Clear error messages for schema mismatches and configuration issues
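
As a minimal sketch of the checkpointing point above (Flink SQL client syntax; the interval is illustrative), enabling periodic checkpoints makes the HBase sink flush its buffered mutations on every checkpoint, which, combined with idempotent upserts keyed by rowkey, underpins the delivery guarantee described above:

-- Enable periodic checkpoints for the session (SQL client)
SET 'execution.checkpointing.interval' = '30s';

-- Buffered Puts/Deletes in the HBase sink are flushed on each checkpoint,
-- in addition to the size, row-count, and interval thresholds shown earlier.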