0
# Lookup Operations and Caching
1
2
Temporal table join functionality with both synchronous and asynchronous lookup modes, including comprehensive caching strategies for performance optimization.
3
4
## Capabilities
5
6
### Temporal Table Joins
7
8
Performs lookups against HBase table for enriching streaming data using temporal join semantics.
9
10
```sql { .api }
11
SELECT
12
stream_table.field,
13
hbase_table.column_family.qualifier
14
FROM stream_table s
15
JOIN hbase_table FOR SYSTEM_TIME AS OF s.proc_time AS h
16
ON s.lookup_key = h.rowkey
17
WHERE additional_conditions;
18
```
19
20
**Requirements**:
21
- Lookup join must use `FOR SYSTEM_TIME AS OF` syntax
22
- Join condition must be based on row key equality
23
- Processing time field (`proc_time`) must be available in stream
24
25
**Usage Examples**:
26
27
```sql
28
-- User profile enrichment
29
SELECT
30
event.user_id,
31
event.action,
32
event.timestamp,
33
profile.info.name AS user_name,
34
profile.info.email AS user_email,
35
profile.preferences.language
36
FROM user_events event
37
JOIN user_profiles FOR SYSTEM_TIME AS OF event.proc_time AS profile
38
ON event.user_id = profile.user_id;
39
40
-- Product catalog lookup for sales
41
SELECT
42
sale.transaction_id,
43
sale.quantity,
44
catalog.basic_info.name AS product_name,
45
catalog.basic_info.price AS unit_price,
46
catalog.basic_info.price * sale.quantity AS total_amount
47
FROM sales_stream sale
48
JOIN product_catalog FOR SYSTEM_TIME AS OF sale.proc_time AS catalog
49
ON sale.product_id = catalog.product_id
50
WHERE catalog.inventory.stock_count > 0;
51
```
52
53
### Synchronous Lookup Mode
54
55
Default lookup mode that performs blocking lookups with immediate results.
56
57
```sql { .api }
58
WITH (
59
'lookup.async' = 'false' -- Synchronous mode (default)
60
)
61
```
62
63
**Characteristics**:
64
- Blocking operation that waits for HBase response
65
- Lower throughput but predictable latency
66
- Simpler error handling and debugging
67
- Suitable for low-to-medium volume streams
68
69
**Usage Examples**:
70
71
```sql
72
-- Synchronous lookup table for reference data
73
CREATE TABLE reference_lookup (
74
code STRING,
75
reference ROW<description STRING, category STRING, active BOOLEAN>,
76
PRIMARY KEY (code) NOT ENFORCED
77
) WITH (
78
'connector' = 'hbase-2.2',
79
'table-name' = 'reference_codes',
80
'zookeeper.quorum' = 'localhost:2181',
81
'lookup.async' = 'false',
82
'lookup.max-retries' = '3'
83
);
84
85
-- Using synchronous lookup in join
86
SELECT
87
transaction.id,
88
transaction.amount,
89
ref.reference.description AS transaction_type
90
FROM transaction_stream transaction
91
JOIN reference_lookup FOR SYSTEM_TIME AS OF transaction.proc_time AS ref
92
ON transaction.type_code = ref.code;
93
```
94
95
### Asynchronous Lookup Mode
96
97
High-performance lookup mode using non-blocking operations for improved throughput.
98
99
```sql { .api }
100
WITH (
101
'lookup.async' = 'true' -- Asynchronous mode
102
)
103
```
104
105
**Characteristics**:
106
- Non-blocking operations with callback-based results
107
- Higher throughput for high-volume streams
108
- More complex error handling
109
- Requires proper backpressure management
110
111
**Usage Examples**:
112
113
```sql
114
-- High-throughput async lookup table
115
CREATE TABLE async_user_lookup (
116
user_id STRING,
117
profile ROW<name STRING, email STRING, tier STRING>,
118
activity ROW<last_login TIMESTAMP(3), total_orders INT>,
119
PRIMARY KEY (user_id) NOT ENFORCED
120
) WITH (
121
'connector' = 'hbase-2.2',
122
'table-name' = 'user_profiles',
123
'zookeeper.quorum' = 'localhost:2181',
124
'lookup.async' = 'true',
125
'lookup.max-retries' = '5'
126
);
127
128
-- Using async lookup for high-volume stream
129
SELECT
130
click.event_id,
131
click.page_url,
132
click.timestamp,
133
user.profile.name AS user_name,
134
user.profile.tier AS user_tier
135
FROM clickstream click
136
JOIN async_user_lookup FOR SYSTEM_TIME AS OF click.proc_time AS user
137
ON click.user_id = user.user_id;
138
```
139
140
### Caching Configuration
141
142
Caching strategies to reduce HBase lookup load and improve performance.
143
144
```sql { .api }
145
WITH (
146
'lookup.cache.max-rows' = '10000', -- Maximum cached entries
147
'lookup.cache.ttl' = '300s' -- Cache time-to-live
148
)
149
```
150
151
**Parameters**:
152
- `lookup.cache.max-rows`: Maximum number of lookup results to cache (-1 disables caching)
153
- `lookup.cache.ttl`: Cache entry expiration time (0 means no expiration)
154
155
**Cache Behavior**:
156
- LRU (Least Recently Used) eviction when cache is full
157
- TTL-based expiration for data freshness
158
- Cache is per lookup function instance (per task)
159
- Memory usage proportional to cache size and record size
160
161
**Usage Examples**:
162
163
```sql
164
-- Long-lived reference data with large cache
165
CREATE TABLE reference_cache (
166
country_code STRING,
167
info ROW<name STRING, currency STRING, timezone STRING>,
168
PRIMARY KEY (country_code) NOT ENFORCED
169
) WITH (
170
'connector' = 'hbase-2.2',
171
'table-name' = 'country_reference',
172
'zookeeper.quorum' = 'localhost:2181',
173
'lookup.cache.max-rows' = '50000',
174
'lookup.cache.ttl' = '3600s' -- 1 hour cache
175
);
176
177
-- Frequently changing data with short cache
178
CREATE TABLE price_cache (
179
symbol STRING,
180
pricing ROW<current_price DECIMAL(10,4), last_update TIMESTAMP(3)>,
181
PRIMARY KEY (symbol) NOT ENFORCED
182
) WITH (
183
'connector' = 'hbase-2.2',
184
'table-name' = 'stock_prices',
185
'zookeeper.quorum' = 'localhost:2181',
186
'lookup.cache.max-rows' = '1000',
187
'lookup.cache.ttl' = '30s' -- 30 second cache
188
);
189
```
190
191
### Retry Configuration
192
193
Error handling and retry logic for failed lookup operations.
194
195
```sql { .api }
196
WITH (
197
'lookup.max-retries' = '3' -- Maximum retry attempts (default: 3)
198
)
199
```
200
201
**Retry Behavior**:
202
- Exponential backoff between retry attempts
203
- Retries are attempted for transient failures (connection issues, timeouts)
204
- Non-retryable errors (table not found, invalid row key) fail immediately
205
- Failed lookups after all retries result in null values
206
207
**Usage Examples**:
208
209
```sql
210
-- Robust lookup with extensive retry
211
CREATE TABLE robust_lookup (
212
id STRING,
213
data ROW<value STRING, timestamp BIGINT>,
214
PRIMARY KEY (id) NOT ENFORCED
215
) WITH (
216
'connector' = 'hbase-2.2',
217
'table-name' = 'critical_data',
218
'zookeeper.quorum' = 'localhost:2181',
219
'lookup.max-retries' = '10' -- High retry count for critical data
220
);
221
222
-- Fast-fail lookup for non-critical data
223
CREATE TABLE optional_lookup (
224
ref_id STRING,
225
optional_data ROW<description STRING>,
226
PRIMARY KEY (ref_id) NOT ENFORCED
227
) WITH (
228
'connector' = 'hbase-2.2',
229
'table-name' = 'optional_reference',
230
'zookeeper.quorum' = 'localhost:2181',
231
'lookup.max-retries' = '1' -- Minimal retry for optional data
232
);
233
```
234
235
### Lookup Performance Optimization
236
237
Strategies for optimizing lookup performance in different scenarios.
238
239
**High Cache Hit Rate Scenario**:
240
```sql
241
-- Optimize for frequently accessed reference data
242
CREATE TABLE frequent_lookup (
243
key STRING,
244
data ROW<value STRING>,
245
PRIMARY KEY (key) NOT ENFORCED
246
) WITH (
247
'connector' = 'hbase-2.2',
248
'table-name' = 'frequent_data',
249
'zookeeper.quorum' = 'localhost:2181',
250
'lookup.async' = 'true', -- Async for high throughput
251
'lookup.cache.max-rows' = '100000', -- Large cache
252
'lookup.cache.ttl' = '1800s', -- 30 min TTL
253
'lookup.max-retries' = '3'
254
);
255
```
256
257
**Low Cache Hit Rate Scenario**:
258
```sql
259
-- Optimize for unique lookups with minimal caching
260
CREATE TABLE unique_lookup (
261
uuid STRING,
262
session_data ROW<user_id STRING, created_at TIMESTAMP(3)>,
263
PRIMARY KEY (uuid) NOT ENFORCED
264
) WITH (
265
'connector' = 'hbase-2.2',
266
'table-name' = 'session_store',
267
'zookeeper.quorum' = 'localhost:2181',
268
'lookup.async' = 'true', -- Async for throughput
269
'lookup.cache.max-rows' = '1000', -- Small cache
270
'lookup.cache.ttl' = '60s', -- Short TTL
271
'lookup.max-retries' = '5' -- More retries for network issues
272
);
273
```
274
275
### Lookup Monitoring and Troubleshooting
276
277
**Common Issues and Solutions**:
278
279
1. **High Lookup Latency**:
280
- Enable async mode: `'lookup.async' = 'true'`
281
- Increase cache size: `'lookup.cache.max-rows' = '50000'`
282
- Optimize HBase region distribution
283
284
2. **Cache Miss Rate**:
285
- Analyze lookup key distribution
286
- Adjust cache size based on working set
287
- Consider TTL reduction for frequently changing data
288
289
3. **Connection Failures**:
290
- Increase retry count: `'lookup.max-retries' = '10'`
291
- Verify Zookeeper connectivity
292
- Check HBase region server health
293
294
4. **Memory Issues**:
295
- Reduce cache size: `'lookup.cache.max-rows' = '1000'`
296
- Monitor task heap usage
297
- Consider cache TTL reduction
298
299
**Monitoring SQL Example**:
300
```sql
301
-- Monitor lookup join performance
302
SELECT
303
window_start,
304
window_end,
305
COUNT(*) AS total_lookups,
306
COUNT(lookup_table.rowkey) AS successful_lookups,
307
(COUNT(lookup_table.rowkey) * 100.0 / COUNT(*)) AS success_rate
308
FROM TABLE(
309
TUMBLE(TABLE stream_with_lookups, DESCRIPTOR(proc_time), INTERVAL '1' MINUTE)
310
) s
311
LEFT JOIN lookup_table FOR SYSTEM_TIME AS OF s.proc_time AS lookup_table
312
ON s.lookup_key = lookup_table.rowkey
313
GROUP BY window_start, window_end;
314
```