or run

npx @tessl/cli init
Log in

Version

Tile

Overview

Evals

Files

Files

docs

connection-config.mddata-types.mdindex.mdlookup-operations.mdsink-operations.mdsql-ddl.md

lookup-operations.mddocs/

0

# Lookup Operations and Caching

1

2

Temporal table join functionality with both synchronous and asynchronous lookup modes, including comprehensive caching strategies for performance optimization.

3

4

## Capabilities

5

6

### Temporal Table Joins

7

8

Performs lookups against HBase table for enriching streaming data using temporal join semantics.

9

10

```sql { .api }

11

SELECT

12

stream_table.field,

13

hbase_table.column_family.qualifier

14

FROM stream_table s

15

JOIN hbase_table FOR SYSTEM_TIME AS OF s.proc_time AS h

16

ON s.lookup_key = h.rowkey

17

WHERE additional_conditions;

18

```

19

20

**Requirements**:

21

- Lookup join must use `FOR SYSTEM_TIME AS OF` syntax

22

- Join condition must be based on row key equality

23

- Processing time field (`proc_time`) must be available in stream

24

25

**Usage Examples**:

26

27

```sql

28

-- User profile enrichment

29

SELECT

30

event.user_id,

31

event.action,

32

event.timestamp,

33

profile.info.name AS user_name,

34

profile.info.email AS user_email,

35

profile.preferences.language

36

FROM user_events event

37

JOIN user_profiles FOR SYSTEM_TIME AS OF event.proc_time AS profile

38

ON event.user_id = profile.user_id;

39

40

-- Product catalog lookup for sales

41

SELECT

42

sale.transaction_id,

43

sale.quantity,

44

catalog.basic_info.name AS product_name,

45

catalog.basic_info.price AS unit_price,

46

catalog.basic_info.price * sale.quantity AS total_amount

47

FROM sales_stream sale

48

JOIN product_catalog FOR SYSTEM_TIME AS OF sale.proc_time AS catalog

49

ON sale.product_id = catalog.product_id

50

WHERE catalog.inventory.stock_count > 0;

51

```

52

53

### Synchronous Lookup Mode

54

55

Default lookup mode that performs blocking lookups with immediate results.

56

57

```sql { .api }

58

WITH (

59

'lookup.async' = 'false' -- Synchronous mode (default)

60

)

61

```

62

63

**Characteristics**:

64

- Blocking operation that waits for HBase response

65

- Lower throughput but predictable latency

66

- Simpler error handling and debugging

67

- Suitable for low-to-medium volume streams

68

69

**Usage Examples**:

70

71

```sql

72

-- Synchronous lookup table for reference data

73

CREATE TABLE reference_lookup (

74

code STRING,

75

reference ROW<description STRING, category STRING, active BOOLEAN>,

76

PRIMARY KEY (code) NOT ENFORCED

77

) WITH (

78

'connector' = 'hbase-2.2',

79

'table-name' = 'reference_codes',

80

'zookeeper.quorum' = 'localhost:2181',

81

'lookup.async' = 'false',

82

'lookup.max-retries' = '3'

83

);

84

85

-- Using synchronous lookup in join

86

SELECT

87

transaction.id,

88

transaction.amount,

89

ref.reference.description AS transaction_type

90

FROM transaction_stream transaction

91

JOIN reference_lookup FOR SYSTEM_TIME AS OF transaction.proc_time AS ref

92

ON transaction.type_code = ref.code;

93

```

94

95

### Asynchronous Lookup Mode

96

97

High-performance lookup mode using non-blocking operations for improved throughput.

98

99

```sql { .api }

100

WITH (

101

'lookup.async' = 'true' -- Asynchronous mode

102

)

103

```

104

105

**Characteristics**:

106

- Non-blocking operations with callback-based results

107

- Higher throughput for high-volume streams

108

- More complex error handling

109

- Requires proper backpressure management

110

111

**Usage Examples**:

112

113

```sql

114

-- High-throughput async lookup table

115

CREATE TABLE async_user_lookup (

116

user_id STRING,

117

profile ROW<name STRING, email STRING, tier STRING>,

118

activity ROW<last_login TIMESTAMP(3), total_orders INT>,

119

PRIMARY KEY (user_id) NOT ENFORCED

120

) WITH (

121

'connector' = 'hbase-2.2',

122

'table-name' = 'user_profiles',

123

'zookeeper.quorum' = 'localhost:2181',

124

'lookup.async' = 'true',

125

'lookup.max-retries' = '5'

126

);

127

128

-- Using async lookup for high-volume stream

129

SELECT

130

click.event_id,

131

click.page_url,

132

click.timestamp,

133

user.profile.name AS user_name,

134

user.profile.tier AS user_tier

135

FROM clickstream click

136

JOIN async_user_lookup FOR SYSTEM_TIME AS OF click.proc_time AS user

137

ON click.user_id = user.user_id;

138

```

139

140

### Caching Configuration

141

142

Caching strategies to reduce HBase lookup load and improve performance.

143

144

```sql { .api }

145

WITH (

146

'lookup.cache.max-rows' = '10000', -- Maximum cached entries

147

'lookup.cache.ttl' = '300s' -- Cache time-to-live

148

)

149

```

150

151

**Parameters**:

152

- `lookup.cache.max-rows`: Maximum number of lookup results to cache (-1 disables caching)

153

- `lookup.cache.ttl`: Cache entry expiration time (0 means no expiration)

154

155

**Cache Behavior**:

156

- LRU (Least Recently Used) eviction when cache is full

157

- TTL-based expiration for data freshness

158

- Cache is per lookup function instance (per task)

159

- Memory usage proportional to cache size and record size

160

161

**Usage Examples**:

162

163

```sql

164

-- Long-lived reference data with large cache

165

CREATE TABLE reference_cache (

166

country_code STRING,

167

info ROW<name STRING, currency STRING, timezone STRING>,

168

PRIMARY KEY (country_code) NOT ENFORCED

169

) WITH (

170

'connector' = 'hbase-2.2',

171

'table-name' = 'country_reference',

172

'zookeeper.quorum' = 'localhost:2181',

173

'lookup.cache.max-rows' = '50000',

174

'lookup.cache.ttl' = '3600s' -- 1 hour cache

175

);

176

177

-- Frequently changing data with short cache

178

CREATE TABLE price_cache (

179

symbol STRING,

180

pricing ROW<current_price DECIMAL(10,4), last_update TIMESTAMP(3)>,

181

PRIMARY KEY (symbol) NOT ENFORCED

182

) WITH (

183

'connector' = 'hbase-2.2',

184

'table-name' = 'stock_prices',

185

'zookeeper.quorum' = 'localhost:2181',

186

'lookup.cache.max-rows' = '1000',

187

'lookup.cache.ttl' = '30s' -- 30 second cache

188

);

189

```

190

191

### Retry Configuration

192

193

Error handling and retry logic for failed lookup operations.

194

195

```sql { .api }

196

WITH (

197

'lookup.max-retries' = '3' -- Maximum retry attempts (default: 3)

198

)

199

```

200

201

**Retry Behavior**:

202

- Exponential backoff between retry attempts

203

- Retries are attempted for transient failures (connection issues, timeouts)

204

- Non-retryable errors (table not found, invalid row key) fail immediately

205

- Failed lookups after all retries result in null values

206

207

**Usage Examples**:

208

209

```sql

210

-- Robust lookup with extensive retry

211

CREATE TABLE robust_lookup (

212

id STRING,

213

data ROW<value STRING, timestamp BIGINT>,

214

PRIMARY KEY (id) NOT ENFORCED

215

) WITH (

216

'connector' = 'hbase-2.2',

217

'table-name' = 'critical_data',

218

'zookeeper.quorum' = 'localhost:2181',

219

'lookup.max-retries' = '10' -- High retry count for critical data

220

);

221

222

-- Fast-fail lookup for non-critical data

223

CREATE TABLE optional_lookup (

224

ref_id STRING,

225

optional_data ROW<description STRING>,

226

PRIMARY KEY (ref_id) NOT ENFORCED

227

) WITH (

228

'connector' = 'hbase-2.2',

229

'table-name' = 'optional_reference',

230

'zookeeper.quorum' = 'localhost:2181',

231

'lookup.max-retries' = '1' -- Minimal retry for optional data

232

);

233

```

234

235

### Lookup Performance Optimization

236

237

Strategies for optimizing lookup performance in different scenarios.

238

239

**High Cache Hit Rate Scenario**:

240

```sql

241

-- Optimize for frequently accessed reference data

242

CREATE TABLE frequent_lookup (

243

key STRING,

244

data ROW<value STRING>,

245

PRIMARY KEY (key) NOT ENFORCED

246

) WITH (

247

'connector' = 'hbase-2.2',

248

'table-name' = 'frequent_data',

249

'zookeeper.quorum' = 'localhost:2181',

250

'lookup.async' = 'true', -- Async for high throughput

251

'lookup.cache.max-rows' = '100000', -- Large cache

252

'lookup.cache.ttl' = '1800s', -- 30 min TTL

253

'lookup.max-retries' = '3'

254

);

255

```

256

257

**Low Cache Hit Rate Scenario**:

258

```sql

259

-- Optimize for unique lookups with minimal caching

260

CREATE TABLE unique_lookup (

261

uuid STRING,

262

session_data ROW<user_id STRING, created_at TIMESTAMP(3)>,

263

PRIMARY KEY (uuid) NOT ENFORCED

264

) WITH (

265

'connector' = 'hbase-2.2',

266

'table-name' = 'session_store',

267

'zookeeper.quorum' = 'localhost:2181',

268

'lookup.async' = 'true', -- Async for throughput

269

'lookup.cache.max-rows' = '1000', -- Small cache

270

'lookup.cache.ttl' = '60s', -- Short TTL

271

'lookup.max-retries' = '5' -- More retries for network issues

272

);

273

```

274

275

### Lookup Monitoring and Troubleshooting

276

277

**Common Issues and Solutions**:

278

279

1. **High Lookup Latency**:

280

- Enable async mode: `'lookup.async' = 'true'`

281

- Increase cache size: `'lookup.cache.max-rows' = '50000'`

282

- Optimize HBase region distribution

283

284

2. **Cache Miss Rate**:

285

- Analyze lookup key distribution

286

- Adjust cache size based on working set

287

- Consider TTL reduction for frequently changing data

288

289

3. **Connection Failures**:

290

- Increase retry count: `'lookup.max-retries' = '10'`

291

- Verify Zookeeper connectivity

292

- Check HBase region server health

293

294

4. **Memory Issues**:

295

- Reduce cache size: `'lookup.cache.max-rows' = '1000'`

296

- Monitor task heap usage

297

- Consider cache TTL reduction

298

299

**Monitoring SQL Example**:

300

```sql

301

-- Monitor lookup join performance

302

SELECT

303

window_start,

304

window_end,

305

COUNT(*) AS total_lookups,

306

COUNT(lookup_table.rowkey) AS successful_lookups,

307

(COUNT(lookup_table.rowkey) * 100.0 / COUNT(*)) AS success_rate

308

FROM TABLE(

309

TUMBLE(TABLE stream_with_lookups, DESCRIPTOR(proc_time), INTERVAL '1' MINUTE)

310

) s

311

LEFT JOIN lookup_table FOR SYSTEM_TIME AS OF s.proc_time AS lookup_table

312

ON s.lookup_key = lookup_table.rowkey

313

GROUP BY window_start, window_end;

314

```