# Compression and Performance

The driver provides built-in support for LZ4 and ZSTD data compression with configurable block sizes, plus optional Cython extensions for performance-critical operations. Compression significantly reduces network traffic and can improve query performance for network-bound workloads.

## Capabilities

### Compression Algorithm Support

Multiple compression algorithms are available, each with different speed and compression ratio characteristics.

```python { .api }
# Compression algorithms (require optional dependencies):
# LZ4:   fast compression and decompression with a good ratio
# LZ4HC: higher compression ratio, slower compression than LZ4
# ZSTD:  excellent compression ratio with good performance

# Installation requirements
# pip install clickhouse-driver[lz4]       # For LZ4 support
# pip install clickhouse-driver[zstd]      # For ZSTD support
# pip install clickhouse-driver[lz4,zstd]  # For both algorithms
```
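
Before enabling compression, you can check at runtime which optional packages are importable. A minimal sketch using only the standard library; it assumes the extras above install the `lz4` and `zstd` Python packages, which is how clickhouse-driver's extras are wired:

```python
from importlib.util import find_spec

def available_compression_packages():
    """Report which optional compression packages are importable."""
    # clickhouse-driver's lz4/lz4hc support uses the 'lz4' package,
    # and its zstd support uses the 'zstd' package
    return {name: find_spec(name) is not None for name in ('lz4', 'zstd')}

print(available_compression_packages())
```

This lets an application decide between compressed and uncompressed connections without trying to connect first.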

### Client Compression Configuration

Configure compression at the client level so it applies to all connections and queries.

```python { .api }
# Enable compression in the client constructor
client = Client(
    host='localhost',
    compression=True,             # Enable compression (defaults to LZ4)
    compress_block_size=1048576,  # Compression block size (1MB default)
)

# Pass an algorithm name to select one explicitly
client = Client('localhost', compression='lz4')    # Enable LZ4
client = Client('localhost', compression='zstd')   # Enable ZSTD
client = Client('localhost', compression='lz4hc')  # Enable LZ4HC
```

### Query-Level Compression Settings

Override compression settings for individual queries through ClickHouse settings.

```python { .api }
# Query-specific compression settings
result = client.execute(
    'SELECT * FROM large_table',
    settings={
        'network_compression_method': 'zstd',  # Algorithm for this query
        'network_zstd_compression_level': 3,   # ZSTD compression level (1-22)
    }
)

# Available compression settings
compression_settings = {
    'network_compression_method': 'lz4|lz4hc|zstd',  # Algorithm choice
    'network_zstd_compression_level': 1,             # ZSTD level (1-22, default: 1)
}
```

### Compression Algorithm Classes

Low-level compression interfaces for advanced usage (not typically needed in normal operation).

```python { .api }
# Base compression interfaces (internal use)
class Compressor:
    """Base compressor interface."""

    def compress(self, data):
        """
        Compress a data block.

        Parameters:
        - data: bytes to compress

        Returns:
        - bytes: compressed data
        """

class Decompressor:
    """Base decompressor interface."""

    def decompress(self, data):
        """
        Decompress a data block.

        Parameters:
        - data: compressed bytes

        Returns:
        - bytes: decompressed data
        """

# Algorithm-specific implementations:
# LZ4Compressor, LZ4Decompressor
# LZ4HCCompressor, LZ4HCDecompressor
# ZSTDCompressor, ZSTDDecompressor
```
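
The contract behind these interfaces is a symmetric round-trip: whatever `compress` produces, the matching `decompress` must restore exactly. As a rough standard-library sketch of that contract (using zlib purely for illustration, not the driver's LZ4/ZSTD implementations):

```python
import zlib

class ZlibCompressor:
    """Illustrative Compressor implementation backed by stdlib zlib."""
    def compress(self, data):
        return zlib.compress(data)

class ZlibDecompressor:
    """Illustrative Decompressor implementation backed by stdlib zlib."""
    def decompress(self, data):
        return zlib.decompress(data)

block = b"clickhouse " * 1000
packed = ZlibCompressor().compress(block)
# Round-trip must reproduce the original bytes exactly
assert ZlibDecompressor().decompress(packed) == block
print(f"{len(block)} -> {len(packed)} bytes")
```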

### Performance Optimization Features

Optional Cython extensions and performance tuning for high-throughput workloads.

```python { .api }
# Cython extensions (used automatically if available)
# Built during installation for performance-critical operations:
# - bufferedreader:   fast binary data reading
# - bufferedwriter:   fast binary data writing
# - varint:           variable-length integer encoding/decoding
# - columns.largeint: large integer processing

# Performance settings
client = Client(
    'localhost',
    compress_block_size=4194304,  # Larger blocks: better compression, more memory
    send_receive_timeout=300,     # Longer timeout for large compressed transfers
    sync_request_timeout=60,      # Timeout for synchronous operations
)
```

## Compression Performance Characteristics

### Algorithm Comparison

| Algorithm | Compression Speed | Decompression Speed | Compression Ratio | Use Case |
|-----------|-------------------|---------------------|-------------------|----------|
| LZ4 | Very fast | Very fast | Good | Real-time, low latency |
| LZ4HC | Moderate | Very fast | Better | Compress once, read often |
| ZSTD | Fast | Fast | Excellent | Best overall choice |
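
The trade-offs in this table can be captured in a small selection helper. This is purely illustrative; the function name and parameters are not part of the driver API:

```python
def pick_compression(latency_sensitive=False, bandwidth_limited=False):
    """Choose a compression setting from workload hints (illustrative)."""
    if latency_sensitive and not bandwidth_limited:
        return False    # local / low-latency link: skip compression entirely
    if latency_sensitive:
        return 'lz4'    # fast compress and decompress, decent ratio
    if bandwidth_limited:
        return 'zstd'   # best ratio for slow or metered links
    return 'zstd'       # good default for remote connections

print(pick_compression(bandwidth_limited=True))  # -> zstd
```

The return value can be passed directly as the `compression` argument of `Client`.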

### Block Size Impact

```python
# Small blocks (64KB - 256KB):
# - Lower memory usage
# - Faster response times
# - Less compression efficiency
client_small_blocks = Client(
    'localhost',
    compression='lz4',
    compress_block_size=65536,  # 64KB blocks
)

# Large blocks (1MB - 4MB):
# - Better compression ratios
# - Higher memory usage
# - Potential latency increase
client_large_blocks = Client(
    'localhost',
    compression='zstd',
    compress_block_size=4194304,  # 4MB blocks
)
```
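
The block-size effect can be demonstrated offline with stdlib zlib: compressing the same data in smaller independent chunks loses cross-chunk redundancy and pays per-block overhead, so the total compressed size grows. This is only an analogy for the driver's `compress_block_size` behavior, not the driver's actual framing:

```python
import zlib

# ~1.2MB of repetitive row-like data
data = b"user_id,action,timestamp\n" * 50000

def compressed_size(data, block_size):
    """Compress data in independent blocks and return the total size."""
    return sum(
        len(zlib.compress(data[i:i + block_size]))
        for i in range(0, len(data), block_size)
    )

small = compressed_size(data, 64 * 1024)        # 64KB blocks
large = compressed_size(data, 4 * 1024 * 1024)  # 4MB blocks
print(f"64KB blocks: {small} bytes, 4MB blocks: {large} bytes")
```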

## Usage Examples

### Basic Compression Setup

```python
from clickhouse_driver import Client

# Enable LZ4 compression (requires: pip install clickhouse-driver[lz4])
client = Client(
    host='remote-server.example.com',
    compression='lz4',
    compress_block_size=1048576,  # 1MB blocks
)

# Compression is applied to queries automatically
result = client.execute('SELECT * FROM large_table LIMIT 10000')
print(f"Retrieved {len(result)} rows with LZ4 compression")

client.disconnect()
```

### ZSTD High Compression

```python
# Enable ZSTD for the best compression ratio
# (requires: pip install clickhouse-driver[zstd])
client = Client(
    host='slow-network-server.example.com',
    compression='zstd',
    compress_block_size=2097152,  # 2MB blocks for better compression
)

# Large data transfer with high compression
result = client.execute('''
    SELECT user_id, event_data, timestamp, metadata
    FROM user_events
    WHERE date >= today() - 30
''', settings={
    'network_zstd_compression_level': 6  # Higher compression level
})

print(f"Retrieved {len(result)} events with ZSTD compression")
```

### Adaptive Compression Strategy

```python
from clickhouse_driver import Client

def create_optimized_client(server_type='local'):
    """Create a client with compression tuned for the connection type."""
    if server_type == 'local':
        # Local server: skip compression for the lowest latency
        return Client(
            'localhost',
            compression=False,  # No compression overhead
        )
    elif server_type == 'remote_fast':
        # Fast remote connection: balanced compression
        return Client(
            'remote-server.example.com',
            compression='lz4',
            compress_block_size=1048576,
        )
    elif server_type == 'remote_slow':
        # Slow/expensive connection: maximum compression
        return Client(
            'slow-server.example.com',
            compression='zstd',
            compress_block_size=4194304,
            settings={
                'network_zstd_compression_level': 9
            },
        )
    raise ValueError(f'Unknown server type: {server_type}')

# Choose based on deployment
client = create_optimized_client('remote_slow')
```

### Compression Performance Measurement

```python
import time
from clickhouse_driver import Client

def benchmark_compression(query, algorithms=('none', 'lz4', 'zstd')):
    """Benchmark query performance with different compression algorithms."""
    results = {}

    for algorithm in algorithms:
        if algorithm == 'none':
            client = Client('remote-server.example.com', compression=False)
        else:
            client = Client('remote-server.example.com', compression=algorithm)

        start_time = time.time()
        result = client.execute(query)
        duration = time.time() - start_time

        results[algorithm] = {
            'duration': duration,
            'rows': len(result),
            'rows_per_second': len(result) / duration,
        }

        client.disconnect()

    return results

# Benchmark a large query
query = 'SELECT * FROM large_table WHERE date >= today() - 7'
benchmark_results = benchmark_compression(query)

for algorithm, metrics in benchmark_results.items():
    print(f"{algorithm}: {metrics['duration']:.2f}s, "
          f"{metrics['rows_per_second']:.0f} rows/sec")
```

### Streaming with Compression

```python
import time
from clickhouse_driver import Client

# Large streaming query with compression
client = Client(
    'remote-server.example.com',
    compression='zstd',
    compress_block_size=2097152,  # 2MB blocks
)

total_rows = 0
start_time = time.time()

# Stream a large dataset; rows arrive in compressed blocks,
# and execute_iter yields them one row at a time
for row in client.execute_iter('''
    SELECT user_id, action, timestamp, details
    FROM user_activity_log
    WHERE date >= today() - 90
'''):
    process_user_activity(row)  # Application-defined handler
    total_rows += 1

    if total_rows % 100000 == 0:
        elapsed = time.time() - start_time
        rate = total_rows / elapsed
        print(f"Processed {total_rows:,} rows at {rate:.0f} rows/sec")

print(f"Total: {total_rows:,} rows processed with ZSTD compression")
```

### INSERT Performance with Compression

```python
import random
import time
from datetime import datetime, timedelta
from clickhouse_driver import Client

# Large INSERT with compression
client = Client(
    'remote-server.example.com',
    compression='lz4',  # LZ4 for faster INSERT performance
    compress_block_size=1048576,
)

# Generate a large dataset
def generate_sample_data(count):
    base_date = datetime.now() - timedelta(days=30)

    for i in range(count):
        yield (
            i,
            f"user_{random.randint(1000, 9999)}",
            base_date + timedelta(seconds=random.randint(0, 2592000)),
            random.uniform(10.0, 1000.0),
            random.choice(['A', 'B', 'C', 'D']),
        )

# Create the target table
client.execute('''
    CREATE TABLE IF NOT EXISTS performance_test (
        id UInt32,
        username String,
        created_at DateTime,
        value Float64,
        category Enum8('A' = 1, 'B' = 2, 'C' = 3, 'D' = 4)
    ) ENGINE = MergeTree()
    ORDER BY (id, created_at)
''')

# Bulk insert with compression
print("Starting bulk insert with LZ4 compression...")
start_time = time.time()

# Insert in batches for optimal performance
batch_size = 100000
total_inserted = 0

for batch_start in range(0, 1000000, batch_size):
    batch_data = list(generate_sample_data(batch_size))

    client.execute(
        'INSERT INTO performance_test VALUES',
        batch_data,
    )

    total_inserted += len(batch_data)
    elapsed = time.time() - start_time
    rate = total_inserted / elapsed

    print(f"Inserted {total_inserted:,} rows at {rate:.0f} rows/sec")

print(f"Insert completed: {total_inserted:,} rows in {elapsed:.2f}s")
```

### Connection URL with Compression

```python
# Enable compression via a connection URL
client = Client.from_url(
    'clickhouse://user:pass@remote-server.example.com:9000/mydb'
    '?compression=zstd&compress_block_size=2097152'
)

# URL parameters for compression:
# compression=lz4|lz4hc|zstd
# compress_block_size=1048576
# secure=1 (to combine TLS with compression)
```

### Troubleshooting Compression Issues

```python
from clickhouse_driver import Client
from clickhouse_driver.errors import UnknownCompressionMethod

def test_compression_support():
    """Test which compression algorithms are available."""
    algorithms = ['lz4', 'lz4hc', 'zstd']
    supported = []

    for algorithm in algorithms:
        try:
            client = Client('localhost', compression=algorithm)
            client.execute('SELECT 1')
            supported.append(algorithm)
            client.disconnect()
            print(f"✓ {algorithm} compression supported")
        except UnknownCompressionMethod:
            # lz4hc is covered by the lz4 extra
            extra = 'lz4' if algorithm.startswith('lz4') else algorithm
            print(f"✗ {algorithm} compression not available")
            print(f"  Install with: pip install clickhouse-driver[{extra}]")
        except Exception as e:
            print(f"? {algorithm} test failed: {e}")

    return supported

# Check compression support
supported_algorithms = test_compression_support()
print(f"Supported compression algorithms: {supported_algorithms}")

# Fall back to an uncompressed connection if needed
if supported_algorithms:
    best_algorithm = supported_algorithms[0]  # Use the first available
    client = Client('remote-server.example.com', compression=best_algorithm)
else:
    client = Client('remote-server.example.com', compression=False)
    print("Using uncompressed connection")
```