# Compression and Performance

Built-in support for data compression algorithms, including LZ4 and ZSTD, with configurable block sizes, plus optional Cython extensions for performance-critical operations. Compression significantly reduces network traffic and can improve query performance for network-bound workloads.

## Capabilities

### Compression Algorithm Support

Multiple compression algorithms are available, each with different speed and compression-ratio characteristics.

```python { .api }
# Compression algorithms (each requires an optional dependency)
# LZ4:   fast compression with very fast decompression
# LZ4HC: higher compression ratio, slower compression than LZ4
# ZSTD:  excellent compression ratio with good performance

# Installation requirements
# pip install clickhouse-driver[lz4]       # LZ4 and LZ4HC support (both use the lz4 package)
# pip install clickhouse-driver[zstd]      # ZSTD support
# pip install clickhouse-driver[lz4,zstd]  # All algorithms
```

### Client Compression Configuration

Configure compression at the client level; it then applies to every query on that connection.

```python { .api }
# Enable compression in the client constructor.
# `compression` accepts True (uses LZ4) or an algorithm name.
client = Client(
    host='localhost',
    compression=True,            # Enable compression with the default algorithm (LZ4)
    compress_block_size=1048576  # Compression block size in bytes (default: 1 MB)
)

# Selecting a specific algorithm
client = Client('localhost', compression='lz4')    # Enable LZ4
client = Client('localhost', compression='zstd')   # Enable ZSTD
client = Client('localhost', compression='lz4hc')  # Enable LZ4HC
```

### Query-Level Compression Settings

Override compression behavior for individual queries through ClickHouse settings.

```python { .api }
# Query-specific compression settings
result = client.execute(
    'SELECT * FROM large_table',
    settings={
        'network_compression_method': 'zstd',  # Method the server uses for this query
        'network_zstd_compression_level': 3    # ZSTD compression level (default: 1)
    }
)

# Available compression settings:
# network_compression_method      -- 'lz4' or 'zstd'
# network_zstd_compression_level  -- ZSTD level used when the method is 'zstd'
```

### Compression Algorithm Classes

Low-level compression interfaces used internally by the driver; they are not typically needed in application code.

```python { .api }
class Compressor:
    """Base compressor interface."""

    def compress(self, data):
        """
        Compress a data block.

        Parameters:
        - data: bytes to compress

        Returns:
        - bytes: compressed data
        """

class Decompressor:
    """Base decompressor interface."""

    def decompress(self, data):
        """
        Decompress a data block.

        Parameters:
        - data: compressed bytes

        Returns:
        - bytes: decompressed data
        """

# Algorithm-specific implementations:
# LZ4Compressor, LZ4Decompressor
# LZ4HCCompressor, LZ4HCDecompressor
# ZSTDCompressor, ZSTDDecompressor
```

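The shape of this interface can be illustrated with a minimal sketch. It uses Python's built-in `zlib` as a stand-in algorithm (the driver itself does not ship a zlib codec), and it omits the protocol framing and checksumming that the driver's real compressor classes handle:

```python
import zlib

class ZlibCompressor:
    """Illustrative compressor following the Compressor interface shape."""

    def compress(self, data):
        # Compress one block of bytes and return the compressed bytes
        return zlib.compress(data, level=6)

class ZlibDecompressor:
    """Illustrative decompressor following the Decompressor interface shape."""

    def decompress(self, data):
        # Reverse of compress(): return the original bytes
        return zlib.decompress(data)

# Round-trip check on repetitive (highly compressible) data
payload = b'clickhouse ' * 1000
compressed = ZlibCompressor().compress(payload)
restored = ZlibDecompressor().decompress(compressed)
assert restored == payload
assert len(compressed) < len(payload)
```
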
### Performance Optimization Features

Optional Cython extensions and performance tuning for high-throughput workloads.

```python { .api }
# Cython extensions (used automatically when available)
# Built during installation for performance-critical operations:
# - bufferedreader: fast binary data reading
# - bufferedwriter: fast binary data writing
# - varint: variable-length integer encoding/decoding
# - columns.largeint: large integer processing

# Performance settings
client = Client(
    'localhost',
    compress_block_size=4194304,  # Larger blocks: better compression, more memory
    send_receive_timeout=300,     # Longer timeout for large compressed transfers
    sync_request_timeout=60       # Timeout for synchronous operations
)
```

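To check whether the extensions listed above were actually built, one can probe for them with `importlib`. This is an illustrative check, not a driver API; it reports every module as missing when clickhouse-driver itself is not installed:

```python
import importlib.util

def cython_extension_status():
    """Report which of the driver's optional Cython modules can be imported.

    Module names follow the list above. A False value means the module
    (or the driver itself) is not available in this environment.
    """
    modules = (
        'clickhouse_driver.bufferedreader',
        'clickhouse_driver.bufferedwriter',
        'clickhouse_driver.varint',
        'clickhouse_driver.columns.largeint',
    )
    status = {}
    for name in modules:
        try:
            status[name] = importlib.util.find_spec(name) is not None
        except ModuleNotFoundError:
            # Parent package is missing entirely
            status[name] = False
    return status

for name, available in cython_extension_status().items():
    print(f"{name}: {'available' if available else 'missing'}")
```
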
## Compression Performance Characteristics

### Algorithm Comparison

| Algorithm | Compression Speed | Decompression Speed | Compression Ratio | Use Case |
|-----------|-------------------|---------------------|-------------------|----------|
| LZ4       | Very fast         | Very fast           | Good              | Real-time, low latency |
| LZ4HC     | Moderate          | Very fast           | Better            | Compress once, decompress often |
| ZSTD      | Fast              | Fast                | Excellent         | Best overall ratio/speed balance |

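As a rough rule of thumb, the table can be reduced to a small selection helper. The priority names here are invented for illustration and are not part of clickhouse-driver:

```python
def pick_compression(priority):
    """Map a workload priority to an algorithm, following the table above.

    `priority` values are illustrative, not a clickhouse-driver API.
    """
    choices = {
        'latency': 'lz4',    # fastest compression and decompression
        'ratio': 'zstd',     # best ratio at still-good speed
        'archive': 'lz4hc',  # compress once, decompress quickly many times
    }
    try:
        return choices[priority]
    except KeyError:
        raise ValueError(f'unknown priority: {priority!r}') from None

print(pick_compression('latency'))  # lz4
print(pick_compression('ratio'))    # zstd
```
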
### Block Size Impact

```python
# Small blocks (64 KB - 256 KB):
# - lower memory usage
# - faster response times
# - lower compression efficiency
client_small_blocks = Client(
    'localhost',
    compression='lz4',
    compress_block_size=65536  # 64 KB blocks
)

# Large blocks (1 MB - 4 MB):
# - better compression ratios
# - higher memory usage
# - potentially higher latency
client_large_blocks = Client(
    'localhost',
    compression='zstd',
    compress_block_size=4194304  # 4 MB blocks
)
```

## Usage Examples

### Basic Compression Setup

```python
from clickhouse_driver import Client

# Enable LZ4 compression (requires: pip install clickhouse-driver[lz4])
client = Client(
    host='remote-server.example.com',
    compression='lz4',
    compress_block_size=1048576  # 1 MB blocks
)

# Compression is applied transparently to queries
result = client.execute('SELECT * FROM large_table LIMIT 10000')
print(f"Retrieved {len(result)} rows with LZ4 compression")

client.disconnect()
```

### ZSTD High Compression

```python
from clickhouse_driver import Client

# Enable ZSTD for the best compression ratio (requires: pip install clickhouse-driver[zstd])
client = Client(
    host='slow-network-server.example.com',
    compression='zstd',
    compress_block_size=2097152  # 2 MB blocks for better compression
)

# Large data transfer with a higher compression level
result = client.execute('''
    SELECT user_id, event_data, timestamp, metadata
    FROM user_events
    WHERE date >= today() - 30
''', settings={
    'network_zstd_compression_level': 6  # Trade CPU time for a smaller payload
})

print(f"Retrieved {len(result)} events with ZSTD compression")
```

### Adaptive Compression Strategy

```python
from clickhouse_driver import Client

def create_optimized_client(server_type='local'):
    """Create a client with compression tuned to the connection type."""

    if server_type == 'local':
        # Local server: skip compression to avoid CPU overhead
        return Client(
            'localhost',
            compression=False
        )
    elif server_type == 'remote_fast':
        # Fast remote connection: cheap, balanced LZ4 compression
        return Client(
            'remote-server.example.com',
            compression='lz4',
            compress_block_size=1048576
        )
    elif server_type == 'remote_slow':
        # Slow or metered connection: maximize compression
        return Client(
            'slow-server.example.com',
            compression='zstd',
            compress_block_size=4194304,
            settings={
                'network_zstd_compression_level': 9
            }
        )
    else:
        raise ValueError(f'Unknown server type: {server_type!r}')

# Usage based on deployment
client = create_optimized_client('remote_slow')
```

### Compression Performance Measurement

```python
import time
from clickhouse_driver import Client

def benchmark_compression(query, algorithms=('none', 'lz4', 'zstd')):
    """Benchmark query performance with different compression algorithms."""

    results = {}

    for algorithm in algorithms:
        if algorithm == 'none':
            client = Client('remote-server.example.com', compression=False)
        else:
            client = Client('remote-server.example.com', compression=algorithm)

        start_time = time.time()
        result = client.execute(query)
        duration = time.time() - start_time

        results[algorithm] = {
            'duration': duration,
            'rows': len(result),
            'rows_per_second': len(result) / duration
        }

        client.disconnect()

    return results

# Benchmark a large query
query = 'SELECT * FROM large_table WHERE date >= today() - 7'
benchmark_results = benchmark_compression(query)

for algorithm, metrics in benchmark_results.items():
    print(f"{algorithm}: {metrics['duration']:.2f}s, "
          f"{metrics['rows_per_second']:.0f} rows/sec")
```

### Streaming with Compression

```python
import time
from clickhouse_driver import Client

# Large streaming query with compression
client = Client(
    'remote-server.example.com',
    compression='zstd',
    compress_block_size=2097152  # 2 MB blocks
)

total_rows = 0
start_time = time.time()

# execute_iter yields one row at a time, so the full result
# set never has to fit in memory
for row in client.execute_iter('''
    SELECT user_id, action, timestamp, details
    FROM user_activity_log
    WHERE date >= today() - 90
'''):
    process_user_activity(row)  # application-defined handler
    total_rows += 1

    if total_rows % 100000 == 0:
        elapsed = time.time() - start_time
        rate = total_rows / elapsed
        print(f"Processed {total_rows:,} rows at {rate:.0f} rows/sec")

print(f"Total: {total_rows:,} rows processed with ZSTD compression")
```

### INSERT Performance with Compression

```python
import random
import time
from datetime import datetime, timedelta

from clickhouse_driver import Client

# Large INSERT with compression
client = Client(
    'remote-server.example.com',
    compression='lz4',  # LZ4 keeps INSERT CPU overhead low
    compress_block_size=1048576
)

# Generate a large synthetic dataset
def generate_sample_data(count):
    base_date = datetime.now() - timedelta(days=30)

    for i in range(count):
        yield (
            i,
            f"user_{random.randint(1000, 9999)}",
            base_date + timedelta(seconds=random.randint(0, 2592000)),
            random.uniform(10.0, 1000.0),
            random.choice(['A', 'B', 'C', 'D'])
        )

# Create the target table
client.execute('''
    CREATE TABLE IF NOT EXISTS performance_test (
        id UInt32,
        username String,
        created_at DateTime,
        value Float64,
        category Enum8('A'=1, 'B'=2, 'C'=3, 'D'=4)
    ) ENGINE = MergeTree()
    ORDER BY (id, created_at)
''')

# Bulk insert with compression
print("Starting bulk insert with LZ4 compression...")
start_time = time.time()

# Insert in batches for optimal performance
batch_size = 100000
total_inserted = 0

for _ in range(0, 1000000, batch_size):
    batch_data = list(generate_sample_data(batch_size))

    client.execute(
        'INSERT INTO performance_test VALUES',
        batch_data
    )

    total_inserted += len(batch_data)
    elapsed = time.time() - start_time
    rate = total_inserted / elapsed

    print(f"Inserted {total_inserted:,} rows at {rate:.0f} rows/sec")

print(f"Insert completed: {total_inserted:,} rows in {elapsed:.2f}s")
```

### Connection URL with Compression

```python
from clickhouse_driver import Client

# Enable compression via connection URL
client = Client.from_url(
    'clickhouse://user:pass@remote-server.example.com:9000/mydb'
    '?compression=zstd&compress_block_size=2097152'
)

# URL parameters related to compression:
# compression=lz4|lz4hc|zstd
# compress_block_size=1048576
# secure=1 (to combine TLS with compression)
```

### Troubleshooting Compression Issues

```python
from clickhouse_driver import Client
from clickhouse_driver.errors import UnknownCompressionMethod

def test_compression_support():
    """Test which compression algorithms are available."""

    algorithms = ('lz4', 'lz4hc', 'zstd')
    supported = []

    for algorithm in algorithms:
        try:
            client = Client('localhost', compression=algorithm)
            client.execute('SELECT 1')
            supported.append(algorithm)
            client.disconnect()
            print(f"✓ {algorithm} compression supported")

        except UnknownCompressionMethod:
            # Both LZ4 variants are provided by the lz4 extra
            extra = 'lz4' if algorithm.startswith('lz4') else algorithm
            print(f"✗ {algorithm} compression not available")
            print(f"  Install with: pip install clickhouse-driver[{extra}]")
        except Exception as e:
            print(f"? {algorithm} test failed: {e}")

    return supported

# Check compression support
supported_algorithms = test_compression_support()
print(f"Supported compression algorithms: {supported_algorithms}")

# Fall back to an uncompressed connection if needed
if supported_algorithms:
    best_algorithm = supported_algorithms[0]  # Use the first available algorithm
    client = Client('remote-server.example.com', compression=best_algorithm)
else:
    client = Client('remote-server.example.com', compression=False)
    print("Using uncompressed connection")
```