# Disk Serialization

DiskCache provides pluggable serialization engines that handle the conversion between Python objects and disk storage. The `Disk` class provides the base functionality with pickle-based serialization, while `JSONDisk` serializes to zlib-compressed JSON for better cross-language compatibility.

## Capabilities

### Disk - Base Serialization Engine

The base serialization class that handles conversion between Python objects and disk storage using pickle and multiple storage modes.

```python { .api }
class Disk:
    def __init__(self, directory, min_file_size=0, pickle_protocol=0):
        """
        Initialize disk serialization engine.

        Args:
            directory (str): Directory path for file storage
            min_file_size (int): Minimum size for file storage. Default 0.
                Values smaller than this are stored in the database.
            pickle_protocol (int): Pickle protocol version. Default 0 (most compatible).
        """

    @property
    def directory(self):
        """Directory path for file storage."""

    @property
    def min_file_size(self):
        """Minimum file size threshold for disk storage."""

    @property
    def pickle_protocol(self):
        """Pickle protocol version used for serialization."""
```
#### Key Serialization

Methods for serializing and deserializing cache keys.

```python { .api }
def hash(self, key):
    """
    Compute portable hash for cache key.

    Args:
        key: Cache key (must be hashable)

    Returns:
        int: Hash value for the key
    """

def put(self, key):
    """
    Serialize key for database storage.

    Args:
        key: Cache key to serialize

    Returns:
        Tuple of (database_key, raw_flag) where:
        - database_key: Serialized key for database storage
        - raw_flag: Boolean indicating if key is stored raw
    """

def get(self, key, raw):
    """
    Deserialize key from database storage.

    Args:
        key: Serialized key from database
        raw (bool): Whether key was stored raw

    Returns:
        Original Python key object
    """
```
#### Value Serialization

Methods for serializing and deserializing cache values with multiple storage modes.

```python { .api }
def store(self, value, read, key=UNKNOWN):
    """
    Serialize value for storage.

    Determines the storage mode and location (database vs. file)
    based on value type and size.

    Args:
        value: Python value to serialize
        read (bool): Whether value is an open file handle to store as a file
        key: Cache key (for filename generation)

    Returns:
        Tuple of (size, mode, filename, db_value) where:
        - size: Storage size in bytes
        - mode: Storage mode (0=none, 1=raw, 2=binary, 3=text, 4=pickle)
        - filename: File path if stored as file, else None
        - db_value: Serialized value for database storage, else None
    """

def fetch(self, mode, filename, value, read):
    """
    Deserialize value from storage.

    Args:
        mode (int): Storage mode used during store()
        filename (str): File path if value stored as file
        value: Database-stored value
        read (bool): Whether to return a file handle instead of the value

    Returns:
        Original Python value, or an open file handle if read=True
    """
```
#### File Management

Methods for managing file storage and cleanup.

```python { .api }
def filename(self, key=UNKNOWN, value=UNKNOWN):
    """
    Generate filename and full path for storage.

    Args:
        key: Cache key (optional, for unique naming)
        value: Value to store (optional, for type-based naming)

    Returns:
        Tuple of (filename, full_path) where:
        - filename: Generated filename, relative to the storage directory
        - full_path: Complete file path in directory
    """

def remove(self, file_path):
    """
    Safely remove file from storage.

    Args:
        file_path (str): Path to file to remove

    Returns:
        bool: True if file was removed, False if it didn't exist
    """
```
### JSONDisk - JSON Serialization Engine

Serialization engine that uses JSON with zlib compression, trading full Python-object support for cross-language compatibility and human-readable storage.

```python { .api }
class JSONDisk(Disk):
    def __init__(self, directory, compress_level=1, **kwargs):
        """
        Initialize JSON disk serialization engine.

        Args:
            directory (str): Directory path for file storage
            compress_level (int): zlib compression level (0-9). Default 1.
                0 = no compression, 9 = maximum compression
            **kwargs: Additional arguments passed to Disk constructor
        """

    @property
    def compress_level(self):
        """zlib compression level (0-9)."""

    @compress_level.setter
    def compress_level(self, value):
        """Set zlib compression level."""
```
#### JSON Key Serialization

JSON-specific key serialization with compression.

```python { .api }
def put(self, key):
    """
    Serialize key using JSON and zlib compression.

    Args:
        key: Cache key to serialize (must be JSON-serializable)

    Returns:
        Tuple of (compressed_json_key, raw_flag)

    Raises:
        TypeError: If key is not JSON-serializable
    """

def get(self, key, raw):
    """
    Deserialize key from compressed JSON.

    Args:
        key: Compressed JSON key from database
        raw (bool): Whether key was stored raw

    Returns:
        Original Python key object
    """
```
#### JSON Value Serialization

JSON-specific value serialization with compression.

```python { .api }
def store(self, value, read, key=UNKNOWN):
    """
    Serialize value using JSON and zlib compression.

    Args:
        value: Python value to serialize (must be JSON-serializable)
        read (bool): Whether value is an open file handle to store as a file
        key: Cache key (for filename generation)

    Returns:
        Tuple of (size, mode, filename, compressed_json_value)

    Raises:
        TypeError: If value is not JSON-serializable
    """

def fetch(self, mode, filename, value, read):
    """
    Deserialize value from compressed JSON.

    Args:
        mode (int): Storage mode used during store()
        filename (str): File path if value stored as file
        value: Compressed JSON value from database
        read (bool): Whether to return a file handle instead of the value

    Returns:
        Original Python value, or an open file handle if read=True
    """
```
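Conceptually, `JSONDisk` applies `json.dumps` plus `zlib.compress` before handing bytes to the base `Disk`, and the reverse on read. The transformation can be sketched with the standard library alone:

```python
import json
import zlib

def encode(value, compress_level=1):
    """Roughly the transformation JSONDisk applies on store/put."""
    return zlib.compress(json.dumps(value).encode('utf-8'), compress_level)

def decode(data):
    """The reverse transformation applied on fetch/get."""
    return json.loads(zlib.decompress(data).decode('utf-8'))

blob = encode({'debug': True, 'retries': 3}, compress_level=6)
print(decode(blob))  # {'debug': True, 'retries': 3}
```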
## Storage Modes

DiskCache chooses a storage mode based on the value's type and size:

- **Mode 0 (MODE_NONE)**: No value stored
- **Mode 1 (MODE_RAW)**: Small `bytes` values stored directly in the database
- **Mode 2 (MODE_BINARY)**: Binary data written to a file (including values stored with `read=True`)
- **Mode 3 (MODE_TEXT)**: `str` values, stored in the database or as a UTF-8 file
- **Mode 4 (MODE_PICKLE)**: Pickled objects, stored in the database or as a file
## Usage Examples

### Basic Disk Usage

```python
import diskcache

# Create cache with default Disk serialization
cache = diskcache.Cache('/tmp/pickle_cache')

# Store various Python objects
cache.set('string', 'Hello, World!')
cache.set('number', 42)
cache.set('list', [1, 2, 3, 4, 5])
cache.set('dict', {'key': 'value', 'nested': {'a': 1}})

# Custom objects work with pickle
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def __repr__(self):
        return f"Person('{self.name}', {self.age})"

cache.set('person', Person('Alice', 30))

# Retrieve objects
print(cache.get('string'))  # 'Hello, World!'
print(cache.get('person'))  # Person('Alice', 30)
```
### Custom Disk Configuration

```python
import diskcache
import pickle

# Pass the Disk class (not an instance) to Cache; constructor arguments
# are forwarded via settings prefixed with "disk_".
cache = diskcache.Cache(
    '/tmp/custom_cache',
    disk=diskcache.Disk,
    disk_min_file_size=1024,  # Store values >= 1KB as files
    disk_pickle_protocol=pickle.HIGHEST_PROTOCOL,  # Use latest pickle protocol
)

# Small values stored in database
cache.set('small', 'small value')

# Large values stored as files
large_data = 'x' * 2000  # 2KB string
cache.set('large', large_data)

print(f"Small value: {cache.get('small')}")
print(f"Large value length: {len(cache.get('large'))}")
```
### JSONDisk Usage

```python
import diskcache

# Create cache with JSON serialization; "disk_" settings are forwarded
# to the JSONDisk constructor.
cache = diskcache.Cache('/tmp/json_cache',
                        disk=diskcache.JSONDisk,
                        disk_compress_level=6)

# Store JSON-compatible data
cache.set('config', {
    'debug': True,
    'max_connections': 100,
    'allowed_ips': ['192.168.1.1', '10.0.0.1'],
    'settings': {
        'timeout': 30,
        'retries': 3
    }
})

cache.set('metrics', [
    {'timestamp': 1609459200, 'value': 42.5},
    {'timestamp': 1609459260, 'value': 38.2},
    {'timestamp': 1609459320, 'value': 45.1}
])

# Retrieve and use data
config = cache.get('config')
print(f"Debug mode: {config['debug']}")
print(f"Max connections: {config['max_connections']}")

metrics = cache.get('metrics')
print(f"Latest metric: {metrics[-1]}")
```
### Compression Comparison

```python
import diskcache

# Test different compression levels
test_data = {
    'users': [{'id': i, 'name': f'user_{i}', 'data': 'x' * 100} for i in range(100)]
}

# No compression
cache_no_compress = diskcache.Cache('/tmp/cache_no_compress',
                                    disk=diskcache.JSONDisk,
                                    disk_compress_level=0)

# Maximum compression
cache_max_compress = diskcache.Cache('/tmp/cache_max_compress',
                                     disk=diskcache.JSONDisk,
                                     disk_compress_level=9)

# Store the same data in both caches
cache_no_compress.set('data', test_data)
cache_max_compress.set('data', test_data)

# Compare storage sizes (volume() includes database overhead)
size_no_compress = cache_no_compress.volume()
size_max_compress = cache_max_compress.volume()

print(f"No compression: {size_no_compress} bytes")
print(f"Max compression: {size_max_compress} bytes")
print(f"Compression ratio: {size_no_compress / size_max_compress:.2f}x")
```
### File-based Storage

```python
import diskcache

# Configure for file-based storage of large items: values >= 100 bytes
# are written to files ("disk_" settings are forwarded to Disk).
cache = diskcache.Cache('/tmp/file_cache', disk=diskcache.Disk, disk_min_file_size=100)

# Small item - stored in database
cache.set('small', 'tiny')

# Large item - stored as file
large_content = 'This is a large content string. ' * 10  # > 100 bytes
cache.set('large', large_content)

# Read mode - pass an open binary file handle; its contents are streamed to a file
with open('/tmp/sample.txt', 'w') as f:
    f.write('Sample file content for reading')

with open('/tmp/sample.txt', 'rb') as f:
    cache.set('file_data', f, read=True)

# Get a file handle instead of the content
file_handle = cache.get('file_data', read=True)
if file_handle:
    content = file_handle.read()
    print(f"File content: {content.decode()}")
    file_handle.close()
```
### Direct Disk Operations

```python
import diskcache

# Create disk instance directly
disk = diskcache.Disk('/tmp/direct_disk')

# Manual serialization operations
test_key = 'my_key'
test_value = {'data': [1, 2, 3], 'timestamp': 1609459200}

# Serialize key
db_key, raw_flag = disk.put(test_key)
print(f"Serialized key: {db_key}, raw: {raw_flag}")

# Serialize value
size, mode, filename, db_value = disk.store(test_value, read=False)
print(f"Value size: {size}, mode: {mode}, filename: {filename}")

# Deserialize key
original_key = disk.get(db_key, raw_flag)
print(f"Deserialized key: {original_key}")

# Deserialize value
original_value = disk.fetch(mode, filename, db_value, read=False)
print(f"Deserialized value: {original_value}")

# Generate filename
fname, full_path = disk.filename(key=test_key, value=test_value)
print(f"Generated filename: {fname}")
print(f"Full path: {full_path}")
```
### Custom Serialization

```python
import json

import diskcache

class CustomDisk(diskcache.Disk):
    """Custom serialization that prefers JSON when possible, falls back to pickle."""

    def store(self, value, read, key=diskcache.UNKNOWN):
        if not read:
            # Try JSON first
            try:
                json_data = json.dumps(value, separators=(',', ':'))
                # Store in the database as text (mode 3)
                return len(json_data), 3, None, json_data.encode('utf-8')
            except (TypeError, ValueError):
                pass
        # Fall back to pickle for file handles and non-JSON-serializable objects
        return super().store(value, read, key)

    def fetch(self, mode, filename, value, read):
        if mode == 3 and filename is None and not read:
            # Our custom JSON format
            try:
                return json.loads(value.decode('utf-8'))
            except (UnicodeDecodeError, json.JSONDecodeError):
                pass
        # Fall back to the parent implementation
        return super().fetch(mode, filename, value, read)

# Pass the custom disk class to Cache
cache = diskcache.Cache('/tmp/custom_disk_cache', disk=CustomDisk)

# JSON-serializable data uses JSON
cache.set('json_data', {'numbers': [1, 2, 3], 'text': 'hello'})

# Non-JSON data uses pickle
class CustomClass:
    def __init__(self, value):
        self.value = value

    def __repr__(self):
        return f"CustomClass({self.value})"

cache.set('pickle_data', CustomClass(42))

# Retrieve both
json_result = cache.get('json_data')
pickle_result = cache.get('pickle_data')

print(f"JSON data: {json_result}")
print(f"Pickle data: {pickle_result}")
```
## Best Practices

### Choosing Serialization Method

```python
import diskcache

# Use the default Disk for maximum compatibility and full Python object support
disk_cache = diskcache.Cache('/tmp/python_objects')  # disk=diskcache.Disk is the default

# Use JSONDisk for cross-language compatibility and human-readable storage
json_cache = diskcache.Cache('/tmp/json_data',
                             disk=diskcache.JSONDisk,
                             disk_compress_level=3)

# Use appropriate compression levels
# - Level 1: Fast compression, good for temporary data
# - Level 6: Balanced compression/speed, good for general use
# - Level 9: Maximum compression, good for long-term storage
```
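The size/speed trade-off behind these level recommendations can be measured with `zlib` directly (a stdlib-only sketch, independent of DiskCache):

```python
import json
import time
import zlib

# A JSON payload comparable to what JSONDisk would compress.
payload = json.dumps(
    {'rows': [{'id': i, 'data': 'x' * 100} for i in range(1000)]}
).encode('utf-8')

for level in (1, 6, 9):
    start = time.perf_counter()
    compressed = zlib.compress(payload, level)
    elapsed = time.perf_counter() - start
    print(f"level {level}: {len(compressed)} bytes in {elapsed * 1000:.2f} ms")
```

Higher levels shrink the payload further but take longer; level 9 never produces a larger result than level 1 for the same input, it only costs more CPU.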
### File Size Optimization

```python
import diskcache

# Configure the file threshold for your use case via disk_min_file_size.

# Small threshold: more items stored as files (streaming access, many small files)
small_file_cache = diskcache.Cache('/tmp/small_files', disk_min_file_size=512)

# Large threshold: more items kept in the database (fewer files, larger database)
large_file_cache = diskcache.Cache('/tmp/large_files', disk_min_file_size=10240)
```
### Error Handling

```python
import diskcache

try:
    # JSONDisk with data that can't be JSON-serialized
    json_cache = diskcache.Cache('/tmp/json_test',
                                 disk=diskcache.JSONDisk,
                                 disk_compress_level=1)

    # This will work
    json_cache.set('good_data', {'key': 'value'})

    # This will raise TypeError
    json_cache.set('bad_data', {1, 2, 3})  # Sets aren't JSON-serializable

except TypeError as e:
    print(f"JSON serialization error: {e}")

    # Fall back to a pickle-based cache
    pickle_cache = diskcache.Cache('/tmp/pickle_fallback')
    pickle_cache.set('bad_data', {1, 2, 3})  # This works with pickle
```