# Disk Serialization

DiskCache provides flexible serialization engines that handle the conversion between Python objects and disk storage. The `Disk` class provides the base functionality, using pickle and several storage modes, while `JSONDisk` serializes to zlib-compressed JSON for better cross-language compatibility.

## Capabilities

### Disk - Base Serialization Engine

The base serialization class that converts Python objects to and from disk storage using pickle and multiple storage modes.

```python { .api }
class Disk:
    def __init__(self, directory, min_file_size=0, pickle_protocol=0):
        """
        Initialize disk serialization engine.

        Args:
            directory (str): Directory path for file storage
            min_file_size (int): Minimum size for file storage. Default 0.
                Values smaller than this are stored in the database.
            pickle_protocol (int): Pickle protocol version. Default 0 (most compatible).
        """

    @property
    def directory(self):
        """Directory path for file storage."""

    @property
    def min_file_size(self):
        """Minimum file size threshold for disk storage."""

    @property
    def pickle_protocol(self):
        """Pickle protocol version used for serialization."""
```

#### Key Serialization

Methods for serializing and deserializing cache keys.

```python { .api }
def hash(self, key):
    """
    Compute portable hash for cache key.

    Args:
        key: Cache key (must be hashable)

    Returns:
        int: Hash value for the key
    """

def put(self, key):
    """
    Serialize key for database storage.

    Args:
        key: Cache key to serialize

    Returns:
        Tuple of (database_key, raw_flag) where:
        - database_key: Serialized key for database storage
        - raw_flag: Boolean indicating if key is stored raw
    """

def get(self, key, raw):
    """
    Deserialize key from database storage.

    Args:
        key: Serialized key from database
        raw (bool): Whether key was stored raw

    Returns:
        Original Python key object
    """
```

#### Value Serialization

Methods for serializing and deserializing cache values with multiple storage modes.

```python { .api }
def store(self, value, read, key=UNKNOWN):
    """
    Serialize value for storage.

    Determines the best storage mode and location (database vs file)
    based on value type and size.

    Args:
        value: Python value to serialize
        read (bool): Whether value should be stored for file reading
        key: Cache key (for filename generation)

    Returns:
        Tuple of (size, mode, filename, db_value) where:
        - size: Storage size in bytes
        - mode: Storage mode (0=none, 1=raw, 2=binary, 3=text, 4=pickle)
        - filename: File path if stored as file, else None
        - db_value: Serialized value for database storage
    """

def fetch(self, mode, filename, value, read):
    """
    Deserialize value from storage.

    Args:
        mode (int): Storage mode used during store()
        filename (str): File path if value stored as file
        value: Database-stored value
        read (bool): Whether to return file handle instead of value

    Returns:
        Original Python value, or file handle if read=True
    """
```

#### File Management

Methods for managing file storage and cleanup.

```python { .api }
def filename(self, key=UNKNOWN, value=UNKNOWN):
    """
    Generate filename and full path for storage.

    Args:
        key: Cache key (optional, for unique naming)
        value: Value to store (optional, for type-based naming)

    Returns:
        Tuple of (filename, full_path) where:
        - filename: Generated filename
        - full_path: Complete file path in directory
    """

def remove(self, file_path):
    """
    Safely remove file from storage.

    Args:
        file_path (str): Path to file to remove

    Returns:
        bool: True if file was removed, False if it didn't exist
    """
```

### JSONDisk - JSON Serialization Engine

Enhanced serialization engine that uses JSON with optional zlib compression, providing better cross-language compatibility.

```python { .api }
class JSONDisk(Disk):
    def __init__(self, directory, compress_level=1, **kwargs):
        """
        Initialize JSON disk serialization engine.

        Args:
            directory (str): Directory path for file storage
            compress_level (int): zlib compression level (0-9). Default 1.
                0 = no compression, 9 = maximum compression
            **kwargs: Additional arguments passed to Disk constructor
        """

    @property
    def compress_level(self):
        """zlib compression level (0-9)."""

    @compress_level.setter
    def compress_level(self, value):
        """Set zlib compression level."""
```

#### JSON Key Serialization

JSON-specific key serialization with compression.

```python { .api }
def put(self, key):
    """
    Serialize key using JSON and optional compression.

    Args:
        key: Cache key to serialize (must be JSON-serializable)

    Returns:
        Tuple of (compressed_json_key, raw_flag)

    Raises:
        TypeError: If key is not JSON-serializable
    """

def get(self, key, raw):
    """
    Deserialize key from compressed JSON.

    Args:
        key: Compressed JSON key from database
        raw (bool): Whether key was stored raw

    Returns:
        Original Python key object
    """
```

#### JSON Value Serialization

JSON-specific value serialization with compression.

```python { .api }
def store(self, value, read, key=UNKNOWN):
    """
    Serialize value using JSON and optional compression.

    Args:
        value: Python value to serialize (must be JSON-serializable)
        read (bool): Whether value should be stored for file reading
        key: Cache key (for filename generation)

    Returns:
        Tuple of (size, mode, filename, compressed_json_value)

    Raises:
        TypeError: If value is not JSON-serializable
    """

def fetch(self, mode, filename, value, read):
    """
    Deserialize value from compressed JSON.

    Args:
        mode (int): Storage mode used during store()
        filename (str): File path if value stored as file
        value: Compressed JSON value from database
        read (bool): Whether to return file handle instead of value

    Returns:
        Original Python value, or file handle if read=True
    """
```

## Storage Modes

DiskCache uses different storage modes based on value type and size:

- **Mode 0 (MODE_NONE)**: No value stored (used for keys without values)
- **Mode 1 (MODE_RAW)**: Raw bytes stored directly
- **Mode 2 (MODE_BINARY)**: Binary data stored as file
- **Mode 3 (MODE_TEXT)**: Text data stored as file
- **Mode 4 (MODE_PICKLE)**: Pickled objects stored in database or as file

## Usage Examples

### Basic Disk Usage

```python
import diskcache

# Create cache with default Disk serialization
cache = diskcache.Cache('/tmp/pickle_cache')

# Store various Python objects
cache.set('string', 'Hello, World!')
cache.set('number', 42)
cache.set('list', [1, 2, 3, 4, 5])
cache.set('dict', {'key': 'value', 'nested': {'a': 1}})

# Custom objects work with pickle
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def __repr__(self):
        return f"Person('{self.name}', {self.age})"

cache.set('person', Person('Alice', 30))

# Retrieve objects
print(cache.get('string'))  # 'Hello, World!'
print(cache.get('person'))  # Person('Alice', 30)
```

### Custom Disk Configuration

```python
import diskcache
import pickle

# Disk settings are passed to Cache with the disk_ prefix; the cache
# instantiates the disk class with its own directory.
cache = diskcache.Cache(
    '/tmp/custom_cache',
    disk=diskcache.Disk,
    disk_min_file_size=1024,                       # Store values >= 1KB as files
    disk_pickle_protocol=pickle.HIGHEST_PROTOCOL,  # Use latest pickle protocol
)

# Small values stored in database
cache.set('small', 'small value')

# Large values stored as files
large_data = 'x' * 2000  # 2KB string
cache.set('large', large_data)

print(f"Small value: {cache.get('small')}")
print(f"Large value length: {len(cache.get('large'))}")
```

### JSONDisk Usage

```python
import diskcache

# Create cache with JSON serialization; Cache takes the disk class
# (not an instance) plus any disk_-prefixed constructor settings
cache = diskcache.Cache('/tmp/json_cache',
                        disk=diskcache.JSONDisk,
                        disk_compress_level=6)

# Store JSON-compatible data
cache.set('config', {
    'debug': True,
    'max_connections': 100,
    'allowed_ips': ['192.168.1.1', '10.0.0.1'],
    'settings': {
        'timeout': 30,
        'retries': 3
    }
})

cache.set('metrics', [
    {'timestamp': 1609459200, 'value': 42.5},
    {'timestamp': 1609459260, 'value': 38.2},
    {'timestamp': 1609459320, 'value': 45.1}
])

# Retrieve and use data
config = cache.get('config')
print(f"Debug mode: {config['debug']}")
print(f"Max connections: {config['max_connections']}")

metrics = cache.get('metrics')
print(f"Latest metric: {metrics[-1]}")
```

### Compression Comparison

```python
import diskcache

# Test different compression levels
test_data = {
    'users': [{'id': i, 'name': f'user_{i}', 'data': 'x' * 100} for i in range(100)]
}

# No compression
cache_no_compress = diskcache.Cache('/tmp/cache_no_compress',
                                    disk=diskcache.JSONDisk,
                                    disk_compress_level=0)

# Maximum compression
cache_max_compress = diskcache.Cache('/tmp/cache_max_compress',
                                     disk=diskcache.JSONDisk,
                                     disk_compress_level=9)

# Store same data in both caches
cache_no_compress.set('data', test_data)
cache_max_compress.set('data', test_data)

# Compare storage sizes (volume() includes database overhead)
size_no_compress = cache_no_compress.volume()
size_max_compress = cache_max_compress.volume()

print(f"No compression: {size_no_compress} bytes")
print(f"Max compression: {size_max_compress} bytes")
print(f"Compression ratio: {size_no_compress / size_max_compress:.2f}x")
```

### File-based Storage

```python
import diskcache

# Configure file-based storage: items >= 100 bytes go to files
cache = diskcache.Cache('/tmp/file_cache',
                        disk=diskcache.Disk,
                        disk_min_file_size=100)

# Small item - stored in database
cache.set('small', 'tiny')

# Large item - stored as file
large_content = 'This is a large content string. ' * 10  # > 100 bytes
cache.set('large', large_content)

# Read mode - pass an open file handle; the cache streams it to a file
with open('/tmp/sample.txt', 'w') as f:
    f.write('Sample file content for reading')

with open('/tmp/sample.txt', 'rb') as f:
    cache.set('file_data', f, read=True)

# Get a file handle instead of the content
file_handle = cache.get('file_data', read=True)
if file_handle:
    content = file_handle.read()
    print(f"File content: {content.decode()}")
    file_handle.close()
```

### Direct Disk Operations

```python
import diskcache

# Create disk instance directly
disk = diskcache.Disk('/tmp/direct_disk')

# Manual serialization operations
test_key = 'my_key'
test_value = {'data': [1, 2, 3], 'timestamp': 1609459200}

# Serialize key
db_key, raw_flag = disk.put(test_key)
print(f"Serialized key: {db_key}, raw: {raw_flag}")

# Serialize value
size, mode, filename, db_value = disk.store(test_value, read=False)
print(f"Value size: {size}, mode: {mode}, filename: {filename}")

# Deserialize key
original_key = disk.get(db_key, raw_flag)
print(f"Deserialized key: {original_key}")

# Deserialize value
original_value = disk.fetch(mode, filename, db_value, read=False)
print(f"Deserialized value: {original_value}")

# Generate filename
fname, full_path = disk.filename(key=test_key, value=test_value)
print(f"Generated filename: {fname}")
print(f"Full path: {full_path}")
```

### Custom Serialization

```python
import json

import diskcache
from diskcache.core import UNKNOWN

class CustomDisk(diskcache.Disk):
    """Custom serialization that prefers JSON when possible, falls back to pickle."""

    def store(self, value, read, key=UNKNOWN):
        if not read:
            # Try JSON first
            try:
                json_data = json.dumps(value, separators=(',', ':'))
                # Store as text mode in the database (filename None marks our format)
                return len(json_data), 3, None, json_data.encode('utf-8')
            except (TypeError, ValueError):
                pass
        # Fall back to pickle for file reads and non-JSON-serializable objects
        return super().store(value, read, key)

    def fetch(self, mode, filename, value, read):
        if mode == 3 and filename is None:
            # Our custom JSON format
            try:
                return json.loads(bytes(value).decode('utf-8'))
            except (UnicodeDecodeError, json.JSONDecodeError):
                pass
        # Fall back to parent implementation
        return super().fetch(mode, filename, value, read)

# Pass the disk class to Cache; it is instantiated with the cache directory
cache = diskcache.Cache('/tmp/custom_cache', disk=CustomDisk)

# JSON-serializable data uses JSON
cache.set('json_data', {'numbers': [1, 2, 3], 'text': 'hello'})

# Non-JSON data uses pickle
class CustomClass:
    def __init__(self, value):
        self.value = value

    def __repr__(self):
        return f"CustomClass({self.value})"

cache.set('pickle_data', CustomClass(42))

# Retrieve both
json_result = cache.get('json_data')
pickle_result = cache.get('pickle_data')

print(f"JSON data: {json_result}")
print(f"Pickle data: {pickle_result}")
```

## Best Practices

### Choosing Serialization Method

```python
import diskcache

# Use Disk for maximum compatibility and full Python object support
disk_cache = diskcache.Cache('/tmp/python_objects', disk=diskcache.Disk)

# Use JSONDisk for cross-language compatibility
json_cache = diskcache.Cache('/tmp/json_data',
                             disk=diskcache.JSONDisk,
                             disk_compress_level=3)

# Use appropriate compression levels
# - Level 1: Fast compression, good for temporary data
# - Level 6: Balanced compression/speed, good for general use
# - Level 9: Maximum compression, good for long-term storage
```

### File Size Optimization

```python
import diskcache

# Configure the file threshold based on your use case

# Small threshold: more items stored as files (streaming access, more files)
small_file_cache = diskcache.Cache('/tmp/small_files',
                                   disk_min_file_size=512)

# Large threshold: more items in the database (fewer files, but large
# values bloat the SQLite database)
large_file_cache = diskcache.Cache('/tmp/large_files',
                                   disk_min_file_size=10240)
```

### Error Handling

```python
import diskcache

try:
    # JSONDisk with data that can't be JSON-serialized
    json_cache = diskcache.Cache('/tmp/json_test',
                                 disk=diskcache.JSONDisk,
                                 disk_compress_level=1)

    # This will work
    json_cache.set('good_data', {'key': 'value'})

    # This will raise TypeError
    json_cache.set('bad_data', {1, 2, 3})  # Sets aren't JSON-serializable

except TypeError as e:
    print(f"JSON serialization error: {e}")

    # Fall back to pickle-based cache
    pickle_cache = diskcache.Cache('/tmp/pickle_fallback')
    pickle_cache.set('bad_data', {1, 2, 3})  # This works with pickle
```