# Caching System

Multiple caching strategies for optimizing filesystem access patterns, including memory mapping, block caching, read-ahead caching, and background prefetching. The caching system improves performance with remote storage by reducing network requests and providing intelligent data prefetching.

## Capabilities

### Base Cache Class

Abstract base class that defines the interface for all caching implementations.

```python { .api }
class BaseCache:
    """Base class for caching implementations."""

    def __init__(self, blocksize, fetcher, size, **kwargs):
        """
        Initialize cache.

        Parameters:
        - blocksize: int, size of cache blocks
        - fetcher: callable, function to fetch data
        - size: int, total size of cached object
        - **kwargs: additional cache-specific options
        """

    def _fetch(self, start, end):
        """
        Fetch data range.

        Parameters:
        - start: int, start byte offset
        - end: int, end byte offset

        Returns:
        bytes, fetched data
        """

    def _read_cache(self, start, end):
        """
        Read from cache if available.

        Parameters:
        - start: int, start byte offset
        - end: int, end byte offset

        Returns:
        bytes or None, cached data or None if not cached
        """
```
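
These cache objects are normally constructed for you when a file is opened, but they can also be driven directly, which is handy for experimenting with a strategy in isolation. A minimal sketch, assuming the classes are importable from `fsspec.caching` (as in current fsspec releases) and substituting an in-memory byte string for a real remote fetcher:

```python
from fsspec.caching import ReadAheadCache

# Stand-in for remote data; a real fetcher would issue a ranged request to storage.
payload = bytes(range(256)) * 4096  # ~1 MB of deterministic bytes

def fetcher(start, end):
    # Fetcher contract from the API above: return the bytes in [start, end).
    return payload[start:end]

cache = ReadAheadCache(blocksize=64 * 1024, fetcher=fetcher, size=len(payload))

# _fetch serves requests from the cache where possible and calls the fetcher
# (reading ahead past the requested range) only when the data is not yet buffered.
first = cache._fetch(0, 1024)
again = cache._fetch(512, 2048)  # falls inside the read-ahead buffer
assert first == payload[:1024]
```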

### Memory-Mapped Cache

Uses memory mapping for efficient access to cached files, particularly useful for large files with random access patterns.

```python { .api }
class MMapCache(BaseCache):
    """Memory-mapped file cache for efficient random access."""

    def __init__(self, blocksize, fetcher, size, location=None, blocks=None):
        """
        Initialize memory-mapped cache.

        Parameters:
        - blocksize: int, size of cache blocks
        - fetcher: callable, function to fetch data
        - size: int, total size of cached object
        - location: str, local file path for memory mapping
        - blocks: set, specific blocks to cache
        """
```
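
The `location` and `blocks` parameters are what let the memory map outlive a single file object. A speculative sketch of reusing a mapped cache file across runs, assuming `MMapCache` is importable from `fsspec.caching` and that the block set records which blocks have already been fetched; `fetch_range` and the sizes here are illustrative stand-ins, not part of the library:

```python
from fsspec.caching import MMapCache

def fetch_range(start, end):
    # Hypothetical ranged read against remote storage; here just dummy bytes.
    return b"\0" * (end - start)

remote_size = 64 * 1024 * 1024     # illustrative: total size of the remote object
location = "/tmp/large-file.mmap"  # local backing file for the memory map
fetched_blocks = set()             # persist alongside `location` to resume later

cache = MMapCache(
    blocksize=1024 * 1024,
    fetcher=fetch_range,
    size=remote_size,
    location=location,
    blocks=fetched_blocks,
)

data = cache._fetch(0, 8 * 1024 * 1024)  # populates the corresponding blocks on disk
# `fetched_blocks` now holds the indices of populated blocks; saving it and passing
# it back with the same `location` should avoid re-downloading those ranges.
```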

### Read-Ahead Cache

Implements a read-ahead caching strategy that prefetches data based on sequential access patterns.

```python { .api }
class ReadAheadCache(BaseCache):
    """Read-ahead cache optimized for sequential access patterns."""

    def __init__(self, blocksize, fetcher, size, maxblocks=32):
        """
        Initialize read-ahead cache.

        Parameters:
        - blocksize: int, size of cache blocks
        - fetcher: callable, function to fetch data
        - size: int, total size of cached object
        - maxblocks: int, maximum number of blocks to cache
        """
```

### Block Cache

LRU-based block caching with a configurable cache size (maximum number of retained blocks).

```python { .api }
class BlockCache(BaseCache):
    """Block-based cache with LRU eviction policy."""

    def __init__(self, blocksize, fetcher, size, maxblocks=32):
        """
        Initialize block cache.

        Parameters:
        - blocksize: int, size of cache blocks
        - fetcher: callable, function to fetch data
        - size: int, total size of cached object
        - maxblocks: int, maximum number of blocks to keep in cache
        """
```

### Bytes Cache

Simple in-memory cache that stores entire file contents as bytes.

```python { .api }
class BytesCache(BaseCache):
    """In-memory bytes cache for small files."""

    def __init__(self, blocksize, fetcher, size, **kwargs):
        """
        Initialize bytes cache.

        Parameters:
        - blocksize: int, size of cache blocks
        - fetcher: callable, function to fetch data
        - size: int, total size of cached object
        """
```

### Background Block Cache

Advanced block cache with background prefetching for improved performance with predictable access patterns.

```python { .api }
class BackgroundBlockCache(BaseCache):
    """Block cache with background prefetching capabilities."""

    def __init__(self, blocksize, fetcher, size, maxblocks=32):
        """
        Initialize background block cache.

        Parameters:
        - blocksize: int, size of cache blocks
        - fetcher: callable, function to fetch data
        - size: int, total size of cached object
        - maxblocks: int, maximum number of blocks to cache
        """
```

### Cache Registry

Dictionary of available cache implementations that can be selected by name.

```python { .api }
caches: dict
"""
Mapping of cache names to cache classes.

Available caches:
- 'mmap': MMapCache
- 'readahead': ReadAheadCache
- 'blockcache': BlockCache
- 'bytes': BytesCache
- 'background': BackgroundBlockCache
"""
```
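
Because the registry is a plain dictionary, implementations can be looked up by the same names accepted as `cache_type`, and a custom subclass can be made selectable by adding it under a new key. A brief sketch; the `LoggingCache` class and `loggingcache` key are illustrative, not part of the library:

```python
from fsspec.caching import BaseCache, caches

# Resolve a cache class from its registered name
readahead_cls = caches["readahead"]

# A hypothetical subclass that logs every fetch before delegating to the fetcher
class LoggingCache(BaseCache):
    def _fetch(self, start, end):
        print(f"fetching bytes {start}-{end}")
        return self.fetcher(start, end)

# Register it so files can request cache_type="loggingcache"
caches["loggingcache"] = LoggingCache
```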

## Usage Patterns

### Specifying Cache Type in File Opening

```python
import fsspec

# Use specific cache type when opening files
with fsspec.open('s3://bucket/large-file.dat', cache_type='mmap') as f:
    # File uses memory-mapped caching
    data = f.read(1024)

# Use block cache with custom parameters
with fsspec.open('s3://bucket/file.dat',
                 cache_type='blockcache',
                 block_size=1024*1024,
                 maxblocks=64) as f:
    data = f.read()
```

### Cache Configuration for Different Access Patterns

```python
# Sequential reading - use read-ahead cache
with fsspec.open('s3://bucket/log-file.txt',
                 cache_type='readahead',
                 block_size=64*1024) as f:
    for line in f:
        process_line(line)

# Random access - use memory-mapped cache
with fsspec.open('s3://bucket/database.dat',
                 cache_type='mmap',
                 block_size=4096) as f:
    # Jump to different positions efficiently
    f.seek(1000000)
    data1 = f.read(100)
    f.seek(5000000)
    data2 = f.read(100)

# Small files - use bytes cache
with fsspec.open('s3://bucket/config.json',
                 cache_type='bytes') as f:
    config = json.load(f)
```

### Background Prefetching

```python
# Use background cache for predictable access patterns
chunk_size = 1024 * 1024

with fsspec.open('s3://bucket/time-series.dat',
                 cache_type='background',
                 block_size=1024*1024,
                 maxblocks=16) as f:
    # Cache will prefetch subsequent blocks in background
    for i in range(0, f.size, chunk_size):
        f.seek(i)
        chunk = f.read(chunk_size)
        process_chunk(chunk)
```

### Filesystem-Level Cache Configuration

```python
# Configure caching at filesystem level
s3 = fsspec.filesystem('s3',
                       key='ACCESS_KEY',
                       secret='SECRET_KEY',
                       default_cache_type='blockcache',
                       default_block_size=1024*1024)

# All files opened through this filesystem use the cache settings
with s3.open('bucket/file1.dat') as f:
    data1 = f.read()

with s3.open('bucket/file2.dat') as f:
    data2 = f.read()
```
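
The defaults can usually be overridden per file. In fsspec's buffered-file machinery, `cache_type` on `open()` selects the implementation and `cache_options` passes extra constructor arguments such as `maxblocks`; a hedged sketch building on the `s3` instance above (paths are illustrative):

```python
# Override the filesystem default for a single sequential scan
with s3.open('bucket/archive.dat',
             block_size=8 * 1024 * 1024,
             cache_type='readahead') as f:
    header = f.read(1024)

# Cache-specific settings travel in the cache_options dict
with s3.open('bucket/random-access.dat',
             cache_type='blockcache',
             cache_options={'maxblocks': 64}) as f:
    f.seek(10 * 1024 * 1024)
    sample = f.read(4096)
```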

### Cache Performance Tuning

```python
# Tune cache parameters for specific workloads

# Large files with sequential access
large_file_cache = {
    'cache_type': 'readahead',
    'block_size': 8 * 1024 * 1024,  # 8MB blocks
    'maxblocks': 4                  # Keep 32MB in memory
}

# Database-like files with random access
random_access_cache = {
    'cache_type': 'mmap',
    'block_size': 64 * 1024         # 64KB blocks; the OS pages mapped data in and out
}

# Many small files
small_files_cache = {
    'cache_type': 'bytes'           # Cache entire file
}

# Open files with appropriate cache settings
with fsspec.open('s3://bucket/large.dat', **large_file_cache) as f:
    process_large_file(f)

with fsspec.open('s3://bucket/index.db', **random_access_cache) as f:
    lookup_data(f)

with fsspec.open('s3://bucket/config.json', **small_files_cache) as f:
    config = json.load(f)
```

### Monitoring Cache Performance

```python
# Access cache statistics (implementation-dependent)
with fsspec.open('s3://bucket/file.dat', cache_type='blockcache') as f:
    # Perform operations
    data = f.read(1024*1024)

    # Some cache implementations provide statistics
    if hasattr(f.cache, 'hit_count'):
        print(f"Cache hits: {f.cache.hit_count}")
        print(f"Cache misses: {f.cache.miss_count}")
        print(f"Hit ratio: {f.cache.hit_count / (f.cache.hit_count + f.cache.miss_count)}")
```
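
For `BlockCache` specifically, block fetches are wrapped in an LRU, and recent fsspec versions expose the counters through a `cache_info()` method returning the standard `functools.lru_cache` statistics. A sketch, assuming a blockcache-backed file object (the path is illustrative):

```python
fs = fsspec.filesystem('s3')

with fs.open('bucket/file.dat', cache_type='blockcache') as f:
    f.read(1024 * 1024)
    f.seek(0)
    f.read(1024 * 1024)  # second pass should be served from already-cached blocks

    info = f.cache.cache_info()  # CacheInfo(hits=..., misses=..., maxsize=..., currsize=...)
    print(f"block hits={info.hits} misses={info.misses} cached blocks={info.currsize}")
```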

### Combining with Compression

```python
# Caching works with compression
with fsspec.open('s3://bucket/data.csv.gz',
                 compression='gzip',
                 cache_type='readahead',
                 block_size=1024*1024) as f:
    # The cache holds the compressed bytes; decompression is applied on top of cached reads
    df = pd.read_csv(f)
```

### Cache Location Control

```python
# Control where cache files are stored (for persistent caches)
import tempfile

cache_dir = tempfile.mkdtemp()

with fsspec.open('s3://bucket/large-file.dat',
                 cache_type='mmap',
                 cache_storage=cache_dir) as f:
    # Memory-mapped cache file stored in cache_dir
    data = f.read()

# Cache files can persist after closing, so subsequent opens may be able to
# reuse the cached data
```
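
For durable on-disk caching that survives across processes, fsspec also provides the separate `filecache` and `simplecache` wrapper filesystems, which copy file contents into a local directory instead of caching blocks in memory. A brief sketch of the `filecache` protocol, where `cache_storage` names the local directory (paths and options are illustrative):

```python
import fsspec

# Wrap S3 in a locally caching filesystem; downloaded data lands in ./fsspec-cache
fs = fsspec.filesystem(
    'filecache',
    target_protocol='s3',
    target_options={'anon': True},
    cache_storage='./fsspec-cache',
)

with fs.open('bucket/large-file.dat') as f:
    data = f.read()

# A second open (even from another process) is served from the local copy
with fs.open('bucket/large-file.dat') as f:
    data_again = f.read()
```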

### Cache Invalidation

```python
# Discard cached state when remote contents change
fs = fsspec.filesystem('s3')

# Invalidate cached listing information for a specific path
fs.invalidate_cache('bucket/file.dat')

# Invalidate all cached listings for this filesystem instance
fs.invalidate_cache()

# Drop all cached filesystem instances (nuclear option); new ones are created on demand
fsspec.AbstractFileSystem.clear_instance_cache()
```

## Cache Selection Guidelines

### By Access Pattern

- **Sequential Reading**: `ReadAheadCache` - Prefetches next blocks automatically
- **Random Access**: `MMapCache` - Efficient memory mapping for jumping around
- **Mixed Access**: `BlockCache` - Good general-purpose LRU cache
- **One-time Read**: `BytesCache` - Simple for small files read once
- **Predictable Patterns**: `BackgroundBlockCache` - Intelligent prefetching

### By File Size

- **Small files (<1MB)**: `BytesCache` - Cache entire file in memory
- **Medium files (1MB-100MB)**: `BlockCache` or `ReadAheadCache`
- **Large files (>100MB)**: `MMapCache` for random access, `ReadAheadCache` for sequential

### By Network Conditions

- **High latency**: Larger block sizes, more aggressive prefetching
- **Low bandwidth**: Smaller block sizes, conservative caching
- **Reliable connection**: `BackgroundBlockCache` for intelligent prefetching
- **Unreliable connection**: `BlockCache` with smaller blocks for retry resilience
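
These guidelines can be folded into a small helper that picks `cache_type` and `block_size` from the expected access pattern and file size; the thresholds and values below are illustrative rather than anything prescribed by fsspec:

```python
def choose_cache(size_bytes: int, access: str) -> dict:
    """Return open() keyword arguments for a given file size and access pattern."""
    if size_bytes < 1024 * 1024:
        # Small files: cheapest to hold entirely in memory
        return {"cache_type": "bytes"}
    if access == "sequential":
        return {"cache_type": "readahead", "block_size": 8 * 1024 * 1024}
    if access == "random":
        return {"cache_type": "mmap", "block_size": 64 * 1024}
    # Mixed or unknown access: general-purpose LRU block cache
    return {"cache_type": "blockcache", "block_size": 1024 * 1024}

# Example: a large log file read front to back
opts = choose_cache(500 * 1024 * 1024, "sequential")
# fs.open('bucket/logs/app.log', **opts)
```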