
# Cache Management

Performance optimization through configurable LRU caching for encoding/decoding operations. YARL uses caching to improve performance when processing many URLs, especially those with international domain names or complex encoding requirements.

## Cache Types

YARL maintains separate LRU caches for different operations to optimize performance:

- **IDNA Encoding**: International domain name encoding (Unicode to ASCII)
- **IDNA Decoding**: International domain name decoding (ASCII to Unicode)
- **Host Encoding**: Complete host encoding operations

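To make the two IDNA directions concrete, here is a minimal sketch using Python's built-in `idna` codec. It is an illustration only: the standard-library codec is an assumption made for this example and is not necessarily the encoder YARL uses internally.

```python
# Illustration with the standard-library "idna" codec (an assumption for this
# sketch; YARL's own encoder may behave differently in edge cases).
unicode_host = "münchen.de"

# IDNA encoding: Unicode -> ASCII (non-ASCII labels become "xn--" labels)
ascii_host = unicode_host.encode("idna").decode("ascii")
print(ascii_host)  # xn--mnchen-3ya.de

# IDNA decoding: ASCII -> Unicode
round_tripped = ascii_host.encode("ascii").decode("idna")
print(round_tripped)  # münchen.de
```

Both directions are pure functions of the host string, which is what makes them good candidates for an LRU cache.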

**Deprecated Cache Types** (kept for backwards compatibility):

- **IP Address Validation**: No longer used, functionality moved to host encoding
- **Host Validation**: No longer used, functionality moved to host encoding

## Capabilities

### Cache Configuration

Configure cache sizes for different operations to balance memory usage and performance.

```python { .api }
def cache_configure(*,
                    idna_encode_size: int | None = 256,
                    idna_decode_size: int | None = 256,
                    ip_address_size: int | None = None,
                    host_validate_size: int | None = None,
                    encode_host_size: int | None = None) -> None:
    """
    Configure LRU cache sizes for URL processing operations.

    Args:
        idna_encode_size (int | None): Cache size for IDNA encoding operations
        idna_decode_size (int | None): Cache size for IDNA decoding operations
        ip_address_size (int | None): DEPRECATED - kept for backwards compatibility
        host_validate_size (int | None): DEPRECATED - kept for backwards compatibility
        encode_host_size (int | None): Cache size for host encoding operations

    Note:
        ip_address_size and host_validate_size are deprecated and will be removed
        in future versions. They are kept for backwards compatibility only.

    Examples:
        # Increase cache sizes for high-volume applications
        cache_configure(
            idna_encode_size=1024,
            idna_decode_size=1024,
            ip_address_size=512,
            host_validate_size=512,
            encode_host_size=1024
        )

        # Reduce memory usage for memory-constrained environments
        cache_configure(
            idna_encode_size=64,
            idna_decode_size=64,
            ip_address_size=32,
            host_validate_size=32,
            encode_host_size=64
        )
    """
```

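What a size change means for an LRU cache is easiest to see with plain `functools.lru_cache`. The sketch below is not YARL's implementation: `_encode` and `HostEncoder` are hypothetical names, and it assumes resizing is done by re-wrapping the undecorated function, which also discards existing entries.

```python
from functools import lru_cache
from typing import Optional


def _encode(host: str) -> str:
    """Hypothetical stand-in for an expensive host-encoding step."""
    return host.encode("idna").decode("ascii")


class HostEncoder:
    """Hypothetical holder for a resizable LRU cache (not part of YARL)."""

    def __init__(self, size: Optional[int] = 256) -> None:
        self.configure(size)

    def configure(self, size: Optional[int]) -> None:
        # Re-wrapping creates a fresh cache, so existing entries are discarded.
        # A size of None makes the cache unbounded.
        self.encode = lru_cache(maxsize=size)(_encode)


encoder = HostEncoder()
encoder.encode("münchen.de")
print(encoder.encode.cache_info())  # CacheInfo(hits=0, misses=1, maxsize=256, currsize=1)

encoder.configure(1024)
print(encoder.encode.cache_info())  # CacheInfo(hits=0, misses=0, maxsize=1024, currsize=0)
```

Under this assumption, reconfiguring is best done once at startup, since a later call would throw away a warm cache.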

### Cache Information

Retrieve statistics about cache performance to monitor effectiveness and tune cache sizes.

```python { .api }
def cache_info() -> CacheInfo:
    """
    Get cache statistics for all URL processing caches.

    Returns:
        CacheInfo: Dictionary containing cache statistics for each operation

    Examples:
        info = cache_info()
        print(f"IDNA encode hits: {info['idna_encode'].hits}")
        print(f"IDNA encode misses: {info['idna_encode'].misses}")
        print(f"Cache hit ratio: {info['idna_encode'].hits / (info['idna_encode'].hits + info['idna_encode'].misses)}")
    """

class CacheInfo(TypedDict):
    """
    Cache information structure containing statistics for each cache type.

    Each cache entry contains standard functools.lru_cache statistics:
    - hits: Number of cache hits
    - misses: Number of cache misses
    - maxsize: Maximum cache size
    - currsize: Current number of cached items
    """
    idna_encode: _CacheInfo
    idna_decode: _CacheInfo
    ip_address: _CacheInfo
    host_validate: _CacheInfo
    encode_host: _CacheInfo
```

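Because each entry carries standard `functools.lru_cache` statistics, the four fields can be explored without YARL at all. A minimal sketch, where `to_ascii` is a hypothetical cached function standing in for one of the caches:

```python
from functools import lru_cache


@lru_cache(maxsize=2)
def to_ascii(host: str) -> str:
    """Hypothetical cached operation; not part of YARL."""
    return host.encode("idna").decode("ascii")


for host in ["café.example.com", "münchen.de", "café.example.com"]:
    to_ascii(host)

stats = to_ascii.cache_info()
# Two distinct hosts miss once each; the repeated host is a hit.
print(stats)  # CacheInfo(hits=1, misses=2, maxsize=2, currsize=2)

# Guard against division by zero when the cache has not been used yet.
total = stats.hits + stats.misses
hit_ratio = stats.hits / total if total else 0.0
```

The same guard is worth applying to real statistics, since a freshly cleared cache reports zero hits and zero misses.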

### Cache Management

Clear caches to free memory or reset performance counters.

```python { .api }
def cache_clear() -> None:
    """
    Clear all URL processing caches.

    Removes all cached entries and resets performance counters.
    Useful for freeing memory or starting fresh performance measurements.

    Examples:
        # Clear caches after processing a large batch
        cache_clear()

        # Reset before performance testing
        cache_clear()
        process_urls(test_urls)
        stats = cache_info()
    """
```

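The effect of clearing on the counters can be checked with a plain `functools.lru_cache`. This is a sketch under the assumption that the caches behave like standard LRU caches; `normalize` is a hypothetical function used only for illustration.

```python
from functools import lru_cache


@lru_cache(maxsize=128)
def normalize(host: str) -> str:
    """Hypothetical cached operation; not part of YARL."""
    return host.lower()


normalize("Example.COM")  # miss, result stored
normalize("Example.COM")  # hit
before = normalize.cache_info()

normalize.cache_clear()
after = normalize.cache_info()

print(before)  # CacheInfo(hits=1, misses=1, maxsize=128, currsize=1)
print(after)   # CacheInfo(hits=0, misses=0, maxsize=128, currsize=0)
```

Clearing empties the cache and zeroes the counters but leaves `maxsize` unchanged.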

## Usage Examples

### Basic Cache Management

```python
from yarl import URL, cache_info

# Check initial cache state
initial_info = cache_info()
print("Initial cache state:")
for cache_name, stats in initial_info.items():
    print(f"  {cache_name}: {stats.hits} hits, {stats.misses} misses")

# Process some URLs with international domains
urls = [
    URL('https://café.example.com/path'),
    URL('https://münchen.de/info'),
    URL('https://москва.рф/news'),
    URL('https://日本.jp/page')
]

# Check cache statistics after processing
final_info = cache_info()
print("\nAfter processing international domains:")
for cache_name, stats in final_info.items():
    total = stats.hits + stats.misses
    if total > 0:  # Skip caches that were never used
        hit_ratio = stats.hits / total
        print(f"  {cache_name}: {stats.hits} hits, {stats.misses} misses (hit ratio: {hit_ratio:.2%})")
```

### Performance Optimization

```python
import time

from yarl import URL, cache_configure, cache_info, cache_clear

# Configure larger caches for a high-volume application.
# The deprecated ip_address_size and host_validate_size arguments are omitted.
cache_configure(
    idna_encode_size=2048,
    idna_decode_size=2048,
    encode_host_size=2048
)

def process_url_batch(urls):
    """Process a batch of URLs and return them with the elapsed time in seconds."""
    start_time = time.perf_counter()
    processed_urls = [URL(url_str) for url_str in urls]
    end_time = time.perf_counter()

    return processed_urls, end_time - start_time

# Simulate processing batches of URLs
url_batch = [
    'https://café.example.com/api/v1/users',
    'https://münchen.de/api/search',
    'https://москва.рф/api/news',
] * 100  # Repeat for cache effectiveness

# First pass - populate caches
cache_clear()  # Start fresh
first_batch, first_time = process_url_batch(url_batch)
first_stats = cache_info()

# Second pass - should benefit from caching
second_batch, second_time = process_url_batch(url_batch)
second_stats = cache_info()

print(f"First pass: {first_time:.4f}s")
print(f"Second pass: {second_time:.4f}s")
if second_time > 0:  # Avoid dividing by zero on very fast runs
    print(f"Speedup: {first_time / second_time:.2f}x")

# Analyze cache effectiveness
for cache_name in first_stats:
    first = first_stats[cache_name]
    second = second_stats[cache_name]
    if second.hits > first.hits:
        print(f"{cache_name}: {second.hits - first.hits} additional hits in second pass")
```

### Memory-Constrained Environments

```python
from yarl import URL, cache_configure, cache_info

# Configure smaller caches for memory-constrained environments.
# The deprecated ip_address_size and host_validate_size arguments are omitted.
cache_configure(
    idna_encode_size=32,
    idna_decode_size=32,
    encode_host_size=32
)

# Process URLs
urls_to_process = [
    'https://example.com/api/users',
    'https://test.org/data',
    'https://192.168.1.1:8080/status',
] * 50

processed = [URL(url) for url in urls_to_process]

# Check cache utilization
stats = cache_info()
for cache_name, info in stats.items():
    # maxsize is None for an unbounded cache, so skip the ratio in that case
    if info.currsize > 0 and info.maxsize:
        utilization = info.currsize / info.maxsize
        print(f"{cache_name}: {info.currsize}/{info.maxsize} ({utilization:.1%} full)")
```

### Cache Monitoring and Tuning

```python
from yarl import URL, cache_info, cache_clear

def analyze_cache_performance(urls):
    """Analyze cache performance for a given set of URLs."""
    cache_clear()  # Start with empty caches

    # Process URLs twice to see caching benefit
    first_pass = [URL(url) for url in urls]
    first_stats = cache_info()

    second_pass = [URL(url) for url in urls]
    second_stats = cache_info()

    print("Cache Performance Analysis:")
    print("-" * 50)

    for cache_name in first_stats:
        first = first_stats[cache_name]
        second = second_stats[cache_name]

        total_ops = second.hits + second.misses
        if total_ops > 0:
            hit_ratio = second.hits / total_ops
            cache_benefit = second.hits - first.hits

            print(f"{cache_name}:")
            print(f"  Total operations: {total_ops}")
            print(f"  Hit ratio: {hit_ratio:.1%}")
            print(f"  Cache benefit: {cache_benefit} additional hits in second pass")
            print(f"  Current size: {second.currsize}/{second.maxsize}")

            # Suggest cache size adjustments (maxsize is None for an unbounded cache)
            if second.maxsize is not None:
                if hit_ratio < 0.5 and second.currsize == second.maxsize:
                    print(f"  💡 Consider increasing {cache_name}_size")
                elif second.currsize < second.maxsize * 0.3:
                    print(f"  💡 Consider decreasing {cache_name}_size")
            print()

# Test with various URL patterns
test_urls = [
    # International domains (benefit from IDNA caching)
    'https://café.example.com/api',
    'https://münchen.de/search',
    'https://москва.рф/news',

    # IP addresses (handled by the host encoding cache)
    'https://192.168.1.1:8080/status',
    'https://10.0.0.1/api/health',

    # Regular domains (handled by the host encoding cache)
    'https://api.example.com/v1/users',
    'https://cdn.example.org/images/logo.png',
] * 20  # Repeat for meaningful cache statistics

analyze_cache_performance(test_urls)
```

### Production Cache Configuration

```python
import os

from yarl import cache_configure, cache_info

def configure_production_cache():
    """Configure caches based on environment and expected load."""

    # Get configuration from environment or use defaults.
    # The deprecated ip_address_size and host_validate_size settings are not configured.
    idna_encode_size = int(os.environ.get('YARL_IDNA_ENCODE_CACHE', '1024'))
    idna_decode_size = int(os.environ.get('YARL_IDNA_DECODE_CACHE', '1024'))
    encode_host_size = int(os.environ.get('YARL_ENCODE_HOST_CACHE', '1024'))

    cache_configure(
        idna_encode_size=idna_encode_size,
        idna_decode_size=idna_decode_size,
        encode_host_size=encode_host_size
    )

    # Log configuration
    stats = cache_info()
    print("YARL cache configuration:")
    for cache_name, info in stats.items():
        print(f"  {cache_name}: maxsize={info.maxsize}")

# Call during application startup
configure_production_cache()
```

## Performance Considerations

### When Caching Helps Most

- **International Domain Names**: URLs with non-ASCII characters benefit significantly from IDNA caching
- **Repeated URL Processing**: Applications that process the same URLs multiple times
- **High-Volume Applications**: Web servers, crawlers, or API clients processing many URLs
- **IP Address Heavy Workloads**: Applications dealing with many IP-based URLs

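Repetition only pays off when the cache can hold the set of hosts being revisited. The following sketch uses plain `functools.lru_cache` rather than YARL itself, and `hit_ratio_for` is a hypothetical helper written for this example. It compares a cache larger than the working set with one that is smaller.

```python
from functools import lru_cache


def hit_ratio_for(maxsize: int, hosts: list, rounds: int) -> float:
    """Cycle through `hosts` `rounds` times and return the observed hit ratio."""

    @lru_cache(maxsize=maxsize)
    def encode(host: str) -> str:
        return host.encode("idna").decode("ascii")

    for _ in range(rounds):
        for host in hosts:
            encode(host)

    info = encode.cache_info()
    return info.hits / (info.hits + info.misses)


hosts = [f"host{i}.example.com" for i in range(10)]

# Cache holds the whole working set: only the first round misses.
print(hit_ratio_for(maxsize=16, hosts=hosts, rounds=5))  # 0.8

# Cache smaller than a cyclic working set: LRU evicts each entry before reuse.
print(hit_ratio_for(maxsize=4, hosts=hosts, rounds=5))   # 0.0
```

The second case is the worst-case access pattern for LRU. Real traffic is usually skewed toward a few hot hosts, so it tends to fare better than this.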

### Cache Size Tuning Guidelines

- **Small Applications**: Default sizes (256) are usually sufficient
- **High-Volume Applications**: Increase to 1024-2048 for frequently accessed caches
- **Memory-Constrained**: Reduce to 32-128 to minimize memory usage
- **Monitoring**: Use `cache_info()` to monitor hit ratios and adjust accordingly

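One way to apply these guidelines is to keep them as named presets. The sketch below is only a suggestion: the profile names, the specific sizes picked from each range, and the `cache_kwargs` helper are all hypothetical, while the keyword names match the `cache_configure` parameters documented above.

```python
# Hypothetical presets picked from the ranges above; tune them with real measurements.
CACHE_SIZE_PRESETS = {
    "small": 256,
    "high_volume": 2048,
    "memory_constrained": 64,
}


def cache_kwargs(profile: str) -> dict:
    """Build keyword arguments for the non-deprecated cache size parameters."""
    size = CACHE_SIZE_PRESETS[profile]
    return {
        "idna_encode_size": size,
        "idna_decode_size": size,
        "encode_host_size": size,
    }


print(cache_kwargs("memory_constrained"))
```

With YARL installed, the result could be passed along as `cache_configure(**cache_kwargs("high_volume"))`.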

### Optimal Hit Ratios

- **>80%**: Excellent cache performance, consider current size appropriate
- **50-80%**: Good performance, monitor for opportunities to increase cache size
- **<50%**: Poor cache performance, consider increasing cache size or investigating URL patterns
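
These bands translate directly into a small helper for logs or dashboards. This is a minimal sketch: `classify_hit_ratio` is a hypothetical function, and the "no data" case for an unused cache is an added assumption.

```python
def classify_hit_ratio(hits: int, misses: int) -> str:
    """Map raw hit/miss counters to the hit-ratio bands listed above."""
    total = hits + misses
    if total == 0:
        return "no data"  # Assumption: an unused cache gets no rating
    ratio = hits / total
    if ratio > 0.8:
        return "excellent"
    if ratio >= 0.5:
        return "good"
    return "poor"


print(classify_hit_ratio(90, 10))  # excellent
print(classify_hit_ratio(60, 40))  # good
print(classify_hit_ratio(10, 90))  # poor
```

With YARL, the `hits` and `misses` fields of each `cache_info()` entry can be passed in directly.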