# Cache Management

YARL provides configurable LRU caching for encoding/decoding operations. Caching improves performance when processing many URLs, especially those with international domain names or complex encoding requirements.

## Cache Types

YARL maintains separate LRU caches for different operations:

- **IDNA Encoding**: International domain name encoding (Unicode to ASCII)
- **IDNA Decoding**: International domain name decoding (ASCII to Unicode)
- **Host Encoding**: Complete host encoding operations

**Deprecated Cache Types** (kept for backwards compatibility):

- **IP Address Validation**: No longer used; functionality moved to host encoding
- **Host Validation**: No longer used; functionality moved to host encoding
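Each of these caches behaves like a standard `functools.lru_cache` wrapper (the statistics YARL reports are the familiar hits/misses/maxsize/currsize tuple). A stdlib-only sketch of the mechanism, with a toy `encode_host` standing in for YARL's real encoder:

```python
from functools import lru_cache

# Toy stand-in for an IDNA-style encoding step; illustration only,
# not YARL's actual implementation.
@lru_cache(maxsize=256)
def encode_host(host: str) -> str:
    return host.encode("idna").decode("ascii")

encode_host("münchen.de")   # first call: a miss, result is computed and cached
encode_host("münchen.de")   # second call: a hit, result comes from the cache

info = encode_host.cache_info()
print(info.hits, info.misses, info.maxsize, info.currsize)  # 1 1 256 1
```

Because the encoded form of a host never changes, repeated lookups of the same host are pure cache hits, which is why these caches pay off for workloads that revisit the same domains.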
## Capabilities

### Cache Configuration

Configure cache sizes for different operations to balance memory usage against performance.

```python { .api }
def cache_configure(*,
                    idna_encode_size: int | None = 256,
                    idna_decode_size: int | None = 256,
                    ip_address_size: int | None = None,
                    host_validate_size: int | None = None,
                    encode_host_size: int | None = None) -> None:
    """
    Configure LRU cache sizes for URL processing operations.

    Args:
        idna_encode_size (int | None): Cache size for IDNA encoding operations
        idna_decode_size (int | None): Cache size for IDNA decoding operations
        ip_address_size (int | None): DEPRECATED - kept for backwards compatibility
        host_validate_size (int | None): DEPRECATED - kept for backwards compatibility
        encode_host_size (int | None): Cache size for host encoding operations

    Note:
        ip_address_size and host_validate_size are deprecated and will be removed
        in a future version. They are kept for backwards compatibility only;
        new code should not pass them.

    Examples:
        # Increase cache sizes for high-volume applications
        cache_configure(
            idna_encode_size=1024,
            idna_decode_size=1024,
            encode_host_size=1024
        )

        # Reduce memory usage for memory-constrained environments
        cache_configure(
            idna_encode_size=64,
            idna_decode_size=64,
            encode_host_size=64
        )
    """
```
63
64
### Cache Information
65
66
Retrieve statistics about cache performance to monitor effectiveness and tune cache sizes.
67
68
```python { .api }
69
def cache_info() -> CacheInfo:
70
"""
71
Get cache statistics for all URL processing caches.
72
73
Returns:
74
CacheInfo: Dictionary containing cache statistics for each operation
75
76
Examples:
77
info = cache_info()
78
print(f"IDNA encode hits: {info['idna_encode'].hits}")
79
print(f"IDNA encode misses: {info['idna_encode'].misses}")
80
print(f"Cache hit ratio: {info['idna_encode'].hits / (info['idna_encode'].hits + info['idna_encode'].misses)}")
81
"""
82
83
class CacheInfo(TypedDict):
84
"""
85
Cache information structure containing statistics for each cache type.
86
87
Each cache entry contains standard functools.lru_cache statistics:
88
- hits: Number of cache hits
89
- misses: Number of cache misses
90
- maxsize: Maximum cache size
91
- currsize: Current number of cached items
92
"""
93
idna_encode: _CacheInfo
94
idna_decode: _CacheInfo
95
ip_address: _CacheInfo
96
host_validate: _CacheInfo
97
encode_host: _CacheInfo
98
```
### Cache Management

Clear caches to free memory or reset performance counters.

```python { .api }
def cache_clear() -> None:
    """
    Clear all URL processing caches.

    Removes all cached entries and resets performance counters.
    Useful for freeing memory or starting fresh performance measurements.

    Examples:
        # Clear caches after processing a large batch
        cache_clear()

        # Reset before performance testing
        cache_clear()
        process_urls(test_urls)
        stats = cache_info()
    """
```
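Clearing an `lru_cache` resets both the stored entries and the hit/miss counters, which is why calling `cache_clear()` before a measurement run gives clean statistics. A stdlib-only illustration of that behavior:

```python
from functools import lru_cache

@lru_cache(maxsize=32)
def lookup(key: str) -> str:
    return key.upper()

lookup("a")
lookup("a")                        # one miss, then one hit
assert lookup.cache_info().hits == 1

lookup.cache_clear()               # entries and counters are both reset
info = lookup.cache_info()
print(info.hits, info.misses, info.currsize)  # 0 0 0
```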
## Usage Examples

### Basic Cache Management

```python
from yarl import URL, cache_info, cache_clear, cache_configure

# Check initial cache state
initial_info = cache_info()
print("Initial cache state:")
for cache_name, stats in initial_info.items():
    print(f"  {cache_name}: {stats.hits} hits, {stats.misses} misses")

# Process some URLs with international domains
urls = [
    URL('https://café.example.com/path'),
    URL('https://münchen.de/info'),
    URL('https://москва.рф/news'),
    URL('https://日本.jp/page')
]

# Check cache statistics after processing
final_info = cache_info()
print("\nAfter processing international domains:")
for cache_name, stats in final_info.items():
    if stats.hits > 0 or stats.misses > 0:
        total = stats.hits + stats.misses
        hit_ratio = stats.hits / total if total > 0 else 0
        print(f"  {cache_name}: {stats.hits} hits, {stats.misses} misses (hit ratio: {hit_ratio:.2%})")
```
### Performance Optimization

```python
from yarl import URL, cache_configure, cache_info, cache_clear

# Configure larger caches for a high-volume application
# (the deprecated ip_address_size/host_validate_size parameters are omitted)
cache_configure(
    idna_encode_size=2048,
    idna_decode_size=2048,
    encode_host_size=2048
)

def process_url_batch(urls):
    """Process a batch of URLs and return timing info."""
    import time

    start_time = time.perf_counter()
    processed_urls = [URL(url_str) for url_str in urls]
    end_time = time.perf_counter()

    return processed_urls, end_time - start_time

# Simulate processing batches of URLs
url_batch = [
    'https://café.example.com/api/v1/users',
    'https://münchen.de/api/search',
    'https://москва.рф/api/news',
] * 100  # Repeat for cache effectiveness

# First pass - populate caches
cache_clear()  # Start fresh
first_batch, first_time = process_url_batch(url_batch)
first_stats = cache_info()

# Second pass - should benefit from caching
second_batch, second_time = process_url_batch(url_batch)
second_stats = cache_info()

print(f"First pass: {first_time:.4f}s")
print(f"Second pass: {second_time:.4f}s")
print(f"Speedup: {first_time / second_time:.2f}x")

# Analyze cache effectiveness
for cache_name in first_stats:
    first = first_stats[cache_name]
    second = second_stats[cache_name]
    if second.hits > first.hits:
        print(f"{cache_name}: {second.hits - first.hits} additional hits in second pass")
```
### Memory-Constrained Environments

```python
from yarl import URL, cache_configure, cache_info

# Configure smaller caches for memory-constrained environments
cache_configure(
    idna_encode_size=32,
    idna_decode_size=32,
    encode_host_size=32
)

# Process URLs and monitor cache utilization
urls_to_process = [
    'https://example.com/api/users',
    'https://test.org/data',
    'https://192.168.1.1:8080/status',
] * 50

processed = [URL(url) for url in urls_to_process]

# Check cache utilization
stats = cache_info()
for cache_name, info in stats.items():
    if info.currsize > 0:
        utilization = info.currsize / info.maxsize
        print(f"{cache_name}: {info.currsize}/{info.maxsize} ({utilization:.1%} full)")
```
### Cache Monitoring and Tuning

```python
from yarl import URL, cache_info, cache_clear, cache_configure

def analyze_cache_performance(urls):
    """Analyze cache performance for a given set of URLs."""
    cache_clear()  # Start with empty caches

    # Process the URLs twice to see the caching benefit
    first_pass = [URL(url) for url in urls]
    first_stats = cache_info()

    second_pass = [URL(url) for url in urls]
    second_stats = cache_info()

    print("Cache Performance Analysis:")
    print("-" * 50)

    for cache_name in first_stats:
        first = first_stats[cache_name]
        second = second_stats[cache_name]

        total_ops = second.hits + second.misses
        if total_ops > 0:
            hit_ratio = second.hits / total_ops
            cache_benefit = second.hits - first.hits

            print(f"{cache_name}:")
            print(f"  Total operations: {total_ops}")
            print(f"  Hit ratio: {hit_ratio:.1%}")
            print(f"  Cache benefit: {cache_benefit} hits saved")
            print(f"  Current size: {second.currsize}/{second.maxsize}")

            # Suggest cache size adjustments
            if hit_ratio < 0.5 and second.currsize == second.maxsize:
                print(f"  💡 Consider increasing {cache_name}_size")
            elif second.currsize < second.maxsize * 0.3:
                print(f"  💡 Consider decreasing {cache_name}_size")
            print()

# Test with various URL patterns
test_urls = [
    # International domains (benefit from IDNA caching)
    'https://café.example.com/api',
    'https://münchen.de/search',
    'https://москва.рф/news',

    # IP addresses (handled by the host encoding cache)
    'https://192.168.1.1:8080/status',
    'https://10.0.0.1/api/health',

    # Regular domains (benefit from the host encoding cache)
    'https://api.example.com/v1/users',
    'https://cdn.example.org/images/logo.png',
] * 20  # Repeat for meaningful cache statistics

analyze_cache_performance(test_urls)
```
### Production Cache Configuration

```python
from yarl import cache_configure, cache_info
import os

def configure_production_cache():
    """Configure caches based on environment and expected load."""

    # Get configuration from the environment or fall back to defaults
    idna_encode_size = int(os.environ.get('YARL_IDNA_ENCODE_CACHE', 1024))
    idna_decode_size = int(os.environ.get('YARL_IDNA_DECODE_CACHE', 1024))
    encode_host_size = int(os.environ.get('YARL_ENCODE_HOST_CACHE', 1024))

    cache_configure(
        idna_encode_size=idna_encode_size,
        idna_decode_size=idna_decode_size,
        encode_host_size=encode_host_size
    )

    # Log the resulting configuration
    stats = cache_info()
    print("YARL cache configuration:")
    for cache_name, info in stats.items():
        print(f"  {cache_name}: maxsize={info.maxsize}")

# Call during application startup
configure_production_cache()
```
## Performance Considerations

### When Caching Helps Most

- **International Domain Names**: URLs with non-ASCII hostnames benefit significantly from IDNA caching
- **Repeated URL Processing**: Applications that process the same URLs multiple times
- **High-Volume Applications**: Web servers, crawlers, or API clients processing many URLs
- **IP Address Heavy Workloads**: Applications dealing with many IP-based URLs

### Cache Size Tuning Guidelines

- **Small Applications**: The default sizes (256) are usually sufficient
- **High-Volume Applications**: Increase to 1024-2048 for frequently accessed caches
- **Memory-Constrained**: Reduce to 32-128 to minimize memory usage
- **Monitoring**: Use `cache_info()` to monitor hit ratios and adjust accordingly

### Optimal Hit Ratios

- **>80%**: Excellent cache performance; the current size is appropriate
- **50-80%**: Good performance; monitor for opportunities to increase the cache size
- **<50%**: Poor cache performance; increase the cache size or investigate your URL patterns
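These bands can be folded into monitoring with a trivial classifier; a sketch of one way to encode them (the function name and messages are illustrative, not part of YARL's API):

```python
def rate_hit_ratio(ratio: float) -> str:
    """Map a hit ratio onto the tuning bands described above."""
    if ratio > 0.8:
        return "excellent: the current cache size looks appropriate"
    if ratio >= 0.5:
        return "good: monitor for opportunities to grow the cache"
    return "poor: increase the cache size or review your URL patterns"

print(rate_hit_ratio(0.95))  # excellent: the current cache size looks appropriate
```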