# Cache Management

YARL provides configurable LRU caching for encoding/decoding operations. Caching improves performance when processing many URLs, especially those with international domain names or complex encoding requirements.

## Cache Types

YARL maintains separate LRU caches for different operations:

- **IDNA Encoding**: International domain name encoding (Unicode to ASCII)
- **IDNA Decoding**: International domain name decoding (ASCII to Unicode)
- **Host Encoding**: Complete host encoding operations

**Deprecated Cache Types** (kept for backwards compatibility):

- **IP Address Validation**: No longer used; functionality moved to host encoding
- **Host Validation**: No longer used; functionality moved to host encoding
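Each of these caches behaves like a standard `functools.lru_cache` wrapper (the statistics YARL reports are the familiar hits/misses/maxsize/currsize tuple). A stdlib-only sketch of the mechanism, with a toy `encode_host` standing in for YARL's real encoder:

```python
from functools import lru_cache

# Toy stand-in for an IDNA-style encoding step; illustration only,
# not YARL's actual implementation.
@lru_cache(maxsize=256)
def encode_host(host: str) -> str:
    return host.encode("idna").decode("ascii")

encode_host("münchen.de")   # first call: a miss, result is computed and cached
encode_host("münchen.de")   # second call: a hit, result comes from the cache

info = encode_host.cache_info()
print(info.hits, info.misses, info.maxsize, info.currsize)  # 1 1 256 1
```

Because the encoded form of a host never changes, repeated lookups of the same host are pure cache hits, which is why these caches pay off for workloads that revisit the same domains.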
## Capabilities

### Cache Configuration

Configure cache sizes for different operations to balance memory usage against performance.

```python { .api }
def cache_configure(*,
                    idna_encode_size: int | None = 256,
                    idna_decode_size: int | None = 256,
                    ip_address_size: int | None = None,
                    host_validate_size: int | None = None,
                    encode_host_size: int | None = None) -> None:
    """
    Configure LRU cache sizes for URL processing operations.

    Args:
        idna_encode_size (int | None): Cache size for IDNA encoding operations
        idna_decode_size (int | None): Cache size for IDNA decoding operations
        ip_address_size (int | None): DEPRECATED - kept for backwards compatibility
        host_validate_size (int | None): DEPRECATED - kept for backwards compatibility
        encode_host_size (int | None): Cache size for host encoding operations

    Note:
        ip_address_size and host_validate_size are deprecated and will be removed
        in a future version. They are kept for backwards compatibility only;
        new code should not pass them.

    Examples:
        # Increase cache sizes for high-volume applications
        cache_configure(
            idna_encode_size=1024,
            idna_decode_size=1024,
            encode_host_size=1024
        )

        # Reduce memory usage for memory-constrained environments
        cache_configure(
            idna_encode_size=64,
            idna_decode_size=64,
            encode_host_size=64
        )
    """
```
63
64
### Cache Information
65
66
Retrieve statistics about cache performance to monitor effectiveness and tune cache sizes.
67
68
```python { .api }
69
def cache_info() -> CacheInfo:
70
"""
71
Get cache statistics for all URL processing caches.
72
73
Returns:
74
CacheInfo: Dictionary containing cache statistics for each operation
75
76
Examples:
77
info = cache_info()
78
print(f"IDNA encode hits: {info['idna_encode'].hits}")
79
print(f"IDNA encode misses: {info['idna_encode'].misses}")
80
print(f"Cache hit ratio: {info['idna_encode'].hits / (info['idna_encode'].hits + info['idna_encode'].misses)}")
81
"""
82
83
class CacheInfo(TypedDict):
84
"""
85
Cache information structure containing statistics for each cache type.
86
87
Each cache entry contains standard functools.lru_cache statistics:
88
- hits: Number of cache hits
89
- misses: Number of cache misses
90
- maxsize: Maximum cache size
91
- currsize: Current number of cached items
92
"""
93
idna_encode: _CacheInfo
94
idna_decode: _CacheInfo
95
ip_address: _CacheInfo
96
host_validate: _CacheInfo
97
encode_host: _CacheInfo
98
```
### Cache Management

Clear caches to free memory or reset performance counters.

```python { .api }
def cache_clear() -> None:
    """
    Clear all URL processing caches.

    Removes all cached entries and resets performance counters.
    Useful for freeing memory or starting fresh performance measurements.

    Examples:
        # Clear caches after processing a large batch
        cache_clear()

        # Reset before performance testing
        cache_clear()
        process_urls(test_urls)
        stats = cache_info()
    """
```
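Clearing an `lru_cache` resets both the stored entries and the hit/miss counters, which is why calling `cache_clear()` before a measurement run gives clean statistics. A stdlib-only illustration of that behavior:

```python
from functools import lru_cache

@lru_cache(maxsize=32)
def lookup(key: str) -> str:
    return key.upper()

lookup("a")
lookup("a")                        # one miss, then one hit
assert lookup.cache_info().hits == 1

lookup.cache_clear()               # entries and counters are both reset
info = lookup.cache_info()
print(info.hits, info.misses, info.currsize)  # 0 0 0
```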
## Usage Examples

### Basic Cache Management

```python
from yarl import URL, cache_info, cache_clear, cache_configure

# Check initial cache state
initial_info = cache_info()
print("Initial cache state:")
for cache_name, stats in initial_info.items():
    print(f"  {cache_name}: {stats.hits} hits, {stats.misses} misses")

# Process some URLs with international domains
urls = [
    URL('https://café.example.com/path'),
    URL('https://münchen.de/info'),
    URL('https://москва.рф/news'),
    URL('https://日本.jp/page')
]

# Check cache statistics after processing
final_info = cache_info()
print("\nAfter processing international domains:")
for cache_name, stats in final_info.items():
    if stats.hits > 0 or stats.misses > 0:
        total = stats.hits + stats.misses
        hit_ratio = stats.hits / total if total > 0 else 0
        print(f"  {cache_name}: {stats.hits} hits, {stats.misses} misses (hit ratio: {hit_ratio:.2%})")
```
### Performance Optimization

```python
from yarl import URL, cache_configure, cache_info, cache_clear

# Configure larger caches for a high-volume application
# (the deprecated ip_address_size/host_validate_size parameters are omitted)
cache_configure(
    idna_encode_size=2048,
    idna_decode_size=2048,
    encode_host_size=2048
)

def process_url_batch(urls):
    """Process a batch of URLs and return timing info."""
    import time

    start_time = time.perf_counter()
    processed_urls = [URL(url_str) for url_str in urls]
    end_time = time.perf_counter()

    return processed_urls, end_time - start_time

# Simulate processing batches of URLs
url_batch = [
    'https://café.example.com/api/v1/users',
    'https://münchen.de/api/search',
    'https://москва.рф/api/news',
] * 100  # Repeat for cache effectiveness

# First pass - populate caches
cache_clear()  # Start fresh
first_batch, first_time = process_url_batch(url_batch)
first_stats = cache_info()

# Second pass - should benefit from caching
second_batch, second_time = process_url_batch(url_batch)
second_stats = cache_info()

print(f"First pass: {first_time:.4f}s")
print(f"Second pass: {second_time:.4f}s")
print(f"Speedup: {first_time / second_time:.2f}x")

# Analyze cache effectiveness
for cache_name in first_stats:
    first = first_stats[cache_name]
    second = second_stats[cache_name]
    if second.hits > first.hits:
        print(f"{cache_name}: {second.hits - first.hits} additional hits in second pass")
```
### Memory-Constrained Environments

```python
from yarl import URL, cache_configure, cache_info

# Configure smaller caches for memory-constrained environments
cache_configure(
    idna_encode_size=32,
    idna_decode_size=32,
    encode_host_size=32
)

# Process URLs and monitor cache utilization
urls_to_process = [
    'https://example.com/api/users',
    'https://test.org/data',
    'https://192.168.1.1:8080/status',
] * 50

processed = [URL(url) for url in urls_to_process]

# Check cache utilization
stats = cache_info()
for cache_name, info in stats.items():
    if info.currsize > 0:
        utilization = info.currsize / info.maxsize
        print(f"{cache_name}: {info.currsize}/{info.maxsize} ({utilization:.1%} full)")
```
### Cache Monitoring and Tuning

```python
from yarl import URL, cache_info, cache_clear, cache_configure

def analyze_cache_performance(urls):
    """Analyze cache performance for a given set of URLs."""
    cache_clear()  # Start with empty caches

    # Process the URLs twice to see the caching benefit
    first_pass = [URL(url) for url in urls]
    first_stats = cache_info()

    second_pass = [URL(url) for url in urls]
    second_stats = cache_info()

    print("Cache Performance Analysis:")
    print("-" * 50)

    for cache_name in first_stats:
        first = first_stats[cache_name]
        second = second_stats[cache_name]

        total_ops = second.hits + second.misses
        if total_ops > 0:
            hit_ratio = second.hits / total_ops
            cache_benefit = second.hits - first.hits

            print(f"{cache_name}:")
            print(f"  Total operations: {total_ops}")
            print(f"  Hit ratio: {hit_ratio:.1%}")
            print(f"  Cache benefit: {cache_benefit} hits saved")
            print(f"  Current size: {second.currsize}/{second.maxsize}")

            # Suggest cache size adjustments
            if hit_ratio < 0.5 and second.currsize == second.maxsize:
                print(f"  💡 Consider increasing {cache_name}_size")
            elif second.currsize < second.maxsize * 0.3:
                print(f"  💡 Consider decreasing {cache_name}_size")
            print()

# Test with various URL patterns
test_urls = [
    # International domains (benefit from IDNA caching)
    'https://café.example.com/api',
    'https://münchen.de/search',
    'https://москва.рф/news',

    # IP addresses (handled by the host encoding cache)
    'https://192.168.1.1:8080/status',
    'https://10.0.0.1/api/health',

    # Regular domains (benefit from the host encoding cache)
    'https://api.example.com/v1/users',
    'https://cdn.example.org/images/logo.png',
] * 20  # Repeat for meaningful cache statistics

analyze_cache_performance(test_urls)
```
### Production Cache Configuration

```python
from yarl import cache_configure, cache_info
import os

def configure_production_cache():
    """Configure caches based on environment and expected load."""

    # Get configuration from the environment or fall back to defaults
    idna_encode_size = int(os.environ.get('YARL_IDNA_ENCODE_CACHE', 1024))
    idna_decode_size = int(os.environ.get('YARL_IDNA_DECODE_CACHE', 1024))
    encode_host_size = int(os.environ.get('YARL_ENCODE_HOST_CACHE', 1024))

    cache_configure(
        idna_encode_size=idna_encode_size,
        idna_decode_size=idna_decode_size,
        encode_host_size=encode_host_size
    )

    # Log the resulting configuration
    stats = cache_info()
    print("YARL cache configuration:")
    for cache_name, info in stats.items():
        print(f"  {cache_name}: maxsize={info.maxsize}")

# Call during application startup
configure_production_cache()
```
## Performance Considerations

### When Caching Helps Most

- **International Domain Names**: URLs with non-ASCII hostnames benefit significantly from IDNA caching
- **Repeated URL Processing**: Applications that process the same URLs multiple times
- **High-Volume Applications**: Web servers, crawlers, or API clients processing many URLs
- **IP Address Heavy Workloads**: Applications dealing with many IP-based URLs

### Cache Size Tuning Guidelines

- **Small Applications**: The default sizes (256) are usually sufficient
- **High-Volume Applications**: Increase to 1024-2048 for frequently accessed caches
- **Memory-Constrained**: Reduce to 32-128 to minimize memory usage
- **Monitoring**: Use `cache_info()` to monitor hit ratios and adjust accordingly

### Optimal Hit Ratios

- **>80%**: Excellent cache performance; the current size is appropriate
- **50-80%**: Good performance; monitor for opportunities to increase the cache size
- **<50%**: Poor cache performance; increase the cache size or investigate your URL patterns
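These bands can be folded into monitoring with a trivial classifier; a sketch of one way to encode them (the function name and messages are illustrative, not part of YARL's API):

```python
def rate_hit_ratio(ratio: float) -> str:
    """Map a hit ratio onto the tuning bands described above."""
    if ratio > 0.8:
        return "excellent: the current cache size looks appropriate"
    if ratio >= 0.5:
        return "good: monitor for opportunities to grow the cache"
    return "poor: increase the cache size or review your URL patterns"

print(rate_hit_ratio(0.95))  # excellent: the current cache size looks appropriate
```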