# Caching

File-based HTTP caching with Cache-Control header support, ETag validation, and Last-Modified handling. Caching improves performance by storing responses locally and respecting HTTP cache semantics, reducing network requests and server load.

## Capabilities

### FileCache Class

File-based cache implementation that stores HTTP responses on disk with proper cache validation and expiration handling.

```python { .api }
class FileCache:
    """File-based HTTP cache implementation."""

    def __init__(self, cache, safe=None):
        """
        Initialize file cache.

        Args:
            cache (str): Directory path for cache storage
            safe (callable): Function to generate safe filenames from cache keys
                (default: uses built-in safe filename generation)
        """

    def get(self, key):
        """
        Retrieve cached content by key.

        Args:
            key (str): Cache key

        Returns:
            bytes: Cached content, or None if not found
        """

    def set(self, key, value):
        """
        Store content in cache.

        Args:
            key (str): Cache key
            value (bytes): Content to cache
        """

    def delete(self, key):
        """
        Remove cached content.

        Args:
            key (str): Cache key to remove
        """
```
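
To make the `get`/`set`/`delete` contract concrete, here is a minimal sketch of a file-backed store with the same three methods. `SimpleFileCache` is a hypothetical name, and hashing the key with SHA-256 is an assumption made for this illustration; it is not how httplib2's own `FileCache` names its files.

```python
import hashlib
import os
import tempfile


class SimpleFileCache:
    """Hypothetical stand-in that mirrors the get/set/delete interface."""

    def __init__(self, directory):
        self.directory = directory
        os.makedirs(directory, exist_ok=True)

    def _path(self, key):
        # Assumption: hash the key so any URL maps to a filesystem-safe name.
        name = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return os.path.join(self.directory, name)

    def get(self, key):
        # Return the stored bytes, or None when the key is unknown.
        try:
            with open(self._path(key), "rb") as f:
                return f.read()
        except FileNotFoundError:
            return None

    def set(self, key, value):
        with open(self._path(key), "wb") as f:
            f.write(value)

    def delete(self, key):
        # Deleting a missing key is treated as a no-op.
        try:
            os.remove(self._path(key))
        except FileNotFoundError:
            pass


demo_dir = tempfile.mkdtemp()
demo_cache = SimpleFileCache(demo_dir)
demo_cache.set("http://example.org/", b"hello")
first_read = demo_cache.get("http://example.org/")
demo_cache.delete("http://example.org/")
second_read = demo_cache.get("http://example.org/")
```

The important part of the contract is that `get` returns `None` for a missing key and that `delete` tolerates keys that were never stored.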

### Cache Integration

HTTP caching is integrated into the Http client and follows standard HTTP caching semantics.

```python { .api }
# Http constructor with caching
h = httplib2.Http(cache=cache_object_or_path)

# Cache can be:
# - None: No caching
# - str: Directory path (creates FileCache automatically)
# - FileCache instance: Custom cache implementation
# - Custom object: Must implement get/set/delete methods
```
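
Since a custom cache object only has to provide `get`, `set`, and `delete`, a quick duck-typing check can catch mistakes before the object is handed to the client. The helper below is a hypothetical convenience written for this document; it is not something httplib2 is known to run itself.

```python
def looks_like_cache(obj):
    """Return True if obj exposes callable get, set, and delete attributes."""
    return all(
        callable(getattr(obj, name, None))
        for name in ("get", "set", "delete")
    )


class DictCache:
    """Tiny in-memory object used only to exercise the check."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value

    def delete(self, key):
        self._data.pop(key, None)


valid = looks_like_cache(DictCache())
plain_dict = looks_like_cache({})      # dict has get() but no set()/delete()
path_string = looks_like_cache(".cache")
```

A plain `dict` fails the check because it lacks `set` and `delete`, which is why a thin wrapper class is needed for in-memory caching.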

### Usage Examples

#### Basic Caching Setup

```python
import httplib2

# Create Http client with file-based caching
h = httplib2.Http(".cache")

# First request - fetches from server and stores in cache
(resp, content) = h.request("http://example.org/data.json")
print(f"First request status: {resp.status}")

# Second request - may use cached version if still valid
(resp, content) = h.request("http://example.org/data.json")
# fromcache is an attribute of the Response object, not a header key
if resp.fromcache:
    print("Response served from cache")
else:
    print("Response fetched from server")
```

#### Cache Control Headers

```python
import httplib2

h = httplib2.Http(".cache")

# Force fresh request, bypassing cache
headers = {'cache-control': 'no-cache'}
(resp, content) = h.request("http://example.org/", headers=headers)

# Request with specific max-age
headers = {'cache-control': 'max-age=3600'}  # 1 hour
(resp, content) = h.request("http://example.org/", headers=headers)

# Check server's cache control directives
cache_control = resp.get('cache-control', '')
if 'no-cache' in cache_control:
    print("Server requires revalidation before this response is reused")
elif 'max-age=' in cache_control:
    max_age = cache_control.split('max-age=')[1].split(',')[0].strip()
    print(f"Server allows caching for {max_age} seconds")
```
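
String searches such as `'max-age=' in cache_control` work for simple headers but get awkward once several directives are combined. A minimal sketch of a directive parser follows; `parse_cache_control` is a hypothetical helper, and it assumes directive values do not contain commas (quoted lists such as `no-cache="a, b"` are not handled).

```python
def parse_cache_control(header_value):
    """Parse a Cache-Control header value into a dict of directives.

    Directives without a value map to True; valued directives map to
    their string value with surrounding quotes removed.
    """
    directives = {}
    for part in header_value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, value = part.partition("=")
        name = name.strip().lower()
        directives[name] = value.strip().strip('"') if sep else True
    return directives


parsed = parse_cache_control("public, max-age=3600, must-revalidate")
empty = parse_cache_control("")
```

With the header parsed into a dict, checks become `"no-store" in parsed` or `int(parsed["max-age"])` instead of chained `split` calls.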

#### ETag Validation

```python
import httplib2

h = httplib2.Http(".cache")

# First request stores the response and its ETag
(resp, content) = h.request("http://example.org/document")
etag = resp.get('etag')
if etag:
    print(f"Document ETag: {etag}")

# Subsequent requests send If-None-Match automatically when revalidating.
# On a 304 reply, httplib2 returns the cached body with status 200 and
# sets resp.fromcache, so check that attribute rather than the status code.
(resp, content) = h.request("http://example.org/document")
if resp.fromcache:
    print("Document hasn't changed (served from cache)")
else:
    print("Document was fetched from the server")
```

#### Last-Modified Validation

```python
import httplib2

h = httplib2.Http(".cache")

# First request stores Last-Modified header
(resp, content) = h.request("http://example.org/file.txt")
last_modified = resp.get('last-modified')
if last_modified:
    print(f"File last modified: {last_modified}")

# Subsequent requests send If-Modified-Since automatically when revalidating.
# A 304 reply is surfaced as status 200 with resp.fromcache set.
(resp, content) = h.request("http://example.org/file.txt")
if resp.fromcache:
    print("File hasn't been modified (served from cache)")
```

#### Custom Cache Implementation

```python
import httplib2

class MemoryCache:
    """Simple in-memory cache implementation."""

    def __init__(self):
        self.cache = {}

    def get(self, key):
        return self.cache.get(key)

    def set(self, key, value):
        self.cache[key] = value

    def delete(self, key):
        self.cache.pop(key, None)

# Use custom cache
memory_cache = MemoryCache()
h = httplib2.Http(cache=memory_cache)

(resp, content) = h.request("http://example.org/")
```
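
The `MemoryCache` above grows without limit. As a sketch of one way to bound it, the variant below evicts the least recently used entry once a maximum size is reached. `BoundedMemoryCache` and its `max_entries` parameter are hypothetical names introduced for this example.

```python
from collections import OrderedDict


class BoundedMemoryCache:
    """In-memory cache with least-recently-used eviction (illustrative)."""

    def __init__(self, max_entries=128):
        self.max_entries = max_entries
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        # Mark the entry as most recently used.
        self._data.move_to_end(key)
        return self._data[key]

    def set(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        # Drop the oldest entries until the cache fits its limit.
        while len(self._data) > self.max_entries:
            self._data.popitem(last=False)

    def delete(self, key):
        self._data.pop(key, None)


lru = BoundedMemoryCache(max_entries=2)
lru.set("a", b"1")
lru.set("b", b"2")
lru.get("a")        # "a" becomes the most recently used entry
lru.set("c", b"3")  # exceeds the limit, so "b" is evicted
```

Because it implements the same three methods, an instance could be passed as the `cache` argument in place of `MemoryCache`.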

#### Cache Directory Management

```python
import httplib2
import os
import shutil

# Create cache in specific directory
cache_dir = "/tmp/http_cache"
h = httplib2.Http(cache_dir)

# Check cache directory
if os.path.exists(cache_dir):
    cache_files = os.listdir(cache_dir)
    print(f"Cache contains {len(cache_files)} files")

# Clear cache manually if needed.
# Create a new Http object afterwards so the directory exists again.
shutil.rmtree(cache_dir, ignore_errors=True)
```
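
Removing the whole directory discards every entry. A gentler option is to prune only files that have not been written for a while. The sketch below uses file modification time as the age signal, which is an assumption of this example: it reflects when the entry was last stored, not its HTTP freshness. `prune_older_than` is a hypothetical helper.

```python
import os
import tempfile
import time


def prune_older_than(cache_path, max_age_seconds, now=None):
    """Delete regular files older than max_age_seconds; return the count."""
    now = time.time() if now is None else now
    # Collect candidates first so nothing is deleted while scanning.
    with os.scandir(cache_path) as entries:
        stale = [
            entry.path
            for entry in entries
            if entry.is_file() and now - entry.stat().st_mtime > max_age_seconds
        ]
    for path in stale:
        os.remove(path)
    return len(stale)


# Demonstration on a throwaway directory with one old and one new file.
demo_dir = tempfile.mkdtemp()
old_file = os.path.join(demo_dir, "old")
new_file = os.path.join(demo_dir, "new")
for path in (old_file, new_file):
    with open(path, "wb") as f:
        f.write(b"x")
two_hours_ago = time.time() - 7200
os.utime(old_file, (two_hours_ago, two_hours_ago))

removed_count = prune_older_than(demo_dir, max_age_seconds=3600)
```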

### Cache Validation

httplib2 validates cached entries with standard HTTP conditional requests:

#### Conditional Requests

- **If-None-Match**: Sent with the cached ETag for validation
- **If-Modified-Since**: Sent with the cached Last-Modified date
- **If-Match**: Used for safe updates (PUT with ETag)
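
The mapping from cached validators to conditional request headers can be sketched in a few lines. `conditional_headers` is a hypothetical helper that illustrates the idea; it assumes the cached response headers are available as a dict with lowercase keys and is not a copy of httplib2's internal code.

```python
def conditional_headers(cached_headers):
    """Build validator request headers from a cached response's headers."""
    headers = {}
    etag = cached_headers.get("etag")
    if etag:
        # The ETag is echoed back unchanged, including its quotes.
        headers["if-none-match"] = etag
    last_modified = cached_headers.get("last-modified")
    if last_modified:
        headers["if-modified-since"] = last_modified
    return headers


validators = conditional_headers({
    "etag": '"abc123"',
    "last-modified": "Wed, 21 Oct 2015 07:28:00 GMT",
})
no_validators = conditional_headers({"content-type": "text/html"})
```

When neither validator is present the result is empty, and the request would have to be sent unconditionally.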

#### Cache-Control Directives

**Request Directives:**
- `no-cache`: Force revalidation with server
- `no-store`: Don't cache request or response
- `max-age=<seconds>`: Maximum acceptable age
- `max-stale=<seconds>`: Accept stale responses

**Response Directives:**
- `no-cache`: Must revalidate before use
- `no-store`: Don't cache this response
- `private`: Only cache in private caches
- `public`: May cache in shared caches
- `max-age=<seconds>`: Maximum cache lifetime
- `must-revalidate`: Revalidate when stale

#### Expiration Handling

```python
import httplib2

h = httplib2.Http(".cache")

# Response with explicit expiration
(resp, content) = h.request("http://example.org/news")

# Check expiration information
expires = resp.get('expires')
cache_control = resp.get('cache-control', '')

# max-age takes precedence over Expires when both are present
if 'max-age=' in cache_control:
    max_age = int(cache_control.split('max-age=')[1].split(',')[0])
    print(f"Cached for {max_age} seconds")
elif expires:
    print(f"Expires: {expires}")
```
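
The precedence shown above (`max-age` first, then `Expires`) can be turned into a freshness lifetime in seconds. The sketch below is a simplified illustration: `freshness_lifetime` is a hypothetical helper, it expects lowercase header names, and it ignores directives such as `s-maxage` and heuristic freshness.

```python
from email.utils import parsedate_to_datetime


def freshness_lifetime(headers):
    """Return the freshness lifetime in seconds, or None if unknown."""
    cache_control = headers.get("cache-control", "")
    for part in cache_control.split(","):
        name, _, value = part.strip().partition("=")
        if name.lower() == "max-age":
            try:
                return max(int(value), 0)
            except ValueError:
                break  # Malformed max-age: fall back to Expires.
    expires, date = headers.get("expires"), headers.get("date")
    if expires and date:
        try:
            delta = parsedate_to_datetime(expires) - parsedate_to_datetime(date)
        except (TypeError, ValueError):
            return None  # Unparseable date: lifetime is unknown.
        return max(int(delta.total_seconds()), 0)
    return None


from_max_age = freshness_lifetime({"cache-control": "public, max-age=600"})
from_expires = freshness_lifetime({
    "date": "Wed, 21 Oct 2015 07:28:00 GMT",
    "expires": "Wed, 21 Oct 2015 08:28:00 GMT",
})
unknown = freshness_lifetime({})
```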

### Cache Behavior

#### GET Requests
- Cached by default if the response allows caching
- Validated using ETag/Last-Modified on subsequent requests
- Served from the cache if the entry is still valid

#### HEAD Requests
- Used for cache validation
- Updates cached headers without body
- Efficient for checking resource status

#### POST/PUT/DELETE Requests
- Generally not cached
- May invalidate related cached entries
- PUT requests send the cached ETag in If-Match for lost-update protection

#### Redirect Caching
- 301 (Permanent) redirects are cached
- Subsequent requests to the same URL skip the redirect
- Improves performance for moved resources

### Cache Key Generation

Cache keys are generated from:
- Request URL (normalized)
- Request method
- Relevant headers (Vary header consideration)

Safe filename generation:
- URL-unsafe characters are encoded
- Long URLs are hashed
- Collision avoidance
- Cross-platform compatibility
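
The filename properties listed above can be illustrated with a short sketch: a sanitized, length-limited prefix keeps names readable, and a hash of the full URL keeps them unique. `safe_filename`, the `max_prefix` limit, and the choice of SHA-256 are assumptions made for this example and do not describe the exact scheme the library uses.

```python
import hashlib
import re


def safe_filename(url, max_prefix=90):
    """Map a URL to a filesystem-safe name: readable prefix plus full hash."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
    # Drop the scheme, then collapse every run of unsafe characters to ",".
    without_scheme = re.sub(r"^\w+://", "", url, flags=re.ASCII)
    prefix = re.sub(r"[^\w\-.]+", ",", without_scheme, flags=re.ASCII)
    return f"{prefix[:max_prefix]},{digest}"


short_name = safe_filename("http://example.org/data.json?x=1")
long_name = safe_filename("http://example.org/" + "a" * 500)
other_long_name = safe_filename("http://example.org/" + "a" * 501)
```

Two long URLs that share the same truncated prefix still get different names because the hash covers the entire URL.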

### Performance Benefits

- **Reduced Network Traffic**: Fewer requests to server
- **Lower Latency**: Local cache access is faster than a network round trip
- **Bandwidth Savings**: Responses served from cache don't consume network bandwidth
- **Server Load Reduction**: Fewer requests to handle
- **Offline Capability**: Cached content available when offline (if still valid)

### Cache Management

```python
import httplib2
import os

# Monitor cache usage
cache_dir = ".cache"
h = httplib2.Http(cache_dir)

def cache_stats(cache_path):
    """Get cache statistics."""
    if not os.path.exists(cache_path):
        return {"files": 0, "size": 0, "size_mb": 0.0}

    files = os.listdir(cache_path)
    total_size = sum(
        os.path.getsize(os.path.join(cache_path, f))
        for f in files
    )

    return {
        "files": len(files),
        "size": total_size,
        "size_mb": total_size / (1024 * 1024)
    }

# Before requests
stats_before = cache_stats(cache_dir)
print(f"Cache before: {stats_before}")

# Make some requests
for url in ["http://example.org/", "http://example.org/about", "http://example.org/contact"]:
    (resp, content) = h.request(url)

# After requests
stats_after = cache_stats(cache_dir)
print(f"Cache after: {stats_after}")
```
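
Directory size shows how much is stored but not how often the cache is consulted. One possible complement is a thin wrapper that counts lookups. `CountingCache` and `DictStore` are hypothetical names for this sketch, and a "hit" here only means an entry existed for the key; whether that entry is fresh enough to be served is decided separately by the client.

```python
class DictStore:
    """Minimal in-memory store used as the wrapped cache in this demo."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value

    def delete(self, key):
        self._data.pop(key, None)


class CountingCache:
    """Wrap any get/set/delete cache and count lookup hits and misses."""

    def __init__(self, inner):
        self.inner = inner
        self.hits = 0
        self.misses = 0

    def get(self, key):
        value = self.inner.get(key)
        if value is None:
            self.misses += 1
        else:
            self.hits += 1
        return value

    def set(self, key, value):
        self.inner.set(key, value)

    def delete(self, key):
        self.inner.delete(key)

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


counting = CountingCache(DictStore())
counting.get("http://example.org/")            # miss: nothing stored yet
counting.set("http://example.org/", b"body")
counting.get("http://example.org/")            # hit
counting.get("http://example.org/")            # hit
counting.get("http://example.org/about")       # miss
```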