# Serialization

Serialization in requests-cache handles the conversion of HTTP response objects to and from storage formats. The system supports multiple serializers including pickle variants, JSON with binary support, YAML, and BSON, each optimized for different use cases and storage backends.

## Capabilities

### Serializer Pipeline System

The core serialization architecture uses a pipeline of stages to transform response data through multiple processing steps.

```python { .api }
class SerializerPipeline:
    """
    Chain of serialization stages that process response data.

    Pipelines combine multiple stages like base conversion (cattrs),
    format-specific preprocessing, and final serialization.
    """

    def __init__(
        self,
        stages: List[Stage],
        name: Optional[str] = None
    ):
        """
        Create serializer pipeline.

        Parameters:
        - stages: List of processing stages in order
        - name: Optional name for the pipeline
        """

    def dumps(self, obj: Any) -> bytes:
        """
        Serialize object to bytes.

        Parameters:
        - obj: Object to serialize (typically CachedResponse)

        Returns:
        Serialized data as bytes
        """

    def loads(self, data: bytes) -> Any:
        """
        Deserialize bytes to object.

        Parameters:
        - data: Serialized data as bytes

        Returns:
        Deserialized object
        """

    def set_decode_content(self, decode_content: bool) -> None:
        """
        Configure whether to decode response content during serialization.

        Parameters:
        - decode_content: If True, decode binary content to text when possible
        """

class Stage:
    """
    Individual processing stage within a serialization pipeline.

    Stages can perform data transformation, format conversion,
    or final serialization steps.
    """

    def dumps(self, obj: Any) -> Any:
        """Process object for serialization."""

    def loads(self, data: Any) -> Any:
        """Process data during deserialization."""

def init_serializer(
    serializer: Optional[SerializerType],
    decode_content: bool
) -> Optional[SerializerPipeline]:
    """
    Initialize serializer by name or instance.

    Parameters:
    - serializer: Serializer name, Stage, or SerializerPipeline
    - decode_content: Whether to decode response content

    Returns:
    Configured SerializerPipeline or None for no serialization
    """
```
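
To make the stage-chaining idea concrete, here is a minimal, library-independent sketch. `MiniStage` and `MiniPipeline` are hypothetical names used only for illustration; they are not part of requests-cache. The sketch assumes that `dumps` applies stages in order and `loads` applies them in reverse, so each stage undoes its own transformation.

```python
import json
from typing import Any, Callable, List


class MiniStage:
    """Hypothetical stage: a pair of dumps/loads callables."""

    def __init__(self, dumps: Callable[[Any], Any], loads: Callable[[Any], Any]):
        self.dumps = dumps
        self.loads = loads


class MiniPipeline:
    """Hypothetical pipeline: dumps runs stages in order, loads in reverse."""

    def __init__(self, stages: List[MiniStage]):
        self.stages = stages

    def dumps(self, obj: Any) -> Any:
        for stage in self.stages:
            obj = stage.dumps(obj)
        return obj

    def loads(self, data: Any) -> Any:
        for stage in reversed(self.stages):
            data = stage.loads(data)
        return data


# Stage 1: dict <-> JSON text. Stage 2: text <-> UTF-8 bytes.
to_json = MiniStage(dumps=lambda d: json.dumps(d, sort_keys=True), loads=json.loads)
to_bytes = MiniStage(dumps=lambda s: s.encode("utf-8"), loads=lambda b: b.decode("utf-8"))

pipeline = MiniPipeline([to_json, to_bytes])
payload = {"status_code": 200, "url": "https://example.com"}
blob = pipeline.dumps(payload)      # bytes ready for storage
restored = pipeline.loads(blob)     # back to the original dict
```

Reversing the stage order on load is what lets stages be composed freely: as long as every stage round-trips its own input, the whole pipeline does too.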

### Cattrs Integration Stage

Core stage that handles conversion between response objects and serializable data structures using the cattrs library.

```python { .api }
class CattrStage(Stage):
    """
    Serialization stage using cattrs converter.

    Handles the complex conversion between CachedResponse objects
    and dictionary representations that can be processed by
    format-specific serializers.
    """

    def __init__(
        self,
        factory: Optional[Callable] = None,
        decode_content: bool = True,
        **kwargs
    ):
        """
        Create cattrs stage.

        Parameters:
        - factory: Optional factory function for converter creation
        - decode_content: Whether to decode binary response content
        - **kwargs: Additional converter configuration
        """

    def dumps(self, response: CachedResponse) -> Dict[str, Any]:
        """Convert CachedResponse to serializable dict."""

    def loads(self, data: Dict[str, Any]) -> CachedResponse:
        """Convert dict back to CachedResponse."""
```
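
The object-to-dict conversion this stage performs can be illustrated without cattrs. The sketch below uses only standard-library dataclasses as a stand-in; `FakeResponse`, `unstructure`, and `structure` are hypothetical names, and the real stage handles many more fields and types than shown here.

```python
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict


@dataclass
class FakeResponse:
    """Hypothetical stand-in for CachedResponse with a few representative fields."""
    url: str
    status_code: int
    created_at: datetime
    headers: Dict[str, str] = field(default_factory=dict)


def unstructure(response: FakeResponse) -> Dict[str, Any]:
    """Object -> plain dict, converting datetime to an ISO 8601 string."""
    data = asdict(response)
    data["created_at"] = response.created_at.isoformat()
    return data


def structure(data: Dict[str, Any]) -> FakeResponse:
    """Plain dict -> object, parsing the ISO 8601 string back to datetime."""
    fields = dict(data)  # copy so the caller's dict is not modified
    fields["created_at"] = datetime.fromisoformat(fields["created_at"])
    return FakeResponse(**fields)


original = FakeResponse(
    url="https://example.com",
    status_code=200,
    created_at=datetime(2024, 1, 1, tzinfo=timezone.utc),
    headers={"Content-Type": "text/plain"},
)
as_dict = unstructure(original)       # contains only str, int, and dict values
round_tripped = structure(as_dict)    # equal to the original object
```

Once the response is reduced to plain dicts, strings, and numbers, a later stage such as JSON or YAML encoding can process it without knowing anything about response objects.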

### Available Serializers

Pre-configured serializer pipelines for different formats and use cases.

```python { .api }
# Pickle-based serializers
pickle_serializer: SerializerPipeline
"""
Standard pickle serialization for Python objects.
Fast and preserves all Python types but not cross-language compatible.
"""

def safe_pickle_serializer(
    secret_key=None,
    salt: str = 'requests-cache',
    **kwargs
) -> SerializerPipeline:
    """
    Build a pickle serializer with signature validation using itsdangerous.
    Provides tamper detection for cached data. Requires a secret key.
    """

# JSON serializers
json_serializer: SerializerPipeline
"""
JSON serialization with binary content encoding.
Cross-language compatible, handles binary data via base64 encoding.
Uses ujson if available for better performance.
"""

# YAML serializer
yaml_serializer: SerializerPipeline
"""
YAML serialization for human-readable cached data.
Useful for debugging and manual cache inspection.
"""

# BSON serializers
bson_serializer: SerializerPipeline
"""
BSON (Binary JSON) serialization for MongoDB compatibility.
Efficient binary format with rich type support.
"""

bson_document_serializer: SerializerPipeline
"""
BSON serialization optimized for document storage.
Uses BSON-native types where possible.
"""

# Backend-specific serializers
dynamodb_document_serializer: SerializerPipeline
"""
Serialization optimized for DynamoDB document format.
Handles DynamoDB type constraints and limits.
"""

# Partial serializers
dict_serializer: SerializerPipeline
"""
Converts responses to dictionaries without final serialization.
Used internally and for debugging.
"""

utf8_encoder: Stage
"""
UTF-8 encoding stage for text-based formats.
Converts strings to bytes using UTF-8 encoding.
"""
```
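
The tamper detection mentioned for the safe pickle serializer can be sketched with the standard library alone. This is not how itsdangerous is implemented; it is a simplified illustration of the same idea (sign on write, verify on read) using `hmac`. The names `signed_dumps` and `signed_loads` and the key value are hypothetical.

```python
import hashlib
import hmac
import pickle

SIGNATURE_SIZE = hashlib.sha256().digest_size  # 32 bytes for SHA-256


def signed_dumps(obj, secret_key: bytes) -> bytes:
    """Pickle the object and prepend an HMAC-SHA256 signature of the payload."""
    payload = pickle.dumps(obj)
    signature = hmac.new(secret_key, payload, hashlib.sha256).digest()
    return signature + payload


def signed_loads(data: bytes, secret_key: bytes):
    """Verify the signature before unpickling; reject modified data."""
    signature, payload = data[:SIGNATURE_SIZE], data[SIGNATURE_SIZE:]
    expected = hmac.new(secret_key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        raise ValueError("Signature mismatch: cached data was modified")
    return pickle.loads(payload)


key = b"example-secret"  # placeholder; a real key should come from secure config
blob = signed_dumps({"status_code": 200}, key)
restored = signed_loads(blob, key)

# Flip one bit in the payload to simulate tampering
tampered = blob[:-1] + bytes([blob[-1] ^ 0x01])
try:
    signed_loads(tampered, key)
    tamper_detected = False
except ValueError:
    tamper_detected = True
```

Verifying before unpickling matters because `pickle.loads` can execute arbitrary code from untrusted input; the signature check ensures only data written with the secret key is ever unpickled.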

#### Usage Examples

Basic serializer selection:

```python
from requests_cache import CachedSession

# Default pickle serializer (fast, Python-only)
session = CachedSession('cache', serializer='pickle')

# JSON serializer (cross-language compatible)
session = CachedSession('cache', serializer='json')

# YAML serializer (human readable)
session = CachedSession('cache', serializer='yaml')

# BSON serializer (binary JSON)
session = CachedSession('cache', serializer='bson')
```

Backend-optimized serializers:

```python
from requests_cache import CachedSession, safe_pickle_serializer

# Optimized for DynamoDB backend
session = CachedSession(
    'cache',
    backend='dynamodb',
    serializer='json'  # DynamoDB works well with JSON
)

# Optimized for MongoDB/GridFS
session = CachedSession(
    'cache',
    backend='mongodb',
    serializer='bson'  # Native BSON support
)

# Safe pickle with signature validation.
# safe_pickle_serializer is called with a secret key to build the pipeline;
# the key below is a placeholder.
session = CachedSession(
    'cache',
    serializer=safe_pickle_serializer(secret_key='replace-with-your-secret')
)
```

### Custom Serialization Pipelines

Creating custom serializers for specific requirements:

```python { .api }
# Example: Custom pipeline creation
import json
from typing import Any, Dict

from requests_cache import CachedSession
from requests_cache.serializers import CattrStage, SerializerPipeline

class CustomJSONStage:
    """
    Custom JSON stage with specific formatting.

    A pipeline only calls dumps() and loads() on each stage, so a plain
    class with those two methods is enough here.
    """

    def dumps(self, obj: Dict[str, Any]) -> bytes:
        # Note: json.dumps cannot encode raw bytes, so this sketch assumes
        # text responses
        return json.dumps(obj, indent=2, sort_keys=True).encode('utf-8')

    def loads(self, data: bytes) -> Dict[str, Any]:
        return json.loads(data.decode('utf-8'))

# Create custom pipeline
custom_serializer = SerializerPipeline([
    CattrStage(decode_content=True),
    CustomJSONStage()
], name='custom_json')

# Use custom serializer
session = CachedSession(
    'cache',
    serializer=custom_serializer
)
```

### Serializer Configuration

Global configuration for available serializers:

```python { .api }
SERIALIZERS = {
    'bson': bson_serializer,
    'json': json_serializer,
    'pickle': pickle_serializer,
    'yaml': yaml_serializer,
}
"""Mapping of serializer names to pipeline instances."""
```
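
A name-to-instance mapping like this is typically consulted when a serializer is given as a string. The sketch below shows one way such a lookup could work; `NamedSerializer`, `REGISTRY`, and `resolve_serializer` are hypothetical and stand in for the library's own objects, using the standard `json` and `pickle` modules as the backing implementations.

```python
import json
import pickle
from typing import Any, Dict, Optional, Union


class NamedSerializer:
    """Hypothetical wrapper exposing dumps/loads from a module under a name."""

    def __init__(self, name: str, module: Any):
        self.name = name
        self.dumps = module.dumps
        self.loads = module.loads


# Hypothetical registry, analogous in shape to a name -> serializer mapping
REGISTRY: Dict[str, NamedSerializer] = {
    "json": NamedSerializer("json", json),
    "pickle": NamedSerializer("pickle", pickle),
}


def resolve_serializer(
    serializer: Optional[Union[str, NamedSerializer]],
) -> Optional[NamedSerializer]:
    """Accept a name, an instance, or None, and return an instance or None."""
    if serializer is None:
        return None
    if isinstance(serializer, str):
        if serializer not in REGISTRY:
            raise ValueError(
                f"Unknown serializer {serializer!r}; expected one of {sorted(REGISTRY)}"
            )
        return REGISTRY[serializer]
    return serializer  # already an instance; pass it through unchanged


by_name = resolve_serializer("json")
custom = NamedSerializer("custom", json)
by_instance = resolve_serializer(custom)
```

Accepting either a name or an instance keeps the common case short (`serializer='json'`) while still allowing fully custom objects.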

#### Usage Examples

Serializer feature comparison:

```python
import time

from requests_cache import CachedSession

# Test different serializers
serializers = ['pickle', 'json', 'yaml', 'bson']

for serializer_name in serializers:
    session = CachedSession(f'{serializer_name}_cache', serializer=serializer_name)

    # Note: on a cache miss this timing includes the network round trip,
    # not only serialization
    start_time = time.perf_counter()
    response = session.get('https://httpbin.org/json')
    elapsed = time.perf_counter() - start_time

    print(f"{serializer_name:6} - Time: {elapsed:.3f}s, Size: {len(response.content)} bytes")

    # All serializers preserve the same interface
    assert response.json()['slideshow']['title'] == 'Sample Slide Show'
```

Content decoding configuration:

```python
from requests_cache import CachedSession
from requests_cache.serializers import init_serializer

# Decode binary content to text when possible (default)
session = CachedSession('cache', serializer='json')

# Keep binary content as-is (faster, but larger cache files)
serializer = init_serializer('json', decode_content=False)
session = CachedSession('cache', serializer=serializer)
```

### Performance Considerations

Serializers differ in speed, output size, and portability:

- **Pickle**: Fastest serialization/deserialization, smallest size, Python-only
- **JSON**: Good cross-language compatibility, moderate performance, larger size for binary data
- **BSON**: Good performance, compact binary format, rich type support
- **YAML**: Slowest, largest size, but human-readable for debugging

```python
from requests_cache import CachedSession

# Performance-optimized configuration
session = CachedSession(
    'cache',
    backend='sqlite',       # Fast local storage
    serializer='pickle',    # Fastest serialization
)

# Cross-language compatibility
session = CachedSession(
    'cache',
    backend='redis',        # Shared storage
    serializer='json',      # Cross-language support
)

# Debugging-friendly setup
session = CachedSession(
    'cache',
    backend='filesystem',   # Easy file access
    serializer='yaml',      # Human-readable output
)
```
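
Relative speed and size depend on the data and the environment, so it can help to measure rather than assume. The following is a rough sketch of such a measurement using only the standard `pickle` and `json` modules on a made-up, response-like dictionary; the `sample` payload and the `measure` helper are hypothetical, and the numbers will not match what the library's full pipelines produce.

```python
import json
import pickle
import timeit

# Made-up payload loosely shaped like a cached response
sample = {
    "url": "https://example.com/data",
    "status_code": 200,
    "headers": {"Content-Type": "application/json"},
    "body": "x" * 10_000,
}


def measure(name, dumps, loads, repeat=200):
    """Return the serialized size and total time for `repeat` round trips."""
    blob = dumps(sample)
    seconds = timeit.timeit(lambda: loads(dumps(sample)), number=repeat)
    return {"name": name, "size": len(blob), "seconds": seconds}


results = [
    measure("pickle", pickle.dumps, pickle.loads),
    measure("json", lambda obj: json.dumps(obj).encode("utf-8"), json.loads),
]

for result in results:
    print(f"{result['name']:6} size={result['size']} bytes, time={result['seconds']:.4f}s")
```

Running the same measurement with payloads representative of real traffic (for example, large binary bodies) gives a more reliable basis for choosing a serializer than general rankings.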

### Binary Data Handling

Different serializers handle binary response content differently:

```python
from requests_cache import CachedSession

# Download binary content
session = CachedSession('cache', serializer='json')
response = session.get('https://httpbin.org/image/png')

# JSON serializer base64-encodes binary data
print(f"Content type: {response.headers['content-type']}")
print(f"Content length: {len(response.content)} bytes")
print(f"From cache: {response.from_cache}")

# Binary data is preserved across cache storage
assert response.content[:8] == b'\x89PNG\r\n\x1a\n'  # PNG header
```
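
JSON has no native bytes type, which is why binary bodies are base64-encoded before being written. A minimal standalone sketch of that round trip follows; `encode_body` and `decode_body` are hypothetical helper names, and the document layout is an assumption made for illustration.

```python
import base64
import json

PNG_HEADER = b"\x89PNG\r\n\x1a\n"  # the 8-byte PNG file signature


def encode_body(content: bytes) -> str:
    """Bytes -> ASCII-safe base64 text that JSON can store."""
    return base64.b64encode(content).decode("ascii")


def decode_body(text: str) -> bytes:
    """Base64 text -> the original bytes."""
    return base64.b64decode(text)


# Store the binary body inside a JSON document, then read it back
document = json.dumps({"content": encode_body(PNG_HEADER)})
restored = decode_body(json.loads(document)["content"])
```

Base64 represents every 3 bytes as 4 characters, so binary bodies stored this way grow by roughly one third, which is the size cost noted for JSON above.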

### Error Handling

Serialization errors and fallback behavior:

```python
from requests_cache import CachedSession

# Serializers handle various edge cases
session = CachedSession('cache', serializer='json')

try:
    # Some responses may have serialization challenges
    response = session.get('https://httpbin.org/encoding/utf8')
    print("Serialization successful")
except Exception as e:
    print(f"Serialization error: {e}")

    # Fallback to pickle for problematic responses. A separate session and
    # cache name are used so entries written by different serializers do not mix.
    fallback_session = CachedSession('cache_pickle', serializer='pickle')
    response = fallback_session.get('https://httpbin.org/encoding/utf8')
    print("Fallback serialization successful")
```
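
The fallback idea can also be applied per value rather than per session. The sketch below is one possible approach, not something the library provides: it tries JSON first, falls back to pickle when the value is not JSON-serializable, and prefixes a one-byte tag so the reader knows which decoder to use. All names here are hypothetical.

```python
import json
import pickle

JSON_TAG = b"J"
PICKLE_TAG = b"P"


def dumps_with_fallback(obj) -> bytes:
    """Prefer JSON; fall back to pickle for values JSON cannot represent."""
    try:
        return JSON_TAG + json.dumps(obj).encode("utf-8")
    except (TypeError, ValueError):
        return PICKLE_TAG + pickle.dumps(obj)


def loads_with_fallback(data: bytes):
    """Dispatch on the leading tag byte written by dumps_with_fallback."""
    tag, payload = data[:1], data[1:]
    if tag == JSON_TAG:
        return json.loads(payload)
    if tag == PICKLE_TAG:
        return pickle.loads(payload)
    raise ValueError(f"Unknown format tag: {tag!r}")


text_value = {"title": "hello"}
binary_value = {"raw": b"\x00\x01"}  # bytes are not JSON-serializable

text_blob = dumps_with_fallback(text_value)      # stored as JSON
binary_blob = dumps_with_fallback(binary_value)  # stored as pickle
```

Tagging each entry avoids guessing the format on read. Because the pickle branch can execute code during loading, a scheme like this is only appropriate for caches whose contents are trusted.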

## Types

```python { .api }
# Serializer type specifications
SerializerType = Union[str, SerializerPipeline, Stage]
"""Union type for serializer specifications (name, pipeline, or stage)."""

# Stage processing types
StageInput = Any   # Input to serialization stage
StageOutput = Any  # Output from serialization stage
```