# Loaders and Dumpers

A set of loader and dumper classes with different security levels and performance characteristics. Choose a loader or dumper based on how much you trust the input and how much speed you need.

## Capabilities

### Loader Classes

The loader classes differ in which YAML tags they accept and which Python objects they are allowed to construct.

```python { .api }
class BaseLoader(Reader, Scanner, Parser, Composer, BaseConstructor, BaseResolver):
    """
    Base loader with minimal functionality.

    Performs no implicit type resolution, so every scalar is loaded as str.
    Only constructs str, list, and dict.
    """

class SafeLoader(Reader, Scanner, Parser, Composer, SafeConstructor, Resolver):
    """
    Safe loader for untrusted input.

    Constructs only standard YAML types (scalars, sequences, mappings, timestamps, sets).
    Cannot execute arbitrary Python code or access dangerous functionality.
    Recommended for processing YAML from untrusted sources.
    """

class FullLoader(Reader, Scanner, Parser, Composer, FullConstructor, Resolver):
    """
    Full loader with security restrictions.

    Constructs most YAML types but prevents known dangerous operations.
    Good balance between functionality and security.
    Recommended for most use cases with trusted or semi-trusted input.
    """

class Loader(Reader, Scanner, Parser, Composer, Constructor, Resolver):
    """
    Full-featured loader without security restrictions.

    Can construct arbitrary Python objects and execute Python code.
    Provides complete YAML functionality but is unsafe for untrusted input.
    Identical to UnsafeLoader.
    """

class UnsafeLoader(Reader, Scanner, Parser, Composer, Constructor, Resolver):
    """
    Explicitly unsafe loader.

    Identical to Loader but with a name that clearly indicates the security risk.
    Can execute arbitrary Python code during loading.
    Only use with completely trusted input.
    """
```
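
As a small illustration of the difference between the two most restrictive loaders, the sketch below loads the same document with `BaseLoader` and `SafeLoader`. The sample document and variable names are made up for this example.

```python
import yaml

text = "port: 8080\ndebug: true\nratio: 0.5\n"

# BaseLoader performs no implicit type resolution: every scalar stays a string.
as_strings = yaml.load(text, Loader=yaml.BaseLoader)

# SafeLoader resolves standard YAML scalars to int, bool, float, and so on.
as_typed = yaml.load(text, Loader=yaml.SafeLoader)

print(as_strings)  # {'port': '8080', 'debug': 'true', 'ratio': '0.5'}
print(as_typed)    # {'port': 8080, 'debug': True, 'ratio': 0.5}
```

`BaseLoader` can be handy when the application wants to do its own type conversion and validation on the raw strings.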

### C Extension Loaders

High-performance C-based loaders, available when the LibYAML bindings are installed:

```python { .api }
class CBaseLoader:
    """C-based BaseLoader implementation."""

class CSafeLoader:
    """C-based SafeLoader implementation."""

class CFullLoader:
    """C-based FullLoader implementation."""

class CLoader:
    """C-based Loader implementation."""

class CUnsafeLoader:
    """C-based UnsafeLoader implementation."""
```
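
Because the C classes are only defined when the bindings are present, code that wants the speedup usually needs a fallback. A minimal sketch of one common pattern follows; the alias `PreferredSafeLoader` and the sample document are illustrative names, not part of the library.

```python
import yaml

# Prefer the C-backed safe loader; fall back to the pure-Python one
# when the LibYAML bindings are not installed.
try:
    from yaml import CSafeLoader as PreferredSafeLoader
except ImportError:
    from yaml import SafeLoader as PreferredSafeLoader

config = yaml.load("retries: 3\nhosts: [a, b]\n", Loader=PreferredSafeLoader)
print(config)  # {'retries': 3, 'hosts': ['a', 'b']}
```

The same try/except shape should work for the dumper classes as well.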

### Dumper Classes

The dumper classes differ in which Python types they can represent and how portable their output is.

```python { .api }
class BaseDumper(Emitter, Serializer, BaseRepresenter, BaseResolver):
    """
    Base dumper with minimal functionality.

    Registers no type-specific representers.
    Intended mainly as a base class for custom dumpers rather than for direct use.
    """

class SafeDumper(Emitter, Serializer, SafeRepresenter, Resolver):
    """
    Safe dumper producing basic YAML output.

    Represents only basic Python types and standard scalars.
    Emits standard YAML tags only, so the output can be read by safe loaders and non-Python parsers.
    Recommended for configuration files and data exchange.
    """

class Dumper(Emitter, Serializer, Representer, Resolver):
    """
    Full-featured dumper with Python object support.

    Can represent arbitrary Python objects using Python-specific YAML tags.
    Output may not be readable by non-Python YAML parsers.
    Use when preserving exact Python object types is important.
    """
```
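
To make the portability difference concrete, the following sketch dumps the same tuple with `Dumper` and `SafeDumper`. The sample data is invented for the example; exact whitespace of the output is not asserted here.

```python
import yaml

data = {"point": (1, 2)}

# Dumper preserves the tuple type with a Python-specific tag.
full_text = yaml.dump(data, Dumper=yaml.Dumper)

# SafeDumper writes the tuple as an ordinary YAML sequence.
safe_text = yaml.dump(data, Dumper=yaml.SafeDumper)

print(full_text)
print(safe_text)

# The safe output loads back as a list, so the tuple type is not round-tripped.
print(yaml.safe_load(safe_text))  # {'point': [1, 2]}
```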

### C Extension Dumpers

High-performance C-based dumpers, available when the LibYAML bindings are installed:

```python { .api }
class CBaseDumper:
    """C-based BaseDumper implementation."""

class CSafeDumper:
    """C-based SafeDumper implementation."""

class CDumper:
    """C-based Dumper implementation."""
```

## Usage Examples

### Choosing the Right Loader

```python
import yaml

yaml_content = """
name: John Doe
birth_date: 1990-01-15
scores: [85, 92, 78]
metadata:
  created: 2023-01-01T10:00:00Z
  tags: !!python/list [tag1, tag2]
"""

# SafeLoader - standard YAML types only; a Python-specific tag is rejected
# with ConstructorError rather than silently ignored
try:
    data_safe = yaml.load(yaml_content, yaml.SafeLoader)
except yaml.constructor.ConstructorError as e:
    print(f"SafeLoader error: {e}")

# FullLoader - also accepts tags for basic Python types such as !!python/list
data_full = yaml.load(yaml_content, yaml.FullLoader)
print(f"birth_date type: {type(data_full['birth_date'])}")  # datetime.date
print(f"created type: {type(data_full['metadata']['created'])}")  # datetime.datetime
print(f"tags type: {type(data_full['metadata']['tags'])}")  # list

# UnsafeLoader - additionally allows tags that build arbitrary objects (dangerous!)
data_unsafe = yaml.load(yaml_content, yaml.UnsafeLoader)
print(f"tags type: {type(data_unsafe['metadata']['tags'])}")  # list
```

### Performance with C Extensions

```python
import yaml
import time

large_data = {'items': [{'id': i, 'value': f'item_{i}'} for i in range(10000)]}

# Check if C extensions are available
if yaml.__with_libyaml__:
    print("LibYAML C extensions available")

    # Benchmark Python vs C dumping (perf_counter is a monotonic, high-resolution timer)
    start = time.perf_counter()
    yaml_py = yaml.dump(large_data, Dumper=yaml.Dumper)
    py_time = time.perf_counter() - start

    start = time.perf_counter()
    yaml_c = yaml.dump(large_data, Dumper=yaml.CDumper)
    c_time = time.perf_counter() - start

    print(f"Python dumper: {py_time:.3f}s")
    print(f"C dumper: {c_time:.3f}s")
    print(f"Speedup: {py_time/c_time:.1f}x")

    # Benchmark loading (the input was generated above, so it is trusted)
    start = time.perf_counter()
    data_py = yaml.load(yaml_c, Loader=yaml.Loader)
    py_load_time = time.perf_counter() - start

    start = time.perf_counter()
    data_c = yaml.load(yaml_c, Loader=yaml.CLoader)
    c_load_time = time.perf_counter() - start

    print(f"Python loader: {py_load_time:.3f}s")
    print(f"C loader: {c_load_time:.3f}s")
    print(f"Load speedup: {py_load_time/c_load_time:.1f}x")
else:
    print("LibYAML C extensions not available")
```

### Creating Custom Loaders and Dumpers

```python
import yaml
from datetime import datetime

# Custom loader with additional constructor
class CustomLoader(yaml.SafeLoader):
    pass

def timestamp_constructor(loader, node):
    """Custom constructor for timestamp format."""
    value = loader.construct_scalar(node)
    # fromisoformat() on older Python versions does not accept a trailing 'Z'
    return datetime.fromisoformat(value.replace('Z', '+00:00'))

# Register custom constructor
CustomLoader.add_constructor('!timestamp', timestamp_constructor)

# Custom dumper with additional representer
class CustomDumper(yaml.SafeDumper):
    pass

def timestamp_representer(dumper, data):
    """Custom representer for datetime objects."""
    # Write a UTC offset as 'Z' so the output matches the input format
    return dumper.represent_scalar('!timestamp', data.isoformat().replace('+00:00', 'Z'))

# Register custom representer
CustomDumper.add_representer(datetime, timestamp_representer)

# Usage
yaml_with_custom = """
created: !timestamp 2023-01-01T10:00:00Z
updated: !timestamp 2023-12-15T14:30:00Z
"""

data = yaml.load(yaml_with_custom, CustomLoader)
print(f"Created: {data['created']} ({type(data['created'])})")

# Dump back with custom format (Dumper must be passed by keyword;
# the second positional argument of yaml.dump is the output stream)
output = yaml.dump(data, Dumper=CustomDumper)
print(output)
```

## Security Comparison

| Loader | Security | Features | Use Cases |
|--------|----------|----------|-----------|
| SafeLoader | Highest | Standard YAML types only | Untrusted input, config files |
| FullLoader | High | Most types, restricted | Semi-trusted input, data exchange |
| Loader/UnsafeLoader | None | All features | Trusted input, object persistence |

### Type Support by Loader

| Python Type | SafeLoader | FullLoader | Loader/UnsafeLoader |
|-------------|------------|------------|---------------------|
| str, int, float, bool, None | ✓ | ✓ | ✓ |
| list, dict | ✓ | ✓ | ✓ |
| datetime.date | ✓ | ✓ | ✓ |
| datetime.datetime | ✓ | ✓ | ✓ |
| set | ✓ | ✓ | ✓ |
| tuple | ✗ | ✓ | ✓ |
| Arbitrary Python objects | ✗ | ✗ | ✓ |
| Function calls | ✗ | ✗ | ✓ |
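
A quick way to check type support for a given installation is to load small documents and inspect the results. The sketch below does this for timestamps, sets, and tuples; the documents and variable names are made up for illustration, and the behavior shown is what a recent release is expected to produce.

```python
import datetime
import yaml

doc = """
day: 2023-01-01
stamp: 2023-01-01T10:00:00Z
unique: !!set
  ? a
  ? b
"""

# Timestamps and !!set are standard YAML types, so SafeLoader constructs them.
safe = yaml.load(doc, Loader=yaml.SafeLoader)
print(type(safe["day"]).__name__, type(safe["stamp"]).__name__, safe["unique"])

# !!python/tuple is a Python-specific tag: SafeLoader rejects it, FullLoader accepts it.
tuple_doc = "point: !!python/tuple [1, 2]"
try:
    yaml.load(tuple_doc, Loader=yaml.SafeLoader)
    safe_accepts_tuple = True
except yaml.constructor.ConstructorError:
    safe_accepts_tuple = False

full = yaml.load(tuple_doc, Loader=yaml.FullLoader)
print(safe_accepts_tuple, full)
```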

## Component Architecture

Loaders and dumpers are composed of multiple processing components:

### Loader Components

- **Reader**: Input stream handling and encoding detection
- **Scanner**: Tokenization (character stream → tokens)
- **Parser**: Syntax analysis (tokens → events)
- **Composer**: Tree building (events → representation nodes)
- **Constructor**: Object construction (nodes → Python objects)
- **Resolver**: Tag resolution and type detection
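
The intermediate results of these stages can be inspected through the module-level helpers `yaml.scan`, `yaml.parse`, and `yaml.compose`. The following is a rough sketch with an invented sample document, meant only to show what each stage yields.

```python
import yaml

text = "name: demo\nports: [80, 443]\n"

# Scanner output: a stream of tokens.
tokens = [type(t).__name__ for t in yaml.scan(text, Loader=yaml.SafeLoader)]

# Parser output: a stream of events.
events = [type(e).__name__ for e in yaml.parse(text, Loader=yaml.SafeLoader)]

# Composer output: a tree of nodes with resolved tags.
node = yaml.compose(text, Loader=yaml.SafeLoader)

# Constructor output: native Python objects.
data = yaml.load(text, Loader=yaml.SafeLoader)

print(tokens[:3])
print(events[:3])
print(node.tag)
print(data)
```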

### Dumper Components

- **Representer**: Object representation (Python objects → nodes)
- **Serializer**: Tree serialization (nodes → events)
- **Emitter**: Text generation (events → YAML text)
- **Resolver**: Tag resolution for output
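
The dumping stages can be driven one at a time in a similar way. This sketch assumes that calling `represent_data` on a standalone `SafeDumper` instance is acceptable for inspection purposes; the throwaway `StringIO` stream and the sample data are illustrative only.

```python
import io
import yaml

data = {"name": "demo", "ports": [80, 443]}

# Representer: Python objects -> node tree.
dumper = yaml.SafeDumper(io.StringIO())
node = dumper.represent_data(data)

# Serializer + Emitter: node tree -> events -> YAML text.
text_from_node = yaml.serialize(node, Dumper=yaml.SafeDumper)

# Emitter alone: an event stream (here re-parsed from the text) -> YAML text.
events = yaml.parse(text_from_node, Loader=yaml.SafeLoader)
text_from_events = yaml.emit(events, Dumper=yaml.SafeDumper)

print(text_from_node)
print(text_from_events)
```

In everyday code `yaml.dump` runs all of these stages in one call; the step-by-step form is mainly useful for debugging custom representers.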

### Inheritance Hierarchy

```python
# Example of how loaders combine components
# (simplified: the real class also defines __init__ to initialize each component)
class SafeLoader(
    Reader,           # Input handling
    Scanner,          # Tokenization
    Parser,           # Parsing
    Composer,         # Tree composition
    SafeConstructor,  # Safe object construction
    Resolver          # Tag resolution
):
    pass
```

This modular design allows for:
- Easy customization by inheriting and overriding specific components
- Mix-and-match functionality from different security levels
- Adding custom constructors and representers
- Fine-grained control over processing pipeline
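
As one possible illustration of mixing components, the sketch below pairs `SafeConstructor` with `BaseResolver`, so untagged scalars stay strings while explicitly tagged values are still constructed safely. The class name `StringScalarSafeLoader` is hypothetical, and the explicit `__init__` calls mirror how the built-in loaders are assumed to initialize their components.

```python
import yaml
from yaml.reader import Reader
from yaml.scanner import Scanner
from yaml.parser import Parser
from yaml.composer import Composer
from yaml.constructor import SafeConstructor
from yaml.resolver import BaseResolver


class StringScalarSafeLoader(Reader, Scanner, Parser, Composer,
                             SafeConstructor, BaseResolver):
    """Hypothetical loader: safe construction, no implicit type detection."""

    def __init__(self, stream):
        # Each component keeps its own state and must be initialized explicitly.
        Reader.__init__(self, stream)
        Scanner.__init__(self)
        Parser.__init__(self)
        Composer.__init__(self)
        SafeConstructor.__init__(self)
        BaseResolver.__init__(self)


text = "count: 10\nenabled: yes\nexplicit: !!int 5\n"
result = yaml.load(text, Loader=StringScalarSafeLoader)
print(result)  # {'count': '10', 'enabled': 'yes', 'explicit': 5}
```

For most customizations, subclassing an existing loader and calling `add_constructor` is simpler; assembling components by hand is only worth it when a whole stage needs to be swapped.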

## Best Practices

### Security Guidelines

1. **Default to SafeLoader** for any external input
2. **Use FullLoader** for internal configuration with known structure
3. **Only use Loader/UnsafeLoader** with completely trusted input
4. **Never use unsafe loaders** with user-provided data
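
To see why the first guideline matters, the sketch below feeds a document that tries to call `os.system` to `yaml.safe_load`, the shortcut for loading with `SafeLoader`. The payload string is a harmless illustration; with a safe loader the tag is rejected before anything runs.

```python
import yaml

# A document that would run a shell command if loaded with an unsafe loader.
malicious = '!!python/object/apply:os.system ["echo pwned"]'

try:
    yaml.safe_load(malicious)
    rejected = False
except yaml.constructor.ConstructorError as exc:
    rejected = True
    print(f"Rejected: {exc.problem}")
```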

### Performance Guidelines

1. **Use C extensions** when available for large documents
2. **Choose appropriate loader** - don't use more features than needed
3. **Stream processing** for very large documents
4. **Use `load_all` for multi-document streams** - a loader instance is bound to a single input stream, so it cannot be reused across separate inputs
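
A minimal sketch of stream processing is shown below: `yaml.safe_load_all` returns a generator, so documents are parsed one at a time as the loop advances. The in-memory `StringIO` stands in for an open file, and the document contents are invented.

```python
import io
import yaml

# Stand-in for a large file opened with open(path); any readable stream works.
stream = io.StringIO("---\nid: 1\n---\nid: 2\n---\nid: 3\n")

ids = []
for document in yaml.safe_load_all(stream):
    # Each document is constructed only when the generator reaches it.
    ids.append(document["id"])

print(ids)  # [1, 2, 3]
```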
318
319
### Compatibility Guidelines
320
321
1. **Use SafeDumper output** for maximum compatibility
322
2. **Avoid Python-specific tags** in exchanged data
323
3. **Test with different parsers** if targeting non-Python consumers
324
4. **Document loader requirements** when distributing YAML files
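
One way to enforce the first two guidelines is to check data before it is written. The helper `is_portable` and the `Point` class below are hypothetical names for this sketch; it relies on `yaml.safe_dump` raising `RepresenterError` for objects it cannot express with standard tags.

```python
import yaml


def is_portable(data):
    """Return True if data can be dumped using only standard YAML tags."""
    try:
        yaml.safe_dump(data)
    except yaml.representer.RepresenterError:
        return False
    return True


class Point:
    """Illustrative custom class with no registered representer."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


plain_ok = is_portable({"x": 1, "y": [2, 3]})
custom_ok = is_portable({"p": Point(1, 2)})
print(plain_ok, custom_ok)  # True False
```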