# Customization and Extension

Advanced customization capabilities for extending YAML processing with custom constructors, representers, and resolvers. Tailor PyYAML behavior to handle custom data types and implement domain-specific YAML formats.

## Capabilities

### Constructor Management

Add custom constructors to handle specific YAML tags and convert them to Python objects during loading.

```python { .api }
def add_constructor(tag, constructor, Loader=None):
    """
    Add a constructor for the given tag.

    Args:
        tag (str): YAML tag to handle (e.g., '!custom', 'tag:example.com,2000:app/custom')
        constructor (Callable): Function that accepts (loader, node) and returns a Python object
        Loader (type, optional): Specific loader class to add to. If None, the constructor is
            registered on Loader, FullLoader, and UnsafeLoader, but not on SafeLoader.

    Constructor Function Signature:
        def constructor(loader: BaseLoader, node: Node) -> Any
    """

def add_multi_constructor(tag_prefix, multi_constructor, Loader=None):
    """
    Add a multi-constructor for the given tag prefix.

    The multi-constructor is called for any tag that starts with the specified prefix.

    Args:
        tag_prefix (str): Tag prefix to match (e.g., '!custom:', 'tag:example.com,2000:app/')
        multi_constructor (Callable): Function that accepts (loader, tag_suffix, node)
        Loader (type, optional): Specific loader class to add to (same default as add_constructor)

    Multi-Constructor Function Signature:
        def multi_constructor(loader: BaseLoader, tag_suffix: str, node: Node) -> Any
    """
```
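
When `Loader` is omitted, the module-level `add_constructor` does not register the tag on `SafeLoader`, so a tag intended for safe loading has to be registered on a safe loader explicitly. The following is a minimal sketch of that, assuming a hypothetical `!point` tag, `Point` type, and `PointLoader` subclass that are not part of PyYAML:

```python
import yaml
from collections import namedtuple

# Hypothetical value type used only for this illustration.
Point = namedtuple("Point", ["x", "y"])

class PointLoader(yaml.SafeLoader):
    """SafeLoader subclass that carries the custom tag."""

def point_constructor(loader, node):
    # '!point [3, 4]' arrives as a sequence node.
    x, y = loader.construct_sequence(node)
    return Point(x, y)

# Passing Loader explicitly registers the tag on this class only.
yaml.add_constructor("!point", point_constructor, Loader=PointLoader)

data = yaml.load("origin: !point [0, 0]\ncorner: !point [3, 4]\n", Loader=PointLoader)
print(data["corner"])  # Point(x=3, y=4)
```

Registering on a subclass rather than on `yaml.SafeLoader` itself keeps the tag out of code that calls `yaml.safe_load` elsewhere in the process.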

### Representer Management

Add custom representers to control how Python objects are converted to YAML during dumping.

```python { .api }
def add_representer(data_type, representer, Dumper=Dumper):
    """
    Add a representer for the given type.

    Args:
        data_type (type): Python type to represent (exact type match only)
        representer (Callable): Function that accepts (dumper, data) and returns a Node
        Dumper (type, optional): Dumper class to add to (default: Dumper)

    Representer Function Signature:
        def representer(dumper: BaseDumper, data: Any) -> Node
    """

def add_multi_representer(data_type, multi_representer, Dumper=Dumper):
    """
    Add a representer for the given type and all of its subclasses.

    Args:
        data_type (type): Base Python type to represent
        multi_representer (Callable): Function that accepts (dumper, data) and returns a Node
        Dumper (type, optional): Dumper class to add to (default: Dumper)

    Multi-Representer Function Signature:
        def multi_representer(dumper: BaseDumper, data: Any) -> Node
    """
```
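
The difference between the two functions is easiest to see with a small class hierarchy. This sketch registers one multi-representer for a base class and lets it cover the subclasses; the `Shape`, `Circle`, `Square`, and `ShapeDumper` names and the `!shape:` tag prefix are hypothetical and chosen only for illustration:

```python
import yaml

# Hypothetical class hierarchy used only for this illustration.
class Shape:
    def __init__(self, **attrs):
        self.attrs = attrs

class Circle(Shape):
    pass

class Square(Shape):
    pass

class ShapeDumper(yaml.SafeDumper):
    """SafeDumper subclass that carries the custom representer."""

def shape_representer(dumper, data):
    # Derive the tag from the concrete class, e.g. '!shape:circle'.
    tag = f"!shape:{type(data).__name__.lower()}"
    return dumper.represent_mapping(tag, data.attrs)

# One registration covers Shape and every subclass of it.
yaml.add_multi_representer(Shape, shape_representer, Dumper=ShapeDumper)

text = yaml.dump([Circle(radius=2), Square(side=3)], Dumper=ShapeDumper)
print(text)
```

With `add_representer(Shape, ...)` instead, only objects whose type is exactly `Shape` would be handled, and the `Circle` and `Square` instances would be rejected by a safe dumper.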

### Resolver Management

Add custom resolvers to automatically detect and tag scalar values based on patterns.

```python { .api }
def add_implicit_resolver(tag, regexp, first=None, Loader=None, Dumper=Dumper):
    """
    Add an implicit scalar detector.

    If a plain (unquoted) scalar value matches the given regexp, the corresponding tag is assigned.

    Args:
        tag (str): YAML tag to assign when pattern matches
        regexp (re.Pattern): Compiled regular expression to match scalar values
        first (Sequence[str], optional): Possible first characters of a matching scalar,
            used to skip the regexp for scalars that cannot match
        Loader (type, optional): Loader class to add to
        Dumper (type, optional): Dumper class to add to
    """

def add_path_resolver(tag, path, kind=None, Loader=None, Dumper=Dumper):
    """
    Add a path-based resolver for the given tag.

    A path is a list of keys that forms a path to a node in the representation tree.

    Args:
        tag (str): YAML tag to assign when path matches
        path (list): List of keys forming path to node (strings, integers, or None)
        kind (type, optional): Node type to match (ScalarNode, SequenceNode, MappingNode)
        Loader (type, optional): Loader class to add to
        Dumper (type, optional): Dumper class to add to
    """
```
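
Resolvers can also be attached to a single loader class through the `add_implicit_resolver` class method, which avoids touching the default dumper. The sketch below assumes a hypothetical `!semver` tag and `VersionLoader` subclass; it shows a compiled pattern, a `first` list limited to digits, and the fact that quoted scalars are not matched:

```python
import re
import yaml

class VersionLoader(yaml.SafeLoader):
    """Hypothetical loader subclass that carries the extra resolver."""

# Compile once; `first` restricts matching to scalars that start with a digit.
SEMVER = re.compile(r"^\d+\.\d+\.\d+$")
VersionLoader.add_implicit_resolver("!semver", SEMVER, first=list("0123456789"))

def semver_constructor(loader, node):
    # Turn '1.2.3' into the tuple (1, 2, 3).
    return tuple(int(part) for part in loader.construct_scalar(node).split("."))

VersionLoader.add_constructor("!semver", semver_constructor)

document = "version: 1.2.3\nname: demo\nquoted: '1.2.3'\n"
data = yaml.load(document, Loader=VersionLoader)
print(data)  # {'version': (1, 2, 3), 'name': 'demo', 'quoted': '1.2.3'}
```

The quoted value stays a string because implicit resolution applies only to plain scalars.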

## Usage Examples

### Custom Data Types

```python
import yaml
from decimal import Decimal

# Custom constructor for Decimal type
def decimal_constructor(loader, node):
    """Convert YAML scalar to Decimal."""
    value = loader.construct_scalar(node)
    return Decimal(value)

# Custom representer for Decimal type
def decimal_representer(dumper, data):
    """Convert Decimal to YAML scalar."""
    return dumper.represent_scalar('!decimal', str(data))

# Register custom handlers
yaml.add_constructor('!decimal', decimal_constructor)
yaml.add_representer(Decimal, decimal_representer)

# Usage
yaml_content = """
price: !decimal 19.99
tax_rate: !decimal 0.08
"""

data = yaml.load(yaml_content, yaml.Loader)
print(f"Price: {data['price']} ({type(data['price'])})")  # Price: 19.99 (<class 'decimal.Decimal'>)

# Dump back to YAML
output_data = {'total': Decimal('27.50'), 'discount': Decimal('5.00')}
yaml_output = yaml.dump(output_data)
print(yaml_output)
# The values are quoted because the plain form would resolve to a float, not !decimal:
# discount: !decimal '5.00'
# total: !decimal '27.50'
```

### Multi-Constructor Example

```python
import yaml

def env_constructor(loader, tag_suffix, node):
    """Convert a scalar to the type named by the tag suffix (str, int, bool, list)."""
    # Note: this example only converts the scalar; it does not read os.environ.
    value = loader.construct_scalar(node)

    if tag_suffix == 'str':
        return str(value)
    elif tag_suffix == 'int':
        return int(value)
    elif tag_suffix == 'bool':
        return value.lower() in ('true', '1', 'yes', 'on')
    elif tag_suffix == 'list':
        return value.split(',')
    else:
        # Unknown suffix: return the raw string unchanged
        return value

# Register multi-constructor for !env: prefix
yaml.add_multi_constructor('!env:', env_constructor)

yaml_content = """
database_host: !env:str localhost
database_port: !env:int 5432
debug_mode: !env:bool true
allowed_hosts: !env:list host1,host2,host3
"""

data = yaml.load(yaml_content, yaml.Loader)
print(f"Port: {data['database_port']} ({type(data['database_port'])})")  # int
print(f"Debug: {data['debug_mode']} ({type(data['debug_mode'])})")  # bool
print(f"Hosts: {data['allowed_hosts']}")  # ['host1', 'host2', 'host3']
```

### Implicit Resolvers

```python
import yaml
import re

# Add resolver for email addresses
email_pattern = re.compile(r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$')
# `first` lists the characters a matching scalar may start with. Only lowercase
# letters are listed here, so addresses starting with other characters are not detected.
yaml.add_implicit_resolver('!email', email_pattern, first=list('abcdefghijklmnopqrstuvwxyz'))

# Constructor for email addresses
def email_constructor(loader, node):
    value = loader.construct_scalar(node)
    return {'email': value, 'domain': value.split('@')[1]}

yaml.add_constructor('!email', email_constructor)

yaml_content = """
admin: admin@example.com
support: support@company.org
"""

data = yaml.load(yaml_content, yaml.Loader)
print(f"Admin: {data['admin']}")  # Admin: {'email': 'admin@example.com', 'domain': 'example.com'}
```

### Path Resolvers

```python
import yaml
from yaml.nodes import ScalarNode

# Add path resolver for scalar values directly under the top-level 'config' key.
# Note: PyYAML's source marks add_path_resolver as experimental.
yaml.add_path_resolver('!config', ['config', None], ScalarNode)

def config_constructor(loader, node):
    """Special handling for config values."""
    value = loader.construct_scalar(node)
    return f"CONFIG:{value}"

yaml.add_constructor('!config', config_constructor)

yaml_content = """
config:
  database_url: postgresql://localhost/myapp
  api_key: secret123
  timeout: 30
"""

data = yaml.load(yaml_content, yaml.Loader)
print(data['config']['database_url'])  # CONFIG:postgresql://localhost/myapp
# Plain scalars that match an implicit resolver keep that tag, so timeout stays an int.
print(data['config']['timeout'])  # 30
```

### Custom Loader and Dumper Classes

```python
import yaml
from datetime import datetime
import json

class ApplicationLoader(yaml.SafeLoader):
    """Custom loader for application-specific YAML."""
    pass

class ApplicationDumper(yaml.SafeDumper):
    """Custom dumper for application-specific YAML."""
    pass

# JSON constructor
def json_constructor(loader, node):
    """Parse JSON embedded in a YAML scalar."""
    value = loader.construct_scalar(node)
    return json.loads(value)

# JSON representer
def json_representer(dumper, data):
    """Represent dict as embedded JSON."""
    return dumper.represent_scalar('!json', json.dumps(data))

# Register with custom classes
ApplicationLoader.add_constructor('!json', json_constructor)
# Every dict dumped with ApplicationDumper becomes a !json scalar
ApplicationDumper.add_representer(dict, json_representer)

# Timestamp constructor
def timestamp_constructor(loader, node):
    value = loader.construct_scalar(node)
    return datetime.fromisoformat(value)

ApplicationLoader.add_constructor('!timestamp', timestamp_constructor)

# The JSON text is quoted so that YAML reads it as a scalar, not as a flow mapping
yaml_content = """
metadata: !json '{"version": "1.0", "author": "Developer"}'
created: !timestamp 2023-01-01T10:00:00
"""

data = yaml.load(yaml_content, ApplicationLoader)
print(f"Metadata: {data['metadata']}")  # Metadata: {'version': '1.0', 'author': 'Developer'}
print(f"Created: {data['created']}")  # Created: 2023-01-01 10:00:00

# Dumping with the custom dumper emits the dict as a single !json-tagged scalar
print(yaml.dump({'version': '1.0'}, Dumper=ApplicationDumper))
```

## Advanced Customization Patterns

### YAMLObject Base Class

Create self-serializing objects using the YAMLObject base class:

```python
import yaml

class Person(yaml.YAMLObject):
    yaml_tag = '!Person'
    yaml_loader = yaml.Loader
    yaml_dumper = yaml.Dumper

    def __init__(self, name, age, email):
        self.name = name
        self.age = age
        self.email = email

    def __repr__(self):
        return f"Person(name={self.name!r}, age={self.age!r}, email={self.email!r})"

# Usage - automatic registration
# Note: loading sets the attributes directly; __init__ is not called.
yaml_content = """
person: !Person
  name: John Doe
  age: 30
  email: john@example.com
"""

data = yaml.load(yaml_content, yaml.Loader)
print(data['person'])  # Person(name='John Doe', age=30, email='john@example.com')

# Automatic dumping
person = Person("Jane Smith", 25, "jane@example.com")
yaml_output = yaml.dump({'employee': person})
print(yaml_output)
```

### State-Aware Constructors

```python
import yaml

class DatabaseConfig:
    def __init__(self, host, port, database):
        self.host = host
        self.port = port
        self.database = database
        self.connection_string = f"postgresql://{host}:{port}/{database}"

def database_constructor(loader, node):
    """Constructor that validates required fields before building the object."""
    # Get the mapping as a dictionary
    config = loader.construct_mapping(node, deep=True)

    # Validate required fields
    required = ['host', 'port', 'database']
    missing = [field for field in required if field not in config]
    if missing:
        # ConstructorError lives in the yaml.constructor module
        raise yaml.constructor.ConstructorError(
            None, None,
            f"Missing required fields: {missing}",
            node.start_mark
        )

    return DatabaseConfig(
        host=config['host'],
        port=config['port'],
        database=config['database']
    )

yaml.add_constructor('!database', database_constructor)

yaml_content = """
prod_db: !database
  host: prod.example.com
  port: 5432
  database: production
"""

data = yaml.load(yaml_content, yaml.Loader)
print(data['prod_db'].connection_string)  # postgresql://prod.example.com:5432/production
```

### Dynamic Tag Generation

```python
import yaml

class VersionedData:
    def __init__(self, version, data):
        self.version = version
        self.data = data

def versioned_multi_constructor(loader, tag_suffix, node):
    """Handle versioned data tags like !v1.0, !v2.0, etc."""
    # The suffix is everything after the '!v' prefix, e.g. '1.2'
    version = tag_suffix
    data = loader.construct_mapping(node, deep=True)
    return VersionedData(version, data)

def versioned_representer(dumper, data):
    """Represent versioned data with appropriate tag."""
    tag = f'!v{data.version}'
    return dumper.represent_mapping(tag, data.data)

# Note: the '!v' prefix matches any tag that starts with '!v', not only version numbers
yaml.add_multi_constructor('!v', versioned_multi_constructor)
yaml.add_representer(VersionedData, versioned_representer)

yaml_content = """
config: !v1.2
  api_endpoint: /api/v1
  features: [auth, logging]
"""

data = yaml.load(yaml_content, yaml.Loader)
print(f"Version: {data['config'].version}")  # Version: 1.2
print(f"Features: {data['config'].data['features']}")  # Features: ['auth', 'logging']
```

## Best Practices

### Security Considerations

1. **Validate input** in custom constructors
2. **Use SafeLoader as base** for custom loaders when possible
3. **Avoid dangerous operations** in constructors (file I/O, subprocess, etc.)
4. **Sanitize tag names** received by multi-constructors (for example, check suffixes against an allowlist) to prevent injection attacks
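
These points can be combined in a single constructor. The sketch below is one possible shape, not a prescribed pattern: it builds on `SafeLoader`, accepts only allowlisted tag suffixes, and validates the scalar before converting it. The `!duration:` prefix, `SafeAppLoader`, and `ALLOWED_UNITS` names are hypothetical.

```python
import yaml

class SafeAppLoader(yaml.SafeLoader):
    """Hypothetical application loader built on SafeLoader."""

# Only these suffixes are accepted; each maps to a multiplier in seconds.
ALLOWED_UNITS = {"seconds": 1, "minutes": 60, "hours": 3600}

def duration_constructor(loader, tag_suffix, node):
    # Reject unknown suffixes instead of acting on names chosen by the input.
    if tag_suffix not in ALLOWED_UNITS:
        raise yaml.constructor.ConstructorError(
            None, None, f"unsupported duration unit: {tag_suffix!r}", node.start_mark
        )
    raw = loader.construct_scalar(node)
    # Validate the value before converting it.
    if not raw.isdigit():
        raise yaml.constructor.ConstructorError(
            None, None, f"expected a non-negative integer, got {raw!r}", node.start_mark
        )
    return int(raw) * ALLOWED_UNITS[tag_suffix]

SafeAppLoader.add_multi_constructor("!duration:", duration_constructor)

config = yaml.load("timeout: !duration:minutes 5\n", Loader=SafeAppLoader)
print(config)  # {'timeout': 300}
```

A document that uses an unlisted suffix such as `!duration:eons` fails with a `ConstructorError` that points at the offending node.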

### Performance Tips

1. **Use the `first` parameter** in implicit resolvers so the regex is tried only for scalars that start with a listed character
2. **Cache compiled regexes** at module level instead of recompiling them inside constructors
3. **Minimize object creation** in frequently-used constructors
4. **Prefer multi-constructors** over many individual constructors when a family of tags shares a prefix and handling logic

### Maintainability

1. **Document custom tags** and their expected format
2. **Provide validation** in constructors with clear error messages
3. **Use descriptive tag names** that indicate purpose
4. **Group related customizations** in custom loader/dumper classes

### Compatibility

1. **Test with different PyYAML versions** when using advanced features
2. **Provide fallbacks** for missing custom tags
3. **Document dependencies** when sharing customized YAML files
4. **Consider standard YAML tags** before creating custom ones
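
One way to provide a fallback for missing custom tags is a catch-all multi-constructor on a loader subclass. This is a sketch under the assumption that degrading unknown local tags to plain data is acceptable for the application; `TolerantLoader` and `fallback_constructor` are hypothetical names.

```python
import yaml
from yaml.nodes import MappingNode, SequenceNode

class TolerantLoader(yaml.SafeLoader):
    """Hypothetical loader that degrades unknown local tags to plain data."""

def fallback_constructor(loader, tag_suffix, node):
    # Build the untagged equivalent of whatever node carries the unknown tag.
    if isinstance(node, MappingNode):
        return loader.construct_mapping(node, deep=True)
    if isinstance(node, SequenceNode):
        return loader.construct_sequence(node, deep=True)
    return loader.construct_scalar(node)

# The '!' prefix matches every local tag that has no exact constructor.
TolerantLoader.add_multi_constructor("!", fallback_constructor)

document = "price: !decimal 19.99\nowner: !Person {name: Ada}\n"
data = yaml.load(document, Loader=TolerantLoader)
print(data)  # {'price': '19.99', 'owner': {'name': 'Ada'}}
```

Tagged scalars come back as strings because the fallback does not know the intended type; constructors registered for an exact tag still take precedence over it.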