
tessl/pypi-pyyaml

YAML parser and emitter for Python with complete YAML 1.1 support, Unicode handling, and optional LibYAML bindings for high performance


Loaders and Dumpers

PyYAML provides a set of loader and dumper classes with different security levels and performance characteristics. Choose a loader or dumper based on how much you trust the input and how much speed you need.

Capabilities

Loader Classes

Different loader classes provide varying levels of security and functionality when parsing YAML content.

class BaseLoader(Reader, Scanner, Parser, Composer, BaseConstructor, BaseResolver):
    """
    Base loader with minimal functionality.
    
    Parses YAML without implicit type resolution.
    Every scalar is returned as str; only str, list, and dict are constructed.
    """

class SafeLoader(Reader, Scanner, Parser, Composer, SafeConstructor, Resolver):
    """
    Safe loader for untrusted input.
    
    Constructs only standard YAML types (str, int, float, bool, None, list, dict, set, dates and timestamps).
    Cannot execute arbitrary Python code or access dangerous functionality.
    Recommended for processing YAML from untrusted sources.
    """

class FullLoader(Reader, Scanner, Parser, Composer, FullConstructor, Resolver):
    """
    Full loader with security restrictions.
    
    Constructs most YAML types but prevents known dangerous operations.
    Good balance between functionality and security.
    Recommended for most use cases with trusted or semi-trusted input.
    """

class Loader(Reader, Scanner, Parser, Composer, Constructor, Resolver):
    """
    Full-featured loader without security restrictions.
    
    Can construct arbitrary Python objects and execute Python code.
    Provides complete YAML functionality but is unsafe for untrusted input.
    Identical to UnsafeLoader.
    """

class UnsafeLoader(Reader, Scanner, Parser, Composer, Constructor, Resolver):
    """
    Explicitly unsafe loader.
    
    Identical to Loader but with a name that clearly indicates the security risk.
    Can execute arbitrary Python code during loading.
    Only use with completely trusted input.
    """

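As a rough illustration of how the loader choice changes the result, the sketch below parses the same document with BaseLoader and SafeLoader. The document contents and variable names are made up for this example; it assumes only that PyYAML is installed.

```python
import datetime
import yaml

doc = """
count: 3
enabled: true
released: 2023-01-15
"""

# BaseLoader does no implicit type resolution: every scalar stays a string.
base = yaml.load(doc, Loader=yaml.BaseLoader)

# SafeLoader resolves the standard YAML types (int, bool, date, ...).
safe = yaml.load(doc, Loader=yaml.SafeLoader)

print(base)
print(safe)
```

BaseLoader can be handy when you want to do your own validation and conversion of the raw strings.
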
C Extension Loaders

High-performance C-based loaders, available when PyYAML was built with the LibYAML bindings:

class CBaseLoader:
    """C-based BaseLoader implementation."""

class CSafeLoader:
    """C-based SafeLoader implementation."""

class CFullLoader:
    """C-based FullLoader implementation."""

class CLoader:
    """C-based Loader implementation."""

class CUnsafeLoader:
    """C-based UnsafeLoader implementation."""

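Because the C classes exist only when the bindings are present, a common pattern is to import them with a fallback. The alias `PreferredSafeLoader` below is a name chosen for this sketch, not part of PyYAML.

```python
import yaml

# Prefer the LibYAML-backed class; fall back to pure Python if the
# bindings were not compiled in.
try:
    from yaml import CSafeLoader as PreferredSafeLoader
except ImportError:
    from yaml import SafeLoader as PreferredSafeLoader

config = yaml.load("retries: 3\nhosts: [a, b]", Loader=PreferredSafeLoader)
print(PreferredSafeLoader.__name__, config)
```

The rest of the program can then use the alias without caring which implementation was picked.
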
Dumper Classes

Different dumper classes provide varying levels of functionality and output compatibility.

class BaseDumper(Emitter, Serializer, BaseRepresenter, BaseResolver):
    """
    Base dumper with minimal functionality.
    
    Ships without built-in representers for Python types.
    Intended as a base class: register representers with add_representer() to control the output.
    """

class SafeDumper(Emitter, Serializer, SafeRepresenter, Resolver):
    """
    Safe dumper producing basic YAML output.
    
    Represents only basic Python types and standard scalars.
    Output uses only standard YAML tags, so non-Python YAML parsers can read it.
    Recommended for configuration files and data exchange.
    """

class Dumper(Emitter, Serializer, Representer, Resolver):
    """
    Full-featured dumper with Python object support.
    
    Can represent arbitrary Python objects using Python-specific YAML tags.
    Output may not be readable by non-Python YAML parsers.
    Use when preserving exact Python object types is important.
    """

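The difference between Dumper and SafeDumper shows up as soon as a value has no standard YAML equivalent. The sketch below, using an invented `point` dictionary, dumps a tuple with both classes.

```python
import yaml

point = {'coords': (1, 2)}

# Dumper keeps the tuple type by emitting a Python-specific tag.
full_text = yaml.dump(point, Dumper=yaml.Dumper)

# SafeDumper writes the tuple as a plain YAML sequence.
safe_text = yaml.dump(point, Dumper=yaml.SafeDumper)

print(full_text)
print(safe_text)

# The SafeDumper output loads with SafeLoader, but the tuple comes back as a list.
restored = yaml.load(safe_text, Loader=yaml.SafeLoader)
print(restored)
```

The trade-off is type fidelity (Dumper) versus portability (SafeDumper).
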
C Extension Dumpers

High-performance C-based dumpers, available when PyYAML was built with the LibYAML bindings:

class CBaseDumper:
    """C-based BaseDumper implementation."""

class CSafeDumper:
    """C-based SafeDumper implementation."""

class CDumper:
    """C-based Dumper implementation."""

Usage Examples

Choosing the Right Loader

import yaml

yaml_content = """
name: John Doe
birth_date: 1990-01-15
scores: [85, 92, 78]
metadata:
  created: 2023-01-01T10:00:00Z
  tags: !!python/list [tag1, tag2]
"""

# SafeLoader - standard YAML types only; Python-specific tags are rejected
try:
    data_safe = yaml.load(yaml_content, yaml.SafeLoader)
    print(f"birth_date type: {type(data_safe['birth_date'])}")  # not reached for this input
except yaml.constructor.ConstructorError as e:
    # Raised because !!python/list has no constructor in SafeLoader
    print(f"SafeLoader error: {e}")

# FullLoader - more types but still restricted
data_full = yaml.load(yaml_content, yaml.FullLoader)
print(f"birth_date type: {type(data_full['birth_date'])}")  # datetime.date
print(f"created type: {type(data_full['metadata']['created'])}")  # datetime.datetime

# UnsafeLoader - can handle Python-specific tags (dangerous!)
data_unsafe = yaml.load(yaml_content, yaml.UnsafeLoader)
print(f"tags type: {type(data_unsafe['metadata']['tags'])}")  # list

Performance with C Extensions

import yaml
import time

large_data = {'items': [{'id': i, 'value': f'item_{i}'} for i in range(10000)]}

# Check if C extensions are available
if yaml.__with_libyaml__:
    print("LibYAML C extensions available")
    
    # Benchmark Python vs C dumping (perf_counter is a monotonic, high-resolution timer)
    start = time.perf_counter()
    yaml_py = yaml.dump(large_data, Dumper=yaml.Dumper)
    py_time = time.perf_counter() - start

    start = time.perf_counter()
    yaml_c = yaml.dump(large_data, Dumper=yaml.CDumper)
    c_time = time.perf_counter() - start

    print(f"Python dumper: {py_time:.3f}s")
    print(f"C dumper: {c_time:.3f}s")
    print(f"Speedup: {py_time/c_time:.1f}x")

    # Benchmark loading
    start = time.perf_counter()
    data_py = yaml.load(yaml_c, Loader=yaml.Loader)
    py_load_time = time.perf_counter() - start

    start = time.perf_counter()
    data_c = yaml.load(yaml_c, Loader=yaml.CLoader)
    c_load_time = time.perf_counter() - start

    print(f"Python loader: {py_load_time:.3f}s")
    print(f"C loader: {c_load_time:.3f}s")
    print(f"Load speedup: {py_load_time/c_load_time:.1f}x")
else:
    print("LibYAML C extensions not available")

Creating Custom Loaders and Dumpers

import yaml
from datetime import datetime

# Custom loader with additional constructor
class CustomLoader(yaml.SafeLoader):
    pass

def timestamp_constructor(loader, node):
    """Custom constructor for timestamp format."""
    value = loader.construct_scalar(node)
    return datetime.fromisoformat(value.replace('Z', '+00:00'))

# Register custom constructor
CustomLoader.add_constructor('!timestamp', timestamp_constructor)

# Custom dumper with additional representer
class CustomDumper(yaml.SafeDumper):
    pass

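def timestamp_representer(dumper, data):
    """Custom representer for datetime objects."""
    # isoformat() already includes the offset for timezone-aware values,
    # so swap '+00:00' for 'Z' rather than appending a second suffix.
    return dumper.represent_scalar('!timestamp', data.isoformat().replace('+00:00', 'Z'))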

# Register custom representer
CustomDumper.add_representer(datetime, timestamp_representer)

# Usage
yaml_with_custom = """
created: !timestamp 2023-01-01T10:00:00Z
updated: !timestamp 2023-12-15T14:30:00Z
"""

data = yaml.load(yaml_with_custom, CustomLoader)
print(f"Created: {data['created']} ({type(data['created'])})")

# Dump back with custom format
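# Dumper must be passed by keyword: the second positional argument of yaml.dump is the output stream
output = yaml.dump(data, Dumper=CustomDumper)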
print(output)

Security Comparison

| Loader | Security | Features | Use Cases |
| --- | --- | --- | --- |
| SafeLoader | Highest | Basic types only | Untrusted input, config files |
| FullLoader | High | Most types, restricted | Semi-trusted input, data exchange |
| Loader/UnsafeLoader | None | All features | Trusted input, object persistence |

Type Support by Loader

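| Python Type | SafeLoader | FullLoader | Loader/UnsafeLoader |
| --- | --- | --- | --- |
| str, int, float, bool, None | Yes | Yes | Yes |
| list, dict | Yes | Yes | Yes |
| datetime.date | Yes | Yes | Yes |
| datetime.datetime | Yes | Yes | Yes |
| set, tuple | set only (via !!set) | Yes | Yes |
| Arbitrary Python objects | No | No (PyYAML 5.4 and later) | Yes |
| Function calls | No | No (PyYAML 5.4 and later) | Yes |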

Component Architecture

Loaders and dumpers are composed of multiple processing components:

Loader Components

  • Reader: Input stream handling and encoding detection
  • Scanner: Tokenization (character stream → tokens)
  • Parser: Syntax analysis (tokens → events)
  • Composer: Tree building (events → representation nodes)
  • Constructor: Object construction (nodes → Python objects)
  • Resolver: Tag resolution and type detection
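
The loader stages listed above can be observed one at a time through PyYAML's lower-level functions `yaml.scan`, `yaml.parse`, and `yaml.compose`. The one-line input is invented for this sketch.

```python
import yaml

text = "port: 8080"

# Scanner: characters -> tokens
tokens = [type(t).__name__ for t in yaml.scan(text, Loader=yaml.SafeLoader)]

# Parser: tokens -> events
events = [type(e).__name__ for e in yaml.parse(text, Loader=yaml.SafeLoader)]

# Composer + Resolver: events -> node tree with resolved tags
root = yaml.compose(text, Loader=yaml.SafeLoader)
key_node, value_node = root.value[0]

# Constructor: nodes -> Python objects
data = yaml.load(text, Loader=yaml.SafeLoader)

print(tokens)
print(events)
print(root.tag, key_node.tag, value_node.tag)
print(data)
```

Note that the Resolver has already tagged `8080` as an integer at the node stage; the Constructor only turns that tagged node into a Python `int`.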

Dumper Components

  • Representer: Object representation (Python objects → nodes)
  • Serializer: Tree serialization (nodes → events)
  • Emitter: Text generation (events → YAML text)
  • Resolver: Tag resolution for output
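
The dumper stages can be exercised separately in a similar way. This sketch instantiates `SafeRepresenter` directly to obtain a node tree, then hands it to `yaml.serialize`; it also feeds parser events straight to `yaml.emit`. Using the representer standalone like this is a demonstration device, not the usual way to dump data.

```python
import yaml
from yaml.representer import SafeRepresenter

obj = {'port': 8080}

# Representer: Python object -> node tree
node = SafeRepresenter().represent_data(obj)

# Serializer + Emitter: nodes -> events -> YAML text
text_from_node = yaml.serialize(node, Dumper=yaml.SafeDumper)

# Emitter alone: events -> YAML text (events borrowed from the parser here)
events = yaml.parse("port: 8080", Loader=yaml.SafeLoader)
text_from_events = yaml.emit(events, Dumper=yaml.SafeDumper)

print(type(node).__name__)
print(text_from_node)
print(text_from_events)
```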

Inheritance Hierarchy

# Example of how loaders combine components
class SafeLoader(
    Reader,          # Input handling
    Scanner,         # Tokenization  
    Parser,          # Parsing
    Composer,        # Tree composition
    SafeConstructor, # Safe object construction
    Resolver         # Tag resolution
):
    pass

This modular design allows for:

  • Easy customization by inheriting and overriding specific components
  • Mix-and-match functionality from different security levels
  • Adding custom constructors and representers
  • Fine-grained control over processing pipeline
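
As one possible mix-and-match, the sketch below pairs SafeConstructor with BaseResolver. Untagged scalars then stay strings, while explicitly tagged values are still converted safely. The class name `StringsUnlessTaggedLoader` is hypothetical; the `__init__` mirrors how PyYAML's own loader classes initialise their components.

```python
import yaml
from yaml.reader import Reader
from yaml.scanner import Scanner
from yaml.parser import Parser
from yaml.composer import Composer
from yaml.constructor import SafeConstructor
from yaml.resolver import BaseResolver


class StringsUnlessTaggedLoader(Reader, Scanner, Parser, Composer,
                                SafeConstructor, BaseResolver):
    """Hypothetical loader: safe construction, no implicit type detection."""

    def __init__(self, stream):
        # Each component is initialised explicitly, one after another.
        Reader.__init__(self, stream)
        Scanner.__init__(self)
        Parser.__init__(self)
        Composer.__init__(self)
        SafeConstructor.__init__(self)
        BaseResolver.__init__(self)


doc = "a: 1\nb: [true, 2023-01-01]\nc: !!int 5\n"
data = yaml.load(doc, Loader=StringsUnlessTaggedLoader)
print(data)
```

Only `c`, which carries an explicit `!!int` tag, becomes an integer; everything else is left as text for the application to interpret.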

Best Practices

Security Guidelines

  1. Default to SafeLoader for any external input
  2. Use FullLoader for internal configuration with known structure
  3. Only use Loader/UnsafeLoader with completely trusted input
  4. Never use unsafe loaders with user-provided data
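
To see why the first guideline matters, the sketch below feeds a harmless stand-in for a malicious payload to `yaml.safe_load`. The payload asks the loader to call `os.getcwd()`; a safe loader refuses to construct it.

```python
import yaml

# Harmless stand-in for an attack: requests a function call during loading.
payload = "!!python/object/apply:os.getcwd []"

try:
    yaml.safe_load(payload)
    rejected = False
except yaml.constructor.ConstructorError as exc:
    rejected = True
    print(f"Rejected: {exc.problem}")
```

With an unsafe loader the same document would run the call, which is why such loaders must never see user-provided data.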

Performance Guidelines

  1. Use C extensions when available for large documents
  2. Choose appropriate loader - don't use more features than needed
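  3. Use stream processing for very large inputs: pass an open file object and iterate with load_all() for multi-document streams
  4. Create a new loader for each stream; a loader instance is bound to one input, though it can read every document in that stream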
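
A minimal sketch of stream processing with `yaml.safe_load_all`, which returns a generator over the documents in a stream. The `io.StringIO` object stands in for a large file; in practice you would pass an open file object.

```python
import io
import yaml

# Stand-in for a large multi-document file.
stream = io.StringIO("id: 1\n---\nid: 2\n---\nid: 3\n")

ids = []
# Documents are constructed one at a time as the generator is consumed.
for doc in yaml.safe_load_all(stream):
    ids.append(doc['id'])

print(ids)
```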

Compatibility Guidelines

  1. Use SafeDumper output for maximum compatibility
  2. Avoid Python-specific tags in exchanged data
  3. Test with different parsers if targeting non-Python consumers
  4. Document loader requirements when distributing YAML files
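
One way to check data before exchanging it is to try dumping it with the safe dumper. The helper `is_portable` and the `Point` class below are hypothetical names for this sketch; it relies on `yaml.safe_dump` raising `RepresenterError` for objects it cannot represent.

```python
import yaml


def is_portable(obj):
    """Return True if obj can be written using only standard YAML tags."""
    try:
        yaml.safe_dump(obj)
        return True
    except yaml.representer.RepresenterError:
        return False


class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y


plain_ok = is_portable({'name': 'demo', 'values': [1, 2.5, None]})
custom_ok = is_portable({'origin': Point(0, 0)})
print(plain_ok, custom_ok)
```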
