
tessl/pypi-pyfury

Blazingly fast multi-language serialization framework powered by JIT and zero-copy

PyFury

PyFury is the Python implementation of Apache Fury, a blazingly fast multi-language serialization framework powered by JIT compilation and zero-copy techniques. PyFury provides high-performance serialization for Python objects with support for cross-language compatibility, reference tracking, and row format operations.

Package Information

  • Package Name: pyfury
  • Package Type: pypi
  • Language: Python
  • Installation: pip install pyfury

Core Imports

import pyfury
from pyfury import Fury, Language

For row format operations:

from pyfury.format import encoder, RowData

Basic Usage

import pyfury
from dataclasses import dataclass

# Placeholder class used throughout this example
@dataclass
class SomeClass:
    name: str = "example"
    value: int = 0

# Create Fury instance
fury = pyfury.Fury(ref_tracking=True)

# Register classes (class registration is required by default)
fury.register_class(SomeClass, type_tag="example.SomeClass")

# Serialize object
obj = SomeClass()
bytes_data = fury.serialize(obj)

# Deserialize object
restored_obj = fury.deserialize(bytes_data)

# Cross-language serialization
xlang_fury = pyfury.Fury(language=pyfury.Language.XLANG, ref_tracking=True)
xlang_fury.register_class(SomeClass, type_tag="example.SomeClass")
xlang_bytes = xlang_fury.serialize(obj)

# Row format encoding for zero-copy operations
@dataclass
class DataObject:
    field1: int
    field2: str

row_encoder = pyfury.encoder(DataObject)
data_obj = DataObject(field1=42, field2="hello")
row_data = row_encoder.to_row(data_obj)

Architecture

PyFury is built around several key components:

  • Fury Engine: Core Python serialization engine with configurable language modes
  • Language Support: Python-native and cross-language (XLANG) serialization modes
  • Reference Tracking: Optional circular reference and shared object support
  • Class Registration: Security-focused type system with allowlists
  • Serializer Framework: Extensible system for custom types and built-in Python types
  • Row Format: Zero-copy binary row format with Apache Arrow integration for columnar data
  • Buffer Management: Efficient binary I/O with memory buffer pooling
  • Meta Strings: Compact encoding for metadata strings such as class and field names
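The reference-tracking idea can be illustrated with a small, self-contained sketch. It is not PyFury's wire format: the `encode`/`decode` functions and the tuple tokens they use are hypothetical. The first time a list is seen it is assigned an id and written in full; any later occurrence is written as a back-reference to that id. Without this bookkeeping, a naive recursive encoder would never terminate on the cycle below, and the per-object lookup it adds is one reason to enable tracking only when shared or circular data is expected.

```python
from typing import Any, Dict, List, Optional


def encode(obj: Any, seen: Optional[Dict[int, int]] = None, out: Optional[list] = None) -> list:
    """Flatten nested lists into tokens, replacing repeated lists with back-references."""
    if seen is None:
        seen = {}
    if out is None:
        out = []
    if not isinstance(obj, list):
        out.append(("value", obj))  # leaf values are written inline
    elif id(obj) in seen:
        out.append(("ref", seen[id(obj)]))  # already written: emit a back-reference
    else:
        seen[id(obj)] = len(seen)  # assign the next reference id before recursing
        out.append(("list", len(obj)))
        for item in obj:
            encode(item, seen, out)
    return out


def decode(tokens: list) -> Any:
    """Rebuild the object graph, resolving back-references to the same list object."""
    refs: List[list] = []
    pos = 0

    def read() -> Any:
        nonlocal pos
        kind, payload = tokens[pos]
        pos += 1
        if kind == "value":
            return payload
        if kind == "ref":
            return refs[payload]
        lst: list = []
        refs.append(lst)  # register before reading children so cycles resolve
        for _ in range(payload):
            lst.append(read())
        return lst

    return read()


shared = [1, 2]
root = [shared, shared]
root.append(root)  # a cycle: root contains itself
rebuilt = decode(encode(root))
print(rebuilt[0] is rebuilt[1])  # True: sharing preserved
print(rebuilt[2] is rebuilt)     # True: cycle preserved
```

Ids are assigned in the same pre-order on both sides, which is why the decoder can resolve a back-reference simply by indexing into `refs`.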

Capabilities

Core Serialization

Converts Python objects to and from a binary format. Reference tracking is optional; enabling it adds support for shared and circular references.

class Fury:
    def __init__(
        self,
        language: Language = Language.XLANG,
        ref_tracking: bool = False,
        require_class_registration: bool = True,
    ): ...
    
    def serialize(
        self,
        obj,
        buffer: Buffer = None,
        buffer_callback=None,
        unsupported_callback=None,
    ) -> Union[Buffer, bytes]: ...
    
    def deserialize(
        self,
        buffer: Union[Buffer, bytes],
        buffers: Iterable = None,
        unsupported_objects: Iterable = None,
    ): ...
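The `buffer_callback` and `buffers` parameters share their names with the out-of-band buffer mechanism of Python's pickle protocol 5 (PEP 574, Python 3.8+). The sketch below uses the standard library `pickle`, not PyFury, to show that pattern; that PyFury's parameters behave the same way is an assumption here. Large buffers are handed to the callback instead of being copied into the serialized stream, and the receiver passes them back in at load time.

```python
import pickle

# A large binary payload that should not be copied into the serialized stream.
payload = bytearray(b"x" * 1_000_000)

out_of_band = []  # receives buffers handed to the callback
# list.append returns None (falsy), which tells pickle to keep the buffer out-of-band.
data = pickle.dumps(
    [pickle.PickleBuffer(payload)],
    protocol=5,
    buffer_callback=out_of_band.append,
)
print(len(data))         # only the small in-band structure
print(len(out_of_band))  # 1

# The receiver passes the buffers back in, in the same order.
restored = pickle.loads(data, buffers=out_of_band)
view = memoryview(restored[0])
print(view.nbytes)       # 1000000

# No copy was made: the restored buffer still aliases the original bytearray.
payload[0] = ord("y")
print(chr(view[0]))      # y
```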


Class Registration and Type System

Type registration system for security and cross-language compatibility with support for custom type tags and class IDs.

class Fury:
    def register_class(
        self, 
        cls, 
        *, 
        class_id: int = None, 
        type_tag: str = None
    ): ...
    
    def register_serializer(self, cls: type, serializer): ...

class Language(enum.Enum):
    XLANG = 0
    JAVA = 1

class OpaqueObject:
    def __init__(self, type_id: int, data: bytes): ...
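To make the registration model concrete, here is a toy allowlist keyed by type tag. `TypeRegistry` and `UnregisteredClassError` are hypothetical names, not PyFury classes, and the sketch only models `type_tag`; a numeric `class_id` would presumably play the same role with a more compact key. The point it shows: a stable tag, rather than a Python import path, identifies the type in the payload, and any tag that was never registered is rejected instead of being constructed.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Tuple


class UnregisteredClassError(Exception):
    """Raised when a class or tag is not on the allowlist."""


class TypeRegistry:
    """Toy allowlist keyed by type tag (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        self._tag_by_cls: Dict[type, str] = {}
        self._cls_by_tag: Dict[str, type] = {}

    def register(self, cls: type, *, type_tag: str) -> None:
        self._tag_by_cls[cls] = type_tag
        self._cls_by_tag[type_tag] = cls

    def encode(self, obj: Any) -> Tuple[str, Dict[str, Any]]:
        tag = self._tag_by_cls.get(type(obj))
        if tag is None:
            raise UnregisteredClassError(type(obj).__qualname__)
        # The stable tag, not the Python class path, identifies the type.
        return tag, asdict(obj)  # assumes dataclass instances

    def decode(self, tag: str, fields: Dict[str, Any]) -> Any:
        cls = self._cls_by_tag.get(tag)
        if cls is None:
            # Unknown names are rejected instead of being imported and constructed.
            raise UnregisteredClassError(tag)
        return cls(**fields)


@dataclass
class Point:
    x: int
    y: int


registry = TypeRegistry()
registry.register(Point, type_tag="example.Point")
tag, fields = registry.encode(Point(1, 2))
print(tag, fields)                   # example.Point {'x': 1, 'y': 2}
print(registry.decode(tag, fields))  # Point(x=1, y=2)
```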


Row Format and Arrow Integration

Zero-copy row format encoding with PyArrow integration for efficient columnar data operations and cross-language data exchange.

class RowData:
    def __init__(self, schema, data: bytes): ...

def encoder(cls_or_schema):
    """Create row encoder for a dataclass or Arrow schema."""

class Encoder:
    def to_row(self, obj) -> RowData: ...
    def from_row(self, row_data: RowData): ...
    @property
    def schema(self): ...

class ArrowWriter:
    def write(self, obj): ...

Serializer Framework

Extensible serializer system for handling custom types, built-in Python types, and cross-language compatible serialization.

class Serializer:
    def __init__(self, fury, cls): ...
    def write(self, buffer, value): ...
    def read(self, buffer): ...

class CrossLanguageCompatibleSerializer(Serializer):
    """Base class for serializers that support cross-language serialization."""

class BufferObject:
    """Interface for objects that can provide buffer data."""
    def to_buffer(self) -> bytes: ...
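The `write(buffer, value)` / `read(buffer)` shape of a custom serializer can be sketched without PyFury. Here `SketchSerializer` is a hypothetical stand-in for the `Serializer` base class above and `io.BytesIO` stands in for PyFury's `Buffer`; with a real `Fury` instance the serializer would be attached via `register_serializer(cls, serializer)`. The full base-class contract (for example, which methods cross-language serializers must also override) is not covered by this listing, so treat this as a shape sketch only.

```python
import io
import struct
from dataclasses import dataclass


class SketchSerializer:
    """Hypothetical stand-in mirroring the Serializer shape listed above."""

    def __init__(self, fury, cls):
        self.fury = fury  # unused here; kept to match the listed constructor
        self.cls = cls

    def write(self, buffer: io.BytesIO, value) -> None:
        raise NotImplementedError

    def read(self, buffer: io.BytesIO):
        raise NotImplementedError


@dataclass
class Point:
    x: int
    y: int


class PointSerializer(SketchSerializer):
    """Writes a Point as two little-endian int32 values."""

    _fmt = struct.Struct("<ii")

    def write(self, buffer, value):
        buffer.write(self._fmt.pack(value.x, value.y))

    def read(self, buffer):
        x, y = self._fmt.unpack(buffer.read(self._fmt.size))
        return self.cls(x, y)


serializer = PointSerializer(fury=None, cls=Point)
buf = io.BytesIO()
serializer.write(buf, Point(3, -4))
buf.seek(0)
print(serializer.read(buf))  # Point(x=3, y=-4)
```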


Buffer and Memory Management

High-performance binary buffer operations with efficient memory allocation and platform-specific optimizations.

class Buffer:
    @staticmethod
    def allocate(size: int) -> Buffer: ...
    
    def write_byte(self, value: int): ...
    def read_byte(self) -> int: ...
    def write_int32(self, value: int): ...
    def read_int32(self) -> int: ...
    def write_int64(self, value: int): ...
    def read_int64(self) -> int: ...
    def write_bytes(self, data: bytes): ...
    def read_bytes(self, length: int) -> bytes: ...
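A pure-Python model of the buffer methods listed above helps show how separate write and read cursors work. PyFury's real `Buffer` is implemented natively; `SketchBuffer`, its `writer_index`/`reader_index` attributes, the little-endian byte order and the doubling growth policy are all assumptions made for this sketch.

```python
import struct


class SketchBuffer:
    """Toy growable buffer with separate write and read cursors (little-endian)."""

    def __init__(self, capacity: int) -> None:
        self._data = bytearray(capacity)
        self.writer_index = 0
        self.reader_index = 0

    @staticmethod
    def allocate(size: int) -> "SketchBuffer":
        return SketchBuffer(size)

    def _ensure(self, extra: int) -> None:
        needed = self.writer_index + extra
        if needed > len(self._data):
            # Grow geometrically so repeated writes stay amortized O(1).
            self._data.extend(bytes(max(needed, 2 * len(self._data)) - len(self._data)))

    def _write(self, fmt: str, value: int) -> None:
        size = struct.calcsize(fmt)
        self._ensure(size)
        struct.pack_into(fmt, self._data, self.writer_index, value)
        self.writer_index += size

    def _read(self, fmt: str) -> int:
        (value,) = struct.unpack_from(fmt, self._data, self.reader_index)
        self.reader_index += struct.calcsize(fmt)
        return value

    def write_byte(self, value: int) -> None:
        self._write("<b", value)  # signed 8-bit in this sketch

    def read_byte(self) -> int:
        return self._read("<b")

    def write_int32(self, value: int) -> None:
        self._write("<i", value)

    def read_int32(self) -> int:
        return self._read("<i")

    def write_int64(self, value: int) -> None:
        self._write("<q", value)

    def read_int64(self) -> int:
        return self._read("<q")

    def write_bytes(self, data: bytes) -> None:
        self._ensure(len(data))
        self._data[self.writer_index:self.writer_index + len(data)] = data
        self.writer_index += len(data)

    def read_bytes(self, length: int) -> bytes:
        start = self.reader_index
        self.reader_index += length
        return bytes(self._data[start:self.reader_index])


buf = SketchBuffer.allocate(4)  # deliberately small to force growth
buf.write_int32(-7)
buf.write_int64(2**40)
buf.write_bytes(b"fury")
print(buf.read_int32(), buf.read_int64(), buf.read_bytes(4))  # -7 1099511627776 b'fury'
```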


Exception Handling

class FuryError(Exception):
    """Base exception for PyFury operations."""

class ClassNotCompatibleError(FuryError):
    """Raised when class compatibility checks fail."""

class CompileError(FuryError):
    """Raised when code generation/compilation fails."""

Performance Considerations

  • Reference Tracking: Enable only when dealing with circular references or shared objects
  • Class Registration: Required by default for security; impacts initial setup time but improves runtime performance
  • Language Mode: Use Language.XLANG for cross-language compatibility, Python mode for pure Python scenarios
  • Buffer Reuse: Reuse Buffer instances across serialization operations for optimal performance
  • Row Format: Use for zero-copy operations and efficient columnar data processing
  • Serializer Registration: Pre-register custom serializers to avoid runtime overhead

Security Considerations

PyFury requires class registration by default (require_class_registration=True) so that untrusted classes cannot be deserialized. With require_class_registration=False, PyFury can deserialize arbitrary Python objects, and a malicious payload can execute code through a class's __init__, __new__, __eq__, or __hash__ methods. Disable class registration only in trusted environments.
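Class registration works as an allowlist. The same defence applied to Python's standard `pickle`, following the "Restricting Globals" pattern from the `pickle` documentation, is sketched below as an analogy; it is not PyFury code. `SAFE_GLOBALS`, `RestrictedUnpickler`, `restricted_loads` and `Malicious` are illustrative names. On a plain `pickle.loads`, the callable returned by `__reduce__` would run during loading; the restricted unpickler refuses to resolve it.

```python
import builtins
import io
import pickle

# Allowlist of (module, name) globals that may be reconstructed.
SAFE_GLOBALS = {("builtins", "range"), ("builtins", "complex"), ("builtins", "frozenset")}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module: str, name: str):
        if (module, name) in SAFE_GLOBALS:
            return getattr(builtins, name)
        # Anything else is refused instead of being imported.
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data: bytes):
    return RestrictedUnpickler(io.BytesIO(data)).load()


class Malicious:
    def __reduce__(self):
        # On a plain pickle.loads, this callable would run during loading.
        return (print, ("code ran during deserialization",))


print(restricted_loads(pickle.dumps(range(3))))  # range(0, 3)
try:
    restricted_loads(pickle.dumps(Malicious()))
except pickle.UnpicklingError as exc:
    print("blocked:", exc)  # blocked: global 'builtins.print' is forbidden
```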

Workspace: tessl
Visibility: Public
Describes: pkg:pypi/pyfury@0.10.x