tessl/pypi-pyfury

Blazingly fast multi-language serialization framework powered by JIT and zero-copy

Workspace: tessl
Visibility: Public
Describes: pkg:pypi/pyfury@0.10.x

To install, run

npx @tessl/cli install tessl/pypi-pyfury@0.10.0

PyFury

PyFury is the Python implementation of Apache Fury, a blazingly fast multi-language serialization framework powered by JIT compilation and zero-copy techniques. PyFury provides high-performance serialization for Python objects with support for cross-language compatibility, reference tracking, and row format operations.

Package Information

  • Package Name: pyfury
  • Package Type: pypi
  • Language: Python
  • Installation: pip install pyfury

Core Imports

import pyfury
from pyfury import Fury, Language

For row format operations:

from pyfury.format import encoder, RowData

Basic Usage

from dataclasses import dataclass

import pyfury

# SomeClass is a placeholder for your own type (illustrative definition)
@dataclass
class SomeClass:
    name: str = "example"

# Create a Fury instance with reference tracking enabled
fury = pyfury.Fury(ref_tracking=True)

# Register classes for cross-language serialization
fury.register_class(SomeClass, type_tag="example.SomeClass")

# Serialize object
obj = SomeClass()
bytes_data = fury.serialize(obj)

# Deserialize object
restored_obj = fury.deserialize(bytes_data)

# Cross-language serialization
xlang_fury = pyfury.Fury(language=pyfury.Language.XLANG, ref_tracking=True)
xlang_fury.register_class(SomeClass, type_tag="example.SomeClass")
xlang_bytes = xlang_fury.serialize(obj)

# Row format encoding for zero-copy operations
@dataclass
class DataObject:
    field1: int
    field2: str

row_encoder = pyfury.encoder(DataObject)
data_obj = DataObject(field1=42, field2="hello")
row_data = row_encoder.to_row(data_obj)

Architecture

PyFury is built around several key components:

  • Fury Engine: Core Python serialization engine with configurable language modes
  • Language Support: Python-native and cross-language (XLANG) serialization modes
  • Reference Tracking: Optional circular reference and shared object support
  • Class Registration: Security-focused type system with allowlists
  • Serializer Framework: Extensible system for custom types and built-in Python types
  • Row Format: Zero-copy row-oriented binary format with Arrow (columnar) integration
  • Buffer Management: Efficient binary I/O with memory buffer pooling
  • Meta Strings: Optimized string encoding and meta compression

Capabilities

Core Serialization

Primary serialization operations for converting Python objects to/from binary format with optional reference tracking and circular reference support.

class Fury:
    def __init__(
        self,
        language: Language = Language.XLANG,
        ref_tracking: bool = False,
        require_class_registration: bool = True,
    ): ...
    
    def serialize(
        self,
        obj,
        buffer: Buffer = None,
        buffer_callback=None,
        unsupported_callback=None,
    ) -> Union[Buffer, bytes]: ...
    
    def deserialize(
        self,
        buffer: Union[Buffer, bytes],
        buffers: Iterable = None,
        unsupported_objects: Iterable = None,
    ): ...

Core Serialization
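
To make the reference-tracking option more concrete, here is a minimal pure-Python sketch of the general idea: the writer numbers each container the first time it sees it and emits a back-reference on later encounters, so shared and circular structures survive a round trip. This is not PyFury's wire format or implementation. The names `encode` and `decode` and the tuple-based encoding are hypothetical, and only lists and scalar values are handled.

```python
def encode(obj, seen=None):
    """Encode nested lists as tagged tuples, replacing repeats with back-references."""
    if seen is None:
        seen = {}  # id(list) -> reference number, assigned in first-visit order
    if isinstance(obj, list):
        if id(obj) in seen:
            return ("ref", seen[id(obj)])
        seen[id(obj)] = len(seen)
        return ("list", [encode(item, seen) for item in obj])
    return ("value", obj)


def decode(node, refs=None):
    """Rebuild the structure, resolving back-references to already-built lists."""
    if refs is None:
        refs = []  # lists in the same first-visit order used by encode
    kind, payload = node
    if kind == "ref":
        return refs[payload]
    if kind == "list":
        result = []
        refs.append(result)  # register before filling so cycles can resolve
        for child in payload:
            result.append(decode(child, refs))
        return result
    return payload


shared = [1, 2]
obj = [shared, shared]
obj.append(obj)  # make the structure circular

restored = decode(encode(obj))
```

After the round trip, `restored[0] is restored[1]` holds (shared identity is kept) and `restored[2] is restored` holds (the cycle is kept). Without the `seen` table, the shared list would be duplicated and the cycle would recurse without end, which is why tracking costs extra bookkeeping and is optional.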

Class Registration and Type System

Type registration system for security and cross-language compatibility with support for custom type tags and class IDs.

class Fury:
    def register_class(
        self, 
        cls, 
        *, 
        class_id: int = None, 
        type_tag: str = None
    ): ...
    
    def register_serializer(self, cls: type, serializer): ...

class Language(enum.Enum):
    XLANG = 0
    JAVA = 1

class OpaqueObject:
    def __init__(self, type_id: int, data: bytes): ...

Type System

Row Format and Arrow Integration

Zero-copy row format encoding, with PyArrow integration for columnar data operations and cross-language data exchange.

class RowData:
    def __init__(self, schema, data: bytes): ...

def encoder(cls_or_schema):
    """Create row encoder for a dataclass or Arrow schema."""

class Encoder:
    def to_row(self, obj) -> RowData: ...
    def from_row(self, row_data: RowData): ...
    @property
    def schema(self): ...

class ArrowWriter:
    def write(self, obj): ...

Row Format
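
The appeal of a row format is that a single field can be read directly from the encoded bytes without deserializing the whole object. The sketch below illustrates that idea with the standard library only. The fixed 8-byte slot layout and the helper names `pack_row` and `read_field` are simplifying assumptions for illustration, not PyFury's actual row layout or API.

```python
import struct

SLOT = 8  # assumed fixed slot width in bytes (illustrative only)


def pack_row(values):
    """Pack integers into consecutive little-endian 8-byte slots."""
    return struct.pack(f"<{len(values)}q", *values)


def read_field(row, index):
    """Read one field in place; no other field is decoded or copied."""
    view = memoryview(row)  # a view over the existing bytes, not a copy
    return struct.unpack_from("<q", view, index * SLOT)[0]


row = pack_row([10, 20, 30])
third = read_field(row, 2)  # jumps straight to byte offset 16
```

Because every slot has a known offset, reading field `2` costs the same regardless of how many fields precede it.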

Serializer Framework

Extensible serializer system for handling custom types, built-in Python types, and cross-language compatible serialization.

class Serializer:
    def __init__(self, fury, cls): ...
    def write(self, buffer, value): ...
    def read(self, buffer): ...

class CrossLanguageCompatibleSerializer(Serializer):
    """Base class for serializers that support cross-language serialization."""

class BufferObject:
    """Interface for objects that can provide buffer data."""
    def to_buffer(self) -> bytes: ...

Custom Serializers
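
A custom serializer pairs a `write` method with a `read` method that consumes the fields in the same order. The sketch below shows that symmetry for a small `Point` class. So that it runs without PyFury installed, `SketchBuffer` is a hypothetical stand-in that mirrors only the `write_int32` and `read_int32` names listed for `Buffer`; in real code the serializer would presumably subclass `Serializer` and be passed to `register_serializer`. `Point` and `PointSerializer` are illustrative names.

```python
import struct
from dataclasses import dataclass


class SketchBuffer:
    """Stand-in buffer with an append-only write side and a read cursor.

    Not PyFury's Buffer; it only mimics two of the listed method names.
    """

    def __init__(self):
        self._data = bytearray()
        self._read_pos = 0

    def write_int32(self, value: int) -> None:
        self._data += struct.pack("<i", value)

    def read_int32(self) -> int:
        (value,) = struct.unpack_from("<i", self._data, self._read_pos)
        self._read_pos += 4
        return value


@dataclass
class Point:
    x: int
    y: int


class PointSerializer:
    """Follows the write(buffer, value) / read(buffer) shape described above."""

    def write(self, buffer, value: Point) -> None:
        buffer.write_int32(value.x)
        buffer.write_int32(value.y)

    def read(self, buffer) -> Point:
        # Fields must be read back in the order they were written.
        x = buffer.read_int32()
        y = buffer.read_int32()
        return Point(x, y)


buf = SketchBuffer()
serializer = PointSerializer()
serializer.write(buf, Point(3, -4))
restored = serializer.read(buf)
```

If `read` consumed `y` before `x`, the round trip would silently swap the coordinates, so keeping the two methods mirrored is the main thing to check when writing a serializer.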

Buffer and Memory Management

High-performance binary buffer operations with efficient memory allocation and platform-specific optimizations.

class Buffer:
    @staticmethod
    def allocate(size: int) -> Buffer: ...
    
    def write_byte(self, value: int): ...
    def read_byte(self) -> int: ...
    def write_int32(self, value: int): ...
    def read_int32(self) -> int: ...
    def write_int64(self, value: int): ...
    def read_int64(self) -> int: ...
    def write_bytes(self, data: bytes): ...
    def read_bytes(self, length: int) -> bytes: ...

Memory Management

Exception Handling

class FuryError(Exception):
    """Base exception for PyFury operations."""

class ClassNotCompatibleError(FuryError):
    """Raised when class compatibility checks fail."""

class CompileError(FuryError):
    """Raised when code generation/compilation fails."""

Performance Considerations

  • Reference Tracking: Enable only when dealing with circular references or shared objects
  • Class Registration: Required by default for security; impacts initial setup time but improves runtime performance
  • Language Mode: Use Language.XLANG for cross-language compatibility and the Python-native mode for pure-Python scenarios
  • Buffer Reuse: Reuse Buffer instances across serialization operations to avoid repeated allocation
  • Row Format: Use for zero-copy field access and for handing data to columnar (Arrow) processing
  • Serializer Registration: Pre-register custom serializers to avoid runtime overhead

Security Considerations

PyFury requires class registration by default, which prevents deserialization of unregistered classes. With require_class_registration=False, PyFury can deserialize arbitrary Python objects, and doing so may execute malicious code through methods such as __init__, __new__, __eq__, or __hash__. Disable class registration only in trusted environments.
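
The reasoning behind mandatory registration can be illustrated with a small allowlist: a deserializer that only constructs classes it was told about in advance cannot be steered into instantiating anything else. The following pure-Python sketch uses hypothetical names (`TypeRegistry`, `register`, `resolve`, `UnregisteredTypeError`) and is not PyFury's implementation.

```python
class UnregisteredTypeError(Exception):
    """Raised when incoming data names a type that was never registered."""


class TypeRegistry:
    """Maps type tags to classes; unknown tags are refused."""

    def __init__(self):
        self._by_tag = {}

    def register(self, cls, *, type_tag: str) -> None:
        self._by_tag[type_tag] = cls

    def resolve(self, type_tag: str):
        # Only explicitly registered tags map to a class.
        try:
            return self._by_tag[type_tag]
        except KeyError:
            raise UnregisteredTypeError(type_tag) from None


class Order:
    pass


registry = TypeRegistry()
registry.register(Order, type_tag="example.Order")

resolved = registry.resolve("example.Order")  # the registered class

try:
    registry.resolve("os.system")  # tag supplied by untrusted input
except UnregisteredTypeError:
    rejected = True
else:
    rejected = False
```

The key property is that the set of constructible types is fixed by the application, not by the bytes being read.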