# PyFury

PyFury is the Python implementation of Apache Fury, a blazingly fast multi-language serialization framework powered by JIT compilation and zero-copy techniques. PyFury provides high-performance serialization for Python objects with support for cross-language compatibility, reference tracking, and row format operations.

## Package Information

- **Package Name**: pyfury
- **Package Type**: pypi
- **Language**: Python
- **Installation**: `pip install pyfury`

## Core Imports

```python
import pyfury
from pyfury import Fury, Language
```

For row format operations:

```python
from pyfury.format import encoder, RowData
```

## Basic Usage

```python
from dataclasses import dataclass

import pyfury


# Placeholder class for illustration; substitute your own type
@dataclass
class SomeClass:
    value: int = 0


# Create Fury instance
fury = pyfury.Fury(ref_tracking=True)

# Register classes for cross-language serialization
fury.register_class(SomeClass, type_tag="example.SomeClass")

# Serialize object
obj = SomeClass()
bytes_data = fury.serialize(obj)

# Deserialize object
restored_obj = fury.deserialize(bytes_data)

# Cross-language serialization
xlang_fury = pyfury.Fury(language=pyfury.Language.XLANG, ref_tracking=True)
xlang_fury.register_class(SomeClass, type_tag="example.SomeClass")
xlang_bytes = xlang_fury.serialize(obj)


# Row format encoding for zero-copy operations
@dataclass
class DataObject:
    field1: int
    field2: str


row_encoder = pyfury.encoder(DataObject)
data_obj = DataObject(field1=42, field2="hello")
row_data = row_encoder.to_row(data_obj)
```

## Architecture

PyFury is built around several key components:

- **Fury Engine**: Core Python serialization engine with configurable language modes
- **Language Support**: Python-native and cross-language (XLANG) serialization modes
- **Reference Tracking**: Optional circular reference and shared object support
- **Class Registration**: Security-focused type system with allowlists
- **Serializer Framework**: Extensible system for custom types and built-in Python types
- **Row Format**: Zero-copy columnar data format with Arrow integration
- **Buffer Management**: Efficient binary I/O with memory buffer pooling
- **Meta Strings**: Optimized string encoding and meta compression

## Capabilities

### Core Serialization

Primary serialization operations for converting Python objects to/from binary format with optional reference tracking and circular reference support.

```python { .api }
class Fury:
    def __init__(
        self,
        language: Language = Language.XLANG,
        ref_tracking: bool = False,
        require_class_registration: bool = True,
    ): ...

    def serialize(
        self,
        obj,
        buffer: Buffer = None,
        buffer_callback=None,
        unsupported_callback=None,
    ) -> Union[Buffer, bytes]: ...

    def deserialize(
        self,
        buffer: Union[Buffer, bytes],
        buffers: Iterable = None,
        unsupported_objects: Iterable = None,
    ): ...
```

[Core Serialization](./core-serialization.md)
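
To make the purpose of `ref_tracking` concrete, here is a minimal sketch of how a serializer can preserve shared and circular references. It uses only the standard library and a toy token stream for nested lists; it is not PyFury's wire format or implementation, and `encode` and `decode` are hypothetical names chosen for illustration.

```python
from typing import Any, Dict, List


def encode(obj: Any, seen: Dict[int, int], out: List[tuple]) -> None:
    """Flatten nested lists into tokens, emitting back-references for repeats."""
    if isinstance(obj, list):
        if id(obj) in seen:
            # Already written once: emit a back-reference instead of recursing
            out.append(("ref", seen[id(obj)]))
            return
        seen[id(obj)] = len(seen)
        out.append(("list", len(obj)))
        for item in obj:
            encode(item, seen, out)
    else:
        out.append(("value", obj))


def decode(tokens: List[tuple]) -> Any:
    """Rebuild the object graph, resolving back-references to earlier lists."""
    restored: List[list] = []
    pos = 0

    def read() -> Any:
        nonlocal pos
        kind, payload = tokens[pos]
        pos += 1
        if kind == "ref":
            return restored[payload]
        if kind == "list":
            result: list = []
            restored.append(result)  # register before reading children
            for _ in range(payload):
                result.append(read())
            return result
        return payload

    return read()


shared = [1, 2]
graph = [shared, shared]
graph.append(graph)  # circular reference back to the root

tokens: List[tuple] = []
encode(graph, {}, tokens)
copy = decode(tokens)
```

After decoding, `copy[0] is copy[1]` holds and `copy[2] is copy` holds, so both the shared list and the cycle survive the round trip. Without the `seen` table the encoder would recurse forever on the cycle, which is why tracking costs extra bookkeeping and is only worth enabling when such graphs occur.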

### Class Registration and Type System

Type registration system for security and cross-language compatibility with support for custom type tags and class IDs.

```python { .api }
class Fury:
    def register_class(
        self,
        cls,
        *,
        class_id: int = None,
        type_tag: str = None
    ): ...

    def register_serializer(self, cls: type, serializer): ...

class Language(enum.Enum):
    XLANG = 0
    JAVA = 1

class OpaqueObject:
    def __init__(self, type_id: int, data: bytes): ...
```

[Type System](./type-system.md)
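
The idea behind registering classes under a type tag can be sketched with a small allowlist. The example below is an illustration only: `TagRegistry` is a hypothetical class, JSON stands in for the binary format, and none of it reflects how PyFury stores or resolves tags internally.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Dict, Type


class TagRegistry:
    """Hypothetical allowlist mapping type tags to classes."""

    def __init__(self) -> None:
        self._by_tag: Dict[str, Type] = {}
        self._by_cls: Dict[Type, str] = {}

    def register(self, cls: Type, type_tag: str) -> None:
        self._by_tag[type_tag] = cls
        self._by_cls[cls] = type_tag

    def dumps(self, obj: Any) -> bytes:
        cls = type(obj)
        if cls not in self._by_cls:
            raise TypeError(f"{cls.__name__} is not registered")
        doc = {"tag": self._by_cls[cls], "fields": asdict(obj)}
        return json.dumps(doc).encode("utf-8")

    def loads(self, data: bytes) -> Any:
        doc = json.loads(data)
        cls = self._by_tag.get(doc["tag"])
        if cls is None:
            # Unknown tags are refused rather than resolved dynamically
            raise TypeError(f"unknown type tag {doc['tag']!r}")
        return cls(**doc["fields"])


@dataclass
class Point:
    x: int
    y: int


registry = TagRegistry()
registry.register(Point, type_tag="example.Point")
restored = registry.loads(registry.dumps(Point(1, 2)))

rejected = False
try:
    registry.loads(b'{"tag": "example.Unknown", "fields": {}}')
except TypeError:
    rejected = True
```

Because the payload carries a tag rather than a Python import path, a peer written in another language can map the same tag to its own type, and a payload naming an unregistered tag is rejected before any object is constructed.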

### Row Format and Arrow Integration

Zero-copy row format encoding with PyArrow integration for efficient columnar data operations and cross-language data exchange.

```python { .api }
@dataclass
class RowData:
    def __init__(self, schema, data: bytes): ...

def encoder(cls_or_schema):
    """Create row encoder for a dataclass or Arrow schema."""

class Encoder:
    def to_row(self, obj) -> RowData: ...
    def from_row(self, row_data: RowData): ...
    @property
    def schema(self): ...

class ArrowWriter:
    def write(self, obj): ...
```

[Row Format](./row-format.md)
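
What "zero-copy" access to a row means can be shown with a toy layout. The sketch below assumes a made-up format (an int64, then a uint32 length, then UTF-8 bytes, all little-endian); it is not Fury's actual row layout, and the function names are hypothetical. The point is that one field can be read at a known offset without decoding the rest of the row.

```python
import struct


def to_row(field1: int, field2: str) -> bytes:
    """Pack a record as: int64 | uint32 length | UTF-8 bytes (little-endian)."""
    text = field2.encode("utf-8")
    return struct.pack("<qI", field1, len(text)) + text


def read_field1(row: memoryview) -> int:
    # Reads 8 bytes at offset 0; the string field is never touched
    return struct.unpack_from("<q", row, 0)[0]


def read_field2(row: memoryview) -> str:
    (length,) = struct.unpack_from("<I", row, 8)
    # Slicing a memoryview does not copy; only this field's bytes are
    # copied when they are decoded into a str
    return row[12:12 + length].tobytes().decode("utf-8")


row = memoryview(to_row(42, "hello"))
value = read_field1(row)
text = read_field2(row)
```

Reading `field1` here costs a single fixed-offset unpack regardless of how long `field2` is, which is the property that makes row formats attractive when only a few fields of a large record are needed.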

### Serializer Framework

Extensible serializer system for handling custom types, built-in Python types, and cross-language compatible serialization.

```python { .api }
class Serializer:
    def __init__(self, fury, cls): ...
    def write(self, buffer, value): ...
    def read(self, buffer): ...

class CrossLanguageCompatibleSerializer(Serializer):
    """Base class for serializers that support cross-language serialization."""

class BufferObject:
    """Interface for objects that can provide buffer data."""
    def to_buffer(self) -> bytes: ...
```

[Custom Serializers](./custom-serializers.md)
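
The `write(buffer, value)` / `read(buffer)` pair listed above suggests a symmetric contract: whatever `write` emits, `read` must consume in the same order. A minimal sketch of that contract follows, using `io.BytesIO` as a stand-in for the buffer and a hypothetical `DateSerializer` that does not subclass the real `Serializer`; the encoding chosen here is an assumption for illustration.

```python
import io
import struct
from datetime import date


class DateSerializer:
    """Hypothetical serializer following the write/read shape listed above."""

    def write(self, buffer: io.BytesIO, value: date) -> None:
        # Store the proleptic Gregorian ordinal as a little-endian int32
        buffer.write(struct.pack("<i", value.toordinal()))

    def read(self, buffer: io.BytesIO) -> date:
        # Consume exactly the 4 bytes that write() produced
        (ordinal,) = struct.unpack("<i", buffer.read(4))
        return date.fromordinal(ordinal)


serializer = DateSerializer()
buf = io.BytesIO()
serializer.write(buf, date(2024, 1, 31))
buf.seek(0)  # rewind before reading
restored = serializer.read(buf)
```

Keeping the encoding fixed-width means `read` never needs a length prefix; a variable-length type would have to write its size first so that `read` knows how many bytes belong to it.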

### Buffer and Memory Management

High-performance binary buffer operations with efficient memory allocation and platform-specific optimizations.

```python { .api }
class Buffer:
    @staticmethod
    def allocate(size: int) -> Buffer: ...

    def write_byte(self, value: int): ...
    def read_byte(self) -> int: ...
    def write_int32(self, value: int): ...
    def read_int32(self) -> int: ...
    def write_int64(self, value: int): ...
    def read_int64(self) -> int: ...
    def write_bytes(self, data: bytes): ...
    def read_bytes(self, length: int) -> bytes: ...
```

[Memory Management](./memory-management.md)
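
A buffer with paired `write_*` and `read_*` methods is typically built on a byte array with separate read and write cursors. The sketch below shows one way that could look using the standard library; `SketchBuffer`, its cursor attributes, its growth policy, and `reset` are assumptions for illustration and say nothing about how PyFury's `Buffer` is implemented.

```python
import struct


class SketchBuffer:
    """Stand-in for a binary buffer with independent read and write cursors."""

    def __init__(self, size: int) -> None:
        self._data = bytearray(size)
        self.writer_index = 0
        self.reader_index = 0

    def _reserve(self, n: int) -> None:
        # Grow the backing store when a write would overflow it
        needed = self.writer_index + n
        if needed > len(self._data):
            self._data.extend(bytes(needed - len(self._data)))

    def write_int32(self, value: int) -> None:
        self._reserve(4)
        struct.pack_into("<i", self._data, self.writer_index, value)
        self.writer_index += 4

    def read_int32(self) -> int:
        (value,) = struct.unpack_from("<i", self._data, self.reader_index)
        self.reader_index += 4
        return value

    def write_bytes(self, data: bytes) -> None:
        self._reserve(len(data))
        end = self.writer_index + len(data)
        self._data[self.writer_index:end] = data
        self.writer_index = end

    def read_bytes(self, length: int) -> bytes:
        end = self.reader_index + length
        chunk = bytes(self._data[self.reader_index:end])
        self.reader_index = end
        return chunk

    def reset(self) -> None:
        # Rewind both cursors so the same allocation can be reused
        self.writer_index = 0
        self.reader_index = 0


buf = SketchBuffer(8)
buf.write_int32(-7)
buf.write_bytes(b"fury!")  # forces the 8-byte store to grow
first = (buf.read_int32(), buf.read_bytes(5))

buf.reset()
buf.write_int32(123)
second = buf.read_int32()
```

Rewinding the cursors instead of allocating a fresh buffer is the kind of reuse that avoids repeated allocation across many serialization calls.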

## Exception Handling

```python { .api }
class FuryError(Exception):
    """Base exception for PyFury operations."""

class ClassNotCompatibleError(FuryError):
    """Raised when class compatibility checks fail."""

class CompileError(FuryError):
    """Raised when code generation/compilation fails."""
```

## Performance Considerations

- **Reference Tracking**: Enable only when dealing with circular references or shared objects
- **Class Registration**: Required by default for security; impacts initial setup time but improves runtime performance
- **Language Mode**: Use `Language.XLANG` for cross-language compatibility, Python mode for pure Python scenarios
- **Buffer Reuse**: Reuse Buffer instances across serialization operations for optimal performance
- **Row Format**: Use for zero-copy operations and efficient columnar data processing
- **Serializer Registration**: Pre-register custom serializers to avoid runtime overhead

## Security Considerations

PyFury requires class registration by default, which prevents deserialization of unregistered classes. With `require_class_registration=False`, PyFury can deserialize arbitrary Python objects, and doing so may execute malicious code through `__init__`, `__new__`, `__eq__`, or `__hash__` methods. Disable class registration only in trusted environments.
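
To see why methods such as `__hash__` matter here, consider what any deserializer must do to restore a set or dict: it has to insert each element, and insertion calls the element's `__hash__`. The sketch below demonstrates that with plain Python and a harmless counter; `Tracked` and `rebuild_set` are hypothetical names, and no PyFury code is involved.

```python
class Tracked:
    """Records every time its hash is computed."""

    hash_calls = 0

    def __init__(self, name: str) -> None:
        self.name = name

    def __hash__(self) -> int:
        Tracked.hash_calls += 1  # arbitrary user code runs here
        return hash(self.name)

    def __eq__(self, other: object) -> bool:
        return isinstance(other, Tracked) and other.name == self.name


def rebuild_set(elements: list) -> set:
    # Restoring a set means inserting each element, which hashes it
    return set(elements)


before = Tracked.hash_calls
restored = rebuild_set([Tracked("a"), Tracked("b")])
after = Tracked.hash_calls
```

The counter increases even though no method on `Tracked` was called explicitly. If the class came from an untrusted payload, the body of `__hash__` could do anything, which is the reason an allowlist of registered classes is the safer default.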