# Serialization
Efficient binary serialization of chunked arrays for storage and network transfer. This functionality provides an optimized method for converting array data to binary format chunk by chunk, essential for large-scale data processing and distributed computing workflows.
## Capabilities
### Chunked Binary Serialization
Compute binary representation of an image divided into a grid of cutouts, with optimized performance for specific memory layouts.
```python { .api }
def tobytes(
    image: NDArray,
    chunk_size: tuple[int, int, int],
    order: str = "C"
) -> list[bytes]:
    """
    Compute bytes with image divided into grid of cutouts.

    Args:
        image: Input image array
        chunk_size: Size of each chunk (x, y, z)
        order: Memory order ("C" or "F", default: "C")

    Returns:
        Resultant binaries indexed by gridpoint in fortran order
    """
```
**Usage Example:**
```python
import fastremap
import numpy as np

# Create a sample 3D image
image = np.random.randint(0, 255, size=(128, 128, 64), dtype=np.uint8)

# Divide into 64x64x64 chunks and serialize
chunk_size = (64, 64, 64)
binaries = fastremap.tobytes(image, chunk_size, order="C")

# Result is a list of bytes objects
print(f"Number of chunks: {len(binaries)}")
print(f"First chunk size: {len(binaries[0])} bytes")

# For Fortran-ordered output
binaries_f = fastremap.tobytes(image, chunk_size, order="F")
```
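
The returned list is indexed by gridpoint in Fortran order, so the x index varies fastest. A minimal decoding sketch, continuing the example above; it assumes the image divides evenly into chunks and that gridpoint (x, y, z) maps to flat index `x + y*gx + z*gx*gy`:

```python
import numpy as np

# Grid dimensions for a (128, 128, 64) image with (64, 64, 64) chunks
gx, gy, gz = 2, 2, 1

# Flat Fortran-order index of gridpoint (x=1, y=0, z=0)
i = 1 + 0 * gx + 0 * gx * gy

# Rebuild the cutout; dtype, shape, and order must match the producer
chunk = np.frombuffer(binaries[i], dtype=np.uint8).reshape((64, 64, 64), order="C")

# The cutout should equal the corresponding slice of the original image
assert np.array_equal(chunk, image[64:128, 0:64, 0:64])
```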
### Performance Optimization
The `tobytes` function is significantly faster under specific conditions:
- **Matching memory layout**: When the input image is F-contiguous and F order is requested, or C-contiguous and C order is requested
- **Large images**: Performance benefits are most pronounced when the image is larger than a single chunk
- **Efficient chunking**: Avoids the overhead of iterating and calling `tobytes` on each chunk individually
**Performance Example:**
```python
import fastremap
import numpy as np
import time

# Large Fortran-ordered image
large_image = np.random.random((512, 512, 256)).astype(np.float32, order='F')
chunk_size = (64, 64, 64)

# Optimized fastremap approach
start = time.time()
fast_chunks = fastremap.tobytes(large_image, chunk_size, order="F")
fast_time = time.time() - start

# Manual chunking approach (for comparison)
start = time.time()
manual_chunks = []
for z in range(0, 256, 64):
    for y in range(0, 512, 64):
        for x in range(0, 512, 64):
            chunk = large_image[x:x+64, y:y+64, z:z+64]
            manual_chunks.append(chunk.tobytes(order='F'))
manual_time = time.time() - start

print(f"fastremap time: {fast_time:.3f}s")
print(f"Manual time: {manual_time:.3f}s")
print(f"Speedup: {manual_time/fast_time:.1f}x faster")
```
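
Since the manual loop also walks gridpoints with x varying fastest, matching the Fortran gridpoint order of the return value, the two results should be byte-identical. This is a behavioral assumption worth verifying:

```python
# Same number of chunks, and each chunk byte-for-byte identical
assert len(fast_chunks) == len(manual_chunks)
assert all(a == b for a, b in zip(fast_chunks, manual_chunks))
```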
### Use Cases
#### Distributed Computing
```python
import fastremap
import numpy as np

# Prepare large dataset for distributed processing
dataset = np.random.random((1024, 1024, 512)).astype(np.float32)

# Chunk into manageable pieces for worker nodes
chunk_size = (128, 128, 128)
chunks = fastremap.tobytes(dataset, chunk_size, order="C")

# Each chunk can now be sent to different worker processes
for i, chunk_data in enumerate(chunks):
    # Send chunk_data to worker i
    # worker_pool.submit(process_chunk, chunk_data, i)
    pass
```
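
On the receiving end, each worker reconstructs its chunk from the raw bytes with `np.frombuffer`. A minimal sketch assuming evenly dividing chunks; `process_chunk` is a hypothetical worker function, not part of fastremap:

```python
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def process_chunk(chunk_data: bytes, index: int) -> float:
    # dtype, shape, and order must match what the producer used:
    # float32 data, (128, 128, 128) cutouts, C order
    chunk = np.frombuffer(chunk_data, dtype=np.float32).reshape(
        (128, 128, 128), order="C"
    )
    return float(chunk.mean())  # stand-in for real per-chunk work

with ProcessPoolExecutor() as pool:
    results = list(pool.map(process_chunk, chunks, range(len(chunks))))
```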
#### Efficient Storage
```python
import fastremap
import numpy as np
import pickle

# Large scientific dataset, Fortran-ordered to match the requested
# output order below (see Memory Layout Considerations)
data = np.random.random((2048, 2048, 1024)).astype(np.float32, order='F')

# Chunk and serialize for efficient storage
chunk_size = (256, 256, 256)
serialized_chunks = fastremap.tobytes(data, chunk_size, order="F")

# Record everything needed to reconstruct the array later
metadata = {
    'original_shape': data.shape,
    'chunk_size': chunk_size,
    'dtype': data.dtype,
    'order': 'F',
    'num_chunks': len(serialized_chunks)
}

# Save metadata and chunks
with open('data_metadata.pkl', 'wb') as f:
    pickle.dump(metadata, f)

for i, chunk in enumerate(serialized_chunks):
    with open(f'chunk_{i:04d}.bin', 'wb') as f:
        f.write(chunk)
```
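
Loading reverses the process. A sketch assuming the chunks divide the volume evenly and that the files were written in Fortran gridpoint order (x varying fastest):

```python
import pickle
import numpy as np

with open('data_metadata.pkl', 'rb') as f:
    meta = pickle.load(f)

cx, cy, cz = meta['chunk_size']
sx, sy, sz = meta['original_shape']
restored = np.zeros((sx, sy, sz), dtype=meta['dtype'], order=meta['order'])

# Walk gridpoints in Fortran order: x fastest, then y, then z
i = 0
for z in range(0, sz, cz):
    for y in range(0, sy, cy):
        for x in range(0, sx, cx):
            with open(f'chunk_{i:04d}.bin', 'rb') as f:
                raw = f.read()
            block = np.frombuffer(raw, dtype=meta['dtype']).reshape(
                (cx, cy, cz), order=meta['order']
            )
            restored[x:x+cx, y:y+cy, z:z+cz] = block
            i += 1
```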
#### Memory Layout Considerations
```python
import fastremap
import numpy as np

# For C-contiguous arrays, use C order for best performance
c_array = np.random.random((100, 200, 300)).astype(np.float32, order='C')
c_chunks = fastremap.tobytes(c_array, (50, 50, 50), order="C")  # Optimal

# For F-contiguous arrays, use F order for best performance
f_array = np.random.random((100, 200, 300)).astype(np.float32, order='F')
f_chunks = fastremap.tobytes(f_array, (50, 50, 50), order="F")  # Optimal

# Mixed orders work but may be slower
mixed_chunks = fastremap.tobytes(c_array, (50, 50, 50), order="F")  # Suboptimal
```
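
When the layout of an incoming array is not known in advance, its contiguity flags can select the matching order. A small helper sketch; `tobytes_native` is an illustrative name, not part of fastremap:

```python
import fastremap
import numpy as np

def tobytes_native(image: np.ndarray, chunk_size: tuple[int, int, int]) -> list[bytes]:
    # Request the order matching the array's memory layout so the optimized
    # path is taken; arrays that are both (or neither) default to C
    if image.flags["F_CONTIGUOUS"] and not image.flags["C_CONTIGUOUS"]:
        return fastremap.tobytes(image, chunk_size, order="F")
    return fastremap.tobytes(image, chunk_size, order="C")

chunks = tobytes_native(f_array, (50, 50, 50))  # takes the optimized F path
```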
## Types
```python { .api }
NDArray = np.ndarray
```