
# Serialization

Efficient binary serialization of chunked arrays for storage and network transfer. This functionality converts array data to binary format with chunking support, which is essential for large-scale data processing and distributed computing workflows.

## Capabilities

### Chunked Binary Serialization

Compute the binary representation of an image divided into a grid of cutouts, with optimized performance for matching memory layouts.

```python { .api }
def tobytes(
    image: NDArray,
    chunk_size: tuple[int, int, int],
    order: str = "C"
) -> list[bytes]:
    """
    Compute bytes with the image divided into a grid of cutouts.

    Args:
        image: Input image array
        chunk_size: Size of each chunk (x, y, z)
        order: Memory order ("C" or "F", default: "C")

    Returns:
        Resultant binaries indexed by gridpoint in Fortran order
    """
```

**Usage Example:**

```python
import fastremap
import numpy as np

# Create a sample 3D image
image = np.random.randint(0, 255, size=(128, 128, 64), dtype=np.uint8)

# Divide into 64x64x64 chunks and serialize
chunk_size = (64, 64, 64)
binaries = fastremap.tobytes(image, chunk_size, order="C")

# Result is a list of bytes objects
print(f"Number of chunks: {len(binaries)}")
print(f"First chunk size: {len(binaries[0])} bytes")

# For Fortran-ordered output
binaries_f = fastremap.tobytes(image, chunk_size, order="F")
```
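Since the returned binaries are indexed by gridpoint in Fortran order, the list position of a specific cutout can be computed directly from its grid coordinates. A minimal sketch in plain Python that mirrors this indexing convention (the helper names `grid_shape` and `gridpoint_index` are illustrative, not part of fastremap):

```python
def grid_shape(image_shape, chunk_size):
    # Number of chunks along each axis; ceil division covers ragged edges
    return tuple(-(-s // c) for s, c in zip(image_shape, chunk_size))

def gridpoint_index(gpt, gshape):
    # Fortran order: the x grid coordinate varies fastest
    gx, gy, gz = gpt
    gsx, gsy, _ = gshape
    return gx + gy * gsx + gz * gsx * gsy

gshape = grid_shape((128, 128, 64), (64, 64, 64))
# gshape == (2, 2, 1); binaries[gridpoint_index((1, 0, 0), gshape)]
# would select the cutout at grid position x=1, y=0, z=0
```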

### Performance Optimization

The `tobytes` function is significantly optimized for specific conditions:

- **Matching memory layout**: When the input image is F-contiguous and F order is requested, or C-contiguous and C order is requested
- **Large images**: Performance benefits are most pronounced when the image is larger than a single chunk
- **Efficient chunking**: Avoids the overhead of iterating and calling `tobytes` on each chunk individually

**Performance Example:**

```python
import fastremap
import numpy as np
import time

# Large Fortran-ordered image
large_image = np.random.random((512, 512, 256)).astype(np.float32, order='F')
chunk_size = (64, 64, 64)

# Optimized fastremap approach
start = time.time()
fast_chunks = fastremap.tobytes(large_image, chunk_size, order="F")
fast_time = time.time() - start

# Manual chunking approach (for comparison)
start = time.time()
manual_chunks = []
for z in range(0, 256, 64):
    for y in range(0, 512, 64):
        for x in range(0, 512, 64):
            chunk = large_image[x:x+64, y:y+64, z:z+64]
            manual_chunks.append(chunk.tobytes(order='F'))
manual_time = time.time() - start

print(f"fastremap time: {fast_time:.3f}s")
print(f"Manual time: {manual_time:.3f}s")
print(f"Speedup: {manual_time/fast_time:.1f}x faster")
```

### Use Cases

#### Distributed Computing

```python
import fastremap
import numpy as np

# Prepare large dataset for distributed processing
dataset = np.random.random((1024, 1024, 512)).astype(np.float32)

# Chunk into manageable pieces for worker nodes
chunk_size = (128, 128, 128)
chunks = fastremap.tobytes(dataset, chunk_size, order="C")

# Each chunk can now be sent to different worker processes
for i, chunk_data in enumerate(chunks):
    # Send chunk_data to worker i
    # worker_pool.submit(process_chunk, chunk_data, i)
    pass
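The placeholder loop above can be filled in with any executor. A minimal sketch using a thread pool, where `chunk_checksum` and the simulated payloads are illustrative stand-ins rather than part of fastremap:

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def chunk_checksum(chunk_data: bytes) -> float:
    # Worker: decode the raw bytes and reduce them to a summary value
    return float(np.frombuffer(chunk_data, dtype=np.float32).sum())

# Simulated chunk payloads standing in for fastremap.tobytes output
chunks = [np.full(16, i, dtype=np.float32).tobytes() for i in range(4)]

with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(chunk_checksum, chunks))
# pool.map preserves chunk order in the results list
```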

#### Efficient Storage

```python
import fastremap
import numpy as np
import pickle

# Large scientific dataset
data = np.random.random((2048, 2048, 1024)).astype(np.float32)

# Chunk and serialize for efficient storage
chunk_size = (256, 256, 256)
serialized_chunks = fastremap.tobytes(data, chunk_size, order="F")

# Store chunks efficiently
metadata = {
    'original_shape': data.shape,
    'chunk_size': chunk_size,
    'dtype': data.dtype,
    'order': 'F',
    'num_chunks': len(serialized_chunks)
}

# Save metadata and chunks
with open('data_metadata.pkl', 'wb') as f:
    pickle.dump(metadata, f)

for i, chunk in enumerate(serialized_chunks):
    with open(f'chunk_{i:04d}.bin', 'wb') as f:
        f.write(chunk)
```
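Reading the data back relies on the stored metadata: each chunk's bytes can be decoded with `np.frombuffer` and reshaped using the recorded dtype and order. A round-trip sketch for a single cutout (the shape here is illustrative):

```python
import numpy as np

chunk_shape = (4, 4, 4)
original = np.arange(64, dtype=np.float32).reshape(chunk_shape)

# Serialize one cutout's worth of data in Fortran order
raw = original.tobytes(order="F")

# Deserialize: frombuffer yields a flat array; reshape with the stored order
restored = np.frombuffer(raw, dtype=np.float32).reshape(chunk_shape, order="F")
assert np.array_equal(restored, original)
```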

#### Memory Layout Considerations

```python
import fastremap
import numpy as np

# For C-contiguous arrays, use C order for best performance
c_array = np.random.random((100, 200, 300)).astype(np.float32, order='C')
c_chunks = fastremap.tobytes(c_array, (50, 50, 50), order="C")  # Optimal

# For F-contiguous arrays, use F order for best performance
f_array = np.random.random((100, 200, 300)).astype(np.float32, order='F')
f_chunks = fastremap.tobytes(f_array, (50, 50, 50), order="F")  # Optimal

# Mixed orders work but may be slower
mixed_chunks = fastremap.tobytes(c_array, (50, 50, 50), order="F")  # Suboptimal
```
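When the array's layout is not known in advance, the contiguity flags can be checked to pick the matching order. A small helper sketch (`best_order` is illustrative, not part of fastremap):

```python
import numpy as np

def best_order(arr: np.ndarray) -> str:
    # Prefer the order that matches the array's memory layout;
    # 1-D and empty arrays are both C- and F-contiguous, so default to C
    if arr.flags["F_CONTIGUOUS"] and not arr.flags["C_CONTIGUOUS"]:
        return "F"
    return "C"

c_array = np.zeros((4, 5, 6), dtype=np.float32)
f_array = np.asfortranarray(c_array)
# best_order(c_array) -> "C", best_order(f_array) -> "F"
```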

## Types

```python { .api }
NDArray = np.ndarray
```