# Buffer Operations

Advanced buffer management for zero-copy operations, efficient batch processing, and high-performance data handling in compression and decompression workflows.

## Capabilities

### Buffer Segments

Individual buffer segments that provide efficient access to portions of larger buffers without copying data.
```python { .api }
class BufferSegment:
    @property
    def offset(self) -> int:
        """Offset of this segment within the parent buffer."""

    def __len__(self) -> int:
        """Get segment length in bytes."""

    def tobytes(self) -> bytes:
        """
        Convert segment to bytes.

        Returns:
            bytes: Copy of segment data
        """
```
**Usage Example:**

```python
import zstandard as zstd

# Buffer segments are typically returned by batch compression operations
compressor = zstd.ZstdCompressor()
result = compressor.multi_compress_to_buffer([b"data1", b"data2", b"data3"])

# Access individual segments
for i, segment in enumerate(result):
    print(f"Segment {i}: offset={segment.offset}, length={len(segment)}")
    data = segment.tobytes()
    process_data(data)  # placeholder for application-specific handling
```
### Buffer Collections

Collections of buffer segments that provide efficient iteration and access patterns.
```python { .api }
class BufferSegments:
    def __len__(self) -> int:
        """Get number of segments in collection."""

    def __getitem__(self, i: int) -> BufferSegment:
        """
        Get segment by index.

        Parameters:
        - i: int, segment index

        Returns:
            BufferSegment: Segment at index
        """
```
**Usage Example:**

```python
import zstandard as zstd

# Segment collections are returned by batch operations
compressor = zstd.ZstdCompressor()
result = compressor.multi_compress_to_buffer([b"data1", b"data2"])

# Iterate over segments
for segment in result:
    data = segment.tobytes()
    print(f"Segment data: {len(data)} bytes")

# Access by index
first_segment = result[0]
second_segment = result[1]
```
### Buffers with Segments

Buffers that contain multiple segments, providing both the raw data and segment boundary information.
```python { .api }
class BufferWithSegments:
    @property
    def size(self) -> int:
        """Total buffer size in bytes."""

    def __init__(self, data: bytes, segments: bytes):
        """
        Create buffer with segment information.

        Parameters:
        - data: bytes, raw buffer data
        - segments: bytes, segment boundary information
        """

    def __len__(self) -> int:
        """Get number of segments."""

    def __getitem__(self, i: int) -> BufferSegment:
        """
        Get segment by index.

        Parameters:
        - i: int, segment index

        Returns:
            BufferSegment: Segment at index
        """

    def segments(self):
        """Get segments iterator."""

    def tobytes(self) -> bytes:
        """
        Convert entire buffer to bytes.

        Returns:
            bytes: Complete buffer data
        """
```
**Usage Example:**

```python
import zstandard as zstd

# Create buffer with segments manually (advanced usage)
data = b"concatenated data from multiple sources"
# segments holds the boundary descriptors as packed 64-bit
# (offset, length) pairs; see the library documentation for the exact layout
segments = b"..."  # segment boundary data

buffer = zstd.BufferWithSegments(data, segments)

print(f"Buffer size: {buffer.size} bytes")
print(f"Number of segments: {len(buffer)}")

# Access segments
for i in range(len(buffer)):
    segment = buffer[i]
    segment_data = segment.tobytes()
    print(f"Segment {i}: {len(segment_data)} bytes")

# Get all data
all_data = buffer.tobytes()
```
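The boundary data itself can be assembled with `struct`. A hedged sketch, assuming the 16-byte packed pair-of-uint64 `(offset, length)` entry layout described in the python-zstandard documentation (native endianness assumed here); the `items` values are hypothetical:

```python
import struct

# Hypothetical items to concatenate into one backing buffer
items = [b"first", b"second!", b"xyz"]

data = b"".join(items)

# Build one 16-byte descriptor per item: (offset, length) as two
# unsigned 64-bit integers (endianness assumed native, "=QQ")
segments = b""
offset = 0
for item in items:
    segments += struct.pack("=QQ", offset, len(item))
    offset += len(item)

assert len(segments) == 16 * len(items)

# With python-zstandard's C backend available, the buffer could then
# be constructed without copying the item payloads:
#   import zstandard as zstd
#   buffer = zstd.BufferWithSegments(data, segments)
```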
### Buffer with Segments Collections

Collections of multiple buffers with segments, used for batch operations and efficient data management.
```python { .api }
class BufferWithSegmentsCollection:
    def __init__(self, *args):
        """
        Create collection of buffers with segments.

        Parameters:
        - *args: BufferWithSegments objects
        """

    def __len__(self) -> int:
        """Get total number of segments across all buffers."""

    def __getitem__(self, i: int) -> BufferSegment:
        """
        Get segment by global index across all buffers.

        Parameters:
        - i: int, global segment index

        Returns:
            BufferSegment: Segment at index
        """

    def size(self) -> int:
        """
        Get total size of all buffers.

        Returns:
            int: Total size in bytes
        """
```
**Usage Example:**

```python
import zstandard as zstd

# Collections are typically returned by multi-threaded batch operations
compressor = zstd.ZstdCompressor()
data_items = [b"item1", b"item2", b"item3", b"item4"]

# Multi-compress returns a collection
collection = compressor.multi_compress_to_buffer(data_items, threads=2)

print(f"Collection size: {collection.size()} bytes")
print(f"Number of items: {len(collection)}")

# Access compressed items
for i in range(len(collection)):
    segment = collection[i]
    compressed_data = segment.tobytes()
    print(f"Item {i}: {len(compressed_data)} bytes compressed")
```
### Batch Compression with Buffers

Efficient batch compression that returns results in buffer collections for optimal memory usage.
```python { .api }
class ZstdCompressor:
    def multi_compress_to_buffer(
        self,
        data,
        threads: int = 0
    ) -> BufferWithSegmentsCollection:
        """
        Compress multiple data items to buffer collection.

        Parameters:
        - data: list[bytes], BufferWithSegments, or BufferWithSegmentsCollection
        - threads: int, number of threads (0 or 1 = no threading; negative = logical CPU count)

        Returns:
            BufferWithSegmentsCollection: Compressed data in buffer collection
        """
```
**Usage Example:**

```python
import zstandard as zstd

compressor = zstd.ZstdCompressor(level=5)

# Prepare data for batch compression
documents = [
    b'{"id": 1, "text": "First document"}',
    b'{"id": 2, "text": "Second document"}',
    b'{"id": 3, "text": "Third document"}',
    b'{"id": 4, "text": "Fourth document"}',
]

# Compress in parallel
result = compressor.multi_compress_to_buffer(documents, threads=4)

# Process results efficiently
total_original = sum(len(doc) for doc in documents)
total_compressed = result.size()

print(f"Compressed {total_original} bytes to {total_compressed} bytes")
print(f"Compression ratio: {total_original / total_compressed:.2f}:1")

# Extract individual compressed documents
compressed_docs = []
for i in range(len(result)):
    segment = result[i]
    compressed_docs.append(segment.tobytes())
```
### Batch Decompression with Buffers

Efficient batch decompression using buffer collections for high-throughput processing.
```python { .api }
class ZstdDecompressor:
    def multi_decompress_to_buffer(
        self,
        frames,
        decompressed_sizes=None,
        threads: int = 0
    ) -> BufferWithSegmentsCollection:
        """
        Decompress multiple frames to buffer collection.

        Parameters:
        - frames: list[bytes], BufferWithSegments, or BufferWithSegmentsCollection
        - decompressed_sizes: bytes, packed 64-bit integers giving the expected decompressed size of each frame (optional optimization)
        - threads: int, number of threads (0 or 1 = no threading; negative = logical CPU count)

        Returns:
            BufferWithSegmentsCollection: Decompressed data in buffer collection
        """
```
**Usage Example:**

```python
import zstandard as zstd

decompressor = zstd.ZstdDecompressor()

# Compressed frames from the previous example
compressed_frames = compressed_docs

# Decompress in parallel
result = decompressor.multi_decompress_to_buffer(compressed_frames, threads=4)

print(f"Decompressed {len(compressed_frames)} frames")
print(f"Total decompressed size: {result.size()} bytes")

# Extract decompressed data
decompressed_docs = []
for i in range(len(result)):
    segment = result[i]
    decompressed_docs.append(segment.tobytes())

# Verify round-trip
for i, (original, decompressed) in enumerate(zip(documents, decompressed_docs)):
    assert original == decompressed, f"Mismatch in document {i}"
```
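The `decompressed_sizes` hint can likewise be built with `struct`. A minimal sketch, assuming one packed unsigned 64-bit integer per frame with native endianness; the `frame_sizes` values here are hypothetical:

```python
import struct

# Known decompressed sizes of each frame (hypothetical values)
frame_sizes = [35, 36, 35, 36]

# Pack one unsigned 64-bit integer per frame (endianness assumed native)
decompressed_sizes = struct.pack("=" + "Q" * len(frame_sizes), *frame_sizes)

assert len(decompressed_sizes) == 8 * len(frame_sizes)

# The hint lets the decompressor pre-allocate output buffers:
#   result = decompressor.multi_decompress_to_buffer(
#       compressed_frames,
#       decompressed_sizes=decompressed_sizes,
#       threads=4,
#   )
```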
### Zero-Copy Operations

Advanced usage patterns that minimize memory copying for maximum performance.
**Usage Example:**

```python
import zstandard as zstd

def process_large_dataset(data_items):
    """Process large dataset with minimal memory copying."""
    compressor = zstd.ZstdCompressor(level=3)

    # Compress in batches to manage memory
    batch_size = 1000
    all_results = []

    for i in range(0, len(data_items), batch_size):
        batch = data_items[i:i + batch_size]

        # Multi-compress returns BufferWithSegmentsCollection
        compressed_batch = compressor.multi_compress_to_buffer(batch, threads=4)
        all_results.append(compressed_batch)

        # Process segments without copying unless necessary
        for j in range(len(compressed_batch)):
            segment = compressed_batch[j]

            # Only copy if we need to persist the data
            if need_to_store(j):  # application-defined predicate
                data = segment.tobytes()
                store_data(i + j, data)  # application-defined persistence
            else:
                # Use segment directly for temporary operations
                process_segment_in_place(segment)  # application-defined

    return all_results

def stream_compress_with_buffers(input_stream, output_stream):
    """Stream compression using buffers for efficiency."""
    compressor = zstd.ZstdCompressor()

    # Read chunks and compress in batches
    chunks = []
    chunk_size = 64 * 1024  # 64 KB chunks

    while True:
        chunk = input_stream.read(chunk_size)
        if not chunk:
            break

        chunks.append(chunk)

        # Process in batches of 100 chunks
        if len(chunks) >= 100:
            result = compressor.multi_compress_to_buffer(chunks, threads=2)

            # Write compressed data
            for i in range(len(result)):
                segment = result[i]
                output_stream.write(segment.tobytes())

            chunks = []

    # Process remaining chunks
    if chunks:
        result = compressor.multi_compress_to_buffer(chunks, threads=2)
        for i in range(len(result)):
            segment = result[i]
            output_stream.write(segment.tobytes())
```
### Memory Management

Buffer operations provide efficient memory usage patterns for high-performance applications.
**Memory Usage Example:**

```python
import zstandard as zstd

def analyze_buffer_memory():
    """Analyze memory usage of buffer operations."""
    compressor = zstd.ZstdCompressor()

    # Large dataset
    data = [b"x" * 1024 for _ in range(1000)]  # 1000 x 1 KB items

    print(f"Original data: {sum(len(item) for item in data)} bytes")
    print(f"Compressor memory: {compressor.memory_size()} bytes")

    # Compress to buffer collection
    result = compressor.multi_compress_to_buffer(data, threads=4)

    print(f"Compressed size: {result.size()} bytes")
    print(f"Number of segments: {len(result)}")

    # Efficient iteration without copying
    for i, segment in enumerate(result):
        # segment.tobytes() copies data - avoid if possible
        size = len(segment)  # no copy required
        offset = segment.offset  # no copy required

        if i < 5:  # show first few
            print(f"Segment {i}: size={size}, offset={offset}")
```
## Performance Considerations

- Buffer operations minimize memory copying for better performance
- Multi-threaded operations return buffer collections for efficient parallel processing
- Segments provide zero-copy access to portions of larger buffers
- Use `tobytes()` only when you need a copy of the data
- Buffer collections enable efficient batch processing of large datasets
- Memory usage is optimized for high-throughput scenarios
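The zero-copy principle behind segments can be illustrated with plain `memoryview` slices, independent of zstandard: slicing a memoryview references the underlying buffer rather than copying it, and only an explicit `tobytes()` materializes a copy.

```python
# Zero-copy illustration with memoryview (no zstandard required):
# slicing a memoryview references the underlying bytes without copying.
buffer = b"concatenated segment data"
view = memoryview(buffer)

# A "segment" of the buffer: no data is copied here
segment = view[13:20]

print(len(segment))            # length, no copy required
print(segment.obj is buffer)   # the slice shares the original backing object

# Only an explicit conversion materializes a copy
copied = segment.tobytes()
print(copied)
```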