# Buffer Operations

Advanced buffer management for zero-copy operations, efficient batch processing, and high-performance data handling in compression and decompression workflows.

## Capabilities

### Buffer Segments

Individual buffer segments that provide efficient access to portions of larger buffers without copying data.
```python { .api }
class BufferSegment:
    @property
    def offset(self) -> int:
        """Offset of this segment within the parent buffer."""

    def __len__(self) -> int:
        """Get segment length in bytes."""

    def tobytes(self) -> bytes:
        """
        Convert segment to bytes.

        Returns:
            bytes: Copy of segment data
        """
```
**Usage Example:**

```python
import zstandard as zstd

# Buffer segments are typically returned by batch compression operations
compressor = zstd.ZstdCompressor()
result = compressor.multi_compress_to_buffer([b"data1", b"data2", b"data3"])

# Access individual segments
for i, segment in enumerate(result):
    print(f"Segment {i}: offset={segment.offset}, length={len(segment)}")
    data = segment.tobytes()
    process_data(data)  # placeholder for application-specific handling
```
### Buffer Collections

Collections of buffer segments that provide efficient iteration and access patterns.
```python { .api }
class BufferSegments:
    def __len__(self) -> int:
        """Get number of segments in collection."""

    def __getitem__(self, i: int) -> BufferSegment:
        """
        Get segment by index.

        Parameters:
        - i: int, segment index

        Returns:
            BufferSegment: Segment at index
        """
```
**Usage Example:**

```python
import zstandard as zstd

# Segment collections are returned by batch operations
compressor = zstd.ZstdCompressor()
result = compressor.multi_compress_to_buffer([b"data1", b"data2"])

# Iterate over segments
for segment in result:
    data = segment.tobytes()
    print(f"Segment data: {len(data)} bytes")

# Access by index
first_segment = result[0]
second_segment = result[1]
```
### Buffers with Segments

Buffers that contain multiple segments, providing both the raw data and segment boundary information.
```python { .api }
class BufferWithSegments:
    @property
    def size(self) -> int:
        """Total buffer size in bytes."""

    def __init__(self, data: bytes, segments: bytes):
        """
        Create buffer with segment information.

        Parameters:
        - data: bytes, raw buffer data
        - segments: bytes, segment boundary information
        """

    def __len__(self) -> int:
        """Get number of segments."""

    def __getitem__(self, i: int) -> BufferSegment:
        """
        Get segment by index.

        Parameters:
        - i: int, segment index

        Returns:
            BufferSegment: Segment at index
        """

    def segments(self):
        """Get segments iterator."""

    def tobytes(self) -> bytes:
        """
        Convert entire buffer to bytes.

        Returns:
            bytes: Complete buffer data
        """
```
**Usage Example:**

```python
import zstandard as zstd

# Create buffer with segments manually (advanced usage)
data = b"concatenated data from multiple sources"
# segments holds the boundary descriptors as packed 64-bit
# (offset, length) pairs; see the library documentation for the exact layout
segments = b"..."  # segment boundary data

buffer = zstd.BufferWithSegments(data, segments)

print(f"Buffer size: {buffer.size} bytes")
print(f"Number of segments: {len(buffer)}")

# Access segments
for i in range(len(buffer)):
    segment = buffer[i]
    segment_data = segment.tobytes()
    print(f"Segment {i}: {len(segment_data)} bytes")

# Get all data
all_data = buffer.tobytes()
```
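The boundary data itself can be assembled with `struct`. A hedged sketch, assuming the 16-byte packed pair-of-uint64 `(offset, length)` entry layout described in the python-zstandard documentation (native endianness assumed here); the `items` values are hypothetical:

```python
import struct

# Hypothetical items to concatenate into one backing buffer
items = [b"first", b"second!", b"xyz"]

data = b"".join(items)

# Build one 16-byte descriptor per item: (offset, length) as two
# unsigned 64-bit integers (endianness assumed native, "=QQ")
segments = b""
offset = 0
for item in items:
    segments += struct.pack("=QQ", offset, len(item))
    offset += len(item)

assert len(segments) == 16 * len(items)

# With python-zstandard's C backend available, the buffer could then
# be constructed without copying the item payloads:
#   import zstandard as zstd
#   buffer = zstd.BufferWithSegments(data, segments)
```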
### Buffer with Segments Collections

Collections of multiple buffers with segments, used for batch operations and efficient data management.
```python { .api }
class BufferWithSegmentsCollection:
    def __init__(self, *args):
        """
        Create collection of buffers with segments.

        Parameters:
        - *args: BufferWithSegments objects
        """

    def __len__(self) -> int:
        """Get total number of segments across all buffers."""

    def __getitem__(self, i: int) -> BufferSegment:
        """
        Get segment by global index across all buffers.

        Parameters:
        - i: int, global segment index

        Returns:
            BufferSegment: Segment at index
        """

    def size(self) -> int:
        """
        Get total size of all buffers.

        Returns:
            int: Total size in bytes
        """
```
**Usage Example:**

```python
import zstandard as zstd

# Collections are typically returned by multi-threaded batch operations
compressor = zstd.ZstdCompressor()
data_items = [b"item1", b"item2", b"item3", b"item4"]

# Multi-compress returns a collection
collection = compressor.multi_compress_to_buffer(data_items, threads=2)

print(f"Collection size: {collection.size()} bytes")
print(f"Number of items: {len(collection)}")

# Access compressed items
for i in range(len(collection)):
    segment = collection[i]
    compressed_data = segment.tobytes()
    print(f"Item {i}: {len(compressed_data)} bytes compressed")
```
### Batch Compression with Buffers

Efficient batch compression that returns results in buffer collections for optimal memory usage.
```python { .api }
class ZstdCompressor:
    def multi_compress_to_buffer(
        self,
        data,
        threads: int = 0
    ) -> BufferWithSegmentsCollection:
        """
        Compress multiple data items to buffer collection.

        Parameters:
        - data: list[bytes], BufferWithSegments, or BufferWithSegmentsCollection
        - threads: int, number of threads (0 or 1 = no threading; negative = logical CPU count)

        Returns:
            BufferWithSegmentsCollection: Compressed data in buffer collection
        """
```
**Usage Example:**

```python
import zstandard as zstd

compressor = zstd.ZstdCompressor(level=5)

# Prepare data for batch compression
documents = [
    b'{"id": 1, "text": "First document"}',
    b'{"id": 2, "text": "Second document"}',
    b'{"id": 3, "text": "Third document"}',
    b'{"id": 4, "text": "Fourth document"}',
]

# Compress in parallel
result = compressor.multi_compress_to_buffer(documents, threads=4)

# Process results efficiently
total_original = sum(len(doc) for doc in documents)
total_compressed = result.size()

print(f"Compressed {total_original} bytes to {total_compressed} bytes")
print(f"Compression ratio: {total_original / total_compressed:.2f}:1")

# Extract individual compressed documents
compressed_docs = []
for i in range(len(result)):
    segment = result[i]
    compressed_docs.append(segment.tobytes())
```
### Batch Decompression with Buffers

Efficient batch decompression using buffer collections for high-throughput processing.
```python { .api }
class ZstdDecompressor:
    def multi_decompress_to_buffer(
        self,
        frames,
        decompressed_sizes=None,
        threads: int = 0
    ) -> BufferWithSegmentsCollection:
        """
        Decompress multiple frames to buffer collection.

        Parameters:
        - frames: list[bytes], BufferWithSegments, or BufferWithSegmentsCollection
        - decompressed_sizes: bytes, packed 64-bit integers giving the expected decompressed size of each frame (optional optimization)
        - threads: int, number of threads (0 or 1 = no threading; negative = logical CPU count)

        Returns:
            BufferWithSegmentsCollection: Decompressed data in buffer collection
        """
```
**Usage Example:**

```python
import zstandard as zstd

decompressor = zstd.ZstdDecompressor()

# Compressed frames from the previous example
compressed_frames = compressed_docs

# Decompress in parallel
result = decompressor.multi_decompress_to_buffer(compressed_frames, threads=4)

print(f"Decompressed {len(compressed_frames)} frames")
print(f"Total decompressed size: {result.size()} bytes")

# Extract decompressed data
decompressed_docs = []
for i in range(len(result)):
    segment = result[i]
    decompressed_docs.append(segment.tobytes())

# Verify round-trip
for i, (original, decompressed) in enumerate(zip(documents, decompressed_docs)):
    assert original == decompressed, f"Mismatch in document {i}"
```
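The `decompressed_sizes` hint can likewise be built with `struct`. A minimal sketch, assuming one packed unsigned 64-bit integer per frame with native endianness; the `frame_sizes` values here are hypothetical:

```python
import struct

# Known decompressed sizes of each frame (hypothetical values)
frame_sizes = [35, 36, 35, 36]

# Pack one unsigned 64-bit integer per frame (endianness assumed native)
decompressed_sizes = struct.pack("=" + "Q" * len(frame_sizes), *frame_sizes)

assert len(decompressed_sizes) == 8 * len(frame_sizes)

# The hint lets the decompressor pre-allocate output buffers:
#   result = decompressor.multi_decompress_to_buffer(
#       compressed_frames,
#       decompressed_sizes=decompressed_sizes,
#       threads=4,
#   )
```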
### Zero-Copy Operations

Advanced usage patterns that minimize memory copying for maximum performance.
**Usage Example:**

```python
import zstandard as zstd

def process_large_dataset(data_items):
    """Process large dataset with minimal memory copying."""
    compressor = zstd.ZstdCompressor(level=3)

    # Compress in batches to manage memory
    batch_size = 1000
    all_results = []

    for i in range(0, len(data_items), batch_size):
        batch = data_items[i:i + batch_size]

        # Multi-compress returns BufferWithSegmentsCollection
        compressed_batch = compressor.multi_compress_to_buffer(batch, threads=4)
        all_results.append(compressed_batch)

        # Process segments without copying unless necessary
        for j in range(len(compressed_batch)):
            segment = compressed_batch[j]

            # Only copy if we need to persist the data
            if need_to_store(j):  # application-defined predicate
                data = segment.tobytes()
                store_data(i + j, data)  # application-defined persistence
            else:
                # Use segment directly for temporary operations
                process_segment_in_place(segment)  # application-defined

    return all_results

def stream_compress_with_buffers(input_stream, output_stream):
    """Stream compression using buffers for efficiency."""
    compressor = zstd.ZstdCompressor()

    # Read chunks and compress in batches
    chunks = []
    chunk_size = 64 * 1024  # 64 KB chunks

    while True:
        chunk = input_stream.read(chunk_size)
        if not chunk:
            break

        chunks.append(chunk)

        # Process in batches of 100 chunks
        if len(chunks) >= 100:
            result = compressor.multi_compress_to_buffer(chunks, threads=2)

            # Write compressed data
            for i in range(len(result)):
                segment = result[i]
                output_stream.write(segment.tobytes())

            chunks = []

    # Process remaining chunks
    if chunks:
        result = compressor.multi_compress_to_buffer(chunks, threads=2)
        for i in range(len(result)):
            segment = result[i]
            output_stream.write(segment.tobytes())
```
### Memory Management

Buffer operations provide efficient memory usage patterns for high-performance applications.
**Memory Usage Example:**

```python
import zstandard as zstd

def analyze_buffer_memory():
    """Analyze memory usage of buffer operations."""
    compressor = zstd.ZstdCompressor()

    # Large dataset
    data = [b"x" * 1024 for _ in range(1000)]  # 1000 x 1 KB items

    print(f"Original data: {sum(len(item) for item in data)} bytes")
    print(f"Compressor memory: {compressor.memory_size()} bytes")

    # Compress to buffer collection
    result = compressor.multi_compress_to_buffer(data, threads=4)

    print(f"Compressed size: {result.size()} bytes")
    print(f"Number of segments: {len(result)}")

    # Efficient iteration without copying
    for i, segment in enumerate(result):
        # segment.tobytes() copies data - avoid if possible
        size = len(segment)  # no copy required
        offset = segment.offset  # no copy required

        if i < 5:  # show first few
            print(f"Segment {i}: size={size}, offset={offset}")
```
## Performance Considerations

- Buffer operations minimize memory copying for better performance
- Multi-threaded operations return buffer collections for efficient parallel processing
- Segments provide zero-copy access to portions of larger buffers
- Use `tobytes()` only when you need a copy of the data
- Buffer collections enable efficient batch processing of large datasets
- Memory usage is optimized for high-throughput scenarios
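The zero-copy principle behind segments can be illustrated with plain `memoryview` slices, independent of zstandard: slicing a memoryview references the underlying buffer rather than copying it, and only an explicit `tobytes()` materializes a copy.

```python
# Zero-copy illustration with memoryview (no zstandard required):
# slicing a memoryview references the underlying bytes without copying.
buffer = b"concatenated segment data"
view = memoryview(buffer)

# A "segment" of the buffer: no data is copied here
segment = view[13:20]

print(len(segment))            # length, no copy required
print(segment.obj is buffer)   # the slice shares the original backing object

# Only an explicit conversion materializes a copy
copied = segment.tobytes()
print(copied)
```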