Tessl Tile for pypi/zstandard@0.24.0

# Frame Analysis

Utilities for inspecting zstd frames and extracting header metadata without decompressing the payload, so frames can be examined and validated cheaply.

## Capabilities

### Frame Content Size

Read the original (uncompressed) content size from a zstd frame header without decompressing the data.

```python { .api }
def frame_content_size(data: bytes) -> int:
    """
    Get the original content size from a zstd frame.

    Parameters:
    - data: bytes, zstd frame data (at least frame header)

    Returns:
    int: Original content size in bytes, or -1 if the content size
         is not stored in the frame

    Raises:
    ZstdError: Invalid frame or unable to determine size
    """
```

**Usage Example:**

```python
import zstandard as zstd

# Compress with the content size written into the frame header
compressor = zstd.ZstdCompressor(write_content_size=True)
original_data = b"Hello, World!" * 1000
compressed = compressor.compress(original_data)

# Get content size without decompressing
try:
    content_size = zstd.frame_content_size(compressed)
except zstd.ZstdError as exc:
    # Raised when the data is not a readable zstd frame
    print(f"Error reading frame: {exc}")
else:
    if content_size == -1:
        # -1 means the size was not stored in the frame header
        print("Content size not stored in frame")
    else:
        print(f"Original size: {content_size} bytes")
        print(f"Compressed size: {len(compressed)} bytes")
        print(f"Compression ratio: {content_size / len(compressed):.2f}:1")
```

### Frame Header Size

Get the size of a zstd frame header to skip to the compressed payload.

```python { .api }
def frame_header_size(data: bytes) -> int:
    """
    Get the size of a zstd frame header.

    Parameters:
    - data: bytes, zstd frame data (at least frame header)

    Returns:
    int: Frame header size in bytes
    """
```

**Usage Example:**

```python
import zstandard as zstd

# Sample zstd compressed data; substitute your own frame here
compressed_data = zstd.ZstdCompressor().compress(b"Example payload" * 100)

# Get header size
header_size = zstd.frame_header_size(compressed_data)
print(f"Frame header size: {header_size} bytes")

# Split header and payload (the payload holds the data blocks
# and, if enabled, a trailing 4-byte content checksum)
header = compressed_data[:header_size]
payload = compressed_data[header_size:]

print(f"Header: {len(header)} bytes")
print(f"Payload: {len(payload)} bytes")
```

### Frame Parameters

Extract detailed parameters and metadata from a zstd frame header.

```python { .api }
def get_frame_parameters(data: bytes, format: int = FORMAT_ZSTD1) -> FrameParameters:
    """
    Extract frame parameters from zstd frame header.

    Parameters:
    - data: bytes, zstd frame data (at least frame header)
    - format: int, expected frame format (FORMAT_ZSTD1, FORMAT_ZSTD1_MAGICLESS)

    Returns:
    FrameParameters: Object containing frame metadata
    """

class FrameParameters:
    """Container for zstd frame parameters and metadata."""

    @property
    def content_size(self) -> int:
        """Original content size (CONTENTSIZE_UNKNOWN if not stored in the frame)."""

    @property
    def window_size(self) -> int:
        """Window size used for compression."""

    @property
    def dict_id(self) -> int:
        """Dictionary ID (0 if no dictionary)."""

    @property
    def has_checksum(self) -> bool:
        """Whether frame includes content checksum."""
```

**Usage Example:**

```python
import zstandard as zstd

# Create compressed data with various options
compressor = zstd.ZstdCompressor(
    level=5,
    write_content_size=True,
    write_checksum=True,
    write_dict_id=True
)

data = b"Sample data for frame analysis"
compressed = compressor.compress(data)

# Analyze frame parameters
params = zstd.get_frame_parameters(compressed)

print(f"Content size: {params.content_size}")
print(f"Window size: {params.window_size}")
print(f"Dictionary ID: {params.dict_id}")
print(f"Has checksum: {params.has_checksum}")

# Validate expectations
assert params.content_size == len(data)
assert params.has_checksum
```

### Frame Format Detection

Handle different zstd frame formats, including standard and magicless frames. A magicless frame omits the 4-byte magic number, so the expected format must be passed explicitly when analyzing it.

**Usage Example:**

```python
import zstandard as zstd

# Standard frame with magic number
standard_compressor = zstd.ZstdCompressor()
standard_compressed = standard_compressor.compress(b"Standard frame data")

# Magicless frame
magicless_cparams = zstd.ZstdCompressionParameters(format=zstd.FORMAT_ZSTD1_MAGICLESS)
magicless_compressor = zstd.ZstdCompressor(compression_params=magicless_cparams)
magicless_compressed = magicless_compressor.compress(b"Magicless frame data")

# Analyze different formats
standard_params = zstd.get_frame_parameters(standard_compressed, zstd.FORMAT_ZSTD1)
magicless_params = zstd.get_frame_parameters(magicless_compressed, zstd.FORMAT_ZSTD1_MAGICLESS)

print("Standard frame:")
print(f"  Content size: {standard_params.content_size}")
print(f"  Window size: {standard_params.window_size}")

print("Magicless frame:")
print(f"  Content size: {magicless_params.content_size}")
print(f"  Window size: {magicless_params.window_size}")
```

### Multi-Frame Analysis

Analyze compressed data containing multiple zstd frames. The frame header does not record the compressed length of the frame, so the example finds each frame boundary by walking the 3-byte block headers defined in the zstd frame format (RFC 8878) rather than decompressing.

**Usage Example:**

```python
import zstandard as zstd

def frame_compressed_size(data: bytes, header_size: int, has_checksum: bool) -> int:
    """Return the total size in bytes of the frame at the start of data.

    Walks the block headers instead of decompressing.
    Skippable frames are not handled.
    """
    pos = header_size
    while True:
        if pos + 3 > len(data):
            raise ValueError("Truncated block header")
        block_header = int.from_bytes(data[pos:pos + 3], "little")
        last_block = block_header & 1
        block_type = (block_header >> 1) & 3  # 0=raw, 1=RLE, 2=compressed
        block_size = block_header >> 3
        if block_type == 3:
            raise ValueError("Reserved block type")
        # An RLE block stores a single byte regardless of its expanded size
        pos += 3 + (1 if block_type == 1 else block_size)
        if last_block:
            break
    if has_checksum:
        pos += 4  # trailing 32-bit content checksum
    if pos > len(data):
        raise ValueError("Truncated frame")
    return pos

def analyze_multi_frame_data(data: bytes):
    """Analyze compressed data that may contain multiple frames."""
    frames = []
    offset = 0

    while offset < len(data):
        try:
            remaining_data = data[offset:]
            params = zstd.get_frame_parameters(remaining_data)
            header_size = zstd.frame_header_size(remaining_data)
            frame_size = frame_compressed_size(
                remaining_data, header_size, params.has_checksum
            )
        except (zstd.ZstdError, ValueError) as e:
            print(f"Error analyzing frame at offset {offset}: {e}")
            break

        frames.append({
            'offset': offset,
            'frame_size': frame_size,
            'header_size': header_size,
            'content_size': params.content_size,
            'window_size': params.window_size,
            'dict_id': params.dict_id,
            'has_checksum': params.has_checksum
        })
        offset += frame_size

    return frames

# Example usage
compressor = zstd.ZstdCompressor(write_content_size=True)
frame1 = compressor.compress(b"First frame data")
frame2 = compressor.compress(b"Second frame data")
frame3 = compressor.compress(b"Third frame data")

multi_frame_data = frame1 + frame2 + frame3
frames = analyze_multi_frame_data(multi_frame_data)

for i, frame in enumerate(frames):
    print(f"Frame {i+1}:")
    print(f"  Offset: {frame['offset']}")
    print(f"  Header size: {frame['header_size']}")
    print(f"  Content size: {frame['content_size']}")
    print(f"  Window size: {frame['window_size']}")
```

### Frame Validation

Validate a frame's header and format without full decompression. These checks cover the header only; they do not verify the compressed payload or the content checksum.

**Usage Example:**

```python
import zstandard as zstd

def validate_frame(data: bytes) -> dict:
    """Validate a zstd frame header and return analysis results."""
    result = {
        'valid': False,
        'error': None,
        'analysis': None
    }

    # Check minimum size
    if len(data) < 4:
        result['error'] = "Data too short for zstd frame"
        return result

    # Check magic number
    if data[:4] != zstd.FRAME_HEADER:
        result['error'] = "Invalid zstd magic number"
        return result

    try:
        # Both calls raise ZstdError if the header cannot be parsed
        params = zstd.get_frame_parameters(data)
        header_size = zstd.frame_header_size(data)
    except zstd.ZstdError as e:
        result['error'] = str(e)
        return result

    if header_size > len(data):
        result['error'] = f"Invalid header size: {header_size}"
        return result

    result['valid'] = True
    result['analysis'] = {
        'header_size': header_size,
        'content_size': params.content_size,
        'window_size': params.window_size,
        'dict_id': params.dict_id,
        'has_checksum': params.has_checksum,
        'total_size': len(data)
    }

    return result

# Example usage
compressor = zstd.ZstdCompressor(write_checksum=True)
valid_data = compressor.compress(b"Valid frame data")
invalid_data = b"Invalid frame data"

# Validate frames
valid_result = validate_frame(valid_data)
invalid_result = validate_frame(invalid_data)

print("Valid frame:", valid_result['valid'])
if valid_result['valid']:
    analysis = valid_result['analysis']
    print(f"  Header size: {analysis['header_size']}")
    print(f"  Content size: {analysis['content_size']}")
    print(f"  Has checksum: {analysis['has_checksum']}")

print("Invalid frame:", invalid_result['valid'])
if not invalid_result['valid']:
    print(f"  Error: {invalid_result['error']}")
```

### Decompression Context Estimation

Estimate memory requirements for decompression without actually decompressing.

```python { .api }
def estimate_decompression_context_size() -> int:
    """
    Estimate memory usage for decompression context.

    Returns:
    int: Estimated memory usage in bytes
    """
```

**Usage Example:**

```python
import zstandard as zstd

# Estimate memory usage
estimated_memory = zstd.estimate_decompression_context_size()
print(f"Estimated decompression context size: {estimated_memory} bytes")

# Use for memory planning
def plan_decompression(compressed_frames: list[bytes]) -> dict:
    """Plan memory usage for batch decompression."""
    base_memory = zstd.estimate_decompression_context_size()

    total_compressed = sum(len(frame) for frame in compressed_frames)
    total_content_size = 0

    for frame in compressed_frames:
        try:
            content_size = zstd.frame_content_size(frame)
        except zstd.ZstdError:
            content_size = -1  # treat an unreadable header like an unknown size
        if content_size >= 0:
            total_content_size += content_size
        else:
            # Estimate if content size unknown
            total_content_size += len(frame) * 4  # rough estimate

    return {
        'base_memory': base_memory,
        'total_compressed': total_compressed,
        'estimated_decompressed': total_content_size,
        'peak_memory_estimate': base_memory + total_content_size
    }

# Example
compressor = zstd.ZstdCompressor()
frames = [compressor.compress(text * 100) for text in (b"alpha", b"beta", b"gamma")]
plan = plan_decompression(frames)
print(f"Peak memory estimate: {plan['peak_memory_estimate']} bytes")
```

## Constants

Frame analysis uses several constants for special values and format identification:

```python { .api }
# Content size special values
CONTENTSIZE_UNKNOWN: int  # Content size not stored in frame
CONTENTSIZE_ERROR: int    # Error reading content size

# Frame format constants
FORMAT_ZSTD1: int           # Standard zstd format with magic number
FORMAT_ZSTD1_MAGICLESS: int # Zstd format without magic number

# Frame header magic number
FRAME_HEADER: bytes         # b"\x28\xb5\x2f\xfd"
MAGIC_NUMBER: int          # Magic number as integer
```

## Performance Notes

- Frame analysis functions read only the frame header, so their cost does not grow with the size of the payload
- No decompression is performed, making these operations suitable for large-scale analysis
- Use frame analysis to reject malformed headers before attempting decompression
- Content size information enables memory pre-allocation for better performance
- Frame parameters such as window size and dictionary ID help choose appropriate decompression settings
