# Frame Analysis

Utilities for analyzing zstd frames and extracting metadata without full decompression, enabling efficient frame inspection and validation.

## Capabilities

### Frame Content Size

Extract the original content size from a zstd frame header without decompressing the data.

```python { .api }
def frame_content_size(data: bytes) -> int:
    """
    Get the original content size from a zstd frame.

    Parameters:
    - data: bytes, zstd frame data (at least frame header)

    Returns:
    int: Original content size in bytes, or -1 if the content size
    is not stored in the frame

    Raises:
    ZstdError: Invalid frame or unable to determine size
    """
```

**Usage Example:**

```python
import zstandard as zstd

# Compressed data with content size in header
compressor = zstd.ZstdCompressor(write_content_size=True)
original_data = b"Hello, World!" * 1000
compressed = compressor.compress(original_data)

# Get content size without decompressing
try:
    content_size = zstd.frame_content_size(compressed)
except zstd.ZstdError as exc:
    # Raised when the data is not a readable zstd frame
    print(f"Error reading frame: {exc}")
else:
    if content_size == -1:
        print("Content size not stored in frame")
    else:
        print(f"Original size: {content_size} bytes")
        print(f"Compressed size: {len(compressed)} bytes")
        print(f"Compression ratio: {len(original_data)/len(compressed):.2f}:1")
```

### Frame Header Size

Get the size of a zstd frame header to skip to the compressed payload.

```python { .api }
def frame_header_size(data: bytes) -> int:
    """
    Get the size of a zstd frame header.

    Parameters:
    - data: bytes, zstd frame data (at least frame header)

    Returns:
    int: Frame header size in bytes
    """
```

**Usage Example:**

```python
import zstandard as zstd

compressed_data = b"..."  # zstd compressed data

# Get header size
header_size = zstd.frame_header_size(compressed_data)
print(f"Frame header size: {header_size} bytes")

# Split header and payload
header = compressed_data[:header_size]
payload = compressed_data[header_size:]

print(f"Header: {len(header)} bytes")
print(f"Payload: {len(payload)} bytes")
```
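
To show why the header size varies between frames, the sketch below recomputes it by hand from the Frame_Header_Descriptor byte, following the field layout in the zstd format specification (RFC 8878). It is an illustration rather than the library's implementation; `header_size_from_descriptor` is a hypothetical name and only standard (magic-prefixed) frames are handled.

```python
ZSTD_MAGIC_BYTES = b"\x28\xb5\x2f\xfd"


def header_size_from_descriptor(frame: bytes) -> int:
    """Compute the header size of a standard zstd frame from its descriptor byte."""
    if len(frame) < 5:
        raise ValueError("need at least 5 bytes (magic number + descriptor)")
    if frame[:4] != ZSTD_MAGIC_BYTES:
        raise ValueError("not a standard zstd frame")

    descriptor = frame[4]
    fcs_flag = descriptor >> 6              # bits 7-6: Frame_Content_Size_flag
    single_segment = (descriptor >> 5) & 1  # bit 5: Single_Segment_flag
    dict_id_flag = descriptor & 0b11        # bits 1-0: Dictionary_ID_flag

    # The window descriptor byte is omitted in single-segment frames.
    window_descriptor_bytes = 0 if single_segment else 1
    dict_id_bytes = (0, 1, 2, 4)[dict_id_flag]
    # With flag 0, the size field (1 byte) exists only in single-segment frames.
    fcs_bytes = (single_segment, 2, 4, 8)[fcs_flag]

    return 4 + 1 + window_descriptor_bytes + dict_id_bytes + fcs_bytes
```

The result ranges from 5 to 18 bytes, which is why a caller should ask for the header size rather than assume a fixed offset.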

### Frame Parameters

Extract detailed parameters and metadata from a zstd frame header.

```python { .api }
def get_frame_parameters(data: bytes, format: int = FORMAT_ZSTD1) -> FrameParameters:
    """
    Extract frame parameters from zstd frame header.

    Parameters:
    - data: bytes, zstd frame data (at least frame header)
    - format: int, expected frame format (FORMAT_ZSTD1, FORMAT_ZSTD1_MAGICLESS)

    Returns:
    FrameParameters: Object containing frame metadata
    """

class FrameParameters:
    """Container for zstd frame parameters and metadata."""

    @property
    def content_size(self) -> int:
        """Original content size (CONTENTSIZE_UNKNOWN if not stored in the frame)."""

    @property
    def window_size(self) -> int:
        """Window size used for compression."""

    @property
    def dict_id(self) -> int:
        """Dictionary ID (0 if no dictionary)."""

    @property
    def has_checksum(self) -> bool:
        """Whether frame includes content checksum."""
```

**Usage Example:**

```python
import zstandard as zstd

# Create compressed data with various options
compressor = zstd.ZstdCompressor(
    level=5,
    write_content_size=True,
    write_checksum=True,
    write_dict_id=True
)

data = b"Sample data for frame analysis"
compressed = compressor.compress(data)

# Analyze frame parameters
params = zstd.get_frame_parameters(compressed)

print(f"Content size: {params.content_size}")
print(f"Window size: {params.window_size}")
print(f"Dictionary ID: {params.dict_id}")
print(f"Has checksum: {params.has_checksum}")

# Validate expectations
assert params.content_size == len(data)
assert params.has_checksum
```
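
For frames that carry a Window_Descriptor byte, the reported window size is decoded from a 5-bit exponent and a 3-bit mantissa, as described in the zstd format specification (RFC 8878). The sketch below reproduces that arithmetic for illustration; `window_size_from_descriptor` is a hypothetical name, not a library function. Single-segment frames have no descriptor byte, and their window size equals the content size.

```python
def window_size_from_descriptor(window_descriptor: int) -> int:
    """Decode a one-byte Window_Descriptor into a window size in bytes."""
    if not 0 <= window_descriptor <= 0xFF:
        raise ValueError("window descriptor must fit in one byte")

    exponent = window_descriptor >> 3     # bits 7-3
    mantissa = window_descriptor & 0b111  # bits 2-0

    window_base = 1 << (10 + exponent)    # minimum window is 1 KiB
    window_add = (window_base // 8) * mantissa
    return window_base + window_add
```

A descriptor of `0x00` therefore decodes to the 1 KiB minimum, and each exponent step doubles the base.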

### Frame Format Detection

Handle both the standard zstd frame format and the magicless variant, which omits the 4-byte magic number. Pass the expected format explicitly when analyzing magicless frames.

**Usage Example:**

```python
import zstandard as zstd

# Standard frame with magic number
standard_compressor = zstd.ZstdCompressor()
standard_compressed = standard_compressor.compress(b"Standard frame data")

# Magicless frame
magicless_cparams = zstd.ZstdCompressionParameters(format=zstd.FORMAT_ZSTD1_MAGICLESS)
magicless_compressor = zstd.ZstdCompressor(compression_params=magicless_cparams)
magicless_compressed = magicless_compressor.compress(b"Magicless frame data")

# Analyze different formats
standard_params = zstd.get_frame_parameters(standard_compressed, zstd.FORMAT_ZSTD1)
magicless_params = zstd.get_frame_parameters(magicless_compressed, zstd.FORMAT_ZSTD1_MAGICLESS)

print("Standard frame:")
print(f"  Content size: {standard_params.content_size}")
print(f"  Window size: {standard_params.window_size}")

print("Magicless frame:")
print(f"  Content size: {magicless_params.content_size}")
print(f"  Window size: {magicless_params.window_size}")
```
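
When the format of incoming data is not known in advance, the first four bytes can at least separate standard frames from skippable frames. The sketch below is a hedged illustration: `sniff_frame_kind` and the constant names are hypothetical, and the magic values are taken from the zstd format specification. A magicless frame carries no marker, so it falls into the "unknown" bucket and the caller has to know the format from context.

```python
import struct

# Values from the zstd format specification; names are local to this sketch.
ZSTD_MAGIC = 0xFD2FB528
SKIPPABLE_MAGIC_MIN = 0x184D2A50
SKIPPABLE_MAGIC_MAX = 0x184D2A5F


def sniff_frame_kind(data: bytes) -> str:
    """Classify data by its leading 4 bytes (read as little-endian)."""
    if len(data) < 4:
        return "too-short"
    (magic,) = struct.unpack_from("<I", data)
    if magic == ZSTD_MAGIC:
        return "standard"
    if SKIPPABLE_MAGIC_MIN <= magic <= SKIPPABLE_MAGIC_MAX:
        return "skippable"
    # Could be a magicless frame or not zstd data at all.
    return "unknown"
```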

### Multi-Frame Analysis

Analyze compressed data containing multiple zstd frames. Frame boundaries are found by walking the block headers of each frame, so the payload itself is never decompressed.

**Usage Example:**

```python
import zstandard as zstd

def frame_compressed_size(frame: bytes, header_size: int, has_checksum: bool) -> int:
    """Return the total size in bytes of the frame that starts at frame[0]."""
    pos = header_size
    while True:
        if pos + 3 > len(frame):
            raise ValueError("truncated block header")
        # Block header: 3 bytes, little-endian
        block_header = int.from_bytes(frame[pos:pos + 3], "little")
        last_block = block_header & 1           # bit 0
        block_type = (block_header >> 1) & 0b11  # bits 1-2
        block_size = block_header >> 3           # bits 3-23
        if block_type == 3:
            raise ValueError("reserved block type")
        # RLE blocks (type 1) store one byte; raw and compressed blocks store block_size bytes
        pos += 3 + (1 if block_type == 1 else block_size)
        if last_block:
            break
    if has_checksum:
        pos += 4  # optional 4-byte content checksum
    if pos > len(frame):
        raise ValueError("truncated frame")
    return pos

def analyze_multi_frame_data(data: bytes):
    """Analyze compressed data that may contain multiple frames."""
    frames = []
    offset = 0

    while offset < len(data):
        try:
            # Parse the header of the frame starting at the current offset
            remaining_data = data[offset:]
            params = zstd.get_frame_parameters(remaining_data)
            header_size = zstd.frame_header_size(remaining_data)

            # Frame size = header + blocks + optional checksum
            frame_size = frame_compressed_size(
                remaining_data, header_size, params.has_checksum
            )
        except (zstd.ZstdError, ValueError) as e:
            # Skippable frames and corrupt data end the scan here
            print(f"Error analyzing frame at offset {offset}: {e}")
            break

        frames.append({
            'offset': offset,
            'frame_size': frame_size,
            'header_size': header_size,
            'content_size': params.content_size,
            'window_size': params.window_size,
            'dict_id': params.dict_id,
            'has_checksum': params.has_checksum
        })
        offset += frame_size

    return frames

# Example usage
compressor = zstd.ZstdCompressor(write_content_size=True)
frame1 = compressor.compress(b"First frame data")
frame2 = compressor.compress(b"Second frame data")
frame3 = compressor.compress(b"Third frame data")

multi_frame_data = frame1 + frame2 + frame3
frames = analyze_multi_frame_data(multi_frame_data)

for i, frame in enumerate(frames):
    print(f"Frame {i+1}:")
    print(f"  Offset: {frame['offset']}")
    print(f"  Frame size: {frame['frame_size']}")
    print(f"  Header size: {frame['header_size']}")
    print(f"  Content size: {frame['content_size']}")
    print(f"  Window size: {frame['window_size']}")
```

### Frame Validation

Validate a frame's header and format without full decompression. Payload integrity, including the optional content checksum, is only verified when the frame is actually decompressed.

**Usage Example:**

```python
import zstandard as zstd

def validate_frame(data: bytes) -> dict:
    """Validate a zstd frame header and return analysis results."""
    result = {
        'valid': False,
        'error': None,
        'analysis': None
    }

    # Check minimum size
    if len(data) < 4:
        result['error'] = "Data too short for zstd frame"
        return result

    # Check magic number
    if data[:4] != zstd.FRAME_HEADER:
        result['error'] = "Invalid zstd magic number"
        return result

    try:
        # Both calls raise ZstdError if the header cannot be parsed
        params = zstd.get_frame_parameters(data)
        header_size = zstd.frame_header_size(data)
    except zstd.ZstdError as e:
        result['error'] = str(e)
        return result

    if header_size <= 0 or header_size > len(data):
        result['error'] = f"Invalid header size: {header_size}"
        return result

    result['valid'] = True
    result['analysis'] = {
        'header_size': header_size,
        'content_size': params.content_size,
        'window_size': params.window_size,
        'dict_id': params.dict_id,
        'has_checksum': params.has_checksum,
        'total_size': len(data)
    }

    return result

# Example usage
compressor = zstd.ZstdCompressor(write_checksum=True)
valid_data = compressor.compress(b"Valid frame data")
invalid_data = b"Invalid frame data"

# Validate frames
valid_result = validate_frame(valid_data)
invalid_result = validate_frame(invalid_data)

print("Valid frame:", valid_result['valid'])
if valid_result['valid']:
    analysis = valid_result['analysis']
    print(f"  Header size: {analysis['header_size']}")
    print(f"  Content size: {analysis['content_size']}")
    print(f"  Has checksum: {analysis['has_checksum']}")

print("Invalid frame:", invalid_result['valid'])
if not invalid_result['valid']:
    print(f"  Error: {invalid_result['error']}")
```

### Decompression Context Estimation

Estimate memory requirements for decompression without actually decompressing.

```python { .api }
def estimate_decompression_context_size() -> int:
    """
    Estimate memory usage for decompression context.

    Returns:
    int: Estimated memory usage in bytes
    """
```

**Usage Example:**

```python
import zstandard as zstd

# Estimate memory usage
estimated_memory = zstd.estimate_decompression_context_size()
print(f"Estimated decompression context size: {estimated_memory} bytes")

# Use for memory planning
def plan_decompression(compressed_frames: list[bytes]) -> dict:
    """Plan memory usage for batch decompression."""
    base_memory = zstd.estimate_decompression_context_size()

    total_compressed = sum(len(frame) for frame in compressed_frames)
    total_content_size = 0

    for frame in compressed_frames:
        try:
            content_size = zstd.frame_content_size(frame)
        except zstd.ZstdError:
            content_size = -1  # unreadable header: treat as unknown

        if content_size >= 0:
            total_content_size += content_size
        else:
            # Estimate if content size unknown
            total_content_size += len(frame) * 4  # rough estimate

    return {
        'base_memory': base_memory,
        'total_compressed': total_compressed,
        'estimated_decompressed': total_content_size,
        'peak_memory_estimate': base_memory + total_content_size
    }

# Example
compressor = zstd.ZstdCompressor(write_content_size=True)
compressed1 = compressor.compress(b"First frame data")
compressed2 = compressor.compress(b"Second frame data")
compressed3 = compressor.compress(b"Third frame data")

frames = [compressed1, compressed2, compressed3]
plan = plan_decompression(frames)
print(f"Peak memory estimate: {plan['peak_memory_estimate']} bytes")
```

## Constants

Frame analysis uses several constants for special values and format identification:

```python { .api }
# Content size special values
CONTENTSIZE_UNKNOWN: int  # Content size not stored in frame
CONTENTSIZE_ERROR: int    # Error reading content size

# Frame format constants
FORMAT_ZSTD1: int            # Standard zstd format with magic number
FORMAT_ZSTD1_MAGICLESS: int  # Zstd format without magic number

# Frame header magic number
FRAME_HEADER: bytes  # b"\x28\xb5\x2f\xfd"
MAGIC_NUMBER: int    # Magic number as integer
```
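
An unknown content size can surface either as -1 or as the `CONTENTSIZE_UNKNOWN` sentinel, depending on which call produced the value, so it can help to normalize both to `None` before doing arithmetic. The sketch below is an assumption-laden illustration: the constants are redefined locally with the values the underlying C library uses for `ZSTD_CONTENTSIZE_UNKNOWN` and `ZSTD_CONTENTSIZE_ERROR` (in real code, read them from the module), and `normalize_content_size` is a hypothetical helper.

```python
from typing import Optional

# Local stand-ins for the module constants (assumed values; prefer the
# attributes exported by the zstandard module in real code).
FRAME_HEADER = b"\x28\xb5\x2f\xfd"
MAGIC_NUMBER = int.from_bytes(FRAME_HEADER, "little")  # magic bytes read little-endian
CONTENTSIZE_UNKNOWN = 2**64 - 1
CONTENTSIZE_ERROR = 2**64 - 2


def normalize_content_size(value: int) -> Optional[int]:
    """Map both 'unknown' markers to None and reject the error sentinel."""
    if value == CONTENTSIZE_ERROR:
        raise ValueError("content size could not be read")
    if value == -1 or value == CONTENTSIZE_UNKNOWN:
        return None
    return value
```

Returning `None` keeps a huge sentinel from being mistaken for a real byte count in sums or comparisons.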

## Performance Notes

- Frame analysis functions read only the frame header (at most 18 bytes in the standard format), so their cost does not depend on payload size
- No decompression is performed, making these operations suitable for large-scale analysis
- Use frame analysis to reject malformed headers before attempting decompression; payload corruption is still only detected during decompression
- A known content size allows output buffers to be pre-allocated
- Window size and dictionary ID help choose decompression settings, such as memory limits and which dictionary to load
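
As a rough illustration of the last two notes, the sketch below turns header metadata into a decompression decision. It is one possible policy, not a recommendation from the library: `choose_strategy`, `memory_budget`, and the three labels are hypothetical, and the inputs are assumed to come from a prior `get_frame_parameters` call, with an unknown content size passed as `None`.

```python
from typing import Optional


def choose_strategy(
    content_size: Optional[int], window_size: int, memory_budget: int
) -> str:
    """Pick a decompression approach from frame header metadata.

    content_size: declared size in bytes, or None if not stored in the frame
    window_size: window size reported by the frame header
    memory_budget: maximum bytes the caller is willing to allocate
    """
    if window_size > memory_budget:
        # Even streaming needs roughly a window's worth of memory.
        return "reject"
    if content_size is not None and content_size <= memory_budget:
        # Output fits in memory and can be pre-allocated.
        return "one-shot"
    # Size unknown or too large: process incrementally.
    return "streaming"
```

Because the decision needs only header fields, it can run over many frames before any of them is decompressed.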