# zipp

A pathlib-compatible wrapper around zipfile.ZipFile that provides an intuitive, Path-like interface for working with ZIP archives. zipp is the official backport of the standard library's zipfile.Path, so you can navigate and read archive members with familiar pathlib syntax.

## Package Information

- **Package Name**: zipp
- **Package Type**: pypi
- **Language**: Python
- **Installation**: `pip install zipp`
- **Python Version**: >= 3.9

## Core Imports

```python
import zipp
```

Standard usage:

```python
from zipp import Path
```

Advanced usage:

```python
from zipp import Path, CompleteDirs, FastLookup
from zipp.glob import Translator
```

Compatibility functions:

```python
from zipp.compat.py310 import text_encoding
```

## Basic Usage

```python
import zipfile

from zipp import Path

# Create a sample zip file
with zipfile.ZipFile('example.zip', 'w') as zf:
    zf.writestr('data/file1.txt', 'content of file1')
    zf.writestr('data/subdir/file2.txt', 'content of file2')
    zf.writestr('config.json', '{"key": "value"}')

# Wrap the zip file in a zipp.Path
zip_path = Path('example.zip')

# The archive root behaves like a directory
print(zip_path.is_dir())  # True

# Check whether members exist ('data' is an implied directory)
print((zip_path / 'data').exists())  # True
print((zip_path / 'missing.txt').exists())  # False

# Read file contents
config_path = zip_path / 'config.json'
config_content = config_path.read_text(encoding='utf-8')
print(config_content)  # {"key": "value"}

# Iterate through the immediate contents of a directory
data_dir = zip_path / 'data'
for item in data_dir.iterdir():
    print(f"{item.name}: {'directory' if item.is_dir() else 'file'}")

# Use glob patterns to find files
for txt_file in zip_path.glob('**/*.txt'):
    print(f"Found: {txt_file}")
    print(f"Content: {txt_file.read_text(encoding='utf-8')}")
```

## Architecture

zipp layers a pathlib-style interface on top of zipfile:

- **Path**: Main user-facing class providing the pathlib-compatible interface
- **CompleteDirs**: ZipFile subclass that automatically includes implied directories in file listings
- **FastLookup**: CompleteDirs subclass with cached name lookups, used for archives opened for reading
- **Translator**: Converts glob patterns to regular expressions for file pattern matching

Because implied directories are listed, an archive that stores only file entries can still be traversed like a file system tree, and caching the name list and name set keeps repeated lookups cheap on large read-only archives.

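To see the CompleteDirs behavior described above, the following minimal sketch builds an in-memory archive that stores a single file and no directory entries, then wraps it in Path. It assumes zipp is installed and otherwise falls back to the standard library's zipfile.Path, which zipp backports; that fallback and the archive layout are illustrative assumptions.

```python
import io
import zipfile

try:
    from zipp import Path  # the backport documented here
except ImportError:  # assumption: fall back to the stdlib original
    from zipfile import Path

# Store a single file; no "data/" or "data/subdir/" entries are written.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("data/subdir/file2.txt", "content of file2")

archive = zipfile.ZipFile(buffer)
# Read the raw listing first: wrapping the archive in Path changes its class.
stored_names = archive.namelist()

root = Path(archive)
top_level = [p.name for p in root.iterdir()]  # "data" is implied, not stored
subdir = root / "data" / "subdir"

print(stored_names)                      # ['data/subdir/file2.txt']
print(top_level)                         # ['data']
print(subdir.exists(), subdir.is_dir())  # True True
```

The stored listing contains only the file, yet the Path view exposes both parent directories.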
## Capabilities

### Path Operations

Core pathlib-compatible interface for navigating and manipulating paths within ZIP archives.

```python { .api }
class Path:
    def __init__(self, root, at: str = ""):
        """
        Construct a Path from a ZipFile or filename.

        Note: When the source is an existing ZipFile object, its type
        (__class__) will be mutated to a specialized type. If the caller
        wishes to retain the original type, create a separate ZipFile
        object or pass a filename.

        Args:
            root: ZipFile object or path to zip file
            at (str): Path within the zip file, defaults to root
        """

    def __eq__(self, other) -> bool:
        """
        Test path equality.

        Args:
            other: Other object to compare

        Returns:
            bool: True if paths are equal (NotImplemented for different types)
        """

    def __hash__(self) -> int:
        """Return hash of path for use in sets and dicts."""

    @property
    def name(self) -> str:
        """Name of the path entry (final component)."""

    @property
    def suffix(self) -> str:
        """File suffix (extension including the dot)."""

    @property
    def suffixes(self) -> list[str]:
        """List of all file suffixes."""

    @property
    def stem(self) -> str:
        """Filename without the final suffix."""

    @property
    def filename(self) -> pathlib.Path:
        """Full filesystem path including zip file path and internal path."""

    @property
    def parent(self) -> "Path":
        """
        Parent directory path within the ZIP file.

        At the archive root, returns the parent directory of the zip file
        itself as a pathlib.Path.
        """

    def joinpath(self, *other) -> "Path":
        """
        Join path components.

        Args:
            *other: Path components to join

        Returns:
            Path: New Path object with joined components
        """

    def __truediv__(self, other) -> "Path":
        """
        Path joining using / operator.

        Args:
            other: Path component to join

        Returns:
            Path: New Path object with joined component
        """

    def relative_to(self, other, *extra) -> str:
        """
        Return relative path from other path.

        Args:
            other: Base path for relative calculation
            *extra: Additional path components for base

        Returns:
            str: Relative path string
        """

    def __str__(self) -> str:
        """String representation combining zip filename and internal path."""

    def __repr__(self) -> str:
        """Detailed string representation showing class, zip file, and internal path."""
```

### File Operations

Read and write operations for files within ZIP archives.

```python { .api }
def open(self, mode: str = 'r', *args, pwd=None, **kwargs):
    """
    Open file for reading or writing following pathlib.Path.open() semantics.

    Text mode arguments are passed through to io.TextIOWrapper().
    Writing requires the underlying ZipFile to be open in a writable mode.

    Args:
        mode (str): File mode ('r', 'rb', 'w', 'wb'). Defaults to 'r'.
        pwd (bytes, optional): Password for encrypted ZIP files
        *args: Additional positional arguments for TextIOWrapper (text mode only)
        **kwargs: Additional keyword arguments for TextIOWrapper (text mode only)

    Returns:
        IO: File-like object (TextIOWrapper for text mode, raw stream for binary)

    Raises:
        IsADirectoryError: If path is a directory
        FileNotFoundError: If file doesn't exist in read mode
        ValueError: If encoding args are provided for binary mode
    """

def read_text(self, *args, **kwargs) -> str:
    """
    Read file contents as text with proper encoding handling.

    Args:
        *args: Positional arguments for text encoding (encoding, errors, newline)
        **kwargs: Keyword arguments for text processing

    Returns:
        str: File contents as decoded text
    """

def read_bytes(self) -> bytes:
    """
    Read file contents as bytes.

    Returns:
        bytes: Raw file contents without any decoding
    """
```

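The file operations can be sketched the same way. This example (same zipp-or-stdlib import assumption, illustrative archive contents) reads one member as text with an explicit encoding, as raw bytes, and through `open()` with an extra TextIOWrapper argument, then triggers each documented exception.

```python
import io
import zipfile

try:
    from zipp import Path
except ImportError:  # assumption: stdlib fallback
    from zipfile import Path

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("notes/hello.txt", "h\u00e9llo\nworld\n".encode("utf-8"))

root = Path(zipfile.ZipFile(buffer))
hello = root / "notes" / "hello.txt"

text = hello.read_text(encoding="utf-8")  # explicit encoding, no locale default
raw = hello.read_bytes()                  # undecoded bytes

# Extra text-mode arguments (here newline) are forwarded to io.TextIOWrapper.
with hello.open("r", encoding="utf-8", newline="") as f:
    first_line = f.readline()

errors = []
for action in (
    lambda: (root / "notes").open(),             # a directory
    lambda: (root / "missing.txt").open(),       # no such member
    lambda: hello.open("rb", encoding="utf-8"),  # encoding args in binary mode
):
    try:
        action()
    except (IsADirectoryError, FileNotFoundError, ValueError) as exc:
        errors.append(type(exc).__name__)

print(first_line == "h\u00e9llo\n")  # True
print(errors)  # ['IsADirectoryError', 'FileNotFoundError', 'ValueError']
```

Passing the encoding explicitly avoids depending on the platform's locale default.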
### Path Testing

Methods to test path properties and existence.

```python { .api }
def exists(self) -> bool:
    """Check if path exists in the zip file."""

def is_file(self) -> bool:
    """Check if path is a file."""

def is_dir(self) -> bool:
    """Check if path is a directory."""

def is_symlink(self) -> bool:
    """Check if path is a symbolic link."""
```

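A small sketch of how these tests treat stored and implied directories, using the same import fallback. Here "explicit/" is written as its own entry, while "implied/" exists only because a file lives under it. The `describe()` helper is hypothetical.

```python
import io
import zipfile

try:
    from zipp import Path
except ImportError:  # assumption: stdlib fallback
    from zipfile import Path

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("explicit/", b"")               # directory stored as its own entry
    zf.writestr("implied/file.txt", "content")  # "implied/" is never stored

root = Path(zipfile.ZipFile(buffer))


def describe(path):
    """Hypothetical helper: (exists, is_file, is_dir) for an archive path."""
    return path.exists(), path.is_file(), path.is_dir()


results = {
    "explicit": describe(root / "explicit"),            # (True, False, True)
    "implied": describe(root / "implied"),              # (True, False, True)
    "file": describe(root / "implied" / "file.txt"),    # (True, True, False)
    "missing": describe(root / "missing"),              # (False, False, False)
}
for label, flags in results.items():
    print(label, flags)
print(root.is_dir())  # True: the archive root
```

Stored and implied directories answer all three tests identically.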
### Directory Operations

Navigate and list directory contents within ZIP archives.

```python { .api }
def iterdir(self) -> Iterator["Path"]:
    """
    Iterate over immediate children of this directory.

    Returns:
        Iterator[Path]: Path objects for immediate directory contents only

    Raises:
        ValueError: If path is not a directory
    """
```

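A sketch showing that `iterdir()` yields immediate children only, plus a hypothetical recursive `walk()` helper built on top of it. The archive contents are illustrative, and the import falls back to zipfile.Path if zipp is absent.

```python
import io
import zipfile

try:
    from zipp import Path
except ImportError:  # assumption: stdlib fallback
    from zipfile import Path

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("data/file1.txt", "content of file1")
    zf.writestr("data/subdir/file2.txt", "content of file2")
    zf.writestr("config.json", '{"key": "value"}')

root = Path(zipfile.ZipFile(buffer))


def walk(path):
    """Hypothetical helper: yield every file at any depth below path."""
    for child in path.iterdir():
        if child.is_dir():
            yield from walk(child)
        else:
            yield child


top = sorted(p.name for p in root.iterdir())
children = sorted(p.name for p in (root / "data").iterdir())
all_files = sorted(p.name for p in walk(root))

print(top)        # ['config.json', 'data']
print(children)   # ['file1.txt', 'subdir']  (file2.txt is a grandchild)
print(all_files)  # ['config.json', 'file1.txt', 'file2.txt']

try:
    (root / "config.json").iterdir()
    listed_a_file = True
except ValueError:
    listed_a_file = False
print(listed_a_file)  # False
```

Recursion is left to the caller. `iterdir()` itself never descends past one level.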
### Pattern Matching

Find files using glob patterns and path matching.

```python { .api }
def match(self, path_pattern: str) -> bool:
    """
    Test if path matches the given pattern using pathlib-style matching.

    Args:
        path_pattern (str): Pattern to match against (e.g., '*.txt', 'data/*')

    Returns:
        bool: True if path matches pattern
    """

def glob(self, pattern: str) -> Iterator["Path"]:
    """
    Find all paths matching a glob pattern starting from this path.

    Args:
        pattern (str): Glob pattern to match (e.g., '*.txt', 'data/*.json')

    Returns:
        Iterator[Path]: Path objects matching the pattern

    Raises:
        ValueError: If pattern is empty or invalid
    """

def rglob(self, pattern: str) -> Iterator["Path"]:
    """
    Recursively find all paths matching a glob pattern.

    Equivalent to calling glob(f'**/{pattern}').

    Args:
        pattern (str): Glob pattern to match recursively

    Returns:
        Iterator[Path]: Path objects matching the pattern recursively
    """
```

### Advanced ZipFile Classes

Enhanced ZipFile subclasses for specialized use cases.

```python { .api }
class InitializedState:
    """
    Mix-in to save the initialization state for pickling.

    Preserves constructor arguments for proper serialization/deserialization.
    """

    def __init__(self, *args, **kwargs):
        """Initialize and save constructor arguments."""

    def __getstate__(self):
        """Return state for pickling."""

    def __setstate__(self, state):
        """Restore state from pickle."""


class CompleteDirs(InitializedState, zipfile.ZipFile):
    """
    ZipFile subclass that ensures implied directories are included.

    Automatically includes parent directories for files in the namelist,
    enabling proper directory traversal even when directories aren't
    explicitly stored in the ZIP file.
    """

    @classmethod
    def make(cls, source):
        """
        Create appropriate CompleteDirs subclass from source.

        A CompleteDirs instance is returned unchanged, and a filename is
        opened as a new instance of cls. An existing ZipFile has its
        __class__ mutated in place, falling back to CompleteDirs when the
        ZipFile is not opened for reading.

        Args:
            source: ZipFile object or filename

        Returns:
            CompleteDirs: CompleteDirs or FastLookup instance
        """

    @classmethod
    def inject(cls, zf: zipfile.ZipFile) -> zipfile.ZipFile:
        """
        Inject directory entries for implied directories.

        Args:
            zf (zipfile.ZipFile): Writable ZipFile to modify

        Returns:
            zipfile.ZipFile: Modified zip file with directory entries
        """

    def namelist(self) -> list[str]:
        """Return file list including implied directories."""

    def resolve_dir(self, name: str) -> str:
        """
        Resolve directory name with proper trailing slash.

        Args:
            name (str): Directory name to resolve

        Returns:
            str: Directory name with trailing slash if it's a directory
        """

    def getinfo(self, name: str) -> zipfile.ZipInfo:
        """
        Get ZipInfo for file, including implied directories.

        Args:
            name (str): File or directory name

        Returns:
            zipfile.ZipInfo: File information object

        Raises:
            KeyError: If file doesn't exist and isn't an implied directory
        """

    @staticmethod
    def _implied_dirs(names: list[str]):
        """
        Generate implied parent directories from file list.

        Args:
            names (list[str]): List of file names in ZIP

        Returns:
            Iterator[str]: Implied directory names with trailing slashes
        """


class FastLookup(CompleteDirs):
    """
    CompleteDirs subclass with cached lookups for performance.

    Uses functools.cached_property for efficient repeated access
    to namelist and name set operations. make() only selects this
    class for archives opened for reading.
    """

    def namelist(self) -> list[str]:
        """Cached access to file list."""

    @functools.cached_property
    def _namelist(self) -> list[str]:
        """Cached property for namelist."""

    def _name_set(self) -> set[str]:
        """Cached access to name set."""

    @functools.cached_property
    def _name_set_prop(self) -> set[str]:
        """Cached property for name set."""
```

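The pickling idea behind InitializedState can be sketched without zipp. The approach is to record the constructor arguments, pickle only those, and re-run `__init__` on load. The class names below (InitializedStateSketch, Counter, PicklableCounter) are hypothetical, and this is a simplified illustration of the pattern rather than zipp's own code.

```python
import pickle
import threading


class InitializedStateSketch:
    """Hypothetical mix-in: pickle constructor arguments instead of instance state."""

    def __init__(self, *args, **kwargs):
        self._init_args = (args, kwargs)  # remember how the object was built
        super().__init__(*args, **kwargs)

    def __getstate__(self):
        return self._init_args

    def __setstate__(self, state):
        args, kwargs = state
        # Rebuild from scratch, exactly as the original constructor call did.
        InitializedStateSketch.__init__(self, *args, **kwargs)


class Counter:
    """Hypothetical stand-in for a class holding an unpicklable resource."""

    def __init__(self, name, start=0):
        self.name = name
        self.value = start
        self._lock = threading.Lock()  # locks cannot be pickled


class PicklableCounter(InitializedStateSketch, Counter):
    pass


original = PicklableCounter("hits", start=3)
original.value += 1  # changes made after construction are NOT preserved

restored = pickle.loads(pickle.dumps(original))
print(restored.name, restored.value)  # hits 3
```

The rebuilt object reflects only the constructor arguments, so the increment is lost. That trade-off suits objects such as an archive opened from a filename, where re-running the constructor reopens the same resource.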
### Pattern Translation

Convert glob patterns to regular expressions for file matching.

```python { .api }
class Translator:
    """
    Translate glob patterns to regex patterns for ZIP file path matching.

    Handles platform-specific path separators and converts shell-style
    wildcards into regular expressions suitable for matching ZIP entries.
    """

    def __init__(self, seps: str = _default_seps):
        """
        Initialize translator with path separators.

        Args:
            seps (str, optional): Path separator characters.
                Defaults to _default_seps: os.sep, plus os.altsep if available.

        Raises:
            AssertionError: If separators are invalid or empty
        """

    def translate(self, pattern: str) -> str:
        """
        Convert glob pattern to regex.

        Args:
            pattern (str): Glob pattern to convert (e.g., '*.txt', '**/data/*.json')

        Returns:
            str: Regular expression pattern with full match semantics

        Raises:
            ValueError: If ** appears incorrectly in pattern (not alone in path segment)
        """

    def extend(self, pattern: str) -> str:
        r"""
        Extend regex for pattern-wide concerns.

        Wraps the pattern in a non-capturing (?s:...) group so that it also
        matches newlines, and appends \Z to imply a full match.

        Args:
            pattern (str): Base regex pattern

        Returns:
            str: Extended regex of the form (?s:pattern)\Z
        """

    def match_dirs(self, pattern: str) -> str:
        """
        Ensure ZIP directory names are matched.

        ZIP directory entries always end with '/', so this makes patterns
        match names both with and without a trailing slash.

        Args:
            pattern (str): Regex pattern

        Returns:
            str: Pattern with optional trailing slash
        """

    def translate_core(self, pattern: str) -> str:
        """
        Core glob to regex translation logic.

        Args:
            pattern (str): Glob pattern

        Returns:
            str: Base regex pattern before extension
        """

    def replace(self, match) -> str:
        """
        Perform regex replacements for glob wildcards.

        Args:
            match: Regex match object from separate()

        Returns:
            str: Replacement string for the match
        """

    def restrict_rglob(self, pattern: str) -> None:
        """
        Validate ** usage in pattern.

        Args:
            pattern (str): Glob pattern to validate

        Raises:
            ValueError: If ** appears in partial path segments
        """

    def star_not_empty(self, pattern: str) -> str:
        """
        Ensure * will not match empty segments.

        Args:
            pattern (str): Glob pattern

        Returns:
            str: Modified pattern where * becomes ?*
        """


def separate(pattern: str):
    """
    Separate character sets to avoid translating their contents.

    Args:
        pattern (str): Glob pattern with potential character sets

    Returns:
        Iterator: Match objects for pattern segments
    """
```

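The translation steps above can be traced with a simplified, standard-library-only re-implementation. The function name `translate_sketch` is hypothetical, and the sketch deliberately omits `star_not_empty`. It follows the documented rules: `**` must stand alone in a segment, character sets pass through untouched, wildcards become separator-aware regex, and the result allows a trailing slash and is anchored at the end.

```python
import re

# Splits a pattern into plain runs, [...] character sets, and an unclosed "[" tail.
_SEPARATE = re.compile(r"([^\[]+)|(?P<set>[\[].*?[\]])|([\[][^\]]*$)")


def translate_sketch(pattern: str, seps: str = "/") -> str:
    """Hypothetical, simplified glob-to-regex translation (no star_not_empty)."""
    sep_class = re.escape(seps)

    # restrict_rglob(): '**' is only allowed as a whole path segment.
    for segment in re.split(f"[{sep_class}]+", pattern):
        if "**" in segment and segment != "**":
            raise ValueError("** must appear alone in a path segment")

    parts = []
    for match in _SEPARATE.finditer(pattern):
        if match.group("set"):
            parts.append(match.group("set"))  # e.g. [?] is copied untranslated
        else:
            parts.append(
                re.escape(match.group(0))
                .replace(r"\*\*", ".*")              # ** crosses separators
                .replace(r"\*", f"[^{sep_class}]*")  # * stays within a segment
                .replace(r"\?", f"[^{sep_class}]")   # ? is one non-separator char
            )

    core = "".join(parts)
    # match_dirs(): optional trailing slash; extend(): DOTALL group + end anchor.
    return rf"(?s:{core}[/]?)\Z"


for pattern, name in [
    ("*.txt", "a.txt"),        # True
    ("*.txt", "dir/a.txt"),    # False
    ("**/*.txt", "a/b/c.txt"), # True
    ("data/*", "data/sub/"),   # True
    ("a[?]b", "a?b"),          # True
    ("a[?]b", "axb"),          # False
]:
    print(pattern, name, bool(re.fullmatch(translate_sketch(pattern), name)))

try:
    translate_sketch("a**/b")
    rejected = False
except ValueError:
    rejected = True
print("a**/b rejected:", rejected)  # True
```

Because the regex ends in `\Z`, it behaves as a full match even with `re.match`. The sketch uses `re.fullmatch` for clarity.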
## Usage Examples

### Working with Complex Directory Structures

```python
from zipp import Path
import zipfile

# Create a zip with complex structure
with zipfile.ZipFile('project.zip', 'w') as zf:
    zf.writestr('src/main.py', 'print("Hello World")')
    zf.writestr('src/utils/helpers.py', 'def helper(): pass')
    zf.writestr('tests/test_main.py', 'def test_main(): assert True')
    zf.writestr('docs/README.md', '# Project Documentation')
    zf.writestr('config/settings.json', '{"debug": true}')

# Navigate the zip file structure
project = Path('project.zip')

# Find all Python files
python_files = list(project.rglob('*.py'))
print(f"Found {len(python_files)} Python files:")
for py_file in python_files:
    print(f"  {py_file}")

# Read configuration
config = project / 'config' / 'settings.json'
if config.exists():
    settings = config.read_text()
    print(f"Settings: {settings}")

# List directory contents with details
src_dir = project / 'src'
print(f"Contents of {src_dir}:")
for item in src_dir.iterdir():
    item_type = "directory" if item.is_dir() else "file"
    print(f"  {item.name} ({item_type})")
```

### Pattern Matching and Filtering

```python
from zipp import Path

zip_path = Path('archive.zip')

# Find files by extension
text_files = list(zip_path.glob('**/*.txt'))
# Glob patterns do not support brace expansion such as '*.{jpg,png,gif}',
# so run one glob per extension instead
image_files = [
    image
    for extension in ('jpg', 'png', 'gif')
    for image in zip_path.glob(f'**/*.{extension}')
]

# Find files in specific directories
src_files = list(zip_path.glob('src/**/*'))
test_files = list(zip_path.glob('**/test_*.py'))

# Check for specific patterns
has_readme = any(zip_path.glob('**/README*'))
config_files = list(zip_path.glob('**/config.*'))

print(f"Text files: {len(text_files)}")
print(f"Image files: {len(image_files)}")
print(f"Has README: {has_readme}")
```

### Error Handling

```python
import zipfile

from zipp import Path

try:
    zip_path = Path('example.zip')
except FileNotFoundError:
    print("Zip file not found")
except zipfile.BadZipFile:
    print("Not a valid zip file")
else:
    # Check if the member exists before reading
    target_file = zip_path / 'data' / 'important.txt'
    if target_file.exists():
        print(target_file.read_text())
    else:
        print("File not found in archive")

    # Opening a directory raises IsADirectoryError
    # ('data' is a directory in example.zip from Basic Usage)
    directory = zip_path / 'data'
    try:
        with directory.open('r') as f:
            content = f.read()
    except IsADirectoryError:
        print("Cannot open directory as file")
        # List directory contents instead
        for item in directory.iterdir():
            print(f"Directory contains: {item.name}")
    except FileNotFoundError:
        print("No such member in the archive")
```

### Utility Functions

Low-level utility functions for path manipulation and data processing.

```python { .api }
def _parents(path: str):
    """
    Generate all parent paths of the given path.

    Args:
        path (str): Path with posixpath.sep-separated elements

    Returns:
        Iterator[str]: Parent paths in order from immediate to root

    Examples:
        >>> list(_parents('b/d/f/'))
        ['b/d', 'b']
        >>> list(_parents('b'))
        []
    """

def _ancestry(path: str):
    """
    Generate all elements of a path including itself.

    Args:
        path (str): Path with posixpath.sep-separated elements

    Returns:
        Iterator[str]: Path elements from full path to root

    Examples:
        >>> list(_ancestry('b/d/f/'))
        ['b/d/f', 'b/d', 'b']
    """

def _difference(minuend, subtrahend):
    """
    Return items in minuend not in subtrahend, retaining order.

    Uses O(1) lookup for efficient filtering of large sequences.

    Args:
        minuend: Items to filter from
        subtrahend: Items to exclude

    Returns:
        Iterator: Filtered items in original order
    """

def _dedupe(iterable):
    """
    Deduplicate an iterable in original order.

    Implemented as dict.fromkeys for efficiency.

    Args:
        iterable: Items to deduplicate

    Returns:
        dict: Mapping whose keys are the unique items in original order
    """
```

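These utilities can be re-created with the standard library to check the documented examples. The sketch below mirrors their described behavior under hypothetical public names (`ancestry`, `parents`, `difference`, `dedupe`, `implied_dirs`). It also shows one plausible way to combine them into the implied-directory listing that `CompleteDirs._implied_dirs` documents. It is an illustration, not zipp's source.

```python
import itertools
import posixpath


def ancestry(path):
    """Yield path and each of its parents, most specific first."""
    path = path.rstrip(posixpath.sep)
    while path.rstrip(posixpath.sep):
        yield path
        path, _tail = posixpath.split(path)


def parents(path):
    """Like ancestry(), but without the path itself."""
    return itertools.islice(ancestry(path), 1, None)


def difference(minuend, subtrahend):
    """Items of minuend not in subtrahend, order kept, O(1) membership tests."""
    return itertools.filterfalse(set(subtrahend).__contains__, minuend)


dedupe = dict.fromkeys  # returns a dict whose keys keep first-seen order


def implied_dirs(names):
    """Directory entries ('name/') implied by names but not stored among them."""
    parent_dirs = itertools.chain.from_iterable(map(parents, names))
    as_dirs = (p + posixpath.sep for p in parent_dirs)
    return list(dedupe(difference(as_dirs, names)))


print(list(parents("b/d/f/")))   # ['b/d', 'b']
print(list(parents("b")))        # []
print(list(ancestry("b/d/f/")))  # ['b/d/f', 'b/d', 'b']
print(implied_dirs(["data/file1.txt", "data/subdir/file2.txt"]))  # ['data/', 'data/subdir/']
print(implied_dirs(["a/", "a/b.txt"]))  # []: 'a/' is already stored
```

The set built inside `difference()` is what gives the O(1) lookups mentioned above. `dedupe` then removes repeats while keeping the order in which parents were first seen.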
### Compatibility Functions

Cross-version compatibility utilities for different Python versions.

```python { .api }
def text_encoding(encoding=None, stacklevel=2):
    """
    Resolve the text encoding to use, with EncodingWarning support.

    On Python 3.10+ this is io.text_encoding; on older versions the
    encoding is returned unchanged.

    Args:
        encoding (str, optional): Text encoding to use
        stacklevel (int): Stack level for warnings

    Returns:
        str: Encoding string to use for text operations
    """

def save_method_args(method):
    """
    Decorator to save method arguments for serialization.

    Used by InitializedState mixin for pickle support.

    Args:
        method: Method to wrap

    Returns:
        function: Wrapped method that saves args/kwargs
    """
```

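Both helpers can be approximated in a few lines of standard-library code. In the sketch below, `text_encoding_sketch` defers to `io.text_encoding` where it exists, and `save_method_args_sketch` records each call's arguments on the instance under a `_saved_<method name>` attribute. The function names, the attribute naming, and the `Archive` class are hypothetical; this mirrors the described behavior rather than zipp's source.

```python
import functools
import io
import sys


def text_encoding_sketch(encoding=None, stacklevel=2):
    """Hypothetical: return the encoding to use, deferring to io.text_encoding."""
    if sys.version_info >= (3, 10):
        # io.text_encoding() returns encoding unchanged, or the default
        # ("locale", or "utf-8" in UTF-8 mode) when encoding is None.
        return io.text_encoding(encoding, stacklevel + 1)
    return encoding  # older Pythons: leave the choice to io.TextIOWrapper


def save_method_args_sketch(method):
    """Hypothetical: record each call's (args, kwargs) as self._saved_<name>."""

    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        setattr(self, f"_saved_{method.__name__}", (args, kwargs))
        return method(self, *args, **kwargs)

    return wrapper


class Archive:  # hypothetical class for illustration
    @save_method_args_sketch
    def __init__(self, filename, mode="r"):
        self.filename = filename
        self.mode = mode


print(text_encoding_sketch("utf-8"))  # utf-8
archive = Archive("example.zip", mode="r")
print(archive._saved___init__)  # (('example.zip',), {'mode': 'r'})
```

The saved `(args, kwargs)` pair is exactly what a mix-in like InitializedState needs to return from `__getstate__`.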
## Types

```python { .api }
class Path:
    """
    A pathlib-compatible interface for zip file paths.

    Main user-facing class that provides familiar pathlib.Path-like
    operations for navigating and manipulating ZIP file contents.
    """

class CompleteDirs(InitializedState, zipfile.ZipFile):
    """
    ZipFile subclass ensuring implied directories are included.

    Extends zipfile.ZipFile to automatically handle parent directories
    that aren't explicitly stored in ZIP files.
    """

class FastLookup(CompleteDirs):
    """
    Performance-optimized CompleteDirs with cached operations.

    Uses functools.cached_property for efficient repeated access
    to file listings and lookups in large ZIP archives.
    """

class Translator:
    """
    Glob pattern to regex translator for ZIP file matching.

    Converts shell-style wildcard patterns into regular expressions
    suitable for matching ZIP file paths.
    """

class InitializedState:
    """
    Mixin class for preserving initialization state in pickle operations.

    Saves constructor arguments to enable proper serialization and
    deserialization of ZipFile subclasses.
    """

# Exception types that may be raised
# IsADirectoryError: Raised when trying to open a directory as a file
# FileNotFoundError: Raised when a file doesn't exist
# ValueError: Raised for invalid patterns or operations
# KeyError: Raised for missing zip file entries (handled internally)
# TypeError: Raised when the zip file has no filename for certain operations
```