tessl/pypi-zipp

Backport of pathlib-compatible object wrapper for zip files

Workspace: tessl
Visibility: Public
Describes: pypipkg:pypi/zipp@3.23.x

To install, run

npx @tessl/cli install tessl/pypi-zipp@3.23.0


zipp

A pathlib-compatible wrapper for zip files that provides a Path-like interface to ZIP archives. zipp is the official backport of the standard library's zipfile.Path object, so entries inside an archive can be navigated and read with familiar pathlib syntax.

Package Information

  • Package Name: zipp
  • Package Type: pypi
  • Language: Python
  • Installation: pip install zipp
  • Python Version: >= 3.9

Core Imports

import zipp

Standard usage:

from zipp import Path

Advanced usage:

from zipp import Path, CompleteDirs, FastLookup
from zipp.glob import Translator

Compatibility functions:

from zipp.compat.py310 import text_encoding

Basic Usage

import zipfile
from zipp import Path

# Create a zip file with a few entries
with zipfile.ZipFile('example.zip', 'w') as zf:
    zf.writestr('data/file1.txt', 'content of file1')
    zf.writestr('data/subdir/file2.txt', 'content of file2')
    zf.writestr('config.json', '{"key": "value"}')

# Use zipp.Path to work with the zip file
zip_path = Path('example.zip')

# Check if paths exist
print(zip_path.exists())  # True
print((zip_path / 'data').exists())  # True
print((zip_path / 'missing.txt').exists())  # False

# Read file contents
config_path = zip_path / 'config.json'
config_content = config_path.read_text()
print(config_content)  # {"key": "value"}

# Iterate through directory contents
data_dir = zip_path / 'data'
for item in data_dir.iterdir():
    print(f"{item.name}: {'directory' if item.is_dir() else 'file'}")

# Use glob patterns to find files
txt_files = list(zip_path.glob('**/*.txt'))
for txt_file in txt_files:
    print(f"Found: {txt_file}")
    content = txt_file.read_text()
    print(f"Content: {content}")

Architecture

zipp implements a layered architecture that extends zipfile functionality:

  • Path: Main user-facing class providing pathlib-compatible interface
  • CompleteDirs: ZipFile subclass that automatically includes implied directories in file listings
  • FastLookup: Performance-optimized subclass with cached name lookups
  • Translator: Glob pattern to regex conversion for file pattern matching

Because implied directories are always listed, directory traversal works even when an archive stores only file entries, and the cached name lookups avoid rebuilding the listing on every access to a large archive.
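The effect of the implied-directory layer can be observed from the outside. The following is a minimal sketch, assuming an in-memory archive with made-up entry names. The fallback to the standard library's zipfile.Path is an assumption made only so the sketch still runs where zipp is not installed.

```python
import io
import zipfile

try:
    from zipp import Path
except ImportError:  # assumption: stdlib fallback, only to keep the sketch runnable
    from zipfile import Path

# Build an archive that stores a file entry only, with no directory entries.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as zf:
    zf.writestr('data/subdir/file2.txt', 'content of file2')

with zipfile.ZipFile(buffer) as zf:
    stored_names = zf.namelist()   # what the archive literally contains
    data_dir = Path(zf) / 'data'   # wrapping the ZipFile makes implied dirs visible
    dir_exists = data_dir.exists()
    dir_is_dir = data_dir.is_dir()

print(stored_names)            # only the file entry is stored
print(dir_exists, dir_is_dir)  # the implied directory is still reported
```

The archive never stores a 'data/' entry, yet the Path layer reports it as an existing directory.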

Capabilities

Path Operations

Core pathlib-compatible interface for navigating and manipulating paths within ZIP archives.

class Path:
    def __init__(self, root, at: str = ""):
        """
        Construct a Path from a ZipFile or filename.
        
        Note: When the source is an existing ZipFile object, its type
        (__class__) will be mutated to a specialized type. If the caller
        wishes to retain the original type, create a separate ZipFile
        object or pass a filename.
        
        Args:
            root: ZipFile object or path to zip file
            at (str): Path within the zip file, defaults to root
        """

    def __eq__(self, other) -> bool:
        """
        Test path equality.
        
        Args:
            other: Other object to compare
            
        Returns:
            bool: True if paths are equal, NotImplemented for different types
        """

    def __hash__(self) -> int:
        """Return hash of path for use in sets and dicts."""

    @property
    def name(self) -> str:
        """Name of the path entry (final component)."""

    @property
    def suffix(self) -> str:
        """File suffix (extension including the dot)."""

    @property
    def suffixes(self) -> list[str]:
        """List of all file suffixes."""

    @property  
    def stem(self) -> str:
        """Filename without the final suffix."""

    @property
    def filename(self) -> pathlib.Path:
        """Full filesystem path including zip file path and internal path."""

    @property
    def parent(self) -> "Path":
        """Parent directory path within the ZIP file."""

    def joinpath(self, *other) -> "Path":
        """
        Join path components.
        
        Args:
            *other: Path components to join
            
        Returns:
            Path: New Path object with joined components
        """

    def __truediv__(self, other) -> "Path":
        """
        Path joining using / operator.
        
        Args:
            other: Path component to join
            
        Returns:
            Path: New Path object with joined component
        """

    def relative_to(self, other, *extra) -> str:
        """
        Return relative path from other path.
        
        Args:
            other: Base path for relative calculation
            *extra: Additional path components for base
            
        Returns:
            str: Relative path string
        """

    def __str__(self) -> str:
        """String representation combining zip filename and internal path."""

    def __repr__(self) -> str:
        """Detailed string representation showing class, zip file, and internal path."""
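As a rough illustration of the navigation members listed above, here is a minimal sketch that joins components with / and joinpath, then inspects name and parent. The entry names are made up, and the fallback import is an assumption that only keeps the sketch runnable without zipp installed.

```python
import io
import zipfile

try:
    from zipp import Path
except ImportError:  # assumption: stdlib fallback, only to keep the sketch runnable
    from zipfile import Path

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as zf:
    zf.writestr('data/subdir/file2.txt', 'content of file2')

with zipfile.ZipFile(buffer) as zf:
    root = Path(zf)

    # Two ways to reach the same entry.
    entry = root / 'data' / 'subdir' / 'file2.txt'
    joined = root.joinpath('data/subdir/file2.txt')

    entry_name = entry.name            # final component of the entry
    parent_name = entry.parent.name    # directory that contains the entry
    same_text = (
        entry.read_text(encoding='utf-8') == joined.read_text(encoding='utf-8')
    )

print(entry_name, parent_name, same_text)
```

The sketch avoids str(entry) and entry.filename because, according to the exception list in this document, those need a zip file that has a filename, which an in-memory archive lacks.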

File Operations

Read and write operations for files within ZIP archives.

def open(self, mode: str = 'r', *args, pwd=None, **kwargs):
    """
    Open file for reading or writing following pathlib.Path.open() semantics.
    
    Text mode arguments are passed through to io.TextIOWrapper().
    
    Args:
        mode (str): File mode ('r', 'rb', 'w', 'wb'). Defaults to 'r'.
        pwd (bytes, optional): Password for encrypted ZIP files
        *args: Additional positional arguments for TextIOWrapper (text mode only)
        **kwargs: Additional keyword arguments for TextIOWrapper (text mode only)
        
    Returns:
        IO: File-like object (TextIOWrapper for text mode, raw stream for binary)
        
    Raises:
        IsADirectoryError: If path is a directory
        FileNotFoundError: If file doesn't exist in read mode
        ValueError: If encoding args provided for binary mode
    """

def read_text(self, *args, **kwargs) -> str:
    """
    Read file contents as text with proper encoding handling.
    
    Args:
        *args: Positional arguments for text encoding (encoding, errors, newline)
        **kwargs: Keyword arguments for text processing
        
    Returns:
        str: File contents as decoded text
    """

def read_bytes(self) -> bytes:
    """
    Read file contents as bytes.
    
    Returns:
        bytes: Raw file contents without any encoding
    """
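The text and binary modes described above can be compared side by side. This is a minimal sketch with a made-up entry name, not a complete treatment of open(); the fallback import is an assumption that only keeps it runnable without zipp installed.

```python
import io
import zipfile

try:
    from zipp import Path
except ImportError:  # assumption: stdlib fallback, only to keep the sketch runnable
    from zipfile import Path

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as zf:
    zf.writestr('notes.txt', 'héllo')  # str data is stored as UTF-8 bytes

with zipfile.ZipFile(buffer) as zf:
    entry = Path(zf) / 'notes.txt'

    # Text mode: keyword arguments are forwarded to the text wrapper.
    with entry.open('r', encoding='utf-8') as fh:
        text = fh.read()

    # Binary mode: the raw stream is returned, no decoding takes place.
    with entry.open('rb') as fh:
        raw = fh.read()

    same_bytes = entry.read_bytes() == raw

    # Encoding arguments are rejected in binary mode.
    try:
        entry.open('rb', encoding='utf-8')
    except ValueError:
        binary_rejects_encoding = True
    else:
        binary_rejects_encoding = False

print(text, raw, same_bytes, binary_rejects_encoding)
```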

Path Testing

Methods to test path properties and existence.

def exists(self) -> bool:
    """Check if path exists in the zip file."""

def is_file(self) -> bool:
    """Check if path is a file."""

def is_dir(self) -> bool:
    """Check if path is a directory."""

def is_symlink(self) -> bool:
    """Check if path is a symbolic link."""

Directory Operations

Navigate and list directory contents within ZIP archives.

def iterdir(self) -> Iterator["Path"]:
    """
    Iterate over immediate children of this directory.
    
    Returns:
        Iterator[Path]: Path objects for immediate directory contents only
        
    Raises:
        ValueError: If path is not a directory
    """
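The testing methods and iterdir() are typically used together. The sketch below lists the immediate children of a directory, classifies each one, and shows the ValueError raised when iterdir() is called on a file. Entry names are made up, and the fallback import is an assumption that only keeps the sketch runnable without zipp installed.

```python
import io
import zipfile

try:
    from zipp import Path
except ImportError:  # assumption: stdlib fallback, only to keep the sketch runnable
    from zipfile import Path

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as zf:
    zf.writestr('data/file1.txt', 'content of file1')
    zf.writestr('data/subdir/file2.txt', 'content of file2')

with zipfile.ZipFile(buffer) as zf:
    data_dir = Path(zf) / 'data'

    # Only immediate children are returned; file2.txt lives one level deeper.
    kinds = {
        child.name: ('directory' if child.is_dir() else 'file')
        for child in data_dir.iterdir()
    }

    # iterdir() is only valid on directories.
    file_entry = data_dir / 'file1.txt'
    try:
        list(file_entry.iterdir())
    except ValueError:
        iterdir_on_file_fails = True
    else:
        iterdir_on_file_fails = False

print(sorted(kinds.items()), iterdir_on_file_fails)
```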

Pattern Matching

Find files using glob patterns and path matching.

def match(self, path_pattern: str) -> bool:
    """
    Test if path matches the given pattern using pathlib-style matching.
    
    Args:
        path_pattern (str): Pattern to match against (e.g., '*.txt', 'data/*')
        
    Returns:
        bool: True if path matches pattern
    """

def glob(self, pattern: str) -> Iterator["Path"]:
    """
    Find all paths matching a glob pattern starting from this path.
    
    Args:
        pattern (str): Glob pattern to match (e.g., '*.txt', 'data/*.json')
        
    Returns:
        Iterator[Path]: Path objects matching the pattern
        
    Raises:
        ValueError: If pattern is empty or invalid
    """

def rglob(self, pattern: str) -> Iterator["Path"]:
    """
    Recursively find all paths matching a glob pattern.
    
    Equivalent to calling glob(f'**/{pattern}').
    
    Args:
        pattern (str): Glob pattern to match recursively
        
    Returns:
        Iterator[Path]: Path objects matching the pattern recursively
    """
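To make "pathlib-style matching" concrete, the sketch below uses pathlib.PurePosixPath.match directly on the path of an entry inside an archive. It assumes that Path.match follows the same semantics for the internal path; that is an assumption drawn from the description above, not something verified here.

```python
from pathlib import PurePosixPath

# Path of an entry as it would appear inside the archive (made-up name).
inner = PurePosixPath('data/subdir/file2.txt')

# Relative patterns are matched from the right-hand end of the path.
matches_extension = inner.match('*.txt')
matches_parent = inner.match('subdir/*')

# 'data' is not the immediate parent, so a two-component pattern fails.
matches_top = inner.match('data/*')

print(matches_extension, matches_parent, matches_top)
```

Under that assumption, a pattern such as 'data/*' matches only entries whose immediate parent is named data, regardless of how deep the entry sits.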

Advanced ZipFile Classes

Enhanced ZipFile subclasses for specialized use cases.

class InitializedState:
    """
    Mix-in to save the initialization state for pickling.
    
    Preserves constructor arguments for proper serialization/deserialization.
    """
    
    def __init__(self, *args, **kwargs):
        """Initialize and save constructor arguments."""
    
    def __getstate__(self):
        """Return state for pickling."""
    
    def __setstate__(self, state):
        """Restore state from pickle."""

class CompleteDirs(InitializedState, zipfile.ZipFile):
    """
    ZipFile subclass that ensures implied directories are included.
    
    Automatically includes parent directories for files in the namelist,
    enabling proper directory traversal even when directories aren't
    explicitly stored in the ZIP file.
    """
    
    @classmethod
    def make(cls, source):
        """
        Create appropriate CompleteDirs subclass from source.
        
        Args:
            source: ZipFile object or filename
            
        Returns:
            CompleteDirs: CompleteDirs or FastLookup instance
        """
    
    @classmethod
    def inject(cls, zf: zipfile.ZipFile) -> zipfile.ZipFile:
        """
        Inject directory entries for implied directories.
        
        Args:
            zf (zipfile.ZipFile): Writable ZipFile to modify
            
        Returns:
            zipfile.ZipFile: Modified zip file with directory entries
        """
    
    def namelist(self) -> list[str]:
        """Return file list including implied directories."""
    
    def resolve_dir(self, name: str) -> str:
        """
        Resolve directory name with proper trailing slash.
        
        Args:
            name (str): Directory name to resolve
            
        Returns:
            str: Directory name with trailing slash if it's a directory
        """
    
    def getinfo(self, name: str) -> zipfile.ZipInfo:
        """
        Get ZipInfo for file, including implied directories.
        
        Args:
            name (str): File or directory name
            
        Returns:
            zipfile.ZipInfo: File information object
            
        Raises:
            KeyError: If file doesn't exist and isn't an implied directory
        """
    
    @staticmethod
    def _implied_dirs(names: list[str]):
        """
        Generate implied parent directories from file list.
        
        Args:
            names (list[str]): List of file names in ZIP
            
        Returns:
            Iterator[str]: Implied directory names with trailing slashes
        """

class FastLookup(CompleteDirs):
    """
    CompleteDirs subclass with cached lookups for performance.
    
    Uses functools.cached_property for efficient repeated access
    to namelist and name set operations.
    """
    
    def namelist(self) -> list[str]:
        """Cached access to file list."""
    
    @functools.cached_property
    def _namelist(self) -> list[str]:
        """Cached property for namelist."""
    
    def _name_set(self) -> set[str]:
        """Cached access to name set."""
    
    @functools.cached_property
    def _name_set_prop(self) -> set[str]:
        """Cached property for name set."""
        """Cached property for name set."""
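The note on the Path constructor says that an existing ZipFile has its class mutated. The sketch below observes that through the public constructor only, without asserting which subclass is chosen. The entry name is made up, and the fallback import is an assumption that only keeps the sketch runnable without zipp installed.

```python
import io
import zipfile

try:
    from zipp import Path
except ImportError:  # assumption: stdlib fallback, only to keep the sketch runnable
    from zipfile import Path

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as zf:
    zf.writestr('data/file1.txt', 'content of file1')

wrapped = zipfile.ZipFile(buffer)
type_before = type(wrapped)
names_before = wrapped.namelist()

Path(wrapped)  # constructing a Path re-types the ZipFile in place

type_after = type(wrapped)
names_after = wrapped.namelist()
wrapped.close()

print(type_before.__name__, '->', type_after.__name__)
print(names_before, '->', names_after)
```

Code that relies on the original ZipFile type or on its unmodified namelist() can pass a filename or a separate ZipFile object instead, as the constructor note suggests.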

Pattern Translation

Convert glob patterns to regular expressions for file matching.

class Translator:
    """
    Translate glob patterns to regex patterns for ZIP file path matching.
    
    Handles platform-specific path separators and converts shell-style
    wildcards into regular expressions suitable for matching ZIP entries.
    """
    
    def __init__(self, seps: str = None):
        """
        Initialize translator with path separators.
        
        Args:
            seps (str, optional): Path separator characters. 
                                Defaults to os.sep + os.altsep if available.
                                
        Raises:
            AssertionError: If separators are invalid or empty
        """
    
    def translate(self, pattern: str) -> str:
        """
        Convert glob pattern to regex.
        
        Args:
            pattern (str): Glob pattern to convert (e.g., '*.txt', '**/data/*.json')
            
        Returns:
            str: Regular expression pattern with full match semantics
            
        Raises:
            ValueError: If ** appears incorrectly in pattern (not alone in path segment)
        """
    
    def extend(self, pattern: str) -> str:
        """
        Extend regex for pattern-wide concerns.
        
        Wraps the pattern in a (?s:...) group so that '.' also matches
        newlines, and appends an end-of-string anchor so that match()
        behaves like fullmatch().
        
        Args:
            pattern (str): Base regex pattern
            
        Returns:
            str: Extended regex with (?s:pattern)\\z format
        """
    
    def match_dirs(self, pattern: str) -> str:
        """
        Ensure ZIP directory names are matched.
        
        ZIP directory names always end with '/', so the pattern is given
        an optional trailing slash in order to match them as well.
        
        Args:
            pattern (str): Regex pattern
            
        Returns:
            str: Pattern with optional trailing slash
        """
    
    def translate_core(self, pattern: str) -> str:
        """
        Core glob to regex translation logic.
        
        Args:
            pattern (str): Glob pattern
            
        Returns:
            str: Base regex pattern before extension
        """
    
    def replace(self, match) -> str:
        """
        Perform regex replacements for glob wildcards.
        
        Args:
            match: Regex match object from separate()
            
        Returns:
            str: Replacement string for the match
        """
    
    def restrict_rglob(self, pattern: str) -> None:
        """
        Validate ** usage in pattern.
        
        Args:
            pattern (str): Glob pattern to validate
            
        Raises:
            ValueError: If ** appears in partial path segments
        """
    
    def star_not_empty(self, pattern: str) -> str:
        """
        Ensure * will not match empty segments.
        
        Args:
            pattern (str): Glob pattern
            
        Returns:
            str: Modified pattern where a segment consisting only of * becomes ?*
        """

def separate(pattern: str):
    """
    Separate character sets to avoid translating their contents.
    
    Args:
        pattern (str): Glob pattern with potential character sets
        
    Returns:
        Iterator: Match objects for pattern segments
    """
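The translation steps described above can be sketched in a few lines. This is a simplified, hypothetical re-implementation (glob_to_regex is not part of zipp): it ignores character sets such as [abc], which the real Translator handles via separate(), and it uses \Z as the end anchor so that it runs on any supported Python version.

```python
import re


def glob_to_regex(pattern, sep='/'):
    """Hypothetical, simplified glob-to-regex translation (not zipp's code)."""
    segments = pattern.split(sep)

    # '**' is only accepted as a whole path segment.
    if any('**' in seg and seg != '**' for seg in segments):
        raise ValueError("** must appear alone in a path segment")

    not_sep = f'[^{re.escape(sep)}]'
    parts = []
    for seg in segments:
        if seg == '**':
            parts.append('.*')            # crosses directory boundaries
        elif seg == '*':
            parts.append(f'{not_sep}+')   # a lone * must not match an empty segment
        else:
            escaped = re.escape(seg)
            escaped = escaped.replace(r'\*', f'{not_sep}*')
            escaped = escaped.replace(r'\?', not_sep)
            parts.append(escaped)

    core = re.escape(sep).join(parts)
    # Optional trailing slash so directory names match; anchor at end of string.
    return rf'(?s:{core}[/]?)\Z'


recursive = re.match(glob_to_regex('**/*.txt'), 'data/subdir/file2.txt') is not None
single_level = re.match(glob_to_regex('*.txt'), 'data/file1.txt') is not None
directory = re.match(glob_to_regex('data/*'), 'data/subdir/') is not None

try:
    glob_to_regex('**foo')
except ValueError:
    rejected = True
else:
    rejected = False

print(recursive, single_level, directory, rejected)
```

A single * never crosses a separator, which is why '*.txt' does not match 'data/file1.txt' while '**/*.txt' does.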

Usage Examples

Working with Complex Directory Structures

from zipp import Path
import zipfile

# Create a zip with complex structure
with zipfile.ZipFile('project.zip', 'w') as zf:
    zf.writestr('src/main.py', 'print("Hello World")')
    zf.writestr('src/utils/helpers.py', 'def helper(): pass')
    zf.writestr('tests/test_main.py', 'def test_main(): assert True')
    zf.writestr('docs/README.md', '# Project Documentation')
    zf.writestr('config/settings.json', '{"debug": true}')

# Navigate the zip file structure
project = Path('project.zip')

# Find all Python files
python_files = list(project.rglob('*.py'))
print(f"Found {len(python_files)} Python files:")
for py_file in python_files:
    print(f"  {py_file}")

# Read configuration
config = project / 'config' / 'settings.json'
if config.exists():
    settings = config.read_text()
    print(f"Settings: {settings}")

# List directory contents with details
src_dir = project / 'src'
print(f"Contents of {src_dir}:")
for item in src_dir.iterdir():
    item_type = "directory" if item.is_dir() else "file"
    print(f"  {item.name} ({item_type})")

Pattern Matching and Filtering

from zipp import Path

zip_path = Path('archive.zip')

# Find files by extension
text_files = list(zip_path.glob('**/*.txt'))
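# Brace expansion such as '*.{jpg,png,gif}' is not supported, so glob once per extension
image_files = [
    match
    for ext in ('jpg', 'png', 'gif')
    for match in zip_path.glob(f'**/*.{ext}')
]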

# Find files in specific directories
src_files = list(zip_path.glob('src/**/*'))
test_files = list(zip_path.glob('**/test_*.py'))

# Check for specific patterns
has_readme = any(zip_path.glob('**/README*'))
config_files = list(zip_path.glob('**/config.*'))

print(f"Text files: {len(text_files)}")
print(f"Image files: {len(image_files)}")
print(f"Has README: {has_readme}")

Error Handling

from zipp import Path

try:
    zip_path = Path('example.zip')
    
    # Check if file exists before reading
    target_file = zip_path / 'data' / 'important.txt'
    if target_file.exists():
        content = target_file.read_text()
        print(content)
    else:
        print("File not found in archive")
    
    # Handle directory operations
    try:
        directory = zip_path / 'folder'
        with directory.open('r') as f:  # This will raise IsADirectoryError
            content = f.read()
    except IsADirectoryError:
        print("Cannot open directory as file")
        # List directory contents instead
        for item in directory.iterdir():
            print(f"Directory contains: {item.name}")
            
except FileNotFoundError:
    print("Zip file not found")
except Exception as e:
    print(f"Error working with zip file: {e}")

Utility Functions

Low-level utility functions for path manipulation and data processing.

def _parents(path: str):
    """
    Generate all parent paths of the given path.
    
    Args:
        path (str): Path with posixpath.sep-separated elements
        
    Returns:
        Iterator[str]: Parent paths in order from immediate to root
        
    Examples:
        >>> list(_parents('b/d/f/'))
        ['b/d', 'b']
        >>> list(_parents('b'))
        []
    """

def _ancestry(path: str):
    """
    Generate all elements of a path including itself.
    
    Args:
        path (str): Path with posixpath.sep-separated elements
        
    Returns:
        Iterator[str]: Path elements from full path to root
        
    Examples:
        >>> list(_ancestry('b/d/f/'))
        ['b/d/f', 'b/d', 'b']
    """

def _difference(minuend, subtrahend):
    """
    Return items in minuend not in subtrahend, retaining order.
    
    Uses O(1) lookup for efficient filtering of large sequences.
    
    Args:
        minuend: Items to filter from
        subtrahend: Items to exclude
        
    Returns:
        Iterator: Filtered items in original order
    """

def _dedupe(iterable):
    """
    Deduplicate an iterable in original order.
    
    Implemented as dict.fromkeys for efficiency.
    
    Args:
        iterable: Items to deduplicate
        
    Returns:
        dict: Dictionary whose keys are the unique items in original order
    """
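The four helpers combine to compute implied directories. The sketch below is consistent with the documented behavior and examples, but the names (without the leading underscore) are hypothetical stand-ins and the library's actual implementation may differ.

```python
import itertools
import posixpath


def ancestry(path):
    """Yield the path and each of its ancestors, deepest first."""
    path = path.rstrip(posixpath.sep)
    while path.rstrip(posixpath.sep):
        yield path
        path, _tail = posixpath.split(path)


def parents(path):
    """Like ancestry(), but skip the path itself."""
    return itertools.islice(ancestry(path), 1, None)


def difference(minuend, subtrahend):
    """Items of minuend not in subtrahend, in order, using a set for lookups."""
    return itertools.filterfalse(set(subtrahend).__contains__, minuend)


dedupe = dict.fromkeys  # a dict keeps the first occurrence of each key, in order


def implied_dirs(names):
    """Directories implied by the entries but not stored explicitly."""
    as_dirs = (p + posixpath.sep for name in names for p in parents(name))
    return list(dedupe(difference(as_dirs, names)))


names = ['b/d/f.txt', 'b/g.txt']
print(list(parents('b/d/f/')))   # ['b/d', 'b']
print(list(ancestry('b/d/f/')))  # ['b/d/f', 'b/d', 'b']
print(implied_dirs(names))       # ['b/d/', 'b/']
```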

Compatibility Functions

Cross-version compatibility utilities for different Python versions.

def text_encoding(encoding=None, stacklevel=2):
    """
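    Resolve the encoding to use for text I/O. On Python 3.10+ this is
    expected to delegate to io.text_encoding, which may emit an
    EncodingWarning; on older versions the argument is returned unchanged.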
    
    Args:
        encoding (str, optional): Text encoding to use
        stacklevel (int): Stack level for warnings
        
    Returns:
        str: Encoding string to use for text operations
    """

def save_method_args(method):
    """
    Decorator to save method arguments for serialization.
    
    Used by InitializedState mixin for pickle support.
    
    Args:
        method: Method to wrap
        
    Returns:
        function: Wrapped method that saves args/kwargs
    """
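The interplay between save_method_args and the InitializedState mix-in can be sketched as follows. All names here (save_args, InitState, Resource, RestorableResource) are hypothetical stand-ins, and the sketch calls __getstate__ and __setstate__ directly rather than going through pickle, so it only illustrates the mechanism under those assumptions.

```python
import functools


def save_args(method):
    """Hypothetical decorator: remember the arguments a method was called with."""
    @functools.wraps(method)
    def wrapper(self, /, *args, **kwargs):
        setattr(self, '_saved_' + method.__name__, (args, kwargs))
        return method(self, *args, **kwargs)
    return wrapper


class InitState:
    """Hypothetical mix-in that replays constructor arguments on restore."""

    @save_args
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

    def __getstate__(self):
        # The state is just the constructor arguments.
        return self._saved___init__

    def __setstate__(self, state):
        args, kwargs = state
        super().__init__(*args, **kwargs)


class Resource:
    """Stand-in for a class such as ZipFile that is configured in __init__."""

    def __init__(self, name, mode='r'):
        self.name = name
        self.mode = mode


class RestorableResource(InitState, Resource):
    pass


original = RestorableResource('example.zip', mode='r')
state = original.__getstate__()

# Rebuild an instance from the saved state without calling the constructor.
clone = RestorableResource.__new__(RestorableResource)
clone.__setstate__(state)

print(state, clone.name, clone.mode)
```

Replaying the constructor arguments sidesteps the need to serialize open file handles, which is presumably why the mix-in stores arguments rather than instance attributes.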

Types

class Path:
    """
    A pathlib-compatible interface for zip file paths.
    
    Main user-facing class that provides familiar pathlib.Path-like
    operations for navigating and manipulating ZIP file contents.
    """

class CompleteDirs(InitializedState, zipfile.ZipFile):
    """
    ZipFile subclass ensuring implied directories are included.
    
    Extends zipfile.ZipFile to automatically handle parent directories
    that aren't explicitly stored in ZIP files.
    """

class FastLookup(CompleteDirs):
    """
    Performance-optimized CompleteDirs with cached operations.
    
    Uses functools.cached_property for efficient repeated access
    to file listings and lookups in large ZIP archives.
    """

class Translator:
    """
    Glob pattern to regex translator for ZIP file matching.
    
    Converts shell-style wildcard patterns into regular expressions
    suitable for matching ZIP file paths.
    """

class InitializedState:
    """
    Mixin class for preserving initialization state in pickle operations.
    
    Saves constructor arguments to enable proper serialization and
    deserialization of ZipFile subclasses.
    """

# Exception types that may be raised:
#   IsADirectoryError: Raised when trying to open a directory as a file
#   FileNotFoundError: Raised when a file doesn't exist
#   ValueError: Raised for invalid patterns or operations
#   KeyError: Raised for missing zip file entries (handled internally)
#   TypeError: Raised when zipfile has no filename for certain operations