tessl/pypi-fsspec

Unified pythonic interface for diverse file systems and storage backends

Filesystem Registry

Plugin system for registering, discovering, and instantiating filesystem implementations. The registry enables dynamic loading of storage backend drivers and provides centralized access to available protocols through a consistent interface.

Capabilities

Filesystem Instantiation

Creates a filesystem instance by protocol name, passing any storage-specific options to its constructor. This is the primary way to get a filesystem object for direct manipulation.

def filesystem(protocol, **storage_options):
    """
    Create a filesystem instance for the given protocol.
    
    Parameters:
    - protocol: str, protocol name ('s3', 'gcs', 'local', 'http', etc.)
    - **storage_options: keyword arguments passed to filesystem constructor
    
    Returns:
    AbstractFileSystem instance
    """

Usage example:

import fsspec

# Create S3 filesystem (requires s3fs; the credentials are placeholders)
s3 = fsspec.filesystem('s3', key='ACCESS_KEY', secret='SECRET_KEY')
files = s3.ls('bucket-name/')

# Create local filesystem  
local = fsspec.filesystem('file')
local.mkdir('/tmp/new_directory')

# Create HTTP filesystem
http = fsspec.filesystem('http')
content = http.cat('https://example.com/data.json')
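
The examples above need credentials or network access. As a minimal sketch of the same call that runs anywhere, the built-in `memory` protocol can be used instead. The path name below is hypothetical and chosen only for this illustration.

```python
import fsspec

# 'memory' is an in-process backend, so no credentials or network are needed.
fs = fsspec.filesystem("memory")

# Hypothetical path, chosen for this sketch only.
path = "/registry-demo/hello.txt"

# Write a small payload through the generic file-like interface.
with fs.open(path, "wb") as f:
    f.write(b"hello from fsspec")

# Read it back as bytes and list the containing directory.
payload = fs.cat_file(path)
print(payload)
print(fs.ls("/registry-demo", detail=False))
```

Because every backend exposes the same methods, swapping `"memory"` for another protocol should leave the rest of the code unchanged, provided that backend's package is installed.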

Filesystem Class Resolution

Retrieves the filesystem class without instantiating it. Useful for inspection, subclassing, or custom instantiation patterns.

def get_filesystem_class(protocol):
    """
    Get the filesystem class for a protocol.
    
    Parameters:
    - protocol: str, protocol name
    
    Returns:
    type, AbstractFileSystem subclass
    """

Usage example:

# Get S3 filesystem class
S3FileSystem = fsspec.get_filesystem_class('s3')

# Check available methods
print(dir(S3FileSystem))

# Custom instantiation
s3 = S3FileSystem(key='...', secret='...', client_kwargs={'region_name': 'us-west-2'})
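
A sketch of class resolution that does not depend on optional packages, using the local filesystem. It illustrates that `file` and `local` are aliases for one class and that `filesystem()` returns an instance of the resolved class.

```python
import fsspec
from fsspec.spec import AbstractFileSystem

# Resolve the class only; nothing is instantiated at this point.
local_cls = fsspec.get_filesystem_class("file")

# Every resolved class is a subclass of AbstractFileSystem.
is_fs_class = issubclass(local_cls, AbstractFileSystem)

# 'file' and 'local' are two names for the same implementation.
same_class = fsspec.get_filesystem_class("local") is local_cls

# Instances created through the factory are instances of the resolved class.
from_factory = fsspec.filesystem("file")
matches_factory = isinstance(from_factory, local_cls)

print(local_cls.__name__, is_fs_class, same_class, matches_factory)
```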

Protocol Registration

Registers new filesystem implementations, enabling plugin-style extensions. Allows third-party packages to integrate with fsspec's unified interface.

def register_implementation(name, cls, clobber=False, errtxt=None):
    """
    Register a filesystem implementation.
    
    Parameters:
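    - name: str, protocol name
    - cls: str or type, filesystem class or dotted import path to the class
    - clobber: bool, whether to overwrite an existing registration; if False, a conflicting registration raises ValueError
    - errtxt: str, message shown if importing the class fails (only used when cls is a string)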
    """

Usage example:

# Register a custom filesystem
class MyCustomFS(fsspec.AbstractFileSystem):
    protocol = 'custom'
    
    def _open(self, path, mode='rb', **kwargs):
        # Custom implementation
        pass

fsspec.register_implementation('custom', MyCustomFS)

# Register by import path
fsspec.register_implementation(
    'myprotocol',
    'mypackage.MyFileSystem',
    errtxt='Please install mypackage for myprotocol support'
)
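
The effect of `clobber` can be sketched as follows, assuming a recent fsspec release in which `clobber` defaults to `False`. The protocol name `clobberdemo` and both classes are hypothetical and exist only for this illustration.

```python
import fsspec
from fsspec.spec import AbstractFileSystem


class FirstFS(AbstractFileSystem):
    """Hypothetical placeholder implementation."""
    protocol = "clobberdemo"


class SecondFS(AbstractFileSystem):
    """Hypothetical replacement implementation."""
    protocol = "clobberdemo"


# clobber=True here only keeps the sketch re-runnable in one session.
fsspec.register_implementation("clobberdemo", FirstFS, clobber=True)

# Registering a different class under the same name is rejected by default.
try:
    fsspec.register_implementation("clobberdemo", SecondFS)
    conflict_detected = False
except ValueError:
    conflict_detected = True

# An explicit clobber=True replaces the earlier registration.
fsspec.register_implementation("clobberdemo", SecondFS, clobber=True)
resolved = fsspec.get_filesystem_class("clobberdemo")
print(conflict_detected, resolved.__name__)
```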

Available Protocols

Lists the names of all protocols known to fsspec, both built-in and third-party. A listed protocol may still need an extra package installed before it can be used, so treat the result as a list of known backends rather than usable ones.

def available_protocols():
    """
    List all available protocol names.
    
    Returns:
    list of str, protocol names
    """

Usage example:

# See all available protocols
protocols = fsspec.available_protocols()
print(protocols)
# ['file', 'local', 's3', 'gcs', 'http', 'https', 'ftp', 'sftp', ...]

# A listed protocol is known to fsspec, but its backing package may still be missing
if 's3' in fsspec.available_protocols():
    s3 = fsspec.filesystem('s3')  # raises ImportError if s3fs is not installed
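
To tell a usable protocol from one that is merely known, one option is to attempt class resolution and inspect the exception. The helper `protocol_status` below is a hypothetical name for this sketch. It relies on `get_filesystem_class` raising `ValueError` for unknown protocols and `ImportError` when the backing package cannot be imported.

```python
import fsspec


def protocol_status(protocol):
    """Classify a protocol as 'usable', 'missing dependency', or 'unknown'."""
    try:
        fsspec.get_filesystem_class(protocol)
    except ImportError:
        # Known to fsspec, but the backing package is not importable.
        return "missing dependency"
    except ValueError:
        # Not known to fsspec at all.
        return "unknown"
    return "usable"


for name in ("memory", "s3", "definitely-not-a-protocol"):
    print(name, "->", protocol_status(name))
```

The result for `s3` depends on whether s3fs is installed in the current environment.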

Built-in Protocol Implementations

Local Filesystems

  • file, local: Local filesystem access
  • memory: In-memory filesystem for testing

Cloud Storage

  • s3: Amazon S3 (requires s3fs)
  • gcs, gs: Google Cloud Storage (requires gcsfs)
  • az, abfs: Azure Blob Storage (requires adlfs)
  • adl: Azure Data Lake Gen1 (requires adlfs)
  • oci: Oracle Cloud Infrastructure (requires ocifs)

Network Protocols

  • http, https: HTTP/HTTPS access (requires aiohttp and requests)
  • ftp: FTP protocol
  • sftp, ssh: SSH/SFTP access (requires paramiko)
  • smb: SMB/CIFS network shares (requires smbprotocol)
  • webdav: WebDAV protocol (requires webdav4)

Archive Formats

  • zip: ZIP archive access
  • tar: TAR archive access
  • libarchive: Multiple archive formats (requires libarchive-c)

Specialized

  • cached: Caching wrapper for other filesystems
  • reference: Reference filesystem for Zarr/Kerchunk
  • dask: Integration with Dask distributed computing
  • git: Git repository access (requires pygit2)
  • github: GitHub repository access (requires requests)
  • jupyter: Jupyter server filesystem access

Database/Analytics

  • hdfs, arrow_hdfs: Hadoop Distributed File System through Arrow (requires pyarrow)
  • webhdfs: HDFS access over the WebHDFS REST API (requires requests)

Registry Constants and Variables

registry: dict
    """Read-only mapping of protocol names to the filesystem classes loaded or registered so far"""

known_implementations: dict
    """Mapping of protocol names to import specifications"""

default: str
    """Default protocol name ('file')"""

Usage example:

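import fsspec
from fsspec.registry import known_implementations, registry

# Inspect the classes that have been loaded so far
print(registry.keys())

# Check if protocol is known but not loaded
# (known_implementations lives in fsspec.registry, not the top-level package)
if 's3' in known_implementations and 's3' not in registry:
    # Will trigger import and registration
    s3_fs = fsspec.filesystem('s3')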
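
A self-contained sketch of how these objects relate, using the built-in `memory` protocol so that no optional package is needed. The names are imported from the `fsspec.registry` module, and the key `bogus` is a hypothetical name used only to show that the mapping rejects writes.

```python
import fsspec
from fsspec.registry import default, known_implementations, registry

# Resolving a class populates the registry for that protocol.
fsspec.get_filesystem_class("memory")
loaded = "memory" in registry

# known_implementations stores an import specification, not the class itself.
spec = known_implementations["memory"]
print(spec["class"])

# The public registry is read-only; item assignment is rejected.
try:
    registry["bogus"] = object  # hypothetical key, for illustration only
    read_only = False
except TypeError:
    read_only = True

print(default, loaded, read_only)
```

New entries should go through `register_implementation` rather than direct mutation.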

Usage Patterns

Dynamic Protocol Loading

fsspec uses lazy loading for optional dependencies:

# These will only import the required package when first used
s3 = fsspec.filesystem('s3')  # Imports s3fs
gcs = fsspec.filesystem('gcs')  # Imports gcsfs
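
The deferred import can be observed without installing anything, as in the sketch below. It registers a class by import path from a stand-in module created in memory. The module name `hypothetical_lazy_pkg`, the class `LazyDemoFS`, and the protocol `lazydemo` are all hypothetical. In real use the import path would point at an installed package.

```python
import sys
import types

import fsspec
from fsspec.registry import known_implementations, registry
from fsspec.spec import AbstractFileSystem


class LazyDemoFS(AbstractFileSystem):
    """Hypothetical filesystem used only to observe lazy loading."""
    protocol = "lazydemo"


# Stand-in for an installed third-party package (assumption for this sketch).
module = types.ModuleType("hypothetical_lazy_pkg")
module.LazyDemoFS = LazyDemoFS
sys.modules["hypothetical_lazy_pkg"] = module

# Registering by import path records the path; the class is not loaded yet.
fsspec.register_implementation(
    "lazydemo", "hypothetical_lazy_pkg.LazyDemoFS", clobber=True
)
known_before = "lazydemo" in known_implementations
loaded_before = "lazydemo" in registry

# First use triggers the import and moves the class into the registry.
resolved = fsspec.get_filesystem_class("lazydemo")
loaded_after = "lazydemo" in registry

print(known_before, loaded_before, loaded_after, resolved.__name__)
```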

Custom Filesystem Integration

Creating and registering custom filesystems:

import fsspec
from fsspec.spec import AbstractFileSystem

class DatabaseFS(AbstractFileSystem):
    protocol = 'db'
    
    def __init__(self, connection_string, **kwargs):
        super().__init__(**kwargs)
        self.connection_string = connection_string
    
    def _open(self, path, mode='rb', **kwargs):
        # Implement database table/query access
        pass
    
    def ls(self, path, detail=True, **kwargs):
        # List tables/views
        pass

# Register the custom filesystem
fsspec.register_implementation('db', DatabaseFS)

# Use it like any other filesystem
db = fsspec.filesystem('db', connection_string='postgresql://...')
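
The `DatabaseFS` skeleton above leaves its methods unimplemented. For comparison, here is a minimal sketch of a complete read-only filesystem backed by a plain dictionary. `DictFS`, the protocol `dictdemo`, and the `files` option are hypothetical names for this illustration, and a real backend would usually implement more methods (for example `info`).

```python
import io

import fsspec
from fsspec.spec import AbstractFileSystem


class DictFS(AbstractFileSystem):
    """Hypothetical read-only filesystem over a flat {name: bytes} mapping."""
    protocol = "dictdemo"

    def __init__(self, files=None, **kwargs):
        super().__init__(**kwargs)
        self.files = dict(files or {})

    def _open(self, path, mode="rb", **kwargs):
        # Only binary reads are supported in this sketch.
        if mode != "rb":
            raise NotImplementedError("DictFS is read-only")
        path = self._strip_protocol(path)
        if path not in self.files:
            raise FileNotFoundError(path)
        return io.BytesIO(self.files[path])

    def ls(self, path, detail=True, **kwargs):
        # Flat namespace: every file is listed regardless of `path`.
        entries = [
            {"name": name, "size": len(data), "type": "file"}
            for name, data in sorted(self.files.items())
        ]
        return entries if detail else [e["name"] for e in entries]


# clobber=True only keeps the sketch re-runnable in one session.
fsspec.register_implementation("dictdemo", DictFS, clobber=True)

fs = fsspec.filesystem("dictdemo", files={"hello.txt": b"hi"})
with fs.open("hello.txt", "rb") as f:
    data = f.read()
names = fs.ls("", detail=False)
print(data, names)
```

Keyword arguments given to `fsspec.filesystem` are forwarded to the class constructor, which is how `files` reaches `DictFS.__init__`.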

Protocol Discovery

Checking available protocols at runtime:

def get_cloud_protocols():
    """Get the cloud storage protocols known to fsspec (their packages may not be installed)"""
    cloud = {'s3', 'gcs', 'gs', 'az', 'abfs', 'adl', 'oci'}
    return [p for p in fsspec.available_protocols() if p in cloud]
