Unified Pythonic interface for diverse file systems and storage backends
—
Essential file and directory operations that provide the primary interface for interacting with files across all supported storage backends. These functions handle URL parsing, protocol resolution, and file opening with support for compression, encoding, and various access patterns.
Opens single files with automatic protocol detection, compression handling, and encoding support. Returns a file-like object that can be used with context managers.
```python
def open(urlpath, mode='rb', compression=None, encoding='utf8', errors=None, protocol=None, newline=None, expand=None, **kwargs):
    """
    Open a file for reading or writing.

    Parameters:
    - urlpath: str, URL or path to file (supports all registered protocols)
    - mode: str, file opening mode ('r', 'w', 'a', 'rb', 'wb', etc.)
    - compression: str or None, compression format ('infer', 'gzip', 'bz2', 'lzma', etc.); 'infer' guesses the codec from the filename suffix
    - encoding: str, text encoding for text mode (default 'utf8')
    - errors: str or None, error handling mode for text encoding
    - protocol: str or None, force a specific protocol
    - newline: str or None, newline handling for text mode
    - expand: bool or None, expand glob patterns in paths
    - **kwargs: additional options passed to the filesystem

    Returns:
        OpenFile object (context manager)
    """
```

Usage example:

```python
# Open a remote file with compression
with fsspec.open('s3://bucket/data.txt.gz', 'rt', compression='gzip') as f:
    content = f.read()

# Open a local file
with fsspec.open('/path/to/file.json', 'r') as f:
    data = json.load(f)
```

Opens multiple files simultaneously, supporting glob patterns and parallel access. Useful for batch processing of file collections.
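Beyond the with-statement form, the OpenFile returned by fsspec.open can also be materialized explicitly with its .open() method. A minimal local sketch (the path is illustrative; the same pattern applies to any protocol):

```python
import os
import tempfile
import fsspec

# Illustrative local path
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

of = fsspec.open(path, "w")  # no file handle exists yet
f = of.open()                # explicitly materialize the file object
f.write("hello")
f.close()                    # must be closed manually in this style

with fsspec.open(path, "r") as f:
    content = f.read()       # "hello"
```

Closing the materialized file also tears down any temporary resources the protocol needed, so the explicit style requires the same care as built-in open without a context manager.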
```python
def open_files(urlpath, mode='rb', compression=None, encoding='utf8', errors=None, name_function=None, num=1, protocol=None, newline=None, auto_mkdir=True, expand=True, **kwargs):
    """
    Open multiple files for reading or writing.

    Parameters:
    - urlpath: str or list, URL pattern or list of URLs
    - mode: str, file opening mode
    - compression: str or None, compression format
    - encoding: str, text encoding for text mode
    - errors: str or None, error handling mode for text encoding
    - name_function: callable, function to generate filenames for num > 1
    - num: int, number of files to create for write operations
    - protocol: str or None, force a specific protocol
    - newline: str or None, newline handling for text mode
    - auto_mkdir: bool, automatically create parent directories
    - expand: bool, expand glob patterns in paths
    - **kwargs: additional options passed to the filesystem

    Returns:
        List of OpenFile objects
    """
```

Usage example:

```python
# Open multiple files matching a pattern
files = fsspec.open_files('s3://bucket/data/*.csv', 'rt')
for f in files:
    with f as file:
        df = pd.read_csv(file)

# Create multiple output files
outputs = fsspec.open_files('output-*.json', 'w', num=4)
```

Ensures files are available locally, downloading remote files to temporary locations if necessary. Returns the local path for direct access.
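The num and name_function parameters of open_files above control how the '*' placeholder is expanded when writing several outputs. A sketch with illustrative paths and an illustrative name_function (the library supplies a default when none is given):

```python
import os
import tempfile
import fsspec

outdir = tempfile.mkdtemp()  # illustrative output directory

# '*' in the pattern is replaced by name_function(i) for i in range(num)
outputs = fsspec.open_files(
    os.path.join(outdir, "part-*.json"),
    "w",
    num=4,
    name_function=lambda i: f"{i:03d}",
)
for i, of in enumerate(outputs):
    with of as f:
        f.write('{"part": %d}' % i)

print(sorted(os.listdir(outdir)))
# ['part-000.json', 'part-001.json', 'part-002.json', 'part-003.json']
```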
```python
def open_local(url, mode='rb', **kwargs):
    """
    Open a file ensuring it's available locally.

    Parameters:
    - url: str, URL or path to file
    - mode: str, file opening mode
    - **kwargs: additional options passed to the filesystem

    Returns:
        str, local file path
    """
```

Usage example:

```python
# Ensure a remote file is available locally via a caching layer
local_path = fsspec.open_local('simplecache::s3://bucket/model.pkl')
with open(local_path, 'rb') as f:
    model = pickle.load(f)
```

Note that open_local requires a filesystem that produces genuinely local files, so remote protocols are typically chained behind a caching layer such as 'simplecache::'.

Parses URLs to extract the appropriate filesystem instance and normalized path. Core function for protocol resolution and filesystem instantiation.
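For a runnable illustration of open_local, a plain local path already satisfies the "available locally" contract and its (normalized) path is returned directly (paths are illustrative):

```python
import os
import tempfile
import fsspec

path = os.path.join(tempfile.mkdtemp(), "data.txt")
with open(path, "w") as f:
    f.write("42")

# Local files are returned as-is; remote URLs need a caching layer
local_path = fsspec.open_local(path)
with open(local_path) as f:
    print(f.read())  # 42
```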
```python
def url_to_fs(url, **kwargs):
    """
    Parse URL and return filesystem instance and path.

    Parameters:
    - url: str, URL to parse
    - **kwargs: storage options passed to the filesystem constructor

    Returns:
        tuple: (AbstractFileSystem instance, str path)
    """
```

Usage example:

```python
# Parse an S3 URL
fs, path = fsspec.url_to_fs('s3://bucket/path/file.txt', key='...', secret='...')
files = fs.ls(path.rsplit('/', 1)[0])  # list the containing directory

# Parse an HTTP URL
fs, path = fsspec.url_to_fs('https://example.com/data.csv')
content = fs.cat_file(path)
```

Processes multiple URLs and paths, returning a single filesystem instance and list of paths. Optimizes for cases where multiple files share the same storage backend.
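Since url_to_fs hands back a reusable filesystem object, one parse supports many operations. A local sketch (the file:// path is illustrative; the same pattern applies to remote protocols):

```python
import os
import tempfile
import fsspec

d = tempfile.mkdtemp()
with open(os.path.join(d, "a.txt"), "w") as f:
    f.write("A")

# One parse, many filesystem calls
fs, path = fsspec.url_to_fs(f"file://{d}/a.txt")
print(fs.exists(path))    # True
print(fs.size(path))      # 1
print(fs.cat_file(path))  # b'A'
```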
```python
def get_fs_token_paths(urlpath, mode='rb', num=1, name_function=None, storage_options=None, protocol=None, expand=True):
    """
    Parse multiple URLs and return filesystem with paths.

    Parameters:
    - urlpath: str or list, URLs or paths to process
    - mode: str, file opening mode
    - num: int, number of files to create for write operations
    - name_function: callable, function to generate filenames
    - storage_options: dict or None, storage options passed to the filesystem constructor
    - protocol: str or None, force a specific protocol
    - expand: bool, expand glob patterns in paths

    Returns:
        tuple: (AbstractFileSystem instance, str token, list of paths)
    """
```

Usage example:

```python
# Process multiple S3 files
fs, token, paths = fsspec.get_fs_token_paths([
    's3://bucket/file1.txt',
    's3://bucket/file2.txt',
], storage_options={'key': '...', 'secret': '...'})

# Read all files
contents = [fs.cat_file(path) for path in paths]
```

The preferred way to work with fsspec files:

```python
with fsspec.open('protocol://path/file.ext', 'r') as f:
    data = f.read()
```

Processing multiple files efficiently:
```python
files = fsspec.open_files('s3://bucket/data/*.parquet')
datasets = []
for f in files:
    with f as file:
        datasets.append(pd.read_parquet(file))
```

fsspec automatically detects protocols from URLs:
```python
# These all work through the same interface
fsspec.open('file:///local/path.txt')   # local filesystem
fsspec.open('/local/path.txt')          # local filesystem (implicit)
fsspec.open('s3://bucket/file.txt')     # Amazon S3
fsspec.open('gcs://bucket/file.txt')    # Google Cloud Storage
fsspec.open('https://example.com/api')  # HTTP
```

Compression can be inferred from the file extension (with compression='infer') or specified explicitly:
```python
# Infer compression from the extension
with fsspec.open('data.csv.gz', 'rt', compression='infer') as f:
    content = f.read()

# Explicit compression
with fsspec.open('data.csv', 'rt', compression='gzip') as f:
    content = f.read()
```

Install with Tessl CLI
```shell
npx tessl i tessl/pypi-fsspec
```
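Once installed, the compression handling described above can be verified with a purely local sketch (paths are illustrative):

```python
import gzip
import os
import tempfile
import fsspec

# Create a gzipped file locally
path = os.path.join(tempfile.mkdtemp(), "data.csv.gz")
with gzip.open(path, "wt") as f:
    f.write("a,b\n1,2\n")

# compression='infer' selects gzip from the .gz extension
with fsspec.open(path, "rt", compression="infer") as f:
    content = f.read()

print(content)  # a,b
                # 1,2
```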