# fsspec

A unified, pythonic interface specification for working with diverse file systems and storage backends. fsspec enables developers to write storage-agnostic code that works seamlessly across different backends (local, S3, GCS, Azure, HTTP, SSH, etc.) through a consistent API that abstracts away implementation-specific details.

## Package Information

- **Package Name**: fsspec
- **Language**: Python
- **Installation**: `pip install fsspec`

## Core Imports

```python
import fsspec
```

For direct filesystem instantiation:

```python
from fsspec import filesystem
```

For file opening and URL parsing:

```python
from fsspec import open, open_files, url_to_fs
```

For mapping interface:

```python
from fsspec import get_mapper, FSMap
```

## Basic Usage

```python
import fsspec

# Open a file from any supported backend
# (remote protocols need their backend package, e.g. s3fs for s3://)
with fsspec.open('s3://bucket/file.txt', 'r') as f:
    content = f.read()

# Open local file with same interface
with fsspec.open('/path/to/local/file.txt', 'r') as f:
    content = f.read()

# Get filesystem instance directly
fs = fsspec.filesystem('s3', key='...', secret='...')
files = fs.ls('bucket/path/')

# Dictionary-like interface to filesystem
mapper = fsspec.get_mapper('s3://bucket/path/')
mapper['key'] = b'data'
data = mapper['key']

# Parse URL to get filesystem and path
fs, path = fsspec.url_to_fs('s3://bucket/file.txt')
```

## Architecture

fsspec uses a plugin-based architecture with consistent interfaces:

- **AbstractFileSystem**: Base class defining the common interface for all storage backends
- **Registry System**: Plugin system for discovering and instantiating filesystem implementations
- **Protocol Handlers**: Specific implementations for different storage types (local, S3, HTTP, etc.)
- **Caching Layer**: Multiple caching strategies for performance optimization
- **Transaction Support**: Deferred commits, so files written inside a transaction on a filesystem are finalized together when the transaction completes

This design enables fsspec to serve as the foundation for major data processing libraries including Dask, pandas, xarray, and Hugging Face Datasets, providing unified storage access across the Python data ecosystem.

## Capabilities

### Core File Operations

Essential file and directory operations including opening files, reading/writing data, listing directories, and managing file metadata. These functions provide the primary interface for interacting with files across all supported storage backends.

```python { .api }
def open(urlpath, mode='rb', compression=None, encoding='utf8', errors=None, protocol=None, newline=None, expand=None, **kwargs): ...
def open_files(urlpath, mode='rb', compression=None, encoding='utf8', errors=None, name_function=None, num=1, protocol=None, newline=None, auto_mkdir=True, expand=True, **kwargs): ...
def open_local(url, mode='rb', **kwargs): ...
def url_to_fs(url, **kwargs): ...
def get_fs_token_paths(urls, mode='rb', num=1, name_function=None, **kwargs): ...
```
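
A minimal sketch of how these functions fit together, using the built-in in-memory backend so that no credentials or network access are needed. The `memory://core_demo/...` paths and file contents are illustrative assumptions, not part of fsspec itself.

```python
import fsspec

# Write two small text files to the in-memory backend.
for name, text in [("a.txt", "alpha"), ("b.txt", "beta")]:
    with fsspec.open(f"memory://core_demo/{name}", "w") as f:
        f.write(text)

# url_to_fs splits a URL into a filesystem instance and a backend path.
fs, path = fsspec.url_to_fs("memory://core_demo/a.txt")

# open_files expands the glob and returns deferred OpenFile objects;
# each file is only opened when its context manager is entered.
contents = {}
for of in fsspec.open_files("memory://core_demo/*.txt", "r"):
    with of as f:
        contents[of.path.rsplit("/", 1)[-1]] = f.read()
```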

[Core Operations](./core-operations.md)

### Filesystem Registry

Plugin system for registering, discovering, and instantiating filesystem implementations. Enables dynamic loading of storage backend drivers and provides centralized access to available protocols.

```python { .api }
def filesystem(protocol, **storage_options): ...
def get_filesystem_class(protocol): ...
def register_implementation(name, cls, clobber=False, errtxt=None): ...
def available_protocols(): ...
```
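
As a rough illustration of the registry, the sketch below registers a custom backend under a new protocol name and then instantiates it through `fsspec.filesystem`. The class `ScratchFileSystem` and the protocol name `scratch` are hypothetical; the class simply reuses the in-memory implementation.

```python
import fsspec
from fsspec.implementations.memory import MemoryFileSystem
from fsspec.registry import (
    available_protocols,
    get_filesystem_class,
    register_implementation,
)


class ScratchFileSystem(MemoryFileSystem):
    """Hypothetical backend that reuses the in-memory implementation."""

    protocol = "scratch"


# clobber=True lets the registration be repeated without raising.
register_implementation("scratch", ScratchFileSystem, clobber=True)

# The protocol name now resolves to the registered class.
fs = fsspec.filesystem("scratch")
resolved = get_filesystem_class("scratch")

# Protocols known to fsspec (some may need extra packages to import).
protocols = available_protocols()
```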

[Registry System](./registry.md)

### AbstractFileSystem Interface

Complete filesystem abstraction providing consistent methods for file operations, directory management, and metadata access across all storage backends. Serves as the base class for all filesystem implementations.

```python { .api }
class AbstractFileSystem:
    def open(self, path, mode='rb', **kwargs): ...
    def ls(self, path, detail=True, **kwargs): ...
    def info(self, path, **kwargs): ...
    def exists(self, path, **kwargs): ...
    def cat_file(self, path, start=None, end=None, **kwargs): ...
    def pipe_file(self, path, value, **kwargs): ...
    def copy(self, path1, path2, recursive=False, **kwargs): ...
    def rm(self, path, recursive=False, maxdepth=None): ...
```
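
The methods above can be exercised against any backend. The following sketch uses the in-memory filesystem, with hypothetical paths under `/afs_demo`, to show a typical write, inspect, copy, and delete sequence.

```python
import fsspec

fs = fsspec.filesystem("memory")

# Write bytes in one call, then inspect the result.
fs.pipe_file("/afs_demo/data.bin", b"0123456789")
exists = fs.exists("/afs_demo/data.bin")
size = fs.info("/afs_demo/data.bin")["size"]

# Read only a byte range (start inclusive, end exclusive).
middle = fs.cat_file("/afs_demo/data.bin", start=2, end=5)

# Copy within the filesystem and list the directory as plain names.
fs.copy("/afs_demo/data.bin", "/afs_demo/copy.bin")
names = sorted(fs.ls("/afs_demo", detail=False))

# Remove the copy again.
fs.rm("/afs_demo/copy.bin")
copy_exists = fs.exists("/afs_demo/copy.bin")
```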

[AbstractFileSystem](./filesystem-interface.md)

### Dictionary Mapping Interface

Key-value store interface that presents filesystem paths as dictionary keys, enabling intuitive data access patterns and integration with mapping-based workflows.

```python { .api }
class FSMap:
    def __getitem__(self, key): ...
    def __setitem__(self, key, value): ...
    def __delitem__(self, key): ...
    def getitems(self, keys, on_error='raise'): ...
    def setitems(self, d): ...

def get_mapper(url='', check=False, create=False, **kwargs): ...
```
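
A short sketch of the mapping interface on the in-memory backend. The root `memory://map_demo` and the keys are assumptions chosen for illustration; keys are paths relative to the mapper root and values are bytes.

```python
import fsspec

mapper = fsspec.get_mapper("memory://map_demo")

# Single-item access behaves like a dict of bytes.
mapper["a"] = b"1"
mapper["nested/b"] = b"2"

# Bulk variants take and return plain dictionaries.
mapper.setitems({"c": b"3"})
values = mapper.getitems(["a", "nested/b", "c"])

# Iteration yields keys relative to the mapper root.
keys = sorted(mapper)

del mapper["c"]
remaining = sorted(mapper)
```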

[Mapping Interface](./mapping.md)

### Caching System

Multiple caching strategies for optimizing filesystem access patterns, including memory mapping, block caching, read-ahead caching, and background prefetching for improved performance with remote storage.

```python { .api }
class BaseCache: ...
class MMapCache(BaseCache): ...
class ReadAheadCache(BaseCache): ...
class BlockCache(BaseCache): ...
class BytesCache(BaseCache): ...
class BackgroundBlockCache(BaseCache): ...
```
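
To make the read-ahead idea concrete, the sketch below drives a `ReadAheadCache` directly with a stand-in fetcher. It assumes the constructor takes `(blocksize, fetcher, size)` and that reads go through the internal `_fetch(start, end)` method, which is normally called by buffered file objects rather than user code. The `fetcher` function and the sample data are hypothetical.

```python
from fsspec.caching import ReadAheadCache

data = bytes(range(100))  # stand-in for a remote object
fetch_calls = []          # records every (start, end) range requested


def fetcher(start, end):
    """Hypothetical fetcher: returns data[start:end] like a backend would."""
    fetch_calls.append((start, end))
    return data[start:end]


# Arguments are blocksize, fetcher, and total size (assumed signature).
cache = ReadAheadCache(10, fetcher, len(data))

# The first read misses and asks the fetcher for extra bytes ahead.
first = cache._fetch(0, 5)

# A following sequential read is expected to come from that buffer.
second = cache._fetch(5, 10)
```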

[Caching](./caching.md)

### Progress Callbacks

Extensible callback system for monitoring file transfer progress, supporting both built-in progress indicators and custom callback implementations for long-running operations.

```python { .api }
class Callback:
    def __call__(self, size_or_none=None, value_or_none=None): ...
    def set_size(self, size): ...
    def relative_update(self, inc=1): ...
    def branched(self, path_1, path_2, **kwargs): ...

class TqdmCallback(Callback): ...
class DotPrinterCallback(Callback): ...
```
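
A custom callback might look like the sketch below. It assumes that `set_size` and `relative_update` update `size` and `value` and then invoke the `call` hook, which the subclass overrides to record progress. `RecordingCallback` is a hypothetical name, and the updates are driven by hand here instead of by a real transfer.

```python
from fsspec.callbacks import Callback


class RecordingCallback(Callback):
    """Hypothetical callback that records (value, size) on every update."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.events = []

    def call(self, *args, **kwargs):
        # Invoked by the base class after size or value changes.
        self.events.append((self.value, self.size))


cb = RecordingCallback()
cb.set_size(3)            # total number of units expected
for _ in range(3):
    cb.relative_update()  # one unit finished

final_state = (cb.value, cb.size)
```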

[Callbacks](./callbacks.md)

### Compression Support

Automatic compression/decompression support for multiple formats, enabling transparent handling of compressed files across all filesystem backends.

```python { .api }
def register_compression(name, callback, extensions, force=False): ...
def available_compressions(): ...
```
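
The sketch below shows transparent compression through `fsspec.open` on the in-memory backend: text is written with `compression="gzip"`, the stored bytes are checked for the gzip magic number, and the file is read back with `compression="infer"`, which picks the codec from the `.gz` extension. The URL is a hypothetical example.

```python
import fsspec
from fsspec.compression import available_compressions

url = "memory://compression_demo/notes.txt.gz"  # hypothetical path

# Write text through a gzip layer.
with fsspec.open(url, "wt", compression="gzip") as f:
    f.write("hello compressed world\n")

# Without a compression argument the raw gzip bytes are returned.
with fsspec.open(url, "rb") as f:
    raw = f.read()

# "infer" selects the codec from the file extension.
with fsspec.open(url, "rt", compression="infer") as f:
    text = f.read()

has_gzip = "gzip" in available_compressions()
```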

[Compression](./compression.md)

### Utilities and Configuration

Helper functions for URL parsing, path manipulation, tokenization, and configuration management that support the core filesystem operations.

```python { .api }
def infer_storage_options(urlpath, inherit_storage_options=None): ...
def infer_compression(filename): ...
def stringify_path(filepath): ...
def tokenize(*args, **kwargs): ...
def get_protocol(url): ...
```
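
These helpers live in `fsspec.utils` and only inspect strings, so they can be tried without any backend installed. The URL below is a made-up example.

```python
from pathlib import PurePosixPath

from fsspec.utils import (
    get_protocol,
    infer_compression,
    infer_storage_options,
    stringify_path,
    tokenize,
)

url = "s3://bucket/data/part-0.csv.gz"  # hypothetical URL

protocol = get_protocol(url)           # protocol prefix of the URL
options = infer_storage_options(url)   # dict with protocol, path, ...
compression = infer_compression(url)   # codec name based on the extension

# Path-like objects are converted to plain strings.
as_text = stringify_path(PurePosixPath("data/part-0.csv"))

# tokenize gives a deterministic hash of its arguments.
token = tokenize(url, {"mode": "rb"})
same_token = tokenize(url, {"mode": "rb"})
```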

[Utilities](./utilities.md)

## Types

```python { .api }
class FSTimeoutError(Exception):
    """Timeout in filesystem operations"""

class BlocksizeMismatchError(ValueError):
    """Raised when cached file is reopened with different blocksize than original"""

class OpenFile:
    """File-like object for deferred opening"""
    def __init__(self, fs, path, mode='rb', compression=None, encoding=None, errors=None, newline=None): ...
    def __enter__(self): ...
    def __exit__(self, *args): ...
    def open(self): ...

class Transaction:
    """Filesystem transaction context"""
    def __init__(self, fs, **kwargs): ...
    def __enter__(self): ...
    def __exit__(self, exc_type, exc_val, exc_tb): ...
    def start(self): ...
    def complete(self, commit=True): ...
```
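
As an illustration of `Transaction`, the sketch below writes a file inside `fs.transaction` on the in-memory backend. It assumes the usual pattern of entering the filesystem's `transaction` attribute as a context manager; the write is expected to be committed when the block exits without an exception. The path is hypothetical.

```python
import fsspec

fs = fsspec.filesystem("memory")
path = "/txn_demo/report.txt"  # hypothetical path

# Files opened for writing inside the block are committed together
# when the transaction completes.
with fs.transaction:
    with fs.open(path, "wb") as f:
        f.write(b"draft complete")
    visible_during = fs.exists(path)  # recorded for inspection only

visible_after = fs.exists(path)
content = fs.cat_file(path)
```

If the block raises, the transaction is completed with `commit=False` and the pending files are discarded instead of written.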