# Filesystem Registry

Plugin system for registering, discovering, and instantiating filesystem implementations. The registry enables dynamic loading of storage backend drivers and provides centralized access to available protocols through a consistent interface.

## Capabilities

### Filesystem Instantiation

Creates filesystem instances by protocol name with storage-specific options. This is the primary way to obtain a filesystem object for direct manipulation.

```python { .api }
def filesystem(protocol, **storage_options):
    """
    Create a filesystem instance for the given protocol.

    Parameters:
    - protocol: str, protocol name ('s3', 'gcs', 'local', 'http', etc.)
    - **storage_options: keyword arguments passed to filesystem constructor

    Returns:
    AbstractFileSystem instance
    """
```

Usage example:
```python
import fsspec

# Create S3 filesystem (requires s3fs)
s3 = fsspec.filesystem('s3', key='ACCESS_KEY', secret='SECRET_KEY')
files = s3.ls('bucket-name/')

# Create local filesystem
local = fsspec.filesystem('file')
local.mkdir('/tmp/new_directory')

# Create HTTP filesystem
http = fsspec.filesystem('http')
content = http.cat('https://example.com/data.json')
```
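
For experimenting without credentials or network access, the built-in `memory` protocol is convenient. The following is a minimal sketch; the path `/registry_demo/hello.txt` is an arbitrary name chosen for illustration.

```python
import fsspec

# 'memory' is one of the built-in protocols, so no extra package is needed.
fs = fsspec.filesystem("memory")

# Write a small file, then read it back through the same interface.
fs.pipe_file("/registry_demo/hello.txt", b"hello registry")
data = fs.cat_file("/registry_demo/hello.txt")
exists = fs.exists("/registry_demo/hello.txt")

# The in-memory store outlives this instance, so remove the demo file again.
fs.rm("/registry_demo/hello.txt")
```

The same calls (`pipe_file`, `cat_file`, `exists`, `rm`) should behave alike on other backends, which is the point of obtaining filesystems through the registry.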

### Filesystem Class Resolution

Retrieves the filesystem class without instantiating it. Useful for inspection, subclassing, or custom instantiation patterns.

```python { .api }
def get_filesystem_class(protocol):
    """
    Get the filesystem class for a protocol.

    Parameters:
    - protocol: str, protocol name

    Returns:
    type, AbstractFileSystem subclass
    """
```

Usage example:
```python
import fsspec

# Get the S3 filesystem class (requires s3fs)
S3FileSystem = fsspec.get_filesystem_class('s3')

# Check available methods
print(dir(S3FileSystem))

# Custom instantiation
s3 = S3FileSystem(key='...', secret='...', client_kwargs={'region_name': 'us-west-2'})
```
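
Class resolution can also be tried with a built-in protocol that needs no extra package. This sketch uses `memory`; the name `no-such-protocol-demo` is deliberately made up to show what is expected to happen for an unknown protocol.

```python
import fsspec
from fsspec.spec import AbstractFileSystem

# Resolve the class for a built-in protocol without instantiating it.
MemoryFS = fsspec.get_filesystem_class("memory")
is_filesystem = issubclass(MemoryFS, AbstractFileSystem)

# fsspec.filesystem() returns an instance of the resolved class.
same_type = isinstance(fsspec.filesystem("memory"), MemoryFS)

# An unrecognised protocol name is expected to raise ValueError.
try:
    fsspec.get_filesystem_class("no-such-protocol-demo")
    unknown_rejected = False
except ValueError:
    unknown_rejected = True
```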

### Protocol Registration

Registers new filesystem implementations, enabling plugin-style extensions. Allows third-party packages to integrate with fsspec's unified interface.

```python { .api }
def register_implementation(name, cls, clobber=False, errtxt=None):
    """
    Register a filesystem implementation.

    Parameters:
    - name: str, protocol name
    - cls: str or type, filesystem class or import path
    - clobber: bool, whether to overwrite existing registrations
    - errtxt: str, error message for import failures
    """
```

Usage example:
```python
import fsspec

# Register a custom filesystem
class MyCustomFS(fsspec.AbstractFileSystem):
    protocol = 'custom'

    def _open(self, path, mode='rb', **kwargs):
        # Custom implementation
        pass

fsspec.register_implementation('custom', MyCustomFS)

# Register by import path (the class is imported lazily on first use)
fsspec.register_implementation(
    'myprotocol',
    'mypackage.MyFileSystem',
    errtxt='Please install mypackage for myprotocol support'
)
```
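
The effect of `clobber` is easiest to see by registering two classes under one name. This is a sketch under assumptions: `FirstFS`, `SecondFS`, and the protocol name `demo-clobber` are hypothetical and exist only for this illustration.

```python
import fsspec
from fsspec.spec import AbstractFileSystem


class FirstFS(AbstractFileSystem):
    protocol = "demo-clobber"  # made-up protocol name


class SecondFS(AbstractFileSystem):
    protocol = "demo-clobber"


# clobber=True here only makes the sketch safe to run more than once.
fsspec.register_implementation("demo-clobber", FirstFS, clobber=True)

# Registering a different class under the same name without clobber
# is expected to be rejected with ValueError.
try:
    fsspec.register_implementation("demo-clobber", SecondFS)
    conflict = False
except ValueError:
    conflict = True

# With clobber=True the existing registration is replaced.
fsspec.register_implementation("demo-clobber", SecondFS, clobber=True)
resolved = fsspec.get_filesystem_class("demo-clobber")
```

Leaving `clobber` at its default guards against two packages silently competing for the same protocol name.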

### Available Protocols

Lists all registered protocols, including both built-in and third-party implementations. Useful for discovering available storage backends.

```python { .api }
def available_protocols():
    """
    List all available protocol names.

    Returns:
    list of str, protocol names
    """
```

Usage example:
```python
import fsspec

# See all available protocols
protocols = fsspec.available_protocols()
print(protocols)
# e.g. ['file', 'local', 's3', 'gcs', 'http', 'https', 'ftp', 'sftp', ...]

# Check if a protocol is available
if 's3' in fsspec.available_protocols():
    s3 = fsspec.filesystem('s3')
```

## Built-in Protocol Implementations

### Local Filesystems
- **file**, **local**: Local filesystem access
- **memory**: In-memory filesystem for testing

### Cloud Storage
- **s3**: Amazon S3 (requires s3fs)
- **gcs**, **gs**: Google Cloud Storage (requires gcsfs)
- **az**, **abfs**: Azure Blob Storage (requires adlfs)
- **adl**: Azure Data Lake Gen1 (requires adlfs)
- **oci**: Oracle Cloud Infrastructure (requires ocifs)

### Network Protocols
- **http**, **https**: HTTP/HTTPS access
- **ftp**: FTP protocol
- **sftp**, **ssh**: SSH/SFTP access (requires paramiko)
- **smb**: SMB/CIFS network shares (requires smbprotocol)
- **webdav**: WebDAV protocol

### Archive Formats
- **zip**: ZIP archive access
- **tar**: TAR archive access
- **libarchive**: Multiple archive formats (requires libarchive-c)

### Specialized
- **cached**: Caching wrapper for other filesystems
- **reference**: Reference filesystem for Zarr/Kerchunk
- **dask**: Integration with Dask distributed computing
- **git**: Git repository access (requires pygit2)
- **github**: GitHub repository access (requires requests)
- **jupyter**: Jupyter server filesystem access

### Database/Analytics
- **hdfs**: Hadoop Distributed File System (requires pyarrow)
- **webhdfs**: HDFS access over the WebHDFS REST interface (requires requests)
- **arrow_hdfs**: Arrow-based HDFS access (requires pyarrow)

## Registry Constants and Variables

```python { .api }
registry: dict
"""Read-only mapping of protocol names to filesystem classes"""

known_implementations: dict
"""Mapping of protocol names to import specifications"""

default: str
"""Default protocol name ('file')"""
```

Usage example:
```python
import fsspec
# known_implementations lives in the fsspec.registry module
from fsspec.registry import known_implementations

# Inspect registry
print(fsspec.registry.keys())

# Check if protocol is known but not loaded
if 's3' in known_implementations:
    # Will trigger import and registration
    s3_fs = fsspec.filesystem('s3')
```
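
The relationship between the two mappings can be observed with the built-in `memory` protocol. This sketch assumes that entries in `known_implementations` are dicts holding a dotted import path under the key `"class"`.

```python
import fsspec
from fsspec.registry import known_implementations

# Look up the import specification for a built-in protocol.
spec = known_implementations["memory"]
class_path = spec["class"]

# Resolving the protocol imports the class and records it in the registry.
cls = fsspec.get_filesystem_class("memory")
loaded = "memory" in fsspec.registry
matches = class_path.endswith(cls.__name__)
```

In other words, `known_implementations` describes where a class could be imported from, while `registry` holds the classes that have been loaded or registered so far.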

## Usage Patterns

### Dynamic Protocol Loading

fsspec uses lazy loading for optional dependencies:

```python
import fsspec

# The backing package is imported only when the protocol is first used
s3 = fsspec.filesystem('s3')    # Imports s3fs
gcs = fsspec.filesystem('gcs')  # Imports gcsfs
```

### Custom Filesystem Integration

Creating and registering custom filesystems:

```python
import fsspec
from fsspec.spec import AbstractFileSystem

class DatabaseFS(AbstractFileSystem):
    protocol = 'db'

    def __init__(self, connection_string, **kwargs):
        super().__init__(**kwargs)
        self.connection_string = connection_string

    def _open(self, path, mode='rb', **kwargs):
        # Implement database table/query access
        pass

    def ls(self, path, detail=True, **kwargs):
        # List tables/views
        pass

# Register the custom filesystem
fsspec.register_implementation('db', DatabaseFS)

# Use it like any other filesystem
db = fsspec.filesystem('db', connection_string='postgresql://...')
```
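
A version of the same pattern that runs end to end may help when trying this out. The sketch below is an assumption-laden illustration rather than a recommended design: `DictFS`, its `files` argument, and the protocol name `dictfs-demo` are hypothetical, and the filesystem is read-only with a flat namespace.

```python
import io

import fsspec
from fsspec.spec import AbstractFileSystem


class DictFS(AbstractFileSystem):
    """Hypothetical read-only filesystem backed by a dict of name -> bytes."""

    protocol = "dictfs-demo"  # made-up protocol name for this sketch

    def __init__(self, files=None, **kwargs):
        super().__init__(**kwargs)
        self.files = dict(files or {})

    def ls(self, path, detail=True, **kwargs):
        # Flat namespace: every stored name is listed regardless of `path`.
        entries = [
            {"name": name, "size": len(data), "type": "file"}
            for name, data in sorted(self.files.items())
        ]
        return entries if detail else [entry["name"] for entry in entries]

    def _open(self, path, mode="rb", **kwargs):
        if mode != "rb":
            raise NotImplementedError("this sketch only supports binary reads")
        return io.BytesIO(self.files[path])


# clobber=True keeps the registration repeatable when the code is re-run.
fsspec.register_implementation("dictfs-demo", DictFS, clobber=True)

fs = fsspec.filesystem("dictfs-demo", files={"greeting.txt": b"hello"})
names = fs.ls("", detail=False)
with fs.open("greeting.txt", "rb") as handle:
    content = handle.read()
```

Only `ls` and `_open` are implemented here; a production backend would typically also need accurate `info`, directory handling, and write support.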

### Protocol Discovery

Checking available protocols at runtime:

```python
import fsspec

def get_cloud_protocols():
    """Get all available cloud storage protocols"""
    all_protocols = fsspec.available_protocols()
    cloud_protocols = [p for p in all_protocols
                       if p in ['s3', 'gcs', 'gs', 'az', 'abfs', 'adl', 'oci']]
    return cloud_protocols
```
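
A protocol name being listed may not guarantee that its backing package is installed, so a stricter check can attempt to resolve each class. This is a minimal sketch; `usable_protocols` is a hypothetical helper, and it assumes that a missing package surfaces as `ImportError` and an unknown name as `ValueError`.

```python
import fsspec


def usable_protocols(candidates):
    """Return the candidates whose filesystem class can actually be imported."""
    usable = []
    for name in candidates:
        try:
            fsspec.get_filesystem_class(name)
        except (ImportError, ValueError):
            # ImportError: protocol is known but its package is missing.
            # ValueError: protocol name is not known at all.
            continue
        usable.append(name)
    return usable


result = usable_protocols(["memory", "file", "definitely-not-a-protocol"])
```

Note that resolving a class imports its package, so this check is heavier than a simple membership test against `available_protocols()`.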